Can COUNT(*) be used in MySQL on InnoDB tables?

COUNT() function returns a number of rows returned by a query. In a popular opinion COUNT(*) should not be used on InnoDB tables, but this is only half true.

If a query performs filtering on any column, there is no relevant difference in how COUNT(*) will be executed regardless of the storage engine. In any such case MySQL has to look for matching rows and then count them.

In the following queries COUNT(*) can be used without any negative impact on performance:

SELECT COUNT(*) FROM mytable WHERE id = 12345
SELECT COUNT(*) FROM mytable WHERE is_enabled = 1
SELECT COUNT(*) FROM mytable WHERE username LIKE 'a%' AND is_enabled = 1

The real difference is when no filter is specified in WHERE clause, i.e. when query counts all rows in a table. MyISAM maintains cached row count for each table, so it can always return the value instantly without even accessing any rows. InnoDB, on the other hand, cannot easily keep such statistics as it enables customizable views on data and changes to different transactions. It means the engine has to read and count all rows in a table every time such query is executed.

Example query where COUNT(*) works slower on an InnoDB table:

SELECT COUNT(*) FROM mytable;

Can COUNT(*) be used on InnoDB tables? Yes, it can. However it should be avoided for counting how many rows a table has, especially if it is expected to perform as good as on MyISAM tables.

[MySQL Health Check]
About Thomas

LAMP consultant with 12+ years of experience in online media industry. Enthusiast of modern web technologies, networking and databases.

Comments

  1. You mention that count(*) on a table with no where should not be used to count the rows of a table, what is a good way to count the rows in an innodb table then?

    • KLD,

      Of course, you may still use SELECT COUNT(*) to count rows in a table on an ad hoc basis. For performance reasons you should not do it from an application that may be called a lot (e.g. from a publicly available web page).

      Row count also matters. Counting a few hundred rows should not do any harm, even if done very frequently, but doing that on thousands or many more definitely will.

      Where SELECT COUNT(*) FROM table cannot be used, you typically need to rely on caching the value. Keep the row count for a table stored elsewhere, e.g. in Memcache, or even in another MySQL table. The cached value may be updated synchronously (as rows are added or removed) or asynchronously (in a cron job from time to time), depending on how accurate you need the count to be.

  2. @Maciej: if you need to count all the rows in an InnoDB table just do a count on the PRIMARY KEY table, like so: COUNT(id) FROM table; where id is the PK column. That way it’s hitting the index.

    • Matt: right, but for 50M rows it still means quite a big scan and potential performance impact. No such problem with MyISAM and its internal counter.

    • Matt,

      Actually counting by the primary key column is no different than counting all rows without any index, because InnoDB holds row contents together with the primary key, i.e. reading a primary key value implies reading the full row. It is why the optimizer often chooses a short secondary index to execute such COUNT(). But even so, it does not really address the problem that all these rows have to be accessed during execution.

  3. Kathy Mazur Worden says:

    For innodb tables with millions of rows I’ve found the table status’ ‘rows’ value quite helpful. It’s been within a reasonable amount of the real number of rows for my needs.

    There’s also INFORMATION_SCHEMA.TABLES for an estimate:
    select * from INFORMATION_SCHEMA.TABLES where table_Name=’table’ and table_schema=’database’\G

    • Kathy,

      It is possible to do it this way, but I’d call it risky to pull that information if you want to show it to users. Not necessarily because it may be very inaccurate, but rather because it may not be consistent. Because it’s heavily approximated, subsequent reads may return values below or over the real row count quite randomly.

      I should also mention that reading from this table too often, may also hurt performance. For one, the information_schema tables weren’t designed for high performance. But even more importantly, unless innodb_stats_on_metadata is disabled, every such read triggers table statistics recalculation, which may end up badly given sufficient load and concurrency.

Speak Your Mind

*