Can MySQL use primary key values from a secondary index?

In the article about the role of a primary key, I mentioned that a secondary index in an InnoDB table consists not only of the values of its member columns, but also values of the table’s primary key are concatenated to the index. I.e. the primary key contents is part of every other index.

Assuming the following table structure:

CREATE TABLE `bets` (
  `id` int(10) unsigned NOT NULL,
  `user_id` int(10) unsigned NOT NULL,
  `game_id` int(10) unsigned NOT NULL,
...
  PRIMARY KEY (`id`),
  KEY `user_id` (`user_id`)
) ENGINE=InnoDB

Here is the visualization:

If MySQL could use in queries these implicitly added values, it would maybe allow to save some space on listing the primary key columns at the end of an index explicitly. Let’s check various cases.

Row filtering
mysql> EXPLAIN
    -> SELECT *
    -> FROM   bets
    ->        JOIN games
    ->        ON     games.id = bets.id
    -> WHERE  bets.user_id = 111
    ->        AND bets.id > 3476\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: bets
         type: ref
possible_keys: user_id
          key: user_id
      key_len: 4
          ref: const
         rows: 22
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: games
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: game.bets.id
         rows: 1
        Extra: 

Both key_len and ref fields indicate that only one four bytes long column is used from the user_id index. MySQL cannot use the primary key values in a secondary index for filtering in WHERE clause.

Sorting with ORDER BY
mysql> EXPLAIN
    -> SELECT   *
    -> FROM     bets
    ->          JOIN games
    ->          ON       games.id = bets.game_id
    -> WHERE    bets.user_id = 111
    -> ORDER BY bets.id DESC\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: bets
         type: ref
possible_keys: user_id
          key: user_id
      key_len: 4
          ref: const
         rows: 22
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: games
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: game.bets.game_id
         rows: 1
        Extra: 

Extra only returns Using where, but there is no Using filesort. It means ORDER BY will be optimized using the hidden primary key data from the secondary index.

Aggregating with GROUP BY
mysql> EXPLAIN
    -> SELECT   *
    -> FROM     bets
    ->          JOIN games
    ->          ON       games.id = bets.game_id
    -> WHERE    bets.user_id = 111
    -> GROUP BY bets.id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: bets
         type: ref
possible_keys: user_id
          key: user_id
      key_len: 4
          ref: const
         rows: 22
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: games
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: game.bets.game_id
         rows: 1
        Extra: 

Also in this case Extra neither shows Using filesort nor Using temporary, which would indicate no index is used for grouping. Therefore MySQL can optimize GROUP BY on the concatenated primary key values.

Covering index
mysql> EXPLAIN
    -> SELECT bets.id
    -> FROM   bets
    -> WHERE  bets.user_id = 111\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: bets
         type: ref
possible_keys: user_id
          key: user_id
      key_len: 4
          ref: const
         rows: 22
        Extra: Using index

The query execution plan confirms through Using index that it will only need index contents to return result. MySQL can read and return the hidden primary key values to avoid the additional data lookup.

Summary

In InnoDB tables each entry of a secondary index always contains a copy of the corresponding primary key value. These values may in some cases be used to the benefit of query execution plan:

  • for ORDER BY on the primary key column(s)
  • for GROUP BY on the primary key column(s)
  • when returning the primary key column(s) values in the SELECT list

MySQL cannot use them, however, to optimize filtering in WHERE.

[MySQL Health Check]
About Maciej Dobrzanski

A MySQL consultant with the primary focus on systems, databases and application stacks performance and scalability. Expert on open source technologies such as Linux, BSD, Apache, nginx, MySQL, and many more. @linkedin

Comments

  1. Jacky Shu says:

    Hi Maciej,

    If your data set is small enough to fit into memory, should it still show “Using filesort”?

    • Jacky,

      Filesort is a bit misleading name for it as in reality it may or may not involve the actual files for the sorting operation. It generally means that MySQL has to perform the extra step to sort, through a dedicated algorithm, rows that were earlier read by a query into a temporary storage – whether it is in memory or on disk. This is as opposed to reading rows through an index in a correct order in the first place as then no explicit sorting is necessary. MySQL shows Using filesort whenever it has to sort explicitly.

    • Using filesort means that MySQL loads sort_buffer_size worth of data into the sort buffer and sort it. If the data is small enough the disk will never be needed. If it’s larger MySQL will do a merge sort that can be expected to use three temporary files, merging from two into the third. It’ll use as many passes as required to sort. This is also quite fast for moderate data sizes.

      This is changing significantly in 5.6 In 5.6 with LIMIT the sort buffer will be used to hold the rows needed for the LIMIT condition if they fit in the sort buffer. So no need to sort all rows. Also in 5.6 the temporary file space used can use variable instead of fixed length sizes for strings, so the temporary space use needed will often decrease significantly. Both should significantly improve sorting performance.

      For optimisations using the InnoDB primary key in a secondary index the version can be significant. It’s the sort of detail that we are sometimes willing to change even in monthly updates. I haven’t checked whether 5.6 will use the PK for where clauses in 5.6, though I know it’s been a known limitation.

      Views are my own, not necessarily those of Oracle. For an official view consult a PR person.

      James Day, MySQL Senior Principal Support Engineer, Oracle

  2. Sergei Golubchik says:

    > MySQL cannot use them, however, to optimize filtering in WHERE.

    But MariaDB can. Starting from 5.5.21

  3. > MySQL cannot use them, however, to optimize filtering in WHERE

    This would require index condition pushdown. It’s implemented in MariaDB GA (see Sergei above). And MySQL 5.6 will implement this too.

Trackbacks

  1. […] MySQL use primary key values from a secondary index? Maciej Dobrzanski has the […]

Speak Your Mind

*