Why do threads sometimes stay in 'killed' state in MySQL?

Have you ever tried to kill a query, but rather than just go away, it remained among the running ones for an extended period of time? Or perhaps you have noticed some threads makred with killed showing up from time to time and not actually dying. What are these zombies? Why does MySQL sometimes seem to fail to terminate queries quickly? Is there any way to force the kill command to actually work instantaneously? This article sheds some light on it.

Threads and connections

MySQL uses a separate thread for each client connection. A query sent to MySQL is handled by a thread that was previously associated with the connection over which the query arrived. Anyone with sufficient privileges can see the list of currently active threads, along with some additional details, by running SHOW PROCESSLIST command, which returns a table-like view where each connection becomes a separate row:

+-----------+-------------+-------------------+--------+-------------+----------+-------------+---------+
| Id        | User        | Host              | db     | Command     | Time     | State       | Info    |
+-----------+-------------+-------------------+--------+-------------+----------+-------------+---------+
| 827044892 | production  | 10.0.1.100:48596  | proddb | Sleep       |        1 |             |    NULL |
| 827044893 | production  | 10.0.1.100:39181  | proddb | Sleep       |        1 |             |    NULL |
| 827044894 | production  | 10.0.1.100:48598  | proddb | Sleep       |        1 |             |    NULL |
| 827044895 | production  | 10.0.1.100:39183  | proddb | Sleep       |        1 |             |    NULL |

More advanced techniques of dealing with this information are described in the posts titled "Anohter way to work with MySQL process list" and "How to selectively kill queries in MySQL?".

Any running query or any existing connection from the list can be terminated by using KILL command.

What actually happens when you run KILL?

The command sytnax is KILL [QUERY | CONNECTION] <thread_id>, where thread_id is the value from the first column of the process list output. The optional argument determines whether only running query or should the entire session be terminated. It defaults to the latter, so specifying CONNECTION is not required.

Running the command doesn't actually do anything except for setting a special flag inside the selected thread. Therefore the kill operation doesn't happen synchronously with the corresponding request. In many cases it takes some time for a thread or a query to stop. The flag is checked at various stages of statement execution. When it happens exactly, or how frequently, depends on the work a thread is actually doing.

For example:

During ALTER TABLE it is checked before each block of rows is read from the original table while MySQL is rewriting data into a new temporary table. In the process, the temporary table is deleted and the original structure remains unchanged.
For UPDATE and DELETE there is also an additional check after each updated or deleted row. If a query managed to make any changes prior to noticing the request to terminate, they have to be rolled back. Note: if a table's storage engine does not support transactions (e.g. MyISAM), the changes cannot be rolled back, so the operation will result in partial update!
SELECT also checks it after reading a block of rows.

Of course, these were just basic examples and the behavior can be different in different situations. It is even possible that some queries will be unkillable in certain circumstances. A case of this happened once while reading from INFORMATION_SCHEMA.INNODB_BUFFER_POOL_PAGES_INDEX table. The database was having performance problems at the time and the query execution basically stopped on a lock somewhere deep inside InnoDB and it never got back to the point where the flag value could be checked again.

Threads stay in killed state for a long time. What does it mean?

Actually, there can be two different cases. If KILL <thread_id> was issued, there would be killed in the process list. KILL QUERY <thread_id> doesn't kill a connection, but rather it only stops a running query within a connection, so in that case query end text may appear instead.

What if you see something like this?

+----+------+-----------+------+---------+------+----------+------------------------------------+
| Id | User | Host      | db   | Command | Time | State    | Info                               |
+----+------+-----------+------+---------+------+----------+------------------------------------+
| 10 | root | localhost | NULL | Query   |    0 | NULL     | show processlist                   |
| 14 | root | localhost | NULL | Killed  |   27 | Updating | update testdb.sometable set cc=sqrt(id) |
+----+------+-----------+------+---------+------+----------+------------------------------------+

It can either be an effect of a bug (e.g. Bug #52528), or more likely it means the database is performing some work internally to clean up after a task that was terminated.

There is of course no easy way to confirm if this is a bug. So in order to figure it out, you should rather look for evidence that it is not a bug. That what you see is just the effect of a standard operation, which MySQL has to perform to clean up after a query or a transaction.

Probably the most common reason for a thread to stay with either killed or query end for a longer period of time is waiting for a transaction rollback on InnoDB tables. This sometimes can take a lot of time to complete, especially when hundreds of thousands or millions of changes have to be removed.

How to verify that?

Check is the output of SHOW ENGINE INNODB STATUSG. It can simply print the information if there is a rollback currently in progress:

---TRANSACTION 0 10411, ACTIVE 28 sec, process no 15506, OS thread id 140732309711184 rollback
mysql tables in use 1, locked 1
ROLLING BACK 7585 lock struct(s), heap size 751600, undo log entries 798854
MySQL thread id 14, query id 206 localhost root end
update testdb.sometable set cc=sqrt(id)

The example shows how easily the information can be found. If a thread is marked with killed, or with query end, and the InnoDB engine status reports a rollback for the same thread, just wait until it ends.

What if it isn't that?

It can be related to removing some temporary table from disk. An ALTER TABLE may need to discard a very large temporary table, while large file removal on some filesystems (e.g. ext3 or ext4) can be rather slow, so it may need a few seconds or sometimes even longer than that. A temporary table can also be created by any DML statement, but usually not nearly as big in size.

It should not usually be necessary as even in the most extreme cases deleting a file should not take more than ten or twenty seconds, but under heavy I/O load it could be much longer, so is there any way to see whether any temporary table were created or not?

In Percona Server or MariaDB you check the contents of INFORMATION_SCHEMA.GLOBAL_TEMPORARY_TABLE and INFORMATION_SCHEMA.TEMPORARY_TABLE. It will only work for manually established temporary tables with CREATE TEMPORARY TABLE statement, not for those created implicitly by MySQL to execute complex queries.

mysql> select * from INFORMATION_SCHEMA.GLOBAL_TEMPORARY_TABLES;

SESSION_ID is the same as the thread identifier in the process list, so you can connect any killed threads to their temporary tables.

The other type of temporary tables cannot be easily traced. Sometimes it's possible to spot the information in the SHOW PROCESSLIST output:

mysql> show processlist;
+----------+------+-----------+------+---------+------+----------------------+---------------------------------------------------+
| Id       | User | Host      | db   | Command | Time | State                | Info                                              |
+----------+------+-----------+------+---------+------+----------------------+---------------------------------------------------+
| 1934     | root | localhost | test | Killed  |    4 | Copying to tmp table | INSERT INTO test.tbl SELECT * FROM test.testtable |
..

Sometimes you can check what temporary files the instance keeps open:

server ~ # lsof -c mysqld | grep #sql
mysqld  5626 mysql  138u   REG              253,1       1024 43843585  /vol/vol1/mysql/#sql_95fa_0.MYI
mysqld  5626 mysql  139u   REG              253,1  227262885 43843605  /vol/vol1/mysql/#sql_95fa_0.MYD

The colored value is the file's size. These may help you to make an assumption that database could be removing a temporary table.

Conclusions

From time to time it may be normal to see a database thread that was killed, but didn't die immediately, or a query that can't seem to end. The real operation is not be performed synchronously with the kill command and afterwards MySQL may still need some time to clean things up properly.

No matter what the real cause is, it is impossible to get rid of such hanging threads without a database restart. They should, however, simply be allowed to go away on their own.

There is also no way of forcing a kill that would execute instantly like kill -9 <pid> in Unix systems.

Blog