Blog

Why a statement can be unsafe when it uses LIMIT clause?

MySQL 5.1 or newer can sometimes start throwing a strange message into an error log. The message states that a query was unsafe for binary logging along with some additional information. What does it mean? Is it a problem?

From time to time you might spot MySQL error log filling with the following warning:

"[Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. The statement is unsafe because it uses a LIMIT clause. This is unsafe because the set of rows included cannot be predicted. Statement: DELETE FROM score WHERE user_id = 12345 AND created = '2012-04-15' LIMIT 1"

If binary logging is enabled and the log format is set to STATEMENT, MySQL generates such message when it considers that a query is ambiguous and could behave differently each time it executes against the same data set. Such situation could happen, for example, on a replication slave, or on a restored backup where binary logs are used for point-in-time recovery.

Why? The reason lies in the relational database theory, which defines table as a set of rows, while a set does not assume any particular order. What if there were a few rows matching user_id = 12345 and created = '2012-04-15'? Well, in theory, such DELETE statement could read and remove a different row each time, because if all share the same values in user_id and created, how to decide which one comes first or last? Of course, a database implementation, such as MySQL, typically relies on some implicit order even if the theory behind it does not. But then again, there may be a different problem. Even if MySQL uses some implicit sort order for storing rows, does it guarantee the order is always the same? No.

row-order-e1334616686398.png

Two rows appear in different order on Master and Slave. The two databases are still identical from the SQL point of view, although the way they organized data is different. Now, if our statement started deleting rows from the top, it would remove two different rows from each instance and the two databases would drift out of sync. This is what MySQL is warning you about by saying This is unsafe because the set of rows included cannot be predicted. The problem does not exist when there is no LIMIT clause, because then MySQL simply removes all rows matching the criteria and it does not matter to the final result in which order it does that.

So is seeing such warning message a problem? Definitely not, if binary logs aren't used for anything - not replication, not recovery. Otherwise the answer would have to be yes, although it does not imply there is anything wrong with the data already. And although chances are that nothing bad will happen even when MySQL keeps complaining, you should never rely on luck - fix the problem.

The real option to address this is moving to either MIXED or ROW format for binary logging, which is defined by binlog_format MySQL option. And what is cool, it can also be set at runtime with SET (for current session only) or SET GLOBAL (globally), so no restart is even necessary. However don't rush to change it just yet. Be sure to read and understand what the differences between all these formats are first as making this change might break something that your applications rely on. But in the end, although it seems like a nasty workaround, it's even possible to use ROW format for just one specific query by using the following sequence:

SET SESSION binlog_format = ROW;
DELETE ...;
SET SESSION binlog_format = STATEMENT;

An alternative approach to alleviate the potential problem could be increasing the chances that the same row gets chosen every time, however it can only be done under the assumption that all instances contain identical data.

To make the example a little bit more complete, here is the query again along with the underlying data structure:

DELETE FROM score WHERE user_id = 12345 AND created = '2012-04-15' LIMIT 1

CREATE TABLE `score` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`created` date NOT NULL,
`score` int(11) NOT NULL,
[..]
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`, `created`)
)

If data on master and slave is identical, to prevent potential problems, one could use ORDER BY on a unique and non-nullable column, such as id for example. That could help ensuring that such query would always pick the youngest or the oldest row out of the matching ones. There should be no confusion anymore as the order would be always the same regardless of the circumstances. Again, this may fail to work if some instances involved are not identical to others (e.g. when master keeps only recent data, while some slave holds the full archive). It may also not work in more complex cases (e.g. when JOIN is used in a query). Finally, the warning message may still be printed into the log as MySQL may not be smart enough to recognize such fix.

The modified query would look like this:

DELETE FROM score WHERE user_id = 12345 AND created = '2012-04-15' ORDER BY id ASC LIMIT 1

The warning message discussed in this articles isn't necessarily a nasty problem that needs urgent attention, however it might be a sign of problems ahead. It is definitely worth an investigation to verify whether it may or may not be causing any damage to data. A fix is relatively trivial, although it should be done carefully and with analysis of the possible consequences of implementing the changes.

Take care of your MySQL performance.

MySQL audits available from only $129 per server. Learn More
blog comments powered by Disqus