Interesting behavior of a MySQL benchmark on EC2

I had to benchmark an EC2 instance to see whether a database could safely be moved to it. This is good practice, as it helps avoid surprises when an instance or its storage is allocated in a noisy neighborhood, where the neighbors consume so many resources that it affects the performance of our MySQL database. Understandably, one can never get very reliable results on EC2 (it is a shared environment, after all) and some fluctuations should be expected, but it is still good to know the numbers. I started my benchmarks and everything seemed fine at first, but then the statistics I was getting started to look quite odd.

I was running the benchmarks on a High-CPU Extra Large Instance and couldn’t see any consistency in the results at all. At one moment I was getting poor throughput and horrible response times, only to see both improve a lot a few minutes later. I ruled out the possibility that storage performance problems were causing this, so I started suspecting MySQL itself – perhaps some sort of contention. But neither system statistics nor any information from MySQL would confirm that theory.

Finally, I turned to oprofile and started it during one of the benchmarks, and what I saw surprised me:

Note: The Y axis on the right has values in reverse order for context switches and interrupts.

While oprofile was running, MySQL performance nearly doubled. This was also reflected in the CPU utilization, which increased accordingly and became much more stable. After a while, I stopped oprofile and the performance dropped again.

At this point I am not sure how to explain this; however, it was repeatable. Anyone?

About Maciej Dobrzanski

A MySQL consultant with a primary focus on the performance and scalability of systems, databases, and application stacks. An expert in open source technologies such as Linux, BSD, Apache, nginx, MySQL, and many more. @linkedin

Comments

  1. Drop in interrupts and switches and increase in CPU utilization – almost like a change in affinity or interrupt balance?

  2. Are you on EBS? If so, are you testing with the largest possible volume size?

    I’ve seen problems with a size smaller than max b/c of issues with co-tenancy.

    • Igor,

      The workload was mostly CPU-bound, as the data size was really small there. But I checked I/O statistics and they showed nothing bad. I’d say something that oprofile does affects the way a guest is granted CPU time. Or perhaps the benchmarks I was running weren’t welcome for whatever reason and my instance was somehow “throttled”, but then running oprofile inexplicably fixed that.

  3. Vincent Janelle says:

    Saw much of the same with some CPU-bound tasks we were trying to run on an 8-CPU instance – trying to use all 8 cores resulted in increased processing times, but up to 5 or 6 cores performance scaled fairly linearly.

    This probably has something to do with the hypervisor needing to schedule all 8 CPUs for your guest and having issues with this. You’ll see much of the same on VMware – this won’t get much better until the NUMA scheduling patches make it into the kernel and Amazon can deploy them.

    • Vincent,

      I tried sysbench --test=cpu with 8 threads, but that worked as expected and all 8 CPUs were saturated at ~100% most of the time (except for some occasional spikes in %steal). I also tried a 100% CPU-bound benchmark with MySQL, but that just followed the pattern I was seeing earlier:

      [ 10s] threads: 8, tps: 322.43, reads/s: 4521.33, writes/s: 0.00, response time: 458.09ms (99%)
      [ 20s] threads: 8, tps: 454.40, reads/s: 6364.31, writes/s: 0.00, response time: 491.91ms (99%)
      [ 30s] threads: 8, tps: 319.80, reads/s: 4477.00, writes/s: 0.00, response time: 468.49ms (99%)
      [ 40s] threads: 8, tps: 319.00, reads/s: 4466.00, writes/s: 0.00, response time: 430.69ms (99%)
      [ 50s] threads: 8, tps: 397.20, reads/s: 5558.90, writes/s: 0.00, response time: 355.92ms (99%)
      [ 60s] threads: 8, tps: 590.50, reads/s: 8267.20, writes/s: 0.00, response time: 193.73ms (99%)
      [ 70s] threads: 8, tps: 1538.80, reads/s: 21543.98, writes/s: 0.00, response time: 10.85ms (99%)
      [ 80s] threads: 8, tps: 1542.00, reads/s: 21586.79, writes/s: 0.00, response time: 10.55ms (99%)
      [ 90s] threads: 8, tps: 1537.80, reads/s: 21530.99, writes/s: 0.00, response time: 10.78ms (99%)
      [ 100s] threads: 8, tps: 1556.99, reads/s: 21796.79, writes/s: 0.00, response time: 10.33ms (99%)
      [ 110s] threads: 8, tps: 1524.40, reads/s: 21341.76, writes/s: 0.00, response time: 12.09ms (99%)
      [ 120s] threads: 8, tps: 1576.40, reads/s: 22071.06, writes/s: 0.00, response time: 10.57ms (99%)

      There are obvious differences between the two tests. The CPU benchmark continuously runs calculations and nothing else, and this works fine. The MySQL benchmark, even with all I/O removed, still involves various system calls and frequently interacts with the threading library, so the problem has to be related to that.
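      The jump is easy to quantify from the tps column above; a minimal sketch (plain Python, with the numbers copied straight from the log – the 60 s split point is just where the log shows the transition):

      ```python
      from statistics import mean

      # tps values from the sysbench log above, one per 10 s interval
      slow = [322.43, 454.40, 319.80, 319.00, 397.20, 590.50]        # first 60 s
      fast = [1538.80, 1542.00, 1537.80, 1556.99, 1524.40, 1576.40]  # after the jump

      ratio = mean(fast) / mean(slow)
      print(f"slow: {mean(slow):.0f} tps, fast: {mean(fast):.0f} tps, ~{ratio:.1f}x")
      ```

      So this is not noise-level variance: the same workload runs roughly four times faster in the second half of the same two-minute run.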

      But the question remains: why does simply starting oprofile change this so dramatically? Does it mean we should run oprofile continuously to boost EC2 performance? :-)

  4. IgorM says:

    Did you check CPU frequencies (cat /proc/cpuinfo | grep MHz) both with and without oprofile running?

  5. I’ve already observed similar “gaps” on real systems too. You should monitor MySQL/InnoDB mutex contention during both tests – it’s possible that the profiler, by adding its own overhead, is decreasing some internal contention within MySQL. Also try another profiler, like “perf”, to be sure it’s not related only to oprofile.

    Rgds,
    -Dimitri

    • Dimitri,

      I would totally understand the problem if I had been able to see any noticeable contention. But I didn’t. That’s why it was a bit mind-boggling. I will give it another try some day and look at the smaller details.

Trackbacks

  1. […] Maciej Dobrzanski shares an interesting behavior of a MySQL benchmark on EC2. […]
