Recently we helped a customer migrate their entire application stack from one data center to another. Before we were brought on board, the customer had already placed an order for a new set of servers with the new hosting provider. All of them were supposed to be high-end systems: many CPU cores, plenty of RAM and a RAID array built on top of SSD drives. As the new machines became available to us, we began setting up the new environment. At some point it turned out that the new machines were actually slower than the several-year-old systems they were replacing, and their load was much higher under comparable traffic.
We examined several of the new servers and each time the conclusion was that the problems were related to poor I/O performance. In the benchmarks, a RAID 10 array on Intel SSD 330 Series drives was barely able to achieve 200-300 IOPS in random writes, and even that came at the cost of insanely high response times. This would have been a pathetic result even for a single 15k RPM HDD, but even if it wasn’t, we knew the approximate I/O capacity of an Intel SSD 330 device and what sort of performance to expect from the RAID array.
As it turned out, the hosting company’s purchase form only allowed choosing basic system specs such as CPU model, RAM size, disk type and capacity and (optionally) RAID level. However, regardless of the selected disks, they always installed the archaic 3ware 9650SE RAID controller with equally archaic firmware. 3ware has a history of providing mediocre performance in random I/O operations, even with standard disks. We would never recommend these products where performance or reliability is at stake. Clearly these controllers can behave even worse when you plug SSDs into them.
At first, the hosting provider didn’t see a problem. They claimed they had been successfully using the controller in thousands of servers like ours (likely quite a few of them with SSDs) and had received no complaints thus far. However, as the servers were performing poorly and our customer had paid extra to get the SSDs, we kept pushing to have the 3ware controllers replaced with something we knew would do a much better job. After some back and forth, we were eventually given three alternative options:
Even with updated firmware, the 3ware unit couldn’t come anywhere near the performance offered by more appropriate storage controllers.
What lesson should you take from this post?
Whenever you install a new server, make sure to test it thoroughly before moving it into production. The extra effort will benefit you in several ways:
- you will verify that the server performs up to your expectations,
- you may be able to find hardware problems such as storage misconfiguration, incorrectly installed CPU fan, or a failing disk,
- you should be able to avoid a nasty surprise and likely downtime once the server starts handling production traffic,
- and you will ensure that you were given exactly what you were promised or have paid for.
Finally, by measuring raw hardware performance and keeping the metrics on file, you can use the information later for comparison when you are adding servers again.
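To give a rough idea of what such a measurement could look like, here is a minimal sketch in Python. It is not the benchmark we ran on the servers described above; the function names (`random_write_benchmark`, `save_baseline`), the 4 KiB block size and the default file size are all assumptions made for illustration. It issues synchronous random writes to a scratch file, one at a time, and stores the resulting IOPS and latency figures as JSON so they can be compared against future servers. It relies on `os.pwrite`, so it is meant for Linux and other Unix-like systems.

```python
import json
import os
import random
import time


def random_write_benchmark(path, file_size=16 * 1024 * 1024,
                           block_size=4096, writes=200):
    """Time synchronous random block writes to a scratch file at `path`.

    Hypothetical helper for illustration: queue depth is 1 and every
    write is followed by fsync(), so the numbers are a lower bound.
    """
    block = os.urandom(block_size)
    blocks_in_file = file_size // block_size
    latencies = []

    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_size)  # size the scratch file up front
        for _ in range(writes):
            # Pick a random block-aligned offset inside the file.
            offset = random.randrange(blocks_in_file) * block_size
            start = time.perf_counter()
            os.pwrite(fd, block, offset)
            os.fsync(fd)  # ask the OS to push the write to the device
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)

    total = max(sum(latencies), 1e-9)  # guard against division by zero
    latencies.sort()
    p99_index = min(writes - 1, int(writes * 0.99))
    return {
        "writes": writes,
        "block_size": block_size,
        "iops": writes / total,
        "avg_latency_ms": 1000 * total / writes,
        "p99_latency_ms": 1000 * latencies[p99_index],
    }


def save_baseline(result, path):
    """Keep the metrics on file for comparison with future servers."""
    with open(path, "w") as f:
        json.dump(result, f, indent=2)
```

Calling `random_write_benchmark("/path/on/the/array/scratch.bin")` and passing the result to `save_baseline` would leave a small JSON file you can archive next to the server's spec sheet. Keep in mind that a script like this only gives a sanity check: it measures the filesystem the scratch file lives on, with a single outstanding request, and a controller's write cache can flatter the numbers. A dedicated benchmarking tool such as fio gives far more controlled results.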
If you ever need any assistance with testing your servers, we’ll be more than happy to help you!