In the previous blog post we discussed a couple of best practices related to benchmarking. This time we would like to talk about the process itself: what can go wrong and what potential issues you have to consider. We will take a look at some real-world scenarios.
What do you want to test?
The first thing we have to keep in mind is: what do we want to test? This will allow us to configure the benchmarks properly. Generally speaking, we are talking about two main types of workload. A CPU-bound workload manifests itself in high CPU utilization with very low disk activity. The other type is I/O-bound, which puts pressure on the I/O subsystem. The typical real-world workload is a mix of those two, somewhere in between. CPU-bound traffic can usually be tested with a fully in-memory, ideally read-only, workload. This means that all data fits in memory, so neither reads nor writes have to hit the disk. An I/O-bound workload can easily be created by reducing the size of the in-memory buffers or by removing the impact of filesystem-level caching, for example by using direct I/O (O_DIRECT). Alternatively, you can simply provision a data set big enough to reach the active data-to-memory ratio that you want to have: a ratio that ensures the majority of the data has to be read from disk.
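To make that last point concrete, here is a minimal sizing sketch in Python. The buffer pool size, target ratio and per-row estimate are all made-up numbers you would replace with your own measurements:

```python
# Rough sizing helper for an I/O-bound benchmark: given the memory the
# database can use for caching, how big does the active data set have to
# be to reach a target data-to-memory ratio? All figures are illustrative.

def dataset_size_gb(cache_gb: float, target_ratio: float) -> float:
    """Data set size (GB) needed for a given active data-to-memory ratio."""
    return cache_gb * target_ratio

# With a 64 GB buffer pool and a 5:1 target ratio we need ~320 GB of
# active data, so the majority of reads will have to come from disk.
size_gb = dataset_size_gb(cache_gb=64, target_ratio=5)

# Translating that into benchmark rows, assuming ~250 bytes per row
# (purely an assumption - measure your own schema):
rows = int(size_gb * 1024**3 / 250)
print(f"{size_gb} GB ~= {rows:,} rows")
```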
Initial tests of a new VM size
Let’s say that you are running a sizable database environment in the cloud. As happens every now and then, a new type of VM has been introduced. You would like to compare the performance of the old and the new instance types. How can one approach it? Typically, you do not want to build a long and detailed benchmarking process based on the production data and query mix. Sure, that’d be perfect, but it is a bit of an overkill given the time needed to set it all up. What can be done instead is to use a synthetic benchmark that somehow resembles your application. Pro tip: some benchmarks, like SysBench, give you an option to customize the query mix. It will never be the same as running a real-life production workload, but you can tweak the (still) synthetic benchmark so that it better resembles the traffic you run in production.
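As an illustration, here is a sketch of driving SysBench with a skewed query mix from Python. The mix options follow the oltp_* Lua scripts shipped with sysbench 1.0 - verify them against your version with `sysbench oltp_read_write help` - and the host name and credentials are placeholders:

```python
# A sketch of skewing SysBench's oltp_read_write query mix so it better
# resembles a read-heavy production workload. Option names are those of
# the oltp_* Lua scripts in sysbench 1.0; check your version.
import subprocess

cmd = [
    "sysbench", "oltp_read_write",
    "--mysql-host=db-under-test",        # hypothetical host name
    "--mysql-user=sbtest", "--mysql-password=sbtest",
    "--tables=16", "--table-size=10000000",
    "--threads=64", "--time=300", "--report-interval=10",
    "--point-selects=25",                # bump point lookups
    "--index-updates=1",                 # keep a trickle of writes
    "--non-index-updates=0",
    "--delete-inserts=0",
    "run",
]
subprocess.run(cmd, check=True)
```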
How do we want to approach the problem? First, we need to keep in mind that we want to test new hardware, so that is what should change; other variables should stay untouched as much as possible. Of course, you may want to tweak some of them to accommodate the changes in hardware - more CPU cores, more memory, a faster disk - but this should be the only change performed. Otherwise you should use the same configuration and the same OS and database versions. You should also prepare the benchmark data according to the requirements. If you want to test CPU performance, you probably want to run a CPU-bound workload where data is stored in memory. If you want to test the overall performance, you may want to tweak the data size so that the active data-to-memory ratio becomes similar to what you have in production.
Obviously, you want to automate the tests and execute them in several passes. This will help you to avoid common pitfalls such as a noisy neighbour in a virtualized environment. Some cloud providers, for larger instances, let you pick an option where a host is shared only among your own virtual machines; some even let you put a given VM on dedicated hardware. This ensures that interference from outside your VM is minimized and does not impact the performance of the VM itself.
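A minimal sketch of such a multi-pass run, again driving SysBench from Python (the scenario, pass count and host are assumptions to adapt):

```python
# Minimal multi-pass runner: execute the same sysbench scenario several
# times and keep every pass's raw output, so outliers (e.g. a noisy
# neighbour) stand out instead of hiding in a single run.
import subprocess
from pathlib import Path

PASSES = 5
OUT = Path("results")
OUT.mkdir(exist_ok=True)

for i in range(1, PASSES + 1):
    res = subprocess.run(
        ["sysbench", "oltp_read_only",
         "--mysql-host=db-under-test",   # hypothetical host name
         "--mysql-user=sbtest", "--mysql-password=sbtest",
         "--tables=16", "--table-size=10000000",
         "--threads=64", "--time=300", "run"],
        capture_output=True, text=True, check=True,
    )
    (OUT / f"pass_{i}.log").write_text(res.stdout)
```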
Application tests
Another typical use case is to run application load tests. This is quite an important exercise, as it allows you to determine how well your application scales and how much traffic it can handle. From the database standpoint there are a couple of things to consider. First, are you doing a full-stack application load test, or is it going to be just a database-side test? The first is actually way easier for the database administrator, because it does not really involve any additional work to set up. Such a test is typically executed on the application level, going through several scenarios related to the application’s functionality; the database just accepts the traffic and deals with it. By running several passes you may be able to tweak the database configuration to make the most out of it. Preparations are typically limited to setting up the database using a snapshot of the production data (the latest backup, for example).
Database-side tests are more complex to prepare, mostly because you have to prepare the queries, and this may be quite tricky to do. In general, almost every database allows you to log the queries it executes; this will be your starting point. Once you have the queries, you need to modify them so that they can be executed against your copy of the database. You are most likely using production data restored from a backup, so these cannot be just any queries - they must be consistent with the state of that data. The main challenge is to align the point in time at which the queries were executed with the point in time at which the backup was created. If a query contains a SELECT, the database has to contain that table or row; if it contains an INSERT INTO…, the database cannot already contain that row. Once you are done with that, the last step is to execute the queries against the database. Again, to make it more realistic, you probably want to use a multi-threaded approach here. It all depends on the database - in some cases, like MySQL, you can track the connection or transaction ID in the logs, allowing you to group all the queries executed within a single session and then execute multiple such streams at the same time, emulating the real traffic.
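A very simplified sketch of such a replay, based on the format of MySQL’s general query log. It assumes one statement per line, the mysql-connector-python driver, and placeholder host and credentials; a real replay tool would also have to handle multi-line statements, timing and session state:

```python
# Group statements by the connection (thread) id found in MySQL's general
# query log, then replay each connection's stream on its own thread to
# approximate the original concurrency. Deliberately simplified.
import re
import threading
from collections import defaultdict

import mysql.connector

# General log entries look like: "<timestamp>  <thread_id> Query <sql>"
LINE = re.compile(r"^\S+\s+(\d+)\s+Query\s+(.*)$")

def parse_log(path):
    streams = defaultdict(list)
    with open(path) as f:
        for line in f:
            m = LINE.match(line)
            if m:
                streams[m.group(1)].append(m.group(2))
    return streams

def replay(statements):
    conn = mysql.connector.connect(
        host="db-under-test", user="replay",   # hypothetical credentials
        password="replay", database="production_copy")
    cur = conn.cursor()
    for sql in statements:
        try:
            cur.execute(sql)
            if cur.with_rows:       # drain result sets of SELECTs
                cur.fetchall()
            conn.commit()
        except mysql.connector.Error:
            pass                    # a real tool would log and continue
    conn.close()

threads = [threading.Thread(target=replay, args=(stmts,))
           for stmts in parse_log("general.log").values()]
for t in threads:
    t.start()
for t in threads:
    t.join()
```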
I/O vs. network
Let’s talk about network-attached storage. This typically comes up in the cloud, public or private, but any kind of NAS may be affected. When you look at the I/O performance of block storage, what you usually see is I/O operations per second (IOPS). This makes sense, because throughput depends on the number of operations per second and the size of the data read in a single operation. What people tend to forget is that for network-attached storage the throughput itself is quite important, because the data will be sent, well, over the network. If you are going to run the benchmark over the network as well - for example because you want to simulate the flow of the real application, where you connect to the database over the network - this may become a problem. We are not saying that it commonly is an issue, but you should keep in mind that:
The application will be sending requests over the network to the database
The database will be reading data over the network, from disk
The database will be sending the result set over the network to the application server
All of this is happening over the network and, potentially, it may saturate it, thus introducing a bottleneck. Of course, it all depends on the network speed, NAS performance and several other factors, but if it happens and goes unnoticed, it will skew the benchmark results and, ultimately, may lead to incorrect takeaways from the benchmark.
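A quick back-of-the-envelope calculation shows how easily this can happen (all figures below are made up for illustration):

```python
# At what point does network-attached storage traffic alone start to
# saturate the link? Illustrative numbers only.
iops = 10_000        # provisioned IOPS on the block volume
io_size_kb = 16      # e.g. a 16 KB database page
link_gbps = 2        # network bandwidth available to the instance

storage_mbps = iops * io_size_kb / 1024    # MB/s read over the network
storage_gbps = storage_mbps * 8 / 1000     # convert to Gbit/s
print(f"{storage_mbps:.0f} MB/s = {storage_gbps:.2f} Gbit/s "
      f"of a {link_gbps} Gbit/s link")
# 10,000 IOPS * 16 KB ~= 156 MB/s ~= 1.25 Gbit/s - well over half the
# link, before application requests and result sets are even counted.
```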
Soft IRQ
This is another gotcha that can easily be overlooked. Let’s say you are interested in CPU performance, so you set up a read-only, in-memory workload. You do not want to “waste” CPU cores on the benchmark itself, as it also requires CPU to run, so you set up a separate “benchmark” node from which you run the benchmark, connecting to the database server over TCP. In such a setup you have 32 cores available for the database and, let’s say, 16 cores for the benchmark. You execute the benchmark and see only 2/3 of the cores on the database server fully utilized. What is going on? You tune the database configuration; nothing changes. The problem you are experiencing is related to the network traffic. Network traffic is handled by the kernel through software interrupts (softirqs), and every packet that is sent or received requires some CPU cycles to be dealt with. If all of the packets are processed by a single CPU core, that core may become a bottleneck. There are ways to deal with such issues through CPU affinity and irqbalance. On a new system you will most likely not experience this problem, but it is always worth checking whenever you notice that the database server is not utilizing all the cores it has available.
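One quick way to check on Linux is to look at how NET_RX softirqs are spread across cores. A minimal sketch, sampling the standard /proc/softirqs interface:

```python
# Sample /proc/softirqs twice and print the per-CPU delta for the NET_RX
# row. A heavily skewed distribution (one core doing nearly all the work)
# points at the softirq bottleneck described above.
import time

def net_rx_counts():
    with open("/proc/softirqs") as f:
        header = f.readline().split()          # CPU0 CPU1 ...
        for line in f:
            if line.lstrip().startswith("NET_RX:"):
                return dict(zip(header, map(int, line.split()[1:])))
    return {}

before = net_rx_counts()
time.sleep(5)
after = net_rx_counts()
for cpu, count in before.items():
    print(cpu, after[cpu] - count)
```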
As you can see, benchmarking can be a tricky process but, as we have mentioned several times already, it is something that you do on a regular basis. Not daily, maybe, but it happens frequently enough to warrant giving it some thought and making yourself familiar with the potential problems and issues. If you have encountered difficulties during the benchmarking process, we would love to hear from you - you can use the comments section below to share your stories.