Wednesday, October 27, 2010

Analyzing Linux write performance problems with iotop

A common performance problem for servers is writing data. When a server writes more data than its storage channel can absorb, the storage falls behind. Such problems can be difficult to spot, so here we'll walk through how to analyze write performance issues on Linux servers.

The trouble with write performance issues is that they often hide behind other parameters. In all cases, however, the top command is a good place to start. The wa parameter in the CPU line shows the percentage of time that the CPUs have spent waiting for I/O to complete. A high value typically indicates a slow storage channel.


A high value for the wa parameter indicates that the storage channel is struggling.
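To read the same figure without the interactive screen, top can be run once in batch mode (top -b -n 1), or the value can be derived from the counters in /proc/stat. The following is a minimal sketch of the second approach, assuming the usual layout of the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal). The function name iowait_pct is made up for this example.

```shell
#!/bin/sh
# iowait_pct BEFORE AFTER  (hypothetical helper, not part of any tool)
# Each argument is an aggregate "cpu ..." line taken from /proc/stat.
# Prints the percentage of CPU time spent in iowait between the two samples.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Fields 2-9: user nice system idle iowait irq softirq steal
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            t[NR] = total   # all ticks seen so far
            w[NR] = $6      # iowait ticks seen so far
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (w[2] - w[1]) / dt
        }'
}

# Usage on a live system: take two samples one second apart.
#   before=$(grep '^cpu ' /proc/stat); sleep 1; after=$(grep '^cpu ' /proc/stat)
#   iowait_pct "$before" "$after"
```

Because the counters are cumulative since boot, a single reading says little; the difference between two samples is what corresponds to the wa value on a refreshing top screen.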

Just looking at top, however, is not good enough. Let's have a look at a small test. In this test, we write the current memory state to disk, using dd if=/dev/kcore of=/kcore.img bs=4096. On a 4 GB machine, that is a lot of work for the machine to do. While the job runs, everything else slows down noticeably, which means there is a performance problem.
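If dumping memory is not an option, a comparable write load can be generated from /dev/zero. This is only a sketch and not the test used in this article: the function name write_test, the target path, and the block count are assumptions for illustration, and conv=fdatasync is a GNU dd option that flushes the data to disk before dd exits.

```shell
#!/bin/sh
# write_test FILE COUNT  (hypothetical helper)
# Writes COUNT blocks of 4096 bytes of zeros to FILE and forces them to disk,
# so the storage channel has to do real work instead of just filling the cache.
write_test() {
    dd if=/dev/zero of="$1" bs=4096 count="$2" conv=fdatasync
}

# Example: write roughly 4 GB (1048576 blocks of 4096 bytes), then clean up.
#   write_test /tmp/write_test.img 1048576
#   rm -f /tmp/write_test.img
```

Run top in a second terminal while the write is in progress to watch the wa value react.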

The problem with top, though, is that it doesn't make a write performance problem easy to see. It all depends on the number of CPU cores you have. On a 16-core server, the write problem may claim all CPU cycles on one core. You would not see that in the generic top overview, however, because it gives you the average for all CPUs together. So with one CPU core completely claimed by writes, you wouldn't see much more than 6% (100% divided by 16 cores) at the wa parameter in top. To get more detail, the first thing to do is press the 1 key in the top interface, which gives you a line for each CPU core. On the test system used for writing this article, there are only two cores, so the results are not spectacular. On a system with many cores, however, the differences displayed may be significant.


From top, press 1 to see performance details for each CPU core.
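The dilution effect is simple arithmetic, sketched below. The helper avg_wa is a made-up name, and it assumes that the summary line is roughly the mean of the per-core wa values, which holds when all cores are sampled over the same interval.

```shell
#!/bin/sh
# avg_wa WA1 WA2 ...  (hypothetical helper)
# Prints the mean of the per-core wa percentages given as arguments,
# which approximates the single wa value in the top summary line.
avg_wa() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { if (NR) printf "%.2f\n", sum / NR }'
}

# One core fully stalled on I/O, fifteen cores with no I/O wait at all.
avg_wa 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0   # prints 6.25
```

The same per-core counters that top displays after pressing 1 are available as the cpu0, cpu1, ... lines in /proc/stat.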

So, if only one core out of 16 is completely busy waiting for the slow storage channel, then the 15 remaining cores can do the work, right? Too often, the answer is "no." If you have just one storage channel, then all CPUs need to go through that single channel. If one CPU is completely busy waiting for the storage channel, then the other cores won't get prompt responses from it either. So it may look all right in the top window, but performance could be terribly off.

Fortunately, there is iotop, which gives information about the most active processes with regard to I/O. Most Linux distributions don't install it by default, so make sure you install it manually, using your distribution's package manager (for instance: yum install iotop if you're using RHEL). The good thing about iotop is that it shows you which process is generating the most I/O at the moment and how much I/O that is. If you compare the I/O load caused by this process with the capacity of your storage channel, you'll know immediately whether you have a storage problem and, if so, where it comes from. Then you can troubleshoot the issue and optimize your Linux server's write performance.


With iotop you can see exactly which process is claiming your valuable resources.
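For logging or scripting, iotop can also run non-interactively. The sketch below wraps one such invocation and adds a small helper for the comparison described above. The names sample_io and io_share are made up for this example, and the throughput and capacity figures are hypothetical numbers, not measurements from the test system.

```shell
#!/bin/sh
# sample_io  (hypothetical wrapper; iotop normally needs root privileges)
# -o shows only processes actually doing I/O, -b selects batch mode,
# -n 3 stops after three iterations, -k reports values in kilobytes.
sample_io() {
    iotop -o -b -n 3 -k
}

# io_share OBSERVED_KBPS CAPACITY_KBPS  (hypothetical helper)
# Prints how much of the storage channel's capacity a process is using.
io_share() {
    awk -v o="$1" -v c="$2" 'BEGIN {
        if (c <= 0) { print "n/a"; exit }
        printf "%.0f%%\n", 100 * o / c
    }'
}

# Example with assumed figures: a process writing 45000 KB/s
# to a channel that sustains about 60000 KB/s.
io_share 45000 60000   # prints 75%
```

A single process that accounts for most of the channel's capacity is the first candidate to investigate.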
