Wednesday, July 21, 2010

Munin's Performance

Munin's CPU usage

As seen before, munin is run by a cron job every five minutes. On each run, it connects to all the servers it has to monitor, fetches all the data, writes the data to hundreds of RRD files, and recreates all the HTML files and hundreds of PNG files. The more servers it monitors, the more CPU munin uses.

Some other tests are also rather interesting:

/usr/share/munin$ time sudo -u munin ./munin-update

real    0m27.453s
user    0m0.152s
sys     0m0.036s

/usr/share/munin$ time sudo -u munin ./munin-limits

real    0m0.179s
user    0m0.132s
sys     0m0.016s

/usr/share/munin$ time sudo -u munin ./munin-html

real    0m0.270s
user    0m0.176s
sys     0m0.020s

/usr/share/munin$ time sudo -u munin ./munin-graph

real    0m11.376s
user    0m10.465s
sys     0m0.500s

This test (made on my desktop, with only one node monitored) shows two interesting things. First, generating the PNGs is by far the heaviest part of the process: 10.965 seconds of CPU time (user + sys) versus 0.532 seconds for the other three steps combined. Second, munin-update takes nearly 30 seconds to complete but barely uses the CPU, probably because it is waiting for the node to run all its plugins. That is why munin forks when it starts and runs one process per node, and why you should not prevent it from forking (there is an option for that; don't use it).

If I were monitoring 10 nodes, and assuming the cost grows linearly with the number of nodes, graph generation alone would take approx. 110 seconds on my desktop (with nothing else running), every five minutes. In other words: as you add nodes to munin, it tends to become quite heavy.
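The CPU figures quoted above can be recomputed from the time output. What follows is only a minimal sketch: the helper names cpu_seconds and project_seconds are made up for this example, and the projection assumes that CPU cost grows linearly with the number of nodes, which a one-node test cannot prove.

```shell
# cpu_seconds: read `time` output on stdin and print user + sys in seconds.
# (Hypothetical helper for this example, not part of munin.)
cpu_seconds() {
    awk '$1 == "user" || $1 == "sys" {
             # "0m10.465s" splits into t[1]="0" (minutes) and t[2]="10.465"
             split($2, t, /[ms]/)
             total += t[1] * 60 + t[2]
         }
         END { printf "%.3f\n", total }'
}

# project_seconds SECONDS NODES: scale a one-node cost to NODES nodes,
# assuming the cost grows linearly (an assumption, not a measurement).
project_seconds() {
    awk -v s="$1" -v n="$2" 'BEGIN { printf "%.0f\n", s * n }'
}
```

Feeding the munin-graph timings into cpu_seconds gives 10.965, and project_seconds 10.965 10 gives 110.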

Run Munin as a CGI

One way to improve performance is to change the way Munin creates the graphs: instead of recreating them every five minutes, we can create them only when a user requests them by displaying one of the web pages. This is made possible with CGI.

So, how does it work? When installed, Munin places a script, munin-cgi-graph, in /usr/lib/cgi-bin/. When configured as CGI, Munin changes the links to the images in the HTML files, making them point to munin-cgi-graph:

<img src="/cgi-bin/munin-cgi-graph/localdomain/localhost.localdomain/df_inode-day.png" ...>
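Judging from this example URL alone, the part of the path after the script name seems to carry a group, a host, a plugin name, and a period. The sketch below only illustrates that reading; it is not how munin-cgi-graph is actually implemented, and the function name and the labels group, host, service, and period are my own.

```shell
# parse_graph_path URL_PATH: split a munin-cgi-graph URL path into its parts.
# (Illustrative only; the field names are assumptions based on the example.)
parse_graph_path() {
    path=${1#/cgi-bin/munin-cgi-graph/}   # drop the script prefix
    group=${path%%/*}                     # first path component
    rest=${path#*/}
    host=${rest%%/*}                      # second path component
    file=${rest#*/}
    file=${file%.png}                     # e.g. "df_inode-day"
    service=${file%-*}                    # everything before the last hyphen
    period=${file##*-}                    # everything after the last hyphen
    printf 'group=%s host=%s service=%s period=%s\n' \
        "$group" "$host" "$service" "$period"
}
```

With the path from the example above, this prints group=localdomain host=localhost.localdomain service=df_inode period=day.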

Depending on the path, munin-cgi-graph creates the appropriate graph, which is then displayed. There is also a caching system, so if you reload the page within five minutes, the graphs are not regenerated. Because the CGI script writes these files to disk, the directory /var/www/munin must be writable by the Apache process. One solution is to make the files belong to the user munin and the group www-data, and to give the group write access:

/var/www$ sudo chown -R munin:www-data /var/www/munin
/var/www$ sudo chmod -R g+w /var/www/munin
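To check that the chmod reached everything, one option is to list whatever still lacks the group write bit. This is a small sketch; the function name missing_group_write is invented for the example.

```shell
# missing_group_write DIR: list every file and directory under DIR whose
# group write bit (octal 020) is not set. Empty output means nothing was missed.
missing_group_write() {
    find "$1" ! -perm -020
}
```

Running missing_group_write /var/www/munin should print nothing after the two commands above. Assuming Apache runs as www-data, sudo -u www-data test -w /var/www/munin && echo writable is another quick check.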

The performance gain is huge, but one drawback of this method is that a page containing several graphs, like the node view, takes much longer to display.

To configure Munin as CGI you need to add the following lines to your /etc/munin/munin.conf:

graph_strategy cgi
cgiurl /cgi-bin
cgiurl_graph /cgi-bin/munin-cgi-graph

These lines help Munin create correct links to the graphs. Now, assuming you are using Apache, you need to edit your main Apache configuration file to allow /usr/lib/cgi-bin to run CGI scripts:

<Directory "/usr/lib/cgi-bin">
        AllowOverride None
        Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
        Order allow,deny
        Allow from all
</Directory>

Finally, you need to tell Apache that your website is going to use CGI. If you have a dedicated virtual host set up for munin, add the following line there; otherwise, add it somewhere in the main Apache configuration file:

ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/

munin-cgi-graph also uses the Perl module Date::Manip, which you need to install. Your Munin is now running as CGI!
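If the graphs do not show up, the two things most easily forgotten are the munin.conf directives and the Perl module. The sketch below checks both; check_cgi_conf and has_date_manip are hypothetical helper names, and the check only looks for the directives listed in this article.

```shell
# check_cgi_conf FILE: succeed only if FILE sets the three CGI directives.
check_cgi_conf() {
    for pattern in 'graph_strategy[[:space:]]+cgi' \
                   'cgiurl[[:space:]]+' \
                   'cgiurl_graph[[:space:]]+'; do
        # The directive must start a line (leading whitespace allowed).
        if ! grep -Eq "^[[:space:]]*${pattern}" "$1"; then
            echo "missing or wrong directive: ${pattern%%\[*}" >&2
            return 1
        fi
    done
}

# has_date_manip: succeed if perl can load the Date::Manip module.
has_date_manip() {
    perl -MDate::Manip -e 1 2>/dev/null
}
```

For example: check_cgi_conf /etc/munin/munin.conf && has_date_manip && echo "CGI setup looks complete".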

Move Munin's RRD databases to a TMPFS

On an install monitoring approximately 30 servers, I have over 2000 RRD databases in /var/lib/munin. This number varies with the number of services you monitor per server, but the point to remember is this: every time munin runs (every 5 minutes), hundreds if not thousands of databases are written to and read from. If your disks aren't very fast, this can prove quite costly as the number of monitored servers grows.

This can be improved by moving the files contained in /var/lib/munin to a tmpfs. In my example, with 30 servers monitored and 2250 RRD files, only 115 MB of disk space is used. Considering the amount of RAM in servers nowadays, it may be worth saving some disk I/O at the cost of some RAM. As all the data would be lost if the server restarts, we will back the data up every hour, day, or week, depending on how much data you are willing to lose.
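Before sizing the tmpfs, it is worth measuring your own install instead of relying on my numbers. A minimal sketch, with invented helper names:

```shell
# rrd_count DIR: number of .rrd files under DIR.
rrd_count() {
    find "$1" -type f -name '*.rrd' | wc -l | tr -d ' '
}

# rrd_size_kb DIR: total disk usage of DIR, in kilobytes.
rrd_size_kb() {
    du -sk "$1" | cut -f1
}
```

Run rrd_count /var/lib/munin and rrd_size_kb /var/lib/munin, then pick a tmpfs size that leaves comfortable room for growth.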

Make a backup of the folder:

cd /var/lib
sudo cp -ra munin/ munin-cache

Add this to your /etc/fstab:

tmpfs /var/lib/munin tmpfs rw,size=512M 0 0

Mount it:

sudo rm -rf /var/lib/munin/*
sudo mount /var/lib/munin

Copy the data back from the backup:

sudo cp -ra /var/lib/munin-cache/* /var/lib/munin

Create an hourly (or daily) cron job that copies the files from munin to munin-cache:

ServerX $ cd /etc/cron.hourly
ServerX $ ls -l
total 4
-rwxr-xr-x 1 root root 57 2009-03-10 18:55 munin-cache
ServerX $ cat munin-cache
#!/bin/sh
cp -ra /var/lib/munin/* /var/lib/munin-cache/
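One weakness of a plain cp is that a crash or reboot in the middle of the copy can leave a half-updated backup. A possible variation, which is not what the script above does, copies to a staging directory first and only then swaps it into place. The function name backup_dir and the .tmp and .old suffixes are choices made for this sketch, and it temporarily needs room for two copies of the data.

```shell
# backup_dir SRC DST: copy SRC to DST through a staging directory (DST.tmp),
# so that DST is only replaced once the whole copy has finished.
backup_dir() {
    src=$1
    dst=$2
    rm -rf "$dst.tmp" "$dst.old"          # clear leftovers from a failed run
    cp -a "$src" "$dst.tmp" || return 1   # full copy into the staging directory
    if [ -d "$dst" ]; then
        mv "$dst" "$dst.old" || return 1  # keep the previous backup for a moment
    fi
    # The swap is two renames, so it is quick but not strictly atomic.
    mv "$dst.tmp" "$dst" && rm -rf "$dst.old"
}
```

In the cron script this would be called as backup_dir /var/lib/munin /var/lib/munin-cache.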

Then, to restore the files from the backup automatically after a reboot, add this at the end of /etc/rc.local (before the final exit 0 line, if your rc.local has one):

cp -ra /var/lib/munin-cache/* /var/lib/munin
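If that restore command is ever run while the tmpfs already holds fresh data (for example by re-running rc.local by hand), it overwrites newer files with the older backup. A slightly more defensive sketch restores only when the target is empty; restore_if_empty is a made-up name, not something munin provides.

```shell
# restore_if_empty CACHE TARGET: copy the contents of CACHE into TARGET,
# but only when TARGET exists and contains nothing yet.
restore_if_empty() {
    cache=$1
    target=$2
    [ -d "$cache" ] && [ -d "$target" ] || return 1
    if [ -z "$(ls -A "$target")" ]; then
        # "cache/." copies the directory's contents, hidden files included.
        cp -a "$cache"/. "$target"/
    fi
}
```

In rc.local this would be called as restore_if_empty /var/lib/munin-cache /var/lib/munin.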

Frequently asked questions

Is there a munin-node for Windows?
Yes, but it is unofficial. It is maintained by TOCOMPLETE? and is written in C++. Search Google for a download link.


  1. Hi! I'm using munin to monitor more than 50 servers, and it is getting really bad. The update process takes more than 2 or 3 minutes, and as you can imagine, munin-graph takes longer... so graphs start to break and it is completely useless. I'm running munin on a physical server, an HP ProLiant with 8 GB of RAM and 4 CPUs, with really bad performance, so a few days ago I was asked to migrate this munin install to a virtual server. Today I migrated it, and... it's not better. Tomorrow I will tell my partners about the improvements you've mentioned and I will try them on the physical server; I guess it is going to work better than the virtual one. Thanks!!! I guess this is the info I needed!!

  2. Install munin 2.0 and use the cgi strategy for graphs and html, and rrdcached for munin-update, like this: