Of late the network graphs in our Ganglia monitoring have suffered from irritating, improbable spikes (30PB…) that effectively render them meaningless. At first I tried the removespikes.pl script that I saw mentioned by other people with the same problem. This didn’t work all that well: it removed either too much or too little. It also felt like treating the symptoms rather than the cause. After all, Ganglia is just plotting what is stored in the RRD files.
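Eventually I found a suggestion to set a maximum value on the data source in the header of each RRD file using rrdtool tune. Any update above that maximum is then recorded as unknown instead of being stored, so I could rule out these (pretty much) impossible values. Here’s an example command: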
rrdtool tune bytes_in.rrd --maximum sum:9.0000000000e+09
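With more than a handful of hosts, running that command by hand gets tedious. The following is a minimal sketch of how it could be applied to every bytes_in.rrd and bytes_out.rrd file under a Ganglia RRD directory. The function names are hypothetical, and the directory in the example is only a commonly seen default that may differ on your install. The conversion helper assumes the metric is recorded in bytes per second.

```shell
#!/bin/sh
# Sketch only: tune_network_rrds and gbit_to_bytes_per_sec are
# illustrative names, not part of rrdtool or Ganglia.

# Convert a link speed in Gbit/s to bytes per second
# (assumes the metric is stored in bytes/sec).
gbit_to_bytes_per_sec() {
    echo $(( $1 * 1000000000 / 8 ))
}

# Set the maximum on the "sum" data source of every
# bytes_in.rrd / bytes_out.rrd found below the given directory.
tune_network_rrds() {
    rrd_dir=$1
    max=$2
    find "$rrd_dir" -type f \( -name 'bytes_in.rrd' -o -name 'bytes_out.rrd' \) |
    while IFS= read -r rrd; do
        rrdtool tune "$rrd" --maximum "sum:$max"
    done
}

# Example call (assumed path, not run automatically):
# tune_network_rrds /var/lib/ganglia/rrds "$(gbit_to_bytes_per_sec 10)"
```

Summary RRDs that add up traffic across a whole cluster may legitimately exceed a single interface's line rate, so one limit for every file is unlikely to be right everywhere.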
Clearly care is needed to make sure legitimate values aren’t excluded, e.g. on interfaces running at 10 gigabit or higher speeds. It’s been working well for the past week and the network graphs are now meaningful again (once I had manually removed the existing outlying values, which setting a maximum does not touch).
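I removed the old outliers by hand, but an alternative I believe should work is to dump each RRD to XML and restore it with rrdtool's range check enabled, which applies the header's minimum and maximum to the data already stored. This is a sketch rather than what I actually ran. The function name is hypothetical, and it assumes the maximum has already been set with rrdtool tune.

```shell
#!/bin/sh
# Sketch only: range_check_rrd is an illustrative name.
# Rewrites one RRD so that stored values outside the data source's
# min/max are dropped by the restore range check.
range_check_rrd() {
    rrd=$1
    xml="$rrd.xml"
    new="$rrd.new"
    rrdtool dump "$rrd" > "$xml" &&
    rrdtool restore --range-check "$xml" "$new" &&
    mv "$new" "$rrd"
    status=$?
    rm -f "$xml"    # remove the temporary dump whether or not it worked
    return $status
}

# Example call (not run automatically):
# range_check_rrd bytes_in.rrd
```

It is probably wise to stop gmetad first, keep a backup copy, and run this as the user that owns the RRD files, since the restored file replaces the original.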