Why Gnmond?

Nagios and Ganglia are both well-known cluster monitoring systems and are used in many different cluster environments. Unfortunately, both systems have significant drawbacks:

Nagios's strengths are monitoring services and alerting administrators. It is also highly configurable and extensible through plugins. However, if you use Nagios to monitor the state of your cluster, especially metrics like load, memory and disk usage, and network traffic, you will run into problems. To collect those metrics, Nagios has to connect to every node in your cluster periodically and ask it for its status. This increases network traffic, and the resulting timeouts can lead to false alarms.
In addition, Nagios cannot combine different services into a consistent view of the cluster.

Ganglia's strengths are low network traffic and low load on the nodes, thanks to its bottom-up design. New metric values are distributed only inside the cluster and only when necessary. However, you cannot set boundaries for the metrics, so Ganglia cannot decide whether the cluster needs your attention.

The goal of Gnmond is to show you a single, consistent health status that tells you whether everything is OK and, if something went wrong, what it was.

To do this, Gnmond analyzes the metrics collected by Ganglia, aggregates them into a single status per cluster, and shows those states in Nagios. By looking at those states, you can decide with very little effort whether you need to intervene.
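The aggregation idea can be illustrated with a minimal sketch. It assumes the per-host metric values have already been read from Ganglia (how Gnmond actually fetches them is not described here). The metric names, the threshold values, the "worst state wins" rule, and the check_cluster function are all hypothetical; only the exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN) follow the standard Nagios plugin convention.

```python
from enum import IntEnum


class State(IntEnum):
    """Standard Nagios plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


# Ranking used to pick the "worst" state. UNKNOWN is ranked below CRITICAL
# even though its exit code is numerically higher (a design assumption).
SEVERITY = {State.OK: 0, State.WARNING: 1, State.UNKNOWN: 2, State.CRITICAL: 3}

# Hypothetical boundaries per metric: name -> (warning, critical).
# This is exactly the kind of limit Ganglia itself does not evaluate.
THRESHOLDS = {
    "load_one": (4.0, 8.0),
    "mem_used_pct": (80.0, 95.0),
    "disk_used_pct": (85.0, 95.0),
}


def metric_state(value, warn, crit):
    """Classify a single metric value against its boundaries."""
    if value is None:
        return State.UNKNOWN  # metric missing from the host's report
    if value >= crit:
        return State.CRITICAL
    if value >= warn:
        return State.WARNING
    return State.OK


def check_cluster(name, hosts):
    """Aggregate per-host metrics into one cluster state and a message.

    hosts maps host name -> {metric name: value}.
    Returns (State, message), where the message names every problem found.
    """
    worst = State.OK
    problems = []
    for host in sorted(hosts):
        for metric, (warn, crit) in THRESHOLDS.items():
            value = hosts[host].get(metric)
            state = metric_state(value, warn, crit)
            if state != State.OK:
                problems.append(f"{host} {metric}={value} ({state.name})")
            if SEVERITY[state] > SEVERITY[worst]:
                worst = state
    detail = "; ".join(problems) if problems else "all metrics within bounds"
    return worst, f"{name} {worst.name} - {detail}"


# Example input: one healthy node and one overloaded node (made-up values).
hosts = {
    "node01": {"load_one": 1.2, "mem_used_pct": 40.0, "disk_used_pct": 60.0},
    "node02": {"load_one": 9.5, "mem_used_pct": 82.0, "disk_used_pct": 70.0},
}
state, message = check_cluster("cluster-a", hosts)
print(message)
```

In a real Nagios check, the message would be printed as the plugin's output line and int(state) used as the process exit code, so Nagios displays one status per cluster instead of one per node and metric.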