Gnmond arranges nodes in the network into clusters (also called "communities" in Ganglia). Gnmond can monitor several of these communities. When used with Gmond, Gnmond connects to one node in each cluster to obtain the data collected from all nodes. Gmond is responsible for collecting the data and distributing it to all nodes in the cluster. Gnmond analyzes and aggregates the data and provides it to other tools, such as Nagios.
The data has to be collected by another tool (such as Gmond). It is collected in the form of key/value pairs called metrics (for example free_memory, load, free_swap, etc.). Each computer or cluster can have its own metrics, depending on its function. Gnmond checks those metrics regularly, by default every minute. The collected data is then analyzed and aggregated. The analysis is highly customizable through so-called health plug-ins. A health plug-in defines a number of records (for example one record per monitored cluster, or one record for memory and one for load). A record consists of a name together with some values (at least a status and a short description string). The analysis is repeated regularly. The states of the records are then stored, and Gnmond waits for someone to request those values (for example Nagios, or you yourself via telnet). If the requester is allowed to see the data, the record state is returned.
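As an illustration of the metrics-to-record idea described above (this is not the actual Gnmond API; the `analyze_metrics` helper and the threshold are hypothetical), a record can be thought of as a status plus a description derived from the key/value metrics:

```python
# Hypothetical sketch: turning key/value metrics into a record.
# The metric names match those mentioned above; the helper function
# and the 1024 kB threshold are invented for illustration.
NAGIOS_OK, NAGIOS_WARNING = 0, 1

def analyze_metrics(metrics):
    """Turn raw key/value metrics into a (status, description) record."""
    if metrics["free_memory"] < 1024:  # hypothetical limit in kB
        return (NAGIOS_WARNING, "Memory is running low")
    return (NAGIOS_OK, "Everything looks fine")
```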
```
telnet localhost 46666
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
unspecified: 0 Everything looks fine
gnmond_healthPlugins: 0 All plugins are Running
gnmond_clusters: 0 All clusters are available
Connection closed by foreign host.
```
```
cd Plugins/Health
sed -i 's/unspecified/your_name/g' localhost.py
```
```
service Gnmond restart
```
|
```
./Gnmond.py
```
```
chkconfig --add Gnmond
ln -s PATH_TO_GNMOND/Gnmond.py /usr/bin/Gnmond
service Gnmond start
```
```
telnet localhost 46666
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
Name_of_your_cluster_in_Ganglia: 0 Everything looks fine
gnmond_healthPlugins: 0 All plugins are Running
gnmond_clusters: 0 All clusters are available
Connection closed by foreign host.
```
```
Gnmond --debug --nodaemon --file=localhost
```
```
telnet name_of_one_node 8649
```
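The same check can be done programmatically: gmond answers a plain TCP connection on port 8649 with its XML metric dump. A minimal sketch (the host name is a placeholder; this is illustrative code, not part of Gnmond):

```python
# Minimal sketch: connect to a gmond node and read its XML dump, which
# is what the telnet command above shows interactively.
import socket

def fetch_gmond_xml(host, port=8649, timeout=5):
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```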
```
Nagios/check_gnmond --help
```
Gnmond has a core part and some plug-ins. The core consists of:
|
Gnmond first tries to find as many valid health plug-ins as possible. It then initializes them and performs a first fetch-and-analyze round. Once the first values have been computed, the output plug-ins are started (every output plug-in in its own thread). The output plug-ins wait for incoming connections from allowed hosts. The main thread checks from time to time whether the plug-ins are still alive; if not, it restarts them.

After the output plug-ins have been started, the main thread starts the HealthPluginManager thread. This thread consists of an infinite loop that performs one iteration every minute. An iteration consists of three stages: first, check the clusters for new metrics and fetch them if available; then determine which health plug-ins have to be executed in this iteration; then execute them. After the analysis functions have run, the loop collects all records, provides them to the output plug-ins, and waits for the next iteration.

If the HealthPluginManager has to execute code from input or health plug-ins, it runs this code in a new thread (called a HealthPluginThread) and waits for that thread to return before continuing. However, if the thread takes too long, the HealthPluginManager marks it as failed and tries to kill it (this might not work if the thread is waiting on I/O or hanging in external C code, so take care of those cases when writing a plug-in). The HealthPluginManager does not wait until the thread is actually killed, but continues.

If you want to stop Gnmond, you have to send a terminate signal to the main thread. The signal is caught, and the main thread tries to shut down the output plug-ins gracefully; if that is not possible, it kills them. The HealthPluginManager thread and all its children are killed immediately (so take care of this in your plug-ins: they might be killed at any time!).

The different threads communicate via the global variables records, allowedHosts and plugin.
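The timeout behavior described above can be sketched as follows (an illustrative join-with-timeout pattern, not the actual HealthPluginManager code; the function name is invented):

```python
# Sketch of the watchdog described above: plug-in code runs in a helper
# thread; if it has not returned after max_execution_time seconds it is
# marked as failed and the manager continues without waiting for the
# thread to die.
import threading

def run_with_timeout(func, max_execution_time):
    result = {"ok": False}

    def wrapper():
        func()
        result["ok"] = True

    t = threading.Thread(target=wrapper, daemon=True)
    t.start()
    t.join(max_execution_time)
    if t.is_alive():
        return False  # marked as failed; the thread may still be running
    return result["ok"]
```

Note that, as the text warns, a daemon thread stuck in blocking I/O or external C code cannot actually be interrupted this way; the manager simply abandons it.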
Value | Name | Description |
---|---|---|
7 | DEBUG | Debug information |
6 | INFO | A normal information, or additional information to errors |
4 | WARNING | An error occurred; Gnmond will try to work around it |
2 | CRITICAL | A critical error. Gnmond will exit |
Value | Name | Description |
---|---|---|
0 | NAGIOS_OK | Everything looks good |
1 | NAGIOS_WARNING | Something is not good. May need attention, but not immediately |
2 | NAGIOS_CRITICAL | Something went badly wrong; needs attention immediately |
3 | NAGIOS_UNKNOWN | Gnmond cannot compute a state |
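These status values match the standard Nagios plug-in exit codes, so a consumer such as check_gnmond can pass them through unchanged. A hypothetical illustration (the `report` helper is invented, not part of Gnmond):

```python
# The record status doubles as a Nagios plug-in exit code: print the
# short message on stdout and exit with the status value.
import sys

NAGIOS_OK, NAGIOS_WARNING, NAGIOS_CRITICAL, NAGIOS_UNKNOWN = 0, 1, 2, 3

def report(status, message):
    print(message)
    sys.exit(status)
```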
Function | Description |
---|---|
addAllowedServer(SERVER) | Adds the host SERVER to the list of allowed servers. Only servers on this list are allowed to connect to one of Gnmond's output plug-ins. For example, you should add your Nagios server, or your local PC if you want to connect via telnet. The list of allowed servers is shared by all plug-ins. |
setExecutingInterval(TIME) | Sets the execution interval. The health plug-in will be executed every TIME minutes. Default is 1. This value is used only by this health plug-in. |
setMaxExecutingTime(TIME) | Sets the maximal execution time. After TIME seconds, the analyze() and clusterFailure() functions are assumed to have crashed. Default is 1. This value is used only by this health plug-in. |
Name | Description |
---|---|
name | The name of your cluster (has to be a string). If you want to use Gmond as input, the name has to be exactly the same as in Ganglia; otherwise Gnmond cannot find the cluster. |
initialHosts | A list of some nodes in the cluster, used to get the metrics for this cluster. Gmond checks only one node to obtain all metrics for the cluster, so it is sufficient to give only a few nodes; Gnmond will fetch a complete list of nodes in this cluster later by itself. |
refreshTime | Sets the checking interval. Every TIME minutes Gnmond will try to get new metrics from this cluster. Default is 1. Note that it normally makes no sense to set this to something different from the execution interval of the health plug-in. |
checkWith | Chooses the input plug-in that is used to get new metrics. Default is Gmond. If you want to use another plug-in, you have to import it first. See also the health examples. |
Name | Description |
---|---|
name | The name of the record. It must not contain spaces or special characters and should not be too long (an optimal name is between 4 and 15 characters). The name is not allowed to begin with gnmond_. |
status | A default status value. Has to be 0, 1, 2 or 3 |
short | A default short status message; must not contain a newline character and should be less than 70 characters long |
long | (optional) A default long status message |
perf | (optional) Default perf data |
Name | Description |
---|---|
getMetrics() | Gets metrics from a source and stores them in cluster.values and cluster.nodes. Receives the cluster as argument |
maximalExecutionTime | The maximal execution time (integer) in seconds. After this time getMetrics() is stopped and reported as dead. maximalExecutionTime is optional; default is 3 |
Name | Description |
---|---|
serverList | A list of servers that are allowed to access data provided by Gnmond. You should only allow connections from these servers. Servers are added to this list by the health plug-ins |
logger | A GnmondLogger object, to perform some logging |
```python
from HealthLogicFramework import * #import the framework, has to be done!

def init():
    """the init() function will be called once on start-up"""
    """with setLogging you can set the logging settings.
    GnmondLogger.DEBUG means log everything. You might want to change
    this to GnmondLogger.WARNING
    """
    setLogging(GnmondLogger.DEBUG)
    """with setExecutionInterval you can set the execution interval of
    the analyze function. analyze is called every executionInterval
    minutes. If set to zero analyze is called every minute
    """
    setExecutionInterval(2)
    """with setMaxExecutionTime you can set the max. execution time of
    analyze() in seconds. If analyze takes longer than this to finish,
    it will be stopped and reported as invalid. This should prevent a
    hanging plug-in from crashing the whole system
    """
    setMaxExecutionTime(1)
    """with addAllowedServer you can add a server to the allowedServers
    list. Only hosts in this list will be allowed to connect to gnmond,
    all other connections will be refused
    """
    addAllowedServer("localhost")
    """with addCluster you can add a cluster that should be monitored.
    You should give this cluster a name and a list of initial hosts.
    If you use the gmond input plug-in those names have to be exactly
    the same as in ganglia!
    """
    addCluster("test",["pc0.psi.ch","pc1.psi.ch"])
    """with addRecord you can add a record. You should give the record
    a name and a default status/message.
    """
    addRecord("test",NAGIOS_OK,"Everything fine")

def analyze(value):
    """with values.getMax("load_percent") you'll get the maximum of the
    metric load_percent over all nodes in the clusters defined in the
    init() function. Note that load_percent is defined as the ganglia
    metrics load_one / num_cpu * 100
    """
    maxLoadInCluster = values.getMax("load_percent")
    if maxLoadInCluster > 200:
        """if one node has a load above 200, you want to set record
        test to CRITICAL. This can be done with setRecord
        """
        setRecord("test",NAGIOS_CRITICAL,"Some nodes are overloaded")
    """If you don't set a record during analyze, it will be reset to
    the default values defined in addRecord.
    """

def clusterFailure(cluster):
    """clusterFailure will be called every time a cluster cannot be
    reached. You might want to set your records to critical or
    unknown... This record could be overwritten by the analyze()
    function, but only if analyze would "worsen" it (e.g. if
    clusterFailure sets it to warning, but analyze to critical, the
    record would end up critical)
    """
    setRecord("test",NAGIOS_CRITICAL,"Could not reach cluster")
```
```python
from HealthLogicFramework import *

def init():
    addCluster("test",["pc0.psi.ch","pc1.psi.ch"])
    addGroup("testg",["pc0.psi.ch","pc4.psi.ch"])
    addRecord("test",NAGIOS_OK,"Everything fine")
    addRecord("test2",NAGIOS_OK,"Everything fine")

def analyze(value):
    maxLoadInCluster = values.getMax("load_percent")
    if maxLoadInCluster > 200:
        setRecord("test",NAGIOS_CRITICAL,"Some nodes are overloaded")
    """with getGroup() you'll get a metric object for the group defined
    in the init() function. This object is of the same type as value
    and allows the same calls
    """
    maxLoadInGroup = getGroup("testg").getMax("load_percent")
    if maxLoadInGroup > 100:
        setRecord("test2",NAGIOS_WARNING,"Some nodes in group are overloaded")

def clusterFailure(cluster):
    setRecord("test",NAGIOS_CRITICAL,"Could not reach cluster")
    setRecord("test2",NAGIOS_CRITICAL,"Could not reach cluster")
```
```python
from HealthLogicFramework import *

def init():
    addCluster("test",["pc0.psi.ch","pc1.psi.ch"])
    addRecord("test_last",NAGIOS_OK,"Everything fine")
    addRecord("test_period",NAGIOS_OK,"Everything fine")
    """with store() you can mark a metric to be stored. This can be
    called on every metric object you get with getNode, getCluster or
    getGroup. You have to set the size of the storage
    """
    getCluster("test").store("load_percent",15)

def analyze(value):
    c = getCluster("test")
    """with fetch() you can access a stored value. If called without
    argument or with 0 it will return the actual value; with an int as
    argument it will return the metric with this age
    """
    if c.fetch(1,"load_percent") > 200:
        setRecord("test_last",NAGIOS_WARNING,"Some nodes were overloaded last time")
    below = False
    for metric in c.fetchAll("load_percent"):
        if metric < 200:
            below = True
            break
    if not below:
        setRecord("test_period",NAGIOS_WARNING,"For the whole history load was above 200%")

def clusterFailure(cluster):
    setRecord("test_last",NAGIOS_CRITICAL,"Could not reach cluster")
    setRecord("test_period",NAGIOS_CRITICAL,"Could not reach cluster")
```
```python
from HealthLogicFramework import *
from DefaultChecks import * # To use the default checks you have to import this

cluster_name = "test"
#length of history that the default tests will check
length_of_history = 30
#number of nodes that can fail a test before the record becomes critical
tolerated = 3
#nodes in the cluster. For the default checks the nodes have to be known in advance
nodes = ["pc0.psi.ch","pc1.psi.ch","pc2.psi.ch","pc3.psi.ch","pc4.psi.ch"]

def init():
    addCluster(cluster_name,nodes)
    #Sets up all default checks as described in DefaultChecks
    setUpAll(cluster_name,length_of_history)

def analyze(value):
    checkAll(cluster_name,tolerated) #Perform all checks

def clusterFailure(cluster):
    pass #Since no own records are defined, you have to do nothing...
```
```python
from HealthLogicFramework import *
from DefaultChecks import *

#Unlike example 4 you have to sort the nodes into 3 groups
login_nodes = ["testlogin.psi.ch"]
file_nodes = ["testfiles.psi.ch","testfiles2.psi.ch"]
compute_nodes = ["pc0.psi.ch","pc1.psi.ch","pc2.psi.ch","pc3.psi.ch","pc4.psi.ch"]
clusterName = "test"
tolerated = 5
history = 15

def init():
    addCluster(clusterName,compute_nodes)
    """set up the single check as described in DefaultChecks
    """
    setUpSingleCheck(getCluster(clusterName),login_nodes,file_nodes,compute_nodes)

def analyze(value):
    singleCheck(getCluster(clusterName)) #perform the single check

def clusterFailure(cluster):
    """singleCheck defines the record 'clusterName'. You might want to
    set this record to CRITICAL or UNKNOWN if the cluster cannot be
    reached...
    """
    setRecord(clusterName,NAGIOS_CRITICAL,clusterName + " could not be reached")
```
```python
from HealthLogicFramework import *
import Plugins.Input.YOURPLUGIN as Input #Import your plug-in

def init():
    #Give your plug-in to addCluster
    addCluster("test",["pc0.psi.ch","pc1.psi.ch"],Input)
    addRecord("test",NAGIOS_OK,"Everything fine")

def analyze(value):
    #Assumes your input plug-in returns a metric named load_percent
    maxLoadInCluster = values.getMax("load_percent")
    if maxLoadInCluster > 200:
        setRecord("test",NAGIOS_CRITICAL,"Some nodes are overloaded")

def clusterFailure(cluster):
    setRecord("test",NAGIOS_CRITICAL,"Could not reach cluster")
```
""" An example for a Output plug-in """ import threading import copy #The global variable records consists all records computed by health plugins from HealthPluginManager import records def getThread(serverList,logger): #Do not change this return OutputThread(serverList,logger) def getName(): #Change NAME to the name of your plug-in return "NAME" class OutputThread (threading.Thread): def __init__(self,serverList,logger,): threading.Thread.__init__(self) self.serverList = copy.copy(serverList) self.logger = logger #Add stuff that should be set up before thread will be started here def run(self): while True: """Add your code here You can access global records to get access to records You can access self.serverList to get access to a list of servers, who are allowed to connect to gnmond You can access self.logger to do some logging (don't use print, since gnmond will normally run as daemon, therefore those statements will be ignored). Get get more information about GnmondLogger see class GnmondLogger. """ pass def terminate(self): """This code will be executed if gnmond terminates. For example you could give back some resources like sockets etc. """ pass |
""" An example for a Input plug-in. """ maximalExecutionTime = 1 #The maximal execution time of getMetrics() def getMetrics(cluster): """This function will be executed every time you want to get new values. The argument is an object is of type HealthPluginManager.Cluster. With this element you have access to all needed stuff for example (for more details see HealthPluginManager.Cluster) cluster.logger: A GnmondLogger, used for logging output cluster.name: Name of the cluster cluster.nodes: A list of nodes in this cluster IMPORTANT: This list might not be complete, especially in the first call. The Input Plug-in shout update this list! cluster.nextNodesToCheck: A List of nodes that should be checked next. IMPORTANT: You have to manage this list by yourself! cluster.values: Actual metrics of this cluster. Your plug-in should set new metrics in this variable values is a double dict with structure {"node1":{"metric1":v1,"metric2":v2,...},"node2":{...},...} The metrics itself should be floats. """ """set the nodes in this cluster. IMPORTANT this should not be done statical. Set the nodes is actually only important for that first call of the plug-in, since you do not have to define all nodes in the health plugins init() function, but the analyze() function might has to have a list of nodes. """ cluster.nodes = ["pc0.psi.ch","pc1.psi.ch","pc2.psi.ch","pc3.psi.ch","pc4.psi.ch"] """set the values for each node """ cluster.values = dict() for node in cluster.nodes: cluster[node] = dict() cluster[node]["metric1"] = 0 cluster[node]["metric2"] = 0 cluster[node]["metric3"] = 0 cluster[node]["metric4"] = 0 cluster[node]["metric5"] = 0 |
```
define command {
    command_name check_gnmond
    command_line $USER5$/check_gnmond $HOSTADDRESS$ $ARG1$
}
```
```
define host{
    use                   generic-host
    host_name             GnmondServer
    alias                 SERVERNAME
    address               129.129.194.94
    check_command         check-host-alive
    max_check_attempts    10
    contact_groups        linux-admins
    notification_interval 480
    notification_period   24x7
    notification_options  d,u
}
```
```
define service{
    use                   generic-service
    host_name             GnmondServer
    service_description   clusters
    is_volatile           0
    check_period          24x7
    max_check_attempts    1
    normal_check_interval 1
    retry_check_interval  1
    contact_groups        linux-admins
    notification_interval 240
    notification_period   24x7
    notification_options  w,u,c
    check_command         check_gnmond!gnmond_clusters
}

define service{
    use                   generic-service
    host_name             GnmondServer
    service_description   plugins
    is_volatile           0
    check_period          24x7
    max_check_attempts    1
    normal_check_interval 1
    retry_check_interval  1
    contact_groups        linux-admins
    notification_interval 240
    notification_period   24x7
    notification_options  w,u,c
    check_command         check_gnmond!gnmond_healthPlugins
}
```
```
define service{
    use                   generic-service
    host_name             GnmondServer
    service_description   RECORD_NAME
    is_volatile           0
    check_period          24x7
    max_check_attempts    1
    normal_check_interval 1
    retry_check_interval  1
    contact_groups        linux-admins
    notification_interval 240
    notification_period   24x7
    notification_options  w,u,c
    check_command         check_gnmond!RECORD_NAME
}
```