CheckCPU

CheckCPU is part of the CheckSystem module.

This check calculates the average CPU usage over a specified period of time. The data is always collected in the background; the buffer size and the sampling interval are configured with the CPUBufferSize and CheckResolution options. A request takes one or more of the options described in the table below.

Option    Values                     Description
warn      load in %                  Load to go above to generate a warning.
crit      load in %                  Load to go above to generate a critical state.
time      time with optional suffix  The time to calculate the average over.
                                     Multiple time= entries can be given, generating multiple CPU usage summaries and multiple warn/crit results.
nsclient                             Flag to make the plugin run in NSClient compatibility mode.
ShowAll   none, long                 Add this option to show info even if no errors are detected. Set it to long to show detailed information.

Time can use any of the following suffixes: w=week, d=day, h=hour, m=minute and s=second.
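To make the suffix list concrete, here is a minimal Python sketch of how such a value could be converted to seconds. It is an illustration only, not the parser used by the module. The function name parse_duration and the bare_unit_seconds parameter are hypothetical, and the unit applied to a number without a suffix (as in time=4) is not stated on this page, so the default of one second is an assumption.

```python
# Multipliers for the suffixes listed above (w, d, h, m, s), in seconds.
SUFFIX_SECONDS = {"w": 7 * 24 * 3600, "d": 24 * 3600, "h": 3600, "m": 60, "s": 1}


def parse_duration(text, bare_unit_seconds=1):
    """Convert a value such as '20m' or '10s' to a number of seconds.

    bare_unit_seconds is an ASSUMPTION: this page does not say which unit
    applies to a number given without a suffix.
    """
    text = text.strip()
    if not text:
        raise ValueError("empty time value")
    if text[-1] in SUFFIX_SECONDS:
        number, factor = text[:-1], SUFFIX_SECONDS[text[-1]]
    else:
        number, factor = text, bare_unit_seconds
    if not number.isdigit():
        raise ValueError(f"not a valid time value: {text!r}")
    return int(number) * factor
```

With this sketch, parse_duration("20m") gives 1200 and parse_duration("10s") gives 10.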

Configuration

The size and frequency of the sampled CPU data can be configured. For details, refer to the configuration section for the CheckSystem module.

FAQ

  • Q: "NSClient - ERROR: Could not get data for 60 perhaps we don't collect data this far back?"
  • A: See the configuration section on how to configure "CPUBufferSize"; it has to be larger than the time you request here.
  • Q: How does it handle multi CPU machines?
  • A: The returned value is the average value of the CPU load of all the processors.
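The two answers above can be pictured with a small Python sketch: a fixed-size buffer of samples, each sample being the mean across all processors, and an average taken over the most recent part of that buffer. This is a hedged illustration of the idea, not NSClient++ code. CpuBuffer, add_sample, average and status are hypothetical names, and the exact rounding and comparison rules of the real module are assumptions.

```python
from collections import deque


class CpuBuffer:
    """Hypothetical stand-in for the background collector."""

    def __init__(self, buffer_seconds, resolution_seconds):
        self.buffer_seconds = buffer_seconds
        self.resolution = resolution_seconds
        # Oldest samples are dropped automatically once the buffer is full.
        self.samples = deque(maxlen=buffer_seconds // resolution_seconds)

    def add_sample(self, per_cpu_loads):
        # Store one value per tick: the mean load across all processors.
        self.samples.append(sum(per_cpu_loads) / len(per_cpu_loads))

    def average(self, window_seconds):
        # The buffer has to be larger than the requested window.
        if window_seconds >= self.buffer_seconds:
            raise ValueError("could not get data this far back")
        needed = max(1, window_seconds // self.resolution)
        recent = list(self.samples)[-needed:]
        if not recent:
            raise ValueError("no samples collected yet")
        return sum(recent) / len(recent)


def status(load, warn, crit):
    # A state is raised when the load goes above the threshold.
    if load > crit:
        return "CRITICAL"
    if load > warn:
        return "WARNING"
    return "OK"
```

For example, a buffer created with CpuBuffer(60, 1) can answer average(10) but raises an error for average(60), which mirrors the error message in the first question.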

Examples

Sample Command

Check that the CPU load for various times is below 80%:

Sample Command:

CheckCPU warn=80 crit=90 time=20m time=10s time=4
OK: CPU Load ok.
Nagios Configuration:
define command {
  command_name <<CheckCPU>>
  command_line check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckCPU -a warn=$ARG1$ crit=$ARG2$ time=20m time=10s time=4
}
<<CheckCPU>> 80!90
From Commandline (with NRPE):
check_nrpe -H IP -p 5666 -c CheckCPU -a warn=80 crit=90 time=20m time=10s time=4

Multiple Time entry

This example shows the use of multiple time entries and the data returned:

Sample Command:

CheckCPU warn=2 crit=80 time=5m time=1m time=10s
WARNING: WARNING: 5m: average load 8% > warning
Nagios Configuration:
define command {
  command_name <<CheckCPU>>
  command_line check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckCPU -a warn=2 crit=$ARG1$ time=5m time=1m time=10s
}
<<CheckCPU>> 80
From Commandline (with NRPE):
check_nrpe -H IP -p 5666 -c CheckCPU -a warn=2 crit=80 time=5m time=1m time=10s

check_load

Check CPU load with intervals like those known from Linux/Unix (with example thresholds):

Sample Command:

CheckCPU warn=100 crit=100 time=1 warn=95 crit=99 time=5 warn=90 crit=95 time=15
OK: ...
Nagios Configuration:
define command {
  command_name <<CheckCPU>>
  command_line check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckCPU -a warn=100 crit=100 time=1 warn=95 crit=99 time=5 warn=90 crit=95 time=15
}
<<CheckCPU>> 
From Commandline (with NRPE):
check_nrpe -H IP -p 5666 -c CheckCPU -a warn=100 crit=100 time=1 warn=95 crit=99 time=5 warn=90 crit=95 time=15
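When each time entry needs its own thresholds, as in the check_load example, the argument list becomes long and easy to mistype. The following Python sketch assembles the same command line from a list of (warn, crit, time) triples. The helper name build_checkcpu_command is hypothetical; the sketch only builds the argument list and does not run check_nrpe.

```python
def build_checkcpu_command(host, windows, port=5666):
    """Build the check_nrpe argument list for CheckCPU.

    windows is a sequence of (warn, crit, time) triples, emitted in order.
    """
    argv = ["check_nrpe", "-H", host, "-p", str(port), "-c", "CheckCPU", "-a"]
    for warn, crit, time_value in windows:
        argv += [f"warn={warn}", f"crit={crit}", f"time={time_value}"]
    return argv


# Reproduces the check_load command line shown above.
cmd = build_checkcpu_command("IP", [(100, 100, 1), (95, 99, 5), (90, 95, 15)])
print(" ".join(cmd))
```

Keeping the triples together in one list makes it harder to pair a threshold with the wrong time entry.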