CheckSystem

Various system related checks, such as CPU load, process state and memory.

Enable module

To enable this module and allow using its commands, you need to add CheckSystem = enabled to the [/modules] section in nsclient.ini:

[/modules]
CheckSystem = enabled
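
As a quick sanity check, the snippet below is a minimal Python sketch that reads an ini fragment like the one above and reports whether the module is switched on. The helper name module_enabled and the SAMPLE_INI text are illustrative assumptions, and the sketch only recognises the literal value enabled; NSClient++ itself may accept other values.

```python
import configparser

# Illustrative ini fragment mirroring the section shown above.
SAMPLE_INI = """
[/modules]
CheckSystem = enabled
"""


def module_enabled(ini_text: str, module: str) -> bool:
    """Return True if `module` is set to 'enabled' in the [/modules] section."""
    # interpolation=None avoids surprises if other values contain '%' characters.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original case of option names
    parser.read_string(ini_text)
    if not parser.has_section("/modules"):
        return False
    value = parser.get("/modules", module, fallback="")
    return value.strip().lower() == "enabled"


print(module_enabled(SAMPLE_INI, "CheckSystem"))
```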

Queries

A quick reference for all available queries (check commands) in the CheckSystem module.

List of commands:

A list of all available queries (check commands)

Command            Description
check_cpu          Check that the load of the CPU(s) is within bounds.
check_memory       Check free/used memory on the system.
check_os_updates   Check for available OS package updates via the system package manager (apt/dnf/yum/zypper/pacman).
check_os_version   Check the version of the underlying OS.
check_pagefile     Check the size of the system pagefile(s).
check_process      Check state/metrics of one or more of the processes running on the computer.
check_service      Check the state of one or more of the computer services.
check_uptime       Check time since last server reboot.

check_cpu

Check that the load of the CPU(s) is within bounds.

How CPU load is measured (historical buffer)

check_cpu does not measure the CPU load at the moment the check is executed. Instead, NSClient++ runs a background collector thread that samples the CPU load roughly once per second and pushes each sample into an in-memory ring buffer. Whenever you run check_cpu the values reported are averages computed from this buffer for one or more time windows.

The time windows are controlled by the time= option. The default is to compute three averages: 5m, 1m and 5s (which is why the default output contains rows like total 5m load, total 1m load and total 5s load). You can override this with one or more time= arguments, for example time=10m or time=30s time=2m.
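
The window strings used with time= (5m, 1m, 5s, 30s, 2m) can be read as a number followed by a unit. The Python sketch below converts such strings to seconds. It is only an illustration: the function name parse_window is hypothetical, and it assumes the only units are s, m and h, which is all the examples in this section show.

```python
import re

# Seconds per unit; limited to the units that appear in the examples above.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_window(text: str) -> int:
    """Convert a window such as '5m' or '30s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported window: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


# The three default windows, expressed in seconds.
print([parse_window(w) for w in ["5m", "1m", "5s"]])
```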

Buffer size and configuration

The size of the historical buffer is controlled by the default buffer length setting on the CheckSystem section. The default is 1h, meaning the last hour of samples is retained. The buffer size puts an upper bound on the time windows you can use:

  • If you ask for a window that is shorter than or equal to the buffer length, the result is the average of all samples collected during that window.
  • If you ask for a window that is longer than the buffer length, the result will only cover the samples that are actually present in the buffer (effectively capped to the buffer length).
  • If NSClient++ was started less time ago than the requested window, the result will only reflect the samples collected since startup. Right after start-up 5m and 1m averages will therefore be based on fewer samples than they normally would be.

If you need to check on longer windows (for example 2h or 6h) you must increase default buffer length accordingly. Note that a larger buffer uses more memory, so only increase it as far as you actually need.
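
The capping behaviour described above can be modelled with a fixed-size ring buffer. The following Python sketch is not the NSClient++ implementation; the class name CpuHistory and the parameter buffer_seconds are hypothetical, and it assumes exactly one sample per second.

```python
from collections import deque


class CpuHistory:
    """Toy model of a per-second CPU sample buffer with windowed averages."""

    def __init__(self, buffer_seconds: int = 3600):
        # deque(maxlen=...) silently drops the oldest sample once it is full.
        self._samples = deque(maxlen=buffer_seconds)

    def push(self, load_percent: float) -> None:
        self._samples.append(load_percent)

    def average(self, window_seconds: int) -> float:
        if window_seconds <= 0:
            raise ValueError("window must be positive")
        if not self._samples:
            return 0.0
        # The window is capped by what is actually in the buffer, which covers
        # both "window longer than buffer" and "service just started".
        count = min(window_seconds, len(self._samples))
        recent = list(self._samples)[-count:]
        return sum(recent) / count


history = CpuHistory(buffer_seconds=10)
for load in range(20):          # only the last 10 samples (10..19) are kept
    history.push(float(load))
print(history.average(5))       # average of the newest 5 samples
print(history.average(100))     # capped to the 10 samples in the buffer
```

In this model a request for a 100 second window on a 10 second buffer returns the same value as a 10 second window, which mirrors the second bullet above.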

Impact on measurements

Because every value reported by check_cpu is an average over a time window, the choice of time= has a direct impact on what the check sees:

  • Short windows (e.g. 5s, 10s) are very reactive and will show short spikes in CPU load, but they also produce a lot of noise. They are useful for catching transient bursts but can also generate flapping alerts.
  • Medium windows (e.g. 1m, 5m) are a good compromise for most monitoring use cases. They smooth out short spikes while still reacting to sustained load within a few minutes.
  • Long windows (e.g. 15m, 1h) smooth out almost all transients and only fire when the CPU has been busy for an extended period of time. They are well suited to detecting sustained load but will be slow to react and slow to recover.

A common pattern is to combine windows, for example warning on a long window and critical on a short one (or vice versa), so that the check both catches sustained problems and ignores brief spikes. The default check (5m, 1m, 5s) is an example of this approach.

Because the values are averages, they will not match the instantaneous CPU load shown by tools such as top at the moment the check is executed, and very short spikes that fall between collection ticks may be missed entirely.
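
To make the effect of the window length concrete, the Python sketch below averages one synthetic series over 5m, 1m and 5s and classifies each result against the default thresholds (load > 80 and load > 90). The series and the helper names trailing_average and classify are made up for illustration.

```python
def trailing_average(samples, window):
    """Average of the newest `window` samples (one sample per second assumed)."""
    recent = samples[-window:]
    return sum(recent) / len(recent)


def classify(load, warn=80.0, crit=90.0):
    """Map a load value to a state using 'load > warn' and 'load > crit'."""
    if load > crit:
        return "critical"
    if load > warn:
        return "warning"
    return "ok"


# 295 quiet seconds at 10% followed by a 5 second burst at 100%.
samples = [10.0] * 295 + [100.0] * 5

for name, seconds in (("5m", 300), ("1m", 60), ("5s", 5)):
    load = trailing_average(samples, seconds)
    print(f"{name}: {load:.1f}% -> {classify(load)}")
```

Only the 5s window crosses a threshold here; the 1m and 5m averages absorb the burst, which is the smoothing effect described above.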

Sample Commands

Default check:

check_cpu
CPU Load ok
'total 5m load'=0%;80;90 'total 1m load'=0%;80;90 'total 5s load'=7%;80;90
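
The last line of the output is performance data in the common Nagios plugin layout 'label'=value[unit];warn;crit. If you want to post-process it, a small parser along the following lines may help. This is a sketch: parse_perfdata and PerfValue are hypothetical names, and it assumes thresholds are plain numbers (range expressions are not handled).

```python
import re
from typing import List, NamedTuple, Optional


class PerfValue(NamedTuple):
    label: str
    value: float
    unit: str
    warn: Optional[float]
    crit: Optional[float]


# 'label'=<number><unit> followed by zero or more ;field entries.
_PERF_RE = re.compile(r"'([^']+)'=(-?\d+(?:\.\d+)?)([^;\s]*)((?:;[^;\s]*)*)")


def parse_perfdata(text: str) -> List[PerfValue]:
    results = []
    for label, value, unit, rest in _PERF_RE.findall(text):
        fields = rest.split(";")[1:]  # drop the empty string before the first ';'

        def number(index: int) -> Optional[float]:
            if index < len(fields) and fields[index]:
                return float(fields[index])
            return None

        results.append(PerfValue(label, float(value), unit, number(0), number(1)))
    return results


line = "'total 5m load'=0%;80;90 'total 1m load'=0%;80;90 'total 5s load'=7%;80;90"
for item in parse_perfdata(line):
    print(item)
```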

Checking all cores by adding filter=none (disabling the default filter):

check_cpu filter=none "warn=load > 80" "crit=load > 90"
CPU Load ok
'core 0 5m kernel'=1%;10;0 'core 0 5m load'=3%;80;90 'core 1 5m kernel'=0%;10;0 'core 1 5m load'=0%;80;90 ...  'core 7 5s load'=15%;80;90 'total 5s kernel'=3%;10;0 'total 5s load'=7%;80;90

Adding kernel times to the check:

check_cpu filter=none "warn=kernel > 10 or load > 80" "crit=load > 90" "top-syntax=${list}"
core 0 > 3, core 1 > 0, core 2 > 0, core  ... , core 7 > 15, total > 7
'core 0 5m kernel'=1%;10;0 'core 0 5m load'=3%;80;90 'core 1 5m kernel'=0%;10;0 'core 1 5m load'=0%;80;90 ...  'core 7 5s load'=15%;80;90 'total 5s kernel'=3%;10;0 'total 5s load'=7%;80;90

Default check via NRPE:

check_nscp --host 192.168.56.103 --command check_cpu
CPU Load ok|'total 5m'=16%;80;90 'total 1m'=13%;80;90 'total 5s'=13%;80;90

Customizing the output syntax to include CPU load in text:

check_cpu "top-syntax=%(status): %(list)"
L        cli OK: OK: 5m: 16%, 1m: 30%, 5s: 23%

Customizing the output syntax to only show CPU load as text:

check_cpu "top-syntax=%(status): Cpu usage is %(list)" time=5m "detail-syntax=%(load) %"
L        cli OK: OK: Cpu usage is 26 %

Command-line Arguments

Option Default Value Description
filter core = 'total' Filter which marks interesting items.
warning load > 80 Filter which marks items which generate a warning state.
warn Short alias for warning.
critical load > 90 Filter which marks items which generate a critical state.
crit Short alias for critical.
ok Filter which marks items which generate an ok state.
debug N/A Show debugging information in the log
show-all N/A Show details for all matches regardless of status (normally details are only shown for warnings and criticals).
empty-state ignored Return status to use when nothing matched filter.
perf-config Performance data generation configuration
escape-html N/A Escape any < and > characters to prevent HTML encoding
help N/A Show help screen (this screen)
help-pb N/A Show help screen as a protocol buffer payload
show-default N/A Show default values for a given command
help-short N/A Show help screen (short format).
top-syntax ${status}: ${problem_list} Top level syntax.
ok-syntax %(status): CPU load is ok. ok syntax.
empty-syntax Empty syntax.
detail-syntax ${time}: ${load}% Detail level syntax.
perf-syntax ${core} ${time} Performance alias syntax.
time The time to check
cores N/A Removes the default filter so that individual cores are included. Do not combine this with filter.

filter:

Filter which marks interesting items. Interesting items are the items that will be included in the check. The filter does not denote a warning or critical state; instead it defines which items are relevant, so you can use it to remove unwanted items.

Default Value: core = 'total'

warning:

Filter which marks items which generate a warning state. If anything matches this filter, the return status will be escalated to warning.

Default Value: load > 80

critical:

Filter which marks items which generate a critical state. If anything matches this filter, the return status will be escalated to critical.

Default Value: load > 90

ok:

Filter which marks items which generate an ok state. If anything matches this filter, any previous state for this item will be reset to ok.
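
The interaction between the warning, critical and ok filters can be pictured with the Python sketch below. It is an illustration of the behaviour described above, not NSClient++ code: the function evaluate is hypothetical, filters are modelled as plain Python predicates, and the evaluation order (warning, then critical, then ok) is an assumption.

```python
from typing import Callable, Iterable, Optional

SEVERITY = {"ok": 0, "warning": 1, "critical": 2}
Predicate = Callable[[dict], bool]


def evaluate(items: Iterable[dict], warn: Predicate, crit: Predicate,
             ok: Optional[Predicate] = None) -> str:
    """Return the worst state over all items after applying the three filters."""
    overall = "ok"
    for item in items:
        state = "ok"
        if warn(item):
            state = "warning"      # escalate to warning
        if crit(item):
            state = "critical"     # escalate to critical
        if ok is not None and ok(item):
            state = "ok"           # an ok match resets the item's state
        if SEVERITY[state] > SEVERITY[overall]:
            overall = state
    return overall


# Predicates mirroring the defaults "load > 80" and "load > 90".
warn = lambda item: item["load"] > 80
crit = lambda item: item["load"] > 90

print(evaluate([{"core": "total", "load": 85}], warn, crit))
print(evaluate([{"core": "total", "load": 95}], warn, crit))
```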

empty-state:

Return status to use when nothing matched filter. If no filter is specified this will never happen unless the file is empty.

Default Value: ignored

perf-config:

Performance data generation configuration. TODO: obj (key: value; key: value) obj (key: value; key: value)

top-syntax:

Top level syntax. Used to format the message to return; it can include text as well as special keywords which will be replaced with information from the checks. To add a keyword to the message you can use either of two syntaxes, ${keyword} or %(keyword) (there is no difference between them, except that ${} can be difficult to escape on Linux).

Default Value: ${status}: ${problem_list}
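
The two keyword syntaxes are interchangeable, which the following Python sketch tries to illustrate with a tiny template renderer. The function render is hypothetical, and leaving unknown keywords untouched is a choice made for this sketch rather than documented NSClient++ behaviour.

```python
import re

# Matches either ${keyword} or %(keyword).
_KEYWORD_RE = re.compile(r"\$\{(\w+)\}|%\((\w+)\)")


def render(template: str, values: dict) -> str:
    """Replace ${keyword} and %(keyword) placeholders with entries from `values`."""

    def substitute(match: re.Match) -> str:
        key = match.group(1) or match.group(2)
        # Unknown keywords are left as written (an assumption of this sketch).
        return str(values.get(key, match.group(0)))

    return _KEYWORD_RE.sub(substitute, template)


values = {"status": "OK", "list": "5m: 16%, 1m: 30%, 5s: 23%"}
print(render("${status}: ${list}", values))
print(render("%(status): %(list)", values))  # same result with the other syntax
```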

ok-syntax:

ok syntax. DEPRECATED! This is the syntax for when an ok result is returned. This value will not be used if your syntax contains %(list) or %(count).

Default Value: %(status): CPU load is ok.

empty-syntax:

Empty syntax. DEPRECATED! This is the syntax for when nothing matches the filter.

detail-syntax:

Detail level syntax. Used to format each resulting item in the message. %(list) in the top-syntax will be replaced with all the items formatted by this syntax string. To add a keyword to the message you can use either of two syntaxes, ${keyword} or %(keyword) (there is no difference between them, except that ${} can be difficult to escape on Linux).

Default Value: ${time}: ${load}%

perf-syntax:

Performance alias syntax. This is the syntax for the base names of the performance data.

Default Value: ${core} ${time}

Filter keywords

Option    Description
core      The core to check (total or core ##)
core_id   The core to check (total or core_##)
idle      The current idle load for a given core
kernel    Deprecated (use system instead)
load      The current load for a given core (deprecated, use total)
system    The current load used by the system (kernel)
time      The time frame to check
user      The current load used by user applications

Common options for all checks:

Option Description
count Number of items matching the filter.
crit_count Number of items matched the critical criteria.
crit_list A list of all items which matched the critical criteria.
detail_list A special list with critical, then warning and finally ok.
list A list of all items which matched the filter.
ok_count Number of items matched the ok criteria.
ok_list A list of all items which matched the ok criteria.
problem_count Number of items matched either warning or critical criteria.
problem_list A list of all items which matched either the critical or the warning criteria.
status The returned status (OK/WARN/CRIT/UNKNOWN).
total Total number of items.
warn_count Number of items matched the warning criteria.
warn_list A list of all items which matched the warning criteria.

check_memory

Check free/used memory on the system.

Kinds of memory

There are several different kinds of memory that a computer system uses to manage data and processes. Here are the main types:

  • physical Memory (RAM): This is the actual, tangible memory chips installed in your computer. It's often referred to as RAM (Random Access Memory).
  • committed Memory: Committed memory refers to the amount of virtual memory that has been reserved by processes. When a program requests memory from the operating system, that memory is "committed." This committed memory is guaranteed to be available to the process, meaning Windows has set aside enough resources (either physical RAM or space in the page file) to back that memory.
  • virtual Memory: Virtual memory is an abstraction layer created by the operating system (Windows) to provide a larger, contiguous address space to each process than the physical RAM actually available.

Sample Commands

Default check:

check_memory
OK memory within bounds.
'page used'=8G;19;21 'page used %'=33%;79;89 'physical used'=7G;9;10 'physical used %'=65%;79;89

Using --show-all to show the result:

check_memory "warn=free < 20%" "crit=free < 10G" --show-all
page = 8.05G, physical = 7.85G
'page free'=15G;4;2 'page free %'=66%;19;9 'physical free'=4G;2;1 'physical free %'=34%;19;9

Changing the return syntax to include more information:

check_memory "top-syntax=${list}" "detail-syntax=${type} free: ${free} used: ${used} size: ${size}"
page free: 16G used: 7.98G size: 24G, physical free: 4.18G used: 7.8G size: 12G

Default check via NRPE:

check_nrpe --host 192.168.56.103 --command check_memory
OK memory within bounds.|'page'=531G;3;3;0;3 'page %'=12%;79;89;0;100 'physical'=530G;1;1;0;1 'physical %'=25%;79;89;0;100

Overriding the unit:

Most "byte" checks such as memory have an auto-scaling feature, which means values can go from 800M to 1.2G between checks. Some graphing systems do not honor the units in performance data, in which case you can get unexpectedly large values (such as 800G). To remedy this you can lock the unit by adding perf-config=*(unit:G):

check_memory perf-config=*(unit:G)
page = 8.05G, physical = 7.85G
'page free'=15G;4;2 'page free %'=66%;19;9 'physical free'=4G;2;1 'physical free %'=34%;19;9
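
Why a locked unit helps a graphing system can be seen in the Python sketch below, which contrasts auto-scaling with a fixed unit. The functions scale_auto and scale_fixed are hypothetical, and the 1024-based unit steps are an assumption of this sketch rather than something stated in this document.

```python
# Unit ladder used by this sketch; each step is a factor of 1024 (assumed).
_UNITS = ["B", "K", "M", "G", "T"]


def scale_auto(num_bytes: int):
    """Pick the largest unit that keeps the value below 1024."""
    value = float(num_bytes)
    index = 0
    while value >= 1024 and index < len(_UNITS) - 1:
        value /= 1024
        index += 1
    return value, _UNITS[index]


def scale_fixed(num_bytes: int, unit: str):
    """Always express the value in the requested unit."""
    return num_bytes / 1024 ** _UNITS.index(unit), unit


small = 800 * 1024 ** 2           # 800M
large = 1229 * 1024 ** 2          # roughly 1.2G

# Auto-scaling: the bare numbers jump from 800 to about 1.2 between checks.
print(scale_auto(small), scale_auto(large))
# Fixed unit: the numbers stay comparable (0.78 and about 1.2).
print(scale_fixed(small, "G"), scale_fixed(large, "G"))
```

A grapher that drops the unit would plot 800 followed by 1.2 in the first case, but 0.78 followed by 1.2 in the second.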

Command-line Arguments

Option Default Value Description
filter Filter which marks interesting items.
warning used > 80% Filter which marks items which generate a warning state.
warn Short alias for warning.
critical used > 90% Filter which marks items which generate a critical state.
crit Short alias for critical.
ok Filter which marks items which generate an ok state.
debug N/A Show debugging information in the log
show-all N/A Show details for all matches regardless of status (normally details are only shown for warnings and criticals).
empty-state ignored Return status to use when nothing matched filter.
perf-config Performance data generation configuration
escape-html N/A Escape any < and > characters to prevent HTML encoding
help N/A Show help screen (this screen)
help-pb N/A Show help screen as a protocol buffer payload
show-default N/A Show default values for a given command
help-short N/A Show help screen (short format).
top-syntax ${status}: ${list} Top level syntax.
ok-syntax ok syntax.
empty-syntax Empty syntax.
detail-syntax ${type} = ${used} Detail level syntax.
perf-syntax ${type} Performance alias syntax.
type The type of memory to check (physical = physical memory (RAM), committed = total memory (RAM + page file))

filter:

Filter which marks interesting items. Interesting items are the items that will be included in the check. The filter does not denote a warning or critical state; instead it defines which items are relevant, so you can use it to remove unwanted items.

warning:

Filter which marks items which generate a warning state. If anything matches this filter, the return status will be escalated to warning.

Default Value: used > 80%

critical:

Filter which marks items which generate a critical state. If anything matches this filter, the return status will be escalated to critical.

Default Value: used > 90%

ok:

Filter which marks items which generate an ok state. If anything matches this filter, any previous state for this item will be reset to ok.

empty-state:

Return status to use when nothing matched filter. If no filter is specified this will never happen unless the file is empty.

Default Value: ignored

perf-config:

Performance data generation configuration. TODO: obj (key: value; key: value) obj (key: value; key: value)

top-syntax:

Top level syntax. Used to format the message to return; it can include text as well as special keywords which will be replaced with information from the checks. To add a keyword to the message you can use either of two syntaxes, ${keyword} or %(keyword) (there is no difference between them, except that ${} can be difficult to escape on Linux).

Default Value: ${status}: ${list}

ok-syntax:

ok syntax. DEPRECATED! This is the syntax for when an ok result is returned. This value will not be used if your syntax contains %(list) or %(count).

empty-syntax:

Empty syntax. DEPRECATED! This is the syntax for when nothing matches the filter.

detail-syntax:

Detail level syntax. Used to format each resulting item in the message. %(list) in the top-syntax will be replaced with all the items formatted by this syntax string. To add a keyword to the message you can use either of two syntaxes, ${keyword} or %(keyword) (there is no difference between them, except that ${} can be difficult to escape on Linux).

Default Value: ${type} = ${used}

perf-syntax:

Performance alias syntax. This is the syntax for the base names of the performance data.

Default Value: ${type}

Filter keywords

Option Description
free Free memory in bytes (g,m,k,b) or percentages %
size Total size of memory
type The type of memory to check
used Used memory in bytes (g,m,k,b) or percentages %

Common options for all checks:

Option Description
count Number of items matching the filter.
crit_count Number of items matched the critical criteria.
crit_list A list of all items which matched the critical criteria.
detail_list A special list with critical, then warning and finally ok.
list A list of all items which matched the filter.
ok_count Number of items matched the ok criteria.
ok_list A list of all items which matched the ok criteria.
problem_count Number of items matched either warning or critical criteria.
problem_list A list of all items which matched either the critical or the warning criteria.
status The returned status (OK/WARN/CRIT/UNKNOWN).
total Total number of items.
warn_count Number of items matched the warning criteria.
warn_list A list of all items which matched the warning criteria.

check_os_updates

Check for available OS package updates via the system package manager (apt/dnf/yum/zypper/pacman).

Checking for Windows Updates

The check_os_updates command allows you to monitor for missing Windows updates via the Windows Update Agent (WUA) API. You can filter the results based on severity, reboot requirements, and other attributes.

Basic usage

To simply check if there are any pending updates:

check_os_updates

If there are any pending updates, this will return a warning state by default (because the default warning filter is count > 0).

Checking for critical updates

Often, you only want to be alerted if there are security or critical updates missing. You can configure this using the warning and critical filters:

check_os_updates "warning=important > 0" "critical=security > 0 or critical > 0"

This will return WARNING if there are updates with the 'Important' severity, and CRITICAL if there are any security updates or updates explicitly marked 'Critical'.

Checking if a reboot is required

If you want to know if the system needs a reboot after installing updates:

check_os_updates "warning=reboot_required > 0"

Customizing the output

You can use the syntax options to format the output string:

check_os_updates "top-syntax=${status}: Found ${count} missing updates. Security: ${security}, Critical: ${critical}" "detail-syntax=${titles}" show-all

Command-line Arguments

Option Default Value Description
filter Filter which marks interesting items.
warning count > 0 Filter which marks items which generate a warning state.
warn Short alias for warning.
critical security > 0 Filter which marks items which generate a critical state.
crit Short alias for critical.
ok Filter which marks items which generate an ok state.
debug N/A Show debugging information in the log
show-all N/A Show details for all matches regardless of status (normally details are only shown for warnings and criticals).
empty-state ok Return status to use when nothing matched filter.
perf-config Performance data generation configuration
escape-html N/A Escape any < and > characters to prevent HTML encoding
help N/A Show help screen (this screen)
help-pb N/A Show help screen as a protocol buffer payload
show-default N/A Show default values for a given command
help-short N/A Show help screen (short format).
top-syntax ${status}: ${count} updates available (${security} security) via ${manager} Top level syntax.
ok-syntax %(status): No updates available. ok syntax.
empty-syntax Empty syntax.
detail-syntax ${count} updates (${security} security) via ${manager} Detail level syntax.
perf-syntax updates Performance alias syntax.

filter:

Filter which marks interesting items. Interesting items are the items that will be included in the check. The filter does not denote a warning or critical state; instead it defines which items are relevant, so you can use it to remove unwanted items.

warning:

Filter which marks items which generate a warning state. If anything matches this filter, the return status will be escalated to warning.

Default Value: count > 0

critical:

Filter which marks items which generate a critical state. If anything matches this filter, the return status will be escalated to critical.

Default Value: security > 0

ok:

Filter which marks items which generate an ok state. If anything matches this filter, any previous state for this item will be reset to ok.

empty-state:

Return status to use when nothing matched filter. If no filter is specified this will never happen unless the file is empty.

Default Value: ok

perf-config:

Performance data generation configuration. TODO: obj (key: value; key: value) obj (key: value; key: value)

top-syntax:

Top level syntax. Used to format the message to return; it can include text as well as special keywords which will be replaced with information from the checks. To add a keyword to the message you can use either of two syntaxes, ${keyword} or %(keyword) (there is no difference between them, except that ${} can be difficult to escape on Linux).

Default Value: ${status}: ${count} updates available (${security} security) via ${manager}

ok-syntax:

ok syntax. DEPRECATED! This is the syntax for when an ok result is returned. This value will not be used if your syntax contains %(list) or %(count).

Default Value: %(status): No updates available.

empty-syntax:

Empty syntax. DEPRECATED! This is the syntax for when nothing matches the filter.

detail-syntax:

Detail level syntax. Used to format each resulting item in the message. %(list) in the top-syntax will be replaced with all the items formatted by this syntax string. To add a keyword to the message you can use either of two syntaxes, ${keyword} or %(keyword) (there is no difference between them, except that ${} can be difficult to escape on Linux).

Default Value: ${count} updates (${security} security) via ${manager}

perf-syntax:

Performance alias syntax. This is the syntax for the base names of the performance data.

Default Value: updates

Filter keywords

Option Description
manager Package manager used to query updates
packages Comma separated list of available package updates
security Number of available security updates

Common options for all checks:

Option Description
count Number of items matching the filter.
crit_count Number of items matched the critical criteria.
crit_list A list of all items which matched the critical criteria.
detail_list A special list with critical, then warning and finally ok.
list A list of all items which matched the filter.
ok_count Number of items matched the ok criteria.
ok_list A list of all items which matched the ok criteria.
problem_count Number of items matched either warning or critical criteria.
problem_list A list of all items which matched either the critical or the warning criteria.
status The returned status (OK/WARN/CRIT/UNKNOWN).
total Total number of items.
warn_count Number of items matched the warning criteria.
warn_list A list of all items which matched the warning criteria.

check_os_version

Check the version of the underlying OS.

Sample Commands

Default check:

check_os_version
L     client CRITICAL: Windows 7 (6.1.7601)
L     client  Performance data: 'version'=61;50;50

Making sure the OS version is at least Windows 8:

check_os_version "warn=version < 62"
L     client WARNING: Windows 7 (6.1.7601)
L     client  Performance data: 'version'=61;62;0

Default check via NRPE:

check_nrpe --host 192.168.56.103 --command check_os_version
Windows 2012 (6.2.9200)|'version'=62;50;50
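
In the samples above, 6.1.7601 is reported as 'version'=61 and 6.2.9200 as 'version'=62, which suggests the numeric value is built from the major and minor parts. The Python sketch below reproduces that mapping. It is inferred from these two samples only; the function version_number is hypothetical and the formula may not hold for other OS versions.

```python
def version_number(version: str) -> int:
    """Combine major and minor as major * 10 + minor (inferred from the samples)."""
    major, minor, *_ = (int(part) for part in version.split("."))
    return major * 10 + minor


for text in ("6.1.7601", "6.2.9200"):
    number = version_number(text)
    # Mirrors the sample filter "warn=version < 62".
    state = "WARNING" if number < 62 else "OK"
    print(text, number, state)
```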

Command-line Arguments

Option Default Value Description
filter Filter which marks interesting items.
warning Filter which marks items which generate a warning state.
warn Short alias for warning.
critical Filter which marks items which generate a critical state.
crit Short alias for critical.
ok Filter which marks items which generate an ok state.
debug N/A Show debugging information in the log
show-all N/A Show details for all matches regardless of status (normally details are only shown for warnings and criticals).
empty-state ignored Return status to use when nothing matched filter.
perf-config Performance data generation configuration
escape-html N/A Escape any < and > characters to prevent HTML encoding
help N/A Show help screen (this screen)
help-pb N/A Show help screen as a protocol buffer payload
show-default N/A Show default values for a given command
help-short N/A Show help screen (short format).
top-syntax ${status}: ${list} Top level syntax.
ok-syntax ok syntax.
empty-syntax Empty syntax.
detail-syntax ${kernel_name} (${kernel_release}) Detail level syntax.
perf-syntax kernel_release Performance alias syntax.

filter:

Filter which marks interesting items. Interesting items are the items that will be included in the check. The filter does not denote a warning or critical state; instead it defines which items are relevant, so you can use it to remove unwanted items.

warning:

Filter which marks items which generate a warning state. If anything matches this filter, the return status will be escalated to warning.

critical:

Filter which marks items which generate a critical state. If anything matches this filter, the return status will be escalated to critical.

ok:

Filter which marks items which generate an ok state. If anything matches this filter, any previous state for this item will be reset to ok.

empty-state:

Return status to use when nothing matched filter. If no filter is specified this will never happen unless the file is empty.

Default Value: ignored

perf-config:

Performance data generation configuration. TODO: obj (key: value; key: value) obj (key: value; key: value)

top-syntax:

Top level syntax. Used to format the message to return; it can include text as well as special keywords which will be replaced with information from the checks. To add a keyword to the message you can use either of two syntaxes, ${keyword} or %(keyword) (there is no difference between them, except that ${} can be difficult to escape on Linux).

Default Value: ${status}: ${list}

ok-syntax:

ok syntax. DEPRECATED! This is the syntax for when an ok result is returned. This value will not be used if your syntax contains %(list) or %(count).

empty-syntax:

Empty syntax. DEPRECATED! This is the syntax for when nothing matches the filter.

detail-syntax:

Detail level syntax. Used to format each resulting item in the message. %(list) in the top-syntax will be replaced with all the items formatted by this syntax string. To add a keyword to the message you can use either of two syntaxes, ${keyword} or %(keyword) (there is no difference between them, except that ${} can be difficult to escape on Linux).

Default Value: ${kernel_name} (${kernel_release})

perf-syntax:

Performance alias syntax. This is the syntax for the base names of the performance data.

Default Value: kernel_release

Filter keywords

Option Description
kernel_name Kernel name
kernel_release Kernel release
kernel_version Kernel version
machine Machine hardware name
nodename Network node hostname
os Operating system
processor Processor type or unknown

Common options for all checks:

Option Description
count Number of items matching the filter.
crit_count Number of items matched the critical criteria.
crit_list A list of all items which matched the critical criteria.
detail_list A special list with critical, then warning and finally ok.
list A list of all items which matched the filter.
ok_count Number of items matched the ok criteria.
ok_list A list of all items which matched the ok criteria.
problem_count Number of items matched either warning or critical criteria.
problem_list A list of all items which matched either the critical or the warning criteria.
status The returned status (OK/WARN/CRIT/UNKNOWN).
total Total number of items.
warn_count Number of items matched the warning criteria.
warn_list A list of all items which matched the warning criteria.

check_pagefile

Check the size of the system pagefile(s).

Sample Commands

Default options:

check_pagefile
L     client WARNING: \Device\HarddiskVolume2\pagefile.sys 24.3M (32M)
L     client  Performance data: '\??\D:\pagefile.sys'=1G;14;19;0;23 '\??\D:\pagefile.sys %'=6%;59;79;0;100 '\Device\HarddiskVolume2\pagefile.sys'=24M;19;25;0;32 '\Device\HarddiskVolume2\pagefile.sys %'=75%;59;79;0;100 'total'=1G;14;19;0;23 'total %'=6%;59;79;0;100

Only showing the total amount of pagefile usage:

check_pagefile "filter=name = 'total'" "top-syntax=${list}"
OK: total 1.66G (24G)
Performance data: 'total'=1G;14;19;0;23 'total %'=6%;59;79;0;100

Getting help on available options:

check_pagefile help
...
  filter=ARG           Filter which marks interesting items.
                       Interesting items are items which will be included in
                       the check.
                       They do not denote warning or critical state but they
                       are checked use this to filter out unwanted items.
                           Available options:
                       free          Free memory in bytes (g,m,k,b) or percentages %
                       name          The name of the page file (location)
                       size          Total size of pagefile
                       used          Used memory in bytes (g,m,k,b) or percentages %
                       count         Number of items matching the filter
                       total         Total number of items
                       ok_count      Number of items matched the ok criteria
                       warn_count    Number of items matched the warning criteria
                       crit_count    Number of items matched the critical criteria
                       problem_count Number of items matched either warning or critical criteria
...

Command-line Arguments

Option Default Value Description
filter Filter which marks interesting items.
warning used > 60% Filter which marks items which generate a warning state.
warn Short alias for warning.
critical used > 80% Filter which marks items which generate a critical state.
crit Short alias for critical.
ok Filter which marks items which generate an ok state.
debug N/A Show debugging information in the log
show-all N/A Show details for all matches regardless of status (normally details are only shown for warnings and criticals).
empty-state ignored Return status to use when nothing matched filter.
perf-config Performance data generation configuration
escape-html N/A Escape any < and > characters to prevent HTML encoding
help N/A Show help screen (this screen)
help-pb N/A Show help screen as a protocol buffer payload
show-default N/A Show default values for a given command
help-short N/A Show help screen (short format).
top-syntax ${status}: ${list} Top level syntax.
ok-syntax ok syntax.
empty-syntax Empty syntax.
detail-syntax ${name} ${used} (${size}) Detail level syntax.
perf-syntax ${name} Performance alias syntax.

filter:

Filter which marks interesting items. Interesting items are the items that will be included in the check. The filter does not denote a warning or critical state; instead it defines which items are relevant, so you can use it to remove unwanted items.

warning:

Filter which marks items which generate a warning state. If anything matches this filter, the return status will be escalated to warning.

Default Value: used > 60%

critical:

Filter which marks items which generate a critical state. If anything matches this filter, the return status will be escalated to critical.

Default Value: used > 80%

ok:

Filter which marks items which generate an ok state. If anything matches this filter, any previous state for this item will be reset to ok.

empty-state:

Return status to use when nothing matched filter. If no filter is specified this will never happen unless the file is empty.

Default Value: ignored

perf-config:

Performance data generation configuration. TODO: obj (key: value; key: value) obj (key: value; key: value)

top-syntax:

Top level syntax. Used to format the message to return; it can include text as well as special keywords which are replaced with information from the checks. To add a keyword to the message you can use either of two syntaxes, ${keyword} or %(keyword) (there is no difference between them, except that ${} can be difficult to escape on Linux).

Default Value: ${status}: ${list}

ok-syntax:

ok syntax. DEPRECATED! This is the syntax for when an ok result is returned. This value will not be used if your syntax contains %(list) or %(count).

empty-syntax:

Empty syntax. DEPRECATED! This is the syntax for when nothing matches the filter.

detail-syntax:

Detail level syntax. Used to format each resulting item in the message. %(list) in the top-syntax will be replaced with all the items formatted by this syntax string. To add a keyword to the message you can use either of two syntaxes, ${keyword} or %(keyword) (there is no difference between them, except that ${} can be difficult to escape on Linux).

Default Value: ${name} ${used} (${size})

perf-syntax:

Performance alias syntax. This is the syntax for the base names of the performance data.

Default Value: ${name}
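
The way the filter, warning, critical and ok expressions described above combine into a single return status can be sketched in Python. This is only an illustration of the documented semantics, not NSClient++ code: the `Pagefile` class, the `evaluate` function and the evaluation order (warning, then critical, then ok as a reset) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Pagefile:
    """Hypothetical stand-in for one pagefile item seen by the check."""
    name: str
    size: int  # total size in bytes
    used: int  # used bytes

    @property
    def used_pct(self) -> float:
        return 100.0 * self.used / self.size if self.size else 0.0


Predicate = Callable[[Pagefile], bool]
RANK = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


def evaluate(items: Iterable[Pagefile],
             warning: Predicate,
             critical: Predicate,
             ok: Optional[Predicate] = None,
             item_filter: Optional[Predicate] = None) -> str:
    """Return the worst status over all items that pass the filter."""
    status = "OK"
    for item in items:
        # filter: only "interesting" items take part in the check
        if item_filter is not None and not item_filter(item):
            continue
        item_status = "OK"
        if warning(item):
            item_status = "WARNING"   # escalate to warning
        if critical(item):
            item_status = "CRITICAL"  # escalate to critical
        if ok is not None and ok(item):
            item_status = "OK"        # ok resets any previous state
        if RANK[item_status] > RANK[status]:
            status = item_status
    return status
```

With the default thresholds written as predicates (`used_pct > 60` and `used_pct > 80`), a pagefile that is 70% used yields `WARNING` and one that is 90% used yields `CRITICAL`.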

Filter keywords

Option Description
free Free memory in bytes (g,m,k,b) or percentages %
name The name of the page file (swap)
size Total size of pagefile/swap
used Used memory in bytes (g,m,k,b) or percentages %
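
Because free and used accept either a byte value with a unit suffix (g, m, k, b) or a percentage, a threshold such as `60%` or `70m` has to be normalised before it can be compared. The following Python sketch shows one way to do that. The function name `to_bytes` and the use of 1024-based multipliers are assumptions for illustration; the source does not state which base NSClient++ uses.

```python
# Assumed multipliers: binary (1024-based) units.
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def to_bytes(threshold: str, total_size: int) -> float:
    """Convert a threshold such as '60%', '70m' or '512' to bytes.

    Percentages are resolved against total_size (the pagefile size).
    """
    text = threshold.strip().lower()
    if text.endswith("%"):
        return total_size * float(text[:-1]) / 100.0
    if text and text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)  # a bare number is taken as bytes
```

For example, `to_bytes("60%", 1000)` resolves to 600 bytes, so a `used > 60%` test on a 1000-byte pagefile compares the used value against 600.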

Common options for all checks:

Option Description
count Number of items matching the filter.
crit_count Number of items that matched the critical criteria.
crit_list A list of all items which matched the critical criteria.
detail_list A special list with critical, then warning and finally ok.
list A list of all items which matched the filter.
ok_count Number of items that matched the ok criteria.
ok_list A list of all items which matched the ok criteria.
problem_count Number of items that matched either the warning or the critical criteria.
problem_list A list of all items which matched either the critical or the warning criteria.
status The returned status (OK/WARN/CRIT/UNKNOWN).
total Total number of items.
warn_count Number of items that matched the warning criteria.
warn_list A list of all items which matched the warning criteria.

check_process

Check state/metrics of one or more of the processes running on the computer.

Sample Commands

Default check:

check_process
SetPoint.exe=hung
Performance data: 'taskhost.exe'=1;1;0 'dwm.exe'=1;1;0 'explorer.exe'=1;1;0 ... 'chrome.exe'=1;1;0 'vcpkgsrv.exe'=1;1;0 'vcpkgsrv.exe'=1;1;0 

Default check via NRPE:

check_nrpe --host 192.168.56.103 --command check_process
SetPoint.exe=hung|'smss.exe state'=1;0;0 'csrss.exe state'=1;0;0...

Check that specific processes are running:

check_process process=explorer.exe process=foo.exe
foo.exe=stopped
Performance data: 'explorer.exe'=1;1;0 'foo.exe'=0;1;0

Check the memory footprint of specific processes:

check_process process=explorer.exe "warn=working_set > 70m"
explorer.exe=started
Performance data: 'explorer.exe ws_size'=73M;70;0

Extend the syntax to display the attributes we are interested in:

check_process process=explorer.exe "warn=working_set > 70m" "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${user}s"
WARNING: Explorer.EXE ws:431.812MB, handles: 5639, user time:2535s
Performance data: 'explorer.exe ws_size'=73M;70;0

List all processes which use more than 200m virtual memory via NRPE:

check_nrpe --host 192.168.56.103 --command check_process --arguments "filter=virtual > 200m"
OK all processes are ok.|'csrss.exe state'=1;0;0 'svchost.exe state'=1;0;0 'AvastSvc.exe state'=1;0;0 ...
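
The performance data in the samples above follows the usual Nagios plugin layout of `'label'=value[unit];warn;crit`. If you need to post-process it, a small parser along these lines may help. This is a sketch only: `parse_perfdata` is a hypothetical helper, and it handles just the fields that appear in the samples (label, value, optional unit, warn, crit).

```python
import re

# 'label'=value[unit] followed by zero or more ;field entries
PERF_RE = re.compile(r"'([^']+)'=(-?\d+(?:\.\d+)?)([^;\s]*)((?:;[^;\s]*)*)")


def parse_perfdata(text: str) -> list:
    """Parse perfdata such as "'explorer.exe ws_size'=73M;70;0".

    Returns a list (not a dict) because labels may repeat, e.g. when
    several processes share the same executable name.
    """
    entries = []
    for label, value, unit, rest in PERF_RE.findall(text):
        fields = rest.split(";")[1:] if rest else []
        fields += [""] * (2 - len(fields))  # pad missing warn/crit
        entries.append({
            "label": label,
            "value": float(value),
            "unit": unit,
            "warn": fields[0],
            "crit": fields[1],
        })
    return entries
```

Applied to `'explorer.exe'=1;1;0 'foo.exe'=0;1;0`, the parser returns two entries, with `foo.exe` carrying the value 0 that corresponds to the stopped process in the sample.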

Command-line Arguments

Option Default Value Description
filter state != 'unreadable' Filter which marks interesting items.
warning state not in ('started') Filter which marks items which generates a warning state.
warn Short alias for warning
critical state = 'stopped', count = 0 Filter which marks items which generates a critical state.
crit Short alias for critical.
ok Filter which marks items which generates an ok state.
debug N/A Show debugging information in the log
show-all N/A Show details for all matches regardless of status (normally details are only shown for warnings and criticals).
empty-state unknown Return status to use when nothing matched filter.
perf-config Performance data generation configuration
escape-html N/A Escape any < and > characters to prevent HTML encoding
help N/A Show help screen (this screen)
help-pb N/A Show help screen as a protocol buffer payload
show-default N/A Show default values for a given command
help-short N/A Show help screen (short format).
top-syntax ${status}: ${problem_list} Top level syntax.
ok-syntax %(status): all processes are ok. ok syntax.
empty-syntax UNKNOWN: No processes found Empty syntax.
detail-syntax ${exe}=${state} Detail level syntax.
perf-syntax ${exe} Performance alias syntax.
process The process to check, set this to * to check all processes
total N/A Include the total of all matching processes

filter:

Filter which marks interesting items. Interesting items are the items that will be included in the check. The filter does not denote a warning or critical state; it defines which items are relevant, so you can use it to remove unwanted items.

Default Value: state != 'unreadable'

warning:

Filter which marks items which generates a warning state. If anything matches this filter the return status will be escalated to warning.

Default Value: state not in ('started')

critical:

Filter which marks items which generates a critical state. If anything matches this filter the return status will be escalated to critical.

Default Value: state = 'stopped', count = 0

ok:

Filter which marks items which generates an ok state. If anything matches this any previous state for this item will be reset to ok.

empty-state:

Return status to use when nothing matched filter. If no filter is specified this will never happen unless the file is empty.

Default Value: unknown

perf-config:

Performance data generation configuration. TODO: obj (key: value; key: value) obj (key: value; key: value)

top-syntax:

Top level syntax. Used to format the message to return; it can include text as well as special keywords which are replaced with information from the checks. To add a keyword to the message you can use either of two syntaxes, ${keyword} or %(keyword) (there is no difference between them, except that ${} can be difficult to escape on Linux).

Default Value: ${status}: ${problem_list}

ok-syntax:

ok syntax. DEPRECATED! This is the syntax for when an ok result is returned. This value will not be used if your syntax contains %(list) or %(count).

Default Value: %(status): all processes are ok.

empty-syntax:

Empty syntax. DEPRECATED! This is the syntax for when nothing matches the filter.

Default Value: UNKNOWN: No processes found

detail-syntax:

Detail level syntax. Used to format each resulting item in the message. %(list) in the top-syntax will be replaced with all the items formatted by this syntax string. To add a keyword to the message you can use either of two syntaxes, ${keyword} or %(keyword) (there is no difference between them, except that ${} can be difficult to escape on Linux).

Default Value: ${exe}=${state}

perf-syntax:

Performance alias syntax. This is the syntax for the base names of the performance data.

Default Value: ${exe}
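
To make the relationship between top-syntax and detail-syntax concrete, the following Python sketch expands both keyword forms, `${keyword}` and `%(keyword)`, and builds a message from the check_process defaults. The helper names `render` and `render_message`, the `", "` separator and the choice to replace unknown keywords with an empty string are assumptions for the example, not documented NSClient++ behaviour.

```python
import re

# Matches ${keyword} or %(keyword)
KEYWORD_RE = re.compile(r"\$\{(\w+)\}|%\((\w+)\)")


def render(syntax: str, values: dict) -> str:
    """Replace every ${keyword} / %(keyword) with its value."""
    def substitute(match):
        key = match.group(1) or match.group(2)
        return str(values.get(key, ""))  # assumption: unknown -> empty
    return KEYWORD_RE.sub(substitute, syntax)


def render_message(top_syntax: str, detail_syntax: str,
                   status: str, problems: list) -> str:
    """Format each problem item with detail-syntax, then the whole
    message with top-syntax."""
    problem_list = ", ".join(render(detail_syntax, item) for item in problems)
    return render(top_syntax, {"status": status, "problem_list": problem_list})
```

Calling `render_message("${status}: ${problem_list}", "${exe}=${state}", "CRITICAL", [{"exe": "foo.exe", "state": "stopped"}])` produces `CRITICAL: foo.exe=stopped`, and writing the same syntax strings with `%(...)` gives an identical result.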

Filter keywords

Option Description
command_line Command line of process
error Any error messages associated with fetching info
exe The name of the executable
filename Name of process (with path)
kernel Kernel time in seconds
page_faults Page fault count
pid Process id
started Process is started
state The current state (started, stopped, hung)
stopped Process is stopped
time User + kernel time in seconds
user User time in seconds
virtual Virtual size in bytes
working_set Working set (RSS) in bytes

Common options for all checks:

Option Description
count Number of items matching the filter.
crit_count Number of items that matched the critical criteria.
crit_list A list of all items which matched the critical criteria.
detail_list A special list with critical, then warning and finally ok.
list A list of all items which matched the filter.
ok_count Number of items that matched the ok criteria.
ok_list A list of all items which matched the ok criteria.
problem_count Number of items that matched either the warning or the critical criteria.
problem_list A list of all items which matched either the critical or the warning criteria.
status The returned status (OK/WARN/CRIT/UNKNOWN).
total Total number of items.
warn_count Number of items that matched the warning criteria.
warn_list A list of all items which matched the warning criteria.

check_service

Check the state of one or more of the computer services.

state_is_ok

Helper function that checks if the state of a service is "OK". It returns True if the state is "OK" and False otherwise. This can be used in filter expressions to warn about services that are not running properly.

Configured State exit_code Result of state_is_ok
auto-start running any ✅ ok
delayed auto-start stopped any ✅ ok
auto-start + triggers stopped any ✅ ok
auto-start stopped 0 ✅ ok
auto-start stopped non zero ❌ not ok
demand-start any state any ✅ ok
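
The table above can be read as a small decision function. The Python sketch below mirrors it row by row. It is not the NSClient++ implementation: the parameter names, the string values used for start type and state, and the handling of cases the table does not list (non-auto start types other than demand-start are treated as ok, other states of auto-start services as not ok) are assumptions.

```python
def state_is_ok(start_type: str, state: str, exit_code: int = 0,
                delayed: bool = False, triggers: bool = False) -> bool:
    """Sketch of the state_is_ok table.

    start_type: "auto", "demand" or "disabled" (hypothetical values)
    state:      "running" or "stopped"
    delayed:    True for delayed auto-start services
    triggers:   True for auto-start services that have triggers
    """
    if start_type != "auto":
        return True  # demand-start: any state is ok
    if state == "running":
        return True  # auto-start and running
    if state == "stopped":
        # A stopped auto-start service is ok when it is delayed,
        # trigger-started, or exited cleanly (exit code 0).
        return delayed or triggers or exit_code == 0
    return False     # assumption: states not in the table are not ok
```

The only row that evaluates to not ok is an auto-start service that is stopped with a non-zero exit code, which is what the default critical filter `not state_is_ok()` reports.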

state_is_perfect

Helper function that checks if the state of a service is "perfect". It returns True if the state is "perfect" and False otherwise. This can be used in filter expressions to warn about services that are not running perfectly.

Configured State Result of state_is_perfect
auto-start running ✅ perfect
auto-start stopped ❌ not perfect
auto-start + triggers stopped ✅ perfect
demand-start any state ✅ perfect
disabled stopped ✅ perfect

Sample Commands

Default check:

check_service
OK all services are ok.

Excluding services using exclude:

check_service "exclude=clr_optimization_v4.0.30319_32"  "exclude=clr_optimization_v4.0.30319_64"
WARNING: gupdate=stopped (auto), Net Driver HPZ12=stopped (auto), NSClientpp=stopped (auto), nscp=stopped (auto), Pml Driver HPZ12=stopped (auto), SkypeUpdate=stopped (auto), sppsvc=stopped (auto)

Show all services by changing the syntax:

check_service "top-syntax=${list}" "detail-syntax=${name}:${state}"
AdobeActiveFileMonitor10.0:running, AdobeARMservice:running, AdobeFlashPlayerUpdateSvc:stopped, ..., WwanSvc:stopped

Excluding services using the filter:

check_service "filter=start_type = 'auto' and name not in ('Bonjour Service', 'Net Driver HPZ12')"
AdobeActiveFileMonitor10.0: running, AdobeARMservice: running, AMD External Events Utility: running,  ... wuauserv: running

Exclude versus filter:

You can use both exclude and filter to exclude services. The benefit of exclude is that it is faster, with the obvious drawback that it only works on the service name. The upside of filters is that they are richer in functionality, e.g. substring matching (as below).

Regular check:

check_service
L        cli CRITICAL: CRITICAL: nfoo=stopped (auto), nscp=stopped (auto), nscp2=stopped (auto), ...

Excluding nfoo service with exclude:

check_service exclude=nfoo
L        cli CRITICAL: CRITICAL: nscp=stopped (auto), nscp2=stopped (auto), ...

Excluding nscp2 (and nscp) with a substring-matching like filter:

check_service exclude=nfoo "filter=name not like 'nscp'"
L        cli CRITICAL: CRITICAL: ...
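
The difference between the two mechanisms in the examples above is exact name matching versus substring matching. The Python sketch below models that distinction on the three sample service names. The function `select_services` and its parameters are hypothetical, and the case-sensitive substring test is an assumption about how like behaves.

```python
def select_services(names, exclude=(), not_like=None):
    """Return the services that remain after exclusion.

    exclude:  exact service names to skip (models exclude=...)
    not_like: substring to reject (models "filter=name not like '...'")
    """
    excluded = set(exclude)
    selected = []
    for name in names:
        if name in excluded:
            continue  # exact match on the service name only
        if not_like is not None and not_like in name:
            continue  # substring match, so 'nscp' also hits 'nscp2'
        selected.append(name)
    return selected
```

With `["nfoo", "nscp", "nscp2"]` as input, `exclude=["nfoo"]` leaves `nscp` and `nscp2`, while adding `not_like="nscp"` removes both of them in one expression.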

Default check via NRPE:

check_nrpe --host 192.168.56.103 --command check_service
WARNING: DPS=stopped (auto), MSDTC=stopped (auto), sppsvc=stopped (auto), UALSVC=stopped (auto)

Check that a service is not started:

check_service service=nscp "crit=state = 'started'" warn=none

Command-line Arguments

Option Default Value Description
filter Filter which marks interesting items.
warning not state_is_perfect() Filter which marks items which generates a warning state.
warn Short alias for warning
critical not state_is_ok() Filter which marks items which generates a critical state.
crit Short alias for critical.
ok Filter which marks items which generates an ok state.
debug N/A Show debugging information in the log
show-all N/A Show details for all matches regardless of status (normally details are only shown for warnings and criticals).
empty-state unknown Return status to use when nothing matched filter.
perf-config Performance data generation configuration
escape-html N/A Escape any < and > characters to prevent HTML encoding
help N/A Show help screen (this screen)
help-pb N/A Show help screen as a protocol buffer payload
show-default N/A Show default values for a given command
help-short N/A Show help screen (short format).
top-syntax ${status}: ${crit_list}, delayed (${warn_list}) Top level syntax.
ok-syntax %(status): All %(count) service(s) are ok. ok syntax.
empty-syntax %(status): No services found Empty syntax.
detail-syntax ${name}=${state} (${start_type}) Detail level syntax.
perf-syntax ${name} Performance alias syntax.
service The service to check, set this to * to check all services
exclude A list of services to ignore (mainly useful in combination with service=*)
state all The state of services to enumerate: active, inactive, failed, or all

filter:

Filter which marks interesting items. Interesting items are the items that will be included in the check. The filter does not denote a warning or critical state; it defines which items are relevant, so you can use it to remove unwanted items.

warning:

Filter which marks items which generates a warning state. If anything matches this filter the return status will be escalated to warning.

Default Value: not state_is_perfect()

critical:

Filter which marks items which generates a critical state. If anything matches this filter the return status will be escalated to critical.

Default Value: not state_is_ok()

ok:

Filter which marks items which generates an ok state. If anything matches this any previous state for this item will be reset to ok.

empty-state:

Return status to use when nothing matched filter. If no filter is specified this will never happen unless the file is empty.

Default Value: unknown

perf-config:

Performance data generation configuration. TODO: obj (key: value; key: value) obj (key: value; key: value)

top-syntax:

Top level syntax. Used to format the message to return; it can include text as well as special keywords which are replaced with information from the checks. To add a keyword to the message you can use either of two syntaxes, ${keyword} or %(keyword) (there is no difference between them, except that ${} can be difficult to escape on Linux).

Default Value: ${status}: ${crit_list}, delayed (${warn_list})

ok-syntax:

ok syntax. DEPRECATED! This is the syntax for when an ok result is returned. This value will not be used if your syntax contains %(list) or %(count).

Default Value: %(status): All %(count) service(s) are ok.

empty-syntax:

Empty syntax. DEPRECATED! This is the syntax for when nothing matches the filter.

Default Value: %(status): No services found

detail-syntax:

Detail level syntax. Used to format each resulting item in the message. %(list) in the top-syntax will be replaced with all the items formatted by this syntax string. To add a keyword to the message you can use either of two syntaxes, ${keyword} or %(keyword) (there is no difference between them, except that ${} can be difficult to escape on Linux).

Default Value: ${name}=${state} (${start_type})

perf-syntax:

Performance alias syntax. This is the syntax for the base names of the performance data.

Default Value: ${name}

state:

The state of services to enumerate: active, inactive, failed, or all

Default Value: all

Filter keywords

Option Description
desc Service description
name Service name
pid Process id
start_type The configured start type (enabled, disabled, static, masked)
started Service is started/active
state The current state (active, inactive, failed)
state_is_ok() Check if the state is ok (enabled services running or starting, disabled services can be any state)
state_is_perfect() Check if the state is perfect (enabled services running, disabled services stopped)
stopped Service is stopped/inactive
sub_state Service sub-state (running, dead, exited, etc.)

Common options for all checks:

Option Description
count Number of items matching the filter.
crit_count Number of items that matched the critical criteria.
crit_list A list of all items which matched the critical criteria.
detail_list A special list with critical, then warning and finally ok.
list A list of all items which matched the filter.
ok_count Number of items that matched the ok criteria.
ok_list A list of all items which matched the ok criteria.
problem_count Number of items that matched either the warning or the critical criteria.
problem_list A list of all items which matched either the critical or the warning criteria.
status The returned status (OK/WARN/CRIT/UNKNOWN).
total Total number of items.
warn_count Number of items that matched the warning criteria.
warn_list A list of all items which matched the warning criteria.

check_uptime

Check time since last server re-boot.

Sample Commands

Default check:

check_uptime
uptime: -9:02, boot: 2013-aug-18 08:29:13
'uptime uptime'=1376814553s;1376760683;1376803883

Adding warning and critical thresholds:

check_uptime "warn=uptime < -2d" "crit=uptime < -1d"
...

Default check via NRPE:

check_nrpe --host 192.168.56.103 --command check_uptime
uptime: -0:3, boot: 2013-sep-08 18:41:06 (UCT)|'uptime'=1378665666;1378579481;1378622681

Command-line Arguments

Option Default Value Description
filter Filter which marks interesting items.
warning uptime < 2d Filter which marks items which generates a warning state.
warn Short alias for warning
critical uptime < 1d Filter which marks items which generates a critical state.
crit Short alias for critical.
ok Filter which marks items which generates an ok state.
debug N/A Show debugging information in the log
show-all N/A Show details for all matches regardless of status (normally details are only shown for warnings and criticals).
empty-state ignored Return status to use when nothing matched filter.
perf-config Performance data generation configuration
escape-html N/A Escape any < and > characters to prevent HTML encoding
help N/A Show help screen (this screen)
help-pb N/A Show help screen as a protocol buffer payload
show-default N/A Show default values for a given command
help-short N/A Show help screen (short format).
top-syntax ${status}: ${list} Top level syntax.
ok-syntax ok syntax.
empty-syntax Empty syntax.
detail-syntax uptime: ${uptime}h, boot: ${boot} (UTC) Detail level syntax.
perf-syntax uptime Performance alias syntax.

filter:

Filter which marks interesting items. Interesting items are the items that will be included in the check. The filter does not denote a warning or critical state; it defines which items are relevant, so you can use it to remove unwanted items.

warning:

Filter which marks items which generates a warning state. If anything matches this filter the return status will be escalated to warning.

Default Value: uptime < 2d

critical:

Filter which marks items which generates a critical state. If anything matches this filter the return status will be escalated to critical.

Default Value: uptime < 1d

ok:

Filter which marks items which generates an ok state. If anything matches this any previous state for this item will be reset to ok.

empty-state:

Return status to use when nothing matched filter. If no filter is specified this will never happen unless the file is empty.

Default Value: ignored

perf-config:

Performance data generation configuration. TODO: obj (key: value; key: value) obj (key: value; key: value)

top-syntax:

Top level syntax. Used to format the message to return; it can include text as well as special keywords which are replaced with information from the checks. To add a keyword to the message you can use either of two syntaxes, ${keyword} or %(keyword) (there is no difference between them, except that ${} can be difficult to escape on Linux).

Default Value: ${status}: ${list}

ok-syntax:

ok syntax. DEPRECATED! This is the syntax for when an ok result is returned. This value will not be used if your syntax contains %(list) or %(count).

empty-syntax:

Empty syntax. DEPRECATED! This is the syntax for when nothing matches the filter.

detail-syntax:

Detail level syntax. Used to format each resulting item in the message. %(list) in the top-syntax will be replaced with all the items formatted by this syntax string. To add a keyword to the message you can use either of two syntaxes, ${keyword} or %(keyword) (there is no difference between them, except that ${} can be difficult to escape on Linux).

Default Value: uptime: ${uptime}h, boot: ${boot} (UTC)

perf-syntax:

Performance alias syntax. This is the syntax for the base names of the performance data.

Default Value: uptime

Filter keywords

Option Description
boot System boot time
uptime Time since last boot
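
The default thresholds (`uptime < 2d` for warning, `uptime < 1d` for critical) compare the time since boot against a duration. The Python sketch below shows the comparison with boot and the current time passed in explicitly. The helper names and the set of duration suffixes (s, m, h, d, w) are assumptions for illustration; only the `d` suffix appears in the defaults above.

```python
from datetime import datetime, timedelta, timezone

# Assumed duration suffixes; only "d" is confirmed by the defaults.
SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_duration(text: str) -> timedelta:
    """Convert a duration such as '2d' or '90m' to a timedelta."""
    text = text.strip().lower()
    return timedelta(seconds=float(text[:-1]) * SECONDS[text[-1]])


def uptime_status(boot: datetime, now: datetime,
                  warn: str = "2d", crit: str = "1d") -> str:
    """Return the status for the default-style uptime thresholds."""
    uptime = now - boot
    if uptime < parse_duration(crit):
        return "CRITICAL"  # rebooted less than `crit` ago
    if uptime < parse_duration(warn):
        return "WARNING"   # rebooted less than `warn` ago
    return "OK"
```

A machine that booted nine hours ago is therefore reported as `CRITICAL` under the defaults, one that booted 36 hours ago as `WARNING`, and one that has been up for three days as `OK`.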

Common options for all checks:

Option Description
count Number of items matching the filter.
crit_count Number of items that matched the critical criteria.
crit_list A list of all items which matched the critical criteria.
detail_list A special list with critical, then warning and finally ok.
list A list of all items which matched the filter.
ok_count Number of items that matched the ok criteria.
ok_list A list of all items which matched the ok criteria.
problem_count Number of items that matched either the warning or the critical criteria.
problem_list A list of all items which matched either the critical or the warning criteria.
status The returned status (OK/WARN/CRIT/UNKNOWN).
total Total number of items.
warn_count Number of items that matched the warning criteria.
warn_list A list of all items which matched the warning criteria.