CheckSystem
Various system related checks, such as CPU load, process state and memory.
Enable module
To enable this module and and allow using the commands you need to ass CheckSystem = enabled to the [/modules] section in nsclient.ini:
[/modules]
CheckSystem = enabled
Queries
A quick reference for all available queries (check commands) in the CheckSystem module.
List of commands:
A list of all available queries (check commands)
| Command | Description |
|---|---|
| check_cpu | Check that the load of the CPU(s) are within bounds. |
| check_memory | Check free/used memory on the system. |
| check_os_updates | Check for available OS package updates via the system package manager (apt/dnf/yum/zypper/pacman). |
| check_os_version | Check the version of the underlying OS. |
| check_pagefile | Check the size of the system pagefile(s). |
| check_process | Check state/metrics of one or more of the processes running on the computer. |
| check_service | Check the state of one or more of the computer services. |
| check_uptime | Check time since last server re-boot. |
check_cpu
Check that the load of the CPU(s) are within bounds.
How CPU load is measured (historical buffer)
check_cpu does not measure the CPU load at the moment the check is executed. Instead, NSClient++
runs a background collector thread that samples the CPU load roughly once per second and pushes
each sample into an in-memory ring buffer. Whenever you run check_cpu the values reported are
averages computed from this buffer for one or more time windows.
The time windows are controlled by the time= option. The default is to compute three averages:
5m, 1m and 5s (which is why the default output contains rows like total 5m load,
total 1m load and total 5s load). You can override this with one or more time= arguments,
for example time=10m or time=30s time=2m.
Buffer size and configuration
The size of the historical buffer is controlled by the default buffer length setting on the
CheckSystem section. The default is 1h, meaning the last hour of samples is retained. The buffer
size puts an upper bound on the time windows you can use:
- If you ask for a window that is shorter than or equal to the buffer length, the result is the average of all samples collected during that window.
- If you ask for a window that is longer than the buffer length, the result will only cover the samples that are actually present in the buffer (effectively capped to the buffer length).
- If NSClient++ was started less time ago than the requested window, the result will only
reflect the samples collected since startup. Right after start-up
5mand1maverages will therefore be based on fewer samples than they normally would be.
If you need to check on longer windows (for example 2h or 6h) you must increase
default buffer length accordingly. Note that a larger buffer uses more memory, so only increase
it as far as you actually need.
Impact on measurements
Because every value reported by check_cpu is an average over a time window, the choice of time=
has a direct impact on what the check sees:
- Short windows (e.g.
5s,10s) are very reactive and will show short spikes in CPU load, but they also produce a lot of noise. They are useful for catching transient bursts but can also generate flapping alerts. - Medium windows (e.g.
1m,5m) are a good compromise for most monitoring use cases. They smooth out short spikes while still reacting to sustained load within a few minutes. - Long windows (e.g.
15m,1h) smooth out almost all transients and only fire when the CPU has been busy for an extended period of time. They are well suited to detecting sustained load but will be slow to react and slow to recover.
A common pattern is to combine windows, for example warning on a long window and critical on a
short one (or vice versa), so that the check both catches sustained problems and ignores brief
spikes. The default check (5m, 1m, 5s) is an example of this approach.
Because the values are averages, they will not match the instantaneous CPU load shown by tools such
as top at the moment the check is executed, and very short spikes that fall between collection
ticks may be missed entirely.
Jump to section:
Sample Commands
To edit these sample please edit this page
Default check:
check_cpu
CPU Load ok
'total 5m load'=0%;80;90 'total 1m load'=0%;80;90 'total 5s load'=7%;80;90
Checking all cores by adding filter=none (disabling the default filter):
check_cpu filter=none "warn=load > 80" "crit=load > 90"
CPU Load ok
'core 0 5m kernel'=1%;10;0 'core 0 5m load'=3%;80;90 'core 1 5m kernel'=0%;10;0 'core 1 5m load'=0%;80;90 ... 'core 7 5s load'=15%;80;90 'total 5s kernel'=3%;10;0 'total 5s load'=7%;80;90
Adding kernel times to the check:
check_cpu filter=none "warn=kernel > 10 or load > 80" "crit=load > 90" "top-syntax=${list}"
core 0 > 3, core 1 > 0, core 2 > 0, core ... , core 7 > 15, total > 7
'core 0 5m kernel'=1%;10;0 'core 0 5m load'=3%;80;90 'core 1 5m kernel'=0%;10;0 'core 1 5m load'=0%;80;90 ... 'core 7 5s load'=15%;80;90 'total 5s kernel'=3%;10;0 'total 5s load'=7%;80;90
Default check via NRPE:
check_nscp --host 192.168.56.103 --command check_cpu
CPU Load ok|'total 5m'=16%;80;90 'total 1m'=13%;80;90 'total 5s'=13%;80;90
Customizing the output syntax to include CPU load in text:
check_cpu "top-syntax=%(status): %(list)"
L cli OK: OK: 5m: 16%, 1m: 30%, 5s: 23%
Customizing the output syntax to only show CPU load as text:
check_cpu "top-syntax=%(status): Cpu usage is %(list)" time=5m "detail-syntax=%(load) %"
L cli OK: OK: Cpu usage is 26 %
Command-line Arguments
| Option | Default Value | Description |
|---|---|---|
| filter | core = 'total' | Filter which marks interesting items. |
| warning | load > 80 | Filter which marks items which generates a warning state. |
| warn | Short alias for warning | |
| critical | load > 90 | Filter which marks items which generates a critical state. |
| crit | Short alias for critical. | |
| ok | Filter which marks items which generates an ok state. | |
| debug | N/A | Show debugging information in the log |
| show-all | N/A | Show details for all matches regardless of status (normally details are only showed for warnings and criticals). |
| empty-state | ignored | Return status to use when nothing matched filter. |
| perf-config | Performance data generation configuration | |
| escape-html | N/A | Escape any < and > characters to prevent HTML encoding |
| help | N/A | Show help screen (this screen) |
| help-pb | N/A | Show help screen as a protocol buffer payload |
| show-default | N/A | Show default values for a given command |
| help-short | N/A | Show help screen (short format). |
| top-syntax | ${status}: ${problem_list} | Top level syntax. |
| ok-syntax | %(status): CPU load is ok. | ok syntax. |
| empty-syntax | Empty syntax. | |
| detail-syntax | ${time}: ${load}% | Detail level syntax. |
| perf-syntax | ${core} ${time} | Performance alias syntax. |
| time | The time to check | |
| cores | N/A | This will remove the filter to include the cores, if you use filter don't use this as well. |
filter:
Filter which marks interesting items. Interesting items are items which will be included in the check. They do not denote warning or critical state instead it defines which items are relevant and you can remove unwanted items.
Default Value: core = 'total'
warning:
Filter which marks items which generates a warning state. If anything matches this filter the return status will be escalated to warning.
Default Value: load > 80
critical:
Filter which marks items which generates a critical state. If anything matches this filter the return status will be escalated to critical.
Default Value: load > 90
ok:
Filter which marks items which generates an ok state. If anything matches this any previous state for this item will be reset to ok.
empty-state:
Return status to use when nothing matched filter. If no filter is specified this will never happen unless the file is empty.
Default Value: ignored
perf-config:
Performance data generation configuration TODO: obj ( key: value; key: value) obj (key:valuer;key:value)
top-syntax:
Top level syntax. Used to format the message to return can include text as well as special keywords which will include information from the checks. To add a keyword to the message you can use two syntaxes either ${keyword} or %(keyword) (there is no difference between them apart from ${} can be difficult to escape on linux).
Default Value: ${status}: ${problem_list}
ok-syntax:
ok syntax. DEPRECATED! This is the syntax for when an ok result is returned. This value will not be used if your syntax contains %(list) or %(count).
Default Value: %(status): CPU load is ok.
empty-syntax:
Empty syntax. DEPRECATED! This is the syntax for when nothing matches the filter.
detail-syntax:
Detail level syntax. Used to format each resulting item in the message. %(list) will be replaced with all the items formated by this syntax string in the top-syntax. To add a keyword to the message you can use two syntaxes either ${keyword} or %(keyword) (there is no difference between them apart from ${} can be difficult to escape on linux).
Default Value: ${time}: ${load}%
perf-syntax:
Performance alias syntax. This is the syntax for the base names of the performance data.
Default Value: ${core} ${time}
Filter keywords
| Option | Description |
|---|---|
| core | The core to check (total or core ##) |
| core_id | The core to check (total or core_##) |
| idle | The current idle load for a given core |
| kernel | deprecated (use system instead) |
| load | The current load for a given core (deprecated, use total) |
| system | The current load used by the system (kernel) |
| time | The time frame to check |
| user | The current load used by user applications |
Common options for all checks:
| Option | Description |
|---|---|
| count | Number of items matching the filter. |
| crit_count | Number of items matched the critical criteria. |
| crit_list | A list of all items which matched the critical criteria. |
| detail_list | A special list with critical, then warning and finally ok. |
| list | A list of all items which matched the filter. |
| ok_count | Number of items matched the ok criteria. |
| ok_list | A list of all items which matched the ok criteria. |
| problem_count | Number of items matched either warning or critical criteria. |
| problem_list | A list of all items which matched either the critical or the warning criteria. |
| status | The returned status (OK/WARN/CRIT/UNKNOWN). |
| total | Total number of items. |
| warn_count | Number of items matched the warning criteria. |
| warn_list | A list of all items which matched the warning criteria. |
check_memory
Check free/used memory on the system.
Kinds of memory
There are several different kinds of memory that a computer system uses to manage data and processes. Here are the main types:
physicalMemory (RAM): This is the actual, tangible memory chips installed in your computer. It's often referred to as RAM (Random Access Memory).committedMemory: Committed memory refers to the amount of virtual memory that has been reserved by processes. When a program requests memory from the operating system, that memory is "committed." This committed memory is guaranteed to be available to the process, meaning Windows has set aside enough resources (either physical RAM or space in the page file) to back that memory.virtualMemory: Virtual memory is an abstraction layer created by the operating system (Windows) to provide a larger, contiguous address space to each process than the physical RAM actually available.
Jump to section:
Sample Commands
To edit these sample please edit this page
Default check:
check_memory
OK memory within bounds.
'page used'=8G;19;21 'page used %'=33%;79;89 'physical used'=7G;9;10 'physical used %'=65%;79;89
Using --show-all to show the result:
check_memory "warn=free < 20%" "crit=free < 10G" --show-all
page = 8.05G, physical = 7.85G
'page free'=15G;4;2 'page free %'=66%;19;9 'physical free'=4G;2;1 'physical free %'=34%;19;9
Changing the return syntax to include more information::
check_memory "top-syntax=${list}" "detail-syntax=${type} free: ${free} used: ${used} size: ${size}"
page free: 16G used: 7.98G size: 24G, physical free: 4.18G used: 7.8G size: 12G
Default check via NRPE::
check_nrpe --host 192.168.56.103 --command check_memory
OK memory within bounds.|'page'=531G;3;3;0;3 'page %'=12%;79;89;0;100 'physical'=530G;1;1;0;1 'physical %'=25%;79;89;0;100
Most "byte" checks such as memory have an auto scaling feature which means values will go from 800M to 1.2G between checks.
Some graphing systems does not honor the units in performance data in which case you can get unexpected large values (such as 800G).
To remedy this you can lock the unit by adding perf-config=*(unit:G)
check_memory perf-config=*(unit:G)
page = 8.05G, physical = 7.85G
'page free'=15G;4;2 'page free %'=66%;19;9 'physical free'=4G;2;1 'physical free %'=34%;19;9
Command-line Arguments
| Option | Default Value | Description |
|---|---|---|
| filter | Filter which marks interesting items. | |
| warning | used > 80% | Filter which marks items which generates a warning state. |
| warn | Short alias for warning | |
| critical | used > 90% | Filter which marks items which generates a critical state. |
| crit | Short alias for critical. | |
| ok | Filter which marks items which generates an ok state. | |
| debug | N/A | Show debugging information in the log |
| show-all | N/A | Show details for all matches regardless of status (normally details are only showed for warnings and criticals). |
| empty-state | ignored | Return status to use when nothing matched filter. |
| perf-config | Performance data generation configuration | |
| escape-html | N/A | Escape any < and > characters to prevent HTML encoding |
| help | N/A | Show help screen (this screen) |
| help-pb | N/A | Show help screen as a protocol buffer payload |
| show-default | N/A | Show default values for a given command |
| help-short | N/A | Show help screen (short format). |
| top-syntax | ${status}: ${list} | Top level syntax. |
| ok-syntax | ok syntax. | |
| empty-syntax | Empty syntax. | |
| detail-syntax | ${type} = ${used} | Detail level syntax. |
| perf-syntax | ${type} | Performance alias syntax. |
| type | The type of memory to check (physical = Physical memory (RAM), committed = total memory (RAM+PAGE) |
filter:
Filter which marks interesting items. Interesting items are items which will be included in the check. They do not denote warning or critical state instead it defines which items are relevant and you can remove unwanted items.
warning:
Filter which marks items which generates a warning state. If anything matches this filter the return status will be escalated to warning.
Default Value: used > 80%
critical:
Filter which marks items which generates a critical state. If anything matches this filter the return status will be escalated to critical.
Default Value: used > 90%
ok:
Filter which marks items which generates an ok state. If anything matches this any previous state for this item will be reset to ok.
empty-state:
Return status to use when nothing matched filter. If no filter is specified this will never happen unless the file is empty.
Default Value: ignored
perf-config:
Performance data generation configuration TODO: obj ( key: value; key: value) obj (key:valuer;key:value)
top-syntax:
Top level syntax. Used to format the message to return can include text as well as special keywords which will include information from the checks. To add a keyword to the message you can use two syntaxes either ${keyword} or %(keyword) (there is no difference between them apart from ${} can be difficult to escape on linux).
Default Value: ${status}: ${list}
ok-syntax:
ok syntax. DEPRECATED! This is the syntax for when an ok result is returned. This value will not be used if your syntax contains %(list) or %(count).
empty-syntax:
Empty syntax. DEPRECATED! This is the syntax for when nothing matches the filter.
detail-syntax:
Detail level syntax. Used to format each resulting item in the message. %(list) will be replaced with all the items formated by this syntax string in the top-syntax. To add a keyword to the message you can use two syntaxes either ${keyword} or %(keyword) (there is no difference between them apart from ${} can be difficult to escape on linux).
Default Value: ${type} = ${used}
perf-syntax:
Performance alias syntax. This is the syntax for the base names of the performance data.
Default Value: ${type}
Filter keywords
| Option | Description |
|---|---|
| free | Free memory in bytes (g,m,k,b) or percentages % |
| size | Total size of memory |
| type | The type of memory to check |
| used | Used memory in bytes (g,m,k,b) or percentages % |
Common options for all checks:
| Option | Description |
|---|---|
| count | Number of items matching the filter. |
| crit_count | Number of items matched the critical criteria. |
| crit_list | A list of all items which matched the critical criteria. |
| detail_list | A special list with critical, then warning and finally ok. |
| list | A list of all items which matched the filter. |
| ok_count | Number of items matched the ok criteria. |
| ok_list | A list of all items which matched the ok criteria. |
| problem_count | Number of items matched either warning or critical criteria. |
| problem_list | A list of all items which matched either the critical or the warning criteria. |
| status | The returned status (OK/WARN/CRIT/UNKNOWN). |
| total | Total number of items. |
| warn_count | Number of items matched the warning criteria. |
| warn_list | A list of all items which matched the warning criteria. |
check_os_updates
Check for available OS package updates via the system package manager (apt/dnf/yum/zypper/pacman).
Checking for Windows Updates
The check_os_updates command allows you to monitor for missing Windows updates via the Windows Update Agent (WUA) API. You can filter the results based on severity, reboot requirements, and other attributes.
Basic usage
To simply check if there are any pending updates:
check_os_updates
If there are any pending updates, this will return a warning state by default (because the default warning filter is count > 0).
Checking for critical updates
Often, you only want to be alerted if there are security or critical updates missing. You can configure this using the warning and critical filters:
check_os_updates "warning=important > 0" "critical=security > 0 or critical > 0"
This will return WARNING if there are updates with the 'Important' severity, and CRITICAL if there are any security updates or updates explicitly marked 'Critical'.
Checking if a reboot is required
If you want to know if the system needs a reboot after installing updates:
check_os_updates "warning=reboot_required > 0"
Customizing the output
You can use the syntax options to format the output string:
check_os_updates "top-syntax=${status}: Found ${count} missing updates. Security: ${security}, Critical: ${critical}" "detail-syntax=${titles}" show-all
Jump to section:
Command-line Arguments
| Option | Default Value | Description |
|---|---|---|
| filter | Filter which marks interesting items. | |
| warning | count > 0 | Filter which marks items which generates a warning state. |
| warn | Short alias for warning | |
| critical | security > 0 | Filter which marks items which generates a critical state. |
| crit | Short alias for critical. | |
| ok | Filter which marks items which generates an ok state. | |
| debug | N/A | Show debugging information in the log |
| show-all | N/A | Show details for all matches regardless of status (normally details are only showed for warnings and criticals). |
| empty-state | ok | Return status to use when nothing matched filter. |
| perf-config | Performance data generation configuration | |
| escape-html | N/A | Escape any < and > characters to prevent HTML encoding |
| help | N/A | Show help screen (this screen) |
| help-pb | N/A | Show help screen as a protocol buffer payload |
| show-default | N/A | Show default values for a given command |
| help-short | N/A | Show help screen (short format). |
| top-syntax | ${status}: ${count} updates available (${security} security) via ${manager} | Top level syntax. |
| ok-syntax | %(status): No updates available. | ok syntax. |
| empty-syntax | Empty syntax. | |
| detail-syntax | ${count} updates (${security} security) via ${manager} | Detail level syntax. |
| perf-syntax | updates | Performance alias syntax. |
filter:
Filter which marks interesting items. Interesting items are items which will be included in the check. They do not denote warning or critical state instead it defines which items are relevant and you can remove unwanted items.
warning:
Filter which marks items which generates a warning state. If anything matches this filter the return status will be escalated to warning.
Default Value: count > 0
critical:
Filter which marks items which generates a critical state. If anything matches this filter the return status will be escalated to critical.
Default Value: security > 0
ok:
Filter which marks items which generates an ok state. If anything matches this any previous state for this item will be reset to ok.
empty-state:
Return status to use when nothing matched filter. If no filter is specified this will never happen unless the file is empty.
Default Value: ok
perf-config:
Performance data generation configuration TODO: obj ( key: value; key: value) obj (key:valuer;key:value)
top-syntax:
Top level syntax. Used to format the message to return can include text as well as special keywords which will include information from the checks. To add a keyword to the message you can use two syntaxes either ${keyword} or %(keyword) (there is no difference between them apart from ${} can be difficult to escape on linux).
Default Value: ${status}: ${count} updates available (${security} security) via ${manager}
ok-syntax:
ok syntax. DEPRECATED! This is the syntax for when an ok result is returned. This value will not be used if your syntax contains %(list) or %(count).
Default Value: %(status): No updates available.
empty-syntax:
Empty syntax. DEPRECATED! This is the syntax for when nothing matches the filter.
detail-syntax:
Detail level syntax. Used to format each resulting item in the message. %(list) will be replaced with all the items formated by this syntax string in the top-syntax. To add a keyword to the message you can use two syntaxes either ${keyword} or %(keyword) (there is no difference between them apart from ${} can be difficult to escape on linux).
Default Value: ${count} updates (${security} security) via ${manager}
perf-syntax:
Performance alias syntax. This is the syntax for the base names of the performance data.
Default Value: updates
Filter keywords
| Option | Description |
|---|---|
| manager | Package manager used to query updates |
| packages | Comma separated list of available package updates |
| security | Number of available security updates |
Common options for all checks:
| Option | Description |
|---|---|
| count | Number of items matching the filter. |
| crit_count | Number of items matched the critical criteria. |
| crit_list | A list of all items which matched the critical criteria. |
| detail_list | A special list with critical, then warning and finally ok. |
| list | A list of all items which matched the filter. |
| ok_count | Number of items matched the ok criteria. |
| ok_list | A list of all items which matched the ok criteria. |
| problem_count | Number of items matched either warning or critical criteria. |
| problem_list | A list of all items which matched either the critical or the warning criteria. |
| status | The returned status (OK/WARN/CRIT/UNKNOWN). |
| total | Total number of items. |
| warn_count | Number of items matched the warning criteria. |
| warn_list | A list of all items which matched the warning criteria. |
check_os_version
Check the version of the underlying OS.
Jump to section:
Sample Commands
To edit these sample please edit this page
Default check:
check_os_Version
L client CRITICAL: Windows 7 (6.1.7601)
L client Performance data: 'version'=61;50;50
Making sure the OS version is Windows 8:
check_os_Version "warn=version < 62"
L client WARNING: Windows 7 (6.1.7601)
L client Performance data: 'version'=61;62;0
Default check via NRPE:
check_nrpe --host 192.168.56.103 --command check_os_version
Windows 2012 (6.2.9200)|'version'=62;50;50
Command-line Arguments
| Option | Default Value | Description |
|---|---|---|
| filter | Filter which marks interesting items. | |
| warning | Filter which marks items which generates a warning state. | |
| warn | Short alias for warning | |
| critical | Filter which marks items which generates a critical state. | |
| crit | Short alias for critical. | |
| ok | Filter which marks items which generates an ok state. | |
| debug | N/A | Show debugging information in the log |
| show-all | N/A | Show details for all matches regardless of status (normally details are only showed for warnings and criticals). |
| empty-state | ignored | Return status to use when nothing matched filter. |
| perf-config | Performance data generation configuration | |
| escape-html | N/A | Escape any < and > characters to prevent HTML encoding |
| help | N/A | Show help screen (this screen) |
| help-pb | N/A | Show help screen as a protocol buffer payload |
| show-default | N/A | Show default values for a given command |
| help-short | N/A | Show help screen (short format). |
| top-syntax | ${status}: ${list} | Top level syntax. |
| ok-syntax | ok syntax. | |
| empty-syntax | Empty syntax. | |
| detail-syntax | ${kernel_name} (${kernel_release}) | Detail level syntax. |
| perf-syntax | kernel_release | Performance alias syntax. |
filter:
Filter which marks interesting items. Interesting items are items which will be included in the check. They do not denote warning or critical state instead it defines which items are relevant and you can remove unwanted items.
warning:
Filter which marks items which generates a warning state. If anything matches this filter the return status will be escalated to warning.
critical:
Filter which marks items which generates a critical state. If anything matches this filter the return status will be escalated to critical.
ok:
Filter which marks items which generates an ok state. If anything matches this any previous state for this item will be reset to ok.
empty-state:
Return status to use when nothing matched filter. If no filter is specified this will never happen unless the file is empty.
Default Value: ignored
perf-config:
Performance data generation configuration TODO: obj ( key: value; key: value) obj (key:valuer;key:value)
top-syntax:
Top level syntax. Used to format the message to return can include text as well as special keywords which will include information from the checks. To add a keyword to the message you can use two syntaxes either ${keyword} or %(keyword) (there is no difference between them apart from ${} can be difficult to escape on linux).
Default Value: ${status}: ${list}
ok-syntax:
ok syntax. DEPRECATED! This is the syntax for when an ok result is returned. This value will not be used if your syntax contains %(list) or %(count).
empty-syntax:
Empty syntax. DEPRECATED! This is the syntax for when nothing matches the filter.
detail-syntax:
Detail level syntax. Used to format each resulting item in the message. %(list) will be replaced with all the items formated by this syntax string in the top-syntax. To add a keyword to the message you can use two syntaxes either ${keyword} or %(keyword) (there is no difference between them apart from ${} can be difficult to escape on linux).
Default Value: ${kernel_name} (${kernel_release})
perf-syntax:
Performance alias syntax. This is the syntax for the base names of the performance data.
Default Value: kernel_release
Filter keywords
| Option | Description |
|---|---|
| kernel_name | Kernel name |
| kernel_release | Kernel release |
| kernel_version | Kernel version |
| machine | Machine hardware name |
| nodename | Network node hostname |
| os | Operating system |
| processor | Processor type or unknown |
Common options for all checks:
| Option | Description |
|---|---|
| count | Number of items matching the filter. |
| crit_count | Number of items matched the critical criteria. |
| crit_list | A list of all items which matched the critical criteria. |
| detail_list | A special list with critical, then warning and finally ok. |
| list | A list of all items which matched the filter. |
| ok_count | Number of items matched the ok criteria. |
| ok_list | A list of all items which matched the ok criteria. |
| problem_count | Number of items matched either warning or critical criteria. |
| problem_list | A list of all items which matched either the critical or the warning criteria. |
| status | The returned status (OK/WARN/CRIT/UNKNOWN). |
| total | Total number of items. |
| warn_count | Number of items matched the warning criteria. |
| warn_list | A list of all items which matched the warning criteria. |
check_pagefile
Check the size of the system pagefile(s).
Jump to section:
Sample Commands
To edit these sample please edit this page
Default options:
check_pagefile
L client WARNING: \Device\HarddiskVolume2\pagefile.sys 24.3M (32M)
L client Performance data: '\??\D:\pagefile.sys'=1G;14;19;0;23 '\??\D:\pagefile.sys %'=6%;59;79;0;100 '\Device\HarddiskVolume2\pagefile.sys'=24M;19;25;0;32 '\Device\HarddiskVolume2\pagefile.sys %'=75%;59;79;0;100 'total'=1G;14;19;0;23 'total %'=6%;59;79;0;100
Only showing the total amount of pagefile usage::
check_pagefile "filter=name = 'total'" "top-syntax=${list}"
OK: total 1.66G (24G)
Performance data: 'total'=1G;14;19;0;23 'total %'=6%;59;79;0;100
Getting help on available options::
check_pagefile help
...
filter=ARG Filter which marks interesting items.
Interesting items are items which will be included in
the check.
They do not denote warning or critical state but they
are checked use this to filter out unwanted items.
Available options:
free Free memory in bytes (g,m,k,b) or percentages %
name The name of the page file (location)
size Total size of pagefile
used Used memory in bytes (g,m,k,b) or percentages %
count Number of items matching the filter
total Total number of items
ok_count Number of items matched the ok criteria
warn_count Number of items matched the warning criteria
crit_count Number of items matched the critical criteria
problem_count Number of items matched either warning or critical criteria
...
Command-line Arguments
| Option | Default Value | Description |
|---|---|---|
| filter | Filter which marks interesting items. | |
| warning | used > 60% | Filter which marks items which generates a warning state. |
| warn | Short alias for warning | |
| critical | used > 80% | Filter which marks items which generates a critical state. |
| crit | Short alias for critical. | |
| ok | Filter which marks items which generates an ok state. | |
| debug | N/A | Show debugging information in the log |
| show-all | N/A | Show details for all matches regardless of status (normally details are only showed for warnings and criticals). |
| empty-state | ignored | Return status to use when nothing matched filter. |
| perf-config | Performance data generation configuration | |
| escape-html | N/A | Escape any < and > characters to prevent HTML encoding |
| help | N/A | Show help screen (this screen) |
| help-pb | N/A | Show help screen as a protocol buffer payload |
| show-default | N/A | Show default values for a given command |
| help-short | N/A | Show help screen (short format). |
| top-syntax | ${status}: ${list} | Top level syntax. |
| ok-syntax | ok syntax. | |
| empty-syntax | Empty syntax. | |
| detail-syntax | ${name} ${used} (${size}) | Detail level syntax. |
| perf-syntax | ${name} | Performance alias syntax. |
filter:
Filter which marks interesting items. Interesting items are items which will be included in the check. They do not denote warning or critical state instead it defines which items are relevant and you can remove unwanted items.
warning:
Filter which marks items which generates a warning state. If anything matches this filter the return status will be escalated to warning.
Default Value: used > 60%
critical:
Filter which marks items which generates a critical state. If anything matches this filter the return status will be escalated to critical.
Default Value: used > 80%
ok:
Filter which marks items which generates an ok state. If anything matches this any previous state for this item will be reset to ok.
empty-state:
Return status to use when nothing matched filter. If no filter is specified this will never happen unless the file is empty.
Default Value: ignored
perf-config:
Performance data generation configuration TODO: obj ( key: value; key: value) obj (key:valuer;key:value)
top-syntax:
Top level syntax. Used to format the message to return can include text as well as special keywords which will include information from the checks. To add a keyword to the message you can use two syntaxes either ${keyword} or %(keyword) (there is no difference between them apart from ${} can be difficult to escape on linux).
Default Value: ${status}: ${list}
ok-syntax:
ok syntax. DEPRECATED! This is the syntax for when an ok result is returned. This value will not be used if your syntax contains %(list) or %(count).
empty-syntax:
Empty syntax. DEPRECATED! This is the syntax for when nothing matches the filter.
detail-syntax:
Detail level syntax. Used to format each resulting item in the message. %(list) will be replaced with all the items formated by this syntax string in the top-syntax. To add a keyword to the message you can use two syntaxes either ${keyword} or %(keyword) (there is no difference between them apart from ${} can be difficult to escape on linux).
Default Value: ${name} ${used} (${size})
perf-syntax:
Performance alias syntax. This is the syntax for the base names of the performance data.
Default Value: ${name}
Filter keywords
| Option | Description |
|---|---|
| free | Free memory in bytes (g,m,k,b) or percentages % |
| name | The name of the page file (swap) |
| size | Total size of pagefile/swap |
| used | Used memory in bytes (g,m,k,b) or percentages % |
Common options for all checks:
| Option | Description |
|---|---|
| count | Number of items matching the filter. |
| crit_count | Number of items matched the critical criteria. |
| crit_list | A list of all items which matched the critical criteria. |
| detail_list | A special list with critical, then warning and finally ok. |
| list | A list of all items which matched the filter. |
| ok_count | Number of items matched the ok criteria. |
| ok_list | A list of all items which matched the ok criteria. |
| problem_count | Number of items matched either warning or critical criteria. |
| problem_list | A list of all items which matched either the critical or the warning criteria. |
| status | The returned status (OK/WARN/CRIT/UNKNOWN). |
| total | Total number of items. |
| warn_count | Number of items matched the warning criteria. |
| warn_list | A list of all items which matched the warning criteria. |
check_process
Check state/metrics of one or more of the processes running on the computer.
Jump to section:
Sample Commands
To edit these sample please edit this page
Default check:
check_process
SetPoint.exe=hung
Performance data: 'taskhost.exe'=1;1;0 'dwm.exe'=1;1;0 'explorer.exe'=1;1;0 ... 'chrome.exe'=1;1;0 'vcpkgsrv.exe'=1;1;0 'vcpkgsrv.exe'=1;1;0
Default check via NRPE::
check_nrpe --host 192.168.56.103 --command check_process
SetPoint.exe=hung|'smss.exe state'=1;0;0 'csrss.exe state'=1;0;0...
Check that specific process are running::
check_process process=explorer.exe process=foo.exe
foo.exe=stopped
Performance data: 'explorer.exe'=1;1;0 'foo.exe'=0;1;0
Check memory footprint from specific processes::
check_process process=explorer.exe "warn=working_set > 70m"
explorer.exe=started
Performance data: 'explorer.exe ws_size'=73M;70;0
Extend the syntax to display the attributes we are interested in::
check_process process=explorer.exe "warn=working_set > 70m" "detail-syntax=${exe} ws:${working_set}, handles: ${handles}, user time:${user}s"
WARNING: Explorer.EXE ws:431.812MB, handles: 5639, user time:2535s
Performance data: 'explorer.exe ws_size'=73M;70;0
List all processes which use more then 200m virtual memory Default check via NRPE::
check_nrpe --host 192.168.56.103 --command check_process --arguments "filter=virtual > 200m"
OK all processes are ok.|'csrss.exe state'=1;0;0 'svchost.exe state'=1;0;0 'AvastSvc.exe state'=1;0;0 ...
Command-line Arguments
| Option | Default Value | Description |
|---|---|---|
| filter | state != 'unreadable' | Filter which marks interesting items. |
| warning | state not in ('started') | Filter which marks items which generates a warning state. |
| warn | Short alias for warning | |
| critical | state = 'stopped', count = 0 | Filter which marks items which generates a critical state. |
| crit | Short alias for critical. | |
| ok | Filter which marks items which generates an ok state. | |
| debug | N/A | Show debugging information in the log |
| show-all | N/A | Show details for all matches regardless of status (normally details are only showed for warnings and criticals). |
| empty-state | unknown | Return status to use when nothing matched filter. |
| perf-config | Performance data generation configuration | |
| escape-html | N/A | Escape any < and > characters to prevent HTML encoding |
| help | N/A | Show help screen (this screen) |
| help-pb | N/A | Show help screen as a protocol buffer payload |
| show-default | N/A | Show default values for a given command |
| help-short | N/A | Show help screen (short format). |
| top-syntax | ${status}: ${problem_list} | Top level syntax. |
| ok-syntax | %(status): all processes are ok. | ok syntax. |
| empty-syntax | UNKNOWN: No processes found | Empty syntax. |
| detail-syntax | ${exe}=${state} | Detail level syntax. |
| perf-syntax | ${exe} | Performance alias syntax. |
| process | The process to check, set this to * to check all processes | |
| total | N/A | Include the total of all matching processes |
filter:
Filter which marks interesting items. Interesting items are items which will be included in the check. They do not denote warning or critical state instead it defines which items are relevant and you can remove unwanted items.
Default Value: state != 'unreadable'
warning:
Filter which marks items which generates a warning state. If anything matches this filter the return status will be escalated to warning.
Default Value: state not in ('started')
critical:
Filter which marks items which generates a critical state. If anything matches this filter the return status will be escalated to critical.
Default Value: state = 'stopped', count = 0
ok:
Filter which marks items which generates an ok state. If anything matches this any previous state for this item will be reset to ok.
empty-state:
Return status to use when nothing matched filter. If no filter is specified this will never happen unless the file is empty.
Default Value: unknown
perf-config:
Performance data generation configuration TODO: obj ( key: value; key: value) obj (key:valuer;key:value)
top-syntax:
Top level syntax. Used to format the message to return can include text as well as special keywords which will include information from the checks. To add a keyword to the message you can use two syntaxes either ${keyword} or %(keyword) (there is no difference between them apart from ${} can be difficult to escape on linux).
Default Value: ${status}: ${problem_list}
ok-syntax:
ok syntax. DEPRECATED! This is the syntax for when an ok result is returned. This value will not be used if your syntax contains %(list) or %(count).
Default Value: %(status): all processes are ok.
empty-syntax:
Empty syntax. DEPRECATED! This is the syntax for when nothing matches the filter.
Default Value: UNKNOWN: No processes found
detail-syntax:
Detail level syntax. Used to format each resulting item in the message. %(list) will be replaced with all the items formated by this syntax string in the top-syntax. To add a keyword to the message you can use two syntaxes either ${keyword} or %(keyword) (there is no difference between them apart from ${} can be difficult to escape on linux).
Default Value: ${exe}=${state}
perf-syntax:
Performance alias syntax. This is the syntax for the base names of the performance data.
Default Value: ${exe}
Filter keywords
| Option | Description |
|---|---|
| command_line | Command line of process |
| error | Any error messages associated with fetching info |
| exe | The name of the executable |
| filename | Name of process (with path) |
| kernel | Kernel time in seconds |
| page_faults | Page fault count |
| pid | Process id |
| started | Process is started |
| state | The current state (started, stopped, hung) |
| stopped | Process is stopped |
| time | User + kernel time in seconds |
| user | User time in seconds |
| virtual | Virtual size in bytes |
| working_set | Working set (RSS) in bytes |
Common options for all checks:
| Option | Description |
|---|---|
| count | Number of items matching the filter. |
| crit_count | Number of items matched the critical criteria. |
| crit_list | A list of all items which matched the critical criteria. |
| detail_list | A special list with critical, then warning and finally ok. |
| list | A list of all items which matched the filter. |
| ok_count | Number of items matched the ok criteria. |
| ok_list | A list of all items which matched the ok criteria. |
| problem_count | Number of items matched either warning or critical criteria. |
| problem_list | A list of all items which matched either the critical or the warning criteria. |
| status | The returned status (OK/WARN/CRIT/UNKNOWN). |
| total | Total number of items. |
| warn_count | Number of items matched the warning criteria. |
| warn_list | A list of all items which matched the warning criteria. |
check_service
Check the state of one or more of the computer services.
state_is_ok
Helper function that checks if the state of a service is "OK". It returns True if the state is "OK" and False otherwise.
This can be used in filter expressions to warn about services that are not running properly.
| Configured | State | exit_code | Result of state_is_ok |
|---|---|---|---|
| auto-start | running | any | ✅ ok |
| delayed auto-start | stopped | any | ✅ ok |
| auto-start + triggers | stopped | any | ✅ ok |
| auto-start | stopped | 0 | ✅ ok |
| auto-start | stopped | non zero | ❌ not ok |
| demand-start | any state | any | ✅ ok |
state_is_perfect
Helper function that checks if the state of a service is "perfect". It returns True if the state is "perfect" and False otherwise.
This can be used in filter expressions to warn about services that are not running perfectly.
| Configured | State | Result of state_is_perfect |
|---|---|---|
| auto-start | running | ✅ perfect |
| auto-start | stopped | ❌ not perfect |
| auto-start + triggers | stopped | ✅ perfect |
| demand-start | any state | ✅ perfect |
| disabled | stopped | ✅ perfect |
Jump to section:
Sample Commands
To edit these sample please edit this page
Default check:
check_service
OK all services are ok.
Excluding services using exclude::
check_service "exclude=clr_optimization_v4.0.30319_32" "exclude=clr_optimization_v4.0.30319_64"
WARNING: gupdate=stopped (auto), Net Driver HPZ12=stopped (auto), NSClientpp=stopped (auto), nscp=stopped (auto), Pml Driver HPZ12=stopped (auto), SkypeUpdate=stopped (auto), sppsvc=stopped (auto)
Show all service by changing the syntax::
check_service "top-syntax=${list}" "detail-syntax=${name}:${state}"
AdobeActiveFileMonitor10.0:running, AdobeARMservice:running, AdobeFlashPlayerUpdateSvc:stopped, ..., WwanSvc:stopped
Excluding services using the filter::
check_service "filter=start_type = 'auto' and name not in ('Bonjour Service', 'Net Driver HPZ12')"
AdobeActiveFileMonitor10.0: running, AdobeARMservice: running, AMD External Events Utility: running, ... wuauserv: running
Exclude versus filter::
You can use both exclude and filter to exclude services the befnefit of exclude is that it is faster with the obvious drawback that it only works on the service name. The upside to filters are that they are richer in terms of functionality i.e. substring matching (as below).
Regular check
check_service
L cli CRITICAL: CRITICAL: nfoo=stopped (auto), nscp=stopped (auto), nscp2=stopped (auto), ...
Excluding nfoo service with exclude:
check_service exclude=nfoo
L cli CRITICAL: CRITICAL: nscp=stopped (auto), nscp2=stopped (auto), ...
Excluding nscp2 with substring like matching filter:
check_service exclude=nfoo "filter=name not like 'nscp'"
L cli CRITICAL: CRITICAL: ...
Default check via NRPE::
check_nrpe --host 192.168.56.103 --command check_service
WARNING: DPS=stopped (auto), MSDTC=stopped (auto), sppsvc=stopped (auto), UALSVC=stopped (auto)
Check that a service is not started::
check_service service=nscp "crit=state = 'started'" warn=none
Command-line Arguments
| Option | Default Value | Description |
|---|---|---|
| filter | Filter which marks interesting items. | |
| warning | not state_is_perfect() | Filter which marks items which generates a warning state. |
| warn | Short alias for warning | |
| critical | not state_is_ok() | Filter which marks items which generates a critical state. |
| crit | Short alias for critical. | |
| ok | Filter which marks items which generates an ok state. | |
| debug | N/A | Show debugging information in the log |
| show-all | N/A | Show details for all matches regardless of status (normally details are only showed for warnings and criticals). |
| empty-state | unknown | Return status to use when nothing matched filter. |
| perf-config | Performance data generation configuration | |
| escape-html | N/A | Escape any < and > characters to prevent HTML encoding |
| help | N/A | Show help screen (this screen) |
| help-pb | N/A | Show help screen as a protocol buffer payload |
| show-default | N/A | Show default values for a given command |
| help-short | N/A | Show help screen (short format). |
| top-syntax | ${status}: ${crit_list}, delayed (${warn_list}) | Top level syntax. |
| ok-syntax | %(status): All %(count) service(s) are ok. | ok syntax. |
| empty-syntax | %(status): No services found | Empty syntax. |
| detail-syntax | ${name}=${state} (${start_type}) | Detail level syntax. |
| perf-syntax | ${name} | Performance alias syntax. |
| service | The service to check, set this to * to check all services | |
| exclude | A list of services to ignore (mainly useful in combination with service=*) | |
| state | all | The state of services to enumerate: active, inactive, failed, or all |
filter:
Filter which marks interesting items. Interesting items are items which will be included in the check. They do not denote warning or critical state instead it defines which items are relevant and you can remove unwanted items.
warning:
Filter which marks items which generates a warning state. If anything matches this filter the return status will be escalated to warning.
Default Value: not state_is_perfect()
critical:
Filter which marks items which generates a critical state. If anything matches this filter the return status will be escalated to critical.
Default Value: not state_is_ok()
ok:
Filter which marks items which generates an ok state. If anything matches this any previous state for this item will be reset to ok.
empty-state:
Return status to use when nothing matched filter. If no filter is specified this will never happen unless the file is empty.
Default Value: unknown
perf-config:
Performance data generation configuration TODO: obj ( key: value; key: value) obj (key:valuer;key:value)
top-syntax:
Top level syntax. Used to format the message to return can include text as well as special keywords which will include information from the checks. To add a keyword to the message you can use two syntaxes either ${keyword} or %(keyword) (there is no difference between them apart from ${} can be difficult to escape on linux).
Default Value: ${status}: ${crit_list}, delayed (${warn_list})
ok-syntax:
ok syntax. DEPRECATED! This is the syntax for when an ok result is returned. This value will not be used if your syntax contains %(list) or %(count).
Default Value: %(status): All %(count) service(s) are ok.
empty-syntax:
Empty syntax. DEPRECATED! This is the syntax for when nothing matches the filter.
Default Value: %(status): No services found
detail-syntax:
Detail level syntax. Used to format each resulting item in the message. %(list) will be replaced with all the items formated by this syntax string in the top-syntax. To add a keyword to the message you can use two syntaxes either ${keyword} or %(keyword) (there is no difference between them apart from ${} can be difficult to escape on linux).
Default Value: ${name}=${state} (${start_type})
perf-syntax:
Performance alias syntax. This is the syntax for the base names of the performance data.
Default Value: ${name}
state:
The state of services to enumerate: active, inactive, failed, or all
Default Value: all
Filter keywords
| Option | Description |
|---|---|
| desc | Service description |
| name | Service name |
| pid | Process id |
| start_type | The configured start type (enabled, disabled, static, masked) |
| started | Service is started/active |
| state | The current state (active, inactive, failed) |
| state_is_ok() | Check if the state is ok (enabled services running or starting, disabled services can be any state) |
| state_is_perfect() | Check if the state is perfect (enabled services running, disabled services stopped) |
| stopped | Service is stopped/inactive |
| sub_state | Service sub-state (running, dead, exited, etc.) |
Common options for all checks:
| Option | Description |
|---|---|
| count | Number of items matching the filter. |
| crit_count | Number of items matched the critical criteria. |
| crit_list | A list of all items which matched the critical criteria. |
| detail_list | A special list with critical, then warning and finally ok. |
| list | A list of all items which matched the filter. |
| ok_count | Number of items matched the ok criteria. |
| ok_list | A list of all items which matched the ok criteria. |
| problem_count | Number of items matched either warning or critical criteria. |
| problem_list | A list of all items which matched either the critical or the warning criteria. |
| status | The returned status (OK/WARN/CRIT/UNKNOWN). |
| total | Total number of items. |
| warn_count | Number of items matched the warning criteria. |
| warn_list | A list of all items which matched the warning criteria. |
check_uptime
Check time since last server re-boot.
Jump to section:
Sample Commands
To edit these sample please edit this page
Default check:
check_uptime
uptime: -9:02, boot: 2013-aug-18 08:29:13
'uptime uptime'=1376814553s;1376760683;1376803883
Adding warning and critical thresholds::
check_uptime "warn=uptime < -2d" "crit=uptime < -1d"
...
Default check via NRPE::
check_nrpe --host 192.168.56.103 --command check_uptime
uptime: -0:3, boot: 2013-sep-08 18:41:06 (UCT)|'uptime'=1378665666;1378579481;1378622681
Command-line Arguments
| Option | Default Value | Description |
|---|---|---|
| filter | Filter which marks interesting items. | |
| warning | uptime < 2d | Filter which marks items which generates a warning state. |
| warn | Short alias for warning | |
| critical | uptime < 1d | Filter which marks items which generates a critical state. |
| crit | Short alias for critical. | |
| ok | Filter which marks items which generates an ok state. | |
| debug | N/A | Show debugging information in the log |
| show-all | N/A | Show details for all matches regardless of status (normally details are only showed for warnings and criticals). |
| empty-state | ignored | Return status to use when nothing matched filter. |
| perf-config | Performance data generation configuration | |
| escape-html | N/A | Escape any < and > characters to prevent HTML encoding |
| help | N/A | Show help screen (this screen) |
| help-pb | N/A | Show help screen as a protocol buffer payload |
| show-default | N/A | Show default values for a given command |
| help-short | N/A | Show help screen (short format). |
| top-syntax | ${status}: ${list} | Top level syntax. |
| ok-syntax | ok syntax. | |
| empty-syntax | Empty syntax. | |
| detail-syntax | uptime: ${uptime}h, boot: ${boot} (UTC) | Detail level syntax. |
| perf-syntax | uptime | Performance alias syntax. |
filter:
Filter which marks interesting items. Interesting items are items which will be included in the check. They do not denote warning or critical state instead it defines which items are relevant and you can remove unwanted items.
warning:
Filter which marks items which generates a warning state. If anything matches this filter the return status will be escalated to warning.
Default Value: uptime < 2d
critical:
Filter which marks items which generates a critical state. If anything matches this filter the return status will be escalated to critical.
Default Value: uptime < 1d
ok:
Filter which marks items which generates an ok state. If anything matches this any previous state for this item will be reset to ok.
empty-state:
Return status to use when nothing matched filter. If no filter is specified this will never happen unless the file is empty.
Default Value: ignored
perf-config:
Performance data generation configuration TODO: obj ( key: value; key: value) obj (key:valuer;key:value)
top-syntax:
Top level syntax. Used to format the message to return can include text as well as special keywords which will include information from the checks. To add a keyword to the message you can use two syntaxes either ${keyword} or %(keyword) (there is no difference between them apart from ${} can be difficult to escape on linux).
Default Value: ${status}: ${list}
ok-syntax:
ok syntax. DEPRECATED! This is the syntax for when an ok result is returned. This value will not be used if your syntax contains %(list) or %(count).
empty-syntax:
Empty syntax. DEPRECATED! This is the syntax for when nothing matches the filter.
detail-syntax:
Detail level syntax. Used to format each resulting item in the message. %(list) will be replaced with all the items formated by this syntax string in the top-syntax. To add a keyword to the message you can use two syntaxes either ${keyword} or %(keyword) (there is no difference between them apart from ${} can be difficult to escape on linux).
Default Value: uptime: ${uptime}h, boot: ${boot} (UTC)
perf-syntax:
Performance alias syntax. This is the syntax for the base names of the performance data.
Default Value: uptime
Filter keywords
| Option | Description |
|---|---|
| boot | System boot time |
| uptime | Time since last boot |
Common options for all checks:
| Option | Description |
|---|---|
| count | Number of items matching the filter. |
| crit_count | Number of items matched the critical criteria. |
| crit_list | A list of all items which matched the critical criteria. |
| detail_list | A special list with critical, then warning and finally ok. |
| list | A list of all items which matched the filter. |
| ok_count | Number of items matched the ok criteria. |
| ok_list | A list of all items which matched the ok criteria. |
| problem_count | Number of items matched either warning or critical criteria. |
| problem_list | A list of all items which matched either the critical or the warning criteria. |
| status | The returned status (OK/WARN/CRIT/UNKNOWN). |
| total | Total number of items. |
| warn_count | Number of items matched the warning criteria. |
| warn_list | A list of all items which matched the warning criteria. |