Version 8 (modified by mickem, 4 years ago) (diff)

--

Using NSClient++ from nagios with check_nrpe

NRPE is the preferred way over NSClient (check_nt) and you get the most out of NSClient++ choosing this mode (NSCA and what not will support the same commands but are more complex to setup). NRPE works much like NRPE for unix (if you are familiar with it) and in short you can say it relays a plugin request to a remote server. NRPE acts like a simple transport layer allowing remote execution. The difference between regular NRPE and NSClient++ is that NSClient++ has built-in checks. So with NSClient++ you get a lot of ready-to-use checks that wont require you to have scripts. But if you choose you can disable all "modules" and stick with a pure NRPE installation and only external scripts.

1. Overview of NRPE

For those not familiar with NRPE (Nagios Remote Plugin Execution) here is a quick introduction.

NRPE works much like SSH or telnet etc. It relays a command and awaits the result. In the above diagram what happens is:

  1. Nagios executes check_nrpe with the proper arguments.
  2. NSClient++ receives the command to be executed
  3. NSClient++ will execute the command and get a result on the form of <status>, <message> and optionally <performance data>
  4. NSClient++ sends the result back to Nagios
  5. Nagios gets the result from check_nrpe (and uses it much like any other plugin)

So in essence NRPE is merely a transport mechanism to send the result of a check command over the network.

2. Nagios command line

NRPE require you to install a special plug-in on your nagios server called NRPE. The unix-side of NRPE consists of a server and a client on nagios you only need the client so you can skip any "servers" or what not that it want to start when you install it.

The client is (generally) called check_nrpe and works like so:

./check_nrpe -H <nsclient++ server ip> -c <command> [-a <a> <list> <of> <arguments>]
  • <command> = The command (script) you want to run (often this is a pre-built command from within NSClient++)
  • <a> <list> <of> <arguments> = a list of arguments for the command.

So the simplest way to see if things are a-working just run it without a command and you should get a response specifying the version of "NRPE" (in this case NSClient++) like so:

./check_nrpe -H <nsclient++ server ip>
I (0.3.3.19 2008-07-02) seem to be doing fine...

And again like in the NSClient example don't worry if you get a timeout here since we have to configure NSClient++ before it actually works so this is expected.

3. NSClient++ configuration

Configuring NRPE is a bit more involved but not overly so. The first thing you need to do to get things working is add the NRPEListener module.

[modules]
...
NRPEListener.dll
...

If you have not already done so (above) you also need to set which computers are allowed to query the agent. This is set either under the [Settings] section (globally) or under the [NRPE] section (locally). If you when you configured NSClient above set this globally you are already set to go. If not the key you need to change is the allowed_hosts. There is no password for NRPE.

  • allowed_hosts = A list of addresses that is allowed to ask questions (i.e. your nagios ip).

The result should look like this (assuming your nagios server ip address is 10.0.0.2):

[Settings]
allowed_hosts=10.0.0.2

After this restart the service.

nsclient++ /stop
nsclient++ /start
... or ...
net stop nsclientpp
net start nsclientpp

Now feel free to try the command line agent again and hopefully things should work out perfectly. Run the following command from your nagios server.

./check_nrpe -H 10.0.0.1
I (0.3.3.19 2008-07-02) seem to be doing fine...

4. Finding and solving problems

A good way to find and solve problems is to run nsclient++ in "test" mode this is done by stopping the service and starting it in "test" mode.

nsclient++ /stop
nsclient++ /test
... test mode ... (quit with: exit)
nsclient++ /start

When in test mode you will get a lot of interesting log messages when things are happening so it is fairly simple to figure out what is wrong. So lets try this now: Start NSClient++ in test mode like so:

nsclient++ /stop
nsclient++ /test

And you should see something along the following lines (it will look different depending on your setup):

Launching test mode - client mode
d NSClient++.cpp(1106) Enabling debug mode...
d NSClient++.cpp(494) Attempting to start NSCLient++ - 0.3.7.7 2009-07-05
d NSClient++.cpp(897) Loading plugin: NRPE server (w/ SSL)...
d \NRPEListener.cpp(91) Loading all commands (from NRPE)
d \NRPEListener.cpp(121) Starting NRPE socket...
l NSClient++.cpp(600) NSCLient++ - 0.3.7.7 2009-07-05 Started!
d \Socket.h(675) Bound to: 0.0.0.0:5666
l NSClient++.cpp(402) Using settings from: INI-file
l NSClient++.cpp(403) Enter command to inject or exit to terminate...

Now you can run the the command again from Nagios like so:

./check_nrpe -H 10.0.0.1
I (0.3.7.7 2009-07-05) seem to be doing fine...

And if you check the log of NSClient++ /test you will this time not see anything and this is because the "check version" is an internal command so lets try with something slightly more interesting:

./check_nrpe -H 10.0.0.1 -c foobar
UNKNOWN: No handler for that command

And don't worry there is no foobar command but we will see how this looks in NSClient++

d NSClient++.cpp(1034) Injecting: foobar:
l NSClient++.cpp(1085) No handler for command: 'foobar'
l \NSCHelper.cpp(238) No handler for command 'foobar'.

We shall get back a bit to this later on when we have configure NSClient++ more so lets leave this for now.

5. NSClient++ configuration (revisited)

As we said before it is a bit more involved to configure NRPE and yet thus far it has actually been simpler? This is because we have not configured anything yet all we can do now is talk to NSClient++ but not actually use it. So in this section we shall cover the basics and first off are some of the configuration options available for NRPE

5.1 NRPE specific setting in NSClient++

  • use_ssl

If this is 1 (true) we will use SSL encryption on the transport. Notice this flag has to be the same on both ends or you will end up with strange errors. The flag is set on check_nrpe with the -n option (if you use -n no SSL will be used).

  • allow_arguments

Since arguments can be potentially dangerous (it allows your users to control the execution) there is a flag (which defaults to off) to enable arguments. So if you plan on configure NSClient++ from the Nagios end you need to enable this. But be warned this is a security issue you need to think about. If you do not want to allow arguments you can instead configure all checks in the NSC.ini file and just execute the aliases from nagios.

One important issue with the allow_arguments is that there are more then one! Yes, more then one! The reason for this is that you can allow arguments from NRPE and you can allow arguments for external scripts (it is not the same option) which might seem a bit confusing at first.

  • allow_nasty_meta_chars

This flag allows arguments to contain "dangerous" characters such as redirection and pipe (<>|) and makes things a tad more dangerous. But if you decide to use arguments you most likely want to use this flag as well. But again this is a security risk

So this if you enable this in the INI file you will end up with something like this (extract):

[NRPE]
;# COMMAND ARGUMENT PROCESSING
;  This option determines whether or not the NRPE daemon will allow clients to specify arguments to commands that are executed.
allow_arguments=1
;
;# COMMAND ALLOW NASTY META CHARS
;  This option determines whether or not the NRPE daemon will allow clients to specify nasty (as in |`&><'"\[]{}) characters in arguments.
allow_nasty_meta_chars=1
;
;# USE SSL SOCKET
;  This option controls if SSL should be used on the socket.
use_ssl=1

There are a lot of other options as well but these are the most used ones.

5.2 Modules

The other thing which you should configure is which modules to use. There is (at time of writing) 16 modules to choose from of which 9 will give you more "checks to run" so choosing which you need can be a bit of work. Here we shall start out with the basic ones and for details on the rest check out the Modules section in the wiki.

ModuleDescriptionCommands
CheckSystem.dllHandles many system checksCheckCPU, CheckMEM etc
CheckDisk.dllHandles Disk related checksCheckDisk
CheckExternalScripts.dllHandles aliases (which is what we will use) and external scripts.N/A
FileLogger.dllLogs errors to a file so you can see what is going onN/A
NRPEListener.dllListens and responds to incoming requests from Nagios via NRPEN/A

The finished modules section from the INI file will look like so:

[modules]
CheckSystem.dll
CheckDisk.dll
CheckExternalScripts.dll
FileLogger.dll
NRPEListener.dll

Now we have done some basic setup of NSClient++ and we can continue to try using it a bit more before we continue with configuring Nagios.

6. Nagios command line (revisited)

Now that we have the agent up and running (if not probably want to go back over the previous sections to get it up and running before reading on) what can we do with it?. From here on we will assume you have allow arguments and metchars enabled since it makes it simpler to try things out BEWARED that there are security implication to this so might wanna read up before rolling this configuration into production.

As we stated before check_nrpe is a lot more powerful then the legacy check_nt and there is a lot of built in commands as well as a lot of external ones you can use. The built in ones are listed below. ListTagged(check)? Lets start with a simple one CheckCPU and see how to use it.

If we check the docs for it it has an example like so:

checkCPU warn=80 crit=90 time=20m time=10s time=4
CPU Load ok.|'20m average'=11%;80;90; '10s average'=7%;80;90; '4 average'=10%;80;90;

Now this is a "NSCLient++ /test mode command" so it is not usable in it self for you instead you need to change it slightly. The first word is the command and the rest are arguments. check_nrpe has two options for settings commands (-c) and arguments (-a) and is used like so:

check_nrpe ... -c <command> [-a <argument> <argument> <argument>]

in this case (CheckCPU) this translates to:

check_nrpe ... -c CheckCPU -a warn=80 crit=90 time=20m time=10s time=4
CPU Load ok.|'20m average'=11%;80;90; '10s average'=7%;80;90; '4 average'=10%;80;90;

And that is as hard as it gets all you need to do is figure out which arguments you want to use for the command and stack them all in a long line.

7. Nagios configuration

7.1 Template

First, its best practice to create a new template for each different type of host you'll be monitoring. Let's create a new template for windows servers.

define host{
	name					tpl-windows-servers ; Name of this template
	use						generic-host ; Inherit default values
	check_period			24x7
	check_interval			5
	retry_interval			1
	max_check_attempts		10
	check_command			check-host-alive
	notification_period		24x7
	notification_interval	30
	notification_options	d,r
	contact_groups			admins
	register				0 ; DONT REGISTER THIS - ITS A TEMPLATE
}

Notice that the tpl-windows-servers template definition is inheriting default values from the generic-host template, which is defined in the sample localhost.cfg file that gets installed when you follow the Nagios quickstart installation guide.

7.2 Host definition

Next we need to define a new host for the remote windows server that references the newly created tpl-windows-servers host template.

define host{
	use			tpl-windows-servers ; Inherit default values from a template
	host_name	windowshost ; The name we're giving to this server
	alias		My First Windows Server ; A longer name for the server
	address		10.0.0.2 ; IP address of the server
}

Defining a service for monitoring the remote Windows server. These example service definitions will use the sample commands that are defined in the default NSC.ini file which ships with NSClient 0.3.7 or newer.

7.3 Service definitions

The following service will monitor the CPU load on the remote host. The "alias_cpu" argument which is passed to the check_nrpe command definition tells NSClient++ to run the "alias_cpu" command as defined in the alias section of the NSC.ini file.

define service{
	use					generic-service
	host_name			windowshost 
	service_description	CPU Load
	check_command		check_nrpe!alias_cpu
}

The following service will monitor the free drive space on /dev/hda1 on the remote host.

define service{
	use					generic-service
	host_name			windowshost 
	service_description	Free Space
	check_command		check_nrpe!alias_disk
}

Attachments (5)

Download all attachments as: .zip