NSClient stop accepting requests.

I'm trying the newest nightly build of NSClient++ (0,4,1,5 2012-07-12). I have a few problems, but the biggest is that NSClient++ randomly stops responding or returns 0 bytes when I run check_nrpe from the Nagios server. It returns:

CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.

This is now happening maybe a few hours after the service is started.

nsclient.ini config

[/settings/default]
cache allowed hosts=1
allowed hosts=nagioshost1,nagioshost2
use ssl=0
timeout=300
[/modules]
CheckDisk = 1
CheckEventLog = 0
CheckExternalScripts = 1
CheckHelpers = 1
CheckSystem = 1
CheckWMI = 1
NRPEServer = 1
CauseCrashes = 1
[/settings/log]
file name = nsclient.log
level=trace
[/settings/log/file]
max size = 204800000
[/settings/NRPE/server]
port=5666
timeout=300
allow arguments=1
allow nasty characters=1
performance data=1
[/settings/external scripts]
timeout=300
allow arguments=1
allow nasty characters = 1
[/settings/external scripts/scripts]
check_es_ok="scripts\\check_ok.bat"
[/settings/external scripts/alias]
check_ok=CheckOK "EVERYTHING IS OK"
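
A quick way to double-check which modules a config like this actually loads is to read the [/modules] section programmatically. The following is only a sketch using Python's standard configparser, not anything NSClient++ ships; the SAMPLE_INI string is a shortened copy of the config above, and the helper name enabled_modules is made up for illustration.

```python
import configparser

# Shortened copy of the nsclient.ini posted above (two sections only).
SAMPLE_INI = """\
[/settings/default]
allowed hosts=nagioshost1,nagioshost2
use ssl=0
timeout=300
[/modules]
CheckDisk = 1
CheckEventLog = 0
CheckExternalScripts = 1
CheckHelpers = 1
CheckSystem = 1
CheckWMI = 1
NRPEServer = 1
CauseCrashes = 1
"""


def enabled_modules(ini_text):
    """Return the names under [/modules] whose value is 1 (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original key case instead of lowercasing
    parser.read_string(ini_text)
    return [name for name, value in parser["/modules"].items()
            if value.strip() == "1"]


if __name__ == "__main__":
    print(enabled_modules(SAMPLE_INI))
```

With this config the list makes it easy to see at a glance that CheckEventLog is off and CauseCrashes is on.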

Here is some output from the logs

2012-07-12 18:37:38: d:..\..\..\..\trunk\modules\CheckSystem\CheckSystem.cpp:937: PROC>>> find_crashed_pids
2012-07-12 18:37:38: d:..\..\..\..\trunk\modules\CheckSystem\CheckSystem.cpp:934: PROC::: pid: 1712 was hung
2012-07-12 18:37:38: d:..\..\..\..\trunk\modules\CheckSystem\CheckSystem.cpp:940: PROC<<<find_crashed_pids
2012-07-12 18:37:38: d:..\..\..\..\trunk\modules\CheckSystem\CheckSystem.cpp:940: PROC<<<enumerate_processes
2012-07-12 18:37:38: d:..\..\..\trunk\service\NSClient++.cpp:947: Result checkprocstate: OK
2012-07-12 18:37:38: d:..\..\..\..\trunk\modules\NRPEServer\handler_impl.cpp:36: Running command: CheckProcState = OK: sqlagent.exe: running
2012-07-12 18:37:38: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: start_write_request(1036)
2012-07-12 18:37:38: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: handle_write_response(1036)
2012-07-12 18:37:38: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: stop()
2012-07-12 18:37:38: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: start_write_request(1036)
2012-07-12 18:37:38: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: start_write_request(1036)
2012-07-12 18:37:38: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: handle_write_response(1036)
2012-07-12 18:37:38: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: handle_write_response(1036)
2012-07-12 18:37:38: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: stop()
2012-07-12 18:37:38: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: stop()
Constant messages like these:
2012-07-12 17:21:23: e:..\..\..\..\trunk\modules\CheckSystem\PDHCollector.cpp:148: Failed to query performance counters: PdhCollectQueryData failed: : -2147481643: No data to return.
2012-07-12 18:37:41: d:D:\source\nscp\trunk\include\nrpe/server/protocol.hpp:61: Accepting connection from: ::ffff:10.101.51.108
2012-07-12 18:37:41: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: start()
2012-07-12 18:37:41: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: tcp::start_read_request()
2012-07-12 18:37:41: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: handle_read_request(1036)
2012-07-12 18:37:41: e:D:\source\nscp\trunk\include\nrpe/server/protocol.hpp:91: Digester failed to parse chunk, giving up.
2012-07-12 18:37:41: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: stop()

I believe this code block has the last messages from when I was actually getting valid responses from the daemon. After that I was getting the constant messages about the digester failing to parse the chunk.
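
To see how often the errors occur relative to the normal traffic, a trace log like this can be summarized with a small script. This is a rough sketch that assumes the line layout seen in the excerpt (timestamp, one-letter level, source file, line number, message) and assumes that "e" means error and "d" means debug; the names LINE_RE, parse, error_summary and as_hresult are made up for illustration. The last helper just shows the signed PDH status code as unsigned hex, which makes it easier to look up.

```python
import re
from collections import Counter

# Layout assumed from the excerpt:
#   "<timestamp>: <level>:<source file>:<line>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"(?P<level>[a-z]):(?P<source>.*?):(?P<line>\d+): (?P<msg>.*)$"
)

# A few lines copied from the log excerpt above.
SAMPLE_LOG = r"""
2012-07-12 18:37:38: d:..\..\..\trunk\service\NSClient++.cpp:947: Result checkprocstate: OK
2012-07-12 17:21:23: e:..\..\..\..\trunk\modules\CheckSystem\PDHCollector.cpp:148: Failed to query performance counters: PdhCollectQueryData failed: : -2147481643: No data to return.
2012-07-12 18:37:41: d:D:\source\nscp\trunk\include\nrpe/server/protocol.hpp:61: Accepting connection from: ::ffff:10.101.51.108
2012-07-12 18:37:41: e:D:\source\nscp\trunk\include\nrpe/server/protocol.hpp:91: Digester failed to parse chunk, giving up.
"""


def parse(text):
    """Yield one dict per log line that matches the assumed layout."""
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            yield match.groupdict()


def error_summary(text):
    """Count error-level ('e') entries, keyed by message text."""
    return Counter(e["msg"] for e in parse(text) if e["level"] == "e")


def as_hresult(code):
    """Show a signed 32-bit status code as unsigned hex."""
    return f"0x{code & 0xFFFFFFFF:08X}"


if __name__ == "__main__":
    for msg, count in error_summary(SAMPLE_LOG).items():
        print(count, msg)
    print(as_hresult(-2147481643))
```

The code -2147481643 comes out as 0x800007D5, which, if I read the PDH error list correctly, is PDH_NO_DATA and matches the "No data to return" text in the log.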

Lately the server isn't crashing, but whenever I run the check_nrpe command from my Nagios instance all I get is:

CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.

Win2k8 SP2 64 bit

Any help would be appreciated. Let me know if you need more info or more logs.

  • Message #2700

    Never mind, I went to the latest stable version. I also disabled the CheckEvent viewer module. This seems to have stabilized everything.

    • Message #2701

      I've seen problems with the check_event viewer before... on x64 systems it would randomly bring the CPU up to 100% usage...

      I never got around to really testing what's going on. The x86 version doesn't have this problem, so I went with that one instead...

      • Message #2704

        By check_event viewer I guess you mean check_eventlog?

        If so, it would be interesting to understand why it eats CPU.

        I have always assumed this was due to people having large logs, which take a while to process (and using active checks). If that is the case, you can use the active mechanism in conjunction with the cache to achieve the same result (ish) without the need to scan the log each time.

        But if installing the 32-bit version resolves the issue, it seems something else is broken.

        Could someone provide details on which checks cause eventlog to go berserk on x64 but not w32?

        Michael Medin

  • Message #2703

    Sounds a bit odd... In theory this should only happen if it gets more data than it expects (one option could be SSL versus no SSL, but 1036 sounds right, so I'm not sure what is amiss)...

    I shall look into this a bit...
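
For anyone wondering where 1036 comes from: as far as I know the NRPE version 2 packet is a fixed-size C struct (version, type, CRC32, result code, 1024-byte buffer) that alignment pads out to 1036 bytes. The sketch below rebuilds that layout with Python's struct module and adds a rough check of what a received chunk looks like. The layout, the use of a standard CRC-32 and the TLS test on the first two bytes are assumptions on my part, and build_query and classify are made-up names, not part of NSClient++ or check_nrpe.

```python
import struct
import zlib

# Assumed NRPE v2 layout (network byte order): int16 version, int16 type,
# uint32 crc32, int16 result code, 1024-byte buffer, 2 bytes of padding.
PACKET = struct.Struct("!hhIh1024s2x")

QUERY = 1  # assumed packet type for a query


def build_query(command):
    """Build a version-2 query packet with its CRC32 filled in (sketch)."""
    body = command.encode("ascii")
    # The CRC is computed over the whole packet with the CRC field set to zero.
    # The result code is arbitrary for a query; 0 is used here.
    unsigned = PACKET.pack(2, QUERY, 0, 0, body)
    crc = zlib.crc32(unsigned) & 0xFFFFFFFF
    return PACKET.pack(2, QUERY, crc, 0, body)


def classify(chunk):
    """Rough guess at what a received chunk is (heuristic only)."""
    # A TLS handshake record starts with content type 0x16 and major version 0x03.
    if len(chunk) >= 2 and chunk[0] == 0x16 and chunk[1] == 0x03:
        return "looks like a TLS handshake record"
    if len(chunk) != PACKET.size:
        return f"unexpected size {len(chunk)}"
    version, ptype, crc, _result, _buf = PACKET.unpack(chunk)
    zeroed = chunk[:4] + b"\x00\x00\x00\x00" + chunk[8:]
    if zlib.crc32(zeroed) & 0xFFFFFFFF != crc:
        return "right size, bad CRC32"
    return f"plain NRPE packet, version {version}, type {ptype}"


if __name__ == "__main__":
    print(PACKET.size)
    print(classify(build_query("check_ok")))
```

If the server is set to use ssl=0 and the client sends an SSL handshake (or the other way round), the first chunk would fail a check like this even though nothing is wrong with the packet size itself.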
