NSClient++ Help (#1) - NSClient stop accepting requests. (#1020) - Message List
I'm trying the newest nightly build of NSClient++ (0,4,1,5 2012-07-12). I have a few problems, but the biggest is that NSClient++ randomly stops responding or returns 0 bytes when I run check_nrpe from the Nagios server. It returns:
CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
This is now happening maybe a few hours after the service is started.
nsclient.ini config
[/settings/default]
cache allowed hosts=1
allowed hosts=nagioshost1,nagioshost2
use ssl=0
timeout=300

[/modules]
CheckDisk = 1
CheckEventLog = 0
CheckExternalScripts = 1
CheckHelpers = 1
CheckSystem = 1
CheckWMI = 1
NRPEServer = 1
CauseCrashes = 1

[/settings/log]
file name = nsclient.log
level=trace

[/settings/log/file]
max size = 204800000

[/settings/NRPE/server]
port=5666
timeout=300
allow arguments=1
allow nasty characters=1
performance data=1

[/settings/external scripts]
timeout=300
allow arguments=1
allow nasty characters = 1

[/settings/external scripts/scripts]
check_es_ok="scripts\\check_ok.bat"

[/settings/external scripts/alias]
check_ok=CheckOK "EVERYTHING IS OK"
Here is some output from the logs
2012-07-12 18:37:38: d:..\..\..\..\trunk\modules\CheckSystem\CheckSystem.cpp:937: PROC>>> find_crashed_pids
2012-07-12 18:37:38: d:..\..\..\..\trunk\modules\CheckSystem\CheckSystem.cpp:934: PROC::: pid: 1712 was hung
2012-07-12 18:37:38: d:..\..\..\..\trunk\modules\CheckSystem\CheckSystem.cpp:940: PROC<<<find_crashed_pids
2012-07-12 18:37:38: d:..\..\..\..\trunk\modules\CheckSystem\CheckSystem.cpp:940: PROC<<<enumerate_processes
2012-07-12 18:37:38: d:..\..\..\trunk\service\NSClient++.cpp:947: Result checkprocstate: OK
2012-07-12 18:37:38: d:..\..\..\..\trunk\modules\NRPEServer\handler_impl.cpp:36: Running command: CheckProcState = OK: sqlagent.exe: running
2012-07-12 18:37:38: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: start_write_request(1036)
2012-07-12 18:37:38: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: handle_write_response(1036)
2012-07-12 18:37:38: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: stop()
2012-07-12 18:37:38: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: start_write_request(1036)
2012-07-12 18:37:38: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: start_write_request(1036)
2012-07-12 18:37:38: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: handle_write_response(1036)
2012-07-12 18:37:38: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: handle_write_response(1036)
2012-07-12 18:37:38: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: stop()
2012-07-12 18:37:38: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: stop()

Constant messages of:
2012-07-12 17:21:23: e:..\..\..\..\trunk\modules\CheckSystem\PDHCollector.cpp:148: Failed to query performance counters: PdhCollectQueryData failed: : -2147481643: No data to return.
2012-07-12 18:37:41: d:D:\source\nscp\trunk\include\nrpe/server/protocol.hpp:61: Accepting connection from: ::ffff:10.101.51.108
2012-07-12 18:37:41: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: start()
2012-07-12 18:37:41: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: tcp::start_read_request()
2012-07-12 18:37:41: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: handle_read_request(1036)
2012-07-12 18:37:41: e:D:\source\nscp\trunk\include\nrpe/server/protocol.hpp:91: Digester failed to parse chunk, giving up.
2012-07-12 18:37:41: d:D:\source\nscp\trunk\include\socket/connection.hpp:48: stop()
I believe this log block contains the last messages from when I was still getting valid responses from the daemon; after that I was getting constant messages about the digester failing to parse the chunk.
Lately the server isn't crashing, but whenever I run the check_nrpe command from my Nagios instance all I get is:
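To pin down when the valid responses stopped and the parse failures began, the trace log can be scanned for the first occurrence of each error message. The following is a minimal sketch, not part of NSClient++: it assumes each log line has the shape "timestamp: level:source:line: message" seen in the excerpts above, and that the single-letter level `e` marks an error and `d` a debug line (an inference from those excerpts). The function name `first_errors` is hypothetical.

```python
import re

# Assumed line shape, inferred from the excerpts in this thread:
#   2012-07-12 18:37:41: e:D:\...\protocol.hpp:91: Digester failed to parse chunk, giving up.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"(?P<level>[a-z]):(?P<src>.+?):(?P<line>\d+): (?P<msg>.*)$"
)


def first_errors(lines):
    """Map each distinct error message to the timestamp of its first occurrence.

    Only lines whose level letter is 'e' (assumed to mean "error") are kept.
    Lines that do not match the assumed shape are skipped.
    """
    seen = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == "e":
            seen.setdefault(match.group("msg"), match.group("ts"))
    return seen


if __name__ == "__main__":
    # Assumes the file name from the "[/settings/log]" section above.
    with open("nsclient.log", encoding="utf-8", errors="replace") as handle:
        for message, timestamp in first_errors(handle).items():
            print(timestamp, message)
```

Comparing the first "Digester failed to parse chunk" timestamp with the last successful "Running command" line narrows the window in which the daemon changed behaviour.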
CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.
Win2k8 SP2 64 bit
Any help would be appreciated. Let me know if you need more info or more logs
Message #2700
Never mind, I went to the latest stable version. I also disabled the CheckEvent viewer module. This seems to have stabilized everything.
swright@… 07/19/12 23:19:14 (11 months ago)
Message #2701
I've seen problems with the check_event viewer before... on x64 systems it would randomly push CPU usage up to 100%...
I never got around to really testing what's going on. The x86 version doesn't have this problem, so I went with that one instead...
mike2k 07/20/12 08:05:21 (11 months ago)
Message #2704
By check_event viewer I guess you mean check_eventlog?
If so, it would be interesting to understand why it eats CPU.
I have always assumed this was due to people having large logs, which take a while to process (and using active checks). If that is the case, you can use the active mechanism in conjunction with the cache to achieve the same result (ish) without the need to scan the log each time.
But if installing the 32-bit version resolves the issue, it seems something else is broken.
Could someone provide details on which checks cause eventlog to go berserk on x64 but not w32?
Michael Medin
mickem 07/23/12 09:19:14 (11 months ago)
Message #2703
Sounds a bit odd... In theory this should only happen if it gets more data than it expects (one option could be SSL versus no SSL), but 1036 sounds right, so I'm not sure what is amiss...
I shall look into this a bit...
mickem 07/23/12 09:16:07 (11 months ago)
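For context on why 1036 "sounds right": an NRPE version 2 packet is a fixed-size structure of two 16-bit fields (version, type), a 32-bit CRC, a 16-bit result code, a 1024-byte buffer and 2 bytes of padding, which adds up to 1036 bytes. The sketch below builds such a packet and classifies a received chunk. It is an illustration only, not NSClient++'s digester; the function names and the return labels are hypothetical, and the TLS check (first bytes 0x16 0x03) is a rough heuristic for spotting a client that speaks SSL to a non-SSL server.

```python
import struct
import zlib

# NRPE v2 layout in network byte order:
# version (int16), type (int16), crc32 (uint32), result code (int16),
# 1024-byte buffer, 2 bytes of padding.
NRPE_V2_FORMAT = "!hhIh1024s2s"
NRPE_V2_SIZE = struct.calcsize(NRPE_V2_FORMAT)  # 1036
QUERY_PACKET = 1


def build_query(command: str) -> bytes:
    """Build a v2 query packet; the CRC is computed with the CRC field zeroed."""
    payload = command.encode("ascii")
    raw = struct.pack(NRPE_V2_FORMAT, 2, QUERY_PACKET, 0, 0, payload, b"\0\0")
    crc = zlib.crc32(raw) & 0xFFFFFFFF
    return struct.pack(NRPE_V2_FORMAT, 2, QUERY_PACKET, crc, 0, payload, b"\0\0")


def classify_chunk(chunk: bytes) -> str:
    """Return 'tls', 'size', 'version', 'crc' or 'ok' (hypothetical labels)."""
    # A TLS handshake record starts with 0x16 0x03; a plain NRPE v2 packet
    # starts with 0x00 0x02, so the two cannot be confused.
    if len(chunk) >= 2 and chunk[0] == 0x16 and chunk[1] == 0x03:
        return "tls"
    if len(chunk) != NRPE_V2_SIZE:
        return "size"
    version, _ptype, crc, _result, _buf, _pad = struct.unpack(NRPE_V2_FORMAT, chunk)
    if version != 2:
        return "version"
    # Recompute the CRC over the packet with its CRC field (bytes 4..7) zeroed.
    zeroed = chunk[:4] + b"\0\0\0\0" + chunk[8:]
    if zlib.crc32(zeroed) & 0xFFFFFFFF != crc:
        return "crc"
    return "ok"
```

A chunk of exactly 1036 bytes that still fails to parse, as in the log above, would fall into the 'version' or 'crc' cases of this sketch rather than the size case, which is consistent with the size alone not explaining the failure.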








