Opened 2 years ago

Last modified 6 months ago

#464 new defect

nsclient++ 0.3.8 and 0.3.9rc3 crash on exchange server.

Reported by: mariog Owned by: mickem
Priority: 1 Milestone: 0.4.2
Component: CheckSystem Version: 0.3.9
Severity: Bugs Keywords:
Cc:

Description (last modified by mickem)

Hello,
This has started happening very often. We monitor a large number of performance counters on a 64-bit Windows Server 2008 machine running Exchange 2010.

NSClient++ crashes, and we found this in the log:

2011-06-24 00:05:28: error:modules\CheckSystem\CheckSystem.cpp:1115: ERROR: Failed to get mutex for PdhValidatePath (\MSExchangeAB\NSPI RPC Requests Average Latency|\MSExchangeAB\NSPI RPC Requests Average Latency)
2011-06-24 00:05:28: debug:NSClient++.cpp:1180: Injected Result: WARNING 'ERROR: Failed to get mutex for PdhValidatePath (\MSExchangeAB\NSPI RPC Requests Average Latency|\MSExchangeAB\NSPI RPC Requests Average Latency)'
2011-06-24 00:05:28: debug:NSClient++.cpp:1181: Injected Performance Result: ''

Then the NSClient++ service stops.
The event viewer shows this:

Log Name:      Application
Source:        Application Error
Date:          26/06/2011 23:59:52
Event ID:      1000
Task Category: (100)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      MAILSRV-01.snba.be
Description:
Faulting application name: nsclient++.exe, version: 0.0.0.0, time stamp: 0x4df77982
Faulting module name: CheckSystem.dll, version: 0.0.0.0, time stamp: 0x4df77a0d
Exception code: 0x40000015
Fault offset: 0x00000000000be22e
Faulting process id: 0x26e4
Faulting application start time: 0x01cc344c51c5cf27
Faulting application path: C:\Program Files\NSClient++\nsclient++.exe
Faulting module path: C:\Program Files\NSClient++\modules\CheckSystem.dll
Report Id: 9dbe14a7-a03f-11e0-8ea3-005056aa19a5
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Application Error" />
    <EventID Qualifiers="0">1000</EventID>
    <Level>2</Level>
    <Task>100</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2011-06-26T21:59:52.000000000Z" />
    <EventRecordID>225116</EventRecordID>
    <Channel>Application</Channel>
    <Computer>MAILSRV-01.snba.be</Computer>
    <Security />
  </System>
  <EventData>
    <Data>nsclient++.exe</Data>
    <Data>0.0.0.0</Data>
    <Data>4df77982</Data>
    <Data>CheckSystem.dll</Data>
    <Data>0.0.0.0</Data>
    <Data>4df77a0d</Data>
    <Data>40000015</Data>
    <Data>00000000000be22e</Data>
    <Data>26e4</Data>
    <Data>01cc344c51c5cf27</Data>
    <Data>C:\Program Files\NSClient++\nsclient++.exe</Data>
    <Data>C:\Program Files\NSClient++\modules\CheckSystem.dll</Data>
    <Data>9dbe14a7-a03f-11e0-8ea3-005056aa19a5</Data>
  </EventData>
</Event>
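
The `Faulting application start time` field is a Windows FILETIME value (100-nanosecond ticks since 1601-01-01 UTC), so it can be compared with the `TimeCreated` stamp to see how long the process had been running when it faulted. A minimal Python sketch of that conversion; `filetime_to_utc` is a hypothetical helper name, not part of NSClient++:

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100 ns ticks from this epoch (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_utc(hex_value: str) -> datetime:
    """Convert a hex FILETIME string, as shown in the event log, to UTC."""
    ticks = int(hex_value, 16)  # base 16 also accepts the "0x" prefix
    # 10 ticks per microsecond; sub-microsecond precision is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)

# "Faulting application start time" from the event above.
started = filetime_to_utc("0x01cc344c51c5cf27")
# TimeCreated from the event XML above (whole seconds only).
crashed = datetime(2011, 6, 26, 21, 59, 52, tzinfo=timezone.utc)

print("process started:", started.isoformat())
print("ran for        :", crashed - started)
```

For this event the start time works out to about 21:59:28 UTC, roughly 24 seconds before the fault, which would suggest the service had only just been started or restarted when it crashed.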

Change History (6)

comment:1 Changed 15 months ago by mickem

  • Milestone set to 0.4.1

comment:2 Changed 10 months ago by RandyJames

Similar log information here.
Version: 0.3.9.330 64-bit

I have NSClientpp set up to restart on crash:

[crash]
archive=1
;submit=0
restart=1

But the restart doesn't happen and there are no crash dumps.

It seems the fault occurs outside of the try{} block.

I can post more information if need be. Our servers are all VMs under VMware ESX. Not sure whether that is relevant.

comment:3 Changed 10 months ago by mickem

  • Description modified (diff)

Off the top of my head, I could imagine this being related to broken performance counters (I know some HP counters have caused this in the past).

Can someone confirm whether it is always a given counter, or a given set of counters, that causes this?

comment:4 Changed 10 months ago by mickem

Note that a potential workaround for this would be to externalize the counter checks and run them "outside" of NSClient++ (which can be done with both 0.3.9 and 0.4.0).
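
One possible reading of "externalizing" the check is a standalone script that reads the counter through the Windows `typeperf` tool in its own process, so that a hang or fault in the counter provider cannot take the agent down with it. The sketch below is an assumption, not the approach shipped with NSClient++: the function names and thresholds are hypothetical, `typeperf` is assumed to be on the PATH, and wiring the script into NSClient++ as an external script is not shown.

```python
import csv
import subprocess
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios plugin exit codes

def parse_typeperf(output: str) -> float:
    """Return the first sample value from typeperf's CSV output."""
    rows = [r for r in csv.reader(output.splitlines()) if len(r) >= 2]
    # rows[0] should be the "(PDH-CSV 4.0)" header, rows[1] the first sample.
    if len(rows) < 2 or not rows[0][0].startswith("(PDH-CSV"):
        raise ValueError("no sample in typeperf output")
    # Some locales print a decimal comma.
    return float(rows[1][1].replace(",", "."))

def classify(value: float, warn: float, crit: float) -> int:
    """Map a counter value to a Nagios status (higher is worse)."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK

def check_counter(counter: str, warn: float, crit: float, timeout: float = 20.0):
    """Sample one counter in a child process; never raise to the caller."""
    try:
        proc = subprocess.run(
            ["typeperf", counter, "-sc", "1"],  # -sc 1: collect one sample
            capture_output=True, text=True, timeout=timeout,
        )
        value = parse_typeperf(proc.stdout)
    except (OSError, subprocess.TimeoutExpired, ValueError) as exc:
        return UNKNOWN, f"UNKNOWN: could not read {counter}: {exc}"
    status = classify(value, warn, crit)
    label = ("OK", "WARNING", "CRITICAL")[status]
    return status, f"{label}: {counter} = {value}"

# Expects exactly: <counter path> <warn> <crit>
if __name__ == "__main__" and len(sys.argv) == 4:
    code, message = check_counter(sys.argv[1], float(sys.argv[2]), float(sys.argv[3]))
    print(message)
    sys.exit(code)
```

The timeout matters here: a counter that blocks returns UNKNOWN for that one check instead of stalling or crashing the monitoring service.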

comment:5 Changed 10 months ago by RandyJames

I'm not certain that the crash is related to the NSClient++ log entries.
While we see the same entries, they are logged far more often than the crashes occur.
For instance:

2012-07-28 03:57:50: error:modules\CheckSystem\PDHCollector.cpp:215: Failed to query performance counters: Failed to get mutex for PdhCollectQueryData
2012-07-28 03:57:56: error:modules\CheckSystem\PDHCollector.cpp:215: Failed to query performance counters: Failed to get mutex for PdhCollectQueryData
2012-07-28 03:58:02: error:modules\CheckSystem\PDHCollector.cpp:215: Failed to query performance counters: Failed to get mutex for PdhCollectQueryData

But the event log entry for the crash is a few minutes later:

7/28/2012 4:02:07 AM

And there are plenty of other identical entries in the NSClient++ log that don't result in a crash.
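
The gap between the last logged mutex error and the crash can be measured directly from the two timestamps. A small Python sketch using the entries quoted above; it assumes the NSClient++ log and the event log both use the same local time zone, which the ticket does not state:

```python
from datetime import datetime

# Log lines quoted in this comment (raw string: the paths contain backslashes).
LOG_LINES = r"""
2012-07-28 03:57:50: error:modules\CheckSystem\PDHCollector.cpp:215: Failed to query performance counters: Failed to get mutex for PdhCollectQueryData
2012-07-28 03:57:56: error:modules\CheckSystem\PDHCollector.cpp:215: Failed to query performance counters: Failed to get mutex for PdhCollectQueryData
2012-07-28 03:58:02: error:modules\CheckSystem\PDHCollector.cpp:215: Failed to query performance counters: Failed to get mutex for PdhCollectQueryData
"""

def log_time(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' stamp of a log line."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")

# Timestamps of every mutex error in the excerpt.
errors = [log_time(l) for l in LOG_LINES.splitlines() if "Failed to get mutex" in l]

# Crash time as shown in the event log (assumed to be the same time zone).
crash = datetime.strptime("7/28/2012 4:02:07 AM", "%m/%d/%Y %I:%M:%S %p")

gap = crash - max(errors)
print("mutex errors in excerpt:", len(errors))
print("last error to crash    :", gap)
```

Run against a full log rather than this excerpt, the same comparison would show whether crashes consistently follow a burst of mutex errors or occur independently of them.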

Also, if there is a known piece of code that can throw an uncaught exception, could it be moved inside a try block so that NSClient++ can catch it, dump the debug information, and auto-restart? The important thing is that it keeps running, right?

comment:6 Changed 6 months ago by mickem

  • Milestone changed from 0.4.1 to 0.4.2

Please provide crash dump files from NSClient++.
