NSClient++ Help (#1) - What does everybody think of this enhancement to get more "stable" counter values from NSClient ? (#607) - Message List
We're intending to use "checkCounter" within NSClient++ as a means of gathering data to replace commercial monitoring tools - however a side by side comparison of the two sets of data showed the data from NSClient was extremely volatile and a much less predictable measure of system load. For example %total CPU would be a tiny value like 0.0025% for several samples, then suddenly jump to 80%, then back to the tiny value again etc etc. In contrast, the commercial tools showed a steady 2.5% or so.
After some detailed work we reckoned the commercial tools were averaging the data over a longish period, say 5 minutes. Whereas NSClient averages data over 1 second. NSClient also makes the client wait during that 1 second period which is a problem if there are a lot of counters to collect.
So here's what we did: Change CheckSystem.cpp so that each measurement of a counter spans two client invocations. The very first call to "checkCounter" for a particular counter starts the process of collecting data but doesn't attempt to read the value - just returns zero. The next call will finish off the collection process and read the value which is then averaged over whatever the period is between the two calls. The same call will also start off a new query ready for the next call. The queries and their associated listener objects are held in a static map which will hold one entry for every counter instance queried throughout the life of the process.
The code change is below in the form of an svn diff against the 0.3.8 tag.
I only got this change working today but the data coming out looks a whole lot better already. The checkCounter command is also much faster because there is no sleep. The sleep is replaced by the delay between client calls.
It's not a big change. The main headache is that the query and listener objects have to be heap-allocated so that they stay in scope.
One thing to note: There's no attempt to implement any kind of session. If multiple users call the NRPE daemon, for example, then the counter values returned will be averaged over the time between any two calls. But that's still better than the 1 second behaviour.
$ svn diff modules/CheckSystem/ Index: modules/CheckSystem/CheckSystem.cpp =================================================================== --- modules/CheckSystem/CheckSystem.cpp (revision 274) +++ modules/CheckSystem/CheckSystem.cpp (working copy) @@ -38,6 +38,8 @@
CheckSystem gCheckSystem;
+std::map<std::wstring, CheckSystem::QueryInfo*> CheckSystem::checkCounterQueriesTable; +
/
- DLL Entry point
- @param hModule
@@ -1088,17 +1090,74 @@
return NSCAPI::returnCRIT;
}
}
- PDH::PDHQuery pdh;
- PDHCollectors::StaticPDHCounterListener<double, PDH_FMT_DOUBLE> cDouble;
- pdh.addCounter(counter.data, &cDouble);
- pdh.open();
- if (bCheckAverages) {
- pdh.collect();
- Sleep(1000);
- }
- pdh.gatherData();
- pdh.close();
- double value = cDouble.getValue();
+ + Start of change by DL 28/06/2010. + To make data returned by checkCounter less volatile, start + data collection in call n and retrieve value in call n+1. + To do this we must maintain a static table of active queries + and associated listener objects. + PDH::PDHQuery pdh; + PDHCollectors::StaticPDHCounterListener<double, PDH_FMT_DOUBLE> cDouble; + pdh.addCounter(counter.data, &cDouble); + pdh.open(); + if (bCheckAverages) { + pdh.collect(); + Sleep(1000); + } + pdh.gatherData(); + pdh.close(); + double value = cDouble.getValue(); + PDH::PDHQuery* ppdh=new PDH::PDHQuery; + PDH::PDHQuery& pdh=*ppdh; + PDHCollectors::StaticPDHCounterListener<double, PDH_FMT_DOUBLE>* pcDouble=new PDHCollectors::StaticPDHCounterListener<double, PDH_FMT_DOUBLE>; +double value; +if (bCheckAverages) +{ + CheckSystem::QueryInfo?* pqi= + checkCounterQueriesTable.count(counter.data) ? + checkCounterQueriesTable[counter.data] : 0; + if (pqi) + { + Have a query running already. + Collect second sample and get result... + pqi->pQuery->gatherData(); + pqi->pQuery->close(); + value = pqi->pListener->getValue(); + Delete old query and associated listener, + replace with the new ones we've just created... + delete pqi->pQuery; + delete pqi->pListener; + pqi->pQuery=ppdh; + pqi->pListener=pcDouble; + } + else + { + Don't yet have a query running for this counter. + Return a dummy value 0 and add the query and + listener we've just created to the table... + value=0; Dummy for first call + pqi=new CheckSystem::QueryInfo?; + pqi->pQuery=ppdh; + pqi->pListener=pcDouble; + checkCounterQueriesTable[counter.data]=pqi; + } + + Collect first sample in new query... + pdh.addCounter(counter.data, pcDouble); + pdh.open(); + pdh.collect(); +} +else +{ + pdh.addCounter(counter.data, pcDouble); + pdh.open(); + pdh.gatherData(); + pdh.close(); + delete ppdh; + value = pcDouble->getValue(); + delete pcDouble; +} + End of change by DL 28/06/2010.
if (bNSClient) {
if (!msg.empty())
msg += _T(",");
Index: modules/CheckSystem/CheckSystem.h =================================================================== --- modules/CheckSystem/CheckSystem.h (revision 274) +++ modules/CheckSystem/CheckSystem.h (working copy) @@ -23,6 +23,7 @@
#include <pdh.hpp> #include "PDHCollector.h" #include <CheckMemory?.h>
+#include <map>
NSC_WRAPPERS_MAIN(); NSC_WRAPPERS_CLI();
@@ -31,6 +32,12 @@
private:
CheckMemory? memoryChecker; PDHCollectorThread pdhThread;
+ struct QueryInfo? + { + PDH::PDHQuery* pQuery; + PDHCollectors::StaticPDHCounterListener<double, PDH_FMT_DOUBLE>* pListener; + }; + static std::map<std::wstring, QueryInfo*> checkCounterQueriesTable;
public:
typedef enum { started, stopped } states;
@@ -84,4 +91,4 @@
NSCAPI::nagiosReturn checkSingleRegEntry(const unsigned int argLen, TCHAR char_args, std::wstring &message, std::wstring &perf);
-}; \ No newline at end of file +}; $
-
Message #1947
We've been running with that change for a good while now and haven't seen any crashes or obvious memory leaks.
I suppose a better solution would be to do something like what you do for Check_CPU - starting a collector thread. That gives more control over the collection interval. The above patch is simpler I think but the collection interval is dependent on the timing between client calls (OK in our case as we usually have a single client per host).
lovedada08/25/10 11:41:37 (3 years ago) -
Message #1890
Interesting...
This is like the third patch ever, and a pretty nice one as well... I shall investigate it but if it works out I guess it makes sense...
Michael Medin
mickem07/27/10 21:22:20 (3 years ago) -
Message #1847
Oops - code change, this time with line breaks
$ svn diff modules/CheckSystem/
Index: modules/CheckSystem/CheckSystem.cpp
===================================================================
--- modules/CheckSystem/CheckSystem.cpp (revision 274)
+++ modules/CheckSystem/CheckSystem.cpp (working copy)
@@ -38,6 +38,8 @@
CheckSystem gCheckSystem;
+std::map<std::wstring, CheckSystem::QueryInfo*> CheckSystem::checkCounterQueriesTable;
+
/
- DLL Entry point
- @param hModule
@@ -1088,17 +1090,74 @@
return NSCAPI::returnCRIT;
}
}
- PDH::PDHQuery pdh;
- PDHCollectors::StaticPDHCounterListener<double, PDH_FMT_DOUBLE> cDouble;
- pdh.addCounter(counter.data, &cDouble);
- pdh.open();
- if (bCheckAverages) {
- pdh.collect();
- Sleep(1000);
- }
- pdh.gatherData();
- pdh.close();
- double value = cDouble.getValue();
+
+ Start of change by DL 28/06/2010.
+ To make data returned by checkCounter less volatile, start
+ data collection in call n and retrieve value in call n+1.
+ To do this we must maintain a static table of active queries
+ and associated listener objects.
+ PDH::PDHQuery pdh;
+ PDHCollectors::StaticPDHCounterListener<double, PDH_FMT_DOUBLE> cDouble;
+ pdh.addCounter(counter.data, &cDouble);
+ pdh.open();
+ if (bCheckAverages) {
+ pdh.collect();
+ Sleep(1000);
+ }
+ pdh.gatherData();
+ pdh.close();
+ double value = cDouble.getValue();
+ PDH::PDHQuery* ppdh=new PDH::PDHQuery;
+ PDH::PDHQuery& pdh=*ppdh;
+ PDHCollectors::StaticPDHCounterListener<double, PDH_FMT_DOUBLE>* pcDouble=new PDHCollectors::StaticPDHCounterListener<double, PDH_FMT_DOUBLE>;
+double value;
+if (bCheckAverages)
+{
+ CheckSystem::QueryInfo?* pqi=
+ checkCounterQueriesTable.count(counter.data) ?
+ checkCounterQueriesTable[counter.data] : 0;
+ if (pqi)
+ {
+ Have a query running already.
+ Collect second sample and get result...
+ pqi->pQuery->gatherData();
+ pqi->pQuery->close();
+ value = pqi->pListener->getValue();
+ Delete old query and associated listener,
+ replace with the new ones we've just created...
+ delete pqi->pQuery;
+ delete pqi->pListener;
+ pqi->pQuery=ppdh;
+ pqi->pListener=pcDouble;
+ }
+ else
+ {
+ Don't yet have a query running for this counter.
+ Return a dummy value 0 and add the query and
+ listener we've just created to the table...
+ value=0; Dummy for first call
+ pqi=new CheckSystem::QueryInfo?;
+ pqi->pQuery=ppdh;
+ pqi->pListener=pcDouble;
+ checkCounterQueriesTable[counter.data]=pqi;
+ }
+
+ Collect first sample in new query...
+ pdh.addCounter(counter.data, pcDouble);
+ pdh.open();
+ pdh.collect();
+}
+else
+{
+ pdh.addCounter(counter.data, pcDouble);
+ pdh.open();
+ pdh.gatherData();
+ pdh.close();
+ delete ppdh;
+ value = pcDouble->getValue();
+ delete pcDouble;
+}
+ End of change by DL 28/06/2010.
if (bNSClient) {
if (!msg.empty())
msg += _T(",");
Index: modules/CheckSystem/CheckSystem.h
===================================================================
--- modules/CheckSystem/CheckSystem.h (revision 274)
+++ modules/CheckSystem/CheckSystem.h (working copy)
@@ -23,6 +23,7 @@
#include <pdh.hpp>
#include "PDHCollector.h"
#include <CheckMemory?.h>
+#include <map>
NSC_WRAPPERS_MAIN();
NSC_WRAPPERS_CLI();
@@ -31,6 +32,12 @@
private:
CheckMemory? memoryChecker;
PDHCollectorThread pdhThread;
+ struct QueryInfo?
+ {
+ PDH::PDHQuery* pQuery;
+ PDHCollectors::StaticPDHCounterListener<double, PDH_FMT_DOUBLE>* pListener;
+ };
+ static std::map<std::wstring, QueryInfo*> checkCounterQueriesTable;
public:
typedef enum { started, stopped } states;
@@ -84,4 +91,4 @@
NSCAPI::nagiosReturn checkSingleRegEntry(const unsigned int argLen, TCHAR char_args, std::wstring &message, std::wstring &perf);
-};
\ No newline at end of file
+};
$
lovedada06/29/10 00:58:44 (3 years ago) - DLL Entry point








