
Roundtrip response time from CheckCounter seems slower than other commands [SOLVED]

I'm trying to determine why NSClient++ takes about a second to run a CheckCounter command. When I call NSClient++ from check_nrpe, the command typically takes about 1.2 seconds to run. Here are some of the response times I'm getting.

time check_nrpe -n -H myhost -c CheckCounter -a "Counter:read=\\System\\File Read Bytes/sec"
OK all counters within bounds.|'read=200.124296;0;0;

real 0m1.077s
user 0m0.000s
sys 0m0.003s

time check_nrpe -n -H myhost -c CheckCounter -a "Counter:proc=\\LogicalDisk(D:)\\% Free Space"
OK all counters within bounds.|'proc'=96.959773;0;0;

real 0m1.088s
user 0m0.001s
sys 0m0.002s

Other commands seem to run much quicker, even with SSL enabled:

time check_nrpe -H myhost -c CheckCPU -a time=5m
OK CPU Load ok.|'5m'=35%;0;0;

real 0m0.349s
user 0m0.004s
sys 0m0.001s

time check_nrpe -H myhosts -u -t 60 -c CheckWMI -a "Query:name=SELECT Name FROM Win32_PerfRawData_PerfDisk_LogicalDisk"
Name=C:Name=D:Name=_Total|'name'=3;0;0;

real 0m0.238s
user 0m0.003s
sys 0m0.004s

Any help would be appreciated.

  • Message #1775

    Some counters require a "second to pass" to get average values, and I am (simply) too lazy to treat them differently ...

    So the code looks like this:

    pdh.open();
    if (bCheckAverages) {
    	pdh.collect();
    	Sleep(1000);
    }
    pdh.gatherData();
    pdh.close();
    

    Michael Medin

    • Message #1776

      Might want to clarify: there is an option to disable it... but you have to add it...

      MAP_OPTIONS_BOOL_EX(_T("Averages"), bCheckAverages, _T("true"), _T("false"))
      

      It defaults to true, so Averages=false should speed things up... but again... it won't always work...

      Michael Medin

      • Message #1778

        OK, so "Defaults to true so: Averages=false should speed things up"... Is "Averages=false" something that needs to be set in nsc.ini?

        • Message #1779

          No, it is given on the command line (somewhere after the -a thingy...)

          Michael Medin

          • Message #1781

            SWEET!!!! Thanks a lot!!! Did the trick.

      • Message #1812

        Can I ask what you meant by "won't always work"?

        We're finding we sometimes get this error when using Averages=false. On one Windows XP PC it's fine, but on one Windows 2003 PC it fails (both PCs are OK without Averages=false). We're using version 0.3.7.

        checkCounter "\Processor(_Total)\% Idle Time" MinCrit=-1 ShowAll Averages=false
        d NSClient++.cpp(1073) Injecting: checkCounter: \Processor(_Total)\% Idle Time, MinCrit=-1, ShowAll, Averages=false
        e \CheckSystem.cpp(1091) ERROR: \Processor(_Total)\% Idle Time: PdhGetFormattedCounterValue failed: -1073738810: The data is not valid.
        (\Processor(_Total)\% Idle Time|\Processor(_Total)\% Idle Time)
        d NSClient++.cpp(1109) Injected Result: WARNING 'ERROR: \Processor(_Total)\% Idle Time: PdhGetFormattedCounterValue failed: -1073738810: The data is not valid.
        (\Processor(_Total)\% Idle Time|\Processor(_Total)\% Idle Time)'

        • Message #1813

          Hmm... sounds doubtful. The counter should be a rate, I think, which % Idle Time is not, right?

          This is what I refer to:

          "Obtaining the value of rate counters such as Page faults/sec requires that PdhCollectQueryData be called twice, with a specific time interval between the two calls, before calling PdhGetFormattedCounterValue. Call Sleep to implement the waiting period between the two calls to PdhCollectQueryData."

          Michael Medin
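
To illustrate the quoted point: a rate counter is typically derived from the difference between two raw readings divided by the elapsed time, so a single reading has nothing to be compared against. The sketch below is a minimal, portable illustration only; it is not PDH or NSClient++ code, and `RawSample` and `rate_per_second` are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical raw reading: a cumulative total plus the time it was taken.
struct RawSample {
    std::uint64_t total;  // e.g. bytes read since boot
    double seconds;       // timestamp of the reading
};

// Computes a per-second rate from two readings of the same counter.
// Returns false when no valid rate exists (no time has elapsed between
// the readings, or the cumulative total went backwards).
bool rate_per_second(const RawSample& first, const RawSample& second, double& rate) {
    if (second.seconds <= first.seconds || second.total < first.total) {
        return false;
    }
    rate = static_cast<double>(second.total - first.total) /
           (second.seconds - first.seconds);
    assert(rate >= 0.0);  // holds because both differences are non-negative
    return true;
}
```

Passing the same reading twice returns false, which is roughly the situation a query is in after only one collection pass.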

          • Message #1814

            Thank you for that and sorry about the excessive length of this post.

            I got there myself after some confusion. One problem is that I can't find any documentation on which counters require the two calls with a sleep in the middle and which do not.

            One comment I would make on NSClient++ is that although the windows API and indeed the PDHQuery class allow for multiple counters to be retrieved in one query, the CheckSystem::checkCounter() method always creates separate queries (with separate sleeps) even when retrieving multiple counters in one request.

            Although the scope for retrieving multiple counters in one request is severely limited by the 1024 byte limit in any case.

            There do also seem to be platform-specific differences. I ran this C++ test program on Windows XP and Windows 2003, trying to collect a counter value with only one call to PdhCollectQueryData. I'm assuming that a single call is invalid for this counter (even though it is not a rate), as on 2003 the call failed; on XP it succeeded but returned garbage (which is nasty).

            $ cat DLTest.cpp
            // Native Win32 build; needs pdh.lib at link time.
            #include <windows.h>
            #include <pdh.h>
            #include <cstdio>
            #pragma comment(lib, "pdh.lib")

            int main()
            {
                    PDH_STATUS status;
                    HQUERY hQuery_ = NULL;
                    if ((status = PdhOpenQueryA(NULL, 0, &hQuery_)) != ERROR_SUCCESS)
                        printf("PdhOpenQuery failed, status=%ld\n", status);
                    HCOUNTER hCounter_ = NULL;
                    if ((status = PdhAddCounterA(hQuery_, "\\Processor(_Total)\\% Idle Time", 0, &hCounter_)) != ERROR_SUCCESS) {
                    //if ((status = PdhAddCounterA(hQuery_, "\\Memory\\Available Mbytes", 0, &hCounter_)) != ERROR_SUCCESS) {
                        hCounter_ = NULL;
                        printf("PdhAddCounter failed, status=%ld\n", status);
                    }
                    if (hCounter_ == NULL)
                        printf("Counter is null!\n");
                    // First sample. The "NoAvg" build omits this call and the Sleep below.
                    if ((status = PdhCollectQueryData(hQuery_)) != ERROR_SUCCESS)
                        printf("PdhCollectQueryData failed: %ld\n", status);
                    Sleep(1000);
                    // Second sample, one second later.
                    if ((status = PdhCollectQueryData(hQuery_)) != ERROR_SUCCESS)
                        printf("PdhCollectQueryData failed: %ld\n", status);
                    PDH_FMT_COUNTERVALUE data_ = {};  // zeroed, so a failed read prints 0.000000
                    if ((status = PdhGetFormattedCounterValue(hCounter_, PDH_FMT_DOUBLE, NULL, &data_)) != ERROR_SUCCESS) {
                        printf("PdhGetFormattedCounterValue failed, status=%ld\n", status);
                    }
                    printf("Data is %f\n", data_.doubleValue);
                    if ((status = PdhCloseQuery(hQuery_)) != ERROR_SUCCESS)
                        printf("PdhCloseQuery failed, status=%ld\n", status);
                    return 0;
            }
            $
            

            Result on Windows XP (correct value is close to 50%)

            Two calls plus sleep as in NSClient with "Averages=true"

            =============================================
            $ ./DLTestAvg.exe
            Data is 45.332229
            $ ./DLTestAvg.exe
            Data is 45.404375
            $ ./DLTestAvg.exe
            Data is 47.676999
            $
            $
            

            One call, no sleep as in NSClient with "Averages=false"

            =============================================
            $ ./DLTestNoAvg.exe
            Data is 0.000161
            $ ./DLTestNoAvg.exe
            Data is 0.000161
            $ ./DLTestNoAvg.exe
            Data is 0.000161
            $
            

            With two calls and a sleep, the data looks valid. With just the one call and no sleep, the call succeeds but the data is WRONG.

            Result on Windows 2003 (correct value is close to 100%)

            Two calls plus sleep as in NSClient with "Averages=true"

            =============================================
            C:\Documents and Settings\Administrator>DLTestAvg.exe
            Data is 96.872520
            C:\Documents and Settings\Administrator>DLTestAvg.exe
            Data is 95.310060
            C:\Documents and Settings\Administrator>DLTestAvg.exe
            Data is 90.622680
            C:\Documents and Settings\Administrator>
            C:\Documents and Settings\Administrator>
            

            One call, no sleep as in NSClient with "Averages=false"

            =============================================
            C:\Documents and Settings\Administrator>DLTestNoAvg.exe
            PdhGetFormattedCounterValue failed, status=-1073738810
            Data is 0.000000
            C:\Documents and Settings\Administrator>DLTestNoAvg.exe
            PdhGetFormattedCounterValue failed, status=-1073738810
            Data is 0.000000
            C:\Documents and Settings\Administrator>
            
            • Message #1815

              Ok, well at least we know what is the cause then...

              Michael Medin

              • Message #1824

                Yes.

                Ideally, using the NRPE listener I'd be able to run one checkCounter command passing a list of say 100 counters over the socket and have NSClient++ collect the first pass for all 100, then sleep for 1 second, then collect the second pass for all 100, then return all the data in one big response.

                We're currently collecting 20 or so counters once every 5 minutes, sending them over one at a time and having to spend at least 20 seconds doing it.
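
The batching idea described above can be sketched as follows: read every counter once, wait a single interval, read every counter again, and derive all the values from the two passes. With 20 counters and a one-second interval, this waits about 1 second in total rather than about 20. This is a portable illustration under stated assumptions, not NSClient++ or PDH code; `ReadRaw`, `Wait`, and `sample_batched` are hypothetical names, and the readings are assumed to be cumulative totals.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical hooks that keep the sketch independent of any real counter API:
// ReadRaw returns the current raw value of a named counter,
// Wait pauses for the given number of milliseconds.
using ReadRaw = std::function<double(const std::string&)>;
using Wait = std::function<void(int)>;

// Takes a first pass over every counter, waits once, takes a second pass,
// and returns the per-second change of each counter, in input order.
std::vector<double> sample_batched(const std::vector<std::string>& counters,
                                   const ReadRaw& read_raw,
                                   const Wait& wait,
                                   int interval_ms = 1000) {
    assert(interval_ms > 0);

    std::vector<double> first;
    first.reserve(counters.size());
    for (const auto& name : counters) {
        first.push_back(read_raw(name));
    }

    wait(interval_ms);  // one shared wait, however many counters there are

    std::vector<double> rates;
    rates.reserve(counters.size());
    for (std::size_t i = 0; i < counters.size(); ++i) {
        const double second = read_raw(counters[i]);
        rates.push_back((second - first[i]) * 1000.0 / interval_ms);
    }
    return rates;
}
```

Injecting the wait as a callback keeps the sketch testable without real sleeping; a caller on Windows could pass a wrapper around `Sleep`.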

                • Message #1825

                  That is doable I guess...

                  Please add this "feature request" as a ticket. As for the "NRPE" length limit, you could use an alias to get around it... but! (big but here) internally NSClient++ also has limitations.

                  Michael Medin

                  • Message #1827

                    Please see tickets 387 and 388.

                    • Message #1828

                      Not sure how an alias helps, BTW. We seem to be using it already, but the issue is the packet size, I think?

                      • Message #1829

                        Aliases (in nscp.ini) can be used to map short NRPE commands to longer commands, but that means you have to configure the counters in nsc.ini.

                        Michael Medin

                        • Message #1830

                          I see what you're saying. I didn't think about the length of the command. I think for us it is the length of the data being returned that is the problem (exact commands and messages being used on ticket 388).

                          Thanks for your help

                          • Message #1834

                            Yes... that will also be a problem (one I did not think of :)... *sigh* NRPE really is crap...
