Opened 18 months ago

Last modified 14 months ago

#488 new defect

UNKNOWN Status when Monitoring Windows Clock Sync.

Reported by: BiFo
Owned by: MickeM
Priority: 1
Milestone: future
Component: Core
Version: 0.3.9
Severity: Bugs
Keywords:
Cc:

Description

Hi,
I'm running Nagios Core version 3.2.3 on Red Hat Enterprise Linux Server release 5.4 (Tikanga).

I created a script (VBScript) to monitor the date and time on Windows. As I understand it, the Nagios plugin exit statuses are the following:
0 = OK
1 = Warning
2 = Critical
3 = Unknown
My script never returns 3 as its exit status; it only returns 0, 1 or 2 (depending on the result, because it checks whether the clock is synchronized against another server).
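
For reference, the exit-status convention listed above can be written as a small lookup. This is only an illustrative Python sketch: the name `state_name` is hypothetical, and labelling any other code as "OUT OF RANGE" is this sketch's own choice, not a statement about how Nagios or NSClient++ treat such codes.

```python
# Nagios plugin exit statuses, as listed in this ticket.
PLUGIN_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def state_name(exit_code):
    """Return the state name for a plugin exit code (hypothetical helper)."""
    # Codes outside 0..3 are not part of the convention, so flag them explicitly.
    return PLUGIN_STATES.get(exit_code, "OUT OF RANGE ({})".format(exit_code))


if __name__ == "__main__":
    for code in (0, 1, 2, 3, 128):
        print(code, state_name(code))
```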

The script works fine most of the time; however, at random moments (approximately once every 3 or 4 hours) Nagios reports things like this:

[06-10-2011 04:11:00] SERVICE ALERT: MyServer;Win Clock;OK;SOFT;2;NTP OK: Offset 0 secs.
[06-10-2011 04:10:00] SERVICE ALERT: MyServer;Win Clock;UNKNOWN;SOFT;1;NTP OK: Offset 0 secs.
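
The odd part of these alerts is that the state field says UNKNOWN while the plugin text says "NTP OK". A minimal Python sketch for spotting such mismatches in a log is shown below. It assumes the semicolon-separated SERVICE ALERT layout seen in the two lines above; `ALERT_RE` and `find_mismatches` are hypothetical names used only for illustration.

```python
import re

# Layout assumed from the two alert lines quoted in this ticket:
# [timestamp] SERVICE ALERT: host;service;state;state type;attempt;plugin output
ALERT_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\] SERVICE ALERT: "
    r"(?P<host>[^;]*);(?P<service>[^;]*);(?P<state>[^;]*);"
    r"(?P<state_type>[^;]*);(?P<attempt>\d+);(?P<output>.*)$"
)


def find_mismatches(lines):
    """Yield (when, state, output) when the state disagrees with the plugin text."""
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if not match:
            continue
        # The script prints "NTP <STATUS>: ...", so compare <STATUS> with the state.
        text_state = re.match(r"NTP (\w+):", match.group("output"))
        if text_state and text_state.group(1) != match.group("state"):
            yield match.group("when"), match.group("state"), match.group("output")


SAMPLE = [
    "[06-10-2011 04:11:00] SERVICE ALERT: MyServer;Win Clock;OK;SOFT;2;NTP OK: Offset 0 secs.",
    "[06-10-2011 04:10:00] SERVICE ALERT: MyServer;Win Clock;UNKNOWN;SOFT;1;NTP OK: Offset 0 secs.",
]

if __name__ == "__main__":
    for hit in find_mismatches(SAMPLE):
        print(hit)
```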

The code of my script:

Option Explicit
On Error Resume Next
Dim strCommand, objProc, objShell, input, strOutput, myRegExp, myMatches, myMatch, Status, result, Args, warn, crit, serverlist, criteria, i
Set Args = WScript.Arguments
criteria="least"
i=0
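' Note: Err is VBScript's built-in error object. Assigning to it sets Err.Number,
' which this script reuses as the exit status passed to WScript.Quit. Because of
' "On Error Resume Next", any suppressed runtime error also overwrites it.
Err = 0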

' Function that captures the command-line arguments.
Function CaptureArguments()
   for i=0 to WScript.Arguments.Count-1
      Select Case Args.Item(i)
         Case "-H" , "-h", "--help", "--HELP", "--Help", "/?", "/Help", "/HELP", "/help"
            ShowHelp
            Salir
         Case "-R", "-r", "--readme", "--README", "--ReadMe", "/r", "/R", "/readme", "/README", "/ReadMe"
            ShowReadMe
            Salir
         Case "-V", "-v", "--version", "--VERSION", "--Version", "/V", "/v", "/Version", "/VERSION", "/version"
            ShowVersion
            Salir
         Case "-S", "-s", "--serverlist", "--SERVERLIST", "--ServerList", "/S", "/s", "/ServerList", "/SERVERLIST", "/ServerList"
            CheckNextParameter
            serverlist = Args.Item(i+1)
            i=i+1
         Case "-W", "-w", "--warn", "--WARN", "--Warn", "--warning", "--WARNING", "--Warning", "/W", "/w", "/warn", "/WARN", "/Warn", "/warning", "/WARNING", "/Warning"
            CheckNextParameter
            warn = Args.Item(i+1)
            i=i+1
         Case "-C", "-c", "--crit", "--CRIT", "--Crit", "--critical", "--CRITICAL", "--Critical", "/C", "/c", "/crit", "/CRIT", "/Crit", "/critical", "/CRITICAL", "/Critical"
            CheckNextParameter
            crit = Args.Item(i+1)
            i=i+1
         Case "-B", "-b", "/B", "/b"
            criteria="biggest"
         Case Else
            wscript.echo "ERROR: Se ingreso un argumento Desconocido o Inesperado. Utilice el Help (/H) o ReadMe (/R)."
            Err = 1
            Salir
      End Select
   Next
End Function

' Function that validates the argument that follows the current parameter.
Function CheckNextParameter()
   ' Does the parameter have an argument?
   if (i+1 > WScript.Arguments.Count-1) then
      ' No: show an error and exit.
      WScript.echo "ERROR: Falta argumento en parametro " & Args.Item(i) & "."
      Err = 1
      Salir
   end if
   ' Does the argument contain a "-" or a "/" (which marks the start of a new parameter)?
   if instr(Args.Item(i+1),"-") OR instr(Args.Item(i+1),"/") then
      ' Yes: show an error and exit.
      WScript.echo "ERROR: El argumento " & Args.Item(i+1) & " es invalido para el parametro " & Args.Item(i) & "."
      Err = 1
      Salir
   end if
End Function

' Function that validates the arguments.
Function CheckArguments()
   ' Is the warn parameter defined?
   If IsNull(warn) OR (warn = "") Then
      WScript.echo "ERROR: El parametro warn (/W) no esta definido."
      Err = 1
      Salir   
   End If
   ' Is the crit parameter defined?
   If IsNull(crit) OR (crit = "") Then
      WScript.echo "ERROR: El parametro crit (/C) no esta definido."
      Err = 1
      Salir
   End If
   ' Is the serverlist parameter defined?
   If IsNull(serverlist) OR (serverlist = "")  Then
      WScript.echo "ERROR: El parametro serverlist (/S) no esta definido."
      Err = 1
      Salir
   End If
   ' Is the warn argument numeric?
   If IsNumeric(warn) Then
      ' Yes. Is warn an integer (check whether it contains a ".") greater than 0?
      If (not instr(warn,".") = 0) OR (warn <= 0) Then
         ' No: show an error and exit.
         WScript.Echo "ERROR: El argumento warn (" & warn & ") no es un numero entero mayor a 0."
         Err = 1
         Salir
      Else
         ' Yes: convert it to Double.
         warn=CDbl(warn)
      End If
   Else
      ' No: show an error and exit.
      WScript.Echo "ERROR: El argumento warn (" & warn & ") no es numerico."
      Err = 1
      Salir
   End if
   ' Is the crit argument numeric?
   If IsNumeric(crit) Then
      ' Yes. Is crit an integer (check whether it contains a ".") greater than 0?
      If (not instr(crit,".") = 0) OR (crit <= 0) Then
         ' No: show an error and exit.
         WScript.Echo "ERROR: El argumento crit (" & crit & ") no es un numero entero mayor a 0."
         Err = 1
         Salir
      Else
         ' Yes: convert it to Double.
         crit=CDbl(crit)    
      End If
   Else
      ' No: show an error and exit.
      WScript.Echo "ERROR: El argumento crit (" & crit & ") no es numerico."
      Err = 1
      Salir
   End if
   ' Is the crit argument at least as large as the warn argument?
   If crit < warn then
      ' No: show an error and exit.
      WScript.Echo "ERROR: El argumento crit (" & crit & ") es menor que el argumento warn (" & warn & ")"
      Err = 1
      Salir
   End if
End Function

' Function that shows the help.
Function ShowHelp()
   Wscript.Echo ""
   Wscript.Echo "Parametros      Descripcion"
   Wscript.Echo "/S serverlist       Uno o varios servidores separados por coma."
   Wscript.Echo "/W warn       Warning offset en segundos."
   Wscript.Echo "/C crit       Critical offset en segundos."
   Wscript.Echo "/B         Si varios servidores NTPd son especificados, se utilizara el de mayor offset." 
   Wscript.Echo ""
End Function

' Function that shows the latest version.
Function ShowVersion()
   WScript.Echo "*****"
   WScript.Echo " Modificado por: Fabian Olender"
   WScript.Echo " Version: 1.2"
   WScript.Echo " Fecha: 13/01/2011"
   WScript.Echo " Descripcion: Traduccion al Español y arreglo de BUGs en casos que no funcionaba."
   WScript.Echo "*****"
End Function

' Function that shows the ReadMe.
Function ShowReadMe()
   WScript.Echo ""
   WScript.Echo " ----- "
   WScript.Echo "| Uso |"
   WScript.Echo " ----- "
   WScript.Echo "Uso: "
   WScript.Echo "   cscript //NoLogo check_time.vbs /S serverlist /W warn /C crit [/B]"
   WScript.Echo "Ejemplos de Uso: "
   Wscript.Echo "   cscript //NoLogo check_time.vbs /S Server1,Server2 /W 1 /C 5 /B"
   Wscript.Echo "   cscript //NoLogo check_time.vbs /S www.xxx.yyy.zzz /W 10 /C 120"
   Wscript.Echo "   cscript //T:30 //NoLogo check_time.vbs /S TuDominio.net /W 120 /C 300"
   WScript.Echo ""
   WScript.Echo " ------------ "
   WScript.Echo "| Parametros |"
   WScript.Echo " ------------ "
   Wscript.Echo ""
   Wscript.Echo "-H , -h, --help, --HELP, --Help, /?, /Help, /HELP, /help"
   Wscript.Echo "   Default: Sin Asignar"
   Wscript.Echo "   Descripcion: Muestra el Help del Script para conocer su funcionamiento."
   Wscript.Echo ""
   Wscript.Echo "-R, -r, --readme, --README, --ReadMe, /r, /R, /readme, /README, /ReadMe"
   Wscript.Echo "   Default: Sin Asignar"
   Wscript.Echo "   Descripcion: Muestra el ReadMe que se esta viendo ahora."
   Wscript.Echo ""
   Wscript.Echo "-V, -v, --version, --VERSION, --Version, /V, /v, /Version, /VERSION, /version"
   Wscript.Echo "   Default: Sin Asignar"
   Wscript.Echo "   Descripcion: Muestra la version actual del Script."
   Wscript.Echo ""
   Wscript.Echo "-S, -s, --serverlist, --SERVERLIST, --ServerList, /S, /s, /ServerList, /SERVERLIST, /ServerList"
   Wscript.Echo "   Default: Sin Asignar. Requerido para la ejecucion del Script."
   Wscript.Echo "   Descripcion: Listado de servidores con NTPd."
   Wscript.Echo ""
   Wscript.Echo "-W, -w, --warn, --WARN, --Warn, --warning, --WARNING, --Warning, /W, /w, /warn, /WARN, /Warn, /warning, /WARNING, /Warning"
   Wscript.Echo "   Default: Sin Asignar. Requerido para la ejecucion del Script."
   Wscript.Echo "   Descripcion: Tiempo en Segundos a partir de generar Warning de Desincronizacion."
   Wscript.Echo "   Requerimientos: Valor Entero mayor que 0."
   Wscript.Echo ""
   Wscript.Echo "-C, -c, --crit, --CRIT, --Crit, --critical, --CRITICAL, --Critical, /C, /c, /crit, /CRIT, /Crit, /critical, /CRITICAL, /Critical"
   Wscript.Echo "   Default: Sin Asignar. Requerido para la ejecucion del Script."
   Wscript.Echo "   Descripcion: Tiempo en Segundos a partir de generar Critical de Desincronizacion."
   Wscript.Echo "   Requerimientos: Valor Entero mayor que 0."
   Wscript.Echo ""
   Wscript.Echo "-B, -b, /B, /b"
   Wscript.Echo "   Default: Desactivado. Se utiliza por defecto el de Menor Offset. Parametro Opcional"
   Wscript.Echo "   Descripcion: Si varios servidores NTPd son especificados, se utilizara el de mayor offset."
   Wscript.Echo ""
   WScript.Echo " ----------- "
   WScript.Echo "| Versiones |"
   WScript.Echo " ----------- "
   WScript.Echo " Modificado por: Freddy"
   WScript.Echo " Version: 1.1"
   WScript.Echo " Fecha: 02/12/2010"
   WScript.Echo " Descripcion: Modificacion para mejorar performance de pnp4nagios addon y sugerencias para arreglar ciertos casos de mal funcionamiento."
   WScript.Echo ""
   WScript.Echo " Modificado por: Dmitry Vayntrub (dvayntrub@yahoo.com)"
   WScript.Echo " Version: 1.01"
   WScript.Echo " Fecha: 17/01/2010"
   WScript.Echo " Descripcion: Modificacion general y renombre del script."
   WScript.Echo ""
   WScript.Echo " Creado por: Mattias Ryrlén (check_ad_time.vbs)"
   WScript.Echo " Version: 1.0"
   WScript.Echo " Fecha: -"
   WScript.Echo " Descripcion: Verifica el time offset de un cliente Windows contra uno o multiples servidor NTPd."
   WScript.Echo ""
End Function

' Function that runs the w32tm command and parses its output.
Function W32TM()
   ' Run w32tm.exe.
   Set objShell = CreateObject("Wscript.Shell")
   strCommand = "%SystemRoot%\System32\w32tm.exe /monitor /computers:" & serverlist
   set objProc = objShell.Exec(strCommand)
   ' Parse the output to extract only the NTP offset.
   input = ""
   strOutput = ""
   Do While Not objProc.StdOut.AtEndOfStream
      input = objProc.StdOut.ReadLine
      If InStr(input, "NTP") Then
         strOutput = strOutput & input
      End If
   Loop
   Set myRegExp = New RegExp
   myRegExp.IgnoreCase = True
   myRegExp.Global = True
   myRegExp.Pattern = " NTP: ([+-][0-9]+\.[0-9]+)s"
   Set myMatches = myRegExp.Execute(strOutput)
   result = ""
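   ' Seed result with the first match, but only if w32tm reported an offset at all
   ' (indexing an empty match collection raises a runtime error).
   If myMatches.Count > 0 Then
      result = myMatches(0).SubMatches(0)
   End If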
   For Each myMatch in myMatches
      If myMatch.SubMatches(0) <> "" Then
         If criteria = "biggest" Then
            If abs(result) < Abs(myMatch.SubMatches(0)) Then
               result = myMatch.SubMatches(0)
            End If
         Else
            If abs(result) > Abs(myMatch.SubMatches(0)) Then
               result = myMatch.SubMatches(0)
            End If
         End If
      End If
   '   Wscript.Echo myMatch.SubMatches(0) & " -debug"
   Next
End Function

' Function that prints the script result in the expected format.
Function ShowResult()
   ' Drop everything after the ".", drop the sign ("+" or "-") and keep the number.
   result=Left(result,instr(result,".")-1)
   result=Right(result,Len(result)-1)
   result=CDbl(result)
   If (result > 0 AND result > crit ) OR (result < 0 AND result < crit) Then
      Err = 2
      status = "CRITICAL"
   Else
      If (warn > 0 AND result > warn ) OR (warn < 0 AND result < warn) Then
         Err = 1
         status = "WARNING"
      else
         Err = 0
         status = "OK"      
      End If
   End If
   WScript.Echo "NTP " & status & ": Offset " & result & " secs."
   Salir
End Function

Function Salir()
   WScript.Quit(Err)
End Function

' ########
' # Main #
' ########
CaptureArguments
CheckArguments
W32TM
ShowResult
Salir
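
To make the logic of W32TM and ShowResult easier to follow, here is a rough Python restatement of the two steps: pick the offset with the smallest (or biggest) magnitude, then truncate it to whole seconds and compare it against warn and crit. It is a sketch, not a replacement for the script. `pick_offset`, `classify` and `SAMPLE_OUTPUT` are hypothetical names, the sample text is made up purely to match the script's regular expression, and the pattern is applied to the whole text rather than line by line.

```python
import re

# Same pattern as myRegExp.Pattern in the VBScript.
NTP_RE = re.compile(r" NTP: ([+-][0-9]+\.[0-9]+)s", re.IGNORECASE)


def pick_offset(output, criteria="least"):
    """Return the matched offset text with the smallest (or biggest) magnitude."""
    offsets = NTP_RE.findall(output)
    if not offsets:
        raise ValueError("no NTP offset found in the output")
    chooser = max if criteria == "biggest" else min
    return chooser(offsets, key=lambda text: abs(float(text)))


def classify(offset, warn, crit):
    """Mirror ShowResult: drop the fraction and the sign, then compare."""
    whole = offset.split(".")[0]   # text before the "."
    seconds = int(whole[1:])       # strip the leading "+" or "-"
    if seconds > crit:
        code, status = 2, "CRITICAL"
    elif seconds > warn:
        code, status = 1, "WARNING"
    else:
        code, status = 0, "OK"
    return code, "NTP {}: Offset {} secs.".format(status, seconds)


# Made-up text, shaped only so that the pattern above matches it.
SAMPLE_OUTPUT = (
    "ServerA\n"
    "    NTP: +0.0312500s offset\n"
    "ServerB\n"
    "    NTP: -136.7000000s offset\n"
)

if __name__ == "__main__":
    chosen = pick_offset(SAMPLE_OUTPUT, "biggest")
    print(classify(chosen, warn=120, crit=300))
```

With warn=120 and crit=300, none of these branches can produce exit status 3, which matches the claim that the script itself only returns 0, 1 or 2.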

How to use it from the command line (DOS):

cscript.exe //T:30 //NoLogo check_time.vbs /S www.xxx.yyy.zzz /W 120 /C 300
  • www.xxx.yyy.zzz = NTP Server

"NSC.ini":

...
[External Scripts]
check_windows_time=cscript.exe //T:30 //NoLogo check_time.vbs /S 10.45.225.11 /W 120 /C 300
...
;[LUA Scripts]
command[check_windows_time]=cscript.exe //T:30 //NoLogo check_time.vbs /S 10.45.225.11 /W 120 /C 300
...

Service definition in Nagios Server:

define service{
           use                  generic-service
           host_name            MyServer
           service_description  Win Clock
           check_command        check_nrpe!check_windows_time
}

I've set "nagios.cfg" to the maximum debug level ("debug_level=-1" and "debug_verbosity=2"), and this is what I saw in "nagios.debug" for the "Win Clock" service I defined on host "MyServer" when an "Unknown" result appears:

[1307617373.175579] [016.2] [pid=377] Found a check result (#4) to handle...
[1307617373.175615] [016.1] [pid=377] Handling check result for service 'Win Clock' on host 'MyServer'...
[1307617373.175636] [001.0] [pid=377] handle_async_service_check_result()
[1307617373.175657] [016.0] [pid=377] ** Handling check result for service 'Win Clock' on host 'MyServer'...
[1307617373.175676] [016.1] [pid=377] HOST: MyServer, SERVICE: Win Clock, CHECK TYPE: Active, OPTIONS: 0, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 3, OUTPUT: NTP OK: Offset 1 secs.\n
[1307617373.175755] [016.2] [pid=377] Parsing check output...
[1307617373.175776] [016.2] [pid=377] Short Output: NTP OK: Offset 1 secs.
[1307617373.175822] [016.2] [pid=377] Long Output:  NULL
[1307617373.175840] [016.2] [pid=377] Perf Data:    NULL
[1307617373.175858] [016.2] [pid=377] ST: HARD  CA: 1  MA: 3  CS: 3  LS: 0  LHS: 0
[1307617373.175886] [016.2] [pid=377] Service has changed state since last check!
[1307617373.175918] [016.1] [pid=377] Service is in a non-OK state!
[1307617373.175939] [016.1] [pid=377] Host is currently UP, so we'll recheck its state to make sure...
[1307617373.175959] [001.0] [pid=377] run_async_host_check_3x()
[1307617373.175976] [016.0] [pid=377] ** Running async check of host 'MyServer'...
[1307617373.175996] [001.0] [pid=377] check_host_check_viability_3x()
[1307617373.176017] [001.0] [pid=377] check_time_against_period()
[1307617373.176043] [001.0] [pid=377] check_host_dependencies()
[1307617373.176065] [064.1] [pid=377] Making callbacks (type 14)...
[1307617373.176101] [064.2] [pid=377] Callback #1 (type 14) return code = 0
[1307617373.176122] [016.0] [pid=377] Checking host 'MyServer'...
[1307617373.176141] [001.0] [pid=377] adjust_host_check_attempt_3x()
[1307617373.176161] [016.2] [pid=377] Adjusting check attempt number for host 'MyServer': current attempt=1/10, state=0, state type=1
[1307617373.176230] [016.2] [pid=377] New check attempt number = 1
[1307617373.176347] [001.0] [pid=377] get_raw_command_line()
[1307617373.176396] [2320.2] [pid=377] Raw Command Input: $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
[1307617373.176417] [2320.2] [pid=377] Expanded Command Output: $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
[1307617373.176438] [001.0] [pid=377] process_macros()
[1307617373.176457] [2048.1] [pid=377] **** BEGIN MACRO PROCESSING ***********
[1307617373.176477] [2048.1] [pid=377] Processing: '$USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5'
[1307617373.176498] [2048.2] [pid=377]   Processing part: ''
[1307617373.176515] [2048.2] [pid=377]   Not currently in macro.  Running output (0): ''
[1307617373.176536] [2048.2] [pid=377]   Processing part: 'USER1'
[1307617373.176560] [2048.2] [pid=377]   Processed 'USER1', Clean Options: 0, Free: 0
[1307617373.176588] [2048.2] [pid=377]   Processed 'USER1', Clean Options: 0, Free: 0
[1307617373.176614] [2048.2] [pid=377]   Cleaning options: global=0, local=0, effective=0
[1307617373.176636] [2048.2] [pid=377]   Uncleaned macro.  Running output (25): '/usr/local/nagios/libexec'
[1307617373.176655] [2048.2] [pid=377]   Just finished macro.  Running output (25): '/usr/local/nagios/libexec'
[1307617373.176675] [2048.2] [pid=377]   Processing part: '/check_ping -H '
[1307617373.176749] [2048.2] [pid=377]   Not currently in macro.  Running output (40): '/usr/local/nagios/libexec/check_ping -H '
[1307617373.176788] [2048.2] [pid=377]   Processing part: 'HOSTADDRESS'
[1307617373.176827] [2048.2] [pid=377]   macro_x[2] (HOSTADDRESS) match.
[1307617373.176854] [2048.2] [pid=377]   Processed 'HOSTADDRESS', Clean Options: 0, Free: 1
[1307617373.176874] [2048.2] [pid=377]   Processed 'HOSTADDRESS', Clean Options: 0, Free: 1
[1307617373.176894] [2048.2] [pid=377]   Cleaning options: global=0, local=0, effective=0
[1307617373.176915] [2048.2] [pid=377]   Uncleaned macro.  Running output (52): '/usr/local/nagios/libexec/check_ping -H My.Server.IP.Address'
[1307617373.176936] [2048.2] [pid=377]   Just finished macro.  Running output (52): '/usr/local/nagios/libexec/check_ping -H My.Server.IP.Address'
[1307617373.176971] [2048.2] [pid=377]   Processing part: ' -w 3000.0,80% -c 5000.0,100% -p 5'
[1307617373.176996] [2048.2] [pid=377]   Not currently in macro.  Running output (86): '/usr/local/nagios/libexec/check_ping -H My.Server.IP.Address -w 3000.0,80% -c 5000.0,100% -p 5'
[1307617373.177016] [2048.1] [pid=377]   Done.  Final output: '/usr/local/nagios/libexec/check_ping -H My.Server.IP.Address -w 3000.0,80% -c 5000.0,100% -p 5'
[1307617373.177036] [2048.1] [pid=377] **** END MACRO PROCESSING *************
[1307617373.177126] [016.1] [pid=377] Check result output will be written to '/usr/local/nagios/var/spool/checkresults/checkX3Szdk' (fd=8)
[1307617373.177237] [064.1] [pid=377] Making callbacks (type 14)...
[1307617373.177277] [064.2] [pid=377] Callback #1 (type 14) return code = 0
[1307617373.178927] [016.2] [pid=377] Host check is executing in child process (pid=9253)
[1307617373.182785] [001.0] [pid=9253] process_macros()
[1307617373.183092] [001.0] [pid=9253] process_macros()
[1307617373.183119] [001.0] [pid=9253] process_macros()
[1307617373.183792] [001.0] [pid=9253] process_macros()
[1307617373.183859] [001.0] [pid=9253] process_macros()
[1307617373.183888] [001.0] [pid=9253] process_macros()
[1307617373.196901] [016.1] [pid=377] Current/Max Attempt(s): 1/3
[1307617373.196944] [016.1] [pid=377] Host is UP, so we'll retry the service check...
[1307617373.197046] [001.0] [pid=377] process_macros()
[1307617373.197084] [2048.1] [pid=377] **** BEGIN MACRO PROCESSING ***********
[1307617373.197130] [2048.1] [pid=377] Processing: 'SERVICE ALERT: MyServer;Win Clock;$SERVICESTATE$;$SERVICESTATETYPE$;$SERVICEATTEMPT$;NTP OK: Offset 1 secs.
[1307617373.197170] [2048.2] [pid=377]   Processing part: 'SERVICE ALERT: MyServer;Win Clock;'
[1307617373.197228] [2048.2] [pid=377]   Not currently in macro.  Running output (31): 'SERVICE ALERT: MyServer;Win Clock;'
[1307617373.197251] [2048.2] [pid=377]   Processing part: 'SERVICESTATE'
[1307617373.197359] [2048.2] [pid=377]   macro_x[4] (SERVICESTATE) match.
[1307617373.197383] [2048.2] [pid=377]   Processed 'SERVICESTATE', Clean Options: 0, Free: 1
[1307617373.197417] [2048.2] [pid=377]   Processed 'SERVICESTATE', Clean Options: 0, Free: 1
[1307617373.197445] [2048.2] [pid=377]   Cleaning options: global=0, local=0, effective=0
[1307617373.197503] [2048.2] [pid=377]   Uncleaned macro.  Running output (38): 'SERVICE ALERT: MyServer;Win Clock;UNKNOWN'
[1307617373.197526] [2048.2] [pid=377]   Just finished macro.  Running output (38): 'SERVICE ALERT: MyServer;Win Clock;UNKNOWN'
[1307617373.197547] [2048.2] [pid=377]   Processing part: ';'
[1307617373.197566] [2048.2] [pid=377]   Not currently in macro.  Running output (39): 'SERVICE ALERT: MyServer;Win Clock;UNKNOWN;'
[1307617373.197586] [2048.2] [pid=377]   Processing part: 'SERVICESTATETYPE'
[1307617373.197608] [2048.2] [pid=377]   macro_x[42] (SERVICESTATETYPE) match.
[1307617373.197629] [2048.2] [pid=377]   Processed 'SERVICESTATETYPE', Clean Options: 0, Free: 1
[1307617373.197649] [2048.2] [pid=377]   Processed 'SERVICESTATETYPE', Clean Options: 0, Free: 1
[1307617373.197670] [2048.2] [pid=377]   Cleaning options: global=0, local=0, effective=0
[1307617373.197726] [2048.2] [pid=377]   Just finished macro.  Running output (43): 'SERVICE ALERT: MyServer;Win Clock;UNKNOWN;SOFT'
[1307617373.197747] [2048.2] [pid=377]   Processing part: ';'
[1307617373.197766] [2048.2] [pid=377]   Not currently in macro.  Running output (44): 'SERVICE ALERT: MyServer;Win Clock;UNKNOWN;SOFT;'
[1307617373.197787] [2048.2] [pid=377]   Processing part: 'SERVICEATTEMPT'
[1307617373.197808] [2048.2] [pid=377]   macro_x[6] (SERVICEATTEMPT) match.
[1307617373.197831] [2048.2] [pid=377]   Processed 'SERVICEATTEMPT', Clean Options: 0, Free: 1
[1307617373.197851] [2048.2] [pid=377]   Processed 'SERVICEATTEMPT', Clean Options: 0, Free: 1
[1307617373.197871] [2048.2] [pid=377]   Cleaning options: global=0, local=0, effective=0
[1307617373.197893] [2048.2] [pid=377]   Uncleaned macro.  Running output (45): 'SERVICE ALERT: MyServer;Win Clock;UNKNOWN;SOFT;1'
[1307617373.197911] [2048.2] [pid=377]   Just finished macro.  Running output (45): 'SERVICE ALERT: MyServer;Win Clock;UNKNOWN;SOFT;1'
[1307617373.197931] [2048.2] [pid=377]   Processing part: ';NTP OK: Offset 1 secs.'
[1307617373.197953] [2048.2] [pid=377]   Not currently in macro.  Running output (69): 'SERVICE ALERT: MyServer;Win Clock;UNKNOWN;SOFT;1;NTP OK: Offset 1 secs.'
[1307617373.197973] [2048.1] [pid=377]   Done.  Final output: 'SERVICE ALERT: MyServer;Win Clock;UNKNOWN;SOFT;1;NTP OK: Offset 1 secs.'
[1307617373.197994] [2048.1] [pid=377] **** END MACRO PROCESSING *************
[1307617373.198388] [064.1] [pid=377] Making callbacks (type 9)...
[1307617373.198523] [064.2] [pid=377] Callback #1 (type 9) return code = 0
[1307617373.198555] [001.0] [pid=377] handle_service_event()
[1307617373.198575] [064.1] [pid=377] Making callbacks (type 30)...
[1307617373.198619] [064.2] [pid=377] Callback #1 (type 30) return code = 0
[1307617373.198647] [001.0] [pid=377] run_global_service_event_handler()
[1307617373.198669] [001.0] [pid=377] check_for_external_commands()
[1307617373.198695] [016.1] [pid=377] Rescheduling next check of service at Thu Jun  9 08:03:45 2011
[1307617373.198715] [001.0] [pid=377] get_next_valid_time()
[1307617373.198734] [001.0] [pid=377] check_time_against_period()
[1307617373.198762] [001.0] [pid=377] schedule_service_check()
[1307617373.198783] [016.0] [pid=377] Scheduling a non-forced, active check of service 'Win Clock' on host 'MyServer' @ Thu Jun  9 08:03:45 2011
[1307617373.198837] [016.2] [pid=377] Scheduling new service check event.
[1307617373.198859] [001.0] [pid=377] reschedule_event()
[1307617373.198877] [001.0] [pid=377] add_event()
[1307617373.198900] [064.1] [pid=377] Making callbacks (type 8)...
[1307617373.198935] [064.2] [pid=377] Callback #1 (type 8) return code = 0
[1307617373.198990] [064.1] [pid=377] Making callbacks (type 20)...
[1307617373.199051] [064.2] [pid=377] Callback #1 (type 20) return code = 0
[1307617373.199075] [064.1] [pid=377] Making callbacks (type 13)...
[1307617373.199114] [064.2] [pid=377] Callback #1 (type 13) return code = 0
[1307617373.199171] [064.1] [pid=377] Making callbacks (type 20)...
[1307617373.199230] [064.2] [pid=377] Callback #1 (type 20) return code = 0
[1307617373.199275] [001.0] [pid=377] check_for_service_flapping()
[1307617373.199309] [016.1] [pid=377] Checking service 'Win Clock' on host 'MyServer' for flapping...
[1307617373.199331] [001.0] [pid=377] check_for_host_flapping()
[1307617373.199361] [016.1] [pid=377] Checking host 'MyServer' for flapping...
[1307617373.199414] [016.2] [pid=377] LFT=5.00, HFT=20.00, CPC=0.00, PSC=0.00%
[1307617373.199440] [016.1] [pid=377] Host is not flapping (0.00% state change).
[1307617373.199487] [016.1] [pid=377] Deleted check result file '/usr/local/nagios/var/spool/checkresults/creM40o'

I'm sure that there's only one instance of Nagios running (checked with "# ps faux", restarting Nagios and even rebooting the system).

I've already tried increasing the cscript timeout ("cscript.exe //T:XXX ..." in "NSC.ini") and the NRPE timeout ("...check_nrpe -t XXX..." in "commands.cfg"), but the problem remains.

I updated NSClient++ to its latest version (0.3.9) and even tried the latest nightly build (0.4.0.111), but the problem remains.

The NSClient++ logs show this when the UNKNOWN response occurs:

YYY-MM-DD hh:mm:ss: error:modules\CheckExternalScripts\CheckExternalScripts.cpp:214: The command (cscript.exe) returned an invalid return code: 128
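
To see how often this happens and with which codes, the log can be scanned for that message. The Python sketch below assumes only the message wording shown in the line above; `INVALID_RC_RE` and `invalid_return_codes` are hypothetical names, not part of NSClient++.

```python
import re

# Matches the wording of the error line quoted in this ticket.
INVALID_RC_RE = re.compile(
    r"The command \((?P<command>[^)]+)\) returned an invalid return code: "
    r"(?P<code>-?\d+)"
)


def invalid_return_codes(lines):
    """Return a list of (command, code) pairs, one per matching log line."""
    hits = []
    for line in lines:
        match = INVALID_RC_RE.search(line)
        if match:
            hits.append((match.group("command"), int(match.group("code"))))
    return hits


SAMPLE = [
    r"YYY-MM-DD hh:mm:ss: error:modules\CheckExternalScripts\CheckExternalScripts.cpp:214: "
    r"The command (cscript.exe) returned an invalid return code: 128",
]

if __name__ == "__main__":
    print(invalid_return_codes(SAMPLE))
```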

Does anyone know why this could be happening? I can't understand why Nagios Core reports an "Unknown" state for my "Win Clock" service on host "MyServer" when my script never returns an exit status of "3".

Please tell me if more information (such as "commands.cfg" or anything else) is needed to help me solve this issue.

Thanks in advance!

P.S.: Sorry for my poor English.

Attachments (1)

nagios.debug.txt (66.4 KB) - added by BiFo 14 months ago.


Change History (2)

comment:1 Changed 14 months ago by BiFo

It's not a permissions issue, because I've given FULL PERMISSIONS to EVERYONE on the whole Nagios folder (my scripts included) and the problem remains.
Using the debugger, the log file shows the following when the error occurs:

2012-03-18 11:03:03: debug:NSClient++.cpp:1144: Injecting: check_windows_time:
2012-03-18 11:03:08: error:modules\CheckExternalScripts\CheckExternalScripts.cpp:214: The command (cscript.exe) returned an invalid return code: 128
2012-03-18 11:03:08: debug:NSClient++.cpp:1180: Injected Result: WARNING 'NTP WARNING: Offset 136 secs.'
2012-03-18 11:03:08: debug:NSClient++.cpp:1181: Injected Performance Result:

I've also found these two threads discussing the same issue:

Both are still unsolved.

To test whether my code is the problem, I created a very simple intermediate script between my VBScript and NSClient++: a batch script that calls the VBScript and works around that nasty "128" exit code when it happens:

@echo off
C:\WINDOWS\system32\cscript.exe //T:30 //NoLogo check_time.vbs /S www.xxx.yyy.zzz /W 120 /C 300
REM "IF ERRORLEVEL n" is true when the exit code is >= n, so test the highest values first.
IF ERRORLEVEL 128 GOTO SKIP
IF ERRORLEVEL 3 exit /b 3
IF ERRORLEVEL 2 exit /b 2
IF ERRORLEVEL 1 exit /b 1
exit /b 0

REM Exit code 128 was returned, cause unknown (NSClient++ bug with no fix as of 18/03/2012). Return exit code 0 (OK).
:SKIP
exit /b 0

It simply exits with the same exit code that the VBScript would return; however, when 128 happens, it exits with status 0 (OK).
The 128 error code still appears in the NSClient++ logs, and the UNKNOWN status still shows up at random...
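
One detail worth keeping in mind with this kind of wrapper: in cmd.exe, `IF ERRORLEVEL n` is true when the exit code is greater than or equal to n, so the order of the tests decides which branch runs. The Python sketch below only models that rule; `first_match` and the two chains are hypothetical names made up for illustration.

```python
def first_match(exit_code, chain):
    """Model a chain of `IF ERRORLEVEL n exit /b value` lines.

    `chain` is a list of (n, value) pairs in file order. As in cmd.exe, a test
    is true when exit_code >= n, and the first true test wins.
    """
    for threshold, value in chain:
        if exit_code >= threshold:
            return value
    return None


# 128 is handled first in both chains and is mapped to 0 (OK).
ZERO_TESTED_FIRST = [(128, 0), (0, 0), (1, 1), (2, 2), (3, 3)]
HIGHEST_TESTED_FIRST = [(128, 0), (3, 3), (2, 2), (1, 1), (0, 0)]

if __name__ == "__main__":
    for code in (0, 1, 2, 3, 128):
        print(
            code,
            first_match(code, ZERO_TESTED_FIRST),
            first_match(code, HIGHEST_TESTED_FIRST),
        )
```

When 0 is tested right after 128, every exit code collapses to 0; testing from the highest value down preserves 1, 2 and 3.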

Attached you can find some logs taken from the "/usr/local/nagios/var/nagios.debug" file (at maximum debug level) from the moment it starts...

Bye!

Last edited 14 months ago by BiFo

Changed 14 months ago by BiFo
