#529 closed enhancement (fixed)

add alert severity to real-time event log monitoring

Reported by: mikep Owned by: mickem
Priority: 1 Milestone: 0.4.1
Component: CheckEventlog Version: 0.4.0
Severity: Feature Requests Keywords:
Cc:

Description

Hello. I believe it would be very valuable to add the ability to specify what alert level will be sent when a real-time event log filter matches.

This would allow greater control over the state returned to Nagios and will allow for resolved problems to return an OK status immediately.

I'm thinking that maybe you can add a parameter like AlertSeverity that could be specified in the config section of each filter.

Maybe something like the following?

[/settings/eventlog/real-time/filters]
Test App1=AlertSeverity = 0 id = 1000 AND source = 'Test App1'
Test App1=AlertSeverity = 1 id = 1100 AND source = 'Test App1'
Test App1=AlertSeverity = 2 id = 1200 AND source = 'Test App1'


I'm not sure if that is the best example of how it could work, but I think it shows that I want to attach different severities to different filters for the same service.

Thanks!

Change History (13)

comment:1 Changed 13 months ago by mickem

Not sure I follow... I would hazard a guess though:

[/settings/eventlog/real-time/filters/test app 1 warning]
state=warning
filter=id = 1000 AND source = 'Test App1'
[/settings/eventlog/real-time/filters/test app 1 error]
state=error
filter=id = 1000 AND source = 'Test App1'

Question is if this becomes too complicated to configure though... perhaps better to have some other mechanism?

Michael Medin

comment:2 Changed 13 months ago by mikep

Yes, I'm not clear on the best way to have the configuration settings work. But I stand by the great value of the capability.

With a normal polling monitor, like CPU usage, we can define warning and critical values. When it is run on its interval, you always get back an ok, warn, or crit. This enables it to reset its state, if the value changes before the next poll interval.

With real-time monitoring, I want to have a similar capability in real-time.

For example, I have an application that connects to a DB.

If that connection takes longer than 5 seconds, I write a specific event message to the event log with an id of 1100. The app is working, but the delay could impact the user experience.

If the connection completely fails, I write a specific event message to the event log with a different id of 1200. This means I need to address it immediately.

If the app can connect to the DB successfully again, I write an event message with an id of 1000. This tells me that everything is working again.

To reflect the health of the app in Nagios, I have created a service named 'Test App1'.

So, I would like to be able to create nscp real-time filters that send back ok, warn, or crit for my service, based on matching the id 1000, 1100, or 1200 respectively.
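As a rough illustration of the behaviour described above (not how nscp implements it), the mapping can be sketched in Python. The names `STATE_BY_EVENT_ID` and `current_state` are hypothetical; the event ids and states are the ones from this comment, and the sketch assumes the most recent matching event decides the service state.

```python
# Hypothetical model of the request: one service ('Test App1'), three event ids,
# each mapped to the state that should be reported when it is seen.
STATE_BY_EVENT_ID = {
    1000: "OK",        # DB connection works again
    1100: "WARNING",   # connection took longer than 5 seconds
    1200: "CRITICAL",  # connection failed completely
}


def current_state(event_ids, initial="OK"):
    """Return the service state after processing events in order.

    The last matching event wins; ids without a mapping leave the state unchanged.
    """
    state = initial
    for event_id in event_ids:
        state = STATE_BY_EVENT_ID.get(event_id, state)
    return state


# A slow connection, then a failure, then a recovery.
print(current_state([1100, 1200, 1000]))
```

The point of the sketch is the last line: once the recovery event (id 1000) arrives, the service returns to OK immediately instead of waiting for a poll interval.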

I haven't looked closely at how you have implemented the real-time filters, but I had thought you were creating a list. It seems like that would still work, since the different state filters for a single service would not have identical filter values.

Thanks.

mikep

comment:3 Changed 12 months ago by mickem

This is what I am looking at now (ish):

...
; Definition for real time filter: default
[/settings/eventlog/real-time/filters/default]

; DESTINATION - The destination for intercepted messages
destination = nsca

; OK MESSAGE - This is the message sent periodically whenever no error is discovered.
ok message = eventlog found no records

; SYNTAX - Format string for dates
syntax = hello world: %message%


; A set of filters to use in real-time mode
[/settings/eventlog/real-time/filters/test_1]

; DESTINATION - The destination for intercepted messages
destination = nsca_server_1

; FILTER - The filter to match
filter = id = 1001 and category = 1

; SYNTAX - Format string for dates
syntax = hello world: %message%


; Definition for real time filter: default
[/settings/eventlog/real-time/filters/test_2]

; DESTINATION - The destination for intercepted messages
destination = nsca_server_2

; FILTER - The filter to match
filter = id = 1002 and category = 1

severity = WARNING

; A set of filters to use in real-time mode
[/settings/eventlog/real-time/filters]

test3 = id = 1003 and category = 1
...

comment:4 Changed 12 months ago by mikep

Hi Michael,

I'm unclear on the format you have included. Please tell me if the following config would have the effect I list below.

[/settings/eventlog/real-time/filters]
cart_checkout=severity = OK id = 1000 AND source = 'Shopping Basket'
cart_checkout=severity = WARNING id = 1100 AND source = 'Shopping Basket'
cart_checkout=severity = CRITICAL id = 1200 AND source = 'Shopping Basket'

In Nagios I have a service named "cart_checkout" to show the health of the checkout function of my shopping basket web app.

When a user checks out, the app will record a Windows event log based on the following.

A successful checkout will have an EventId = 1000 and a source of "Shopping Basket"

A slow checkout will have an EventId = 1100 and a source of "Shopping Basket"

A failed checkout will have an EventId = 1200 and a source of "Shopping Basket"

I want the nscp real-time monitor to make an NSCA call that tells Nagios the status of the service "cart_checkout", with a state of OK, WARNING, or CRITICAL (0, 1, 2) for the respective event id found in the event log entry.

What confuses me about your example is how the service name gets set.

In your example config file, using my example, would I do the following?

[/settings/eventlog/real-time/filters/cart_checkout_ok]
cart_checkout=id = 1000 AND source = 'Shopping Basket'
severity = OK 

[/settings/eventlog/real-time/filters/cart_checkout_warn]
cart_checkout=id = 1100 AND source = 'Shopping Basket'
severity = WARNING 

[/settings/eventlog/real-time/filters/cart_checkout_crit]
cart_checkout=id = 1200 AND source = 'Shopping Basket'
severity = CRITICAL 

Please clarify. Thanks!

comment:5 Changed 12 months ago by mikep

Last edited 12 months ago by mikep

comment:6 Changed 12 months ago by mickem

Almost...

[/settings/eventlog/real-time/filters/cart_checkout_ok]
filter=id = 1000 AND source = 'Shopping Basket'
severity = OK 

[/settings/eventlog/real-time/filters/cart_checkout_warn]
filter=id = 1100 AND source = 'Shopping Basket'
severity = WARNING 

[/settings/eventlog/real-time/filters/cart_checkout_crit]
filter=id = 1200 AND source = 'Shopping Basket'
severity = CRITICAL

If you grab the 0.4.1 build I did last night you should be able to try it out.
Notice it is unstable so don't expect too much :)
I need to extend the unit tests to support this (as well as the reworked socket handling).

Michael Medin

comment:7 Changed 12 months ago by mikep

Ok, I'll go grab your latest build and test the current functionality. The last piece of the puzzle that I'm having a hard time understanding is how I set the name of the service. For example, a normal NSCA message looks like this in your debug output.

Sending (data): host: server01, service: cart_checkout, code: 1, time: 1337686091, result: warning cart_checkout is slow

How do I set the "service:" value in your config files? I see that you replaced "cart_checkout" with "filter" in your correction to my example above. With the current 0.4.0 real-time functionality, I set the "service:" value by including the name of the service "cart_checkout" in the filter definition line as I did above. So I'm confused why you changed it to "filter". How would I make your example above return "service: cart_checkout" in the NSCA message?

Thanks!
mikep

comment:8 Changed 12 months ago by mickem

Ahh... sorry...

This is the "same concept" as I have for servers and what not.
So the following are equivalent:

[/settings/eventlog/real-time/filters]
foo=id = 1200
[/settings/eventlog/real-time/filters/foo]
filter=id = 1200

The benefit of the first is simplicity, i.e. alias=filter, and the benefit of the second is precision, at the cost of having to type more.

So the "service name" in that case comes from the [.../SERVICE NAME] section name.
You can also (given that I like flexibility) override the alias as well, but let's not get into that...
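To make the equivalence concrete, here is a small Python sketch that normalises both forms into the same structure. It is only an illustration using the standard `configparser` module, not the parser NSClient++ itself uses; `FILTERS` and `expand_shorthand` are made-up names.

```python
import configparser

FILTERS = "/settings/eventlog/real-time/filters"


def expand_shorthand(ini_text):
    """Return {alias: settings} for both the short and the long form."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    result = {}
    for section in parser.sections():
        if section == FILTERS:
            # Short form: each key is an alias, each value a filter expression.
            for alias, expr in parser[section].items():
                result.setdefault(alias, {})["filter"] = expr
        elif section.startswith(FILTERS + "/"):
            # Long form: the alias is the part of the section name after the prefix.
            alias = section[len(FILTERS) + 1:]
            result.setdefault(alias, {}).update(parser[section])
    return result


short_form = "[/settings/eventlog/real-time/filters]\nfoo=id = 1200\n"
long_form = "[/settings/eventlog/real-time/filters/foo]\nfilter=id = 1200\n"

# Both forms normalise to {'foo': {'filter': 'id = 1200'}}.
print(expand_shorthand(short_form) == expand_shorthand(long_form))
```

In both cases the alias (and therefore the default service name) is `foo`; only the long form leaves room for extra keys next to `filter`.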

Michael Medin

comment:9 Changed 12 months ago by mikep

Ok, here are my findings with 0.4.1.1. I think there may be some bugs and I'm not sure the current implementation meets the requirements, at least with the config file I used. Please let me know if I am still not using the config file correctly.

This was my initial config:

[/settings/eventlog/real-time/filters/default]
enabled=true
maximum age=5m
destination=NSCA
syntax=hello world: %message%
ok message = eventlog found no records

[/settings/eventlog/real-time/filters/cart_checkout_ok]
filter=id = 100 AND source = 'Shopping Basket'
severity = OK 

[/settings/eventlog/real-time/filters/cart_checkout_warn]
filter=id = 110 AND source = 'Shopping Basket'
severity = WARNING 

[/settings/eventlog/real-time/filters/cart_checkout_crit]
filter=id = 120 AND source = 'Shopping Basket'
severity = CRITICAL

I received the error:

e rvice\NSClient++.cpp:1211 No one listens for events from:  ()
e og\CheckEventLog.cpp:82   Failed to submit evenhtlog result: Missing response from submission

It appears the config isn't picking up the destination from the default section.

So I added destinations to my config.

[/settings/eventlog/real-time/filters/default]
enabled=true
maximum age=5m
destination=NSCA
syntax=hello world: %message%
ok message = eventlog found no records

[/settings/eventlog/real-time/filters/cart_checkout_ok]
filter=id = 100 AND source = 'Shopping Basket'
severity = OK 
destination=NSCA

[/settings/eventlog/real-time/filters/cart_checkout_warn]
filter=id = 110 AND source = 'Shopping Basket'
severity = WARNING 
destination=NSCA

[/settings/eventlog/real-time/filters/cart_checkout_crit]
filter=id = 120 AND source = 'Shopping Basket'
severity = CRITICAL
destination=NSCA

This gets me an NSCA response, but the status code isn't as expected.

I use the following command to create an event to match the warning filter.

eventcreate /ID 110 /L APPLICATION /SO "Shopping Basket" /D "App is slow" /T WARNING

This gets the nsca message.

d lient\NSCAClient.cpp:417  Sending (data): host: orvomdev01, service: cart_checkout_warn, code: 3, time: 1339029020, result: hello world: App is slow

This means that a service named "cart_checkout_warn" will have a status code of UNKNOWN (3=UNKNOWN in Nagios).

1) I believe the intended functionality is for the "code:" value to be set to a value that matches the severity value I set in the config (code: 1 in this case). i.e. OK=0, WARNING=1, CRITICAL=2, UNKNOWN=3

2) This config format still doesn't help me in a real-life scenario. Using this config, I'm actually defining filters for 3 different Nagios services (cart_checkout_ok, cart_checkout_warn, cart_checkout_crit) instead of defining 3 different state filters (OK, WARNING, CRITICAL) for the same service "cart_checkout". The intent of this capability is to allow the real-time filter to provide different status codes for different event messages related to the same service. The way I configured this test doesn't allow that result.

I'm guessing item 1 is just an oversight with a simple fix in the code. I'm not clear on how your current config format addresses item 2. Please clarify.
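For reference, the mapping expected in item 1 can be written out as a short Python sketch. This only restates the Nagios return codes listed above; `NAGIOS_CODES` and `severity_to_code` are hypothetical names and say nothing about how CheckEventLog does the conversion internally.

```python
# Nagios plugin return codes, as listed in item 1.
NAGIOS_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def severity_to_code(severity):
    """Map a configured severity string to a Nagios return code.

    Surrounding whitespace and case are ignored; unrecognised names map to UNKNOWN (3).
    """
    return NAGIOS_CODES.get(severity.strip().upper(), NAGIOS_CODES["UNKNOWN"])


# The value from the cart_checkout_warn section (note the trailing space in the config).
print(severity_to_code("WARNING "))
```

With this mapping, the event created by the eventcreate command above would be submitted with code 1 rather than code 3.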

Thanks!

mikep

comment:10 Changed 12 months ago by mickem

Ahh, sorry...
I was a bit quick there.
I have fixed a few issues with the new filters in the latest build and added test cases so they actually work :)
In the next build there will be a new (untested) keyword "command".
The command sets the "service name" (I call it command since I don't like to change terminology for passive checks). If it is not set, the name is taken from the alias. Sorry for forgetting it last time around...

So you should be able to do:

[/settings/eventlog/real-time]
maximum age=5m
enabled=true

[/settings/eventlog/real-time/filters/default]
destination=NSCA
syntax=hello world: %message%
ok message = eventlog found no records
command=cart_checkout

[/settings/eventlog/real-time/filters/cart_checkout_ok]
filter=id = 100 AND source = 'Shopping Basket'
severity = OK 

[/settings/eventlog/real-time/filters/cart_checkout_warn]
filter=id = 110 AND source = 'Shopping Basket'
severity = WARNING 

[/settings/eventlog/real-time/filters/cart_checkout_crit]
filter=id = 120 AND source = 'Shopping Basket'
severity = CRITICAL

Also note that maximum age and enabled are not set on the filter level (they are set on the real-time level, as they are global).

Michael Medin

comment:11 Changed 12 months ago by mickem

As a side note for you and/or other people with more complex scenarios: the "default" filter is a standard feature which is a bit magical for simplicity. If you do not specify a parent, you always get default if it exists.

In reality, filters are a tree structure where you can have multiple parents.
So if you have more than one of these scenarios, you can easily achieve them using multiple "templates" like so (ish):

[/settings/eventlog/real-time/filters/default]
destination=NSCA
syntax=hello world: %message%
ok message = eventlog found no records

[/settings/eventlog/real-time/filters/shopping_cart]
command=cart_checkout
is template=true
parent=default

[/settings/eventlog/real-time/filters/cart_checkout_ok]
filter=id = 100 AND source = 'Shopping Basket'
severity = OK
parent=shopping_cart

[/settings/eventlog/real-time/filters/cart_checkout_warn]
filter=id = 110 AND source = 'Shopping Basket'
severity = WARNING 
parent=shopping_cart

[/settings/eventlog/real-time/filters/cart_checkout_crit]
filter=id = 120 AND source = 'Shopping Basket'
severity = CRITICAL
parent=shopping_cart

[/settings/eventlog/real-time/filters/foo_bar]
command=foo_bar
is template=true
parent=default

[/settings/eventlog/real-time/filters/foo_bar_ok]
filter=id = 100 AND source = 'Foo Bar'
severity = OK
parent=foo_bar

[/settings/eventlog/real-time/filters/foo_bar_warn]
filter=id = 110 AND source = 'Foo Bar'
severity = WARNING 
parent=foo_bar

[/settings/eventlog/real-time/filters/foo_bar_crit]
filter=id = 120 AND source = 'Foo Bar'
severity = CRITICAL
parent=foo_bar
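One way to picture how such templates combine is the Python sketch below. It is a simplified model, not the NSClient++ implementation: it assumes each filter has at most one explicit parent, that child keys override inherited ones, and that "default" is used when no parent is given. The `resolve` function and the dictionary layout are made up for illustration.

```python
def resolve(filters, name):
    """Merge settings along the parent chain; keys set on a child override its parents."""
    chain = []
    seen = set()
    current = name
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(current)
        parent = filters.get(current, {}).get("parent")
        if parent is None and current != "default" and "default" in filters:
            parent = "default"  # implicit parent when none is specified
        current = parent
    merged = {}
    for section_name in reversed(chain):  # apply root first, leaf last
        merged.update(filters.get(section_name, {}))
    # Bookkeeping keys are not part of the effective filter settings.
    merged.pop("parent", None)
    merged.pop("is template", None)
    return merged


filters = {
    "default": {"destination": "NSCA", "syntax": "hello world: %message%"},
    "shopping_cart": {"command": "cart_checkout", "is template": "true", "parent": "default"},
    "cart_checkout_warn": {
        "filter": "id = 110 AND source = 'Shopping Basket'",
        "severity": "WARNING",
        "parent": "shopping_cart",
    },
}

effective = resolve(filters, "cart_checkout_warn")
print(effective["command"], effective["destination"], effective["severity"])
```

Under these assumptions the warn filter ends up with the command from its template and the destination from default, so all three cart_checkout_* filters report to the same service.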

comment:12 Changed 12 months ago by mikep

This is very good! I just performed some quick tests and it appears to work well. I will work on larger tests this weekend and provide you feedback as I gather it.

Thanks for your excellent work!

mikep

comment:13 Changed 12 months ago by mickem

  • Resolution set to fixed
  • Status changed from new to closed

Setting this to resolved
