Nagios CMDB Closed Loop Integration

Introduction

Nagios is a popular event management application. With its scriptable commands and plugins, it can be customized to perform almost any action when a monitor alert is triggered. This document demonstrates how to communicate directly with the HEAT LiveTime CMDB and update CIs in real time, with bi-directional feeds between Nagios and HEAT LiveTime.

When a check fails and the monitor alerts, a common response is to create an incident while the failing service is investigated. With the Nagios event handler capability and the HEAT LiveTime inbound web services, this is simple to set up.

Additionally, with the HEAT LiveTime outbound web services and runbook automation, the event handler can be paused and then re-enabled once the incident has been dealt with. This means a Nagios alert creates just one request, and new requests are only created after the issue with the service has been resolved. This is especially useful for services that intermittently fail and trigger alerts on the Nagios monitors.

Furthermore, if the CMDB in HEAT LiveTime is configured correctly, it is possible to identify the item in question, take it offline in HEAT LiveTime, and restore it once the incident has been resolved.

Operational Workflow

  1. Service fails and alerts in Nagios
  2. The Nagios event handler calls a PHP script
  3. The PHP script parses the host and service names passed by Nagios and, using the HEAT LiveTime inbound web services, determines the relevant item in the CMDB. If no item related to the host is found for the service, the script falls back to the default item configured in the script.
  4. An incident is raised against the item established in step 3
  5. If an exact item match is found in step 3, the item’s status is set to offline.
  6. If the incident is created successfully, the event handler for that service is disabled in Nagios via a web request using the host and service parsed in step 3. This prevents further requests from being created.
  7. The Nagios check that triggered this is recorded as the first note on the incident.
  8. Once the issue is resolved, the incident is moved into a status which has an outbound web services script attached to it.
  9. The outbound script reads the first note on the incident to establish which Nagios check was involved, and re-enables the event handler for it in Nagios via a web request.
  10. If an item was identified in step 3, the script also sets the item on the incident back to its available status.
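Steps 6, 7 and 9 depend on remembering which Nagios check raised the incident. A minimal Java sketch of that round trip follows (Java being the language HEAT LiveTime listener classes use). The class name NagiosCheckNote and the note format ‘Nagios check: <host>;<service>’ are hypothetical; the provided scripts may record the check differently. The Nagios side is expressed with the DISABLE_SVC_EVENT_HANDLER and ENABLE_SVC_EVENT_HANDLER external commands, which Nagios Core provides for turning a service’s event handler off and on. The provided scripts deliver this via a web request rather than building these strings directly.

```java
import java.time.Instant;

// Hypothetical helper for workflow steps 6, 7 and 9 (not part of the provided scripts).
final class NagiosCheckNote {
    // Assumed prefix for the incident's first note; the real scripts may differ.
    static final String PREFIX = "Nagios check: ";

    private NagiosCheckNote() {}

    // Step 7: text recorded as the first note on the incident.
    static String toNote(String host, String service) {
        return PREFIX + host + ";" + service;
    }

    // Step 9: recover {host, service} from the first note, or null if it does not match.
    static String[] parseNote(String note) {
        if (note == null || !note.startsWith(PREFIX)) {
            return null;
        }
        String[] parts = note.substring(PREFIX.length()).split(";", 2);
        return parts.length == 2 ? parts : null;
    }

    // Steps 6 and 9: a Nagios external command line such as
    // "[1700000000] DISABLE_SVC_EVENT_HANDLER;Windows 101;FTP".
    static String externalCommand(boolean enable, String host, String service, long epochSeconds) {
        String name = enable ? "ENABLE_SVC_EVENT_HANDLER" : "DISABLE_SVC_EVENT_HANDLER";
        return "[" + epochSeconds + "] " + name + ";" + host + ";" + service;
    }

    // Same as above, stamped with the current time.
    static String externalCommandNow(boolean enable, String host, String service) {
        return externalCommand(enable, host, service, Instant.now().getEpochSecond());
    }
}
```

The timestamped line is the form Nagios accepts on its external command file; whether it is delivered that way or through a web request is left to the surrounding script.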

Setup

For the purposes of this document, the Nagios XI virtual machine was used; it is available for download from the Nagios website. If Nagios is already set up and running, skip to the last step of these instructions, as installing php-soap is the most important step here.

Once the virtual machine is running, the admin pages are at:
http://host/nagiosxi/

The user pages are at:
http://host/nagios/

To avoid having to rely on the virtual machine’s console, the firewall needs to be opened to allow access from the local network. For example, to allow TCP access on all ports from a 192.168.1.0/24 network, log in as root (the default password is nagiosxi) and run:

iptables -I INPUT -p tcp -s 192.168.1.0/24 -j ACCEPT
iptables-save
service iptables save

Some checks then need to be added via the admin pages. For the example in this document, we add an FTP check to a Windows server named Windows 101.

If you want to use the auto-discovery wizard to add checks quickly (it lets you pick which machines to monitor once it completes), traceroute needs to be installed first.

To do this, log in as root and run:

yum install traceroute

To copy files to and from the virtual machine, one option is scp, which requires the OpenSSH client package. To install it, run:

yum install openssh-clients

Finally, PHP is already installed on the virtual machine, but for the web services calls to work, php-soap also needs to be installed. To install it, run:

yum install php-soap

Creating the Request in HEAT LiveTime

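A web services script performs this step, and it is copied to the Nagios server. Nagios XI already has a scripts folder, so in our example we copied the script to:

/usr/local/nagiosxi/scripts/lt_nagios_create_request.php

Because Nagios runs this file directly as the event handler command, it needs to be executable by the nagios user (for example, set with chmod +x).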

In the Nagios XI admin pages, go to Configure > Core Configuration Manager (http://192.168.1.125/nagiosxi/config/nagioscorecfg/).
Select Commands > Commands > Add New.
Enter the following, make sure Active is ticked, and click Save:

Command:
lt-create-request

Command Line:
/usr/local/nagiosxi/scripts/lt_nagios_create_request.php $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTNAME$ $SERVICEDESC$

Command Type:
Misc

To complete this addition, click Apply Configuration.
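The lt-create-request command passes five macros to the script as positional arguments. A hedged Java sketch of how a handler might read them and decide whether to act follows. EventHandlerArgs is a hypothetical name, and acting only on a HARD CRITICAL state is an assumption (a common Nagios event handler policy that ignores transient SOFT failures), not necessarily what the provided PHP script does. The sketch also assumes each macro arrives as exactly one argument; if a value can contain spaces, such as a host named ‘Windows 101’, the macro would need to be quoted in the command line for that to hold.

```java
// Hypothetical parser for: $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTNAME$ $SERVICEDESC$
final class EventHandlerArgs {
    final String state;     // OK, WARNING, UNKNOWN or CRITICAL
    final String stateType; // SOFT or HARD
    final int attempt;      // current check attempt number
    final String host;      // e.g. "Windows 101"
    final String service;   // e.g. "FTP"

    private EventHandlerArgs(String state, String stateType, int attempt, String host, String service) {
        this.state = state;
        this.stateType = stateType;
        this.attempt = attempt;
        this.host = host;
        this.service = service;
    }

    // Assumes each macro arrives as exactly one argument.
    static EventHandlerArgs parse(String[] args) {
        if (args.length != 5) {
            throw new IllegalArgumentException("expected 5 arguments, got " + args.length);
        }
        return new EventHandlerArgs(args[0], args[1], Integer.parseInt(args[2]), args[3], args[4]);
    }

    // Assumed policy: raise a request only once the failure is confirmed (HARD CRITICAL).
    boolean shouldRaiseRequest() {
        return "CRITICAL".equals(state) && "HARD".equals(stateType);
    }
}
```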

Next, under Monitoring, click Services.

Find the service that needs to create a request on alert (in our example, the Config Name is ‘Windows 101’ and the Service Name is ‘FTP’) and click the tools icon to edit it.

On the Check Settings tab, use the Event handler drop-down to select lt-create-request, and set the enabled option to On.

Click Save, then click Apply Configuration to complete this modification.

Configuring the HEAT LiveTime CMDB

There are a number of ways to make sure the correct item is used. In our example PHP script, both the host and the Nagios service being tested are items in HEAT LiveTime.

For instance, you might have a server numbered ‘Windows 101’ as the host, with an FTP service running on it. For this service we created an item numbered ‘FTP101’ of type ‘FTP’, and related the two items so that ‘FTP101’ is installed on ‘Windows 101’.

The number of the service item itself doesn’t matter, but in our example the number of the host item does: it must match the Nagios host name, otherwise incidents are raised against the default item.

The example script first attempts to locate the item numbered the same as the host, ‘Windows 101’ and then looks for a related item of the type matching the service being checked, ‘FTP’. It finds the item ‘FTP101’ so it picks that item for the new incident.

If it didn’t find an FTP-type item related to the host, it would raise the incident against the default item.

The script also sets the item to an offline status, but only if it finds an exact match for the service; otherwise the item’s status is left unchanged. This is simply how the example script behaves and can be changed as required.

Also, for demonstration purposes, the example script does not check the relationship type. If the CMDB in question contains other relationships, the script could readily be amended to consider only services installed on the host, rather than also those used by the host.
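The item lookup described above can be sketched in Java on a small in-memory model. The nested Item, Relationship and Selection records and the ItemSelector class are hypothetical stand-ins, not the HEAT LiveTime web services API. The sketch finds the host item by number, looks for a related item whose type equals the Nagios service name, and otherwise returns the default item. The exactMatch flag mirrors the rule that only an exact match is taken offline, and the optional requiredKind argument shows the stricter relationship check suggested above; passing null keeps the example script’s lenient behavior.

```java
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the lookup; not the HEAT LiveTime web services API.
final class ItemSelector {
    record Item(String number, String type) {}

    // child <kind> parent, e.g. FTP101 "installed on" Windows 101
    record Relationship(String child, String kind, String parent) {}

    // exactMatch == true is the only case in which the item is taken offline.
    record Selection(String itemNumber, boolean exactMatch) {}

    private final Map<String, Item> itemsByNumber;
    private final List<Relationship> relationships;
    private final String defaultItem;

    ItemSelector(Map<String, Item> itemsByNumber, List<Relationship> relationships, String defaultItem) {
        this.itemsByNumber = itemsByNumber;
        this.relationships = relationships;
        this.defaultItem = defaultItem;
    }

    // requiredKind == null accepts any relationship type, like the example script.
    Selection select(String host, String service, String requiredKind) {
        if (itemsByNumber.containsKey(host)) { // host item must be numbered like the Nagios host
            for (Relationship r : relationships) {
                if (!r.parent().equals(host)) {
                    continue; // relationship does not involve this host
                }
                if (requiredKind != null && !requiredKind.equals(r.kind())) {
                    continue; // stricter mode: wrong relationship type
                }
                Item candidate = itemsByNumber.get(r.child());
                if (candidate != null && candidate.type().equals(service)) {
                    return new Selection(candidate.number(), true);
                }
            }
        }
        return new Selection(defaultItem, false); // fall back to the configured default item
    }
}
```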

Re-activating the Event Handler

The HEAT LiveTime outbound web services can be used to call a script that reactivates the event handler in Nagios once the request has reached a status indicating that the service issue has been resolved.

Additionally, the script can call back to the HEAT LiveTime inbound web services to set the item’s status back to online, if an exact item was identified and set offline by the original script that created the request.

Because HEAT LiveTime is a Java application, the classes behind its outbound web services need to be written in Java.

For best performance, the code that performs these actions should ideally be written entirely in Java. However, for this demonstration, it is enough to write a small amount of Java that calls an external command, which can then be written in any language. In our case, we call a PHP script so that we can also reuse some of the code from the initial script.

The Java code passes the status change involved (i.e. whether the status was Entered or Exited), the request number and the item number on the request to the PHP script as command line arguments. The PHP script then reads these and performs the required actions.

To install the provided code, skip to step 3 below and copy the included jar file to the location specified. If you are making changes to the Java code, follow all of the steps below:

1. To amend the code, copy the following file into the folder containing lt_nagios_resolved.java:

{HEAT LiveTime Install Folder}/LiveTime.woa/Contents/Resources/Java/livetime-listen.jar

2. After making the required changes, compile the class against that jar and package it:

javac -classpath livetime-listen.jar lt_nagios_resolved.java
jar cf lt_nagios_resolved.jar lt_nagios_resolved.class

3. Copy the resulting lt_nagios_resolved.jar file to the following location on the HEAT LiveTime server:

{HEAT LiveTime Install Folder}/LiveTime/WEB-INF/lib/

The HEAT LiveTime application server then needs to be restarted to pick up the new jar file.

By default, the Java code calls ‘php /root/lt_nagios_resolved.php’, so php and php-soap need to be installed on the HEAT LiveTime server, and lt_nagios_resolved.php needs to be copied to the /root/ folder.
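The Java-to-PHP hand-off can be sketched with the standard ProcessBuilder class. The sketch only assumes the command line described above (php /root/lt_nagios_resolved.php followed by the status change, request number and item number). How a HEAT LiveTime listener class receives those values is not covered, since that interface comes from the LiveTime jar and is not shown here. ResolvedCommandRunner is a hypothetical name, and using an empty string when no item was identified is an assumption.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: builds and runs
//   php /root/lt_nagios_resolved.php <Entered|Exited> <requestNumber> <itemNumber>
final class ResolvedCommandRunner {
    static final String PHP = "php";
    static final String SCRIPT = "/root/lt_nagios_resolved.php";

    private ResolvedCommandRunner() {}

    static List<String> buildCommand(String statusChange, String requestNumber, String itemNumber) {
        if (!"Entered".equals(statusChange) && !"Exited".equals(statusChange)) {
            throw new IllegalArgumentException("statusChange must be Entered or Exited");
        }
        // Assumption: an empty argument keeps the position when no item was identified.
        String item = itemNumber == null ? "" : itemNumber;
        return List.of(PHP, SCRIPT, statusChange, requestNumber, item);
    }

    // Runs the command, waits up to timeoutSeconds, and returns the exit code (-1 on timeout).
    static int run(List<String> command, long timeoutSeconds) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // reuse the JVM's standard streams for the script's output
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            return -1;
        }
        return process.exitValue();
    }
}
```

Passing the arguments as a list, rather than as one shell string, means values containing spaces, such as ‘Windows 101’, reach the PHP script as single arguments without any quoting.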

Note that the PHP script sets the item’s status to ‘Available’, in line with the default online status for items of the Service category in HEAT LiveTime. This can be changed to match the statuses in use.

If multiple item categories are in use, the script could be extended to try ‘Available’ first and, if that status isn’t present, fall back to ‘Online’, and so on. Adding such logic is left to the user.
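That fallback amounts to picking the first preferred status name that exists for the item’s category. A minimal Java sketch, assuming the script has already obtained the set of status names valid for that category (how it does so is not shown, and StatusFallback is a hypothetical name):

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper: returns the first preferred status valid for the item's category.
final class StatusFallback {
    private StatusFallback() {}

    static Optional<String> pick(List<String> preferred, Set<String> validForCategory) {
        for (String status : preferred) {
            if (validForCategory.contains(status)) {
                return Optional.of(status);
            }
        }
        return Optional.empty(); // none found: leave the item's status unchanged
    }
}
```

For example, with a preference list of ‘Available’ then ‘Online’ and a category whose statuses are ‘Online’ and ‘Offline’, the helper returns ‘Online’.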

Finally, to call this Java code, the workflow used for handling the alert needs a status that is linked to the class in the jar file created above. To do this, enable ‘Outbound Web Services’ under Setup > Privileges > System, then update the status in question so that its Listener Class is:

lt_nagios_resolved
