Building a GroundWork Monitor Service Profile
From GroundWork Developer Kit
INTRODUCTION
The GroundWork Monitor Bookshelf application describes the individual steps to create a Service Profile in the Configuration application. This document does not focus on those steps. Rather, this document focuses on the methods by which:
- Appropriate Services are selected for inclusion in a Service Profile
- Commands are tested against target applications or equipment
- Dependencies and required arguments for the Commands are configured such that the Services can be quickly implemented by the customer
- Components of the Service Profile are packaged for delivery to a customer, or to GroundWork for inclusion on GroundWork Connect or in a subsequent release of the GroundWork Monitor product
Building a Service Profile in a manner consistent with this document means that your profile can be readily integrated into GroundWork Monitor, and that customers can understand the value of the new profile and deploy it quickly.
WHAT IS A SERVICE PROFILE?
The following description of a Service Profile is pulled from the GroundWork Monitor Bookshelf application:
A Service Profile is a collection of multiple Services. For example, you would typically want to check more than just disk space on a Linux Host. Some measure of memory use is also a good idea, and CPU load is also interesting. GroundWork Monitor contains several Commands and Services for checking these and many other parameters on Linux Hosts. These Services are grouped into Service Profiles, which you can use as is or modify to fit your needs.
Once you have a Service Profile, you can combine it with a Host Template to create a Host Profile.
GroundWork Monitor is preloaded with a set of Service Checks for multiple types of servers, devices, etc. that you can use out of the box.
Service Profiles encapsulate a set of standard Services with Plugins and Best Practices that can be applied toward monitoring specific devices or protocols. The advantage of using a Service Profile is that it is pre-integrated.
Using Profiles, you can quickly configure GroundWork Monitor to monitor groups of devices the same way.
Customers gain value from having Service Profiles pre-defined for the applications and equipment that they are using.
In the Configuration (Monarch) application, a Service Profile appears to be a collection of one or more Services. However, other components are also needed to make up a usable Service Profile.
Components
The components of a Service Profile that need to be assembled are:
- Service definitions
- Command definitions
- Nagios plugins
- Performance definitions
- Documentation
APPLICATION PROFILING
To determine the appropriate Services to include in a Service Profile for an application you must understand the application itself. GroundWork Monitor focuses on availability and performance monitoring. You need to identify measures of availability and performance for the application and how they can be obtained.
An example might be the Apache Web server. Availability measurements might include whether a particular URL responds and whether the response contains the expected content. Performance measurements might include the time taken for the content to be returned. Both measures can be obtained with a single HTTP request to the application.
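The following is a minimal Python sketch of that idea, not a GroundWork or Nagios plugin: one request yields both an availability verdict (status 200 and the expected text present) and a performance figure (elapsed seconds). The function names, the 3 and 5 second thresholds, and the expected string are illustrative assumptions.

```python
import time
import urllib.request

# Nagios plugin exit codes (per the Nagios plugin guidelines).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(status, body, expected, elapsed, warn_s, crit_s):
    """Turn one HTTP response into a single availability/performance state."""
    # Availability: the URL answered with 200 and returned the expected content.
    if status != 200 or expected not in body:
        return CRITICAL
    # Performance: how long the content took to come back.
    if elapsed >= crit_s:
        return CRITICAL
    if elapsed >= warn_s:
        return WARNING
    return OK


def check_url(url, expected, warn_s=3.0, crit_s=5.0, timeout=10.0):
    """Fetch url once; return (state, elapsed seconds or None on failure)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        # Connection refused, timeout, DNS failure, HTTP error status, etc.
        return CRITICAL, None
    elapsed = time.monotonic() - start
    return evaluate(status, body, expected, elapsed, warn_s, crit_s), elapsed
```

In practice the check_http plugin shipped with GroundWork Monitor can test for expected content and response time in one request, so a sketch like this is mainly useful for deciding which measures matter.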
Do not fall into the trap of trying to include Services for every possible measure. This approach is not recommended by GroundWork – most published Service Profiles have 5-10 Services. Select only the key measures of availability and overall application performance. Apache also supports showing you a number of internal statistics that can be useful to system administrators. GroundWork does not recommend creating monitors to extract each of these statistics unless failure of a particular application can be reliably and more quickly detected by monitoring such statistics. A user of the Service Profile should always be able to extend it as needed for a particular situation.
When profiling an application you need to understand the differences between versions of that application, and decide whether that will be a factor in selecting Services. You may decide to create multiple Service Profiles – each for a subset of application versions.
BUILDING THE SERVICE PROFILE
Command line testing
Before doing anything with GroundWork Monitor you should establish how monitoring is to be accomplished from an unprivileged user account on the GroundWork Monitor server. Eventually you will need to make this work as user nagios on that server, and then turn the command line testing into real Command definitions within GroundWork Monitor.
Whether you are using existing plugins included with GroundWork Monitor, or those available on sites such as nagiosexchange.org, you need to make sure they will run as user nagios and that they can be run with the current directory being /.
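A small harness like the following (a hypothetical helper, not part of GroundWork Monitor) makes these two conditions easy to test: it runs a plugin with / as the working directory and reports which user ran it, the state implied by the exit code, and the first line of output. It assumes the script itself is launched as the nagios user, for example with sudo -u nagios, because it does not switch users on its own.

```python
import os
import pwd
import subprocess

# Meaning of plugin exit codes, per the Nagios plugin guidelines.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def current_user():
    """Name of the effective user, so you can confirm you are testing as nagios."""
    uid = os.geteuid()
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:  # uid with no passwd entry
        return str(uid)


def try_plugin(argv, timeout=30):
    """Run a plugin from / and return (user, state, first output line)."""
    result = subprocess.run(
        argv,
        cwd="/",  # exposes plugins that depend on relative paths
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    state = STATES.get(result.returncode, "UNKNOWN (exit %d)" % result.returncode)
    lines = result.stdout.splitlines()
    return current_user(), state, lines[0] if lines else ""
```

Pass the full plugin path and its arguments exactly as you intend to use them in the Command definition.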
If you deem it necessary to write your own plugins, please read and understand the Nagios Plugin Development Guidelines. Writing your plugins according to these guidelines will make them useful in the widest possible range of situations.
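As a sketch of two of those guidelines (the exit-code meanings, and a single status line with optional performance data after a "|"), the helpers below format a plugin's output in Python. The service label PROCS and the sample values are hypothetical; consult the guidelines themselves for the full rules.

```python
# Exit codes and their names, per the Nagios plugin guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


def perfdata(label, value, uom="", warn="", crit="", minimum="", maximum=""):
    """Format one item as 'label'=value[UOM];[warn];[crit];[min];[max]."""
    text = "'%s'=%s%s;%s;%s;%s;%s" % (label, value, uom, warn, crit, minimum, maximum)
    return text.rstrip(";")  # trailing unfilled fields may be dropped


def plugin_output(service, state, text, perf=()):
    """Build the status line: SERVICE STATUS: text | perfdata."""
    line = "%s %s: %s" % (service, STATE_NAMES[state], text)
    if perf:
        line += " | " + " ".join(perf)
    return line


def report(service, state, text, perf=()):
    """Print the status line; the caller passes the result to sys.exit()."""
    print(plugin_output(service, state, text, perf))
    return state
```

A plugin's main routine would end with something like sys.exit(report("PROCS", OK, "2 processes", [perfdata("procs", 2, warn="2:3", crit="2:3", minimum=0)])), so that the printed line and the exit code always agree.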
If the plugins rely on other packages, you must document the steps necessary to install those packages (see Documenting the Service Profile). If the effort to configure additional packages is significant, GroundWork usually places these profiles on the GW Connect site and does not try to include them in the GroundWork Monitor product. This is particularly so when the packages require architecture-specific compilation. An example of this is the use of the Perl DBD::DB2 module to monitor the DB2 database application: DBD::DB2 requires an SDK and must be compiled for each operating system architecture on which GroundWork Monitor is supported.
Command definitions
Once you have tested all of the plugins necessary to monitor the important measures of availability and performance, you need to turn the command line syntax into Command definitions within the Configuration application of GroundWork Monitor. The procedure for defining Commands is provided in the Bookshelf application, but before creating Commands you should think about several issues:
- Arguments that are most likely to be changed by the user of the Service Profile should be moved to the Service definition. This allows the user to readily update arguments for each instance of the Service as well as use Monarch Macros to set arguments for a number of Hosts very easily.
- Arguments that are always the same for every Service instance but that might need to be set globally for a particular environment should be parameterized with Nagios Macros and set in the Resource file. The use of Nagios Macros must be clearly documented since the Commands will fail if those macros are not set.
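To make the distinction concrete, the sketch below imitates, in a simplified way, how a Command line is filled in: the $ARGn$ values come from the !-separated arguments on the Service's check command, while $USERn$ values come from the Resource file. It is an illustration only. Real Nagios supports many more macros and escaping rules, and the $USER1$ path shown is an assumed GroundWork plugin directory.

```python
import re

# Hypothetical resource.cfg excerpt; the $USER1$ path is an assumption.
RESOURCE_CFG = """\
# hypothetical resource.cfg excerpt
$USER1$=/usr/local/groundwork/nagios/libexec
$USER6$=changeme
"""


def parse_resource(text):
    """Read $USERn$=value lines from a Nagios resource file."""
    macros = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            name, value = line.split("=", 1)
            macros[name.strip()] = value.strip()
    return macros


def expand(command_line, check_command, macros):
    """Fill $ARGn$ from 'command!arg1!arg2...' and other macros from a dict."""
    name, *args = check_command.split("!")
    values = dict(macros)
    for i, arg in enumerate(args, start=1):
        values["$ARG%d$" % i] = arg

    def lookup(match):
        token = match.group(0)
        if token not in values:
            raise KeyError("macro %s is not set" % token)
        return values[token]

    return name, re.sub(r"\$[A-Z0-9_]+\$", lookup, command_line)
```

Applied to the check_local_procs_arg!2:3!2:3!nagios2collage Service used later in this document, the three arguments land in the -w, -c and -a options of check_procs. Here an unset macro raises an error immediately, so missing Resource file entries surface before the Command is ever run.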
Service definitions
The procedure for defining Services is provided in the Bookshelf application but before creating Services with the tested and implemented Command definitions you should think about the naming of Services.
Service naming standards will enable IT support personnel to quickly identify each monitored service. GroundWork uses a naming convention that you should follow. It is divided into four distinct fields formatted as follows:
mode_method_protocol_function
mode
Specifies whether the Service is Active or Passive. Active mode is implied and is not part of the Service name. Passive mode should normally be indicated by a “p_” at the beginning of the Service name.
method
Specifies how data is being collected. Methods include: TCP, UDP, NRPE, NSClient, SNMP, RPC, and NSCA.
protocol
Specifies the application protocol used to collect the data. If there is no ambiguity in what transport the protocol is using, the transport specification should not be part of the Service name. For example: tcp_snmp, udp_snmp, tcp_dns, udp_dns, http, sql.
function
Specifies what is being checked. Function examples include: cpuload, mem_use, disk_c, and disk_/var.
Examples of GroundWork Monitor Service names:
| Service name | mode | method | protocol | function |
|---|---|---|---|---|
| wmi_disk_c | Active (assumed) | WMI | TCP (assumed) | Free space on Disk C |
| tcp_ssh | Active (assumed) | N/A | TCP | SSH port open |
| sql_mssql_login | Active (assumed) | SQL | TCP (assumed) | Login to MSSQL is possible |
Where Host containers for server clusters are used, the Monitors that check individual nodes should have the node hostname appended to the Service name.
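The convention can be captured in a small helper. This Python sketch is hypothetical: it lowercases the fields, drops any that are implied, adds the p_ prefix for Passive Services, and appends a node hostname with an underscore (the separator is an assumption; the text above does not specify one).

```python
def service_name(function, method="", protocol="", passive=False, node=""):
    """Compose mode_method_protocol_function per the GroundWork convention."""
    # Implied fields (Active mode, an unambiguous transport) are simply omitted.
    parts = [p for p in (method, protocol, function) if p]
    name = "_".join(parts).lower()
    if passive:
        name = "p_" + name  # Active is implied; Passive gets a p_ prefix
    if node:
        name += "_" + node  # cluster node checks: node hostname appended
    return name
```

For example, service_name("disk_c", method="wmi") gives "wmi_disk_c", and service_name("ssh", protocol="tcp") gives "tcp_ssh", matching the first two examples in the table.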
Creating and Exporting the Service Profile
Once Service definitions have been created, it is a simple task to create the Service Profile in the Configuration application. The Bookshelf application describes how to do that. GroundWork has a convention for naming Service Profiles that follows the naming convention for the constituent Services. Please follow that convention; if you are unclear on how to apply it, review the names of profiles already included in GroundWork Monitor. The easiest way to do this is to look at the files on the GroundWork Monitor server under /usr/local/groundwork/profiles. Once created, you can export the Service Profile from the same part of the Configuration application where you created it. The result is an XML file placed under the /tmp directory on the GroundWork Monitor server. It is important to retain the name of that file so that GroundWork and others recognize the file as a Service Profile definition. If you created a Service Profile called local-groundwork-server, then the resulting file will be
/tmp/service-profile-local-groundwork-server.xml
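Because the file name is what identifies the export, a quick check before packaging can save a round trip. The sketch below (hypothetical helpers) builds the expected /tmp path from the profile name and confirms that the file exists and parses as XML; it says nothing about whether the contents form a valid profile.

```python
import os
import xml.etree.ElementTree as ET


def exported_profile_path(profile_name, export_dir="/tmp"):
    """Path where the exported Service Profile is expected to be written."""
    return os.path.join(export_dir, "service-profile-%s.xml" % profile_name)


def check_export(profile_name, export_dir="/tmp"):
    """Confirm the export exists under its original name and is well-formed XML."""
    path = exported_profile_path(profile_name, export_dir)
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    ET.parse(path)  # raises ET.ParseError if the XML is malformed
    return path
```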
Creating and Exporting the Performance Definitions
Performance definitions are used in GroundWork Monitor to take the output of Service checks and make it available for graphing in applications such as Performance, Status Viewer, and Dashboards as well as to make that output available for performance reporting in the Reports application.
Performance definitions are an important part of creating a Service Profile.
The procedure for creating Performance definitions is documented in the Bookshelf application. You should create definitions for as many Services in the Service Profile as possible. Some plugins can return a variable number of results; in most cases, however, you can configure Performance definitions to account for that. When reviewing the Bookshelf documentation, look at the LIST macros and the template options of the rrdupdate command to solve this issue.
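Before designing definitions for a plugin that returns a variable number of results, it helps to see exactly what it emits. The sketch below splits the performance data portion of plugin output (the text after "|") into items, using the 'label'=value[UOM];[warn];[crit];[min];[max] format from the Nagios plugin guidelines. It is a simplified parser for inspection only and does not reproduce GroundWork's LIST macros or rrdupdate templates.

```python
import re

# One perfdata item: 'label'=value[UOM];[warn];[crit];[min];[max]
ITEM = re.compile(r"('[^']+'|[^\s=']+)=([-0-9.]+)([^;\s]*)((?:;[^;\s]*){0,4})")


def parse_perfdata(text):
    """Split plugin perfdata into a list of dicts, one per item."""
    items = []
    for label, value, uom, rest in ITEM.findall(text):
        # Pad missing trailing fields so every item has warn/crit/min/max.
        fields = (rest[1:].split(";") if rest else []) + [""] * 4
        warn, crit, lo, hi = fields[:4]
        items.append({"label": label.strip("'"), "value": float(value),
                      "uom": uom, "warn": warn, "crit": crit,
                      "min": lo, "max": hi})
    return items
```

Running a plugin against several targets and comparing the item counts shows quickly whether its output is fixed or variable in length.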
Once you have created and tested all of the Performance definitions for the Service Profile, you must export them from the Performance application so that a customer can import them along with the Service Profile you previously exported.
As you may have guessed, the procedure to do this is documented in the Bookshelf application! You have the option to export individual Service-Host entries or all entries. Whatever method you choose, the resulting XML files that are placed under the /tmp directory need to be edited until you have a single XML file containing all of the Performance definitions required for the Service Profile.
For example, if you have three Services that produce performance data and they are called wmi_iis_bytesin, wmi_iis_bytesout, wmi_iis_bytestotal, and you have created Performance definitions for each of them, you could go to each definition in the Performance application and click the Export button. You would end up with three files:
- /tmp/perfconfig_wmi_iis_bytesin.xml
- /tmp/perfconfig_wmi_iis_bytesout.xml
- /tmp/perfconfig_wmi_iis_bytestotal.xml
You would need to copy the <service_profile> </service_profile> sections from each file into one file and make sure they are enclosed in a single <groundwork_performance_configuration> </groundwork_performance_configuration> element. Look at the example performance XML definition files under /usr/local/groundwork/profiles on a GroundWork Monitor server if this isn’t clear.
If you choose to export all entries from the Performance application by clicking Export all, you will end up with one file, /tmp/perfconfig_ALL.xml. You will need to go into that file and delete the <service_profile> </service_profile> sections for Service-Host entries that are not part of the Service Profile. The tag names are confusing! They in fact refer to Service-Host entries, not Service Profiles. Do not be alarmed.
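Merging by hand is error-prone, so the step can be scripted. The Python sketch below, using only the standard library's xml.etree.ElementTree, copies every <service_profile> element from the exported files into one <groundwork_performance_configuration> root. The optional keep predicate covers the Export all case; because the internal structure of each entry is not described here, whatever test you pass in is your own assumption.

```python
import xml.etree.ElementTree as ET

ROOT_TAG = "groundwork_performance_configuration"


def merge_perfconfigs(paths, keep=lambda entry: True):
    """Collect <service_profile> entries from exported files into one document.

    keep is an optional filter for trimming perfconfig_ALL.xml down to the
    Service-Host entries that belong to this Service Profile.
    """
    merged = ET.Element(ROOT_TAG)
    for path in paths:
        for entry in ET.parse(path).getroot().iter("service_profile"):
            if keep(entry):
                merged.append(entry)
    return ET.ElementTree(merged)
```

A typical call would be merge_perfconfigs(["/tmp/perfconfig_wmi_iis_bytesin.xml", "/tmp/perfconfig_wmi_iis_bytesout.xml", "/tmp/perfconfig_wmi_iis_bytestotal.xml"]).write(path, encoding="utf-8", xml_declaration=True). ElementTree drops XML comments when parsing, so review the merged file before shipping it.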
The name of the file should include the name of the Service Profile. So, for an exported Service Profile XML file of service-profile-local-groundwork-server.xml the performance XML file should be named perfconfig-local-groundwork-server.xml. This is necessary so that the Profile Importer function in the Configuration application recognizes the file as being part of the Service Profile.
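The naming rule above is mechanical enough to encode. A minimal sketch, assuming the hyphenated service-profile- and perfconfig- prefixes used in this paragraph:

```python
PROFILE_PREFIX = "service-profile-"
PERF_PREFIX = "perfconfig-"


def perfconfig_name(profile_file):
    """Performance XML file name that pairs with an exported Service Profile."""
    if not (profile_file.startswith(PROFILE_PREFIX) and profile_file.endswith(".xml")):
        raise ValueError("not an exported Service Profile file: %s" % profile_file)
    # Keep the profile name portion unchanged; only the prefix differs.
    return PERF_PREFIX + profile_file[len(PROFILE_PREFIX):]
```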
Documenting the Service Profile
GroundWork has a standard for documenting Service Profiles. The standard is provided at the end of this document.
It is important that the documentation include specific instructions on how to set up new plugins and Commands in the Implementation Notes section. For example, if a new Perl module is necessary for a Perl plugin to run then instructions on how to use CPAN or otherwise download, build, and install that module must be included.
Packaging the Service Profile
At the end of following the above process for building a Service Profile you should have the following components:
- Service Profile XML file
- Performance XML file
- Any necessary Nagios plugins
- Standard documentation
For GroundWork to take the profile and either place it on GroundWork Connect or include it in a subsequent release of GroundWork Monitor, we need the first three components in an archive file, such as .zip or .tar.gz. The documentation should be in Microsoft Word format.
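The archive can be produced with any tool. As one possibility, the Python sketch below uses the standard tarfile module to bundle the three components into a .tar.gz. The function name and the plugins/ subdirectory inside the archive are assumptions for illustration, not a layout GroundWork requires.

```python
import os
import tarfile


def package_profile(archive_path, profile_xml, perf_xml, plugins=()):
    """Bundle the profile XML, performance XML and plugins into a .tar.gz."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # The two XML files go at the top level under their original names.
        for path in (profile_xml, perf_xml):
            tar.add(path, arcname=os.path.basename(path))
        # Plugins are grouped in a subdirectory (an illustrative choice).
        for path in plugins:
            tar.add(path, arcname=os.path.join("plugins", os.path.basename(path)))
    return archive_path
```

Because tar records file permissions, plugins that were executable when packaged remain executable when extracted.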
GROUNDWORK SERVER PROFILE (LOCAL) SAMPLE
Description
This profile monitors the local GroundWork server. It ensures that the main components of the monitoring server are running:
- Nagios®
- GroundWork Foundation MySQL database and associated feeder and listener processes
- NSCA
- SNMP TRAPD and SNMPTT
Services Configuration
- SERVICE - Definitions in Configuration are stored under this name.
- COMMAND LINE - Service command name with arguments to be passed to the plugin.
- PLUGIN COMMAND LINE - Plugin script called by Nagios® for this Service.
- EXTENDED INFO - The Extended Service Info definition, typically used for generating graphs
| SERVICE | COMMAND LINE | PLUGIN COMMAND LINE | EXTENDED INFO |
|---|---|---|---|
| local_mysql_database_nopw | check_mysql!mysql!root | $USER1$/check_mysql -H $HOSTADDRESS$ -d "$ARG1$" -u "$ARG2$" -p "$USER6$" | number_graph |
| local_mysql_engine_nopw | check_mysql_engine_nopw!root | $USER1$/check_mysql -H $HOSTADDRESS$ -u "$ARG1$" | number_graph |
| local_nagios_latency | check_nagios_latency | $USER1$/check_nagios_latency.pl | number_graph |
| local_process_gw_feeders | check_local_procs_arg!2:3!2:3!nagios2collage | $USER1$/check_procs -w "$ARG1$" -c "$ARG2$" -a "$ARG3$" | number_graph |
| local_process_gw_listener | check_local_procs_arg!1:1!1:1!groundwork.feeder.service.DataFeederService | $USER1$/check_procs -w "$ARG1$" -c "$ARG2$" -a "$ARG3$" | number_graph |
| local_process_mysqld | check_local_procs_string!10!20!mysqld | $USER1$/check_procs -w "$ARG1$" -c "$ARG2$" -a "$ARG3$" | number_graph |
| local_process_mysqld_safe | check_local_procs_string!1!2!mysqld_safe | $USER1$/check_procs -w "$ARG1$" -c "$ARG2$" -a "$ARG3$" | number_graph |
| local_process_nagios | check_nagios | $USER1$/check_nagios -F /usr/local/groundwork/nagios/var/status.log -e 5 -C bin/nagios | percent_graph |
| local_process_nsca | check_local_procs_arg!1:1!1:1!bin/nsca | $USER1$/check_procs -w "$ARG1$" -c "$ARG2$" -a "$ARG3$" | number_graph |
| local_process_snmptrapd | check_local_procs_arg!1:1!1:1!snmptrapd | $USER1$/check_procs -w "$ARG1$" -c "$ARG2$" -a "$ARG3$" | number_graph |
| local_process_snmptt | check_local_procs_arg!1:1!1:1!sbin/snmptt | $USER1$/check_procs -w "$ARG1$" -c "$ARG2$" -a "$ARG3$" | number_graph |
| tcp_http_port | check_http_port!3!5!80 | $USER1$/check_http -H $HOSTADDRESS$ -w "$ARG1$" -c "$ARG2$" -p "$ARG3$" | number_graph |
| tcp_nsca | check_tcp_nsca | $USER1$/check_tcp -H $HOSTADDRESS$ -p 5667 | number_graph |
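The tcp_http_port and tcp_nsca Services rely on check_http and check_tcp to confirm that a port is listening and to time the response. The Python sketch below illustrates only the TCP part of that idea (connect to a port and measure how long the connection takes). It is not the check_tcp source, and the function name is hypothetical.

```python
import socket
import time


def tcp_check(host, port, timeout=10.0):
    """Connect to host:port and return (is_listening, seconds taken)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Refused, unreachable, or timed out: treat the port as not listening.
        return False, time.monotonic() - start
```

For the tcp_nsca case, tcp_check(host, 5667) answers the same question the Service asks: is NSCA accepting connections, and how quickly.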
Profile Package
This package includes the following files:
- Profile Definitions
- service_profile_gwsp2-local_GroundWork_server.xml
- perfconfig_gwsp2-local_GroundWork_server.xml
- Plugin Scripts (installed with the GroundWork Monitor product)
- check_http
- check_procs
- check_mysql
- check_nagios_latency.pl
- check_nagios
- check_tcp
- Performance Graphing Programs
- percent_graph.cgi
- number_graph.cgi
Installation
GroundWork Monitor Profiles are distributed with the product download and are already part of the Configuration database. The GroundWork Monitor Configuration tool is used to import updated Profiles and Profiles that require additional setup. The Profile Importer imports the Profile XML file and its companion Performance Configuration definition file.
To import Profiles within GroundWork Monitor, go to Configuration>Profiles>Profile Importer. The Profile Importer process is documented in the Bookshelf in the Administrator Guide, Section 4 Configuration, Chapter 4 Advanced Configuration - Configuring Profiles.
Implementation
This section contains the detailed settings used by this Profile. These parameters can be altered with the Configuration tool.
Command Parameters
Command parameters are in the Configuration Services section with the following names and default values. Any bolded arguments MUST be set before this Service Profile will work properly.
- local_mysql_database_nopw
- Check if the specific MySQL database responds.
- $ARG1$ - Database name to test.
- $ARG2$ - Database user ID. Must be authorized to access the database name.
- Note - The parameters are authorized by using the MySQL command: "GRANT ALL ON <database name>.* TO '<database user>'@'<host address>' IDENTIFIED BY '<database password>';"
- local_mysql_engine_nopw
- Check if the MySQL database responds.
- $ARG1$ - Database user ID. Must be authorized to access MySQL.
- local_process_mysqld
- Check for the number of mysqld processes running.
- $ARG1$ - A warning alert will be generated if the number of processes exceeds this value. The default is 10.
- $ARG2$ - A critical alert will be generated if the number of processes exceeds this value. The default is 20.
- local_process_mysqld_safe
- Check for the number of mysqld_safe processes running.
- $ARG1$ - A warning alert will be generated if the number of processes exceeds this value. The default is 1.
- $ARG2$ - A critical alert will be generated if the number of processes exceeds this value. The default is 2.
- local_process_nagios
- Check if the Nagios® process is running. The check_nagios plugin will check to make sure the Nagios® status log is updating at least every 5 minutes, and the bin/nagios process is running. A critical alert will be generated if either of these conditions is not met.
- local_nagios_latency
- The check_nagios_latency.pl plugin checks the time between when an active service check is scheduled to execute and when it actually executes. This delay, called latency, is an indicator of the load on the server. This plugin does not generate an alert, but it allows the latency to be graphed.
- tcp_nsca
- Check for a listening TCP port number 5667 on $HOSTADDRESS$. This is the port used by NSCA to listen for connections from remote hosts.
- tcp_http_port
- Check for a listening TCP port number 80 on $HOSTADDRESS$. This is the web server port used by GroundWork Monitor to listen for connections from system users.
- local_process_gw_feeders
- Check to make sure the GroundWork Foundation feeder processes are running. If these processes fail, the Foundation database will not be updated with current Nagios® data. The processes nagios2collage_socket and nagios2collage_event should always be running. A third process, nagios2collage_hostgroup, runs hourly. A critical alert will be generated if fewer than 2 of these processes are running.
- $ARG1$ - A warning alert will be generated if the number of processes falls outside this range. The default is 2:3.
- $ARG2$ - A critical alert will be generated if the number of processes falls outside this range. The default is 2:3.
- local_process_gw_listener
- Check to make sure the GroundWork Foundation listener process is running. If this process fails, the Foundation database will not be updated with current Nagios® data. A critical alert will be generated if this process is not running.
- $ARG1$ - A warning alert will be generated if the number of processes falls outside this range. The default is 1:1.
- $ARG2$ - A critical alert will be generated if the number of processes falls outside this range. The default is 1:1.
- local_process_nsca
- Check to make sure the NSCA daemon is running. If this process fails, remote passive check updates will not be received by this Nagios® system. A critical alert will be generated if this process is not running.
- $ARG1$ - A warning alert will be generated if the number of processes falls outside this range. The default is 1:1.
- $ARG2$ - A critical alert will be generated if the number of processes falls outside this range. The default is 1:1.
- local_process_snmptrapd
- Check to make sure the SNMPTRAPD daemon is running. If this process fails, SNMP traps will not be received by the GroundWork server. A critical alert will be generated if this process is not running.
- $ARG1$ - A warning alert will be generated if the number of processes falls outside this range. The default is 1:1.
- $ARG2$ - A critical alert will be generated if the number of processes falls outside this range. The default is 1:1.
- local_process_snmptt
- Check to make sure the SNMPTT daemon is running. If this process fails, SNMP traps that are received by SNMPTRAPD will not be processed by Nagios® and the GroundWork Foundation will not insert trap events. A critical alert will be generated if this process is not running.
- $ARG1$ - A warning alert will be generated if the number of processes falls outside this range. The default is 1:1.
- $ARG2$ - A critical alert will be generated if the number of processes falls outside this range. The default is 1:1.
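The threshold values above (for example 2:3 or 1:1) use the range syntax described in the Nagios plugin guidelines: a plain number N means 0 to N, start:end alerts when the value falls outside the inclusive range, a trailing colon leaves the end open, ~ stands for negative infinity, and a leading @ inverts the test. The Python sketch below re-implements that rule to show how check_procs-style warning and critical ranges combine; it is an illustration, not the plugin's code.

```python
def outside_range(value, spec):
    """True if value should raise an alert under Nagios threshold-range syntax."""
    inverted = spec.startswith("@")  # @start:end alerts when INSIDE the range
    if inverted:
        spec = spec[1:]
    if ":" in spec:
        start_s, end_s = spec.split(":", 1)
    else:
        start_s, end_s = "0", spec  # a plain N means 0..N
    start = float("-inf") if start_s == "~" else float(start_s or 0)
    end = float(end_s) if end_s else float("inf")
    inside = start <= value <= end
    return inside if inverted else not inside


def process_state(count, warn, crit):
    """State produced by warning/critical ranges; critical is checked first."""
    if outside_range(count, crit):
        return "CRITICAL"
    if outside_range(count, warn):
        return "WARNING"
    return "OK"
```

With the local_process_gw_feeders defaults, process_state(2, "2:3", "2:3") is OK, while 1 or 4 running processes give CRITICAL.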
Performance Graphing Parameters
The following parameters are used to generate performance charts. These parameters are set using the Performance Configuration tool in GroundWork Monitor.
- local_mysql_database_nopw
- Graphs the average number of queries per second against a given database.
- The Nagios® service description must contain the string "local_mysql_database".
- local_mysql_engine_nopw
- Graphs the average number of queries per second against all databases.
- The Nagios® service description must contain the string "local_mysql_engine".
- local_nagios_latency
- Graphs the latency of Nagios® service checks.
- The Nagios® service description must contain the string "local_nagios_latency".
- local_process_gw_feeders
- Graphs the number of gw_feeders processes.
- The Nagios® service description must contain the string "local_process".
- local_process_gw_listener
- Graphs the number of gw_listener processes.
- The Nagios® service description must contain the string "local_process".
- local_process_mysqld
- Graphs the number of mysqld processes.
- The Nagios® service description must contain the string "local_process".
- local_process_mysqld_safe
- Graphs the number of mysqld_safe processes.
- The Nagios® service description must contain the string "local_process".
- local_process_nagios
- Graphs the number of Nagios® processes.
- The Nagios® service description must contain the string "local_process".
- local_process_nsca
- Graphs the number of NSCA processes.
- The Nagios® service description must contain the string "local_process".
- local_process_snmptrapd
- Graphs the number of snmptrapd processes.
- The Nagios® service description must contain the string "local_process".
- local_process_snmptt
- Graphs the number of snmptt processes.
- The Nagios® service description must contain the string "local_process".
- tcp_http_port
- Graphs the time taken to load the web page.
- The Nagios® service description must contain the string "tcp_http_port".
- tcp_nsca
- Graphs the time taken for the NSCA daemon to respond.
- The Nagios® service description must contain the string "tcp_nsca".
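Because matching relies on these strings, renaming a Service so that its description no longer contains the string will stop it from being graphed. The sketch below models the lookup as a plain substring test over the strings listed above. Whether the Performance Configuration tool uses substring or regular-expression matching internally is not covered here, so treat this as an approximation; the table and function names are hypothetical.

```python
# Hypothetical lookup table mirroring the strings listed above.
PERF_DEFINITIONS = [
    "local_mysql_database",
    "local_mysql_engine",
    "local_nagios_latency",
    "local_process",
    "tcp_http_port",
    "tcp_nsca",
]


def definition_for(service_description, definitions=PERF_DEFINITIONS):
    """Return the first definition string contained in the description, or None."""
    for needle in definitions:
        if needle in service_description:
            return needle
    return None
```

Note that all of the local_process_* Services share the single local_process definition.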
Implementation Notes
The Nagios® latency graph relies on the Nagios® 2.0 binary /usr/local/groundwork/nagios/bin/nagiostats. All other checks should work with Nagios® 1.2 or 2.0.