Monitoring system enhancements

Project ID: 230
Current stage: 
Manager: 
Unit: 
What: 

Description:

The School's existing Nagios framework was developed by Simon Wilkinson (see Project 22) and has now been in use for a couple of years. In the light of experience with the system, several enhancements are now desirable, for example:

  1. The provision of generic passive monitoring scripts for common conditions such as disc space usage and RAID errors.
  2. More intelligent target IP address selection (in the case of multi-homed targets).
  3. The provision of a dependency/hierarchy mechanism to limit spurious alerts when, for example, a switch/router fails.
  4. The provision of alerts via SMS text messaging.
  5. Monitoring of the server room environments.
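
As an illustration of item 1, a generic passive check might look roughly like the following Python sketch. It is not taken from the existing framework: the thresholds, the host name `examplehost`, the service name `disk_root` and all function names are hypothetical. It measures disc usage, maps it to the standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and formats the result as a PROCESS_SERVICE_CHECK_RESULT external command line. How that line reaches the Nagios server (for example via the external command file or NSCA) is deliberately left open.

```python
import shutil
import time

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify_usage(percent_used, warn=80.0, crit=90.0):
    """Map a percentage-used figure to a Nagios return code.

    The default thresholds are illustrative assumptions only.
    """
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def disk_percent_used(path="/"):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        raise ValueError(f"filesystem at {path} reports zero size")
    return 100.0 * usage.used / usage.total


def format_passive_result(host, service, code, output, now=None):
    """Build a PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{code};{output}"
    )


if __name__ == "__main__":
    # Hypothetical host and service names, for illustration only.
    try:
        pct = disk_percent_used("/")
        code = classify_usage(pct)
        message = f"DISK {LABELS[code]} - {pct:.1f}% used on /"
    except (OSError, ValueError) as exc:
        code = UNKNOWN
        message = f"DISK UNKNOWN - {exc}"
    print(format_passive_result("examplehost", "disk_root", code, message))
```

Keeping the measurement, the classification and the result formatting in separate functions means a RAID or other check could reuse the last two steps and replace only the measurement.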

Deliverables:

Why: 

Customer: The CO community

Case statement:

Any significant change to the monitoring system will involve a considerable familiarisation effort; if that effort is to be made at all, it makes good sense to do as much work as possible at the same time.

When: 

Status:

Timescales:

Priority:

Time:

How: 

Proposal:

Resources:

  • Familiarisation: at least 2 weeks
  • Production of enhancements: at least 3 weeks; the actual effort will depend on both the results of the initial evaluation of requirements and the subsequent code familiarisation.

Plan:

  1. Familiarisation with existing code base
  2. Decide on the set of desirable enhancements and their relative priorities
  3. Implement enhancements, to the degree that seems reasonable in the light of the code familiarisation

Other: 

Dependencies:

This project follows on from the Monitoring system upgrade project; there would be no point in making enhancements to the Nagios v2 infrastructure now.

The MPU is expected to provide a standard client-side facility to monitor RAID - see Project 134 - Server Hardware Interaction.

Risks:

Milestones

Proposed date | Achieved date | Name | Description