Monitoring system enhancements
Description:
The School's existing Nagios framework was developed by Simon Wilkinson (see Project 22) and has now been in use for a couple of years. In the light of experience with the system, a number of enhancements are now desirable, including:
- The provision of generic passive monitoring scripts for checks such as disc space usage and RAID errors.
- More intelligent selection of the target IP address for multi-homed targets.
- The provision of a dependency/hierarchy mechanism to limit spurious alerts when, for example, a switch or router fails.
- The provision of alerts via SMS text messaging.
- Monitoring of the server room environments.
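As an illustration of the first item, a generic passive disc-space check might look something like the minimal Python sketch below. It assumes results reach Nagios as `PROCESS_SERVICE_CHECK_RESULT` external commands (written to the server's external command file, or relayed by a transport such as NSCA); the thresholds, the service description "Disc space /" and the function names are hypothetical, and the School's framework may well deliver passive results differently.

```python
import shutil
import socket
import time

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}

# Hypothetical thresholds: percentage of total filesystem capacity in use.
WARN_PCT = 80.0
CRIT_PCT = 90.0


def classify(used_pct, warn=WARN_PCT, crit=CRIT_PCT):
    """Map a usage percentage to a Nagios return code."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def disc_usage_status(path):
    """Return (return_code, plugin_output) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total
    code = classify(used_pct)
    return code, f"DISK {LABELS[code]} - {path} {used_pct:.1f}% used"


def passive_result_line(host, service, code, output, timestamp=None):
    """Format a Nagios PROCESS_SERVICE_CHECK_RESULT external command."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}"


if __name__ == "__main__":
    code, output = disc_usage_status("/")
    # "Disc space /" is a hypothetical service description; it would have to
    # match a passive service defined for this host in the Nagios configuration.
    print(passive_result_line(socket.getfqdn(), "Disc space /", code, output))
```

Keeping the threshold logic separate from the transport means the same script could serve other passive checks (RAID status, for instance) by swapping the measurement function. Because a passive client that stops reporting produces no alert by itself, such services would presumably also need Nagios freshness checking enabled on the server side.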
Deliverables:
Customer: The CO community
Case statement:
Any significant change to the monitoring system will involve a considerable familiarisation effort; if that effort is to be made at all, it makes good sense to do as much work as possible at the same time.
Status:
Timescales:
Priority:
Time:
Proposal:
Resources:
- Familiarisation: at least 2 weeks
- Production of enhancements: at least 3 weeks, though the actual effort needed will depend both on the initial evaluation of requirements and on the subsequent code familiarisation.
Plan:
- Familiarisation with existing code base
- Decide the set of desirable enhancements and their relative priorities
- Implement enhancements, to the degree that seems reasonable in the light of the code familiarisation
Dependencies:
This project follows on from the Monitoring system upgrade project: there would be no point in making enhancements to the Nagios v2 infrastructure now.
The MPU is expected to provide a standard client-side facility to monitor RAID - see Project 134 - Server Hardware Interaction.
Risks:
Milestones
| Proposed date | Achieved date | Name | Description |
|---|---|---|---|