Improvements in mirror management

Project ID: 137
Current stage: 
Manager: 
What: 

Description: This project would improve the way the School mirrors non-AFS file data. It would include measures for checking the validity of mirrors, and improved tools for setting up and monitoring them.

Deliverables: A new suite of tools for setting up and managing the mirroring of non-AFS disk space.

One specific requirement is the ability to easily find out where your mirrored machine's data has actually gone, i.e. which mirror server holds it and where on that server the mirror is located. This could be a web page with a search box or a command line tool; a regular email was also mentioned.
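As an illustration of the command line option, here is a minimal sketch of such a lookup, assuming the mirror configuration can be flattened into one record per mirrored directory. The record fields, the host names and the paths are hypothetical sample data, not the School's actual configuration or the project's implementation.

```python
from dataclasses import dataclass
from typing import List
import sys


@dataclass(frozen=True)
class MirrorEntry:
    client: str       # machine whose data is mirrored
    source: str       # directory on the client
    server: str       # mirror server holding the copy
    destination: str  # location of the copy on that server


# Hypothetical sample data; a real tool would read this from wherever
# the mirror configuration is actually recorded.
MIRRORS = [
    MirrorEntry("clienta", "/disk/home", "mirror1", "/disk/mirrors/clienta/home"),
    MirrorEntry("clienta", "/disk/data", "mirror2", "/disk/mirrors/clienta/data"),
    MirrorEntry("clientb", "/disk/scratch", "mirror1", "/disk/mirrors/clientb/scratch"),
]


def find_mirrors(entries: List[MirrorEntry], client: str) -> List[MirrorEntry]:
    """Return every mirror record for the named client (case-insensitive)."""
    wanted = client.strip().lower()
    return [e for e in entries if e.client.lower() == wanted]


def describe(entry: MirrorEntry) -> str:
    """One human-readable line saying where a directory's mirror lives."""
    return (f"{entry.client}:{entry.source} is mirrored to "
            f"{entry.server}:{entry.destination}")


if __name__ == "__main__":
    # Usage: python find_mirror.py clienta clientb
    for name in sys.argv[1:]:
        matches = find_mirrors(MIRRORS, name)
        if not matches:
            print(f"{name}: no mirrors recorded")
        for entry in matches:
            print(describe(entry))
```

The same `find_mirrors` function could sit behind a web search box or feed a regular email, so the three delivery options mentioned above need not be separate implementations.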

Why: 

Customer: School

Case statement: This project will provide added guarantees that all the file space which needs to be mirrored is in fact being mirrored, and that the mirrored data is intact.

When: 

Status: Proposed improvements posted around.

Timescales: 2 weeks allocated.

Priority:

Time: 94 hours (13 days) so far 31/5/2011

128 hours (18 days) so far 1/11/2011

135 hours (19 days) 30/1/2012. Complete.

How: 

Proposal: See https://wiki.inf.ed.ac.uk/view/DICE/MirrorImprovementsProject

Resources: Just CO time.

URL: https://wiki.inf.ed.ac.uk/view/DICE/MirrorImprovementsProject

Plan: As effort is short, aim for the biggest wins for the least amount of effort. This suggests just improving our existing rmirror and rsync usage, perhaps just by adding some reporting.

Other: 

Dependencies: None

Risks: If the project doesn't report accurately what is being mirrored, then data which people believe is being backed up may not be.

Milestones

Proposed date Achieved date Name Description
2010-09-20 2010-09-14 prop Get acceptance of the proposed improvements/changes to mirrors.
2011-05-30 2011-04-01 implement Complete the implementation of the proposed changes, e.g. a new mirror client component and changes to the existing rmirror. The rmirror and rmirrorclient components are now in a usable state; anything further will be fine-tuning.
2011-05-30 2011-04-16 test Run a test server and client(s) to check things are working as expected. Tests show that you can mix and match old and new mirror clients and servers, but you only get the real benefit if both ends are using the new components and the macros from the header files.
2011-11-30 2011-04-30 migrate Assuming testing goes well, migrate existing mirror servers and clients to the new components. This started as part of the testing but is not yet complete: all servers are done, and about half the clients. The rest will happen as part of operational work.
2012-01-30 2011-12-15 finish Produce report ready for sign off in Dec.
2011-04-01 2011-03-10 imp1.0 Add an "is taped" helper resource to the server and implement an "is taped" script. Probably use /usr/tibs/tera -Q to determine whether the mirror partition is taped or not. Do we call the script with the dest dir as an argument, or do we query it from resources in the script ourselves? Probably pass it as a parameter.
2011-05-18 2011-03-20 imp2.0 Implement a "report" method on the server: a machine readable format, a human readable format (based on the machine format), and an aggregate/all option to show the report from all mirror servers in this cluster. Send emails when an error is spotted. Done, bar the email sending, which needs more thought.
2011-03-26 imp3.0 Look at changing the run locking so that if run is called while it is already locked, it returns (almost?) immediately with an error or warning that a run is already ongoing. There doesn't seem to be a need to do this, so as we're short of time, I propose not doing it.

This will no longer be done.

2011-03-30 imp4.0 See if Nagios is really necessary. If reporting is finished and acceptable, then there is perhaps no need to go through the Nagios learning curve again, which would delay this project even further. Given the elapsed time spent so far on this project, I propose skipping this feature.

This will no longer be implemented. Email reports have been added.

2011-09-10 2011-06-08 mig2 As part of the migration, finish the mirror report web page http://groups.inf.ed.ac.uk/cos/mirrors/ to report on the state of mirrors. Try to have it done by the next ops meeting.
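
The imp2.0 milestone describes a machine readable report, a human readable report derived from it, and an aggregate view across all mirror servers. The sketch below shows one way those three layers could fit together. The tab-separated field layout, the `MirrorStatus` record and all function names are assumptions made for illustration; they are not the actual rmirror report format.

```python
from typing import Dict, List, NamedTuple


class MirrorStatus(NamedTuple):
    server: str       # mirror server holding the copy
    client: str       # machine being mirrored
    destination: str  # mirror location on the server
    ok: bool          # did the last mirror run succeed?
    taped: bool       # is the mirror partition also taped?


def to_machine(status: MirrorStatus) -> str:
    """Machine readable form: one tab-separated line per mirror (assumed layout)."""
    return "\t".join([
        status.server,
        status.client,
        status.destination,
        "ok" if status.ok else "error",
        "taped" if status.taped else "untaped",
    ])


def parse_machine(line: str) -> MirrorStatus:
    """Inverse of to_machine."""
    server, client, destination, state, taped = line.rstrip("\n").split("\t")
    return MirrorStatus(server, client, destination, state == "ok", taped == "taped")


def to_human(line: str) -> str:
    """Human readable form, built from the machine readable line."""
    s = parse_machine(line)
    state = "OK" if s.ok else "ERROR"
    taped = "taped" if s.taped else "not taped"
    return f"{s.client} -> {s.server}:{s.destination} [{state}, {taped}]"


def aggregate(reports: Dict[str, List[str]]) -> List[str]:
    """Merge per-server machine reports into one list, ordered by server name."""
    return [line for server in sorted(reports) for line in reports[server]]


def failures(lines: List[str]) -> List[str]:
    """Lines worth emailing about: mirrors whose last run did not succeed."""
    return [line for line in lines if not parse_machine(line).ok]
```

Deriving the human readable output from the machine readable lines keeps the two from drifting apart, and the aggregate option reduces to concatenating each server's lines.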