Content Addressable Storage (CAS) Encrypted Backup

Project ID: 130
Current stage: 
Manager: 
Unit: 
What: 

Description: A backup system which uses content-addressable storage (to avoid
storing duplicate files) and encryption. This will build on an
existing MSc project.

Deliverables: Backup server (Linux), at least one client (Mac).

Why: 

Customer: All self-managed

Case statement: This is Paul's "pitch", verbatim:

Background:

  • Disk space on personal computers is increasing.
    Several hundred GB is currently typical.
  • Laptops especially are now often seen as "personal" machines.
    They hold personal and work data.
    It is difficult (impossible?) to automatically distinguish these.
  • Most users (for home or work use) have ad-hoc (at best) backup schemes.
    These usually involve only a subset of the data with irregular backup times.
    The security of the data is questionable - is it encrypted? where is it stored?
  • Corporate backup schemes are often inappropriate.
    They can't handle the volume of data.
    They don't guarantee security of personal data (encryption).
  • Home backup schemes may be convenient, but have other problems ...
    Eg. Time Machine is not encrypted (what if the disk is stolen?)
    And it is not stored off-site (what if there is a fire?)
    And it doesn't work well if the laptop is encrypted ...
  • "Cloud"-based backup schemes are becoming popular ...
    Many offer encryption, but ...
    They are very slow for the typical volumes of data.
    They are proprietary and the user depends on the continued existence of the specific service.

The Concept:

  • Using a "content-addressable" storage technology, it is possible to store only
    a single copy of files (or even parts of files) which are common to multiple users.
    This has significant savings when there are lots of users - eg. many of them
    have the same OS and application files.
  • Encrypting files at the client end is normally incompatible with content-addressable
    storage, because the same file encrypted with different keys will appear in the backup
    as a different file.
  • We have an algorithm which we believe supports encrypted, shared storage. If
    a number of users back up to the same system, there will only be one copy of all
    the common files (OS, applications, shared documents, etc). This saves
    considerably on space and (more importantly?) backup time. In addition, each user's
    files will only be accessible with their own key.
  • To implement this, we would need to build a backup server.
    And a client for each platform.
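
The points above can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the algorithm referred to in the pitch (which is not described here). The names `ContentStore`, `toy_cipher` and `content_key` are hypothetical, the "cipher" is a deliberately simple XOR keystream that is NOT secure, and the third case uses a content-derived key (commonly called convergent encryption) purely as one known way of making encrypted copies of the same file identical.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressable store (hypothetical name).

    Each blob is stored under the SHA-256 digest of its bytes, so
    identical content always maps to the same address.
    """

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Identical content is kept only once.
        self.blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self.blobs[address]


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream.

    Illustration only, NOT secure. Applying it twice with the same
    key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(a ^ b for a, b in zip(chunk, block))
    return bytes(out)


def content_key(data: bytes) -> bytes:
    """Derive the key from the content itself (assumed technique)."""
    return hashlib.sha256(b"key:" + data).digest()


shared_file = b"identical OS or application file " * 4

# Case 1: no encryption. Two users back up the same file: one blob.
plain_store = ContentStore()
addr_alice = plain_store.put(shared_file)
addr_bob = plain_store.put(shared_file)

# Case 2: each user encrypts with their own key. The ciphertexts
# differ, so the store sees two unrelated blobs and cannot share them.
keyed_store = ContentStore()
keyed_store.put(toy_cipher(b"alice-secret", shared_file))
keyed_store.put(toy_cipher(b"bob-secret", shared_file))

# Case 3: the key is derived from the content, so both users produce
# the same ciphertext and the store keeps a single encrypted blob.
convergent_store = ContentStore()
key = content_key(shared_file)
addr = convergent_store.put(toy_cipher(key, shared_file))
convergent_store.put(toy_cipher(content_key(shared_file), shared_file))
restored = toy_cipher(key, convergent_store.get(addr))

print(len(plain_store.blobs), len(keyed_store.blobs), len(convergent_store.blobs))
```

In a scheme of this kind, each user would presumably keep the per-file keys protected under their own personal key, so that only they can restore their files. That detail is an assumption and is not shown in the sketch.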

Who would use this?

  • Individual users may use this at home to back up multiple family machines.
  • Departments may use this to back up users' laptops (or desktops).
  • Service providers may offer this as a service "in the cloud".

What would I like to do?

  • I currently have an MSc project implementing a prototype to explore the concept.
    This will not be sufficient to evaluate it for commercial use (performance etc.)
  • It would be good to build a realistic client and server to evaluate performance.
    Maybe a server for Linux. Maybe a client for the Mac.
    Maybe the server would be open source (to encourage implementation of compatible clients).
    Maybe the client would be charged.
  • It would be good to do a marketing survey.
    Exactly what else is available? And how does the performance compare?

When: 

Status:

Timescales: Plan to start in November.
Programmer is employed for 5 months.

Priority:

Time:

How: 

Proposal:

Resources: CO involvement (Toby) during project, see Plan for further details. Probably 1-2 weeks CO time in total.

Plan: Paul's plan, slightly edited for context, from email 2009-10-01:

Plan is to start in November.

My current (rough) plan is ...

  1. 1 month or so background & planning -
    • look at filesystem details, attributes, ACLs, etc ...
    • design rough architecture
    • think about possible optimisations (to implement now or later)
    • think about what is needed at the server end (cloud?)
  2. 3 months head down implementation
  3. 1 month testing & evaluation

I'd really just like to keep in touch with the COs, and get a bit of practical help.
So what I think I'd want from you is ...

During (1), join in the discussions on the design - say one or two brainstorming
meetings a week & some email.

During (2), do a bit of testing/evaluation of the code. Maybe set up anything
simple that is needed at the server end - probably just filespace, but maybe
a few simple CGIs or something.

During (3) do some testing and perhaps find/persuade some people to try it
out for an evaluation of the performance.

So ... I guess that the larger time commitment would come at the beginning and
the end.

Of course the money is tight so there won't be any flexibility in the timing, because
we will only have the programmer for a fixed 5 months ...

Ultimately, there may be a practical benefit for Informatics if it
turns into something useable. If not, I guess we all learn something
at least ...

Other: 

Dependencies: None

Risks:

Milestones:

Proposed date | Achieved date | Name | Description