Protect Your Data: Storage and Geographic Location

This post is about row one column one, the first box, in the NDSA levels of digital preservation.

The NDSA levels of digital preservation are useful in providing a high-level, at-a-glance overview of tiered guidance for planning for digital preservation. One of the most common requests received by the NDSA group working on this is that we provide more in-depth information on the issues discussed in each cell.

To that end, we are excited to start a new series of posts to help you and your organization think through how to work your way through the cells on each level.

There are 20 cells across the five rows and four levels, so there is much to discuss. We intend to work our way through each cell while expounding on the issues inherent in that level. We will define some terms, identify key considerations and point to some secondary resources. If you want an overall explanation of the levels, take a look at The NDSA Levels of Digital Preservation: An Explanation and Uses.

Let’s start with row one cell one, Protect Your Data: Storage and Geographic Location.

The Two Requirements of Row One Column One

There are only two requirements in the first cell, but there is actually a good bit of practical logic tucked away inside the reasoning for those two requirements.

Two complete copies that are not collocated

For starters, you want to have more than one copy, and you want to have those copies in different places. The difference between having a single point of failure and two points of failure is huge. For someone working at a small house museum that has a set of digital recordings of oral history interviews, this might be as simple as making a second copy of all of the recordings on an external hard drive, taking that drive home and tucking it away somewhere. If you only have one copy, you are one spilt cup of coffee, one dropped drive, or one massive power surge or fire away from having no copies. While you could meet this requirement literally by making any type of copy of your data and taking it home, it will become clear that this alone is not a tenable way to make it further up the levels in the long run. The point of the levels is to start somewhere and make progress.

With this said, it’s important to note that not all storage media are created equal. The difference in error rates between something like a flash drive on your key chain and an enterprise hard disk or tape is gigantic. So gigantic, in fact, that on error rate alone you would likely be better off having only one copy on a far better piece of media than having two copies on something like two cheap flash drives. Remember, though, that the hard error rate of the storage devices is not the only factor you should be worried about. In many cases, human error is likely to be the biggest factor that would result in data loss, particularly when you have a small (or no) system in place.

“Complete” copies are an important factor here. Defining “completeness” is something worth thinking through. For example, a “complete copy” may be defined in terms of the integrity of the digital file or files that make up your source and your target. At the most basic level, when you make copies you want to do a quick check to make sure that the file sizes in the copy are the same as the sizes of the original files. Ideally, you would run a fixity check, comparing, for instance, the MD5 hash value of each file in the first copy with the MD5 hash value of the same file in the second copy. The important point here is that “trying” to make a copy is not the same thing as actually having succeeded in making a copy. You are going to want to do at least a spot check to make sure that you really have created an accurate copy.
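The size check and fixity check described above can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended tool: the function names `md5_of_file` and `compare_copies` are made up for this example, and it assumes both copies are ordinary directories that can be read from the same machine.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(original_dir, copy_dir):
    """Compare every file under original_dir with its counterpart in copy_dir.

    Returns a list of (relative_path, problem) tuples. An empty list means
    every original file was found in the copy with the same size and MD5.
    Note: files that exist only in the copy are not reported.
    """
    original_dir, copy_dir = Path(original_dir), Path(copy_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        target = copy_dir / relative
        if not target.is_file():
            problems.append((relative.as_posix(), "missing from copy"))
        elif source.stat().st_size != target.stat().st_size:
            # Cheap first check: a size mismatch already proves the copy is bad.
            problems.append((relative.as_posix(), "size differs"))
        elif md5_of_file(source) != md5_of_file(target):
            # Same size but different content: only the hash reveals this.
            problems.append((relative.as_posix(), "MD5 differs"))
    return problems
```

Calling `compare_copies("/path/to/originals", "/path/to/external_drive")` (both paths are placeholders) and getting back an empty list is the difference between having tried to make a copy and knowing that you made one.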

For data on heterogeneous media (optical discs, hard drives, etc.) get the content off the media and into your storage system

A recording artist ships a box full of CDs and hard disks to their label for production of their next release. A famous writer offers an archive her personal papers and includes two of her old laptops, a handful of 5.25 inch floppies, and a few keychain-quality flash drives. An organization’s records management division is given a crate full of rewritable CDs from the accounting department. In each of these cases, a set of heterogeneous digital media has ended up on the doorstep of a steward, often with little or no preliminary communication. Getting the bits off that media is a critical first step. None of these storage media is intended for the long term; in many cases things like flash drives and rewritable CDs are not intended to function, even in optimal conditions, for more than a few years.

So, get the bits off their original media. But where exactly are you supposed to put them? The requirement in this cell suggests you should put them in your “storage system.” But what exactly is that supposed to mean? It’s intentionally vague in this chart in order to account for different types of organizations, resource levels and overall departmental goals. With that said, the general idea is that you want to focus on good quality media (designed for longer rather than shorter life), for example “enterprise quality” spinning disk or magnetic tape (or some combination of the two), and a way of managing what you have. For the first cell here, the focus is on the quality of the media. However, as requirements move further along, it is going to become increasingly important to be able to check and validate your data. Thus easy ways to manage the data on all of your copies become a critical component of your storage strategy. For example, a library of “good” quality CDs could serve as a kind of storage system. However, managing all of those individual pieces of media would itself become a threat to maintaining access to that content. In addition, when you inevitably need to migrate forward to future media, the need to individually transfer everything off that collection of CDs would become a significant bottleneck. In short, the design and architecture of your storage system is a whole other problem space, one not really directly covered by the NDSA Levels of Digital Preservation.
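One simple way to picture “a way of managing what you have” is an inventory of every file in the storage system together with its size and checksum, which can later be used to check and validate the data. The sketch below is only an illustration under stated assumptions: the names `build_manifest` and `changed_since` are hypothetical, the storage system is assumed to be a plain directory tree, and MD5 is used only because it is the example hash mentioned earlier in this post.

```python
import hashlib
from pathlib import Path


def build_manifest(storage_dir):
    """Map each file's relative path to its size in bytes and MD5 value."""
    storage_dir = Path(storage_dir)
    manifest = {}
    for path in sorted(storage_dir.rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.md5()
        with open(path, "rb") as handle:
            # Read in 1 MB chunks so large files do not need to fit in memory.
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        manifest[path.relative_to(storage_dir).as_posix()] = {
            "size": path.stat().st_size,
            "md5": digest.hexdigest(),
        }
    return manifest


def changed_since(manifest, storage_dir):
    """List paths that were added, removed or altered since manifest was built."""
    current = build_manifest(storage_dir)
    all_paths = manifest.keys() | current.keys()
    return sorted(p for p in all_paths if manifest.get(p) != current.get(p))
```

A manifest like this could be saved alongside each copy (for instance as JSON) and rerun on a schedule. With a box of individual CDs, by contrast, every disc would have to be loaded by hand to perform the same check, which is the management burden described above.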

Related Resources

You’ve Got to Walk Before You Can Run: First Steps for Managing Born-Digital Content Ricky Erway, 2012

The NDSA Levels of Digital Preservation: An Explanation and Uses Megan Phillips, Jefferson Bailey, Andrea Goethals, Trevor Owens

How Long Will Digital Storage Media Last? Personal Digital Archiving Series from The Library of Congress

metadata entry

Contribution: Trevor Owens

Name: Trevor Owens

URL: link to the original post

Entry: http://blogs.loc.gov/digitalpreservation/2013/12/protect-your-data-storage-and-geographic-location/

Language: English

Format: text/html