Leaders Offer Insight on Preserving Digital History

In May, the Library of Congress, which is in the process of implementing one of the world’s largest audio-visual preservation projects, brought together leaders in the enterprise storage and digital preservation communities to discuss this relatively new field and the challenges institutions like the Library face. The meeting was the first of its kind and attracted experts from IBM, Seagate, HP Labs, Engenio, Harvard, Stanford and the San Diego Supercomputer Center, among others.

For Mike Handy, special projects manager for the Library’s new National Audio Visual Conservation Center storage infrastructure, the meeting was a way for the Library’s digital preservation and storage team to validate its decision to go with a tape-based hierarchical storage management solution (see Sun Rises at the Library of Congress), share its findings with a broader community, and learn from the experts about the latest and up-and-coming preservation storage solutions.

Tape vs. Disk

“One of the real drivers for the meeting was to have a discussion about why it was we went with a large tape library solution, whereas the Internet Archive [also represented at the meeting], for example, went to the other end of the spectrum, using great numbers of small computers connected to commodity hard drives for their preservation collection,” says Handy. “There are good reasons to do both of those, but they’re not the same reasons.”

Indeed, some of the most heated discussions of the day revolved around tape versus disk, with representatives from IBM and Imation arguing for tape and the representative from the Internet Archive (and others) arguing for disk.

“The thing I found most interesting was that I had expected at the end of this discussion to have some notion that there were two or three other storage platforms between the two extremes,” says Handy. “But in listening to some of the discussions about tape and the speeds with which you can access tape these days, I’m less inclined to think we have a lot of way stations along that continuum.”

Stimulating Storage Debate

The following is a list of talking points — opinions, statements, and facts (the Library doesn’t say which is which) — that the Library of Congress presented at its May 24 Designing Storage Architectures for Preservation Collections meeting:

  • Disk storage is now so cheap it doesn’t make sense not to use it for all storage needs.
  • Disk densities cannot continue to increase at the same rate they have up to this point.
  • Tape densities have much more room for improvement.
  • Storage will quickly become a SAAS (Storage as a Service) commodity.
  • Checksum calculation and maintenance is an essential element of the storage architecture.
  • Life expectancies of electronic storage media (20-30 years) are irrelevant, since migration from one media to the next will result in continual refreshment.
  • Total cost of ownership for disk should include power and cooling requirements.
  • Tape is not appropriate for online or random access service.
  • Keeping media permanently mounted enhances usability and sustainability.
  • Using disk and then “spinning disks down” until needed is the best way to preserve bits long term.
  • Disks are disks; the distinctions between one disk technology and another are inconsequential in terms of reliability.
  • Latency issues will prevent distributed storage schemes from scaling economically.
  • With current technology the underlying topology of the data within the storage device(s) is not known to those devices.
  • File system technology and standards limit storage scaling.
  • Once an item has been declared as archival, it will not change (i.e., curator should not be allowed to edit it further).
  • Storage system reliability can depend on the products themselves, on monitoring and management, or on both.
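
The talking point on checksum calculation and maintenance can be made concrete with a small sketch. The Python below is illustrative only and is not drawn from the Library's system or any product discussed at the meeting: the function names (sha256_of, build_manifest, audit) are hypothetical, and SHA-256 is an assumed choice of hash. It records a checksum for every file in a collection, then later re-reads the files and reports anything missing, altered, or unexpected.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files currently under root against a stored manifest."""
    current = build_manifest(root)
    return {
        # Recorded in the manifest but no longer present on storage.
        "missing": sorted(set(manifest) - set(current)),
        # Present, but the bits no longer match the recorded checksum.
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        # Present on storage but never recorded in the manifest.
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

In a scheme like this, the manifest would be built once at ingest and stored apart from the collection, and the audit would be rerun on a schedule and after every migration to new media, so that damaged copies can be replaced from a good replica.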

Among the other solutions presented at the meeting was the LOCKSS Program, which was initiated by Stanford University Libraries and which the British Library uses for its preservation needs. LOCKSS, which stands for Lots of Copies Keep Stuff Safe, “is open source software that provides librarians with an easy and inexpensive way to collect, store, preserve and provide access to their own local copy of authorized content they purchase,” according to lockss.org. Unlike the Library of Congress’s new tape-based Sun storage solution, LOCKSS is a peer-to-peer system that runs on standard desktop hardware and requires almost no technical administration.

Shared Storage

Another trend that caught the attention of Handy and his Library colleague Jane Mandelbaum was shared storage.

“Say you, the University of Umptiyump, have a bunch of digital material that you want to put somewhere. For whatever reason you don’t want to support it locally,” says Handy. “You can send it to San Diego [to the San Diego Supercomputer Center]. They have a fixed fee on a per terabyte basis, an annual fee that you pay to have them store the data for you. That to me was one of the more interesting discussions of the whole day, because I don’t know anybody who can tell you to the level of detail that San Diego can about how much it costs to provide that service.”

“I think it’s an indication that people are moving back to what we used to call the service bureau model, when people didn’t have their own data centers and used to contract with service bureaus to get computing services and run their applications,” adds Mandelbaum, another special projects manager who helped organize the meeting.

“The pendulum swings back and forth,” she says, “and at the moment I think [shared storage or storage as a service] is something the digital preservation community is interested in exploring, whether it’s a distributed and/or replicated and/or backed-up storage where you either have your storage somewhere else or you distribute it for the purpose of disaster recovery or some combination of the two.”

Continuing the Conversation

As of now, there are no plans for a second meeting, though the Library hopes to keep the conversation going through other means. To that end, Handy and his team recently put together a wiki — “an experiment as to whether we could foster an ongoing discussion,” he says. Several attendees also suggested going on site visits, particularly to the Internet Archive to see its storage preservation solution up close and personal. And there was talk of collaboration on future preservation storage projects, particularly at the Library of Congress, which is in the midst of implementing its new tape-based Sun preservation storage solution.

While no one professed to have had a change of heart, Handy says everyone in attendance agreed that the one-day session was useful. And who knows? The ongoing conversations the conference fostered may very well lead to the next big storage preservation solution.

Jennifer Schiff
Jennifer Schiff is a business and technology writer and a contributor to Enterprise Storage Forum. She also runs Schiff & Schiff Communications, a marketing firm focused on helping organizations better interact with their customers, employees, and partners.
