Leaders Offer Insight on Preserving Digital History
In May, the Library of Congress, which is in the process of implementing one of the world's largest audio-visual preservation projects, brought together leaders in the enterprise storage and digital preservation communities to discuss this relatively new field and the challenges institutions like the Library face. The meeting was the first of its kind and attracted experts from IBM, Seagate, HP Labs, Engenio, Harvard, Stanford and the San Diego Supercomputer Center, among others.
For Mike Handy, special projects manager for the Library's new National Audio Visual Conservation Center storage infrastructure, the meeting was a way for the Library's digital preservation and storage team to validate its decision to go with a tape-based hierarchical storage management solution (see Sun Rises at the Library of Congress), share its findings with a broader community, and learn about the latest as well as up-and-coming preservation storage solutions from the experts.
Tape vs. Disk
"One of the real drivers for the meeting was to have a discussion about why it was we went with a large tape library solution, whereas the Internet Archive [also represented at the meeting], for example, went to the other end of the spectrum, using great numbers of small computers connected to commodity hard drives for their preservation collection," says Handy. "There are good reasons to do both of those, but they're not the same reasons."
Indeed, some of the most heated discussions of the day revolved around tape versus disk, with representatives from IBM and Imation arguing for tape and the representative from the Internet Archive (and others) arguing for disk.
"The thing I found most interesting was that I had expected at the end of this discussion to have some notion that there were two or three other storage platforms between the two extremes," says Handy. "But in listening to some of the discussions about tape and the speeds with which you can access tape these days, I'm less inclined to think we have a lot of way stations along that continuum."
The following is a list of talking points (opinions, statements, and facts; the Library doesn't say which is which) that the Library of Congress presented at its May 24 Designing Storage Architectures for Preservation Collections meeting:
Among the other solutions presented at the meeting was the LOCKSS Program, which was initiated by Stanford University Libraries and which the British Library uses for its preservation needs. LOCKSS, which stands for Lots of Copies Keep Stuff Safe, "is open source software that provides librarians with an easy and inexpensive way to collect, store, preserve and provide access to their own local copy of authorized content they purchase," according to lockss.org. And unlike the Library of Congress's new tape-based Sun storage solution, LOCKSS is a peer-to-peer system that runs on standard desktop hardware and requires almost no technical administration.
Another trend that caught the attention of Handy and his Library colleague Jane Mandelbaum was shared storage.
"Say you, the University of Umptiyump, have a bunch of digital material that you want to put somewhere. For whatever reason you don't want to support it locally," says Handy. "You can send it to San Diego [to the San Diego Supercomputer Center]. They have a fixed fee on a per terabyte basis, an annual fee that you pay to have them store the data for you. That to me was one of the more interesting discussions of the whole day, because I don't know anybody who can tell you to the level of detail that San Diego can about how much it costs to provide that service."
"I think it's an indication that people are moving back to what we used to call the service bureau model, when people didn't have their own data centers and used to contract with service bureaus to get computing services and run their applications," adds Mandelbaum, another special projects manager who helped organize the meeting.
"The pendulum swings back and forth," she says, "and at the moment I think [shared storage or storage as a service] is something the digital preservation community is interested in exploring, whether it's a distributed and/or replicated and/or backed-up storage where you either have your storage somewhere else or you distribute it for the purpose of disaster recovery or some combination of the two."
Continuing the Conversation
As of now, there are no plans for a second meeting, though the Library hopes to keep the conversation going through other means. To that end, Handy and his team recently put together a wiki, "an experiment as to whether we could foster an ongoing discussion," he says. Several attendees also suggested going on site visits, particularly to the Internet Archive to see its storage preservation solution up close and personal. And there was talk of collaboration on future preservation storage projects, particularly at the Library of Congress, which is in the midst of implementing its new tape-based Sun preservation storage solution.
While no one professed to having a change of heart, Handy said everyone in attendance agreed that the one-day session was useful. And who knows? The ongoing conversations the conference fostered may very well lead to the next big storage preservation solution.