Sun Rises at the Library of Congress
When Enterprise Storage Forum last visited the Library of Congress's groundbreaking audio-visual storage infrastructure in February (see Storing National Treasures), the Library had just sent out its RFP.
Now, after more than two years of meetings, planning, research and revision, and months of reviewing bids and answering questions, a team of consultants and members of the Library's Information Technology Services group and Motion Picture, Broadcasting and Recorded Sound division has made its selection. The winner (drum roll, please): Government Micro Resources Inc. (GMRI), a "leading-edge" IT solutions provider located in the D.C. area, which presented the Library with a Sun-based storage area network (SAN) solution.
The Library received a total of three proposals, all from vendors it had worked with in the past. While to some this may seem like a low number, to the Library it was not.
"We put an awful lot of work into the development of the RFP," said Mike Handy, special projects manager for the Library's National Audio-Visual Conservation Center (NAVCC) preservation storage project. "We got what we thought was an accurate representation of what we needed. We didn't expect to get lots of proposals. We would have been shocked to get more than five or six. I think that the number we got was a reflection of the specificity of the requirements."
Indeed, the very fact that one vendor met all the requirements listed in the 61-page RFP (not including amendments) really stood out, said Handy, who added that GMRI's proposal also "gave the most value to the government."
What's Behind Door Number One
GMRI, which works with several OEMs, considered several of its partner solutions for the NAVCC storage project before choosing Sun Microsystems to fulfill the Library of Congress's RFP.
"As the engineering review progressed, it became clear that the Sun Microsystems/StorageTek solution provided the most viable solution to meet all of the Library requirements, including future growth," said Cathy Boleyn, vice president of business operations at GMRI. "It was a true end-to-end solution from a single OEM."
That solution consists of an AMD Opteron-based Sun system, which has a large memory capacity and high-performance PCI Express buses. This will be attached to a Sun Flex 380 RAID using a 4Gbps RAID controller. To allow file system data and metadata to reside on the devices best suited to their different access patterns, the Library is getting 72GB 15K rpm drives for small-block, random file system metadata and 300GB 10K rpm drives for file system data that will likely have a streaming access pattern. Sun-branded QLogic 4Gbps PCI-E HBAs will provide failover and load balancing for the RAID devices.
Sun T10000 high-performance tape drives, which have a 10E-19 bit error rate and an uncorrected/miscorrected bit error rate of 10E-33, will be installed in two Sun StorageTek SL8500 tape libraries. Hierarchical storage management (HSM) software, also from Sun, will be integrated with a Sun StorEdge QFS file system. This will allow for control of archive policy for both local and remote systems and will manage the migration of files from disk to tape.
However, this is not a classic tiered storage system, says Handy: "Rather, the disks serve as a temporary holding place, or landing zone, for large bursts of data. Movement of data from disk to tape will be managed in a steady-stream manner."
The system is also tailored to allow for efficient disaster recovery.
While some storage experts (including some who worked with the Library) argue that disks are a viable (or better) alternative to tape, the Library felt that a tape-based HSM solution was its best option based on power, space, cooling and cost considerations.
All of the equipment will be kept in a specially designed climate-controlled environment at the new 175,000-square-foot Conservation Building, part of the NAVCC complex in Culpeper, Virginia. Tape cartridges will stay in the tape library until the next generation of tape or some new long-term storage media is available. At that time, the Library will migrate all data to the new cartridges or media and then dispose of the old tapes. By the end of the third year of operation, the system is expected to reach 7 PB.
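To put the 7 PB figure in perspective, a back-of-the-envelope calculation gives the average write rate the archive would have to sustain to reach that size in three years. This is only a rough sketch: it assumes decimal petabytes, 365-day years and perfectly even growth, none of which the Library has stated, and the function name is purely illustrative.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365   # assumption: ignore leap years
PB = 10**15           # assumption: decimal petabytes


def average_ingest_rate(total_bytes, years):
    """Average bytes per second needed to write total_bytes over `years`."""
    seconds = years * DAYS_PER_YEAR * SECONDS_PER_DAY
    return total_bytes / seconds


# 7 PB by the end of the third year, assuming even growth from day one
rate = average_ingest_rate(7 * PB, 3)

print(f"{rate / 10**6:.1f} MB/s averaged around the clock")
print(f"{rate * SECONDS_PER_DAY / 10**12:.2f} TB per day")
```

Under those assumptions the average works out to roughly 74 MB/s, or a little over 6 TB a day. Actual ingest would arrive in bursts, which is the role Handy describes for the disk "landing zone."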
The Next Steps
The storage team met with GMRI in late June to go over logistics and next steps, including setting up a schedule with benchmarks and deadlines. Throughout the summer, equipment will be arriving at the Library's Madison Building in Washington, D.C., where it will be assembled and tested. At the same time, software developers will be working on programs to interface the archive with other systems at the Library. Once the testing has been completed (probably September) and the Culpeper facility is ready for occupancy (early 2007), the SAN will be disassembled and then reassembled at the NAVCC.
Handy calls that phase "the easy part," though he admits he might wind up eating those words. "It's going to sound clichéd," he says, "but we recently concluded that we had done what you're supposed to do: go and understand the requirements as best you can ... meet with technologists, IT people and AV experts ... develop an architectural model ... turn that architectural model into an RFC ... and gather comments from vendors, so by the time you get the final RFP, you have a very good idea of what the requirements and solutions are likely to be."
While Handy concedes that the two-year process may sound tedious, he and the ITS/Library team believe it has been well worth the effort. "You're going to spend quite a few resources up front in that process," he notes, "but what you spend up front you save afterwards."
The Big Picture
One of the most challenging aspects of the Library of Congress's NAVCC preservation storage project was that it was barely charted territory. While several large institutions around the world (including NASA and the British Library) have been coping with digitally storing massive amounts of audio-visual material, there is no worldwide or even local consensus on the best or most cost-effective way to do this.
Even the Library of Congress uses several different storage solutions in-house, which is why in May of this year the Library's ITS team gathered experts from several storage OEMs, members of the digital preservation community and the Library to openly discuss the issues and misconceptions around this increasingly important technology. Enterprise Storage Forum will look at what the Library discovered about designing storage architectures for preservation collections in an upcoming article, and will revisit the Library later this year to check on the progress of the new Archive SAN solution.