Replicating and Storing DNA
Mostafa Ronaghi, the principal investigator at the Stanford Genome Technology Center (SGTC), is an expert in biotechnology who holds many patents in the area of DNA sequencing and developed a technology called Pyrosequencing that was used to map the human genome. Yet when it came to data storage, Ronaghi was no expert.
"We are biologists," he said, referring to his SGTC team. "We don't know anything about [storage-related] software, and our tolerance is very low as far as technology is concerned."
So when Ronaghi went looking for a way to safely store and replicate the enormous files SGTC's DNA pyrosequencing machines were producing, he made it clear to vendors that the solution he chose had to be easy to use and easy to scale, and that the stored data had to be easy to access.
Per Ronaghi, a single ultra-high-resolution image can be between 33 and 160 MB after processing, but about 60 to 70 GB when you include the raw data. And that very valuable (and expensive) data needed to be safely stored, and replicated at least once, to ensure the information was available in case of a machine failure and to allow Ronaghi and his team to share it with collaborators at other scientific research institutions.
Until recently, SGTC was using an archive system to store its ultra-high-resolution/data-intensive files. "But tracking those files was a headache," said Ronaghi. And the system was painfully slow when it came to accessing material. So Ronaghi was on the lookout for a better solution.
In 2006, Ronaghi heard about Parascale, which was developing a virtual storage network (VSN) that could run on commodity hardware, such as x86 Linux servers, including those from Dell. Ronaghi contacted Parascale and, as soon as the Parascale VSN was available, became one of its first customers.
SGTC took delivery of its Parascale VSN in early fall 2007 and so far is very happy with it.
Installation was very easy, said Ronaghi, as was using the software. "It has a very nice GUI, very simple, so we can track what the load is on the hardware. Parascale allows us to set rules, and to maintain three or four copies of each image. So if we lose two disk drives or two servers, we feel safe that we won't lose the data."
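The arithmetic behind Ronaghi's confidence can be sketched in a few lines of Python. This is a rough illustration only, not Parascale's actual placement logic: it assumes every copy of a file lives on a different drive or server, and the function names are hypothetical. The article also does not say whether SGTC's capacity figures are raw or usable, so the raw-capacity helper is simply the general cost of keeping full copies.

```python
def failures_tolerated(copies: int) -> int:
    """Number of drives or servers that can be lost without losing a file.

    Assumption: each copy sits on a distinct drive or server, so the file
    survives as long as at least one copy remains.
    """
    if copies < 1:
        raise ValueError("at least one copy is required")
    return copies - 1


def raw_capacity_needed_tb(usable_tb: float, copies: int) -> float:
    """Raw disk space consumed when every file is stored `copies` times."""
    if copies < 1:
        raise ValueError("at least one copy is required")
    return usable_tb * copies


if __name__ == "__main__":
    for copies in (3, 4):
        print(copies, "copies tolerate", failures_tolerated(copies), "failures")
```

Under that assumption, three copies are the minimum that survives the two simultaneous failures Ronaghi describes, and the price is roughly three times the raw disk space per file.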
Ronaghi also liked that the Parascale VSN could run on SGTC's Dell x86 servers, meaning SGTC didn't have to buy additional hardware or appliances, and could scale up as necessary. Currently, SGTC has the ability to store 3 TB of data, but it plans on expanding the system to store 10 TB in the first half of 2008.
Besides giving SGTC the ability to incrementally scale to meet its storage requirements, the Parascale VSN is fast. "It scales both in storage capacity and bandwidth," said Ronaghi. "And that was quite unique for us because with the [archive] system, you could increase storage but [not] bandwidth." With Parascale, he said, more users can access the system without slowing it down, which is a definite plus.
A Scalable Commodity Solution
What differentiates the Parascale VSN from other storage solutions is its scalability and its ability to store very large files, such as ultra-high-resolution images, using software that runs on commodity x86 servers.
"We are the only storage solution that can be deployed for file storage on industry-standard hardware," stated Bill Evans, the CEO of Parascale. "There are storage solutions that require NAS boxes. There are other storage solutions that require SAN boxes. But all of these products are custom hardware-software combinations. We provide software that you can run on computers you can buy at Best Buy. And that's a big deal."
As is Parascale VSN's scalability.
"The challenge that Stanford has is that ... they're dealing in ultra-high resolution images ... which are very big. But rather than having to do a forklift upgrade [i.e., having to throw out a 10 TB NAS or SAN box and replace it with a 50 TB one every year or so], they can incrementally expand their storage by simply networking into their deployment another machine or a couple more machines ... because what we offer is a network solution rather than an appliance solution," explained Evans. "So if they deploy us today with three storage nodes, they can subsequently add more disks to each storage node or they can add additional storage nodes, all of this without interrupting the operation of their deployment and most of all without having to do a forklift upgrade."
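The two expansion paths Evans describes, adding disks to an existing storage node or adding a whole new node, can be modeled with a toy Python sketch. The class names, node names, and disk sizes below are hypothetical and are not drawn from Parascale's software; the point is only that total capacity grows in place, with nothing replaced.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """One commodity server contributing its disks to the pool (hypothetical model)."""
    name: str
    disks_tb: list[float] = field(default_factory=list)

    def capacity_tb(self) -> float:
        return sum(self.disks_tb)


@dataclass
class Cluster:
    """A pool of storage nodes whose raw capacity is the sum of its parts."""
    nodes: list[StorageNode] = field(default_factory=list)

    def add_node(self, node: StorageNode) -> None:
        # Expansion path 1: network another machine into the deployment.
        self.nodes.append(node)

    def add_disk(self, node_name: str, size_tb: float) -> None:
        # Expansion path 2: add a disk to a node that is already deployed.
        for node in self.nodes:
            if node.name == node_name:
                node.disks_tb.append(size_tb)
                return
        raise KeyError(f"no node named {node_name!r}")

    def raw_capacity_tb(self) -> float:
        return sum(node.capacity_tb() for node in self.nodes)


if __name__ == "__main__":
    # Start with three storage nodes, as in Evans's example (sizes are made up).
    cluster = Cluster([StorageNode(f"node{i}", [1.0]) for i in range(1, 4)])
    print(cluster.raw_capacity_tb())

    cluster.add_disk("node1", 1.0)                   # more disk in an existing node
    cluster.add_node(StorageNode("node4", [2.0]))    # an additional node
    print(cluster.raw_capacity_tb())
```

Existing nodes and their data stay where they are in both cases, which is the contrast Evans draws with a forklift upgrade.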
Those things combined mean cost savings for customers like SGTC.
As for who is a prospective Parascale VSN customer, Evans said, "As soon as you need more than one server to manage your storage, we're a fit." More precisely, the product is a good choice for enterprises that have at least 3 TB of data to store, use x86 servers, and like the idea of incremental expansion.