For the last 15 years, scientists at the J. Craig Venter Institute (JCVI) have been doing pioneering work sequencing the DNA of hundreds of organisms and exploring the world’s oceans to discover new organisms and their millions of genes and proteins.
At the same time, JCVI’s IT team has been doing pioneering work of its own, figuring out how to cost-effectively classify and manage all that data: about three billion files, ranging from 64KB to 100MB each, that occupy about 100TB of storage.
It was a challenge further complicated by the non-profit institute’s limited budget and the limited number of information classification solutions that could scale to manage that amount of data. Indeed, according to Steve d’Alencon, vice president of product marketing at Kazeon Systems, the information classification solution provider JCVI ultimately chose, no enterprise data classification and management solution on the market before Kazeon’s Information Server could scale to handle so much unstructured data.
In Search of a Data Classification System
Rajeev Karamchedu, JCVI’s director of IT operations, backed up d’Alencon’s claim. He personally spent years testing several different products before ultimately choosing the Kazeon Information Server IS1200. Many of those solutions, which Karamchedu declined to name, did an adequate job of managing structured data (database files), but when it came to managing JCVI’s 100TB worth of unstructured data, “that was a very hard problem to solve,” Karamchedu said.
JCVI stores its files in volumes, with each volume containing years’ worth of genomic research data. Because users need to be able to access and update files at any given time, the IT team needed a system that could effectively classify and manage the volumes of accumulated data and analyze it based on a number of different criteria, such as access, age, utilization by user and document type.
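To make those criteria concrete, the following is a minimal Python sketch of collecting access time, age, owner and document type for every file under a directory. It is an illustration only and does not represent how the Kazeon Information Server works internally; the names `FileRecord`, `scan_volume` and `bytes_by` are hypothetical, and using the file extension as the "document type" is a simplifying assumption.

```python
import os
import time
from collections import Counter
from dataclasses import dataclass
from pathlib import Path

SECONDS_PER_DAY = 86400


@dataclass
class FileRecord:
    """Metadata for one file (hypothetical structure, for illustration)."""
    path: str
    size_bytes: int
    owner_uid: int              # utilization by user
    doc_type: str               # approximated by the file extension
    days_since_access: float    # access
    days_since_modified: float  # age


def scan_volume(root, now=None):
    """Walk a directory tree and return one FileRecord per readable file."""
    now = time.time() if now is None else now
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            records.append(FileRecord(
                path=full,
                size_bytes=st.st_size,
                owner_uid=st.st_uid,
                doc_type=Path(name).suffix.lower() or "(none)",
                days_since_access=(now - st.st_atime) / SECONDS_PER_DAY,
                days_since_modified=(now - st.st_mtime) / SECONDS_PER_DAY,
            ))
    return records


def bytes_by(records, key):
    """Total bytes grouped by a FileRecord attribute such as 'doc_type'."""
    totals = Counter()
    for record in records:
        totals[getattr(record, key)] += record.size_bytes
    return dict(totals)
```

Calling `bytes_by(scan_volume("/some/volume"), "owner_uid")` would then give storage consumed per user. A single-threaded walk like this would not be practical at the scale of billions of files, which is part of why a purpose-built system was needed.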
Karamchedu grew frustrated with his latest solution, which he said had reached the point where making the product work cost more than the value it delivered. He reopened his search for a data classification and management system and happened upon Kazeon.
Finding a Scalable NFS Solution
After several discussions with Kazeon’s pre-sales engineers, Karamchedu and his team tested the Kazeon solution in-house, helping Kazeon tweak the code to better suit JCVI’s needs. By the time IT was ready to put the system into production, the rollout was very straightforward and “quite easy,” said Karamchedu. And since going live, the IS1200 has performed as expected.
“Kazeon’s Information Server managed our unique requirements and has allowed us to identify data for effective tiered storage,” said Karamchedu. “It integrates extremely well with NFS and that’s what we needed.”
Moreover, with the Kazeon Information Server, JCVI is able to set policies to index and classify data based on specific criteria, making it easier for JCVI to locate and access files as well as implement multiple tiers of storage.
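As a rough illustration of what policy-based classification for tiered storage can look like, here is a small Python sketch in which ordered rules map file metadata to a storage tier. This is not Kazeon’s policy format or API; the `FileInfo` and `Policy` types, the tier names and every threshold below are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List

MB = 1024 ** 2


@dataclass(frozen=True)
class FileInfo:
    """Minimal file metadata (hypothetical, for illustration)."""
    path: str
    size_bytes: int
    days_since_access: float
    doc_type: str


@dataclass(frozen=True)
class Policy:
    """A named rule that sends matching files to a tier."""
    name: str
    tier: str
    matches: Callable[[FileInfo], bool]


# Rules are evaluated in order and the first match wins.
# Thresholds and tier names are illustrative assumptions.
POLICIES: List[Policy] = [
    Policy("recently-used", "tier1",
           lambda f: f.days_since_access <= 30),
    Policy("large-and-stale", "tier3",
           lambda f: f.days_since_access > 365 and f.size_bytes >= 50 * MB),
    Policy("stale-logs", "tier3",
           lambda f: f.doc_type == ".log" and f.days_since_access > 90),
]


def assign_tier(info: FileInfo, policies: List[Policy],
                default: str = "tier2") -> str:
    """Return the tier of the first matching policy, else the default."""
    for policy in policies:
        if policy.matches(info):
            return policy.tier
    return default
```

Keeping the rules in an ordered list makes precedence explicit: a file opened last week stays on the fastest tier no matter how large it is, because the `recently-used` rule is checked first.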
“The Kazeon Information Server uses information access technology to deliver an extremely flexible, scalable and cost-effective platform to search, classify and act on electronically stored information,” stated d’Alencon. And because of its scalability, customers like JCVI can start off using the Information Server for storage optimization and then graduate to information lifecycle management (ILM) using the same platform, helping to control costs.
Karamchedu’s one gripe (if you can call it that) with the solution is its reporting abilities, which he said are currently “basic.” But he noted that Kazeon is working on an API that would allow JCVI (and other customers) to generate custom reports in Excel format, which he is looking forward to.
Advice for Storage Administrators
Due to budgetary and technology limitations, Karamchedu had to come up with several workarounds and point solutions before he found an appropriate solution for managing JCVI’s unstructured data. But if he could do things differently, he said, he would. In particular, he cautioned storage administrators against using point solutions.
“SRM or information management should not start after you’ve realized that you’ve got a problem,” he said. “It should start the day that you put your NAS or SAN gear in the data center.
“If I were to start from a clean slate today, there are a lot of things that I would repeat, but there are certain things that I would insert here and there on the back end that would have made my life a lot better now — for example, inserting a tiered terminology, even though one does not have a tiered system … so when one does have a tiered system in place, it’s second nature to users, to admins, to everyone.”