The two will conduct research into “active storage,” an effort to shift computation and transformation of data from client computers to storage devices. Eng Lim Goh, SGI’s chief technology officer, claims the effort “holds the promise of dramatic productivity breakthroughs for a broad range of computing disciplines saddled by large data.”
The alliance, which is part of a long-term collaboration between PNNL and SGI, includes options for more than 2.5 petabytes of storage over the next two years. As part of the first phase, SGI Professional Services will deliver a 380-terabyte file system this summer to the William R. Wiley Environmental Molecular Sciences Laboratory at PNNL.
PNNL scientists will be able to take raw data sets stored on the file server and conduct computations to identify data signatures and patterns before the data is transferred to client systems.
“By developing methods to perform computing inside the file system, we will be able to reduce the amount of redundant data transfers, which routinely undermines productivity and lengthens the time to solution,” states Scott Studham, PNNL associate director for advanced computing.
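To make the idea concrete, here is a minimal sketch of the difference between filtering on the client and filtering next to the data. It is an illustration only and does not reflect the actual PNNL or SGI design: the function names, the byte-pattern "signature" and the sample records are all hypothetical, and the "network" is simply a byte count.

```python
# Hypothetical illustration only; none of these names come from PNNL or SGI.

def matches_signature(record: bytes, signature: bytes) -> bool:
    """Return True when the record contains the byte pattern of interest."""
    return signature in record


def client_side_scan(records, signature):
    """Conventional path: ship every record to the client, then filter there."""
    transferred = sum(len(r) for r in records)  # all bytes cross the network
    hits = [r for r in records if matches_signature(r, signature)]
    return hits, transferred


def storage_side_scan(records, signature):
    """Active-storage path: filter next to the data, ship only the matches."""
    hits = [r for r in records if matches_signature(r, signature)]
    transferred = sum(len(r) for r in hits)  # only matching bytes cross the network
    return hits, transferred


# Four made-up 500-byte records, two of which carry the pattern being sought.
records = [
    b"noise" * 100,
    b"peak-A" + b"x" * 494,
    b"noise" * 100,
    b"peak-A" + b"y" * 494,
]

client_hits, client_bytes = client_side_scan(records, b"peak-A")
storage_hits, storage_bytes = storage_side_scan(records, b"peak-A")
```

Both paths return the same matching records, but in this toy case the storage-side path moves half as many bytes. The savings in a real deployment would depend on how selective the computation is.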
The new file system is expected to sustain write rates in excess of 8Gbps and demonstrate single-client write rates higher than 600Mbps. The system will leverage Lustre, an open source, object-based file system whose development is led by Cluster File Systems Inc. with funding from the Department of Energy. Lustre is used on four of the top five supercomputers, including the PNNL cluster based on 1,900 Intel Itanium 2 processors.
SGI also plans to evaluate how the research effort can contribute to the evolution of the company’s SGI InfiniteStorage CXFS shared file systems.
Pushing the Boundaries
“The scientific and HPC community continues to push the boundaries of physical and software computing systems by creating and consuming ever larger data structures,” states William Hurley, senior analyst at the Enterprise Storage Group’s Enterprise Application Group.
The SGI product combination “puts SGI ahead of many storage and system vendors offering hardware or software … that are meant to provide a simplified, yet resilient means to access distributed data,” Hurley told Enterprise Storage Forum.
Hurley says the use of the Lustre and CXFS file systems as a base Data Abstraction Layer provides common, shared access to a large data set while reducing the operating burden on distributed network and compute resources.
“The desire of many large and small organizations to achieve this operating condition for their file and relational data is high,” he says. “Only a handful of companies are delivering this type of functionality today.”
“Giving commercial organizations the ability to deploy a Data Abstraction Layer provides app developers, business analysts and IT operations staff a standardized data medium that expedites app development and reduces data errors that cause inaccurate reporting outcomes, while easing the burden on physical system resources and their management,” Hurley says.