Faster than a speeding bullet. More powerful than a locomotive. Able to calculate more than one quadrillion floating point operations per second…. Look, up in the labs of Cray and IBM: It’s high-speed supercomputing on a scale never before seen!
Fighting to keep America competitive in the world supercomputing market, the High Productivity Computing Systems (HPCS) program of the United States Defense Advanced Research Projects Agency (DARPA) is on a mission. Its goal: to develop new high-end programming environments, software tools, architectures and hardware components that will ultimately drive the next generation of economically viable high-productivity computing systems.
The program, begun in 2002, is currently in Phase III, design and development. Responsible for that work are supercomputer manufacturers Cray and IBM, which were awarded contracts for $250 million and $244 million, respectively, in November 2006. The prototype petascale machines Cray and IBM are developing, to be demonstrated by December 2010, will be smaller than DARPA HPCS’s goal of two petaFLOPS sustained performance (scalable to greater than four petaFLOPS), but still more powerful than today’s fastest supercomputer. They will also boast state-of-the-art petascale storage systems using the latest storage and I/O technology.
High-performance computing is already used in cryptanalysis, the planning and execution of military scenarios, weather and ocean forecasting, the engineering of large aircraft, ships and other large structures, and in scientific and financial modeling. And DARPA HPCS hopes its latest supercomputing efforts will spur further advances in these and other areas by improving performance, programmability, portability, robustness and storage at petascale.
“A productive system requires a programming environment that is easier to use and that has less of a learning curve than the environments on today’s HPCs,” says William Harrod, DARPA program manager for HPCS. “It will need a computer architecture and operating system that enables more efficient execution. When a component fails, it should not bring down the application, so the system requires a comprehensive robustness strategy,” as well as a highly reliable storage strategy, problems that both Cray and IBM are addressing as part of Phase III of the project.
Storage at Petascale
Peter Rigsbee, senior product marketing manager at Cray, says storage and I/O play a key role in Cray’s HPCS development program, code-named Cascade.
“Petascale applications require a very high-performance scratch file system to ensure that storage is not a bottleneck preventing these applications from obtaining full system performance,” Rigsbee explains. To ensure these applications are properly supported, Cray will be using Lustre, a highly reliable and scalable file system from Cluster File Systems Inc.
Petascale computer centers require permanent storage that must be shared with other systems, says Rigsbee. “The permanency of this storage requires it be cost-efficiently archived and protected for many years. … Cray’s solution will rely on — and help advance — storage industry investments being made in technologies such as NFSv4 and pNFS, scalable NAS and MAID-based virtual tape,” which will no doubt eventually trickle down to enterprises.
Big Blue Automates Tiered Storage
For its Phase III work, IBM is using its Roadrunner and other IBM supercomputing systems to create a highly scalable, multi-tiered system for petabyte-class storage. IBM’s General Parallel File System (GPFS), which handles more than two petabytes of online storage, will serve as primary storage, with its High Performance Storage System (HPSS) as the backup tape system.
As IBM Distinguished Engineer Rama Govindaraju explains, “IBM is implementing a novel hierarchical storage management system (HSM) based upon a tight integration of its GPFS and HPSS products,” which IBM hopes will address “the most significant problems of traditional archive systems.”
The system uses GPFS storage pools and policies to implement multi-tiered storage, says Govindaraju. “Placement policies can be defined to locate newly created files in a widely-striped, high-performance storage pool on high-performance disks (e.g., enterprise SAS),” he says. “As files age, migration policies move them to cheaper storage automatically, transparently and in parallel.” Instead of users archiving their own files to tape, the GPFS policy engine automatically archives files.
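To make the tiering concrete, the following is a minimal Python sketch of how such placement and migration policies might be modeled. It is an illustration only: the pool names, age thresholds and FileRecord fields are assumptions made for the example, and the code does not use GPFS’s actual policy syntax or engine.

    """Hypothetical sketch of tiered-storage placement and migration policies.

    Illustrative model only, not GPFS's policy engine or syntax; pool names,
    thresholds and the FileRecord fields are assumptions for this example.
    """
    from dataclasses import dataclass
    from datetime import datetime, timedelta

    # Tiers ordered from fastest/most expensive to cheapest.
    POOLS = ["fast_sas", "capacity_sata", "tape_archive"]

    @dataclass
    class FileRecord:
        name: str
        size_bytes: int
        last_access: datetime
        pool: str = "fast_sas"   # placement policy: new files land on fast disk

    def migration_pool(record: FileRecord, now: datetime) -> str:
        """Migration policy: move files to cheaper tiers as they age."""
        age = now - record.last_access
        if age > timedelta(days=90):
            return "tape_archive"      # archived automatically, no user action
        if age > timedelta(days=14):
            return "capacity_sata"
        return "fast_sas"

    def run_policy_scan(files: list[FileRecord], now: datetime) -> None:
        """One pass of the (hypothetical) policy engine over all files."""
        for f in files:
            target = migration_pool(f, now)
            if target != f.pool:
                print(f"migrating {f.name}: {f.pool} -> {target}")
                f.pool = target   # a real system would move the data itself

    if __name__ == "__main__":
        now = datetime(2008, 1, 1)
        files = [
            FileRecord("result.dat", 10**9, now - timedelta(days=2)),
            FileRecord("checkpoint.old", 10**10, now - timedelta(days=30)),
            FileRecord("run1999.tar", 10**11, now - timedelta(days=400)),
        ]
        run_policy_scan(files, now)

The point of the model is the separation Govindaraju describes: one rule decides where a new file is placed, and a separate, periodically evaluated rule decides when it moves to cheaper storage, with no action required from the user.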
Additionally, notes Govindaraju, the GPFS/HPSS interface is being extended to support backup and restore. Backup will be driven by the GPFS policy manager, just like HSM, and will take advantage of the parallel data movement architecture of HPSS, he says. Backup and HSM share the same repository; that is, both use the same copy of a file on tape. Backup policies leverage GPFS file system snapshots for consistent point-in-time backups and implement full and incremental backups on administrator-defined schedules. The policy language allows filtering on file attributes (name, type, owner, etc.), so paging files and the like can be excluded from the backup.
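That backup selection can be sketched in the same illustrative way. The exclusion patterns, the full-versus-incremental logic and the FileRecord fields below are assumptions for the example, not the GPFS policy language or the GPFS/HPSS interface.

    """Hypothetical sketch of policy-driven backup selection.

    Illustrative only: the exclusion patterns, schedule handling and the
    FileRecord fields are assumptions, not the actual GPFS policy language
    or the GPFS/HPSS interface.
    """
    from dataclasses import dataclass
    from datetime import datetime
    from fnmatch import fnmatch
    from os.path import basename
    from typing import Optional

    # Files whose names match these patterns (paging files and the like)
    # are filtered out of every backup.
    EXCLUDE_PATTERNS = ["pagefile*", "*.swap", "*.tmp"]

    @dataclass
    class FileRecord:
        path: str
        owner: str          # attributes such as owner could also be filtered on
        mtime: datetime     # modification time as seen in the snapshot

    def selected_for_backup(f: FileRecord, last_backup: Optional[datetime]) -> bool:
        """Attribute filter first, then full vs. incremental selection."""
        if any(fnmatch(basename(f.path), pat) for pat in EXCLUDE_PATTERNS):
            return False                 # excluded by name/type
        if last_backup is None:
            return True                  # full backup: take everything else
        return f.mtime > last_backup     # incremental: changed files only

    if __name__ == "__main__":
        last_full = datetime(2007, 12, 25)   # previous full backup
        files = [
            FileRecord("/scratch/pagefile.sys", "root", datetime(2008, 1, 1)),
            FileRecord("/home/alice/model.out", "alice", datetime(2007, 12, 30)),
            FileRecord("/home/bob/old.dat", "bob", datetime(2007, 11, 1)),
        ]
        for f in files:
            if selected_for_backup(f, last_full):
                print("back up:", f.path)

Here the snapshot supplies the consistent point-in-time view of each file’s attributes, the pattern filter drops files an administrator never wants on tape, and the comparison against the last backup date distinguishes an incremental run from a full one.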
As for future applications of the storage-related technology, Govindaraju says that “this technology, fired as it were in the crucible of high-performance computing, will eventually have a profound impact on the low end, including enterprise computing and our Fortune 500 customers.”