MIT Has Big Plans for IP Storage

MIT on Monday announced plans for a petabyte-scale IP storage system that can be “managed by a few graduate students,” in the words of MIT Media Lab Director Frank Moss.

The system, based on Zetera’s Z-SAN technology in collaboration with Bell Micro, Marvell and Seagate, will be used for the lab’s “Human Speechome Project” to collect and analyze video and audio data to better understand early childhood cognitive development.

Associate Professor Deb Roy has spent much of the last year recording 12 to 14 hours of video per day of his nine-month-old son in an effort to better understand early childhood learning and socialization.

The digital audio and video data, collected at a rate of 350GB per day, will be processed and analyzed using a suite of data mining tools that Roy and his team have been developing. By mid-2008, the information will be assembled into a database exceeding one petabyte, to be processed and analyzed by several hundred parallel processing devices.
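For a sense of scale, the capture rate above can be turned into yearly totals with some quick arithmetic. The sketch below is only a back-of-the-envelope illustration: it assumes decimal units (1 TB = 1,000 GB) and a constant 350GB every day. The function names are made up for this example. The article does not break down how much of the eventual petabyte is raw capture versus derived or duplicated data, so the code makes no attempt to model that.

```python
# Back-of-the-envelope capture arithmetic (illustrative only).
GB_PER_DAY = 350        # capture rate quoted in the article
GB_PER_TB = 1_000       # assumption: decimal units
GB_PER_PB = 1_000_000   # assumption: decimal units


def raw_capture_tb(days: int, gb_per_day: float = GB_PER_DAY) -> float:
    """Raw data captured over `days` days, in terabytes."""
    return days * gb_per_day / GB_PER_TB


def days_to_reach(target_gb: float, gb_per_day: float = GB_PER_DAY) -> float:
    """Days of capture needed for raw data alone to reach `target_gb`."""
    return target_gb / gb_per_day


if __name__ == "__main__":
    print(f"One year of raw capture: {raw_capture_tb(365):.2f} TB")
    print(f"Days for raw capture alone to reach 1 PB: {days_to_reach(GB_PER_PB):.0f}")
```

Under these assumptions, a year of continuous capture comes to roughly 128 terabytes of raw data.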

Zetera chief marketing officer Doug Glen said storage systems the size of the MIT system aren’t unique, but claimed, “It is unprecedented to build it with the simplicity, scalability and cost of the one we are talking about today.”

Requirements for the system include read/write throughput in excess of 160 gigabits per second, shared volumes of several hundred terabytes or more, scalability from an initial 50 terabytes to well in excess of a petabyte, 100 percent data redundancy, file access by computers running multiple operating systems, a fully virtualized storage fabric, and affordability via low-cost, high-capacity SATA hard drives.

Zetera senior product marketing director Jeff Greenberg said each component of the Storage over Internetworking Protocol (SoIP) system will access the system directly, with no RAID controllers to slow performance.

The system will ultimately be composed of more than 3,000 Seagate SATA drives, 300 Hammer Z-Rack storage enclosures, 100 Marvell-based 10G/GbE switches, and about 400 blade processors. It will process 700 terabytes of data during each 12-hour overnight analytical run. Stripes of 150 drives (aggregated virtual volumes) will be created using the native virtualization capabilities of Z-SAN. Protection against data loss will be delivered through RAID 10 mirrors (duplicate copies) of the raw video data, transform data and metadata files.
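The nightly workload and the stated throughput requirement can be checked against each other with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 gigabit = 10^9 bits) and treats the 700 terabytes as a single sequential pass at a constant rate, which is a simplification. The function names are hypothetical and not part of any Zetera tooling.

```python
# Rough throughput check for the overnight analytical run (illustrative only).
SECONDS_PER_HOUR = 3600
BITS_PER_TB = 1e12 * 8   # assumption: decimal terabytes
BITS_PER_GBIT = 1e9      # assumption: decimal gigabits


def required_gbit_per_s(terabytes: float, hours: float) -> float:
    """Average rate in gigabits/second needed to move `terabytes` in `hours`."""
    return terabytes * BITS_PER_TB / (hours * SECONDS_PER_HOUR) / BITS_PER_GBIT


def hours_at_rate(terabytes: float, gbit_per_s: float) -> float:
    """Hours needed to move `terabytes` at a sustained `gbit_per_s`."""
    return terabytes * BITS_PER_TB / (gbit_per_s * BITS_PER_GBIT) / SECONDS_PER_HOUR


if __name__ == "__main__":
    print(f"Average rate for 700 TB in 12 h: {required_gbit_per_s(700, 12):.1f} Gbit/s")
    print(f"Time for 700 TB at 160 Gbit/s: {hours_at_rate(700, 160):.2f} h")
```

Under these assumptions, a 700-terabyte run in 12 hours averages about 130 gigabits per second, which fits within the 160 gigabits per second requirement with some headroom.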

StorageIO founder and senior analyst Greg Schulz said the project is interesting, if a little overhyped.

“It is a cute little project, will get some press, however, it’s hardly something that has the storage vendors quaking in their boots over for now,” Schulz told Enterprise Storage Forum.

“Every few years someone does a science project like this to show how new technology and lower-cost technology can scale or change thinking,” Schulz continued. “It then typically takes a few years to productize and commercialize for turnkey business solutions even on a smaller scale to actually be delivered.”

Schulz took issue with calling the system an “array.”

“To say that a collection of nodes is an array is an unfair comparison to non-clustered solutions like those from EMC, HDS, IBM, etc.,” he said. “A more appropriate comparison would be how does this solution scale in capacity, performance, functionality and so forth when compared to peer, clustered, and grid-type solutions like those from EqualLogic, 3PAR, Isilon, Exanet, Panasas and many others.”

Paul Shread
eSecurity Editor Paul Shread has covered nearly every aspect of enterprise technology in his 20+ years in IT journalism, including an award-winning series on software-defined data centers. He wrote a column on small business technology for, and covered financial markets for 10 years, from the dot-com boom and bust to the 2007-2009 financial crisis. He holds a market analyst certification.
