MIT Has Big Plans for IP Storage


MIT on Monday announced plans for a petabyte-scale IP storage system that can be “managed by a few graduate students,” in the words of MIT Media Lab Director Frank Moss.

The system, based on Zetera’s Z-SAN technology in collaboration with Bell Micro, Marvell and Seagate, will be used for the lab’s “Human Speechome Project” to collect and analyze video and audio data to better understand early childhood cognitive development.

Associate Professor Deb Roy has spent much of the last year compiling 12 to 14 hours of video per day of his nine-month-old son in an effort to better understand early childhood learning and socialization.

The digital audio and video data, collected at a rate of 350GB per day, will be processed and analyzed using a suite of data mining tools that Roy and his team have been developing. By mid-2008, the information will be assembled into a database exceeding one petabyte, to be processed and analyzed by several hundred parallel processing devices.

Zetera chief marketing officer Doug Glen said storage systems the size of the MIT system aren’t unique, but claimed, “It is unprecedented to build it with the simplicity, scalability and cost of the one we are talking about today.”

Requirements of the system include reads and writes in excess of 160 gigabits per second, shared volumes of several hundred terabytes or more, scalability from an initial 50 terabytes to well in excess of a petabyte, 100 percent data redundancy, file access by computers running multiple operating systems, a fully virtualized storage fabric, and affordability via low-cost, high-capacity SATA hard drives.

Zetera senior product marketing director Jeff Greenberg said each component of the Storage over Internetworking Protocol (SoIP) system will access the system directly, with no RAID controllers to slow performance.

The system will ultimately be composed of more than 3,000 Seagate SATA drives, 300 Hammer Z-Rack storage enclosures, 100 Marvell-based 10G/GbE switches, and about 400 blade processors. It will process 700 terabytes of data during each 12-hour overnight analytical run. Z-SAN's native virtualization capabilities will be used to create 150-drive stripes (aggregated virtual volumes). Protection against data loss will come from RAID 10 mirrors (duplicate copies) of the raw video data, transform data and metadata files.
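As a rough sanity check on those figures, the short Python sketch below works out the average data rate implied by a 700-terabyte, 12-hour run and compares it with the stated 160 gigabit-per-second requirement. It assumes decimal units (1 TB = 10^12 bytes) and that each byte is read once per run; neither assumption comes from MIT or Zetera, and the function and variable names are illustrative only.

```python
# Back-of-the-envelope arithmetic from the figures quoted in the article.
# Assumption: decimal units, 1 TB = 10**12 bytes, 1 Gbit = 10**9 bits.
TB = 10**12
GBIT = 10**9


def sustained_gbit_per_s(terabytes: float, hours: float) -> float:
    """Average rate, in gigabits per second, needed to move `terabytes` in `hours`."""
    total_bits = terabytes * TB * 8
    seconds = hours * 3600
    return total_bits / seconds / GBIT


# 700 TB processed in each 12-hour overnight run (article figures).
run_rate = sustained_gbit_per_s(700, 12)

# Stated read/write requirement, in gigabits per second (article figure).
required_rate = 160

# "More than 3,000" drives in 300 enclosures: roughly this many per enclosure.
drives_per_enclosure = 3000 / 300

print(f"Average rate for the nightly run: {run_rate:.1f} Gbit/s")
print(f"Headroom under the 160 Gbit/s requirement: {required_rate - run_rate:.1f} Gbit/s")
print(f"Approximate drives per enclosure: {drives_per_enclosure:.0f}")
```

Under these assumptions the nightly run averages a little under 130 gigabits per second, which is consistent with a requirement "in excess of 160 gigabits/second" leaving some headroom for peaks.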

StorageIO founder and senior analyst Greg Schulz said the project is interesting — if a little overhyped.

“It is a cute little project, will get some press, however, it’s hardly something that has the storage vendors quaking in their boots over for now,” Schulz told Enterprise Storage Forum.

“Every few years someone does a science project like this to show how new technology and lower-cost technology can scale or change thinking,” Schulz continued. “It then typically takes a few years to productize and commercialize for turnkey business solutions even on a smaller scale to actually be delivered.”

Schulz took issue with calling the system an “array.”

“To say that a collection of nodes is an array is an unfair comparison to non-clustered solutions like those from EMC, HDS, IBM, etc.,” he said. “A more appropriate comparison would be how does this solution scale in capacity, performance, functionality and so forth when compared to peer, clustered, and grid-type solutions like those from EqualLogic, 3PAR, Isilon, Exanet, Panasas and many others.”


Paul Shread
