With such attractions as lower costs and flexibility, it was only a matter of time before the success of Linux in the server sector translated into broader application within the storage market. This doesn't mean there aren't still doubts and questions about its viability as a storage platform. On the other hand, when a company that boasts the fourth largest commercial supercomputer system in the world successfully deploys a Storage Area Network (SAN) using the Linux operating system, it's time to take a closer look.
The company in question is NuTec Energy, serving the oil and gas industry with seismic imaging services. Based in Houston, Texas, NuTec employs 35 staff. In early 2000, the company struck a deal with IBM to develop a massively parallel supercomputing system capable of dealing with the ever-increasing demands of seismic signal processing for oil and gas industry-based applications.
The storage system initially consisted of 3000 Power 3/3+ CPUs with AIX on each server, with each CPU running its own analysis. The Network File System (NFS) file server used two IBM 'Shark' units connected to three B80 servers and provided shared file access to all CPUs. By 2003, however, the system was not keeping up with the demands placed on it, so a project was established to specify a replacement.
According to Sampath Gajawada, manager of software development at NuTec Energy, "The target was a super-scalable SAN — a high-performance, single image storage environment using Intel, Linux, Fibre Channel, and Ethernet." He defined several key objectives for the SAN:
- Software tuned to be latency tolerant and massively parallel, offering buffered asynchronous communication and I/O
- High I/O bandwidth (>500 processing nodes)
- High computing power (processing power >2 Teraflops)
- Large, flat file system (10-100TB), with easy storage management
- Cost effective, solid price/performance balance, and scalable at low incremental costs
"The existing system just couldn't cope with the demands of our Depth-Domain Analysis and Time-Domain Analysis," said Gajawada. "We had reached the stage where business requirements were forcing us to reconsider our entire system. We looked at all the alternatives and settled on a combination of Intel and Linux."
The decision to favor Intel/Linux enabled NuTec to create a lower-cost structure with industry-standard hardware and to take advantage of the improved floating-point (FP) performance of Intel Pentium 4 processors, which is especially beneficial for NuTec's intensive graphical image processing. The Linux route would also eliminate the NFS bottleneck and provide data sharing at SAN performance through a cluster file system (CFS) on the SAN that can scale to hundreds of nodes with minimal management.
NuTec adopted Minneapolis-based Sistina Software's GFS (Global File System) Linux cluster file system. Its cluster nodes physically share storage over Fibre Channel or shared SCSI, and while each node thinks the file system is local, file access is synchronized across the entire cluster.
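To make the idea of synchronized access to shared storage concrete, here is a minimal Python sketch. It is an illustration only and not how GFS is implemented: threads stand in for cluster nodes, a dictionary stands in for the shared disks, and the `LockManager`, `SharedBlockStore`, and `Node` classes and the file path are hypothetical names invented for this example.

```python
import threading


class SharedBlockStore:
    """Stands in for shared storage that every node can reach directly."""

    def __init__(self):
        self.files = {}


class LockManager:
    """Hypothetical cluster-wide lock table: one exclusive lock per file path."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def lock_for(self, path):
        # Create the per-file lock on first use; the guard keeps this atomic.
        with self._guard:
            return self._locks.setdefault(path, threading.Lock())


class Node:
    """A cluster node that treats the shared store as if it were local."""

    def __init__(self, name, store, locks):
        self.name = name
        self.store = store
        self.locks = locks

    def append(self, path, record):
        # Hold the file's lock for the whole read-modify-write so that
        # updates from different nodes cannot interleave and get lost.
        with self.locks.lock_for(path):
            current = self.store.files.get(path, [])
            self.store.files[path] = current + [record]


def run_cluster(node_count=8, writes_per_node=200, path="/seismic/trace.dat"):
    """Run every node concurrently against one shared file and return it."""
    store = SharedBlockStore()
    locks = LockManager()
    nodes = [Node(f"node{i}", store, locks) for i in range(node_count)]

    def work(node):
        for i in range(writes_per_node):
            node.append(path, (node.name, i))

    threads = [threading.Thread(target=work, args=(n,)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.files[path]


if __name__ == "__main__":
    records = run_cluster()
    print(len(records), "records written without loss")
```

Each node calls `append` as though the file were local, while the shared lock table keeps concurrent updates from overwriting one another. A real cluster file system has to coordinate this across machines rather than threads, which this sketch does not attempt.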
In effect, GFS can pool storage onto cheap, efficient machines. NuTec's system resides on a Fibre Channel SAN infrastructure from LSI Logic for high I/O performance. Processing consists of 350 dual-processor, P4-based nodes, providing 750 Linux-running CPUs; per box, these are four times faster than the previous AIX processors.
The following table, prepared by NuTec, compares the two systems: