In 1995, Conoco, a 127-year-old, $27 billion international energy company, stepped up exploration efforts to find new sources of oil. The Houston, Texas-based company immediately recognized that the deepwater area of the Gulf of Mexico presented a significant economic and ecological environment for developing oil and gas reserves. To evaluate the targeted areas, Conoco had acquired more than 13,000 square miles of 3D seismic data by 1998. That data, then two terabytes, was loaded into Landmark's SeisWorks interpretation system located in Conoco, Inc.'s Gulf of Mexico Business Unit in Lafayette, Louisiana.
Faced with the volume of seismic data tripling over the next four years, R. Holt Ardrey, director of information management for Conoco, Inc.'s Gulf of Mexico Business Unit, set out to put in place a storage system that would maintain the seismic data through its lifecycle. To this end, the system had to integrate online storage with nearline storage. At the same time, the system had to minimize the need for massive amounts of disk storage, especially during downturns in the oil and gas industry.
Ardrey recalls the hows and whys of putting in a hierarchical storage management system (HSM), what makes seismic data unique, and how market conditions in the oil and gas industry drive computing decisions.
Can you explain what exactly seismic data is and how you obtain it?
The Conoco division I work for explores new sources of oil and gas. We have two types of computing - general purpose, and scientific and technical. Our environment involves only the Unix-based scientific and technical side. The primary application is a hierarchical storage management (HSM) system that supports our online storage system.
We produce 3D seismic data, which are really images or traces of the earth's subsurface. The process consists of sending a low-frequency burst of energy from a source, such as an offshore dynamite blast, down about 30,000 feet through the different types of rocks and beds that exist in the earth's subsurface. A listening-type device records some of the energy as it's reflected back to the surface.
What system is this seismic data stored on and what makes the data unique?
We have about 12 terabytes of data stored on various Network Appliance network attached storage (NAS) filers. We try to upgrade our systems to the latest models. To date, we've gone from the model 750 filer to the 860 filer.
The processing of seismic data involves stacking all of the pieces of the data together to get an image of the subsurface of the earth. A typical seismic survey or file could be as large as 55 gigabytes. The file usually consists of extents or pieces of data. Each extent could range up to two gigabytes. You could have up to 30 extents in a specific survey. We try to limit each survey to no more than 65 gigabytes.
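The extent layout described here can be illustrated with a small sketch. It is only an illustration of the arithmetic, not the format Conoco's software uses: the function name `extent_sizes` is hypothetical, and it assumes every extent except the last is filled to the two-gigabyte maximum quoted in the interview.

```python
import math


def extent_sizes(survey_gb, max_extent_gb=2.0):
    """Split a survey of survey_gb gigabytes into extents no larger than max_extent_gb.

    Assumption for illustration: all extents except the last are completely full.
    """
    if survey_gb <= 0:
        raise ValueError("survey size must be positive")
    count = math.ceil(survey_gb / max_extent_gb)
    # All but the last extent are full; the last one holds the remainder.
    full = [max_extent_gb] * (count - 1)
    last = survey_gb - max_extent_gb * (count - 1)
    return full + [last]


# A 55-gigabyte survey, the "typical" upper size quoted in the interview.
sizes = extent_sizes(55)
print(len(sizes), sizes[-1])
```

Under these assumptions a 55-gigabyte survey needs 28 extents, which fits within the limit of roughly 30 extents per survey mentioned in the interview.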
After we load a survey on a NAS filer, our HSM system triggers the Sony robotic tape library to write two sets of tapes. (The robot contains two sets of tapes.) One set of tapes gets removed periodically and sent to an off-site vault for disaster recovery purposes. The other set of tapes stays with the HSM system.
To interpret the results of a survey or to write it to a different file type, users don't need to modify any of the data. As a result, roughly 80 percent of the 12 terabytes used for this seismic data gets stored in read-only format on the NAS filers.
Can you describe how your HSM system works?
It runs on a Sun staging server, which can accumulate files on disk before they are sent off to tape. We have some business rules in place for HSM. If the online disk system reaches a certain high watermark, the HSM system will go back and look for the data that has gone the longest without being accessed. So it's oldest last-touched date first.
Since it already has a copy of that data on the tape, the HSM system doesn't have to copy it to tape at that time. The HSM system just releases that space on the filer and leaves a little stub in the HSM file system. To the user, the data still appears on the file system. If the user wanted to access an archive file, the HSM system would go to the tape robot, read the data back to the staging server, and the staging server would copy the data onto the filer.
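The release-and-recall behavior Ardrey describes can be sketched in a few lines. This is a simplified model, not the actual HSM product's logic or API: `FileEntry`, `release_until_below`, and `recall` are hypothetical names, the 90 percent high watermark is an invented value, and the low watermark at which releasing stops is an assumption, since the interview mentions only a high watermark.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    name: str
    size_gb: float
    last_access: int      # hypothetical timestamp; larger means more recent
    on_tape: bool = True  # a tape copy was written when the file was loaded
    online: bool = True   # False means only a stub remains on the filer


def release_until_below(files, capacity_gb, high_watermark=0.90, low_watermark=0.80):
    """Release disk space once usage reaches the high watermark.

    Files are released oldest last-access first, and only if a tape copy exists.
    The low watermark is an assumption made for this sketch.
    """
    used = sum(f.size_gb for f in files if f.online)
    released = []
    if used / capacity_gb < high_watermark:
        return released
    for f in sorted(files, key=lambda entry: entry.last_access):
        if used / capacity_gb <= low_watermark:
            break
        if f.online and f.on_tape:
            f.online = False      # the stub stays visible; the disk space is freed
            used -= f.size_gb
            released.append(f.name)
    return released


def recall(entry):
    """Model a restage: tape robot -> staging server -> filer."""
    if not entry.online:
        entry.online = True
    return entry
```

In this model, a filer holding three surveys that together exceed the high watermark would release only the survey with the oldest access time, and a later call to `recall` on that entry would mark it online again, mirroring the restage path through the staging server.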
How many robotic tape libraries do you have? Any problems with delays getting data back to the filer?
We have three Sony robotic tape libraries - two units in one office, and one unit in another office. Our principal robotic tape library can hold up to 25 terabytes. Each DTF2 tape cartridge holds about 200 gigabytes. The robot holds about 125 tape slots. We keep expanding the number of slots we have.
The delay or latency moving data back and forth between the filer and the HSM system isn't as bad as you might think. It usually takes three or four minutes if the data needs to be restaged.
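The library capacity quoted above follows directly from the slot count and the cartridge size. The short check below uses only the figures from the interview; the function name `library_capacity_tb` is hypothetical, and it assumes decimal units (1,000 gigabytes per terabyte).

```python
def library_capacity_tb(slots, gb_per_tape, gb_per_tb=1000):
    """Raw capacity of a tape library, assuming decimal terabytes."""
    return slots * gb_per_tape / gb_per_tb


# Figures as quoted: about 125 slots, about 200 gigabytes per DTF2 cartridge.
print(library_capacity_tb(slots=125, gb_per_tape=200))  # 25.0
```

The result matches the "up to 25 terabytes" figure given for the principal library. Because the quoted numbers are approximate, the real capacity would vary with the actual slot count.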
Why did you select the Sony tape library versus one from StorageTek?
When we put our HSM system in back in 1998, we were looking for hardware and software that could work together to handle seismic data and HSM. We selected Panther Software Corporation's Seismic Data Management Systems (now owned by Schlumberger) software to handle the stacking of seismic files. For HSM, we selected Large Storage Configuration Inc.'s SAM FS HSM (now called Sun FS HSM). The staging server consists of a Sun E4500 system with about a 150 gigabyte RAID disk array. We selected our first Sony PetaStar robotic tape library. Initially, our NAS filer consisted of a system from Auspex. We replaced it with filers from Network Appliance.
Your industry seems to go in cycles as the prices of oil and gas rise and fall. How does this affect the technology decisions you make?
In relation to the HSM system, we analyzed the cost of storage if we continued just adding online disk storage via SAN or NAS devices versus using nearline storage with HSM to supplement our online storage. The cost savings have been significant, allowing us to achieve a large return on investment over a two- to three-year period. Our initial system payback took about a year.
However, the cost per terabyte of data stored on tape versus the cost of data stored on disk has been a changing metric over time. For example, our industry has cyclical commodity prices for oil. Over the past 20 years, the price of oil has gone from $10 a barrel to $30 a barrel. From our capital management point of view, we don't choose to invest in a lot of new systems when we are under cost pressure. On the other hand, we still need to be able to manage the data, which is central to our business. The system we have helps us in a lot of ways to manage through these cycles.
Do you know how much it is costing you per megabyte or per gigabyte for storage?
That's not a metric we've ever sat down and calculated. We use different metrics to capture our storage costs. Each of our offices around the world operates independently of the others. When we roll up our numbers to compare costs between the business units, we calculate what the total cost is for a scientific-technical user.