Managing Data for a Lifetime, Part 2

Last week, we looked at the limitations of current technologies for managing data over the long haul. In this article, we’ll explore technologies that might allow for better management over the lifetime of data.

There are two aspects to data management: hardware technology issues, and the software that is used to manage the data.

As we explored in the last article, there are some hardware realities that we cannot get away from:

Tape densities are not increasing as fast as disk densities.
Tape performance is not increasing as fast as disk performance.
Tape reliability is greater than disk reliability.
CPUs are getting faster and creating more data than ever, while disk drive performance lags.

Given the problems with the underlying hardware, we cannot solve all of our problems with software.

Storage Inefficiency

As we discussed in A Storage Framework for the Future, the Object-Based Storage Device (OSD) standard gives us a potential framework for coordinating file systems with the underlying storage hardware and for replacing outdated SCSI and block device technologies. What we didn’t discuss enough in those articles was the growing problem of storage inefficiency.

From my perspective as a storage consultant out in the field, storage performance and efficiency have been on a downward trend over the nearly 25 years I have been in the business. The decline has been steepening as more devices are attached to fewer RAID controllers and as I/O requests get smaller and smaller. Device efficiency has been getting poorer relative to both channel performance and the number of bytes under the control of a single disk drive, and seek and latency times have stagnated. All those factors have left us with a looming efficiency problem.
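To see why smaller I/O requests hurt efficiency, consider a rough model in which each request pays a fixed seek and rotational latency cost before any data moves. The sketch below is illustrative only: the function name and the drive figures (5 ms seek, 3 ms latency, 50 MB/sec sustained transfer) are assumptions chosen for the example, not measurements from any particular device.

```python
def device_efficiency(request_bytes, seek_ms, latency_ms, transfer_mb_per_s):
    """Fraction of a request's service time spent actually transferring data."""
    transfer_ms = request_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    return transfer_ms / (seek_ms + latency_ms + transfer_ms)


# Hypothetical drive parameters (assumed for illustration only).
SEEK_MS = 5.0
LATENCY_MS = 3.0
TRANSFER_MB_PER_S = 50.0

for size in (4 * 1024, 64 * 1024, 1024 * 1024, 16 * 1024 * 1024):
    eff = device_efficiency(size, SEEK_MS, LATENCY_MS, TRANSFER_MB_PER_S)
    print(f"{size:>10} bytes: {eff:.1%}")
```

With these assumed figures, a 4 KB request keeps the drive transferring data for only about 1 percent of its service time, while a 16 MB request pushes that above 95 percent. Because seek and latency times have stagnated, the fixed cost does not shrink, so small requests waste a growing share of what the device could deliver.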

Models of Efficiency

So if devices and hardware are not going to get much faster, the only chance we have for improvement is through increased efficiency. I would look to history for examples; here are three that I think are appropriate:

Disks and recovery;
TCP/IP stack; and
RAID.

Disks and Recovery

It was not that long ago that disk drives had almost no intelligence within the drive other than reporting errors. All error recovery took place back on the host. This presented a huge latency problem whenever a bad block was encountered. When the system received a message saying that a block was bad, it allocated a block from its bad block area and rewrote the data. This whole process could take a few seconds back in the 1980s. Since that time, microprocessors have improved vastly, SCSI technology was born, and now all disk drives handle bad block allocation themselves rather than relying on the host.
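The host-side bookkeeping described above can be sketched as a simple remap table. This is a minimal illustration under assumed names (HostBadBlockTable, mark_bad and resolve are hypothetical, not taken from any real operating system or driver); it only shows the kind of state the host had to maintain before drives took over the job.

```python
class HostBadBlockTable:
    """Illustrative host-side remap table for a drive with no built-in recovery."""

    def __init__(self, spare_blocks):
        # Blocks reserved on the drive as the "bad block area".
        self._spares = list(spare_blocks)
        # Maps a failed block number to the spare block that replaced it.
        self._remap = {}

    def resolve(self, block):
        """Return the block that should actually be read or written."""
        return self._remap.get(block, block)

    def mark_bad(self, block):
        """Allocate a spare for a block the drive reported as bad."""
        if not self._spares:
            raise RuntimeError("bad block area exhausted")
        spare = self._spares.pop(0)
        self._remap[block] = spare
        return spare  # the caller would now rewrite the data to this block


table = HostBadBlockTable(spare_blocks=range(1000, 1004))
table.mark_bad(42)        # drive reported block 42 as bad
print(table.resolve(42))  # 1000: I/O for block 42 now goes to the spare
print(table.resolve(43))  # 43: healthy blocks are untouched
```

Every lookup and every reallocation here costs host cycles and a round trip to the drive, which is the latency that disappeared once the same table moved into drive firmware.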

TCP/IP Stack

In the early 1980s, when the TCP/IP stack and UNIX systems became common, TCP/IP stacks were implemented with some type of data copy into a memory buffer and then another copy into the user space. This worked fine for the older, slower networks, since the CPU could easily process the data coming in and copy it around memory. This was shown to have significant system overhead, given all of the memory copies and CPU interrupts, but it was not a significant enough problem for the CPUs and systems of that time period that anyone considered changing the method.

By the mid-to-late 1980s, Cray Research was porting UNIX to its X-MP and Y-MP product lines and developing a high-speed channel called HiPPI (High Performance Parallel Interface) that ran at 100 MB/sec. They hooked it up and ran TCP/IP, and the overhead of the memory copies was too great for the Cray system; the CPU was basically locked down with system overhead. The engineering staff came up with an idea: read the TCP header information and process it, and then read the data and move it directly to the user area. In the 1990s, SGI used the same technique in IRIX, and as network performance has increased, we have seen this software concept move from supercomputing into mainstream hardware with TOE cards (TCP Offload Engine).
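The header-first idea can be illustrated with a user-space analogy. The sketch below is not the Cray or IRIX kernel implementation, and it does not parse real TCP headers: it assumes a hypothetical 4-byte length prefix as the "header", reads that first, and then receives the payload directly into a buffer supplied by the caller instead of staging it in an intermediate buffer and copying it again. The function names are invented for the example.

```python
import socket
import struct

# Hypothetical 4-byte big-endian length prefix, standing in for a protocol header.
HEADER = struct.Struct("!I")


def recv_exact_into(sock, view):
    """Fill the writable memoryview completely, receiving in place."""
    received = 0
    while received < len(view):
        n = sock.recv_into(view[received:])
        if n == 0:
            raise ConnectionError("peer closed before the message was complete")
        received += n


def receive_direct(sock, user_buffer):
    """Process the header first, then place the payload straight into user_buffer."""
    header = bytearray(HEADER.size)
    recv_exact_into(sock, memoryview(header))
    (length,) = HEADER.unpack(header)
    if length > len(user_buffer):
        raise ValueError("user buffer too small for payload")
    # No intermediate staging buffer: the payload lands in the caller's memory.
    recv_exact_into(sock, memoryview(user_buffer)[:length])
    return length


if __name__ == "__main__":
    sender, receiver = socket.socketpair()
    payload = b"block of file data"
    sender.sendall(HEADER.pack(len(payload)) + payload)
    buffer = bytearray(4096)
    count = receive_direct(receiver, buffer)
    print(bytes(buffer[:count]))
    sender.close()
    receiver.close()
```

The design point is the same one the text describes: only the small header needs to be examined by the protocol code, so the bulk data can be steered to its final destination once rather than copied twice.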

RAID

This example is far simpler than the TCP/IP example. With early SCSI drives in the late 1980s, it was often suggested that aggregation of the drives would improve performance. What Veritas and others did was to create a volume manager that striped the drives. In the 1990s, of course, the concept of software striping became RAID. The volume managers are still around and have a new life managing multiple volumes and failover for some file systems. The question is, as file systems change over the next few years, will volume managers be required for file systems that have OSD support in hardware? Also, many HBA vendors are now providing failover within the device drivers, so two more features that were done in software will likely move to hardware.
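The striping that those early volume managers performed comes down to an address mapping from a logical block to a drive and a block on that drive. The following is a minimal sketch of that mapping for simple striping without parity; the function name and the parameters (four drives, a stripe unit of eight blocks) are assumptions for illustration and do not describe any particular vendor's product.

```python
def stripe_location(logical_block, num_drives, stripe_unit_blocks):
    """Map a logical block to (drive index, block on that drive) for plain striping."""
    stripe_unit = logical_block // stripe_unit_blocks   # which stripe unit overall
    offset_in_unit = logical_block % stripe_unit_blocks
    drive = stripe_unit % num_drives                     # units rotate across drives
    row = stripe_unit // num_drives                      # full stripes before this one
    return drive, row * stripe_unit_blocks + offset_in_unit


# Hypothetical layout: 4 drives, 8 blocks per stripe unit.
for block in (0, 8, 32, 35):
    print(block, stripe_location(block, num_drives=4, stripe_unit_blocks=8))
```

Consecutive stripe units land on different drives, so a large sequential request is spread across all of them. Whether this arithmetic runs in a host volume manager or in a RAID controller, the mapping is the same; only the place where it executes has moved.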

My point with all these examples is that when inefficiencies in either performance or cost create an opportunity, what was done in software often moves to hardware. This is not true in all cases and the timing varies, but if the pain is great enough, it can and will happen. Solutions tend to appear in software first because it is usually faster and much easier to develop a software solution than a hardware solution.

What It All Means

Today we have a number of software-based products from a number of different vendors that are used for data lifecycle management; over time, some, if not all, of the functionality will move into an integrated hardware solution.

The big storage players (EMC, IBM, HP, Sun and StorageTek, among others) all have products that address the problem with a combination of hardware and software, but they are all constrained by the limitations we have today, ranging from archaic file systems to RAID controllers that do not talk to each other. Products that are created today are going to have to change, given all the new standards from T10, SNIA and others that will address security, reliability and data lifecycle management.

No vendor is going to create a product today that will still let you manage the data, read it, understand its content and know why you kept it in the first place 15 years from now, much less 50. I am aware of a few pharmaceutical companies that need to keep data for 50 years (throughout the patient’s life), and they are constantly migrating data, as is everyone else that has a lot of data.

It is clear (to me at least) that software-only solutions are not going to solve our problems. Scaling in the data path is a serious problem that I believe will get addressed soon, and without scaling we can never hope to solve the data lifecycle management problem, as the limitations of the hardware technology are just too great.

If history is any guide, we will likely see a combined hardware and software solution that will allow us to manage data efficiently and effectively for life. The need is there; all that remains is for someone to come along and invent a solution.

Henry Newman, a regular Enterprise Storage Forum contributor, is an industry consultant with 24 years of experience in high-performance computing and storage.
