Management of data over its lifecycle is a vexing problem. The first questions that come to mind are how long should that life be, and who should decide? The lifespan of data can have different meanings for different people in the same organization, along with different reliability requirements, which could require different copies with different management policies. Add to this the constraints imposed by various operating and file systems, and you have a really big problem on your hands.
This is the first article in a two-part series on data lifecycle management. This article will attempt to define the problem, and the next article will discuss potential solutions, most of which have not yet been developed.
We have so many problems with the current framework for data management that it is hard to think of a good starting place. Here are a few of the problems that most users deal with in the current UNIX, Linux and Windows environment:
Data lifecycle management problems can be solved to some extent by database technology or products like EMC’s Centera, but they are still limited to what the underlying operating and file systems support, so vendors either have to do a whole lot of extra work to solve the problem or take some shortcuts.
What Users Want
Recently I have been involved in a number of projects involving multiple-petabyte archives. The managers of these archives have some common requirements:
People who manage data are facing these problems and many others. Data management policies are not coordinated among systems, since no common framework exists. Hierarchies of storage, with different storage policies that go far beyond what current HSM technologies support, are needed, but the infrastructure that could make this a reality is missing.
Operating and File System Implementations
There are a number of problems in Windows, Linux and UNIX operating systems that prevent the full management of data.
Users and Groups
UNIX-based operating systems and Windows are limited by the concept of user and group ownership. This is especially true of UNIX, where POSIX rigidly defines users, groups, and how permissions work for both.
Having a file system and operating system that associate ownership of all files for a user can be a bad thing. What if that user leaves suddenly under less than desirable terms? What if that user is working on a joint project with a number of other users? Can you have a common repository for all of the files?
There are now a number of “groupware” tools available that can manage this process, but shouldn’t basic operating and file system management be free of archaic UNIX user and group concepts? I believe that the current design in UNIX and Windows systems does not provide the needed infrastructure.
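To make the ownership model concrete, here is a minimal Python sketch that reads what the operating system records about who owns a file. It uses only the standard `os.stat` call; the helper name `ownership` is hypothetical, and on Windows Python fills these POSIX fields with placeholder values.

```python
import os
import tempfile


def ownership(path):
    """Return the owning user ID and group ID recorded for a path."""
    info = os.stat(path)
    # The stat record has room for exactly one user ID and one group ID.
    return {"uid": info.st_uid, "gid": info.st_gid}


# Create a scratch file and look at who the system says owns it.
with tempfile.NamedTemporaryFile() as scratch:
    owner = ownership(scratch.name)
    print(owner)
```

Whatever the file holds, and however many people worked on it, the record carries one user and one group. A joint project has to be squeezed into a shared group or handled by tools layered on top.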
File Metadata
At this point, you might want to brush up on the issue of file system metadata. Here are two suggestions: Choosing a File System or Volume Manager and Storage Focus: File System Fragmentation.
Current file system metadata structures provide little or no information about what is in the file. All the current metadata structures provide is the location of the file. Suppose someone wants to record what is in the file, how it was written, its data structures, and the myriad other details that will matter to anyone who wants to read the file in 20 years. Today this is done by creating a database, generally separate from the file system and the files in question. Now you have two things to maintain, upgrade and migrate.
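The split between file system metadata and a separate descriptive database can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's design: the `catalog` table, its columns, the helper names, and the sample file and format description are all hypothetical, and an in-memory SQLite database stands in for the external database.

```python
import os
import sqlite3
import tempfile


def filesystem_view(path):
    """Collect the kind of attributes a file system keeps for a file."""
    info = os.stat(path)
    # Size, timestamps and permissions, but nothing about the contents.
    return {"size_bytes": info.st_size, "modified": info.st_mtime, "mode": info.st_mode}


def open_catalog():
    """Create a hypothetical descriptive catalog, kept apart from the file system."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE catalog ("
        "path TEXT PRIMARY KEY, format TEXT, written_by TEXT, notes TEXT)"
    )
    return db


def describe(db, path, fmt, written_by, notes):
    """Record what is in the file and how it was written."""
    db.execute(
        "INSERT OR REPLACE INTO catalog VALUES (?, ?, ?, ?)",
        (path, fmt, written_by, notes),
    )
    db.commit()


def lookup(db, path):
    """Return (format, written_by, notes) for a path, or None if unknown."""
    return db.execute(
        "SELECT format, written_by, notes FROM catalog WHERE path = ?", (path,)
    ).fetchone()


with tempfile.TemporaryDirectory() as workdir:
    sample = os.path.join(workdir, "run42.dat")
    with open(sample, "wb") as handle:
        handle.write(b"\x00" * 16)

    db = open_catalog()
    describe(db, sample, "raw float32, little-endian", "hypothetical_sim v1", "4 values")

    fs = filesystem_view(sample)
    record = lookup(db, sample)

    # Rename the file behind the catalog's back, as any user or tool can.
    renamed = os.path.join(workdir, "run42_final.dat")
    os.rename(sample, renamed)
    stale = lookup(db, renamed)  # the catalog knows nothing about the new name

    print(fs["size_bytes"], record, stale)
```

The rename at the end is the point of the sketch: the file system and the catalog are independent, so nothing keeps them in step unless someone writes and maintains the code that does.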
A number of major storage vendors such as EMC, IBM and StorageTek have recognized this problem and have developed products to address it. Of course, each of these systems is proprietary, and interoperability between products is nonexistent. There are currently no standards in this area, so there is little you can do if you need this kind of technology except buy what the vendors are offering.
Archiving Technology
While some might argue the point, using tape to store most large archives results in far lower O&M costs than storage on spinning disk or even emerging MAID (massive array of idle disks) technology. I base this statement on the cost per GB of disk and tape, including compression; the cost of power and cooling; the cost of tape drives and robots; and reliability costs.
One thing I did not mention is the cost of software. Most large archives under the control of HSM systems are accessed through a common interface called DMAPI (Data Management API), which was created by the Data Management Interface Group.
Development of DMAPI began in the early 1990s and was completed in the late 1990s. A number of forces came together that required vendors to let customers access their data regardless of which file system and HSM product were being used. Before DMAPI, all HSM vendors had proprietary interfaces that made migration to a new system difficult and painful, particularly for large archives.
Conclusions
There are no general solutions or standards that allow sites to manage data over long periods of time. A number of vendors have created proprietary solutions that might meet your requirements, but without standards and the underlying infrastructure in the operating and file systems, moving from one vendor to another will be very difficult. I have been involved in a number of brute force migrations. They always take far longer than planned, and cost more too.
Because storage densities continue to grow faster than storage performance, migration becomes increasingly difficult. Here’s a table comparing density and performance growth for disk and tape over the last 15 years:
| Type | Density Increase Since 1990 | Performance Increase Since 1990 |
| --- | --- | --- |
| Seagate Disks | 600 times | 29.5 times |
| Tape | 7,500 times | 64 times uncompressed, 104 times compressed |
With at least an order of magnitude difference between density and performance over the last 15 years, even if these ratios begin to change, we are still far behind on the performance curve. The disturbing part is that it looks like the trend may even worsen over time.
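The effect on migration can be worked out from the table with simple arithmetic. The sketch below assumes that capacity per device scales with density and that "performance" means sustained transfer rate, so the time to read an entire device grows by the ratio of the two; the function name `relative_drain_time` is hypothetical.

```python
def relative_drain_time(density_increase, performance_increase):
    """How many times longer it takes to read a full device than in 1990.

    Assumes capacity grows with density and read time is capacity / rate.
    """
    return density_increase / performance_increase


# Growth factors taken from the table above.
disk = relative_drain_time(600, 29.5)
tape_uncompressed = relative_drain_time(7500, 64)
tape_compressed = relative_drain_time(7500, 104)

print(f"disk: {disk:.1f}x longer")
print(f"tape, uncompressed: {tape_uncompressed:.1f}x longer")
print(f"tape, compressed: {tape_compressed:.1f}x longer")
```

Under those assumptions, reading a full disk takes roughly 20 times longer than it did in 1990, and reading a full tape takes roughly 70 to 120 times longer, which is why a brute-force migration of a large archive stretches out so badly.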
Next time we will look at some technologies that could provide some relief to these problems.
Henry Newman, a regular Enterprise Storage Forum contributor, is an industry consultant with 24 years of experience in high-performance computing and storage.