My article last summer on data management prompted an e-mail from an overseas reader who wondered how to apply information lifecycle management (ILM) to a database environment.
The reader, who works in a hospital data center, wondered if I “had any insights into ILM and archiving where the data resides in a database, not simply a file system. Most articles on ILM pre-suppose that data resides in simple files, and that a file may be archived or not on its own merits, independently of other files or data. Where data resides in a database, one cannot simply archive the files.”
The hospital’s software vendor proposed an archiving system that essentially consisted of a duplicate database on cheaper storage. “This implies disk storage devices,” the reader said. “As you have pointed out elsewhere, tape has many superior qualities, including cost. It seems to me that this vendor, having created an unholy mess in their database design, has decided that it is too costly to design an archiving process which extracts data to files. They are probably not alone in this.”
Good points indeed. I agreed with him that ILM and databases are a messy situation. I didn’t know the details of his configuration, volume of data, recall rate, security, and system management, so making specific recommendations was impossible. But I told him I know of a few hospitals in the U.S. that use a database to index data and keep the patient data in a file system. An example of the file system structure for each patient is:
/Letter of last name/full patient name with last four digits of Social Security number/Year/Record type [MRI, CT, X-ray, etc.]
The database then points to the data, and the HSM (hierarchical storage management) software restores the data to disk. That way you depend on the database only for indexing the data, plus a small application to get to the file system. “Keep it simple” is a very good rule of thumb in general. If you do not have to make it complex and let the database vendor control your business, so much the better. Over the long term, giving control to the database vendor might not be the best business decision.
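To make the index-plus-file-system idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not any hospital's actual system: the function names, the `record_index` table, the `patient_id` key, and the underscore joining the patient name to the Social Security digits are all hypothetical. It also assumes the HSM recalls a file transparently when the application opens its path, so the code only has to produce the path.

```python
import sqlite3
from pathlib import PurePosixPath


def record_path(last_name, full_name, ssn_last4, year, record_type):
    """Build /Letter/FullName_Last4/Year/RecordType.

    The underscore separator is an assumption; the layout above does not
    say how the name and the four digits are joined.
    """
    if not last_name:
        raise ValueError("last_name must not be empty")
    if len(ssn_last4) != 4 or not ssn_last4.isdigit():
        raise ValueError("ssn_last4 must be exactly four digits")
    patient_dir = f"{full_name}_{ssn_last4}"
    return PurePosixPath("/", last_name[0].upper(), patient_dir,
                         str(year), record_type)


def create_index(conn):
    """Create the (hypothetical) index table: it holds pointers, not patient data."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS record_index ("
        "patient_id TEXT NOT NULL, year INTEGER NOT NULL, "
        "record_type TEXT NOT NULL, path TEXT NOT NULL UNIQUE)"
    )


def index_record(conn, patient_id, last_name, full_name, ssn_last4,
                 year, record_type):
    """Store a pointer to one record directory and return its path."""
    path = str(record_path(last_name, full_name, ssn_last4, year, record_type))
    conn.execute(
        "INSERT INTO record_index (patient_id, year, record_type, path) "
        "VALUES (?, ?, ?, ?)",
        (patient_id, year, record_type, path),
    )
    return path


def find_paths(conn, patient_id, record_type=None):
    """Return the indexed paths for a patient, optionally filtered by type."""
    sql = "SELECT path FROM record_index WHERE patient_id = ?"
    params = [patient_id]
    if record_type is not None:
        sql += " AND record_type = ?"
        params.append(record_type)
    sql += " ORDER BY year, path"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
    create_index(conn)
    index_record(conn, "P001", "Smith", "Jane Smith", "1234", 2005, "MRI")
    for path in find_paths(conn, "P001"):
        # An application would open files under this path; the HSM is
        # assumed to stage them back to disk on access.
        print(path)
```

The point of the design is that the database stays small and replaceable: if the index is lost or the database product changes, the paths themselves still carry enough information to rebuild it.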
The reader responded, “Traditionally, hospitals have continued to maintain paper records because of the limitations of capacity, longevity and cost in computerized systems. Is there anything on the horizon for a more generalized method of archiving old data from database systems in a way where it remains easily accessible?”
This was probably the longest exchange I have had with a reader. The person was on the other side of the world and yet faced exactly the same problems we face here all the time. Database data management is a universal problem that currently has no good solutions. Most database vendors have a solution that keeps the data in their proprietary database and trusts the vendor to move the data forward and manage various hierarchies of data. While this might work, you become dependent on the database vendor for your database activities and solely dependent on that same vendor for your ILM and HSM.
There are advantages and disadvantages to giving control to a single vendor. The advantages are that you have someone to blame if something goes wrong, and your ILM migration path to new technology should be clear.
But the downside is that you are dependent on a single vendor for both database and ILM technology. Is this part of the core technology and expertise for these companies? The vendor would need to be involved with SNIA, T10, ANSI and other standards bodies to understand and implement emerging standards from these organizations. Relying on one vendor can also limit your technology: database vendors certify specific hardware and file systems, which could restrict your organization’s choices.
What to Do
Right now there are no great choices for this reader or anyone else in this position. Your choices are what I like to call binary: you have only two. You can integrate the technologies yourself and face the same problems as our reader, or you can give control to the database vendor, let them deal with the problems, and hope they keep up with the various technologies.
I do not see any middle ground. Data management using a database and storage management software from the database company is risky. With all the new standards emerging from all the various standards bodies, does the database vendor have the wherewithal to keep up with all of the pending changes? Take my favorite new standard, OSD (object-based storage devices) from T10. How many database vendors participated in the OSD standard development, or are participating in the parallel effort by SNIA?
I think it is all about your requirements, as I have said time and time again. Since the path forward is perilous and unclear, you should step back and review the choices based on your requirements. These are the kinds of things I would look at:
- What database vendor is being considered and what does the database support for ILM? What types of policies are supported? What types of storage and secondary storage are supported? Does the vendor understand ILM and your future requirements?
- What ILM file system vendor is being considered, and what databases are supported by that file system vendor? What versions of the database are supported and certified, and how soon after release did the vendor support each version? Is the database certified for just the file system, or also for use with secondary storage such as lower-performance RAID and especially tape, given the potential load and position time issues?
You would hope that the file system vendor is aware of and working on new standards, but it never hurts to ask that question too.
As you can see, whichever way you go, you face some hard choices, since you either give up control to a single vendor or you have to integrate the file system and database yourself. You could potentially ask the vendor of the ILM file system to have their certified database people plan or manage your environment.
I think we are in a transitional period in which we have limited choices for managing our data, and none of the choices are very good. This means that whatever you decide to do, you need to plan for the future and assume that it will bring better choices for managing the large quantities of data currently kept in databases. The problem is that if you make a decision today that you expect will last for ten years, you might be sorely disappointed when these standards become available and the lifespan of your data becomes easier to manage. If you choose a vendor and a method today, will you be able to migrate the data easily, either when the new standards become available or when that product no longer meets your requirements?
As our reader said, right now ILM and databases are an unholy mess.
Henry Newman, a regular Enterprise Storage Forum contributor, is an industry consultant with 25 years of experience in high-performance computing and storage.