Four years ago, I dreamt about what it would be like to be able to create my perfect storage product (I know what you're thinking — I probably need to get out more).
Back then, I thought the storage market was lacking in three areas:
- High performance and predictive scaling;
- End-to-end security; and
- Simplified management.
It might, or might not, come as a surprise that I think very little has changed in the last four years. So let's pretend that I am COB, CEO, COO, CTO and every other important person in a storage company, and I have been tasked with developing the next killer product. What would that product look like today, and is it much different from the product I wanted to develop four years ago? As I said, as far as I am concerned, the problems are still the same; what has changed is my approach and the effect of this killer product.
So let's say I wasn't able to develop the product four years ago because the evil VCs pulled funding, since they just did not get my idea, but I have now found another friendly VC who has complete faith in me, and I can do what I really want. What product is needed today, and how is it different from what was needed four years ago? Let's start with the same market analysis.
What Is Wrong with Products Today?
To me, this is the first question you should always ask in market analysis. If you are going to build a new product, you first have to see what is wrong with the products today; specifically, what is wrong with the products for a given market. You could develop a product that addresses a large market such as USB flash drives, but if you want a higher-margin, non-commodity product, you need to address a specific market, whether that be enterprise, SMB, or my area of work, HPC.
So back to the question: what is wrong with the products today? In each of those markets, the problem is the management of information. People need to manage information, not storage. In some ways, this is a significant change from what I said four years ago: today it is about data management (the content of the information), not necessarily storage management (the blocks on disk).
Storage management must still be part of the data management framework, which also includes error and warning management, but the most important things for businesses are regulatory compliance and being able to use the information they collect to improve their business prospects. Because of this, vendors are developing storage products that try to provide awareness of the content of the information. What that means depends on the vendor. Some vendors are developing products that index the information and allow it to be searched, while others move files from fast to slow storage (usually from fast, high-cost disk to slower, lower-cost disk) based on usage and business policies. Some vendors have even created file systems that implement this type of policy. You can view these file systems as a form of HSM (hierarchical storage management), but with policies based not on the standard HSM criteria of age and size but on the information's importance. The key concept is the information lifecycle management (ILM) policy, which is far different from how most current HSMs manage files.
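The difference between a classic HSM rule and an ILM policy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tier names, the importance labels ("critical", "regulated", "routine") and every threshold below are hypothetical, and real products each define their own policy language.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    FAST_DISK = "fast_disk"  # high cost, high performance
    SLOW_DISK = "slow_disk"  # lower cost, lower performance
    ARCHIVE = "archive"      # lowest cost, near-line or offline


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    days_since_access: int
    importance: str  # hypothetical business label: "critical", "regulated" or "routine"


def hsm_tier(f: FileInfo, age_limit_days: int = 30, size_limit_bytes: int = 1 << 30) -> Tier:
    """Classic HSM rule: migrate purely on age and size (1 GiB default size limit)."""
    if f.days_since_access > age_limit_days or f.size_bytes > size_limit_bytes:
        return Tier.SLOW_DISK
    return Tier.FAST_DISK


def ilm_tier(f: FileInfo) -> Tier:
    """Illustrative ILM rule: business importance decides first, usage only refines."""
    if f.importance == "critical":
        return Tier.FAST_DISK  # stays on fast storage regardless of age
    if f.importance == "regulated":
        # Must be retained, but is rarely read once it has gone cold.
        return Tier.SLOW_DISK if f.days_since_access <= 365 else Tier.ARCHIVE
    # Routine data: fall back to a usage-based rule.
    return Tier.FAST_DISK if f.days_since_access <= 7 else Tier.SLOW_DISK
```

For example, a critical file that nobody has touched for 90 days is demoted by `hsm_tier` because of its age, while `ilm_tier` keeps it on fast disk because of its business importance.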
Some of the problems with ILM products today are a lack of standards, a narrow focus on areas like compliance, and a lack of solutions for long-term data retention, including policies for reliability (checksums for each copy and the number of copies), deletion and metadata.
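A reliability policy of the kind just described (a checksum for each copy plus a required number of copies) can be sketched as follows. This is a minimal sketch, assuming each copy is a file reachable by path and that a reference SHA-256 digest was recorded when the file was created; `RetentionPolicy` and `check_copies` are hypothetical names, not part of any product or standard.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RetentionPolicy:
    min_copies: int               # how many verified copies must exist
    checksum_alg: str = "sha256"  # any algorithm name accepted by hashlib.new()


def file_digest(path: Path, alg: str) -> str:
    """Hash a file in 1 MiB chunks so large files are never read into memory at once."""
    h = hashlib.new(alg)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def check_copies(copies: list[Path], expected_digest: str,
                 policy: RetentionPolicy) -> tuple[bool, list[Path]]:
    """Return (policy_satisfied, copies_that_are_missing_or_corrupt)."""
    good, bad = [], []
    for p in copies:
        if p.exists() and file_digest(p, policy.checksum_alg) == expected_digest:
            good.append(p)
        else:
            bad.append(p)
    return len(good) >= policy.min_copies, bad
```

A scrubber could run `check_copies` periodically and re-replicate from a good copy whenever the returned list of bad copies is non-empty.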
There are also a few other, smaller issues, such as file ownership (UNIX has only user ownership and group permissions) and the need for a project identifier similar to those offered on mainframes.
So what is my product? If I were developing a product, I would announce that my company was going to provide it first as open source and as a reference implementation, and I would submit it to the appropriate standards bodies. Without standards, and agreement on those standards, the product will not be as useful to the community as a standards-based product would be. The standards bodies I would submit to are:
OpenGroup: I would propose changing the open system call to support a variety of new additions, such as:
- A reliability definition for open that would become standard: say, a rating from 1 to 10 whose levels are defined by each installation, with the definitions stored as part of the field so that future file systems would know what each level means.
- Integration with the T10 OSD standard for fields added by OSD for objects.
- Integration with the T10 DIF (Data Integrity Field) standard for checksums on open.
- A data retention field for how long the file is to be kept.
- Usage statistics tracking — The ability to track whether, and by whom, the file was opened and used.
- Performance hint — On a system with multiple storage tiers, the user might want to keep the file on fast storage for 30 days because it will be reused, or the application creating the file might want it moved to low-performance storage immediately because it will not be reused.
- Backup policy — The user might not want the file to be backed up; because that choice could have broad implications for e-discovery and regulatory compliance, the administrator should be able to override it.
- Encryption — This would be a yes/no flag; what "yes" means (the encryption policy), along with key management, would be defined by each installation.
- Project/Account ID — Many organizations want to track usage through this accounting function and to account for and manage data by project.
- Shred or no shred — If shredding is selected, the space used by the file is shredded when the file is removed so that the file cannot be recovered.
- Fields for user-definable metadata.
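To make the proposal concrete, here is a minimal Python sketch of the attribute set such an extended open call might carry. No such interface exists in POSIX today: `OpenAttributes`, `RELIABILITY_LEVELS`, `PerfHint` and the helper functions are all hypothetical, the reliability table stands in for an installation's own definitions, and the T10 OSD and DIF integration items are omitted for brevity.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Optional

# Hypothetical installation-defined meaning of each reliability level.
# A requested level resolves to the nearest lower level listed here.
RELIABILITY_LEVELS = {
    1: {"copies": 1, "checksum": False},
    5: {"copies": 2, "checksum": True},
    10: {"copies": 3, "checksum": True},
}


class PerfHint(Enum):
    KEEP_ON_FAST_TIER = "keep_fast"  # file will be reused; keep it on fast storage
    DEMOTE_IMMEDIATELY = "demote"    # file will not be reused; move it to slow storage


@dataclass(frozen=True)
class OpenAttributes:
    """Hypothetical attributes passed to an extended open(); not a real API."""
    reliability: int                 # 1-10, meaning set by RELIABILITY_LEVELS
    retention_days: Optional[int]    # None means keep indefinitely
    track_usage: bool = False        # record who opens and uses the file
    perf_hint: PerfHint = PerfHint.KEEP_ON_FAST_TIER
    keep_fast_days: int = 30         # used only with KEEP_ON_FAST_TIER
    backup: bool = True              # user's request; an administrator may override
    encrypt: bool = False            # policy and key management are per installation
    project_id: Optional[str] = None
    shred_on_delete: bool = False
    user_metadata: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not 1 <= self.reliability <= 10:
            raise ValueError("reliability must be between 1 and 10")
        if self.retention_days is not None and self.retention_days < 0:
            raise ValueError("retention_days must be non-negative")
        if self.keep_fast_days < 0:
            raise ValueError("keep_fast_days must be non-negative")


def reliability_definition(level: int) -> dict:
    """Resolve a requested level to the highest defined level that does not exceed it."""
    return RELIABILITY_LEVELS[max(k for k in RELIABILITY_LEVELS if k <= level)]


def effective_backup(attrs: OpenAttributes, admin_requires_backup: bool) -> bool:
    """An installation policy that requires backup overrides the user's opt-out."""
    return attrs.backup or admin_requires_backup


def to_record(attrs: OpenAttributes) -> str:
    """Serialize the attributes, e.g. for storage alongside the file's metadata."""
    record = asdict(attrs)
    record["perf_hint"] = attrs.perf_hint.value  # Enum members are not JSON-serializable
    return json.dumps(record, sort_keys=True)
```

Keeping the meaning of each reliability level in an installation-owned table, rather than hard-coding it, mirrors the idea that the definitions travel with the field so that future file systems can interpret them.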
IETF: Changes to NFS will be needed to support many of the features added to the open system call.
T10: Likely changes to support object-based encryption and the use of the encryption hardware on the disk drive.
The chances of my getting this through the OpenGroup or Kernel.org (Linux kernel standards), the IETF and the rest of the standards bodies are close to zero unless I get help from a number of large organizations, both commercial companies and government agencies, that have a vested interest.
If my product completely changed the standards so that applications could open a file and attach metadata, archival information and data protection settings to it when it is created in the file system, rather than having these managed by a third-party application, I would do this work under an open source operating system, and my company would update the operating system and its system calls and create a new file system.
Once this was completed, I would head to every organization that has a requirement to preserve and archive data. Everyone from the National Archives and the Library of Congress to pharmaceutical companies that need to keep drug trial information must be struggling with our current lack of standards and the lack of a framework to manage information for its required life. Someone at the National Archives once said we need to manage the files for the life of the republic. That's quite an information management requirement.
Two questions arise from this strategy:
- Will these companies and the U.S. government and potentially other world governments with these problems force the hand of IT companies and make this a standard?
- Will my company be able to make any money doing this project in the short term or the long term if I open source this, and if it somehow does become a standard, how will we make money at my company?
Both of these are huge concerns that could make or break a company.
As far as I can tell, the chances of anyone developing a product that makes the kinds of modifications to the storage stack I described are just about zero. I do not think those who control the standards bodies will allow such a huge set of changes, which would have a broad effect on many companies and organizations, given the coding and testing requirements and the need to coordinate different standards bodies.
I still think these changes are needed, though. They do not belong in applications layered above the standard operating system environment; they belong in the operating environment itself, so that information can be managed for its entire life. Without a standard, you will be locked into a single vendor.
At least that's what I'd do if I were in charge.
Henry Newman, a regular Enterprise Storage Forum contributor, is an industry consultant with 27 years of experience in high-performance computing and storage.