What's Missing from HPC
Though HPC file systems might have some or many of these features, today's high-end NAS boxes have some features that HPC systems do not:
- Application support-Any file system needs to support everything from databases to VMware to cloud applications. Just having support for HPC applications reading input data or writing checkpoints is not going to cut it.
- Replication-File systems also need the ability to replicate a file or block of data, including the metadata, such that policies for having data offsite are met.
- Data deduplication and compression-Because everything goes through a single entry point with NAS boxes, data deduplication is fairly easy. With parallel file systems, it becomes a huge issue because the data is spread across many different targets. If different clients write the same data, they might start writing on different targets, so a given target will not see the same data arriving and the duplicates will not be found. Spreading out the load is the inherent advantage of HPC file systems, but here it is also the downside. Compression is possible on either the client or the file system target.
- Failover-It is far easier to fail over NAS systems and REST-based systems than HPC file systems, given the complexity of the data paths. You have tens or even hundreds of storage targets, as well as metadata, to deal with. Combine that with the number of requests in flight for reads, writes and metadata operations, compare that with NAS or REST systems, and it is obvious why failover is significantly more difficult.
- Resiliency-The difficulty of doing failover and data deduplication is much of the reason resiliency is difficult for HPC file systems.
- Tiering-The NAS and REST worlds are ahead of most HPC file systems at moving data between tiers. Of course, the system that does this best, and actually does most of this list best, is the venerable IBM mainframe running MVS.
- Monitoring-Monitoring of performance and system health is lacking in most HPC file systems and, from what I have seen, in many REST systems, compared with what I have seen in the NAS world.
- Management-System management is not as easy as it seems, as you have to manage the whole system, including the storage, the file system and the network. Given the degree of difficulty, it is not something that you do overnight. The NAS vendors have a head start of many years.
- Hot everything-The time and effort the NAS vendors have put into easy upgrades makes this a clear win for them. Software-only REST vendors and poorly integrated HPC file system vendors have a long way to go.
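To make the deduplication point in the list above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular NAS or parallel file system works: the fixed-size chunking, the round-robin striping, the per-target hash sets and the function names (`single_entry_dedup`, `striped_dedup`) are all hypothetical. It compares a single entry point that sees every chunk with striped targets that each deduplicate only the chunks they receive.

```python
import hashlib


def chunks(data: bytes, size: int) -> list[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def single_entry_dedup(files: list[bytes], chunk_size: int) -> int:
    """NAS-like model (assumed): one entry point sees every chunk,
    so one hash index covers all writes. Returns chunks stored."""
    index = set()
    for data in files:
        for c in chunks(data, chunk_size):
            index.add(hashlib.sha256(c).hexdigest())
    return len(index)


def striped_dedup(writes: list[tuple[bytes, int]], chunk_size: int,
                  num_targets: int) -> int:
    """Parallel-file-system-like model (assumed): chunks are striped
    round-robin from a per-write starting target, and each target
    deduplicates only what it receives. Returns total chunks stored."""
    targets = [set() for _ in range(num_targets)]
    for data, start in writes:
        for i, c in enumerate(chunks(data, chunk_size)):
            target = (start + i) % num_targets
            targets[target].add(hashlib.sha256(c).hexdigest())
    return sum(len(t) for t in targets)


# Eight distinct 4-byte chunks, written twice (for example, by two clients).
data = b"".join(bytes([i]) * 4 for i in range(8))

stored_single = single_entry_dedup([data, data], chunk_size=4)
stored_same_start = striped_dedup([(data, 0), (data, 0)], 4, num_targets=4)
stored_diff_start = striped_dedup([(data, 0), (data, 1)], 4, num_targets=4)

print(stored_single, stored_same_start, stored_diff_start)
```

In this toy model, the second copy is fully deduplicated only when both writes happen to start on the same target. When the starting target differs, every chunk lands on a target that has not seen it before, so per-target deduplication finds nothing.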
What Does the Future Hold for File Systems?
HPC file systems, and the scalability they provide for both data streaming IOPS and metadata, are a good model for the requirements of new storage technologies. These file systems can do hundreds of thousands of metadata operations per second and stream many TBs of I/O per second.
If the HPC file system community decides to invest in some of the enterprise features listed above, there is a good-sized market for these file systems. But these file systems will be challenged by those enterprise requirements. Deciding which requirements to tackle first is going to be a challenge, as different markets are likely to place higher priority on different requirements.
I suspect the HPC file systems vendors are going to be looking for new markets soon given their extreme scalability.
Photo courtesy of Shutterstock.