File System Interface Futures: Cloud Computing's Impact - Page 2



Ceph, GPFS, Lustre and PanFS support parallel I/O, meaning I/O from multiple threads (possibly running on multiple nodes) to a single file, but Gluster does not. On the other side, dozens of vendors are developing REST- and SOAP-based object management interfaces.
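To make "parallel I/O to a single file" concrete, here is a minimal sketch in Python, assuming a POSIX system and a local temporary file rather than a real parallel file system. The function name `parallel_write` and the chunk layout are illustrative assumptions, not anything taken from the products named above. Each thread writes its own byte range with a positioned write, so no thread depends on a shared file offset.

```python
import os
import tempfile
import threading


def parallel_write(path, num_threads, chunk_size):
    """Have several threads write disjoint regions of one file."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        def worker(index):
            # Thread `index` owns bytes [index * chunk_size, (index + 1) * chunk_size).
            block = bytes([ord("A") + index]) * chunk_size
            # pwrite takes an explicit offset, so threads never share a file position.
            os.pwrite(fd, block, index * chunk_size)

        threads = [threading.Thread(target=worker, args=(i,))
                   for i in range(num_threads)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shared.dat")
    parallel_write(path, num_threads=4, chunk_size=1024)
    with open(path, "rb") as handle:
        data = handle.read()
```

On a parallel file system the same pattern runs across nodes, and the file system itself has to keep those concurrent writes consistent, which is the hard part the vendors above have solved.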

Vendors are trying to create systems that support billions of objects in a single namespace. Because these vendors are not constrained by the POSIX atomicity requirements or by the need to support parallel I/O, this is far easier than developing the same support inside a POSIX file system.

The main reason this is easier is that the REST and SOAP interfaces are far narrower than POSIX and are not encumbered by a standards process controlled by vendors. With REST and SOAP you can attach policies to files: policies for replication to remote locations, for access control, for encryption, and so on. None of these policies has to be stored in POSIX inodes. If a POSIX file system kept them in a separate database instead, a crash could leave the inodes and the database inconsistent, not to mention the time needed to fsck (check the file system consistency).
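As a rough illustration of policies traveling with the object instead of living in inodes, the sketch below builds (but never sends) an HTTP PUT request whose headers carry per-object policies. The `X-Example-Policy-` header prefix, the URL and the policy values are all hypothetical; real object stores define their own names.

```python
import urllib.request


def build_put_with_policies(url, data, policies):
    """Build, without sending, a PUT request whose headers carry object policies."""
    request = urllib.request.Request(url, data=data, method="PUT")
    for name, value in policies.items():
        # Hypothetical header prefix, used here only for illustration.
        request.add_header("X-Example-Policy-" + name, value)
    return request


request = build_put_with_policies(
    "http://objects.example.invalid/bucket/trace-0001",  # placeholder URL
    b"payload",
    {"Replicate-To": "site-b", "Encryption": "aes256", "Acl": "group:geo:read"},
)
# Normalize header names to lower case for inspection.
policy_headers = {key.lower(): value for key, value in request.header_items()}
```

The point of the sketch is only that the policy is part of the request for that one object, so the store can record it however it likes without touching a file system's on-disk inode format.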

On the other hand, if the object is really big, I cannot use POSIX reads and writes to start reading it until the whole object has been moved through the REST or SOAP interface. This does not matter in most application environments, but it matters for applications that need to process data before the whole file has arrived, such as oil seismic traces and raw video feeds; clearly, not your everyday applications. It is still important to the many communities whose files are very large and must be processed before they are fully received. And don't forget that databases seek to random positions within their files.
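The following sketch shows what processing data "before the whole file is there" looks like through a POSIX interface. It assumes a POSIX system, a local file and a made-up fixed record length (`RECORD_SIZE`); the function `read_available_records` is hypothetical. A reader uses positioned reads to pick up every complete record that has landed so far while the writer is still appending.

```python
import os
import tempfile

RECORD_SIZE = 16  # hypothetical fixed record length in bytes


def read_available_records(fd, next_offset, record_size=RECORD_SIZE):
    """Return the complete records present from next_offset, plus the new offset."""
    records = []
    size = os.fstat(fd).st_size
    # Only consume whole records; a trailing partial record waits for more data.
    while next_offset + record_size <= size:
        records.append(os.pread(fd, record_size, next_offset))
        next_offset += record_size
    return records, next_offset


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "trace.dat")
    with open(path, "wb") as writer:
        # The writer has delivered two and a half records so far.
        writer.write(b"a" * 16 + b"b" * 16 + b"c" * 8)
        writer.flush()
        fd = os.open(path, os.O_RDONLY)
        try:
            first, offset = read_available_records(fd, 0)
            # The rest of the third record arrives later.
            writer.write(b"c" * 8)
            writer.flush()
            second, offset = read_available_records(fd, offset)
        finally:
            os.close(fd)
```

The same `os.pread` call with an arbitrary offset is also the random positioning that databases rely on. With a whole-object PUT and GET interface, neither the early read nor the random read is available until the transfer completes.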

I do not see the C programs written for the oil industry being rewritten to access seismic traces any other way: they need a POSIX file system interface, given the performance requirements of parallel I/O. The industry challenge is that the current de facto standard interface for file systems is not meeting the requirements for scaling to tens or hundreds of billions of files. There is no movement to change the basic interface, and without a change POSIX file systems are going to be challenged to compete with file systems that offer REST and SOAP interfaces.

Is it going to become a fight between remote and local data access? At large scales, POSIX file systems have a real competitor for cloud applications that do not need the POSIX interface and all the overhead that goes with it. But that overhead comes with features, such as asynchronous I/O and random positioning, that are required for things like databases.

I personally think that, long term, we are going to have three types of data storage interfaces across clouds and local data access. The first type will be storage on our local computers: given irregular connectivity, there is not enough bandwidth on the planet for most of us to access remote files quickly, so we are going to keep local storage to deal with the issue.

The second type will be POSIX file systems. I think that shared and parallel POSIX file systems are going to gain more and more market share, with file system clients distributed across networks of machines. NFS is not meeting the scaling and performance requirements of today's storage market, so I think file system clients will be a larger part of the hierarchy. Yes, NFS will continue to exist, but as part of a parallel file system hierarchy.

The third type will be object access through non-POSIX interfaces, what we know today as REST and SOAP, though the future might bring other methods. Who knows? I also think the current parallel POSIX file system vendors will offer storage with both POSIX and REST/SOAP interfaces as part of this hierarchy. Cloud storage cannot fully replace POSIX file systems, so the two are going to have to coexist.
