File System Interface Futures: Cloud Computing's Impact
The world has had only one agreed-upon file system standard since about the mid-1980s. It is based on POSIX (Portable Operating System Interface).
POSIX came from IEEE Standard 1003.1-1988, released in 1988. The last change to POSIX file system I/O was the addition of asynchronous I/O system calls back in 1991. One of the big backers of, and participants in, the development of the POSIX standard was the US government, which was less than happy with having to port applications across the various hardware platforms and operating systems from vendors. Those vendors ranged from Digital Equipment Corporation to IBM to HP, and included various flavors of UNIX from Sun and others.
Right or wrong, the thought was that with a common operating system interface for applications, applications would be portable and the US government would not have to worry about porting them. Of course we all now know how naïve that was, but that was the goal.
With the advent of the Web, a new interface was needed. You cannot make system calls or C library calls over HTTP, so Representational State Transfer (REST) was developed back in 2000 by Roy Fielding as a way to have an interface to Web servers.
In the last 12 years the REST interface and REST applications have exploded – especially over the last 3 years, with the movement of applications and storage into the cloud.
So the question I want to explore this month is: Will REST overtake POSIX as the interface of choice for all applications?
POSIX Strengths and Weaknesses
POSIX has been around for a long time and has rich interfaces. You have, of course, the C library interface with open/close and read/write, and the ability to randomly read or write data within a file with the use of fseek.
It should be noted that Java supports the C library interface when it is doing file reads and writes, but it also supports REST. The system call interface is richer still: it offers direct system calls on the data with no application-level buffering, many more options on the open system call, and support for asynchronous I/O.
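To make the two styles of access concrete, here is a minimal sketch in Python, whose os module is a thin wrapper over the POSIX calls. The function name and the file contents are made up for illustration; the point is only that both the buffered, fseek-style path and the file-descriptor path can reach into the middle of a file.

```python
import os
import tempfile


def random_access_demo(path):
    """Write a small file, then read and overwrite bytes in the middle of it."""
    # Buffered, C-library-style access: the file object keeps its own
    # buffer, and seek() plays the role of fseek().
    with open(path, "wb") as f:
        f.write(b"0123456789")
    with open(path, "r+b") as f:
        f.seek(4)             # jump to byte offset 4
        middle = f.read(2)    # the two bytes at offsets 4 and 5
        f.seek(4)
        f.write(b"XY")        # overwrite those two bytes in place

    # System-call-style access: os.open/os.lseek/os.read work on a file
    # descriptor, with no buffering in the Python layer.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, 4, os.SEEK_SET)
        patched = os.read(fd, 2)
    finally:
        os.close(fd)
    return middle, patched


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(random_access_demo(os.path.join(d, "demo.bin")))
```

Neither path ever has to read or rewrite the whole file, which is the property the rest of this comparison keeps coming back to.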
POSIX has been around since 1988 and has, I would guess, many millions of applications that use the standard, more likely billions. On the other hand, POSIX has not been updated in over 20 years. There have been proposals to update POSIX, but since it is controlled by The Open Group, which has significant input from the vendor community, there is little appetite for making any changes. Changes cost money, both for development and, more importantly, for the test suites that validate the standard, as well as for the time needed to run those test suites against every update to the operating system stack.
The POSIX requirements for things like metadata consistency and multiple threads writing to the same file are burdensome for scaling file systems to billions of files and for scaling applications that might require parallel I/O, such as a database. On the other hand, the POSIX interface allows you to access parts of files, so that you can read and write before the whole file arrives, unlike REST.
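As a rough illustration of several writers sharing one file, the sketch below has a few threads each write their own region with os.pwrite and then reads back a single region with os.pread. It assumes a POSIX platform (these two calls are not available everywhere), and the function name, chunk size and worker count are all arbitrary choices of mine.

```python
import os
import tempfile
import threading

CHUNK = 4  # bytes per writer; tiny only to keep the demo readable


def parallel_write(path, workers=4):
    """Have several threads write disjoint regions of one file."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        def writer(i):
            # pwrite takes an explicit offset, so the threads do not race
            # on a shared file position.
            os.pwrite(fd, bytes([ord("A") + i]) * CHUNK, i * CHUNK)

        threads = [threading.Thread(target=writer, args=(i,))
                   for i in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        # Read back only the third region; nothing else is transferred.
        return os.pread(fd, CHUNK, 2 * CHUNK)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(parallel_write(os.path.join(d, "shared.bin")))
```

The application side of this is simple. The burden falls on the file system, which has to keep the data and metadata consistent for every one of those writers.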
The Achilles heel for POSIX, in my opinion, is file system inodes and the requirements for atomicity imposed upon the file system by the standard. The command ls -l and the requirements around it are the enemy of scalability for most POSIX file systems.
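To see why a long listing is expensive, consider this rough Python approximation of what ls -l has to do. It is not how ls itself is implemented, and the function name is hypothetical; it only shows that every directory entry needs its own metadata lookup.

```python
import os
import stat
import tempfile


def long_listing(directory):
    """Return (mode string, size, name) per entry, roughly what ls -l shows."""
    rows = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # On POSIX systems this is one stat-style lookup per entry,
            # which is the per-file cost that grows with directory size.
            info = entry.stat(follow_symlinks=False)
            rows.append((stat.filemode(info.st_mode), info.st_size, entry.name))
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "a.txt"), "wb") as f:
            f.write(b"abc")
        for row in long_listing(d):
            print(*row)
```

With a handful of files this is trivial. With billions of files spread over many servers, each of those lookups must return a consistent answer, and that is where the scaling pain starts.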
REST Strengths and Weaknesses
I think the biggest strength of the REST interface is that the backend management of the files or objects is left up to the developers of the management system. The same could be said for POSIX file systems, but the number of requirements imposed upon the developers limits what can be done.
SOAP (Simple Object Access Protocol) is similar to REST, but REST is less strongly typed than SOAP and does not require XML. The REST interface, which uses HTTP, has a modest set of methods for accessing objects. Examples include GET, PUT, POST and DELETE.
REST uses these access methods and other functions and features via the well-defined HTTP protocol. HTTP is used to address proxies and gateways, caching and security enforcement. It also allows application developers to define new, application-specific methods that add to the current well-defined HTTP methods. For example, methods might include:
• createPurchaseOrder(string CustomerID, string PurchaseOrderID)
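As a rough picture of how a small set of HTTP-style methods maps onto object operations, here is a toy in-memory store in Python. It is not a real web server and does not speak HTTP on the wire; the class name, the paths and the choice of status codes are my own assumptions for illustration.

```python
class ObjectStore:
    """Toy in-memory store that dispatches on HTTP-style method names."""

    def __init__(self):
        self._objects = {}

    def handle(self, method, path, body=None):
        """Return (status_code, payload) for one request."""
        if method == "PUT":
            # Store the whole object in a single operation.
            created = path not in self._objects
            self._objects[path] = body or b""
            return (201 if created else 200), b""
        if method == "GET":
            # Return the whole object, or 404 if it does not exist.
            if path in self._objects:
                return 200, self._objects[path]
            return 404, b""
        if method == "DELETE":
            if self._objects.pop(path, None) is not None:
                return 204, b""
            return 404, b""
        return 405, b""  # method not supported by this sketch


if __name__ == "__main__":
    store = ObjectStore()
    print(store.handle("PUT", "/orders/42", b"widget"))
    print(store.handle("GET", "/orders/42"))
    print(store.handle("DELETE", "/orders/42"))
```

Notice that how the objects are actually kept (a dictionary here) is invisible to the caller, and that every operation in this sketch deals with a whole object rather than a byte range within one.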
SOAP, though similar to REST, has some advantages. The biggest advantage of SOAP over REST comes from REST's use of HTTP. Since SOAP is not tied to HTTP and HTTP conventions, it works well over raw TCP, named pipes, message queues and other direct connections, yet it has the same advantage as REST in that the interface is not via system calls but via the file object.
We do not have a lot of POSIX file systems today that scale to tens of PB and billions of files. There are four file systems in production with a parallel namespace (Gluster, PAN-FS, Lustre, and GPFS) and a new entry called Ceph.