Object Storage vs. POSIX Storage: Something in the Middle, Please
Object storage is becoming very popular. It's being used for websites, medical records, databases, Hadoop, general data and media storage, and more. One of the big draws of object storage is its simplicity. There are only a few commands for object-based storage systems:
- PUT (basically a "write" — PUT the object into the storage)
- GET (basically a "read" — GET the object from the storage)
- DELETE (deletes the object, which is the file)
- POST (adds an object using HTML forms, similar to PUT but using HTML)
- HEAD (returns an object's metadata but not the data itself)
The POST and HEAD commands are not universal, but you do see them in object storage systems that are compatible with AWS S3.
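To make the small command set concrete, here is a minimal sketch of the PUT/GET/HEAD/DELETE verbs. The `ToyObjectStore` class is a hypothetical in-memory stand-in invented for illustration. It is not the API of S3 or any real object store, and POST is left out because it only makes sense with an HTTP front end.

```python
import time


class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object store (not a real API)."""

    def __init__(self):
        self._objects = {}  # key -> (payload bytes, last-modified timestamp)

    def put(self, key: str, data: bytes) -> None:
        # PUT writes the whole object, replacing any previous version.
        self._objects[key] = (bytes(data), time.time())

    def get(self, key: str) -> bytes:
        # GET returns the whole object.
        return self._objects[key][0]

    def head(self, key: str) -> dict:
        # HEAD returns metadata only, never the payload.
        data, mtime = self._objects[key]
        return {"size": len(data), "last_modified": mtime}

    def delete(self, key: str) -> None:
        # DELETE removes the object; a missing key is silently ignored here.
        self._objects.pop(key, None)


store = ToyObjectStore()
store.put("reports/q1.txt", b"hello object storage")
print(store.head("reports/q1.txt")["size"])  # prints 20
print(store.get("reports/q1.txt"))
store.delete("reports/q1.txt")
```

Note that nothing in this interface lets a caller seek, append, or change part of an object. Every operation works on a whole object identified by its key.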
On the other hand, tried-and-true POSIX file systems have a massive range of data access functions. Below is a partial list of IO functions for Linux:
This is only a partial list of POSIX IO functions including some of the extended attributes. There are many more.
Object storage systems are really an all-or-nothing situation. You can't do byte-level file access like you can with POSIX. When you read a file, you get the entire file via the GET function, and your application has to handle the data in the file itself. This means you have to port or rewrite your application to adapt to the storage paradigm.
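The difference shows up clearly when only a few bytes need to change. The sketch below is an illustration under stated assumptions: the POSIX half uses ordinary Python file calls on a temporary file, while the object half uses a plain dict as a hypothetical store that offers only whole-object GET and PUT, as described above. The helper name `patch_object` is made up for this example.

```python
import os
import tempfile

# POSIX-style: change 5 bytes in place without touching the rest of the file.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"0123456789" * 1000)   # 10,000-byte file

with open(path, "r+b") as f:
    f.seek(5000)                    # jump to byte offset 5000
    f.write(b"HELLO")               # overwrite 5 bytes in place

# Object-style: a dict stands in for a store with whole-object GET/PUT only.
store = {"data.bin": b"0123456789" * 1000}


def patch_object(store, key, offset, new_bytes):
    """Read-modify-write: the only way to change part of an object here."""
    whole = bytearray(store[key])                      # GET the entire object
    whole[offset:offset + len(new_bytes)] = new_bytes  # edit in memory
    store[key] = bytes(whole)                          # PUT the entire object
    return len(whole) * 2                              # bytes transferred


moved = patch_object(store, "data.bin", 5000, b"HELLO")

with open(path, "rb") as f:
    posix_result = f.read()

print(posix_result == store["data.bin"])  # same final content either way
print(moved)                              # bytes moved for a 5-byte change
```

Both paths end with identical data, but in this toy model the object path transfers the full object twice to change five bytes. That is the porting cost the application has to absorb.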
There are some object-storage intermediaries that provide a gateway to/from the object-storage system and POSIX-compliant applications. Examples include the following:
These solutions GET/PUT complete files from the object storage to a local file system. The POSIX applications interact with the data on the local file system (which is POSIX-compliant). The intermediaries then copy the data to/from the object storage systems.
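The copy-in, work locally, copy-out pattern can be sketched as follows. This is a toy model, not how any particular gateway is implemented: `ToyGateway`, its `open` and `sync` methods, and the dict standing in for the object store are all hypothetical names chosen for illustration.

```python
import os
import tempfile


class ToyGateway:
    """Hypothetical gateway that caches whole objects as local files."""

    def __init__(self, store: dict, cache_dir: str):
        self.store = store          # dict standing in for the object store
        self.cache_dir = cache_dir  # local POSIX file system used as a cache
        self.cached = set()

    def local_path(self, key: str) -> str:
        # Naive key-to-filename mapping; a real tool would need to be safer.
        return os.path.join(self.cache_dir, key.replace("/", "_"))

    def open(self, key: str, mode: str = "r+b"):
        path = self.local_path(key)
        if key not in self.cached:
            with open(path, "wb") as f:
                f.write(self.store.get(key, b""))  # GET the whole object
            self.cached.add(key)
        return open(path, mode)                    # ordinary POSIX file handle

    def sync(self, key: str) -> None:
        with open(self.local_path(key), "rb") as f:
            self.store[key] = f.read()             # PUT the whole object


store = {"logs/app.log": b"line one\n"}
gw = ToyGateway(store, tempfile.mkdtemp())

with gw.open("logs/app.log") as f:
    f.seek(0, os.SEEK_END)          # byte-level access works on the local copy
    f.write(b"line two\n")

before_sync = store["logs/app.log"]  # the object store has not seen the write
gw.sync("logs/app.log")
after_sync = store["logs/app.log"]   # now it has
```

In this sketch the application's write lives only on the local file system until `sync` is called. If the local storage is lost in that window, the change is lost with it.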
The concept of the intermediaries is good, but the devil is always in the details. How often do you synchronize data from the local file system to the object storage? What if the POSIX application is doing lots of small byte IO with the data files on the local storage? How and when does the intermediary move the data to/from the object storage? What happens if the local storage crashes? How is data consistency ensured?
There are many, many more questions about these intermediaries, but in many cases they are the best solution available to handle data movement to/from the object storage for you.
One of the more fully featured tools is s3ql. It acts as an intermediate file system between S3 and local applications. It is POSIX-compliant, so applications can run using the storage without worry. It handles hard links, symlinks, standard permissions, extended attributes, and file sizes up to 2 TB. It also has some additional features unique to s3ql, including the following:
- Dynamic Size
- Immutable trees
- Range of backends
What Is Needed?
Object storage systems are popular for many reasons, most notably simplicity, ease of use, and cost (at least compared to NAS systems). On the other hand, people complain that applications have to be rewritten to use object storage and that it is slower than other forms of storage.
POSIX file systems are the most common storage system in use today. The POSIX compliance provides a wide range of IO functions for applications to use, including byte-level access. However, with the large number of IO functions comes complexity, both for the application and the file system. Typically POSIX storage is faster than object storage but also a bit more expensive.
As a result, people using object storage want faster performance and byte-level access while retaining the simplicity, ease of use, and low cost. People using POSIX storage would like things to be simpler, approaching the simplicity of object storage, and they would like the cost to come down.
At this point, I would normally insert a picture of the rainbow unikitty with butterfly wings and say good luck. However, I think it might be possible to combine object-storage and POSIX to create something that comes close to the ideal.