New Object Storage Protocol Could Mean the End for POSIX


POSIX has been the standard file system interface for Unix-based systems (which includes Linux) since its launch more than 30 years ago. Its usefulness in processing data in the user address space, or memory, has given POSIX-compliant file systems and storage a commanding presence in applications like deep learning that require significant data processing. The POSIX-compliant Lustre file system, for example, powers most supercomputers, and POSIX's dominance continues down market too.

POSIX has its limits, though. Features like statefulness, prescriptive metadata, and strong consistency become performance bottlenecks as I/O requests multiply and data scales, limiting the scalability of POSIX-compliant systems. That's often an issue in deep learning, AI, and other data-intensive uses today, but as data and the need to analyze it grow exponentially, the problem has steadily moved down market.

Enter object storage. Unlike a file system, object storage imposes no hierarchical structure on data. It's a flat pool of data, with each piece of data described by its metadata. It has no practical scalability limits, making it ideal for high-end storage and applications, but it has one performance limitation that POSIX doesn't: data requests have to pass through the kernel's I/O stack. POSIX gets around that problem with the mmap() function, which maps data into the user address space so an application can work on it directly, without going through the kernel for every access.

Recently some engineers - including longtime Enterprise Storage Forum contributor Henry Newman - took that advantage away from POSIX by creating mmap_obj(), which gives object storage systems the ability to process data in memory. With object storage's scalability (and lower cost), the breakthrough could mean the end of POSIX's dominance in compute-intensive environments.

POSIX, meet object storage

Though POSIX makes it easy to port applications between operating systems, it cannot scale to meet the most demanding applications, and its performance declines as demands increase.

Accessing data in file system storage then becomes a challenge, especially for organizations with very large amounts of data and demanding performance needs. Object storage is a more recent form of data storage that holds data in any format as discrete units called objects. Development began at Seagate in 1999, based in part on earlier work by RAID co-inventor Garth Gibson and others.

Object storage is the most scalable of the three forms of storage (file and block are the others) because it allows enormous amounts of data in any form to be stored and accessed. Data is stored in a flat pool and can be managed through the metadata of each data object. But requesting stored object data requires additional processing time because requests must go through the kernel, or operating system.

Object storage is ideal for very large data stores and is widely used in cloud computing, but until now, POSIX has had one advantage: the mmap() function allows data to be processed in the user address space without going through the kernel. For object storage, the kernel path has remained a bottleneck, with very little improvement in performance over the years.

With mmap_obj(), that advantage is no more: object storage can also process data in the user space. This is particularly important with NVMeOF frameworks. With flat storage pools that can scale almost without limit and the ability to process data in high-performance memory, object storage now has the potential to make POSIX file systems obsolete.

As storage systems exceed billions of files, object storage could become the preferred option now that it can process data in the user space. At that scale, the I/O and scalability limitations of POSIX become the bottleneck.

NVMeOF and mmap change the game

Although intermediaries, such as S3 file system interfaces, are helpful, they still require data to be processed by POSIX applications before or after resting in an object storage system.

But what if there were a much faster, more scalable way to access data in object storage?

Non-Volatile Memory Express over Fabrics (NVMeOF) is a relatively new technology in which "fabrics" refers to a network fabric: a device connected to the network, such as an SSD, can have data transferred directly into it. Using memory mapping to copy object data onto the device means the data is staged and processed on the device rather than passing through the POSIX stack, and the SSD or other external device has far more capacity available for computing. Because the device, a form of secondary storage for that computer, connects directly to the system, the CPU has a direct path to its data: while attached, the data is available almost as if it were main memory. The data stays on the SSD during computation, and actively accessing it, particularly the metadata, becomes much faster. Low latency has long been sought in object storage and data processing; with NVMeOF, it becomes readily available.

NVMe (the original protocol) gives a host fast access to locally attached flash storage, such as an SSD. NVMeOF extends that access across a network: instead of processing data only on the device attached to one computer, users can access that memory over the network fabric. Combining object storage and NVMeOF in data centers and data lakes means higher compute power than previously available. Data lakes are repositories of raw, unstructured data. Using object storage (rather than file storage) for data lakes would give data analysts an easier way to manage and analyze the data; using memory mapping and NVMeOF to access it quickly would provide new levels of high-performance computing.

NVMeOF could also provide higher compute power for data centers. Google, IBM, and Amazon Web Services are already using cloud-based object storage. Currently, accessing object-stored data in data centers requires an application programming interface (API) and input/output commands. By using NVMeOF with memory mapping, data centers can bypass the server's operating system (going through the kernel) when processing data. No intermediary interface is needed, either. Data centers can bring object-stored data directly into memory for processing.

The need for a POSIX interface could be bypassed altogether with object storage by using a REST interface for applications.

NVMeOF and memory mapping technologies, paired with object storage, will change the way data centers, data lakes, and individuals process data. Network computing power and speed will skyrocket. Though this may have its limitations - migrating data from file and block storage to object storage, for one - it will mean new developments for data-intensive computing. At a time when data centers and cloud infrastructure must rapidly scale to meet demands for data storage and processing, accessing object storage through memory mapping could be an unparalleled way to accelerate data center performance.

The mmap_obj() developers note that one piece of work still needs to be done: a munmap_obj() function to release data from the user space, analogous to POSIX's munmap().

