Key-value is a fundamental data representation that is invading the storage world beyond just archive or other cold storage. With new technology based on key-value, it's quickly coming to higher performing storage. Is key-value storage in your future?
In June I went to the 30th International Conference on Massive Storage Systems and Technology (MSST 2014) and over the 4 days I was there I saw a theme rise to the top that I thought was quite interesting. Two years ago I barely heard the subject raised at the MSST conference, but at MSST 2014 I saw it mentioned several times a day. The subject is key-value storage.
Key-value is a fundamental data representation or data structure. It is very simple, consisting of (key, value) pairs in which a key is never repeated in the data set (i.e., it's unique). If you want to retrieve the value associated with a key, all you need is the key itself. The number of operations used with key-value pairs is fairly small:
- Add: Add a pair to the collection of data
- Remove: Remove a pair from the collection of data
- Reassign: Change or reassign the value associated with a key
- Get: Get a value associated with a key
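To make the four operations concrete, here is a minimal in-memory sketch. The class name and method names are purely illustrative and are not taken from any particular product; a Python dictionary stands in for the storage.

```python
class KeyValueStore:
    """Toy in-memory key-value store (illustrative names only)."""

    def __init__(self):
        self._pairs = {}

    def add(self, key, value):
        # Keys are unique, so adding an existing key is an error.
        if key in self._pairs:
            raise KeyError(f"key already exists: {key!r}")
        self._pairs[key] = value

    def remove(self, key):
        # Raises KeyError if the key is not present.
        del self._pairs[key]

    def reassign(self, key, value):
        # Only an existing key can have its value changed.
        if key not in self._pairs:
            raise KeyError(f"no such key: {key!r}")
        self._pairs[key] = value

    def get(self, key):
        # Raises KeyError if the key is not present.
        return self._pairs[key]
```

Many real systems fold Add and Reassign into a single "put" operation, which is the shape the Kinetic drive discussed later takes.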
If you've ever examined object based storage (or file systems), then these operations should look very familiar to you. In fact, object storage such as Amazon's S3, OpenStack's Swift, Caringo, and others use these same basic operations for their storage systems. These approaches to file systems have been around for a few years so they're not exactly brand new.
So why was key-value storage such a hot topic at MSST 2014? The simple reason is that it is simple and there are some new technologies that make key-value storage even easier and perhaps more applicable to storage that is faster than previously used.
From my observation, key-value storage is becoming the back-end for not just object based storage and file systems but a great many file systems, including ones that traditionally may have used block storage. The reason for this is that key-value storage is very simple. But being simple doesn't always push adoption of a new technology. Seagate has advanced this technology with the development of key-value based, Ethernet attached hard drives. They are an example of where key-value storage is headed.
Seagate Kinetic Drive
I'm not sure if everyone has seen the new Seagate Kinetic Drive technology but many people are talking about it and demonstrating what can be done with it. To create the Kinetic drive, Seagate took the basic hard drive, removed the SAS or SATA interface, and replaced it with two Ethernet interfaces and a simple processor. There are some power connection changes as well, but the form factor of the Kinetic drive is exactly the same as that of regular SAS drives.
The benefit of replacing the SAS interface with an Ethernet interface is that it strips away all of the intermediate layers between the application and the drive itself. These intermediate layers include the POSIX function calls, the file system, the volume manager, drivers, and the storage server, which can have RAID controllers, caches, the SAS controller, and on and on.
With the Kinetic drive, the application talks to a library layer that a developer creates that takes the place of the file system. Then the library talks directly to the Kinetic drive using TCP/IP. This greatly reduces all of the IO latency between the application and the actual storage. But how do you use the drive now that it only has an Ethernet interface?
Seagate has turned the Kinetic drive into a key-value pair storage device with an interface that you can access via several programming languages including Java, C++, and Python. These interfaces have just a few simple client APIs, chiefly put, get, and delete.
Do these functions look familiar? They should. The Kinetic drive has some other commands to help out the developer as well.
The drives also have an administrator API so that they can be managed and monitored. This includes setup features for the drive, security, and the ability to get logs from the drive. To use the Kinetic drive you have to issue commands to it in terms of PUT, GET, and DELETE, basic key-value functions.
While Seagate won't talk too much about what exactly is inside the drive, it is basically a simple key-value database and presumably some simple OS. Under the covers the drive does the space mapping for you, including any garbage collection that needs to be done. Keys can be from 1 byte to 4 KiB long, and values can be from 0 bytes to 1 MiB. Each drive can have multiple masters from an authentication and authorization standpoint.
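Any library sitting in front of the drive has to respect those size limits. The check below is a small sketch of that idea; the function and constant names are made up for illustration and are not part of the Kinetic libraries.

```python
MAX_KEY_BYTES = 4 * 1024       # 4 KiB upper bound on keys, as quoted above
MAX_VALUE_BYTES = 1024 * 1024  # 1 MiB upper bound on values, as quoted above


def check_pair(key: bytes, value: bytes) -> None:
    """Raise ValueError if a pair falls outside the quoted size limits."""
    if not 1 <= len(key) <= MAX_KEY_BYTES:
        raise ValueError(f"key must be 1..{MAX_KEY_BYTES} bytes, got {len(key)}")
    if len(value) > MAX_VALUE_BYTES:
        raise ValueError(f"value must be at most {MAX_VALUE_BYTES} bytes, got {len(value)}")
```

Note that an empty value is allowed but an empty key is not.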
File systems can be written or adapted to use these drives as the storage back-end via open source programming libraries to interface with the drives. Alternatively, a fairly simple IO library can be written so that applications can do IO to/from the drives. The design of the application interface is up to the developer(s).
As a starting point, the file system will need to break files larger than 1 MiB into 1 MiB chunks. The chunks can all be sent to the same drive, spread across multiple drives, or spread around with copies of each chunk. You can do almost anything you want including replication, snapshots, striping, RAID, or just about anything else you can imagine. You can write the IO interface using the Kinetic libraries and the simple key-value commands.
The current Kinetic drive has two Gigabit Ethernet interfaces. The performance numbers that Seagate presented at SNIA in April showed sequential read and write performance of about 50 MB/s. The random write speed is also about 50 MB/s, but the random read rate is 1.2x slower than traditional drives. So the performance is comparable to what you might see from a regular SAS drive.
Some file systems have been demonstrated using the Kinetic drives as a back-end. For example, SwiftStack demonstrated a Kinetic backed version of Swift at the OpenStack conference in May of 2014. It is targeted at slower, cheaper storage that might be used for archiving data. The Fred Hutchinson Cancer Institute has deployed this Swift-Kinetic solution in production with great success.
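As a rough sketch of the chunking idea, the code below splits a file into fixed-size chunks and spreads them round-robin across several drives, optionally keeping extra copies. Plain Python dictionaries stand in for the drives, and the function names and key layout are assumptions made for illustration; a real implementation would issue put and get calls through a Kinetic client library instead.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB, the maximum value size quoted above


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Break data into consecutive chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def put_file(name, data, drives, copies=1, chunk_size=CHUNK_SIZE):
    """Store chunks round-robin across drives, with `copies` replicas each.

    Returns the placement: a list of (key, [drive indices]) per chunk.
    """
    if not 1 <= copies <= len(drives):
        raise ValueError("copies must be between 1 and the number of drives")
    placement = []
    for index, chunk in enumerate(split_into_chunks(data, chunk_size)):
        key = f"{name}/{index:08d}".encode()
        # Replicas go to consecutive drives so no drive holds two copies.
        targets = [(index + r) % len(drives) for r in range(copies)]
        for target in targets:
            drives[target][key] = chunk
        placement.append((key, targets))
    return placement


def get_file(placement, drives):
    """Reassemble a file by reading the first replica of every chunk."""
    return b"".join(drives[targets[0]][key] for key, targets in placement)
```

The returned placement is the piece of information the caller must keep somewhere in order to find the chunks again, which is exactly the metadata problem discussed in the next section.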
Key-value storage limitations
Key-value storage seems pretty good, doesn't it? With a key you can easily look up the corresponding value, delete the key-value pair, or replace the value in the key-value pair. The Seagate Kinetic drive does exactly this. The developer writes a library or layer that retrieves the requested data (read), writes the requested data (write), or erases the data (delete).
But storing data also requires metadata. There are many ways to do metadata operations, including storing the mapping from the file to the keys as part of the data in the Kinetic drives or perhaps in a separate database. The developer(s) write all of the functions and operations into an IO library.
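One of the options mentioned above, keeping the file-to-keys mapping in the key-value store itself, could look roughly like the sketch below. A dictionary stands in for the store, and the "meta/" and "data/" key prefixes, the JSON manifest layout, and the function names are all assumptions chosen for illustration.

```python
import json


def write_with_manifest(store, name, data, chunk_size=1024 * 1024):
    """Write data as chunks plus a manifest that records where they are."""
    chunk_keys = []
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        key = f"data/{name}/{index:08d}"
        store[key] = data[offset:offset + chunk_size]
        chunk_keys.append(key)
    manifest = {"size": len(data), "chunk_size": chunk_size, "chunks": chunk_keys}
    # The manifest is itself just another value in the store.
    store[f"meta/{name}"] = json.dumps(manifest).encode()


def read_with_manifest(store, name):
    """Look up the manifest first, then fetch the chunks it lists."""
    manifest = json.loads(store[f"meta/{name}"])
    return b"".join(store[key] for key in manifest["chunks"])
```

Writing the manifest last means a reader never sees a manifest that points at chunks that have not been stored yet, although a real design would also need to handle overwrites and failures partway through.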
This sounds very easy but the IO portions of the application will have to be re-written to use the specific IO library that talks to the Kinetic drives. Key-value storage is not exactly POSIX compliant so that means that either the storage interface will have to be new or the library layer will have to be written to accommodate POSIX IO function calls.
A simple example of the difficulty in using key-value storage is seeking. POSIX IO functions allow you to seek to a specific point in a file (an offset), but there is no corresponding key-value function that moves a file pointer to a specified position. With key-value storage you either get the entire value associated with the key or you get nothing. In the case of the Kinetic drive you get up to a 1 MiB chunk of data or you get nothing. You can't seek to a specific spot in a file and then do some sort of IO operation.
This is just an example of the issues surrounding the use of key-value storage. You either have to re-write your application to a specific IO library or you have to write a POSIX compliant (or close enough) library that talks to the key-value storage.
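To give a feel for what such a library has to do, here is a sketch of emulating "seek to an offset, then read some bytes" on top of whole-value gets. It assumes the file was stored as fixed-size chunks under keys of the form name/index, with a dictionary standing in for the store; both the key layout and the function name are hypothetical.

```python
def read_range(store, name, offset, length, chunk_size=1024 * 1024):
    """Emulate seek(offset) followed by read(length) using whole-chunk gets."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    out = bytearray()
    position = offset
    end = offset + length
    while position < end:
        # Work out which chunk holds this position and where inside it.
        index, within = divmod(position, chunk_size)
        chunk = store.get(f"{name}/{index:08d}")
        if chunk is None:
            break  # no such chunk: we are past the end of the file
        # The whole chunk was fetched; keep only the bytes that were asked for.
        piece = chunk[within:within + (end - position)]
        if not piece:
            break  # short final chunk: end of file reached
        out += piece
        position += len(piece)
    return bytes(out)
```

Even a one-byte read costs a full chunk transfer in this scheme, which is part of why mapping POSIX semantics onto key-value storage is not free.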
Another option is to write the IO library so that it has an intermediate file system for doing seeks and other operations that key-value storage can't provide. When a file is accessed or opened or created, a POSIX compliant file system, which is used as an intermediate file system, does all of the operations required by the application. Then at certain points the data in the intermediate file system is "flushed" or copied to the key-value storage. Making sure the data in the key-value storage is consistent with the intermediate file system takes some careful design and coding. This isn't the easiest solution but it's somewhere between a non-POSIX IO library and a POSIX compliant library.
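A very reduced sketch of this staging approach is shown below. A local temporary file plays the role of the intermediate POSIX file system, a dictionary stands in for the key-value storage, and the class name, method names, and key layout are invented for illustration. It deliberately ignores the hard parts mentioned above, such as keeping the two copies consistent across failures.

```python
import tempfile


class StagedFile:
    """Stage seeks and writes in a local temp file, then flush to a KV store."""

    def __init__(self, store, name, chunk_size=1024 * 1024):
        self.store = store
        self.name = name
        self.chunk_size = chunk_size
        self._file = tempfile.TemporaryFile()  # the intermediate POSIX file

    def seek(self, offset):
        self._file.seek(offset)

    def write(self, data):
        self._file.write(data)

    def read(self, length):
        return self._file.read(length)

    def flush(self):
        """Copy the staged file into the key-value store, chunk by chunk."""
        position = self._file.tell()
        self._file.seek(0)
        index = 0
        while True:
            chunk = self._file.read(self.chunk_size)
            if not chunk:
                break
            self.store[f"{self.name}/{index:08d}"] = chunk
            index += 1
        self._file.seek(position)  # restore the caller's file position

    def close(self):
        self.flush()
        self._file.close()
```

Nothing reaches the key-value store until flush() or close() is called, so the application is free to seek and overwrite in the staged copy as often as it likes.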
One area where key-value storage can work well is archive storage. Archive storage is where you can park data that is rarely used but you want to keep around for a period of time. This usage pattern indicates that applications are not likely to directly interact with archive storage alleviating the need for POSIX compatibility or a re-write of the application. Really you just need tools to "put" the data into the archive and "get" it from the archive when you need it. These two operations, "put" and "get" map very well to key-value storage.
As I indicated earlier, it's not easy to create a POSIX compliant file system using key-value storage because of the difficulty in mapping all of the POSIX IO functions to the simple functions of key-value storage. However that doesn't mean it's impossible - merely difficult. One way to achieve this is to take existing object based file systems and adapt them to key-value storage. Ceph is a perfect example of this.
Ceph can present storage to clients as object storage, block storage, or files. The storage layer inside Ceph that backs all three types of storage is object based. Version 0.80 (Firefly) of Ceph had some experimental support for a key-value OSD (Object Storage Device). So it is definitely possible to create a storage solution that isn't just an archive. It doesn't have to have the limited performance of archive storage either. Other file systems with good performance that could be adapted to use key-value storage are Lustre and GlusterFS. Performance is not a limiting factor for key-value storage.
Key-value Storage: Summary
Key-value storage is quickly becoming a popular storage technology. It is a fundamental data representation in many computer languages and is used in a great number of database tools. It's a very convenient mechanism for storing data. The large capacity but lower performance storage world is abuzz with key-value storage concepts. But key-value storage isn't limited to archive or lower performing storage. On the contrary, it can be used for faster performing storage as exemplified by Ceph.
As an example of what you can do with key-value storage and how simple it can be, Seagate has created a new storage drive called Kinetic that you address using REST-like commands such as get, put, and delete. A simple open-source library allows you to then develop IO libraries so that applications can perform IO to/from the drives. Some object storage solutions such as Swift have already been ported to use the Kinetic drives. Ceph is also developing a version that can use Kinetic drives. Other object based storage systems such as Lustre and GlusterFS could theoretically use this technology as well.
Keep an eye on key-value storage. It could be coming to a file system near you.