Enterprise SSD and (Much) More: Can File Systems Keep Up? - Page 2
Shingled Magnetic Recording (SMR) drives have been out for a few years and are increasingly popular. Compared to "classic" hard drives, SMR drives reduce the guard space between tracks and overlap the tracks. The figure below is a diagram that illustrates this configuration.
SMR hard drive track layout
The tracks are overlapped so they look like roof shingles, hence the name "Shingled Magnetic Recording".
The diagram shows that the write head writes the data across the entire track (the blue and green stripes in a specific track). But the drive "trims" the data to the width that the read head actually needs to read it (this is the blue area of each track). This allows the areal density, the number of bits in a specific unit area, to be increased, giving us more drive capacity (and who doesn't like more drive capacity?).
In the case of a write to an SMR drive, the write head will write data using the full width (green), before trimming the data (blue). For example, if the write head in yellow needs to write data to the drive, the write head writes to the current track, such as Track N+1 as in the figure below.
"Disturbed data" track layout
Because of the width of the write head, data is written to Track N+2 as well as N+1. After writing the data, it is trimmed. The post-trimmed data is shown as "New Data" and the data it will overwrite is shown as "Disturbed Data" in red. This data already exists and will be overwritten if the "New Data" is written.
To get around the problem of data overwrite, the drive has to read the data that is about to be overwritten and write it somewhere else on the drive. After that, the drive can write the new data to a location that, hopefully, doesn't already have data stored in neighboring tracks. Therefore a simple write becomes at least a read-write-write (one read and two writes). I'm not counting the "seeks" that might have to be performed, but you can see how a simple data write with existing neighboring data can lead to a large amount of activity, adversely impacting performance.
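To make the read-write-write sequence concrete, here is a minimal toy model in Python. It is only a sketch of the behavior described above, not how any real drive firmware works: the class name ToyShingledDrive, the "spare" relocation area, and the rule that writing track n disturbs only track n + 1 are all assumptions made for illustration.

```python
class ToyShingledDrive:
    """Toy model only: writing track n also clobbers track n + 1."""

    def __init__(self, num_tracks):
        self.tracks = [None] * num_tracks  # None means the track holds no data
        self.spare = []                    # hypothetical relocation area
        self.ops = []                      # log of every physical operation

    def write(self, track, data):
        """Write data to a track and return the operations this write caused."""
        start = len(self.ops)
        neighbor = track + 1
        if neighbor < len(self.tracks) and self.tracks[neighbor] is not None:
            # The wide write head would disturb the neighbor, so save it first.
            self.ops.append(("read", neighbor))
            self.spare.append(self.tracks[neighbor])
            self.ops.append(("write", "spare"))
            self.tracks[neighbor] = None
        # Only now is it safe to lay down the new data.
        self.tracks[track] = data
        self.ops.append(("write", track))
        return self.ops[start:]


drive = ToyShingledDrive(num_tracks=4)
first = drive.write(2, "old data")   # neighbor track 3 is empty
second = drive.write(1, "new data")  # neighbor track 2 holds "old data"
print(first)   # one write
print(second)  # one read and two writes
```

The first write costs a single operation because nothing sits on the neighboring track. The second write lands next to existing data, so the same logical request turns into one read and two writes, and that is before counting any seeks.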
In an additional scenario, if a block of data is erased there is suddenly a "hole" in the drive that a file system will likely use at some point. There may be data on neighboring tracks surrounding the hole. A typical file system doesn't know whether an unused block has neighboring tracks with data, so it may decide to write into that block. The disk will then experience a read-write-write situation again.
IO elevators within the kernel try to combine different writes to make fewer sequential writes rather than a large number of smaller writes that are possibly random. This can help SMR drives but only to a certain degree. To make writing efficient the elevator needs to store a lot of data and then look for sequential combinations. This results in increased memory usage, increased CPU usage, and increased write latency. The result is that the write speed can be degraded for both sequential write IO and random write IO depending upon where the data is to be written.
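As a rough illustration of what combining writes means, the sketch below merges queued write requests that touch adjacent or overlapping block ranges into fewer, larger sequential writes. This is not the kernel's actual elevator algorithm; the function name merge_writes and the (start_block, length) request format are assumptions chosen for the example.

```python
def merge_writes(requests):
    """Merge (start_block, length) requests that are adjacent or overlapping.

    Returns a sorted list of merged (start_block, length) requests.
    """
    merged = []
    for start, length in sorted(requests):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            # This request begins inside or right at the end of the previous
            # run, so extend that run instead of issuing a separate write.
            prev_start, prev_len = merged[-1]
            end = max(prev_start + prev_len, start + length)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, length))
    return merged


# Four queued writes arrive out of order; three of them are contiguous.
queue = [(8, 4), (0, 4), (4, 4), (100, 8)]
print(merge_writes(queue))
```

Here four requests collapse into two: one 12-block sequential write starting at block 0 and one 8-block write at block 100. Note that the merge only works on requests that are already sitting in the queue, which is why an elevator has to hold more data (and add latency) to find more sequential runs.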
How to Handle the Changes?
These are just two examples of the changes in storage media. With SSD's, the performance is amazing, the media no longer rotates (it's now chips), you can jostle them around without fear of crashing the drive, individual byte writes can be very costly, and they have a limited life, albeit a fairly long one. With SMR drives, the density has gone up, which is wonderful (more storage capacity), but a small write to an existing piece of data, or to a location on the drive where data already exists (or even close to where it exists), causes a performance slow-down.
You can take almost any file system today and use it on an SSD or an SMR drive. But it might not work as well as you hope because of the characteristics of the storage. File systems are adapting to storage media but they can only go so far without having to change. The question we face is: should we keep adapting current file systems to new storage, or should we call a time-out and develop something new?
I am of the personal opinion that while we still have file systems that work with current storage, we should be looking to develop something new. This "something new" should take advantage of the storage technology to improve performance and ease of use.
For example, SMR drives need a file system that has some awareness that the drive is much more of a sequential medium than a random IO medium. This may mean doing something with the metadata as well, since metadata updates can appear like random IO's to a drive, especially if you allow the file system to update the access time every time the file is accessed.
In the case of SSD's we need to rethink file systems so we don't try to force the classic rotating-media mentality onto "chip" based storage. While file systems and tools are adapting to SSD's, SSD's are evolving as well. Also remember that SSD's have the problem that if you update a single byte in a block, then the whole block needs to be re-written. This also means that random IO's should be re-thought so as to limit this behavior (Note: current SSD's have techniques to limit the re-write issue but they don't completely solve the problem).
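The cost of the single-byte update is easy to put into numbers. The sketch below computes how many bytes have to be re-written for a given update, assuming a naive device that re-writes every block the update touches. The 4096-byte block size and the function names are assumptions for illustration only; real SSD's have their own page and erase-block sizes and use the mitigation techniques mentioned above.

```python
def bytes_rewritten(offset, length, block_size):
    """Bytes physically re-written when `length` bytes change at `offset`,
    assuming every touched block must be re-written in full."""
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return (last_block - first_block + 1) * block_size


def write_amplification(offset, length, block_size):
    """Ratio of bytes physically written to bytes the application changed."""
    return bytes_rewritten(offset, length, block_size) / length


BLOCK = 4096  # hypothetical block size in bytes

print(bytes_rewritten(10, 1, BLOCK))        # one byte changed, one full block re-written
print(write_amplification(10, 1, BLOCK))    # amplification for that single byte
print(bytes_rewritten(4095, 2, BLOCK))      # two bytes that straddle a block boundary
print(write_amplification(0, BLOCK, BLOCK)) # a full, aligned block write
```

Under these assumptions a one-byte update re-writes 4096 bytes, two bytes that straddle a block boundary re-write 8192 bytes, and only a full aligned block write has no amplification at all. That is the arithmetic behind wanting file systems to avoid small random updates on this kind of media.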
We can even start to think a bit more outside of the box and develop a specific file system for each storage type (e.g. SMR, SSD). However, we also need them to work together in case we have a combination within a single system. For example, could SMR drives just hold large sequential chunks of data while the metadata and small files are stored on SSD's? We might get the best of both worlds: inexpensive capacity (SMR drives) and very fast performance (SSD's).
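A placement rule for that kind of split could be as simple as the sketch below. It is only an illustration of the idea, not a design for a real file system: the function name choose_tier, the "metadata"/"data" labels, and the 1 MiB small-file threshold are all made up for the example.

```python
SMALL_FILE_LIMIT = 1024 * 1024  # hypothetical cutoff: 1 MiB


def choose_tier(kind, size_bytes, small_file_limit=SMALL_FILE_LIMIT):
    """Pick a storage tier for an object.

    kind is "metadata" or "data". Metadata and small files go to the SSD
    tier; large files, which are written mostly sequentially, go to SMR.
    """
    if kind == "metadata" or size_bytes < small_file_limit:
        return "ssd"
    return "smr"


print(choose_tier("metadata", 256))             # inode-sized metadata update
print(choose_tier("data", 16 * 1024))           # small file
print(choose_tier("data", 4 * 1024 * 1024 * 1024))  # large sequential file
```

The hard part, of course, is not the rule itself but everything around it: keeping the two tiers consistent with each other, and deciding what to do when a small file grows past the threshold.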
To achieve this, do we need to create some sort of "segmented" file system that adapts to various storage types and also adapts to having multiple segments in use at the same time? Is there a common interface between segment types so that people can write file systems that adapt to the storage?
While I'm not a huge fan of POSIX, in that it can really limit parallel performance, I still think POSIX capability is needed. But we should take the opportunity to offer additional options for applications. For example, can we offer a simple key-value IO interface? Or can we offer a simple database interface?
Given how long it takes to design a file system, write it, and stabilize it, we, as a community, need to start thinking about what we want for the future and what it should look like.