Storage technology is evolving extremely rapidly, but our file systems are not. Is it time to rethink file systems so we can take advantage of this new technology?
Enterprise SSD - And Much More
Perhaps it's because I'm getting older, but it seems like things are changing faster every year (my wife tells me it's because I'm getting older).
Regardless of the cause, things are changing quickly, particularly in the storage world. SSDs are becoming very popular and are pushing out spinning drives; there are even SSDs with 60TB of capacity. Enterprise SSDs are quickly reshaping enterprise storage designs. Spinning drives, meanwhile, are reaching massive capacities of 10TB and 12TB, courtesy of Shingled Magnetic Recording (SMR) technology. Non-volatile memory is on the horizon. Yet some aspects of storage are not changing, or are barely changing, to adapt to these new technologies.
As I'm sure everyone knows, storage is not just the storage device, the network interfaces, or the controller. File systems are also part of storage, as are management tools, network protocols, and even processes. In any computation or storage solution, some component is the bottleneck.
When a new technology comes along, it either pushes the bottleneck out or moves it to a different location in the solution as a whole. The important thing to realize is that if technology pushes far ahead in one area, the other aspects need to adapt or they become the bottleneck.
I believe we are at a point where some aspects of storage need to evolve, and quickly. One of those aspects is the file system.
Before looking at where file systems should go, let's look at where they have been. I won't go back too far, only to when spinning disks became popular. These include hard drives and floppy drives: rotating media on which a mechanical "head" reads and writes information. Looking down at the drive, the data is written in concentric circles called "tracks," and the set of tracks at the same radius on every surface forms a "cylinder." This is where the terms "cylinder," "track," and "head" come from. However, the physical layout of today's hard drives no longer corresponds to this geometry; the terms are simply a vestige of the past.
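The old geometry survives mainly in the classic formula for converting a cylinder/head/sector (CHS) address into the linear logical block address (LBA) by which drives are addressed today. The Python sketch below is illustrative only: the geometry of 16 heads per cylinder and 63 sectors per track is an assumed example, and any geometry a modern drive reports is a translation rather than its physical layout.

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Map a cylinder/head/sector address to a linear logical block address.

    CHS sectors are numbered from 1; cylinders, heads, and LBAs start at 0.
    """
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    if cylinder < 0:
        raise ValueError("cylinder out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba: int, heads_per_cylinder: int,
               sectors_per_track: int) -> tuple[int, int, int]:
    """Inverse mapping: split a linear address back into cylinder, head, sector."""
    cylinder, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1


# Assumed example geometry: 16 heads per cylinder, 63 sectors per track.
HEADS, SECTORS = 16, 63
print(chs_to_lba(2, 3, 4, HEADS, SECTORS))  # 2208
print(lba_to_chs(2208, HEADS, SECTORS))     # (2, 3, 4)
```

With this geometry each cylinder spans 16 × 63 = 1,008 sectors, so stepping from one cylinder to the next moves the LBA by 1,008.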
File systems store both data and metadata (data about the data). In general, Linux views all file systems through a common set of objects: (1) the superblock, (2) the inode, (3) the dentry, and (4) the file. The superblock sits at the top of this hierarchy and maintains the state of the file system as a whole.
The inode, one of the most fundamental parts of a file system, represents each file and directory. It contains the metadata needed to manage the object, including its permissions. Dentries (directory entries) translate between names and inodes. Linux keeps a directory entry cache (the dcache) of recently used translations so that repeated lookups are fast. Dentries also maintain the relationships between directories and the files they contain.
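To make these relationships concrete, here is a toy Python model of a superblock, inodes, and an LRU dentry cache that resolves a path one component at a time. It is a sketch under simplifying assumptions, not the kernel's `struct super_block`, `struct inode`, or `struct dentry`; the class names, fields, and cache capacity are hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class Inode:
    """Per-object metadata. Note that the inode does not store the object's name."""
    number: int
    is_dir: bool
    mode: int  # permission bits, e.g. 0o644
    # Directories only: name -> inode number (the on-disk directory entries).
    entries: dict = field(default_factory=dict)


@dataclass
class Superblock:
    """State for the file system as a whole (hypothetical minimal fields)."""
    block_size: int
    root_ino: int
    inodes: dict  # inode number -> Inode


class DentryCache:
    """LRU cache of (parent inode number, name) -> inode number translations."""

    def __init__(self, sb: Superblock, capacity: int = 128):
        self.sb = sb
        self.capacity = capacity
        self.cache = OrderedDict()
        self.misses = 0

    def lookup(self, parent_ino: int, name: str) -> int:
        key = (parent_ino, name)
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        self.misses += 1  # slow path: consult the parent directory's entries
        ino = self.sb.inodes[parent_ino].entries[name]
        self.cache[key] = ino
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return ino

    def resolve(self, path: str) -> Inode:
        """Walk an absolute path from the root, one name component at a time."""
        ino = self.sb.root_ino
        for part in path.strip("/").split("/"):
            if part:
                ino = self.lookup(ino, part)
        return self.sb.inodes[ino]


# Demo: a tiny tree containing /home/notes.txt
inodes = {
    2: Inode(2, True, 0o755, {"home": 10}),        # root directory
    10: Inode(10, True, 0o755, {"notes.txt": 11}),
    11: Inode(11, False, 0o644),
}
sb = Superblock(block_size=4096, root_ino=2, inodes=inodes)
dcache = DentryCache(sb)
first = dcache.resolve("/home/notes.txt")   # two misses: "home", "notes.txt"
second = dcache.resolve("/home/notes.txt")  # both components served from cache
print(first.number, dcache.misses)          # 11 2
```

The second lookup never touches the directory contents, which is the whole point of keeping recently used translations around.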
Finally, a VFS (Virtual File System) file object represents an open file. It holds the state of that open file, such as the current read/write offset.
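The distinction between an inode and an open file shows up when the same file is opened twice: both opens share one inode (and its data), but each keeps its own offset. A toy Python model follows; `OpenFile` is a hypothetical name, and in the Linux kernel this role is played by `struct file`, whose `f_pos` field holds the offset.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    number: int
    data: bytearray  # file contents, shared by every open of this inode


@dataclass
class OpenFile:
    """Per-open state: which inode, and where the next read or write starts."""
    inode: Inode
    offset: int = 0

    def read(self, n: int) -> bytes:
        chunk = bytes(self.inode.data[self.offset:self.offset + n])
        self.offset += len(chunk)  # advance only by what was actually read
        return chunk

    def write(self, buf: bytes) -> int:
        end = self.offset + len(buf)
        if end > len(self.inode.data):
            # Grow the file; any gap past the old end is zero-filled.
            self.inode.data.extend(b"\0" * (end - len(self.inode.data)))
        self.inode.data[self.offset:end] = buf
        self.offset = end
        return len(buf)


inode = Inode(number=11, data=bytearray(b"hello world"))
a = OpenFile(inode)   # first open
b = OpenFile(inode)   # second, independent open of the same inode
r1 = a.read(5)        # b"hello"; a.offset is now 5
r2 = b.read(3)        # b"hel"; b has its own offset, now 3
a.write(b"!")         # overwrites the space at offset 5
r3 = b.read(5)        # b"lo!wo": b sees a's write, from b's own offset
```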
Enterprise SSD – A Look Inside
Solid-state drives (SSDs) store data in solid-state memory instead of on spinning disks. The drive is constructed of flash memory, and the most common type used in SSDs is NAND flash. In NAND flash, the memory cells (transistors) are connected in series, in groups that resemble a NAND gate. These groups are then connected in a NOR-style arrangement, with one end of each group connected directly to ground and the other end connected to a bit line. This arrangement has advantages in cost, density, power, and performance, but, as we shall see, it is a compromise.
In NAND flash memory, the cells are first arranged into pages; typically, a page is 4KB in size. Pages are then combined to form a block. A block, illustrated below in Figure 2, is often formed from 128 pages, giving it a size of 512KB.
Block View of NAND Flash
Blocks are combined into a plane. In many cases, 1,024 blocks are combined into a plane, giving it a size of 512MB, as shown in Figure 3.
Plane View of NAND Flash
There will almost always be multiple planes in a single NAND flash die. Manufacturers also put several dies into a single NAND flash chip, and a drive in turn contains multiple chips.
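The hierarchy multiplies out quickly. The Python arithmetic below uses the page, block, and plane figures from the text; the counts of planes per die, dies per chip, and chips per drive are hypothetical values chosen only for illustration, since real parts vary widely.

```python
KB = 1024  # the article's KB/MB are binary units (4KB x 128 = 512KB)

# Figures from the text
PAGE_SIZE = 4 * KB
PAGES_PER_BLOCK = 128
BLOCKS_PER_PLANE = 1024

# Hypothetical counts for illustration only; real parts vary widely.
PLANES_PER_DIE = 2
DIES_PER_CHIP = 4
CHIPS_PER_DRIVE = 8

block_size = PAGE_SIZE * PAGES_PER_BLOCK
plane_size = block_size * BLOCKS_PER_PLANE
die_size = plane_size * PLANES_PER_DIE
chip_size = die_size * DIES_PER_CHIP
drive_size = chip_size * CHIPS_PER_DRIVE  # raw capacity

print(block_size // KB, "KB per block")         # 512 KB per block
print(plane_size // KB**2, "MB per plane")      # 512 MB per plane
print(drive_size // KB**3, "GB raw per drive")  # 32 GB raw per drive
```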
Data is read from and written (programmed) to an SSD in pages, but a page that already holds data cannot simply be overwritten: NAND flash must be erased before it can be rewritten, and erasing can only be done a whole block at a time. This is a result of the chip layout and can cause issues with write operations to SSDs. In the simplest case, changing even a single bit in a block that already contains data requires that the block's data first be read and stored in memory. The block data in memory is then updated with the new data and written to an open (erased) block on the drive.
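The simplified update path described above (read the whole block, patch it in memory, program it into an erased block) can be sketched as a toy Python model. Everything here is hypothetical: `ToyFlash` and its methods are illustrative names, pages are modeled as Python byte strings, and the stale block is erased immediately, whereas real drives typically defer that work to background garbage collection.

```python
PAGE_SIZE = 4096      # 4KB pages, as in the text
PAGES_PER_BLOCK = 128  # 128 pages per block, as in the text


class ToyFlash:
    """Toy NAND model: a block holding data is never modified in place."""

    def __init__(self, num_blocks: int):
        # Each block is a list of pages; None marks an erased page.
        self.blocks = [[None] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.free = list(range(num_blocks))  # erased blocks ready for programming
        self.mapping = {}                    # logical block -> physical block
        self.bytes_programmed = 0

    def write_new(self, logical: int, pages: list) -> None:
        """Initial write of a full logical block into an erased block."""
        phys = self.free.pop(0)
        self.blocks[phys] = list(pages)
        self.mapping[logical] = phys
        self.bytes_programmed += PAGE_SIZE * len(pages)

    def update_page(self, logical: int, page_index: int, data: bytes) -> None:
        old = self.mapping[logical]
        buffer = list(self.blocks[old])   # 1. read the whole block into memory
        buffer[page_index] = data         # 2. patch the one page in memory
        new = self.free.pop(0)            # 3. program the whole block into an erased block
        self.blocks[new] = buffer
        self.bytes_programmed += PAGE_SIZE * PAGES_PER_BLOCK
        self.mapping[logical] = new
        self.blocks[old] = [None] * PAGES_PER_BLOCK  # 4. erase the stale block as a unit
        self.free.append(old)

    def read_page(self, logical: int, page_index: int) -> bytes:
        return self.blocks[self.mapping[logical]][page_index]


flash = ToyFlash(num_blocks=4)
flash.write_new(0, [bytes([i]) * PAGE_SIZE for i in range(PAGES_PER_BLOCK)])
before = flash.bytes_programmed
flash.update_page(0, 7, b"\xff" * PAGE_SIZE)  # change a single 4KB page
print(flash.bytes_programmed - before)        # 524288: the whole 512KB block
```

Changing 4KB of data caused 512KB to be programmed, and the logical block now lives in a different physical block.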
As you can imagine, write operations can be costly on SSDs, especially if the data being updated is spread across many blocks. SSDs have excellent performance to begin with, but the number of these block updates can become quite large. Drive manufacturers use a variety of techniques, such as over-provisioning and background garbage collection, to make updates cheaper, but the interaction between the hardware design and the file system still causes problems.
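One common way to quantify this cost is write amplification: bytes programmed to flash divided by bytes the host asked to write. The rough Python sketch below stays within the same naive read-modify-write model and compares page updates scattered across many blocks with updates clustered in one block. The `coalesce` flag is a hypothetical idealization (all pending updates to a block merged into one rewrite), not a description of how any particular drive behaves.

```python
PAGE_SIZE = 4096
PAGES_PER_BLOCK = 128
BLOCK_SIZE = PAGE_SIZE * PAGES_PER_BLOCK


def bytes_programmed(updates: list, coalesce: bool) -> int:
    """Flash bytes programmed to apply 4KB page updates under the naive model.

    updates:  list of (block_number, page_number) pairs, one 4KB host write each.
    coalesce: if True, assume every pending update to the same block is merged
              into a single read-modify-write of that block (idealized).
    """
    if coalesce:
        rewrites = len({block for block, _page in updates})
    else:
        rewrites = len(updates)  # every update rewrites its whole block
    return rewrites * BLOCK_SIZE


def write_amplification(updates: list, coalesce: bool) -> float:
    host_bytes = len(updates) * PAGE_SIZE
    return bytes_programmed(updates, coalesce) / host_bytes


scattered = [(b, 0) for b in range(128)]  # one page in each of 128 blocks
clustered = [(0, p) for p in range(128)]  # every page of a single block

print(write_amplification(scattered, coalesce=True))   # 128.0
print(write_amplification(clustered, coalesce=True))   # 1.0
print(write_amplification(clustered, coalesce=False))  # 128.0
```

The same 512KB of host writes costs 128 times more flash programming when it is scattered, even with ideal merging; where data lands on the device is exactly the kind of decision a file system influences.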