Storage Focus: File System Fragmentation
Metadata fragmentation has only recently become a major issue. As file systems have grown, so have the space needed for file system metadata and the performance problems that come with it. One of the first major architectural attempts to improve metadata performance was made by someone at Cray Research in the late 1980s: separating data devices from metadata devices (see this article for more information).
Since then, most file systems used in large environments separate data and metadata, or at least offer the option to do so. Most of the file systems that log operations can also place the log on its own device.
In most file systems I have looked at, metadata allocation is sequential, just like data allocation: if you create one file and then another, the two files get sequential inodes. If you then remove a file, its inode becomes available for first-fit allocation. Remove and add files across many directories, repeat the process over and over, and you have effectively produced random metadata allocation.
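To make this allocation pattern concrete, here is a minimal Python sketch of a hypothetical first-fit inode allocator. The class and function names are illustrative and not taken from any real file system. Files are created in name order, so their inodes start out sequential. A random subset is then repeatedly deleted and re-created, and the sketch counts how often an alphabetical pass over the names has to jump to a non-adjacent inode.

```python
import random


class FirstFitInodeAllocator:
    """Hypothetical allocator that always hands out the lowest free inode."""

    def __init__(self, size):
        self.free = [True] * size

    def alloc(self):
        for ino, is_free in enumerate(self.free):
            if is_free:
                self.free[ino] = False
                return ino
        raise RuntimeError("out of inodes")

    def release(self, ino):
        self.free[ino] = True


def out_of_order_steps(inodes):
    """Count steps where the next inode is not the one right after the current one."""
    return sum(1 for a, b in zip(inodes, inodes[1:]) if b != a + 1)


def simulate(num_files=1000, rounds=20, churn=100, seed=0):
    """Create files in name order, then repeatedly delete and re-create a random subset."""
    rng = random.Random(seed)
    alloc = FirstFitInodeAllocator(num_files)
    names = [f"file{i:05d}" for i in range(num_files)]  # already in alphabetical order
    inode_of = {name: alloc.alloc() for name in names}   # sequential: 0, 1, 2, ...
    before = out_of_order_steps([inode_of[n] for n in names])

    for _ in range(rounds):
        victims = rng.sample(names, churn)               # random subset, in random order
        for name in victims:
            alloc.release(inode_of[name])                # delete: inode becomes free
        for name in victims:
            inode_of[name] = alloc.alloc()               # re-create: lowest free inode wins

    # An ls -l style pass visits names alphabetically and follows their inodes.
    after = out_of_order_steps([inode_of[n] for n in names])
    return before, after


if __name__ == "__main__":
    before, after = simulate()
    print(f"non-sequential steps in an alphabetical pass: {before} before churn, {after} after")
```

Always taking the lowest free inode is a simplification of the first-fit behavior described above. Real file systems may add allocation groups or per-directory placement hints, which this sketch ignores.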
A command such as ls -l would then have to read the metadata for each file in alphabetical order. Each file would likely cost a small read (often 512 bytes), an average seek plus average rotational latency, and then another read. That works out to about 142 of those reads per second, not much different from what we could do with the 8 KB I/O before. The reason is that seek and rotational latency are much larger than the transfer time for the data, so performance in this case would be quite bad.
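A figure like 142 reads per second follows from a simple service-time model. In this model, one random read costs an average seek, plus half a revolution of rotational latency, plus the transfer time. The drive parameters below (4 ms average seek, 10,000 RPM, 100 MB/s media rate) are assumptions chosen for illustration and are not figures from this article. With them, the model gives about 142.8 reads per second for 512-byte reads and about 141.2 for 8 KB reads, close to the figure quoted above.

```python
def random_reads_per_second(seek_ms, rpm, io_bytes, transfer_mb_per_s):
    """Single-disk random-read rate: average seek + average rotational latency
    (half a revolution) + time to transfer io_bytes at the media rate."""
    rotational_latency_ms = 0.5 * 60_000 / rpm            # half a revolution, in ms
    transfer_ms = io_bytes / (transfer_mb_per_s * 1e6) * 1000
    return 1000 / (seek_ms + rotational_latency_ms + transfer_ms)


if __name__ == "__main__":
    # Assumed drive: 4 ms average seek, 10,000 RPM, 100 MB/s (illustrative only).
    for size in (512, 8192):
        rate = random_reads_per_second(seek_ms=4.0, rpm=10_000, io_bytes=size, transfer_mb_per_s=100)
        print(f"{size:5d}-byte reads: {rate:.1f} per second")
```

Going from 512 bytes to 8 KB changes the rate by only about 1 percent. Under these assumptions, the transfer takes well under a tenth of a millisecond, while seek plus rotational latency costs 7 ms.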
This is all theoretical, of course, because numerous other factors can help, such as RAID cache, RAID readahead, inode readahead, inode allocation sizes and the inode cache. On the other hand, some factors hurt performance further, such as latency to the terminal and contention. I know of one site where the ls -l in this example took almost 400 seconds. The client then dumped the metadata and restored it, which rewrote the inodes for every file and every directory in order. Afterward, the same ls -l took only 25 seconds.
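The effect of that dump and restore can be sketched in the same way. In this hypothetical model, each path is a tuple of name components mapped to an inode number. restore_in_order walks the paths depth-first in name order, with each directory sorting before its own entries, and hands out consecutive inode numbers. ls_jumps counts the non-sequential inode steps an ls -l of one directory would take. Both names are illustrative, and the sketch models nothing but inode numbering.

```python
import random


def restore_in_order(inode_of):
    """Hypothetical restore pass: visit paths depth-first in name order
    (a directory sorts before its own entries) and assign consecutive inodes."""
    return {path: ino for ino, path in enumerate(sorted(inode_of))}


def ls_jumps(inode_of, directory):
    """Count non-sequential inode steps when listing `directory` in name order."""
    entries = sorted(p for p in inode_of if p[:-1] == directory)
    inodes = [inode_of[p] for p in entries]
    return sum(1 for a, b in zip(inodes, inodes[1:]) if b != a + 1)


if __name__ == "__main__":
    # A directory and 50 files whose inodes were scattered by earlier churn.
    rng = random.Random(1)
    paths = [("data",)] + [("data", f"f{i:03d}") for i in range(50)]
    scrambled = list(range(len(paths)))
    rng.shuffle(scrambled)
    fragmented = dict(zip(paths, scrambled))

    restored = restore_in_order(fragmented)
    print("jumps before restore:", ls_jumps(fragmented, ("data",)))
    print("jumps after restore: ", ls_jumps(restored, ("data",)))  # 0
```

After the restore, every file in the directory sits on the inode right after its alphabetical predecessor. A listing can then benefit from readahead instead of paying a seek per file.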
Some file systems, NTFS among them, have defragmenters that defragment the data space and sometimes the metadata space. Note that I say the metadata space, not the metadata itself. (NTFS allocation and fragmentation is a topic unto itself.) The performance degradation caused by fragmentation generally does not appear immediately. It develops slowly over time, which makes it even harder to diagnose. Eventually, the file system usually settles into a steady state of poor performance, because it is as fragmented as it can get.
Some file systems, often those supporting hierarchical storage management (HSM), provide methods for dumping and restoring the file system metadata. On restore, the metadata is generally laid out more optimally, but the process can be very time consuming. Metadata fragmentation has only recently come to the forefront, and it will take some time before file system vendors can address it effectively.
File system growth has been dramatic over the last few years. I remember a time not that long ago (1996) when 5 TB file systems were a big deal; now 70 TB file systems are in use, a fourteenfold increase. The key is to recognize fragmentation problems in both data and metadata and to work with the tools and the vendors to resolve them.