Storage Focus: File System Fragmentation
As file systems have continued to grow in size, fragmentation has become an increasingly important issue. It wasn’t all that long ago that we were living with measly 2 GB file systems and even smaller files. And while vendors were later able to provide file systems up to 2 TB, file sizes were often still limited to 2 GB.
Now as we approach the middle of the decade, large file system sizes are in many cases 50-100 TB, and some environments are looking to have 500 TB to multiple petabyte file systems within a few years. Of course, several things are going to have to change to allow this to happen. One of the developments I believe will have to occur is the replacement of the SCSI standard with Object Storage Devices (OSD) (see www.t10.org/ftp/t10/document.03/03-275r0.pdf for more information), but we’re still a number of years away from seeing end-to-end OSD products in the mainstream.
My conjecture regarding fragmentation and the ever-increasing size of file systems is:
File system fragmentation grows with file system size, and given that the sizes of file systems are increasing, end users are experiencing performance problems even if they do not always realize it.
Working off this premise brings up a number of issues:
- What is file system fragmentation?
- Will this happen to my file system?
- Can it be dealt with, and if so, how?
What is File System Fragmentation?
In the simplest terms, I would define file system fragmentation as any access to any data that is expected to be sequential on the device but is not. This is a broad definition, and covers a number of areas of data access within a file system.
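One way to make this definition concrete is to count the places where a file's data stops being sequential on the device. The sketch below assumes a file is described by a list of (start block, length) extents in file order; the function name and the extent layout are illustrative assumptions, not taken from any particular file system.

```python
def count_breaks(extents):
    """Count places where the next extent does not start where the previous ended.

    extents: list of (start_block, length) tuples in file order (hypothetical layout).
    """
    breaks = 0
    for (start, length), (next_start, _) in zip(extents, extents[1:]):
        if start + length != next_start:
            breaks += 1
    return breaks

# Three extents that follow one another on the device: no breaks.
print(count_breaks([(0, 10), (10, 10), (20, 5)]))   # 0
# The same amount of data scattered across the device: two breaks.
print(count_breaks([(0, 10), (20, 10), (15, 5)]))   # 2
```

Each break is a point where a read that the application sees as sequential turns into a reposition on the device.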
In the Fibre Channel/SCSI world we currently live in, disk or RAID LUNs are simple block devices. This means that they address the first block of the LUN as block 0 and the last block as a number less than 4,294,967,296, which is 2 TB in 512 byte blocks. Note that 2 TB is the current limit on the largest SCSI LUN that can be created, and 512 bytes is the size of the formatted disk sector.
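The 2 TB figure follows directly from the arithmetic: 4,294,967,296 is 2^32, the number of blocks a 32-bit block address can reach. A quick check, with the constant and function names chosen here purely for illustration:

```python
SECTOR_BYTES = 512      # formatted disk sector size cited above
MAX_BLOCKS = 2 ** 32    # block addresses 0 .. 4,294,967,295

def lun_capacity_bytes(blocks: int = MAX_BLOCKS, sector: int = SECTOR_BYTES) -> int:
    """Largest addressable LUN size in bytes for the given block count."""
    return blocks * sector

capacity = lun_capacity_bytes()
print(capacity)              # 2199023255552
print(capacity // 2 ** 40)   # 2  (in units of 2^40 bytes)
```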
Almost all file systems allocate data based on a first-fit algorithm within either the LUN or groups of LUNs used for a round-robin allocation (see this article for more information). First fit means exactly that: find the first hole in the file system and place the data in that location.
Remember, file systems communicate with the storage device, and within that storage all allocations are on 512-byte boundaries. File systems can address data smaller than 512 bytes, or data that is not aligned on a 512-byte boundary, only by reading in the enclosing data on 512-byte boundaries.
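A minimal sketch of first fit, assuming free space is tracked as a sorted list of (start, size) holes, shows how it can scatter a file. Real allocators differ in detail, and the free-list representation and names here are assumptions made for illustration only.

```python
def first_fit(free_extents, length):
    """Allocate `length` blocks from the first hole that is large enough.

    free_extents: list of (start, size) holes sorted by start; updated in place.
    Returns the starting block, or None if no single hole is big enough.
    """
    for i, (start, size) in enumerate(free_extents):
        if size >= length:
            if size == length:
                del free_extents[i]                      # hole fully consumed
            else:
                free_extents[i] = (start + length, size - length)
            return start
    return None

free = [(0, 100)]           # one 100-block hole on a hypothetical LUN
a1 = first_fit(free, 10)    # file A, first write
b1 = first_fit(free, 10)    # file B, written at about the same time
a2 = first_fit(free, 10)    # file A appends more data
print(a1, b1, a2)           # 0 10 20
print(free)                 # [(30, 70)]
```

File A ends up in blocks 0-9 and 20-29 with file B in between, so reading A sequentially is no longer sequential on the device.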
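To see what the 512-byte boundary means for a small or unaligned request, the following sketch computes the sector-aligned span the device has to transfer. The function name is hypothetical, and the sketch assumes a request length greater than zero.

```python
SECTOR = 512  # bytes per formatted sector, as cited above

def aligned_range(offset: int, length: int, sector: int = SECTOR):
    """Return (start, size) of the sector-aligned span covering the request."""
    start = (offset // sector) * sector                 # round down
    end = -(-(offset + length) // sector) * sector      # round up
    return start, end - start

# 100 bytes inside a single sector still costs a full 512-byte read.
print(aligned_range(700, 100))   # (512, 512)
# 100 bytes straddling a sector boundary costs two sectors.
print(aligned_range(500, 100))   # (0, 1024)
```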
File system metadata is another issue. There are often three parts of the metadata equation that need to be addressed:
- The file system inodes
- The allocation blocks that are used when the space for allocation within the inode is exceeded
- The file system internals, often called the superblock, that can be fragmented in some file systems if the file system grows in size
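The second item can be illustrated with a little arithmetic. Assuming an inode that holds a fixed number of extent entries and spills the rest into separate allocation blocks, the number of extra blocks grows with the number of extents, which is to say with fragmentation. The slot and entry counts below are made-up values for illustration, not figures from any real file system.

```python
import math

def overflow_blocks(extents: int, inode_slots: int = 12, entries_per_block: int = 128) -> int:
    """Allocation blocks needed once the inode's own extent slots are full.

    inode_slots and entries_per_block are hypothetical parameters.
    """
    spill = max(0, extents - inode_slots)
    return math.ceil(spill / entries_per_block)

print(overflow_blocks(8))      # 0
print(overflow_blocks(13))     # 1
print(overflow_blocks(1000))   # 8
```

Every one of those extra blocks is itself allocated somewhere on the device, so a heavily fragmented file can also mean scattered metadata reads before any data is touched.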
Metadata is an important and often overlooked area for fragmentation, and this is especially true for file systems that support HSM (Hierarchical Storage Management), as they often have tens of millions of files and sometimes 100 million or more.