Download the authoritative guide: Enterprise Data Storage 2018: Optimizing Your Storage Infrastructure
Will This Happen to My File System?
Will fragmentation occur on your file system? While the answer to this question is a definite yes, it isn’t the critical question you should be asking. The real question that needs to be asked is not will this happen, but rather will you see the effect. If your applications only require a limited amount of the I/O bandwidth available, then when your file system does get fragmented, you’ll often still have the available bandwidth to meet your I/O requirements. This is what is generally referred to as architecting or engineering to peak.
In this case, if your peak performance is based on a completely fragmented file system, you will not have a problem. What often happens is people buy far more I/O bandwidth than is really needed for the applications they run. One of the underlying reasons for this in my opinion is fragmentation.
If you allocate data sequentially on a modern 15K RPM disk drive and you read it back, you can easily achieve over 50 MB/sec on average reading the entire device, but what if the data was not read sequentially? For example, consider an application that reads 8 KB blocks completely randomly, and the device has an average seek time plus average rotational latency of .007 seconds:
|1/||(8192/||(1024 * 1024 * 50)||+||0.007))|
|Block size||50 MB/sec transfer||Average seek + latency|
In this case, you could only read about 140 8 KB blocks per second, or 1.09 MB/sec (1/45th) of what you would be reading on average for a sequential I/O. Of course, this is the reason we have multiple 200 MB/sec channels and large numbers of disk drives in our systems. Almost all applications cannot use nor read/write data using the multiple 200 MB/sec channels, but they are needed to support the small random I/Os that are generated in many applications.
As data sections of file systems grow in size, searching for information becomes more of a time-sensitive issue with files being created and removed more often. For small files this is typically not a problem, as small files often fit in a single file system allocation block. Large files, on the other hand, can be a big problem.
The definition of a large file depends on the underlying allocation size within the file system. If you have a file system where the smallest allocation is 25 MB and your average files are 100 MB, then I consider this a small file rather than a large one, as you only need to go back to the allocation routines at most 4 times.
On the other hand, if your smallest allocations are 4 KB, then I would consider the same 100 MB file a large file given the number of times you’re likely to have to go back to the allocation routines. Each time you go back to the allocation routines to search for free space, you are competing with other processes for space allocation.
Remember, space allocation is most often first fit, so the space you get depends on what files have already been allocated and what other processes are asking for space. This all goes back to my original conjecture that most applications either do not have a need for full bandwidth from the file system and/or most architectures are developed with enough disk drives and channel IOPS (Input/Output operations per second) to support the needs of most applications.