Choosing a File System or Volume Manager
Although file systems have evolved only gradually over the last 35 years, you still have a number of choices to examine before you can pick the best file system for your application environment. Server systems and software have changed far more radically than file systems have. Here are ten areas to consider before picking and implementing a file system.
- File system size requirements
- The underlying RAID/disk topology
- The number of files
- The distribution of file sizes
- The bandwidth and/or IOPS requirements
- The application's requirements
- Shared file system with homogeneous access
- Shared file system with heterogeneous access
- Recovery requirements
- Plans for backup/HSM
By examining each of these points you will be able to narrow the number of file system choices for each of the server systems under consideration. In some cases the answer to a single question for a single system leaves you with only one choice.
File System Size Requirements
In today's world many file systems and volume managers have an internal 2 terabyte (TB) file system limit. This limit is being raised in a number of file systems and volume managers, but it still exists in many of them. In many cases a vendor first raises the limit to support more than 2TB, but then file system and volume manager performance suffers, because they often use the same allocation and space-representation techniques that were used in the sub-2TB version. The problem is that these techniques do not always scale.
The 2TB limit is also imposed by the SCSI command set: the commonly used read and write commands carry a 32-bit block address, so with 512-byte blocks a single LUN can address at most 2TB. This will likely change over the next few years.
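The arithmetic behind the 2TB figure can be sketched as follows. This is a minimal illustration that assumes a 32-bit logical block address and 512-byte blocks; the function name is hypothetical and not part of any SCSI tool or library.

```python
def max_lun_bytes(address_bits: int, block_size: int) -> int:
    """Largest capacity addressable with a block address of the given width.

    Assumption: every block has the same size and every address is usable.
    """
    return (2 ** address_bits) * block_size


TIB = 2 ** 40  # one binary terabyte, in bytes

# 32-bit block addresses with 512-byte blocks (assumed values).
limit = max_lun_bytes(address_bits=32, block_size=512)
print(limit // TIB, "TiB")  # prints "2 TiB"
```

Widening the address field or enlarging the block size raises the ceiling, which is why the limit is a property of the addressing scheme rather than of the disks themselves.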
The Underlying RAID/Disk Topology
Knowing the physical layout of the storage is very important in choosing a file system. Some file systems do well with large caches given the application load. Some file systems allow file system metadata (inodes and superblocks) and logs to be separated from the file data, and separating them significantly improves performance. Knowing what types of devices you have, and how to use them given the file system's features and your requirements, is part of the whole planning process. You could have the best file system available for your requirements, but you still need the underlying hardware to take advantage of its features.
The Number of Files
The total number of files and the number of files per directory in some cases become the overriding factor in the choice of a file system. I have some clients that want to have 100,000 files per directory (I do not get good answers as to why, but they want it anyway). Many file systems have extreme performance difficulties with far fewer files per directory (even as few as 10,000). Taking it further, I have another client that wants to have 100,000,000 files in a file system. File system features such as data/metadata separation become critical with these types of requirements. Personally, right now, I don't believe that any file system can really work efficiently with 100,000,000 files in the file system, but who knows - I don't think anyone has performed tests to such limits.
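Before settling on a file system it helps to measure what an existing environment actually looks like. The following is a minimal sketch, using only the Python standard library, that counts files per directory and flags crowded directories. The function names and the default threshold of 10,000 are assumptions chosen for illustration, not limits of any particular file system.

```python
import os
from collections import Counter


def files_per_directory(root: str) -> Counter:
    """Map each directory under root to the number of non-directory entries in it."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        counts[dirpath] = len(filenames)
    return counts


def summarize(counts: Counter, threshold: int = 10_000):
    """Return the total file count and the sorted directories at or above threshold."""
    total = sum(counts.values())
    crowded = sorted(d for d, n in counts.items() if n >= threshold)
    return total, crowded
```

Calling `summarize(files_per_directory(path))` on a representative directory tree gives both numbers discussed above: the total file count and the directories that hold unusually many files.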
The Distribution of File Sizes
File size distribution is important because of the underlying file system allocation algorithms. Many file systems cannot allocate large amounts of contiguous space. In some file systems a single internal allocation cannot be larger than 8KB, so if your environment has many large files this could be a problem. On the other hand, if the environment has mostly small files, such a file system will work just fine. As an example, in some databases the file sizes are 2GB. Having the ability to allocate in 2GB chunks will reduce the overhead within the file system.
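To put a number on that overhead, the sketch below counts how many separate allocations a 2GB file needs under two allocation limits. It assumes binary units (1KB = 1024 bytes) and a file system that always allocates the largest unit it can; the function name is hypothetical.

```python
def allocations_needed(file_size: int, max_allocation: int) -> int:
    """Number of allocations required to hold file_size bytes (ceiling division)."""
    return -(-file_size // max_allocation)


KIB = 1024
GIB = 1024 ** 3

small_unit = allocations_needed(2 * GIB, 8 * KIB)  # 8KB maximum allocation
large_unit = allocations_needed(2 * GIB, 2 * GIB)  # 2GB maximum allocation
print(small_unit, large_unit)  # prints "262144 1"
```

Each allocation is a piece of bookkeeping the file system has to create, track, and later look up, so the gap between 262,144 allocations and one is a rough measure of the overhead saved.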
The Bandwidth and/or IOPS Requirements
Some applications, such as video, require high-bandwidth I/O, while others, such as OLTP, require high IOPS. Each file system has advantages and disadvantages for each of these requirements. Some file systems support automatic direct I/O (I/O that moves from user space to the device without going through any system cache). This allows high-performance I/O because the data is not copied twice (once from user space to the system cache and again from the cache to the device), which dramatically reduces the CPU time the I/O requires. Other file systems support tunable cache sizes for databases, for reading and writing, and tunable readahead values.
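Bandwidth and IOPS are linked by the request size: bandwidth is roughly IOPS multiplied by the bytes moved per request. The sketch below shows why the two workload types stress a file system differently. The workload figures (8KB requests at 10,000 IOPS for OLTP, 1MB requests at 200MB per second for video) are hypothetical values chosen for illustration, not measurements.

```python
KIB = 1024
MIB = 1024 ** 2


def bandwidth_bytes_per_sec(iops: float, request_size: int) -> float:
    """Sustained bandwidth implied by a request rate and a fixed request size."""
    return iops * request_size


def iops_for_bandwidth(bandwidth: float, request_size: int) -> float:
    """Request rate needed to sustain a bandwidth at a fixed request size."""
    return bandwidth / request_size


# Hypothetical OLTP load: many small requests add up to modest bandwidth.
oltp_mib_per_sec = bandwidth_bytes_per_sec(10_000, 8 * KIB) / MIB
print(oltp_mib_per_sec)  # prints 78.125

# Hypothetical video load: high bandwidth needs few requests when each is large.
video_iops = iops_for_bandwidth(200 * MIB, 1 * MIB)
print(video_iops)  # prints 200.0
```

Under these assumptions the OLTP load issues fifty times as many requests as the video load while moving less than half the data, which is why per-request overhead dominates one case and raw transfer rate dominates the other.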