The Requirements of the Application
Internal file system features, such as the ability to preallocate space, are important for both databases and real-time streaming applications such as video. Preallocation allows an application to ask the file system in advance for contiguous space.
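As a rough illustration of preallocation from the application side, the sketch below uses Python's os.posix_fallocate on a Unix-like system. The helper name preallocate and the file name in the usage note are made up for this example. Note that the call only guarantees that the space is reserved; whether the blocks end up contiguous depends on the file system's allocator.

```python
import os
import tempfile

def preallocate(path: str, size_bytes: int) -> int:
    """Reserve size_bytes for path up front and return the resulting file size.

    Hypothetical helper for illustration. The reservation is guaranteed,
    but contiguity is decided by the file system, not by this call.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Ask the file system to allocate the byte range [0, size_bytes).
        os.posix_fallocate(fd, 0, size_bytes)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "stream.dat")
        print(preallocate(target, 64 * 1024 * 1024))
```

A streaming application might call something like this once, before recording starts, so that later writes do not stall on block allocation.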
Another important requirement for database and streaming applications is multithreaded writes. Some file systems allow multiple writes to be outstanding against the same open file descriptor. This is accomplished using either a threaded application or POSIX asynchronous I/O. Oracle is a good example of an application that wants to have multiple outstanding writes. File systems often prevent one or more applications from opening the same file and issuing multiple outstanding writes. If an application such as Oracle knows what it is doing, this is not a problem, but some file systems block this capability, mainly because of the dangerous nature of the activity. With some file systems and/or volume managers, you need to buy a special version that allows this functionality.
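To make the threaded variant concrete, here is a minimal sketch in which several threads issue positioned writes against one file descriptor. It assumes a Unix-like system where os.pwrite is available; the function name and worker count are illustrative only. Each write targets its own non-overlapping byte range, which is the application's side of "knowing what it is doing."

```python
import os
from concurrent.futures import ThreadPoolExecutor

def write_blocks_concurrently(path: str, blocks: list, workers: int = 4) -> None:
    """Write each block at its own offset, with several writes in flight at once.

    Hypothetical helper: the offsets are computed so that no two writes overlap.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        offsets = []
        position = 0
        for block in blocks:
            offsets.append(position)
            position += len(block)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pwrite takes an explicit offset, so threads do not race on a
            # shared file position.
            futures = [pool.submit(os.pwrite, fd, block, offset)
                       for block, offset in zip(blocks, offsets)]
            for block, future in zip(blocks, futures):
                written = future.result()
                if written != len(block):
                    raise OSError("short write; a real application would retry")
    finally:
        os.close(fd)
```

Whether the writes actually proceed in parallel inside the file system, or are serialized by a per-file lock, is exactly the behavior that differs between file systems.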
Shared File System With Homogeneous Access
A number of file systems, generally from server vendors, support shared data access. In these file systems, one system is designated as the master of the file system metadata, and the other systems are clients of the master. In general, file system metadata moves over a TCP/IP network and file system data moves over Fibre Channel. For these types of file systems, small block writes usually take significantly longer on the clients than on the server. A number of new features and tunables are available in this area, but they are best used by an expert.
Shared File System With Heterogeneous Access
Shared heterogeneous access is just plain hard work in this day and age. Fewer companies support this type of product because of its complexities, but it is the holy grail that everyone wants and claims they need yesterday. A number of issues exist, such as data endianness. What if one application writes the data on a big-endian machine and another application on a little-endian machine tries to read it? With NFS the file's bits are flipped, but in this case the application must know which byte order the data was written in and provide for the conversion itself. Some vendors write applications this way, while others, especially home-grown applications, are generally not written to accommodate this.
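One common way for an application to handle this itself is to record the byte order alongside the data and convert on read. The sketch below uses Python's struct module; the one-byte order tag and the function names are a made-up convention for illustration, not a format used by any particular product.

```python
import struct

def encode(values, order):
    """Pack unsigned 32-bit integers, prefixed by a one-byte order tag.

    order is ">" (big-endian) or "<" (little-endian). The tag byte is a
    hypothetical convention chosen for this example.
    """
    if order not in (">", "<"):
        raise ValueError("order must be '>' or '<'")
    return order.encode("ascii") + struct.pack(f"{order}{len(values)}I", *values)

def decode(blob):
    """Read the order tag, then unpack the payload in the writer's byte order."""
    order = chr(blob[0])
    count = (len(blob) - 1) // 4
    return list(struct.unpack(f"{order}{count}I", blob[1:]))

if __name__ == "__main__":
    written_on_big_endian = encode([1, 258], ">")
    print(decode(written_on_big_endian))
    # Without the tag, a reader that assumes the wrong order gets garbage:
    print(struct.unpack("<I", struct.pack(">I", 1))[0])
```

The second print shows the failure mode: the value 1 written big-endian and read little-endian comes back as 16777216.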
Understanding the requirements for recovery time is very important when considering a file system. Fast recovery is often accomplished by using a file system that supports logging. That way, after a crash, the only thing that needs to be checked is the log. Logging is one method of recovering quickly after a crash and has become, in some cases, a requirement.
Equally important is recovery after a major disaster. I have worked at sites that suffered a power failure because of a storm and then had the UPS hit by lightning, which resulted in critical file system metadata devices getting 'fried.' With some work they were back up and running within six hours, but only because the file system they used supported metadata backup.
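The idea behind logging can be shown with a toy example. The sketch below is not how any real file system implements its log; it is a simplified application-level analogy in which every change is appended to a log before it is applied, so that recovery only has to read the log. All names here are hypothetical.

```python
import json
import os

def journaled_update(store, log_path, key, value):
    """Append the intended change to the log, make it durable, then apply it."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps({"key": key, "value": value}) + "\n")
        log.flush()
        os.fsync(log.fileno())  # the record reaches stable storage first
    store[key] = value

def recover(log_path):
    """Rebuild state by replaying the log, stopping at a torn final record."""
    store = {}
    if not os.path.exists(log_path):
        return store
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                break  # incomplete record left by a crash in mid-write
            store[record["key"]] = record["value"]
    return store
```

The point of the analogy is the recovery cost: it scales with the size of the log rather than with the size of everything the log describes, which is why a logging file system can avoid a full consistency check after a crash.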
Plans for Backup/HSM
Last but not least, you need to consider what you will do for backup and/or hierarchical storage management (HSM). Some file systems have internal features that support backing up just the metadata, and others have their own internal backup for data and metadata. If you want to have a 30 TB file system, think of the amount of time it will take to back up. Even if you had 10 LTO drives each running at a sustained rate of 18 MB/sec, not considering load and position time, it would take about 46 hours to perform a complete backup. This is a good example of why I believe that HSM will replace backup for large file systems, as very few sites have a 46-hour window to perform a backup. HSM keeps a backup copy of the data on secondary media (tape, disk, or in some cases multiple copies on tape, on disk and off-site). Knowing what your file system supports for both backup and HSM, based on your operation, is of critical importance.
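The 46-hour figure can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 1,000,000 MB), which is the assumption under which the numbers in the text work out, and it ignores load and position time as the text does. The function name is illustrative.

```python
def backup_hours(capacity_tb, drives, mb_per_sec_per_drive):
    """Estimate hours for a full backup at a sustained aggregate rate.

    Assumes decimal units (1 TB = 1,000,000 MB) and perfect parallelism
    across drives, with no tape load or positioning time.
    """
    total_mb = capacity_tb * 1_000_000
    aggregate_mb_per_sec = drives * mb_per_sec_per_drive
    return total_mb / aggregate_mb_per_sec / 3600

if __name__ == "__main__":
    # 30 TB across 10 drives at 18 MB/sec each: 180 MB/sec aggregate.
    print(round(backup_hours(30, 10, 18), 1))  # prints 46.3
```

Plugging in your own capacity and drive counts gives a quick lower bound on the backup window, since real runs add tape handling and rarely sustain the rated speed.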
As you can see, choosing a file system is not easy, and I think it will become harder over the next few years. From working with clients, I am seeing a growing trend of sites moving from a server-centric data center to a storage- or data-centric data center, with the file system controlling the data. This is happening because storage performance has not kept up with server performance, storage density is growing faster than storage performance, and the use of Fibre Channel allows storage to be farther away than the old SCSI systems allowed.
The file system is at the center of our storage universe. To make informed decisions, you need to know what is available, and what to use.