Performance Analysis for File Systems: NFS vs. Parallel
I was recently at a customer site working on a problem they had. This group was a customer of a large NAS vendor (no vendor names used in my articles) and had many PB of NFS-attached storage. They were considering the purchase of a new parallel file system, and part of my job was to help them characterize the workload. The two leading parallel file systems today are GPFS from IBM and Lustre, which is open source and supported by a number of vendors.
Characterizing the workload sounds pretty easy, and of course, it is not.
This group did performance analysis by looking at the statistics from the NFS server. If you are buying an NFS server, that is likely a good approach, but if you are moving up the food chain and looking at purchasing a parallel file system to meet scalability requirements that cannot be achieved with NFS, then think again.
A different approach was needed, and for some good reasons.
The key difference between NFS and a parallel file system is that what happens in the data path is totally different. So the performance analysis techniques that you might have used with NFS are not the right techniques for a parallel file system.
The NFS path eventually goes over TCP to the NFS server, and yes, you could use UDP, but given its lack of reliability, I am not sure anyone does that any more. The size of each request is governed by two client mount parameters, rsize (read size) and wsize (write size). For NFSv2 or NFSv3, the default value for both parameters is 8192 bytes. For NFSv4, the default value for both parameters is 32768 bytes. You can set these values even larger, but they will be negotiated down to the maximum the server supports. So setting them at, say, 1,048,576 might give you the warm, comforting feeling that you can do 1 MiB I/O requests to your NFS server, but you might be making 16384-byte requests because that is all the server supports.
In some ways, this is no different from a parallel file system, as the client might only be allowed to do I/O requests in the allocation size of the parallel file system. The big difference is that allocation sizes for parallel file systems are generally bigger than the request sizes supported and negotiated by the NFS server.
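To check what was actually negotiated rather than what was requested, a Linux NFS client lists the rsize and wsize in effect among the mount options in /proc/mounts. The following is a minimal sketch, assuming a Linux client and the usual /proc/mounts layout (device, mount point, file system type, options); the function names parse_nfs_sizes and negotiated_sizes are my own, purely for illustration.

```python
def parse_nfs_sizes(mounts_text):
    """Return {mount_point: (rsize, wsize)} for NFS entries.

    Expects text in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    results = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Keep only NFS client mounts (fstype "nfs" or "nfs4").
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = {}
        for opt in fields[3].split(","):
            key, sep, value = opt.partition("=")
            if sep:
                opts[key] = value
        # A missing option is reported as None rather than guessed.
        rsize = int(opts["rsize"]) if "rsize" in opts else None
        wsize = int(opts["wsize"]) if "wsize" in opts else None
        results[fields[1]] = (rsize, wsize)
    return results


def negotiated_sizes(path="/proc/mounts"):
    """Read the mount table (Linux only) and return the NFS request sizes."""
    with open(path) as mounts:
        return parse_nfs_sizes(mounts.read())
```

Calling negotiated_sizes() on the client and comparing the result with the values given at mount time shows whether the server reduced them.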
There are a few key issues:
- Request sizes
- NFS vs. a parallel file system for metadata
- What needs to be done on the NFS client to understand what will happen on a parallel file system
As I have described, request sizes from the client to the NFS file system can be very different than what might be seen on a parallel file system. Big requests are important for disk drives to operate efficiently.
Here is an example of what happens if each I/O of the size on the left is followed by a random seek and rotational latency, and then by another I/O of the same size. The columns on the right show the disk drive efficiency. I could only get this for Seagate drives, as other vendors do not publish detailed information, and I have only shown two drive types as that is all that will fit.
Clearly the I/O sizes for NFS, even the default NFSv4 sizes, are horrible: the drives are less than 4 percent utilized, even with 2.5-inch 15K RPM drives. Of course, the NFS server coalesces I/O into larger requests, but that takes work on the server side. Getting I/O from the same file together on the disk requires lots of cache, and therefore expense, if you have many clients.
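The shape of that efficiency curve can be approximated with a simple service-time model: each request pays an average seek plus half a rotation before any data moves, so efficiency is transfer time divided by total time. This is only a sketch. The drive parameters below (3.0 ms average seek, 150 MB/s sustained transfer) are illustrative assumptions of mine, not the Seagate figures, so the results will not match the vendor numbers exactly.

```python
def drive_efficiency(request_bytes, seek_ms, rpm, transfer_mb_per_s):
    """Fraction of time spent moving data when every request pays a random seek."""
    # Average rotational latency is half a revolution.
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    # Time to transfer the request at the sustained media rate.
    transfer_ms = request_bytes / (transfer_mb_per_s * 1_000_000.0) * 1000.0
    return transfer_ms / (seek_ms + rotational_latency_ms + transfer_ms)


# Assumed, illustrative parameters for a 15K RPM drive (not vendor data).
SEEK_MS = 3.0
RPM = 15_000
TRANSFER_MB_PER_S = 150.0

# Request sizes discussed in the text: NFS defaults and parallel file system sizes.
for size in (8192, 32768, 1_048_576, 16 * 1_048_576):
    eff = drive_efficiency(size, SEEK_MS, RPM, TRANSFER_MB_PER_S)
    print(f"{size:>10} bytes: {eff:6.1%} efficient")
```

Under these assumptions, small requests spend nearly all their time seeking and waiting on rotation, while requests of 1 MiB and larger spend most of their time transferring data.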
For a parallel file system, I/O requests made to the file system servers are generally the size of the I/O request from the application. For the two largest file systems in terms of market share, that can be over 1 MiB, and in one case up to 16 MiB.
The bottom line is that a parallel file system allows larger requests than NFS, provided the application already makes large requests or can be changed to do so.