The Network File System (NFS) protocol is getting its biggest overhaul in more than a decade, and the results could be profound for end users (see "The Future of NFS Arrives").
Version 4.1 of NFS, developed by a team of veterans from various storage interests, promises to unlock new performance and security capabilities, particularly for enterprise data centers.
NFS was originally designed to solve the problem of remote access to home directories and to support diskless workstations and servers over local area networks. With the advent of cheaper high-performance computing in the form of Linux compute clusters, multi-core processors and blades, the demand for higher-performance file access has risen sharply. It's no wonder that a protocol designed for 1984 speeds would be unable to cope.
“NFS is getting pressure from clustered file systems like Lustre and GPFS, as well as custom file systems produced by Web 2.0 service providers such as Google GFS,” said Mike Eisler, senior technical director at NetApp (NASDAQ: NTAP).
The latest makeover of this time-honored distributed file system protocol provides all the same features as before: straightforward design, simplified error recovery, and independence from any particular transport protocol or operating system for file access. Unlike earlier versions of NFS, however, it now integrates file locking, has stronger security, and includes delegation capabilities to enhance client performance for data sharing applications on high-bandwidth networks.
pNFS Changes the Storage World
pNFS is a key feature of NFS 4.1. The p in pNFS stands for parallel, and pNFS provides parallel I/O to file systems accessible over NFS. It enables the storage administrator to do things like stripe a single file across multiple NFS servers. This is analogous to RAID 0, which boosts performance by allowing multiple disk drives to serve up data in parallel; pNFS takes that concept and extends it to multiple storage devices connected to the NFS client over a network.
“Even for files too small to stripe, those files can be distributed across multiple NFS servers, which provides statistical load balancing,” said Eisler. “With a capable cluster of NFS servers and a back-end file system, files or ranges within files can be relocated transparent to the applications accessing data over pNFS.”
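The striping idea behind pNFS can be made concrete with a small sketch. Everything here is illustrative rather than taken from the protocol: the server names, the 64 KiB stripe unit, and the `locate` helper are hypothetical, and the point is only how a byte offset in a striped file maps to one of several data servers, RAID 0-style.

```python
# Hypothetical sketch of RAID-0-style striping across NFS data servers.
# Server names and the 64 KiB stripe unit are illustrative, not from the spec.

STRIPE_UNIT = 64 * 1024                     # bytes per stripe unit
SERVERS = ["nfs1", "nfs2", "nfs3", "nfs4"]  # data servers holding the stripes

def locate(offset: int) -> tuple[str, int]:
    """Map a file byte offset to (server, offset within that server's slice)."""
    unit = offset // STRIPE_UNIT           # which stripe unit the byte falls in
    server = SERVERS[unit % len(SERVERS)]  # stripe units rotate round-robin
    # Offset inside that server's concatenation of its stripe units:
    local = (unit // len(SERVERS)) * STRIPE_UNIT + offset % STRIPE_UNIT
    return server, local
```

Small files that occupy less than one stripe unit land whole on a single server, which is where the statistical load balancing Eisler describes comes from: many small files spread across many servers even though no individual file is striped.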
pNFS, then, represents the first major performance upgrade to NFS in more than a decade. It achieves this through the standardization of parallel I/O and by allowing clients to access storage devices directly and in parallel, addressing the scalability and performance bottlenecks of the single-server NFS deployments common today.
“pNFS moves the metadata server out of the data transfer path,” said Matt Reid, director of product marketing at Panasas. “Instead of having to deal with a number of competing proprietary and open-source parallel file systems, pNFS will allow users to reap the benefits of parallel storage systems by choosing best-of-breed solutions without having concerns over vendor lock-in.”
Security, too, is beefed up in version 4. NFSv4 features an improved security model, with tightly integrated security mechanisms comparable to those of CIFS. In addition, the unified namespace feature will enable the aggregation of large numbers of heterogeneous NFS servers under a single namespace. While access control in NFSv3 was limited to users and groups, the newest version adds fine-grained access control lists on individual files and directories. Finally, file and directory delegations, which let clients cache data and operate on it locally until the server recalls the delegation, will allow a greater number of NFS clients to access a single NFS share.
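To illustrate the shift to per-file access control, here is a deliberately simplified sketch of ordered access-control-entry (ACE) evaluation in the NFSv4 style. Real NFSv4 ACLs accumulate permission bits across multiple entries and support inheritance flags; this toy `check` function, and the names in it, are assumptions for illustration only.

```python
# Simplified sketch of NFSv4-style ACL checking: access-control entries
# are evaluated in order, and the first entry that names the principal
# and covers the requested permission decides the outcome.
from dataclasses import dataclass

@dataclass
class Ace:
    allow: bool      # True for an ALLOW entry, False for a DENY entry
    principal: str   # user or group the entry applies to
    perms: set       # permission letters this entry covers, e.g. {"r", "w"}

def check(acl: list, principal: str, perm: str) -> bool:
    """Return True if the first matching ACE allows the permission."""
    for ace in acl:
        if ace.principal == principal and perm in ace.perms:
            return ace.allow
    return False  # no matching entry: deny by default
```

Because entries are order-sensitive, a DENY placed ahead of an ALLOW for the same user wins, which is one reason per-file ACLs are more expressive than NFSv3's user/group permission bits.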
As a potential downside, pNFS requires users to deal with multiple NFS servers. Depending on how well the NFS vendor integrates these servers, the management overhead may scale with the number of storage devices that are configured for parallel access. Thus it is best to purchase tightly integrated servers that provide a true single system image as opposed to a cluster of servers that are merely duct-taped together.
“For environments that need improved performance for large, sequential files where parallel access is needed, pNFS has some legs moving forward,” said Greg Schulz, senior analyst and founder of StorageIO Group. “However, not every environment or application needs parallel access to data.”
Schulz believes that backup could possibly benefit from pNFS. But that would require reading data fast enough to keep up with parallel writes to a parallel or clustered file system.
“For mainstream markets, pNFS will most likely remain a niche, at least near term, being used for specialized applications and environments,” said Schulz. “The big benefactors of pNFS will be the parallel file systems like those from IBM, Sun, IBRIX and others as well as proprietary storage systems like those from Panasas.”
Vendors Get to Work
NetApp is shipping NFS 4.0 now, and plans to provide an NFS 4.1 server later this year or early next year. In all likelihood, this will begin with pNFS being included on a future release of NetApp’s Data ONTAP operating system. In parallel with that, the NetApp Linux NFS development team is producing a pNFS client and server for Linux so that customers can try out pNFS before Data ONTAP provides it.
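On the client side, trying out a 4.1-capable server is largely a matter of requesting the right protocol version at mount time. The server name and paths below are placeholders; `vers=4.1` and `proto=tcp` are standard Linux NFS mount options.

```shell
# Hypothetical /etc/fstab entry requesting NFS 4.1 over TCP
# (filer.example.com and the paths are placeholders):
filer.example.com:/export/data  /mnt/data  nfs  vers=4.1,proto=tcp  0  0
```

If the server only speaks an older dialect, the mount can be pinned to `vers=4.0` or `vers=3` the same way.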
Like NetApp, Panasas has been heavily involved with pNFS. It provided source code to the Linux NFS client and server, as well as to the Linux object storage driver, iSCSI driver, and SCSI stack.
“The architecture for the pNFS proposal was derived from the Panasas DirectFLOW parallel protocol, which is a core component utilized by the Panasas PanFS parallel file system,” said Reid. “The DirectFLOW protocol used in our ActiveStor Parallel Storage Clusters currently provides essentially all of the functionality expected to be available in NFS 4.1 later this year.”
EMC Celerra, meanwhile, already supports NFS 4.0. The company plans to include server support for pNFS in Celerra in the near future and is assisting with the implementation of the Linux pNFS client, according to Sorin Faibish, senior technologist for the NAS Storage Group at EMC (NYSE: EMC).
Oracle (NASDAQ: ORCL) is also getting in on the act. It has developed dNFS (Direct NFS), an NFS client built into Oracle 11g. This solves several problems for Oracle customers. It eliminates the need to fiddle with NFS mount settings on the native NFS client, as well as any worries about the NFS client vendor making changes to the native client that could negatively affect the operation of Oracle. It also opens the door to greater platform support. Customers using NFS without Direct NFS, for example, cannot host Oracle on Windows Server 2003 because that platform lacks a native NFS client. By providing Direct NFS inside the database product, Oracle gives customers more flexibility in selecting platforms for their deployments, and where needed or desired, they can more easily switch among UNIX, Linux, and Windows.
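In practice, Direct NFS is pointed at its storage through an `oranfstab` file rather than the operating system's mount table. The fragment below is a rough sketch of that file's shape as we understand it from Oracle's documentation; the server name, addresses, and paths are placeholders, and the exact set of supported keywords varies by Oracle release.

```shell
# Hypothetical oranfstab fragment for Direct NFS (all names are placeholders):
server: filer1                  # label for the NFS server
path: 192.0.2.10                # network path to reach it
export: /vol/oradata  mount: /u01/oradata   # NFS export and local mount point
```

The database then performs NFS I/O to `filer1` itself, bypassing the host's native NFS client entirely.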
Reid sees the arrival of pNFS as signaling a major shift in the storage world — a move into a parallel universe, if you will.
“pNFS will accelerate the adoption of parallel storage to a broad range of customers, from traditional HPC, to commercial HPC, to Web infrastructure and media distribution,” said Reid. “The future of file storage is parallel, and NFS V 4.1 is the start of this major industry transformation.”