Storage Technology in Depth - DAFS
The emergence of network transports has given rise to new applications and services that take advantage of low-overhead access to the network interface from user or kernel address space; remote direct memory access (RDMA); transport protocol offloading; and, hardware support for event notification and delivery. The Direct Access File System (DAFS) is a new commercial standard for file access over this new class of networks. DAFS grew out of the DAFS Collaborative, an industrial and academic consortium led by Network Appliance and Intel. DAFS file clients are usually applications that link with user-level libraries that implement the file access Application Programming Interface (API). DAFS file servers are implemented in the kernel.
DAFS ClientsDAFS clients use a lightweight Radio Port Controller (RPC) to communicate file requests to servers. In direct read or write operations, the client provides virtual addresses of its source or target memory buffers; and, data transfer is done using RDMA operations. RDMA operations are always issued by the server.
Optimistic DAFS ClientsIn DAFS direct read and write operations, the client always uses an RPC to communicate the file access request along with memory references to client buffers that will be the source or target of a server-issued RDMA transfer. The cost associated with always having to do a file access RPC is manifested as an unnecessarily high latency for small accesses from server memory. A way to reduce this latency is to allow clients to access the server file and virtual memory (VM) cache directly, rather than having to go each time through the server vnode interface via a file access RPC.
An Optimistic DAFS improves on the existing DAFS specification by reducing the number of file access RPC operations needed to initiate file I/O and replaces them with memory accesses using client-issued RDMA. Memory references to server buffers are given out to clients or other servers that maintain cache directories. They are then allowed to use those references to directly issue RDMA operations with server memory. To build cache directories, the server returns to the client a description of buffer locations in its VM cache. These buffer descriptions are returned either as a response to specific queries (i.e., client asks: give me the locations of all your resident pages associated with file'') or piggybacked in the response to a read or write request (i.e., server responds: ``here's the data you asked for, and by the way, these are their memory locations that you can directly use in the future''). In Optimistic DAFS, clients use remote memory references found in their cache directories, but accesses succeed only when directory entries have not become stale. For example, this is the result of actions of the server pageout daemon. There is no explicit notification to invalidate remote memory references previously given out on the network. Instead, remote memory access exceptions thrown around by the target NIC and caught by the initiator NIC, can be used to discover invalid references and a switch to a slower access path using file access RPC. Therefore, by maintaining the NIC memory management unit in the case where RDMA can be remotely initiated by a client at any time is tricky, and needs special NIC and OS support. Finally, an Optimistic DAFS requires maintenance of a directory on file clients (in user-space) and on other servers (in the kernel).
Kernel Support For DAFS ServersSpecial capabilities and requirements of networking transports used by DAFS servers expose a number of kernel design and structure issues. In general, a DAFS file server needs to be able to:
- Do asynchronous file I/O.
- Integrate network and disk I/O event delivery.
- Lock file buffers while RDMA is in progress.
- Avoid memory copies.
Now, lets look at various kernel support mechanisms for DAFS servers. What follows is a description of each kernel support mechanism:
- Event-Driven Design Support: Consists of kernel asynchronous file I/O interfaces and integrating network and file event notification and delivery.
- Vnode Interface Support: P Consists of a vnode interface designed to address these needs.
- VM System Support: Consists of kernel support for memory management of the asymmetric multiprocessor system that consists of the NIC and the host CPU.
- Buffer Cache Locking: Consists of modifications to buffer cache locking.
- Device Driver Support: Consists of device driver requirements of memory-to-memory NIC.
Event-Driven Design SupportAn area of considerable interest in recent years has been that of event-driven application design. Event-driven servers avoid much of the overhead associated with multithreaded designs, but require truly asynchronous interfaces coupled with efficient event notification and delivery mechanisms integrating all types of events. The DAFS server requires such support in the kernel.
Vnode Interface SupportVnode/VFS is a kernel interface that separates generic file-system operations from specific file-system implementations. It was conceived to provide applications with transparent access to kernel file-systems, including network file-system clients such as a Network File System (NFS). The vnode/VFS interface consists of two parts: VFS defines the operations that can be done on a file-system. Vnode defines the operations that can be done on a file within a file-system.