Storage King for a Day: Dreaming of Storage Page 2
High Performance and Predictive Scaling
Some newer NAS products do scale reasonably well, but you are currently limited to 1 Gbit connections (some new 10 Gbit host cards are out, but even in a PCI-X 133 slot they cannot be used efficiently). Most sites that need multiple gigabytes per second of throughput solve the high-performance problem with Fibre Channel-attached storage. Given the overhead of TCP/IP and NFS, that level of throughput is not possible with NAS; even 100 MB/sec from a single host is nearly impossible.
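The bandwidth figures above can be checked with some back-of-the-envelope arithmetic. The sketch below is only an illustration: it assumes 1 MB = 10^6 bytes, a 64-bit PCI-X bus moving one word per clock, and raw line rates with no protocol overhead. The function names are made up for this example.

```python
# Theoretical peak rates only. Real throughput is lower once TCP/IP, NFS,
# and bus arbitration overhead are taken into account.

def link_mb_per_sec(gbits_per_sec: float) -> float:
    """Raw line rate of a network link in MB/sec (1 MB = 10**6 bytes)."""
    return gbits_per_sec * 1000 / 8


def bus_mb_per_sec(width_bits: int, clock_mhz: float) -> float:
    """Peak rate of a parallel bus that moves one word per clock cycle."""
    return width_bits / 8 * clock_mhz


gige = link_mb_per_sec(1)             # 1 Gbit Ethernet line rate
ten_gige = link_mb_per_sec(10)        # 10 Gbit Ethernet line rate
pci_x_133 = bus_mb_per_sec(64, 133)   # assumes a 64-bit PCI-X slot at 133 MHz

print(f"1 Gbit link : {gige:7.1f} MB/sec")
print(f"10 Gbit link: {ten_gige:7.1f} MB/sec")
print(f"PCI-X 133   : {pci_x_133:7.1f} MB/sec")
print(f"100 MB/sec is {100 / gige:.0%} of a 1 Gbit line rate")
```

Under these assumptions the bus peak falls below the 10 Gbit line rate before any overhead is counted, and 100 MB/sec already represents 80 percent of what a single 1 Gbit link can carry in theory.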
For the most part, file systems do not scale linearly. There are many reasons for this lack of scaling, including:
- Sometimes the cause is the file system itself, as the internal algorithms for free space lookup, metadata lookup, file and directory names, and other areas do not scale linearly
- Sometimes the cause is the applications using the file system, which can incur significant system overhead (see this article for more information)
Each of these problems can be mitigated by tuning the file system and the applications, but what about the RAID? The RAID device is a block device (at least for now) that reads ahead and writes behind based on sequential block addresses. The RAID and the file system have no communication about the topology of the data that you are using. All the RAID knows are simple block counts.
If the file system does not place data in sequential block order on the RAID, the RAID has no way of knowing how to operate efficiently. The SCSI protocol does not provide a way of passing the data topology to the RAID, so if the data is not allocated and read sequentially, the RAID operates inefficiently, which means that scaling with the hardware is not really possible.
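To illustrate why scattered allocation defeats sequential read-ahead, here is a minimal sketch of a toy controller cache. It is not a model of any real RAID firmware: the fixed read-ahead window, the unbounded cache, and the names `simulate_reads`, `contiguous`, and `scattered` are all assumptions made for this example.

```python
def simulate_reads(block_addresses, readahead=7):
    """Return how many blocks a toy controller fetches from disk.

    On every cache miss the controller fetches the requested block plus
    the next `readahead` sequential block addresses. The cache is
    unbounded, which is a simplification.
    """
    cache = set()
    fetched = 0
    for addr in block_addresses:
        if addr in cache:
            continue  # hit: an earlier read-ahead already brought it in
        # Miss: fetch the requested block and the sequential blocks after it,
        # counting only blocks that are not already cached.
        window = set(range(addr, addr + 1 + readahead)) - cache
        fetched += len(window)
        cache |= window
    return fetched


FILE_BLOCKS = 64

# The same 64-block file laid out two ways on the device.
contiguous = list(range(FILE_BLOCKS))               # sequential addresses
scattered = [i * 100 for i in range(FILE_BLOCKS)]   # one block every 100 addresses

for name, layout in (("contiguous", contiguous), ("scattered", scattered)):
    fetched = simulate_reads(layout)
    wasted = fetched - FILE_BLOCKS
    print(f"{name:10s}: fetched {fetched:3d} blocks, "
          f"{wasted:3d} never used ({FILE_BLOCKS / fetched:.0%} useful)")
```

In this toy model the contiguous layout fetches exactly the 64 blocks the file needs, while the scattered layout fetches 512 blocks to deliver the same 64, because every read-ahead window is filled with data the application never asks for.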
Even if the addresses are not allocated sequentially, most RAID devices still try to read ahead, but this adds overhead because you are reading data that you will not use, which of course reduces RAID performance. A new device allocation method that uses objects will be developed over the next few years, and it is now in the process of being standardized. This development should help, but the file system will still need to communicate with the object, and work on that end is far in the future at best.