Talk to many data storage experts about high-performance storage and a good portion will bring up Lustre, which was the subject of a recent Lustre Buying Guide. Some of the tips here, therefore, concern Lustre, but not all.
Use Parallel File Systems
Parallel file systems enable more data transfer in a shorter time period than their alternatives.
Lustre is an open source parallel file system used heavily in big data workflows in High Performance Computing (HPC). Over half of the largest systems in the world use Lustre, said Laura Shepard, Director of HPC & Life Sciences Marketing, DataDirect Networks (DDN). These include systems at U.S. government labs, such as Titan at Oak Ridge National Laboratory, as well as British Petroleum’s system in Houston.
Lustre can deliver over a terabyte of data per second through the file system. Another example of a parallel file system is the IBM General Parallel File System.
“Parallel file systems allow concurrent requests for data to be made in parallel so work gets done faster,” said Shepard. “This is especially important in HPC where some compute jobs can be so complex, and so extremely data intensive, that they take days or even weeks to run on even the largest supercomputers in the world.”
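To make the idea of concurrent requests concrete, here is a minimal Python sketch, not taken from Shepard or from Lustre itself. It splits a file into fixed-size chunks and issues one standard POSIX positional read per chunk from a pool of worker threads. The function names, chunk size and worker count are illustrative assumptions. On a single local disk the gain may be small. The pattern is meant for a parallel file system, where different chunks can be served by different storage servers at the same time.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_range(path, offset, length):
    """Read up to `length` bytes at `offset` with positional reads (os.pread)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        parts = []
        while length > 0:
            data = os.pread(fd, length, offset)
            if not data:  # reached end of file
                break
            parts.append(data)
            offset += len(data)
            length -= len(data)
        return b"".join(parts)
    finally:
        os.close(fd)


def parallel_read(path, chunk_size=1 << 20, workers=8):
    """Issue one read request per chunk concurrently, then reassemble in order."""
    size = os.path.getsize(path)
    offsets = range(0, size, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of `offsets`, so the chunks join correctly.
        chunks = pool.map(lambda off: read_range(path, off, chunk_size), offsets)
        return b"".join(chunks)


if __name__ == "__main__":
    # Demo on a temporary local file; a real job would point at a file
    # on the parallel file system mount instead.
    payload = os.urandom(5 * (1 << 20) + 17)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    try:
        print(parallel_read(tmp.name) == payload)
    finally:
        os.remove(tmp.name)
```

Because the sketch relies only on ordinary file calls, the same code runs unchanged against a local disk or a mounted parallel file system.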
Parallel File Systems vs. Hadoop
Hadoop has gotten a lot of ink over the past couple of years. So which is better and which one should be used when?
“If the answer of when to use a file system vs parallel file system vs Hadoop were an easy one there would be a lot fewer books and papers on the topic,” said Shepard. “To massively oversimplify, parallel file systems and map reduce approaches are fundamentally different and this leads to different operation characteristics.”
Her advice is that organizations trying to conquer highly calculation-intensive problems, like modeling and simulation, should gravitate to parallel file systems for their ability to deliver massive amounts of data across multiple compute nodes. Because parallel file systems are reliable and POSIX compliant, these organizations can continue to use their core applications without having to alter them.
That being said, it is increasingly easy to have both. A number of Lustre-based solutions offer interoperability with Hadoop; the latest DDN Lustre appliance, EXAScaler, is one example.
Coping with Peak Traffic
High performance storage is a fine goal, but achieving it consistently can be a challenge. For one thing, the volume of data stored keeps growing. No sooner have you gotten existing storage performance going at a high roar than the volume doubles. Further, no matter how much performance is improved, it can be hard to maintain it during a sudden surge in traffic.
“High performance sites are pushing for intelligent caching to give some relief from ‘peaky’ IO so they don’t have to buy infrastructure to meet their peak IO requirements, which are only reached once in a while,” said Shepard. “To reduce costs, they are implementing active archive capacity to offload Tier One storage, but still keep data live and accessible.”
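The arithmetic behind that argument can be sketched with a toy simulation. The Python below is an illustration only, not a model of any vendor's caching product. The function name, the per-interval demand figures and the units (GB per interval) are all made-up assumptions. A cache absorbs incoming writes, the back end drains the cache at a steady rate, and anything that does not fit is counted as overflow.

```python
def simulate_burst_buffer(demand, drain_rate, cache_capacity):
    """Step through per-interval write demand with a write-back cache.

    Returns (peak_backlog, overflow): the largest amount left in the cache
    at the end of any interval, and the total demand that did not fit.
    """
    backlog = 0.0
    peak_backlog = 0.0
    overflow = 0.0
    for incoming in demand:
        backlog += incoming
        # The back end drains at a steady rate each interval.
        backlog = max(0.0, backlog - drain_rate)
        if backlog > cache_capacity:
            overflow += backlog - cache_capacity
            backlog = cache_capacity
        peak_backlog = max(peak_backlog, backlog)
    return peak_backlog, overflow


if __name__ == "__main__":
    # Hypothetical workload: mostly 2 GB per interval with one 10 GB spike.
    demand = [2, 2, 10, 2, 2, 2, 2, 2]
    print(simulate_burst_buffer(demand, drain_rate=4, cache_capacity=8))
    print(simulate_burst_buffer(demand, drain_rate=4, cache_capacity=0))
    print(simulate_burst_buffer(demand, drain_rate=10, cache_capacity=0))
```

With these made-up numbers, a back end sized for 4 GB per interval plus an 8 GB cache absorbs the spike with no overflow. Without the cache, the same back end falls 6 GB short, and only a back end sized for the full 10 GB peak keeps up.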
Keeping it Simple
Lustre has never had a reputation for being a plug-and-play technology. Increased adoption within companies that cannot afford to employ a handful of PhDs to manage the file system means the requirements and associated use model are changing.
“Simplicity of management is now an often-stated requirement in Request for Proposals (RFPs) and customer requirement discussions, specifically around the need for a fully functional and scalable management graphical user interface, in addition to the ubiquitous command line interface,” said Torben Kling Petersen, Principal Engineer, Seagate Technology.
Build Your Own versus Appliance
Shepard advised first-time parallel file system users to seriously evaluate which path they want to take – build their own or buy an appliance. Decision criteria are going to vary widely, of course, but tend to center on a few pivot points:
* In-house expertise in storage infrastructure in general and parallel file systems in particular.
* Resources for ongoing management and upgrades.
* Converged vs. non-converged implementations (latency and cost are the factors here).
* Support paths (i.e., whether it is important for you to be able to escalate any issues not only for the hardware and tools, but also through to actual issues in Lustre).
“The path which requires the least in-house expertise is to buy an appliance – everything is sized and configured, one vendor to help / hold accountable, known hardware for adding performance and capacity, etc.,” said Shepard. “The most capable Lustre appliances will be able to keep their users supported on recent versions of the Lustre Master Branch, and allow for expansion of capacity or performance independently so many of the benefits of a la carte are available. However, a lot of organizations want to build deep expertise in house, and these would tend toward a more a la carte approach.”
Greg Schulz, an analyst with StorageIO Group, recommended that, in addition to looking at the usual fast back-end storage systems using either InfiniBand SRP or Fibre Channel for block access, buyers check out the newer solutions. These include solutions starting to appear with native 12Gb/s SAS interfaces, as well as 12Gb/s back-end drives, including high-performance hard drives, solid state hybrid drives, hybrid hard drives, and high-capacity drives that have surprisingly good performance.
“Also look at using either solid state drives or even newer generation solid state hybrid drives that have write acceleration for placing metadata to boost the performance of a Lustre environment,” said Schulz.
Better Data Placement
Intel Labs has been heavily involved in the development of Lustre. It has created Differentiated Storage Services (DSS), a capability found only in the Intel Enterprise Edition for Lustre. DSS tags data with information about how the data is being utilized, so that software or hardware-based cache mechanisms know where to place data for optimal performance.
“Storage I/O for applications and workflows can be improved using DSS, with small file and random I/O receiving the biggest boost,” said Brent Gorda, General Manager of the High Performance Data Division, Intel.
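The general idea of tagging I/O so that a cache knows where to place it can be illustrated with a small Python sketch. This is not Intel's DSS interface. Every name here (`IOClass`, `IORequest`, `tag_request`, `place`), the 64 KB small-file threshold and the two tier labels are hypothetical, chosen only to show a tag traveling with a request and a placement policy reading it.

```python
from dataclasses import dataclass
from enum import Enum


class IOClass(Enum):
    """Hypothetical usage hints attached to each request."""
    SMALL_FILE = "small_file"
    RANDOM = "random"
    SEQUENTIAL_STREAM = "sequential_stream"


# Hypothetical policy: which tier each class of I/O should land on.
PLACEMENT = {
    IOClass.SMALL_FILE: "ssd_cache",
    IOClass.RANDOM: "ssd_cache",
    IOClass.SEQUENTIAL_STREAM: "disk",
}


@dataclass(frozen=True)
class IORequest:
    path: str
    size_bytes: int
    io_class: IOClass


def tag_request(path, size_bytes, sequential, small_file_limit=64 * 1024):
    """Attach a usage hint based on request size and access pattern."""
    if size_bytes <= small_file_limit:
        io_class = IOClass.SMALL_FILE
    elif not sequential:
        io_class = IOClass.RANDOM
    else:
        io_class = IOClass.SEQUENTIAL_STREAM
    return IORequest(path, size_bytes, io_class)


def place(request):
    """Pick a tier from the tag alone, without re-inspecting the data."""
    return PLACEMENT[request.io_class]


if __name__ == "__main__":
    for req in (
        tag_request("config.ini", 4096, sequential=True),
        tag_request("index.db", 10**9, sequential=False),
        tag_request("checkpoint.dat", 10**9, sequential=True),
    ):
        print(req.path, req.io_class.value, place(req))
```

In this toy policy, small-file and random requests go to the fast tier, in line with Gorda's point that those workloads see the biggest boost, while large sequential streams go straight to disk.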
Photo courtesy of Shutterstock.