Whamcloud Building New Lustre Distro
The open source Lustre technology is a parallel file system that is often found in high performance computing (HPC) environments. Users of the file system will soon get a community Lustre distribution, thanks to the leadership of startup Whamcloud.
Whamcloud is a venture-backed startup that includes veterans from Oracle and Sun, where the Lustre project originated. Whamcloud is building a Lustre distribution not to create a fork from Oracle, but to help support and expand the Lustre community.
"Since April of 2010 there has been confusion in the community, and we've seen an impact in the business confidence in Lustre," Brent Gorda, CEO and president of Whamcloud, told InternetNews.com. "The community has been asking for leadership, the commitment of a for-profit entity that they can rely on for support and a path forward for the technology."
Gorda stressed that his company is not building a 'Whamcloud Lustre distribution' but rather a community Lustre distribution. He noted that Whamcloud will be leading the community in doing a release of an open source project.
Gorda explained that in a typical Lustre installation, a client runs in the compute cluster and mounts the Lustre file system over a high-speed Infiniband or 10GbE network. A driver module is loaded into the kernel and the file system is mounted like any other local or network file system. Client applications see a single, unified file system even though that file system is capable of over 200 Gbytes per second and can be made up of hundreds of servers and tens of thousands of disks.
"A Lustre distribution therefore includes the client-side kernel module and the server software to run on the disk nodes," Gorda said. "It also includes networking software called LNet."
LNet routes storage traffic intelligently, including translating from Infiniband to Ethernet protocols.
"A community distribution starts and ends with the canonical source tree," Gorda said. "Since there isn't a tagged (Lustre) 2.1 release available, the process needs to select a candidate branch first. Then the code is, of course, built and tested and issues resolved."
Gorda added that any patches or code modifications will be pushed back into the canonical tree for inclusion in the mainline.
"By pushing code back toward the canonical tree, this activity is not a fork," Gorda said. "But more to the point, by working with the gatekeepers to ensure the code is acceptable and goes in, we are contributing to improve the mainline itself."
In terms of Whamcloud's relationship with Oracle, Gorda noted that his company has signed the Lustre code contribution agreement and has cooperated regularly with Oracle engineers on multiple technical issues.
"We have Whamcloud employees who were employed by Oracle at one point and this is, quite frankly, an advantage in communication and cooperation between our two companies," Gorda said. "We value the relationship with Oracle. We do not think this move in of itself will or should change our relationship."
Until this point, Whamcloud has been providing support for Lustre and contributing its efforts to the mainline. The community distribution of Lustre that Whamcloud is helping to lead will still be called Lustre, even though Oracle owns the trademark on the name.
"Since we are in no way claiming this as a Whamcloud product, and we will strictly follow both the letter and the spirit of the GPL, this is not seen as a trademark issue," Gorda said.
The community distribution of Lustre will be built using the upcoming Lustre 2.1 release, which will add new features and capabilities to the open source file system. Gorda noted that the Lustre 2.0 release, which was published last fall, had a number of features pulled from it in order to focus more heavily on stability.
"Lustre has made significant progress in addressing stability issues, though more still needs to be done," Gorda said. "The intention of the 2.1 release, in large part, is to bring some of those yanked features back."
Among the key new features in Lustre 2.1 is improved metadata scalability (SMP scaling). Gorda explained that this improvement could lead to a dramatic increase in performance for metadata operations such as file creates and stats.
Moving forward, the Lustre community distribution effort will face a number of challenges. According to Gorda, the biggest challenge in the community release is always testing at scale.
"Since the project is intended to support the world's fastest systems, it has features which require substantial hardware to exercise them," Gorda said. "Over the past few years, Sun (and now Oracle) have been partners in the Hyperion Consortium at Lawrence Livermore National Laboratory (LLNL)."
Gorda noted that the Hyperion Consortium of 10 HPC partners currently has a Dell cluster of 1,152 nodes with a peak Linpack performance of over 100 Tflops.
"It is a considerable system and easily one of the top 100 systems in the world," Gorda said. "However, this system is considered to be the entry-level test system for finding at-scale issues with Lustre. The engineers would be happier with a much larger system but realize that it is difficult to get time on such a resource."
The Lustre community distribution is set for a summer 2011 release.