Salt or SaltStack is a Python-based open source configuration management tool. It is very modular, with Python modules handling certain aspects of the available Salt systems (target nodes to be managed). These modules allow for the interactions within Salt to be detached and modified to suit the needs of a developer or system administrator. There are six module groups:
- Execution Modules. This group of modules represents the functions that are available for direct execution from the remote execution engine.
- State Modules. The State Modules group contains the components that make up the backend for the Salt configuration management system.
- Grains. Grain Modules constitute a system for detecting static information about a system and storing it in RAM.
- Renderer Modules. The Renderer Modules are used to render the information passed to the Salt state system (i.e. serialize it).
- Returners. This group of modules handles the arbitrary return locations (from remote execution calls).
- Runners. These modules are master-side convenience applications executed by the salt-run command.
Notice that these tools fall into a general category of configuration management of systems. They are used to configure, deploy, and maintain systems. This is done by describing what is to be installed on the systems, along with methods for telling the management server what the target node resources look like. This approach allows more servers to be managed by a single person. It also means that servers can be allocated and then changed depending upon the situation.
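The core idea, comparing a description of what should be installed with what a node reports about itself, can be sketched in a few lines of Python. This is not how any of the tools above are implemented. The function name, package names, and version strings are made up for illustration.

```python
def find_drift(desired, actual):
    """Return packages whose installed version differs from the desired one.

    Both arguments map package name -> version string; a package that is
    missing on the node shows up with a current version of None.
    """
    drift = {}
    for name, wanted in desired.items():
        current = actual.get(name)
        if current != wanted:
            drift[name] = {"current": current, "wanted": wanted}
    return drift


# Hypothetical description of a server and a hypothetical node report.
desired = {"nfs-utils": "2.5.4", "chrony": "4.2"}
actual = {"nfs-utils": "2.3.3", "openssh-server": "8.7"}

for name, change in find_drift(desired, actual).items():
    print(f"{name}: {change['current']} -> {change['wanted']}")
```

A real tool would then act on the differences. When the node already matches the description, the result is empty and nothing needs to change, which is what makes repeated runs safe.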
Impact on Storage?
DevOps is really focused on collaboration between development teams, QA and IT operations. This typically falls into the realm of servers and application development, but it doesn't have to. It can also have a sizable impact on storage.
One of the benefits of DevOps is that it integrates the various teams that develop and deploy applications and infrastructure changes. This can easily apply to storage as well as servers. All too often one team tests updates for storage servers, including working with the manufacturer, and another team does the actual deployment. In a DevOps world, administrators who test updates for storage systems are likely to be the same administrators who do the final deployment. This way they have more experience with the changes, including any quirks.
The Research Computing Group at Harvard supports hundreds if not thousands of researchers and a tremendous amount of research data. They support multiple research groups that cover a range of HPC and non-HPC hardware. They have used DevOps to decentralize the data storage, making it more robust and easier for the researchers to use.
They have a relatively small staff, yet they deploy, manage and update a very large number of NFS servers. The exact number is increasing all the time, and I wouldn't be surprised if it is over 100. This is the opposite of what traditional best practices would dictate: the creation of a large centralized NFS server to reduce the amount of management, monitoring and overhead. What the Harvard RC team discovered is that using DevOps, they could use the same number of people to deploy and administer a much larger number of NFS servers.
Recall that the tools previously mentioned (Chef, Puppet, Ansible, and Salt) are all focused on deploying servers. The NFS servers that the RC staff created are just simple servers with attached storage running Linux. The DevOps tools can easily deploy these servers in large quantities. They can also update them as needed. All of this can be done by the same number of staff that administered their large centralized storage system.
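As a rough illustration of why this scales, the configuration of each NFS server can be generated from data rather than edited by hand. The sketch below renders lines in the standard Linux `/etc/exports` format from a small list of shares. The paths, client network, and choice of options are assumptions for the example, not details of Harvard's setup, and a configuration management tool would normally do this rendering through its own templating.

```python
def render_exports(shares):
    """Render /etc/exports content from (path, client, options) tuples."""
    lines = []
    for path, client, options in shares:
        # exports(5) format: <path> <client>(<comma-separated options>)
        lines.append(f"{path} {client}({','.join(options)})")
    return "\n".join(lines) + "\n"


# Hypothetical shares for one research group's server.
shares = [
    ("/export/lab_a", "10.0.1.0/24", ["rw", "sync", "no_subtree_check"]),
    ("/export/lab_a_ref", "10.0.1.0/24", ["ro", "sync", "no_subtree_check"]),
]

print(render_exports(shares), end="")
```

Adding the hundredth server is then a matter of adding one more entry to the data, which is the property that lets a small staff manage a large fleet.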
They standardized on a simple two-socket server with a number of drive bays. For example, the Dell R730 has either 16x 2.5" drive bays with up to 29TB of raw capacity (using 1.8TB 2.5" drives), or 8x 3.5" drive bays with up to 48TB using 6TB drives. With a simple RAID controller in RAID-6, the 8x 6TB configuration gives 36TB of usable space. The R530 is very similar to the R730. There is also the Dell R730xd, which can accommodate up to 24x 2.5" drives (up to 43.2TB raw, or 39.6TB using RAID-6) or 12x 3.5" drives (up to 72TB raw, or 60TB of RAID-6 capacity). Other vendors have somewhat similar systems. These systems can be used for small groups managing storage via NFS.
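The RAID-6 figures follow from simple arithmetic: a single RAID-6 group gives up two drives' worth of capacity to parity. The short sketch below checks the numbers quoted above. It assumes one RAID-6 group per server and ignores file system overhead and the difference between decimal and binary terabytes.

```python
def raid6_usable_tb(drive_count, drive_tb):
    """Usable capacity of one RAID-6 group: two drives' worth goes to parity."""
    if drive_count < 4:
        raise ValueError("RAID-6 needs at least four drives")
    return (drive_count - 2) * drive_tb


# (label, number of drives, drive size in TB) for the configurations in the text.
configs = [
    ("R730, 8x 3.5in", 8, 6.0),
    ("R730xd, 24x 2.5in", 24, 1.8),
    ("R730xd, 12x 3.5in", 12, 6.0),
]

for label, count, size_tb in configs:
    raw = count * size_tb
    usable = raid6_usable_tb(count, size_tb)
    print(f"{label}: raw {raw:.1f} TB, RAID-6 usable {usable:.1f} TB")
```

This reproduces the 36TB, 39.6TB and 60TB figures given for the three configurations.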
The researchers in the group share the storage as working space for their applications. The RC staff has found that the capacity of these simple 2U servers is more than enough for these groups. In addition, many of the applications run by these researchers perform very well over NFS. There are some applications that need higher performing storage, and for that they have a Lustre file system. But for a very large percentage of researchers, the NFS storage works very well.
As the applications are run, the NFS storage is mounted on the appropriate servers. The researchers on the team all have access to the data they need. The RC staff operates a centralized backup/archive system where researchers can store their data in a redundant manner. Moving data between the centralized archive and the NFS servers is fairly fast using a high-speed network.
One benefit of using distributed NFS servers is that if one of them fails or is off-line for some reason, not all researchers lose access to their data. For example, if there are 40 NFS servers and one is lost, then only about 2.5 percent of the researchers lose access to their data, assuming the researchers are spread evenly across the servers. Compare this to a setup with a single NFS server, where all researchers lose access to their data.
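The percentage is simply the share of servers that are down, under the simplifying assumption that researchers are spread evenly across servers. A minimal sketch, with a hypothetical function name:

```python
def affected_fraction(total_servers, failed_servers=1):
    """Fraction of researchers cut off when some NFS servers are down.

    Assumes researchers are spread evenly across the servers.
    """
    if total_servers < 1 or not 0 <= failed_servers <= total_servers:
        raise ValueError("need 0 <= failed_servers <= total_servers, total >= 1")
    return failed_servers / total_servers


for servers in (1, 40):
    print(f"{servers} server(s), one down: {affected_fraction(servers):.1%} affected")
```

With a single server the fraction is 1, and with 40 servers it is 1/40, or 2.5 percent.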
Also, if they allocate jobs to nodes that are close to the NFS storage, then the performance is improved (less latency). This means their data is more localized to where the compute is allocated. This can be done using current resource managers (a.k.a. job schedulers).
If a research group needs more capacity than a single NFS server can provide, then it's fairly easy to either add external storage to the server, or split the group into smaller teams. In the second case, the added workload on the staff is almost negligible because it is configured using one of the DevOps tools.
The Benefit of DevOps for Storage
DevOps is having a large impact on the development and deployment of applications. Integrated teams work together to perform all of the necessary steps for development, testing, deployment and on-going maintenance of the application. This allows the teams to deploy applications faster or to increase the number of deployments they can perform. Tools such as Chef, Puppet, Ansible, or Salt can be used for quick and easy deployment of the infrastructure necessary for application development.
These same tools can be used for storage as well. The example of Harvard's Research Computing (RC) group illustrates what can be accomplished by applying DevOps to storage. This is but one example; I'm sure there are others. Take a look at these tools and see if they can be used in your storage infrastructure. I think you will be pleasantly surprised.