DevOps represents a cultural shift in how applications are developed and deployed. It can also be used in server deployment. But can it be used with storage?
DevOps (development and operations) refers to a software development method that includes the roles of software developers (dev) and other IT personnel with an emphasis on IT operations (Ops). DevOps is generally an enterprise software development phrase that emphasizes communication, collaboration, integration and automation. The goal of DevOps is to help an organization rapidly produce software products and services and to improve operations performance.
In the classic organization, the various teams, such as software development, quality assurance (QA), and IT operations, are separated, with very formal communications between them. DevOps creates a multidisciplinary approach so that the teams work together to develop products and services more rapidly. This involves a change in the processes and methods used for communication and collaboration between the various teams.
Generally DevOps is applied to developing products and services from an IT organization. It has bled into data center operations and system administration in a general sense. By some estimates it is used by 33 to 55 percent of organizations and is quickly gaining dominance as the development and deployment method.
Example of DevOps
In the classic development pattern, there are separate teams that handle specific tasks. A team of developers writes the application, doing limited testing to make sure it works to some degree. Then they pass it along to testers (QA) who will run unit tests and more comprehensive tests against the application. The hardware and development environments used by these two groups are likely to be different, with each group owning its own. Hopefully, there has been some coordination between the two groups so that they at least use the same OS and set of tools.
There may be a few release candidates between the developers and QA, but once they have achieved their requirements, the product is turned over to release managers who then coordinate "alpha" or "beta" testing, first with internal customers and then possibly with external customers. There may be some feedback to the developers, who make bug fixes, changes and updates. Once that is finished, the QA team does another review, and the code is labeled as ready.
Next the code is released to the sysadmins who are going to deploy it. Ideally, they served as testers in the previous cycle, but that's not always true. Moreover, the production hardware and software may be different than what the development and QA team used. Obviously, you want them to be as similar as possible, but there are almost always differences.
DevOps takes this process and creates a single organization that is responsible for all of the necessary tasks—development, QA, test, and sysadmins—with the focus on quickly delivering a better product. Rather than a small group of people focused solely on development, you have people who do development, test and QA, and who are also sysadmins all at the same time. A phrase you commonly hear for people in DevOps teams is "sysadmin coders." This "combined team" may divide up the necessary tasks, but they communicate with each other frequently to coordinate the entire process.
Using DevOps allows the pace of development to increase, which in turn results in applications being deployed faster and usually with better quality. It can also mean that the same number of people can produce more releases or updates.
At its core, DevOps is really a cultural change in how teams work together. This requires a change in the organization, a change in the behavior from team members, and a change in how management approaches the merge into DevOps. Everyone has to be part of the change and agree to it. As with any good cultural change in IT, there are some tools that can help with the transition.
When you start talking about DevOps there are a number of tools that come up in the conversation. While not an exhaustive list, tools such as Chef, Puppet, Ansible, and Salt are some of the top-mentioned tools. There are others, but these are the ones you encounter in polite conversation. It is definitely worthwhile briefly reviewing these tools.
Chef is written in Ruby and Erlang and streamlines the task of configuring and maintaining servers. It works with on-premises servers as well as cloud platforms such as those from Rackspace, Amazon EC2, OpenStack, Microsoft Azure, IBM SoftLayer, and Google Cloud Platform. The tools are open source, and there is a "Chef Company" that provides support to people using Chef.
You write "recipes" in Chef (written in what is basically Ruby) that describe how Chef manages server applications and utilities and how they are configured. They describe a series of resources that should be in a particular state. This means what packages are installed, what services should be running, or which files should be written. You can specify the specific versions of packages and the order in which packages are installed based on package dependencies.
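The idea of describing resources "in a particular state" can be illustrated with a small Python sketch. This is not Chef's API or recipe syntax; the `Resource` class, the `plan` function, and the package and service names are all hypothetical, chosen only to show how a tool can compare desired state with observed state and act only on the differences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One piece of desired state, e.g. a package or a service."""
    kind: str   # "package", "service", or "file"
    name: str
    state: str  # desired state, e.g. "installed" or "running"


def plan(desired, current):
    """Return the actions needed to move `current` to `desired`.

    `current` maps (kind, name) -> observed state. Resources that are
    already in the desired state produce no action, so running the same
    description twice is safe.
    """
    actions = []
    for res in desired:
        if current.get((res.kind, res.name)) != res.state:
            actions.append(f"{res.kind} {res.name}: -> {res.state}")
    return actions


# Hypothetical desired state for a small NFS server.
desired = [
    Resource("package", "nfs-utils", "installed"),
    Resource("service", "nfs-server", "running"),
]
# Hypothetical observed state: the package is present, the service is not running.
current = {("package", "nfs-utils"): "installed"}

actions = plan(desired, current)
print(actions)
```

Because the package is already installed, only the service shows up in the plan. Once the service is running, a second run produces an empty plan.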
When used with servers, Chef runs in a client/server model. The Chef client sends various attributes about the particular server to the Chef server. The server uses Solr to index these attributes and provides an API for clients to query this information. These queries can be done from Chef clients, and the resulting data is used to help configure the server.
Puppet runs along the lines of Chef in helping with the configuration and maintenance of servers. It is open source and focused on configuration management, particularly the management of servers. It too is written in Ruby and has its own language for creating node configurations (you can also use Ruby). These are called "manifests" and are similar to Chef's recipes.
System information is discovered via a utility named Facter. Puppet then compiles the manifests into a system-specific catalog that contains the resources and resource dependencies (very similar to Chef), and the catalog is applied against the target systems. The manifests contain a high-level description of system aspects such as users, groups, services, and packages, without having to use system-specific commands such as yum, apt-get, or rpm.
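To make the notion of a catalog with resource dependencies concrete, here is a minimal Python sketch. It is not how Puppet is implemented; the `catalog` dictionary and the resource names are hypothetical. It uses the standard library's `graphlib` module (Python 3.9 or later) to order resources so that each one is applied after the things it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical catalog: each resource maps to the set of resources it depends on.
catalog = {
    "package:nfs-utils": set(),
    "file:/etc/exports": {"package:nfs-utils"},
    "service:nfs-server": {"package:nfs-utils", "file:/etc/exports"},
}


def apply_order(catalog):
    """Return the resources in an order where dependencies come first."""
    return list(TopologicalSorter(catalog).static_order())


order = apply_order(catalog)
print(order)
```

In this small catalog the package is applied first, then the configuration file, then the service. A circular dependency would raise `graphlib.CycleError` rather than produce an order.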
While Puppet is open source, there is a company named Puppet Labs that can provide support for people using Puppet.
Ansible is an open source tool like Chef and Puppet. It is classically used for configuring and managing compute systems. It connects to the managed systems over SSH or PowerShell and requires Python 2.4 or later to be installed on them. The work in Ansible is done by modules. A module is copied to the target node (the system to be controlled), runs there, and reports back to the Ansible control node using JSON over standard output. The modules can be written in virtually any scripting language, such as Python, Perl, Ruby, and Bash.
The system uses YAML to express reusable descriptions of systems. These descriptions are called Playbooks, and each Playbook maps a group of hosts to a set of roles. Each role is represented by calls to what Ansible calls tasks. As with Chef, in addition to on-premises systems, Ansible can be used with various private and public cloud providers such as VMware, OpenStack, AWS, Rackspace Cloud Servers, Eucalyptus Cloud, KVM, XenServer, IBM SoftLayer, Microsoft Azure and Cloud
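The "JSON over standard output" convention is simple enough to sketch in a few lines of Python. This is a simplified illustration, not a complete Ansible module: it skips argument handling entirely, and the function name `run_module` and the `/export` path are hypothetical. The `changed` key follows the convention Ansible modules use to say whether they modified the system.

```python
import json
import os


def run_module(path):
    """Check a path on the target node and return the result as a JSON string."""
    result = {
        "changed": False,                # this check modifies nothing
        "path": path,
        "is_dir": os.path.isdir(path),   # does the export directory exist?
    }
    return json.dumps(result)


if __name__ == "__main__":
    # The controlling node would read this single JSON document from stdout.
    print(run_module("/export"))
```

Because the contract is only "print JSON to standard output," the same check could be written in Perl, Ruby, or Bash.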
Salt or SaltStack is a Python-based open source configuration management tool. It is very modular, with Python modules handling certain aspects of the available Salt systems (target nodes to be managed). These modules allow for the interactions within Salt to be detached and modified to suit the needs of a developer or system administrator. There are six module groups:
- Execution Modules. This group of modules represents the functions that are available for direct execution from the remote execution engine.
- State Modules. The State Modules group contains the components that make up the backend for the Salt configuration management system.
- Grains. Grain Modules constitute a system for detecting static information about a system and storing it in RAM.
- Renderer Modules. The Renderer Modules are used to render the information passed to the Salt state system (i.e. serialize it).
- Returners. This group of modules handles the arbitrary return locations (from remote execution calls).
- Runners. These modules are master side convenience applications executed by the salt-run command.
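As a rough illustration of the first group, a custom execution module is essentially a Python file whose functions can be called remotely. The sketch below assumes a hypothetical file named `storagecheck.py`; the module name and the `free_percent` function are made up for this example, and details such as how the file is distributed to the managed nodes are left out.

```python
# Hypothetical execution module file: storagecheck.py
import shutil


def free_percent(path="/"):
    """Return the percentage of free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Guard against pseudo-filesystems that report no capacity.
        return 0.0
    return round(100.0 * usage.free / usage.total, 1)
```

Assuming the module were installed under that name, it would be invoked in the usual `module.function` form, for example `salt '*' storagecheck.free_percent /export`.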
Notice that these tools fall into a general category of configuration management of systems. They are used for configuring, deploying and maintaining systems. This is done by describing what is to be installed on the systems as well as methods for telling the management server what the target node resources look like. This approach allows more servers to be managed by a single person. It also means that servers can be allocated and then changed depending upon the situation.
Impact on Storage?
DevOps is really focused on collaboration between development teams, QA and IT operations. This typically falls into the realm of servers and application development, but it doesn't have to. It can also have a sizable impact on storage.
One of the benefits of DevOps is that it integrates the various teams that develop and deploy applications and infrastructure change. This can easily apply to storage as well as servers. All too often one team tests updates for storage servers, including working with the manufacturer, and another team does the actual deployment. In a DevOps world, administrators who test updates for storage systems are likely to be the same administrators who do the final deployment. This way they have more experience with the changes, including any quirks.
The Research Computing Group at Harvard supports hundreds if not thousands of researchers and a tremendous amount of research data. They support multiple research groups that cover a range of HPC and non-HPC hardware. What they have done is to utilize DevOps to de-centralize the data storage, making it more robust and easier for the researchers to use.
They have a relatively small staff, yet they deploy, manage and update a very large number of NFS servers. The exact number is increasing all the time, and I wouldn't be surprised if it is over 100. This is the opposite of what traditional best practices would dictate, which is the creation of a large centralized NFS server to reduce the amount of management, monitoring and overhead. What the Harvard RC team discovered is that using DevOps, they could use the same number of people to deploy and administer a much larger number of NFS servers.
Recall that the tools previously mentioned—Chef, Puppet, Ansible, and Salt—are all focused on deploying servers. The NFS servers that the RC staff created are just simple servers with attached storage running Linux. The DevOps tools can easily deploy these servers in large quantities. They can also update them as needed. All of these can be done by the same number of staff that administered their large centralized storage system.
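The reason many small NFS servers stay manageable is that their configuration is generated from one description rather than typed by hand. The Python sketch below shows the idea by rendering `/etc/exports` entries for several groups. It is not taken from the Harvard setup: the group names, networks, `/export/<group>` paths, and export options are all assumptions made for illustration.

```python
def render_exports(groups, options="rw,sync,no_subtree_check"):
    """Render one /etc/exports line per research group.

    `groups` maps a group name to the client network allowed to mount it.
    """
    lines = []
    for name, network in sorted(groups.items()):
        lines.append(f"/export/{name} {network}({options})")
    return "\n".join(lines) + "\n"


# Hypothetical groups and client networks, for illustration only.
groups = {
    "genomics": "10.1.0.0/24",
    "astro": "10.2.0.0/24",
}

print(render_exports(groups), end="")
```

Adding a server for a new group then means adding one entry to the description and letting the tool regenerate and push the configuration.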
They standardized on a simple two-socket server with a number of drive bays. For example, the Dell R730 has either 16x 2.5" drive bays with up to 29TB (1.8TB 2.5" drives), or 8x 3.5" drive bays with up to 48TB using 6TB drives. Using a simple RAID controller in RAID-6 you can get 36TB of useable space. The R530 is very similar to the R730. There is also the Dell R730xd which can accommodate up to 24x 2.5" drives (up to 43.2TB or 39.6TB using RAID-6) or twelve (12) 3.5" drive bays (up to 72TB or 60TB of RAID-6 capacity). Other vendors have somewhat similar systems. These systems can be used for small groups managing storage via NFS.
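The capacity figures above follow from simple arithmetic: RAID-6 gives up the equivalent of two drives to parity, so usable capacity is (drive count - 2) x drive size. A short Python sketch reproduces the numbers; the function name is hypothetical, and the calculation ignores filesystem overhead and the difference between decimal and binary terabytes.

```python
def raid6_usable_tb(drive_count, drive_tb):
    """Usable capacity of a RAID-6 set: two drives' worth goes to parity."""
    if drive_count < 4:
        raise ValueError("RAID-6 needs at least four drives")
    return (drive_count - 2) * drive_tb


# (label, number of drives, drive size in TB), taken from the text above.
configs = [
    ("R730, 8 x 6TB", 8, 6.0),
    ("R730xd, 24 x 1.8TB", 24, 1.8),
    ("R730xd, 12 x 6TB", 12, 6.0),
]

for label, count, size in configs:
    raw = count * size
    usable = raid6_usable_tb(count, size)
    print(f"{label}: raw {raw:.1f} TB, RAID-6 usable {usable:.1f} TB")
```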
The researchers in the group share the storage as working space for the applications. The RC staff has found that the capacity of these simple 2U servers is more than enough for these groups. In addition, many of the applications run by these researchers perform very well over NFS. There are some applications that need higher performing storage, and for that they have a Lustre file system. But for a very large percentage of researchers, the NFS storage works very well.
As the applications are run, the NFS storage is mounted on the appropriate servers. The researchers on the team all have access to the data they need. The RC staff operates a centralized backup/archive system where researchers can store their data in a redundant manner. Moving data between the centralized archive and the NFS servers is fairly fast using a high-speed network.
One benefit of using distributed NFS servers is that if one of them fails or is off-line for some reason, not all researchers lose access to their data. For example, if there are 40 NFS servers and one is lost, then only about 2.5 percent of the researchers lose access to their data, assuming the researchers are spread evenly across the servers. Compare this to a setup with a single NFS server, where all researchers lose access to their data.
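The 2.5 percent figure is just one server out of 40. A small Python sketch makes the comparison explicit; it assumes researchers are spread evenly across the servers, which the text implies but does not state, and the function name is hypothetical.

```python
def percent_affected(server_count, failed=1):
    """Percentage of researchers who lose access when `failed` servers go down.

    Assumes researchers are spread evenly across `server_count` servers.
    """
    if server_count <= 0 or not 0 <= failed <= server_count:
        raise ValueError("need 0 <= failed <= server_count and server_count > 0")
    return 100.0 * failed / server_count


print(percent_affected(40))  # 40 distributed NFS servers, one lost
print(percent_affected(1))   # a single centralized NFS server, lost
```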
Also, if they allocate jobs to nodes that are close to the NFS storage, then the performance is improved (less latency). This means their data is more localized to where the compute is allocated. This can be done using current resource managers (a.k.a. job schedulers).
If a research group needs more capacity than a single NFS server can provide, then it's fairly easy to either add external storage to the server, or split the group into smaller teams. In the second case, the added workload on the staff is almost negligible because it is configured using one of the DevOps tools.
The Benefit of DevOps for Storage
DevOps is having a large impact on the development and deployment of applications. Integrated teams work together to perform all of the necessary steps for development, testing, deployment and on-going maintenance of the application. This allows the teams to deploy applications faster or to increase the number of deployments they can perform. Tools such as Chef, Puppet, Ansible, or Salt can be used for quick and easy deployment of the infrastructure necessary for application development.
These same tools can be used for storage as well. The example of Harvard's Research Computing (RC) group illustrates what can be accomplished by applying DevOps to storage. This is but one example; I'm sure there are others. Take a look at these tools and see if they can be used in your storage infrastructure. I think you will be pleasantly surprised.