Imagine you are a seasoned operations engineer (you know, neck beard and all). Over your career you have most certainly developed a toolkit of scripts that you can use, with minor changes, to perform all your regular tasks of provisioning and managing the plethora of environments you have seen and dealt with over the years. When it comes to configurations, you know all the admin consoles you deal with like the back of your hand. You can log in and make exactly the tweaks to application server configs that are needed to address the issues you are facing. For database-related stuff, you know exactly who to call, and that DBA has mastered his end of the deal as well as you have yours. You have things down to a routine. You know exactly when the next application release is due. You know when to expect the next update to the OS. You are the master of your domain.
But as systems have become virtualized and as developers have started practicing Continuous Integration (CI), things have started to change. The number of environments and their instances you have to deal with has gone up by several orders of magnitude. Developers no longer release updates and new versions every few months; they are pumping out CI builds daily. In fact, multiple builds a day. All of them need to be tested and validated. That requires new environment instances to be spun up, fast. These builds also often come with configuration changes. Logging into consoles to make each one of these changes individually is no longer a viable option. Furthermore, the need for speed is critical. Developers’ builds are creating a backlog (pun intended), because the environments needed just to test them are not available on demand. We have a problem.
Let me start by introducing two concepts:
1. Cycle time
Cycle time is defined as the average time taken from the point a new requirement is approved, a change request is raised, or a bug that needs to be fixed via a patch is identified, to the point the change is delivered to production. Agile organizations want this time to be the bare minimum. It is what limits their ability to release new features and fixes to customers. Organizations like Etsy have cycle time down to minutes! While this is not possible for enterprise applications, the current cycle time of weeks or sometimes even months is absolutely unacceptable.
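To make the definition concrete, here is a minimal sketch of how the average could be computed. The work items and their dates are made up for illustration, and `average_cycle_time_days` is a hypothetical helper, not part of any tool.

```python
from datetime import datetime
from statistics import mean

# Hypothetical work items: (approved_or_identified_at, delivered_to_production_at).
# The dates are invented purely to illustrate the arithmetic.
work_items = [
    (datetime(2024, 1, 2), datetime(2024, 1, 16)),   # new requirement: 14 days
    (datetime(2024, 1, 5), datetime(2024, 1, 26)),   # change request: 21 days
    (datetime(2024, 1, 10), datetime(2024, 1, 17)),  # bug fix via patch: 7 days
]


def average_cycle_time_days(items):
    """Average time, in days, from approval/identification to production delivery."""
    return mean(
        (delivered - started).total_seconds() / 86400
        for started, delivered in items
    )


print(average_cycle_time_days(work_items))  # (14 + 21 + 7) / 3 = 14.0
```

Every day an item spends waiting for an environment counts toward this number, which is why environment provisioning shows up so directly in cycle time.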
2. Versioning Environments
The need to maintain multiple configurations and patch levels of environments, which development now needs on demand, requires Ops to change how they handle change and maintain these environments. Any change Operations makes to an environment, whether it is applying a patch or making a configuration change, should be viewed as creating a new ‘version’ of the environment, not just tweaking a config setting via a console. The only way this can be managed properly is by applying all changes via scripts. These scripts, when executed, create a new version of the environment they are executed on. This process streamlines and simplifies change management, allowing it to scale, while keeping Ops best practices (ITIL or otherwise) intact.
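The idea that every change produces a new version, rather than mutating the environment in place, can be sketched roughly as follows. This is an illustration under stated assumptions: `EnvironmentVersion`, `apply_change` and the setting names are hypothetical and do not correspond to any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentVersion:
    """One recorded version of an environment (hypothetical model)."""
    number: int
    config: dict
    note: str


def apply_change(current, changes, note):
    """Return a NEW version with the changes applied; the current one is untouched."""
    new_config = {**current.config, **changes}
    return EnvironmentVersion(current.number + 1, new_config, note)


# Baseline environment, then two scripted changes, each yielding a new version.
v1 = EnvironmentVersion(1, {"os_patch_level": "2024-01", "app_server.heap_mb": 1024}, "baseline")
v2 = apply_change(v1, {"app_server.heap_mb": 2048}, "raise heap for load test")
v3 = apply_change(v2, {"os_patch_level": "2024-02"}, "monthly OS patch")

for version in (v1, v2, v3):
    print(version.number, version.note, version.config)
```

Because earlier versions are never modified, any of them can be recreated on demand, which is what development needs when it asks for a specific configuration or patch level.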
Infrastructure as Code
Both these needs, minimizing cycle time and versioning environments, can be addressed by capturing and managing your infrastructure as code. Spinning up a new virtual environment or a new version of the environment then becomes a matter of executing a script that can create and provision an image or set of images, all the way from the OS to the complete application stack, installed and configured. Hours can become minutes.
Versioning these scripts, like one would version code, in an SCM system, allows for proper configuration management. Creating a new version of an environment now becomes a matter of checking out the right script(s), making the necessary changes to them (to patch the OS, change an app server setting or install a new version of the application), and then checking the scripts back in as a new version of the environment.
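As a rough, tool-agnostic sketch of what "the environment as a script" might look like, the snippet below turns a declarative definition into an ordered list of provisioning steps, from the OS up to the application. The layer names, values and the `provisioning_plan` helper are all assumptions made for illustration; real frameworks use their own formats.

```python
# Hypothetical environment definition. A file like this is what would be
# checked in to the SCM system; the schema is invented for this example.
environment_spec = {
    "name": "ci-test",
    "os": {"image": "example-linux-base", "patch_level": "patch-001"},
    "middleware": {"app_server": "example-app-server", "heap_mb": 1024},
    "application": {"artifact": "shop-app", "build": "build-1234"},
}

# Layers are provisioned bottom-up: OS first, application last.
LAYER_ORDER = ("os", "middleware", "application")


def provisioning_plan(spec):
    """Turn the definition into an ordered list of human-readable steps."""
    steps = []
    for layer in LAYER_ORDER:
        for key, value in spec[layer].items():
            steps.append(f"{layer}: set {key} = {value!r}")
    return steps


for step in provisioning_plan(environment_spec):
    print(step)
```

Changing `heap_mb` or `build` in the definition and checking the file back in is then the whole act of creating a new environment version; the plan is regenerated from the definition rather than typed into a console.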
Two kinds of automation frameworks have come up in the recent past to allow for managing Infrastructure as Code.
- The first kind of framework is application-centric. These frameworks are usually capable of managing application servers, and the applications running on them, as code. For example, IBM’s Rational Automation Framework (RAF) falls into this space. It supports app servers and technologies like WebSphere, WebLogic, JBoss, MQ, etc. Such frameworks are specialized, with pre-cooked libraries for all the typical automation tasks for the technologies they support. They cannot perform low-level tasks like installing an OS, but can fully automate app server and application-level tasks.
- The second kind of automation framework is the generic kind. These frameworks are not specialized for any technology and can be scripted to perform any task, all the way from installing an OS on (virtual) bare metal to patching an application. They require much more work upfront, but can handle any kind of task. The two main players in this space are Chef and Puppet. More on them in a future post.
Business at the speed of DevOps
Infrastructure as Code has hence become the cornerstone that allows for the speed DevOps demands, and for the management of multiple versions of multiple environments needed to handle the CI builds being spun out by development. Without it, Ops becomes what puts the water (or fall) in water-SCRUM-fall.