The Right Way to DevOps with Terraform and Ansible

Cloud-native applications have become the norm these days. Developers and operations engineers are continuously finding ways to improve the quality and speed of deploying and maintaining these applications.

It started with folks writing bash scripts, then moving to scripting languages like Python, and then on to configuration management and infrastructure as code tools.

The philosophy behind these Infrastructure as Code and Configuration Management tools is to treat your infrastructure as programmable assets. That also means use of version control systems like Git. I am pretty sure there is no need to convince anyone today about the benefits of version control.

Among all the tools in this space, Terraform and Ansible stand out the most. Let’s look at the reasons behind that.

Terraform is an Open Source Infrastructure as Code Orchestration tool. The key capabilities are:

Multi-Cloud: It works with all the major cloud providers, such as AWS, Azure and GCP. It also supports many other “providers”.
Client-Only Architecture: It has a simple client-only architecture that eliminates the need to install or maintain servers or agents.
Declarative Code: It provides a high-level configuration language that is declarative in nature. You use it to express the final desired state, and Terraform figures out how to deploy it.
Immutable Code: When you modify the template, Terraform calculates the difference between the existing state of your infrastructure and the desired state, and creates or recreates only the affected resources.

Sample Terraform Code

resource "aws_instance" "webserver" {
  count = 5
  ami = "ami-063aa838bd7631e0b"
  instance_type = "t2.micro"
  subnet_id = "subnet-12345abc"
}

This Terraform code deploys 5 instances of size t2.micro on AWS within the subnet “subnet-12345abc”. (You will have to supply your AWS access key and secret key, along with the region, to deploy this.) Follow this to learn how to set up Terraform - https://www.terraform.io/intro/index.html

Ansible is an Open Source configuration and application deployment tool. The key capabilities are:

Multi-Cloud: It works with all the major cloud providers too.
Client-Only Architecture: It doesn’t require any agent or server, unlike Chef/Puppet. It leverages SSH to make remote calls to the servers to configure them.
Procedural Code: Ansible playbooks are essentially templates written in YAML that read like plain English. They give the user more control to describe the steps that Ansible should take to reach the desired state.
Idempotent: Executing the template multiple times produces the same result.

Sample Ansible Code

- ec2:
      count: 5
      image: ami-063aa838bd7631e0b   
      instance_type: t2.micro
      vpc_subnet_id: subnet-12345abc

This Ansible code also deploys 5 instances (Ubuntu VMs) of size t2.micro on AWS. Follow this to learn how to set up Ansible - https://www.ansible.com/resources/get-started

At first glance both these tools look pretty similar.

Both tools offer:

Multi-Cloud capabilities
Simple client-only architecture
Easy to write templates
The ability to execute commands on remote instances
a. Terraform leverages remote-exec or, in some cases, local-exec - https://www.terraform.io/docs/provisioners/remote-exec.html
b. Ansible uses SSH to log in to the instance and execute commands
Seamless integration with CI/CD tools

The right architecture is to leverage both Terraform and Ansible in your DevOps environment. Terraform should be used for provisioning of Cloud infrastructure and Ansible for Configuration management.

In the next section, we will look at the following

Benefits of Stateful Architecture and Declarative Code of Terraform in Cloud Infrastructure provisioning
Benefits of Procedural Code and conditional statements of Ansible in Application configuration

Reasons for this architecture

We will use some sample code and an app - fitcycle to understand it better.

In addition we will specifically run this against AWS.

You can find the Terraform and Ansible Templates on Github at - https://github.com/apperati/vcs-fitcycle-deployer

1. Stateful architecture of Terraform

Let’s look at how the Terraform code is organized. It consists of the following files.

terraform.tf - This file contains the template that describes all the resources that need to be deployed. This is agnostic to the cloud account and region.
provider.tf - This file contains the default cloud-specific “ACCESS and SECURITY” keys. You may also provide an alias name for the account and the region.
variables.tf - This file describes all the variables that are used within the terraform.tf file, for example AMI IDs, SSH keys, tags, etc.
terraform.tfvars - This file is used to attach values to the variables described in variables.tf. If the values are not provided here, then Terraform will prompt for them at the time of execution.
terraform.tfstate - This file contains the state of your cloud infrastructure after deployment. This file is referenced when modifying or destroying the resources.

Now, let’s refer back to the sample code from before (the Terraform sample would be part of terraform.tf). On the first run, both templates deploy 5 instances.

Sample Terraform Code

resource "aws_instance" "webserver" {
 count = 5
 ami = "ami-063aa838bd7631e0b"
 instance_type = "t2.micro"
 subnet_id = "subnet-12345abc"
}

Sample Ansible Code

- ec2:
      count: 5
      image: ami-063aa838bd7631e0b   
      instance_type: t2.micro
      vpc_subnet_id: subnet-12345abc

But let’s say the instance count has to be modified to 10.

Sample Terraform Code with Count = 10

resource "aws_instance" "webserver" {
  count = 10
  ami = "ami-063aa838bd7631e0b"
  instance_type = "t2.micro"
  subnet_id = "subnet-12345abc"
}

Sample Ansible Code with Count = 10

- ec2:
      count: 10
      image: ami-063aa838bd7631e0b   
      instance_type: t2.micro
      vpc_subnet_id: subnet-12345abc

During this run:

Terraform compares the existing state of the cloud infrastructure recorded in the “.tfstate” file with the intent specified in the “.tf” file (the template) and takes a diff. It then deploys only the resources that are needed. In this case, it deploys 5 additional instances to make a total of 10.

Ansible, on the other hand, deploys 10 additional instances, making a total of 15. This is because Ansible doesn’t maintain any state.

With Ansible, you would have to maintain another template to track the instances in case you want to destroy or delete them. It may also cause confusion about what the count should be: 10 or 15? With Terraform, all of this can be done with a single template.

The Ansible template can be modified to achieve an effect similar to Terraform’s by using “exact_count” and “count_tag”, as shown below.

- ec2:
      exact_count: 10
      count_tag: 
            Name: APP
      image: ami-063aa838bd7631e0b
      instance_type: t2.micro
      vpc_subnet_id: subnet-12345abc

With this template, Ansible counts the number of instances that are tagged with “Name = APP”. When the template is executed, Ansible takes a diff between “exact_count” and the number of tagged instances, and adds or deletes instances to make up the difference.
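
To make the difference between the three behaviors in this section concrete, here is a small conceptual sketch in plain Python. It is not Terraform or Ansible code, and none of the function names below exist in either tool; they are made up purely to illustrate the arithmetic described above.

```python
# Conceptual sketch only. These helpers are hypothetical and simply model
# the counting behavior described in the text; they call no real tool APIs.

def stateful_plan(recorded_count: int, desired_count: int) -> int:
    """Terraform-like: diff the desired count against the recorded state.

    A positive result means "create this many", negative means "destroy".
    """
    return desired_count - recorded_count


def stateless_launch(running_count: int, count: int) -> int:
    """Ansible 'count'-like: launch `count` more instances on every run."""
    return running_count + count


def exact_count_reconcile(instances: list, count_tag: dict, exact_count: int) -> int:
    """Ansible 'exact_count'-like: count instances carrying the tag, then diff."""
    matching = sum(
        1 for inst in instances if count_tag.items() <= inst["tags"].items()
    )
    return exact_count - matching


# Five instances already exist from the first run.
running = [{"tags": {"Name": "APP"}} for _ in range(5)]

to_create = stateful_plan(recorded_count=5, desired_count=10)
total_after_count = stateless_launch(running_count=len(running), count=10)
to_add = exact_count_reconcile(running, {"Name": "APP"}, exact_count=10)

print(to_create)          # 5 more instances, 10 in total
print(total_after_count)  # 15 in total
print(to_add)             # 5 more instances, 10 in total
```

Note that the last function has to go and count tagged instances on every run, whereas the first one only needs the recorded state.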

But this is not an ideal experience compared to Terraform when it comes to infrastructure deployment.

To summarize:

Terraform provides a simple way to deploy and manage state without extra “bits” for AWS
Ansible needs extra parameters to “mimic” Terraform for AWS

2. Declarative Code paradigm

AWS/Azure/GCP release new services at a rapid pace, and keeping up with the dependencies between these services can get overwhelming.

This is where the declarative code paradigm of Terraform has advantages. The resources / services can be described in any sequence within the template, and Terraform will figure out the order in which to deploy them. Below is an example from vcs-fitcycle-deployer.

[Figure: declarative_code]

As you can see above, the “aws_vpc” resource can be placed after the subnet resource (or anywhere in the template for that matter) and the template would still produce the same result. This is possible because at every execution, Terraform builds a graph of these resources and schedules their deployment based on dependencies.

The same wouldn’t be possible with Ansible because of its procedural nature. The onus would be on the user to describe the sequence in which the resources should be deployed to reach the desired state. This also demands knowledge of dependencies across services. A change in sequence may produce completely different results or, worse, the deployment may fail altogether.
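
The idea of building a graph and scheduling by dependency can be sketched in a few lines of Python. This is only an illustration of the concept, not how Terraform is implemented; the resource names (aws_vpc.main, aws_subnet.app, aws_instance.webserver) are hypothetical and do not come from the fitcycle templates.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical resources, listed in the "wrong" order on purpose.
# Each resource maps to the set of resources it refers to (depends on).
template = {
    "aws_instance.webserver": {"aws_subnet.app"},
    "aws_subnet.app": {"aws_vpc.main"},
    "aws_vpc.main": set(),
}


def deployment_order(resources: dict) -> list:
    """Return an order in which every resource follows its dependencies."""
    return list(TopologicalSorter(resources).static_order())


order = deployment_order(template)
print(order)
# ['aws_vpc.main', 'aws_subnet.app', 'aws_instance.webserver']
```

However the entries are arranged in the dictionary, the VPC comes out first, then the subnet, then the instance. With a procedural tool, that ordering is something the author of the template has to get right by hand.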

Let’s say your team wants to collaborate on a single template and someone needs to add a new instance/resource definition to it. With Terraform, all they have to do is declare it at the end of the template with a reference to the VPC to deploy it in. Simple!

But this is not true with procedural code in Ansible.

Hence, Terraform again is better suited for Infra provisioning.

3. Procedural Code Paradigm

Once this infra is up and running, the next phase is to run your application within those instances. That may involve various stages.

Below is the high-level architecture of the fitcycle app.

The steps to take in this scenario would be to configure in the following order:

Databases
Database LoadBalancer (HAProxy)
Application Services in HA configuration (Django and Flask in our case)
Web Server with Nginx

Ansible is perfectly suited for this scenario because of its procedural nature. It can install packages or copy new config files in the sequence mentioned above.

Let’s take a quick look at how the Ansible code is organized.

ansible.cfg - This file contains the configuration needed for Ansible. You may describe the plugins that will be used, the SSH mechanism, etc.
playbook.yaml - This is the main file which is executed by Ansible. It contains all the tasks or roles that need to be executed.
inventory/ - This directory contains all the YAML file(s) that describe the filters used to dynamically fetch the resources from your cloud account. Ansible’s dynamic inventory plugins are resource specific. For example, aws_ec2, aws_rds etc. let you gather a list of all resources that match filters like VPC, region, cloud tags etc.
roles/ - This directory contains all the roles. Roles allow playbook.yaml to be broken down into modular files that are easier to manage and compose. A role consists of tasks, templates, variables etc.

You might be wondering how Ansible knows which instances to install the packages on. It’s easy! Ansible also works with all major cloud providers. You can set tags on your instances and point Ansible at those tags to take specific action(s) on the matching instances.

In the example below, Ansible is looking for instances that are tagged with “Tier = {SOME_VALUE}”, as described in the inventory file.

For example:

Within the playbook file, configure_fitcycle.yml, we can see that the tags are described as “DBLB”, “APP” etc. This allows Ansible to target the instances that have a matching tag and then execute a specific “role” on them.

In our example, it’s looking for the “DBLB” role. It then leverages Jinja templates to build a new config file that uses the PRIVATE_IP_ADDRESS of the instances tagged as “DB”.
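
Conceptually, this is a two-step flow: group instances by their tag, then render a config file from the addresses in one group. The sketch below imitates that flow in plain Python. It is a stand-in for illustration only, since the real setup uses Ansible’s dynamic inventory and Jinja templates. The IP addresses, the backend name and port 3306 are all made-up values.

```python
# Conceptual sketch only: plain Python standing in for Ansible's dynamic
# inventory (grouping by tag) and a Jinja template (rendering a config).
# All instance data below is hypothetical.
instances = [
    {"tier": "DB", "private_ip": "10.0.1.10"},
    {"tier": "DB", "private_ip": "10.0.1.11"},
    {"tier": "DBLB", "private_ip": "10.0.2.10"},
    {"tier": "APP", "private_ip": "10.0.3.10"},
]


def group_by_tier(records: list) -> dict:
    """Group private IPs by the value of the 'Tier' tag."""
    groups = {}
    for record in records:
        groups.setdefault(record["tier"], []).append(record["private_ip"])
    return groups


def render_db_backend(db_ips: list, port: int = 3306) -> str:
    """Render an HAProxy-style backend section listing the DB instances."""
    lines = ["backend databases"]
    for index, ip in enumerate(db_ips, start=1):
        lines.append(f"    server db{index} {ip}:{port} check")
    return "\n".join(lines)


groups = group_by_tier(instances)
config = render_db_backend(groups["DB"])
print(config)
```

The rendered text would then be copied to the instances in the “DBLB” group, which is the part that the role takes care of in the real playbook.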

Apart from this, Ansible also has conditional statements like “when”, which provide more control (https://docs.ansible.com/ansible/latest/user_guide/playbooks_conditionals.html#applying-when-to-roles-imports-and-includes)

- hosts: "APP"
  roles:
     - role: debian_app_config
       when: ansible_facts['os_family'] == 'Debian'

Here the config is run only if the OS family is Debian. You can leverage this to perform quite complex actions. That’s why Ansible is well suited for configuration management.

Terraform has “remote-exec”, which can execute commands on instances once they are running, but it’s not as powerful as Ansible.

4. Day 2 Operations

Ansible is crucial in Day 2 operations. You can install patches or upgrade packages in place without affecting the infrastructure. It’s based on concepts similar to point #3.

With Terraform, this would typically result in re-creation of all the instances where these packages need to run, which might disrupt your services.

5. Quick Dev/Test

Let’s say you have an infra that contains 20 instances, RDS databases in HA, an S3 bucket and load balancers. While testing an application on this infra, the developer would ideally want to push the code to git and see the change take effect. If this were done with Terraform, it would typically mean deleting these resources and then recreating them. This can hurt developer productivity, as setting up this infra can take quite some time.

In this scenario, Ansible comes to the rescue. It can pull the code from git, install it on the required instances and run the tests.

This is another reason to use Ansible for configuration management.

Obviously, at the end of the day, the dev would still need to do end to end testing / deployment before pushing to production.

Conclusion

We can say that Terraform is perfect for Cloud Infrastructure Management and Ansible is perfect for Configuration Management.

You can follow the README on https://github.com/apperati/vcs-fitcycle-deployer to deploy your own infra.

You can also watch the YouTube video - The Right Way to DevOps with Terraform and Ansible - https://www.youtube.com/watch?v=AsPIKWF1y_M
