Ansible and using a bastion server to access AWS EC2 instances

Andrew, our DevOps engineering lead, talks about his experiences with Ansible and using a bastion server to access AWS EC2 instances while working on our infrastructure development. This post is one of our #TechTalk posts, a series in which we’ll be focusing on different areas of our work in the coming months.

Part One: introduction

As I’ve often said, “security is a pain”. But alas, the world is a hateful place full of bad people wanting to steal your stuff. Actually, it’s not. But for the time being let’s assume it is. Let’s also assume that you haven’t yet moved all your applications to the sunny uplands of serverless or managed Kubernetes clusters. Rather, you have a series of EC2 instances that you manage with Ansible.

You could access each host directly, connecting to each via ssh from your desktop. But this has three main disadvantages.

1. Managing DNS/IP: Somewhere, someone has to maintain a list of these EC2 hosts and their addresses. Sure, you can have some system that puts them into Route 53’s DNS for you, but that leads to the second problem.

2. Elastic IP: Static IP addresses that are accessible over the internet are a resource that costs money, and sooner or later you can expect to run out of them.

3. Security: Trying to secure and audit a series of hosts can be tiresome.

A far better approach is to secure one host and have it act as a gateway to the others, restricting access to the rest of your hosts to this single machine. This host is your bastion, and only it requires an Elastic IP address, a single DNS record in Route 53 and external ssh access. (You should even whitelist access if you are able.)
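If you do whitelist, that can be a single ingress rule on the bastion’s security group. A minimal sketch with the AWS CLI (the group ID and CIDR range here are placeholders for your own values):

# allow ssh to the bastion only from a known address range
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 22 \
    --cidr 203.0.113.0/24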

You may be thinking that this bastion server is just a stepping stone into your collection of EC2 instances, whereby you open a shell on the bastion host and then ssh again to your final destination. A “jump box”, if you will, in the old parlance. While this is still possible, it misses the point.

The bastion server should be the most minimal installation possible, running nothing more than ssh, if it can be helped, to reduce its security exposure. Further, this two-step approach is no good if we want to connect to our EC2 instances from Ansible (or, heaven help us, Jenkins). Consider the following ~/.ssh/config file:

# Proxy all ec2-* hosts, except the bastion itself, through the bastion
Host ec2-* !ec2-bastion
   ProxyJump ec2-bastion
# Common settings for every ec2-* host, bastion included
Host ec2-*
   User ec2-user
   StrictHostKeyChecking no
   Hostname %h.example.net
   UserKnownHostsFile /dev/null
   IdentityFile ~/.ssh/ec2.pem

This does the things we might expect for accessing our EC2 hosts (defines the private key to use, adds the domain name for us, etc.), but for all hosts except our bastion it proxies through the bastion to the desired destination using the ProxyJump setting. It does the second ssh step on the bastion host for us.

To Ansible, or anything else connecting via ssh, the connection is essentially transparent.
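As a quick illustration (ec2-dev-web here is a hypothetical host sitting behind the bastion), a plain ssh command from your desktop now behaves much like a manual jump:

ssh ec2-dev-web
# roughly equivalent to:
ssh -J ec2-bastion ec2-user@ec2-dev-web.example.net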

There is, however, one thing missing: from the bastion server the destination host is unknown. The DNS problem remains. But we’ll come back to that; assume for now that it’s working as intended.

Part Two: Ansible and the amazon.aws inventory plugin

You may be familiar with Ansible’s inventory file, which describes the hosts Ansible is going to manage. This was usually a simple INI or YAML file. It was also possible to use an executable file, with the inventory created from its standard output. This was great for dynamic inventories, and Amazon produced such a script in ec2.py.

Enter Ansible Core, and ec2.py was replaced with the amazon.aws inventory plugin. This is great, but the documentation is sketchy. The inventory it produces for Ansible depends on the settings you feed into the plugin.

Consider this aws_ec2.yml file:

 1 plugin: amazon.aws.aws_ec2
 2 use_extra_vars: yes
 3 filters:
 4   tag:Ansible:
 5   - "True"
 6 keyed_groups:
 7   - key: tags.Type
 8     separator: ''
 9   - key: tags.Env
10     separator: ''
11 hostnames:
12   - tag:Name

1. The filename must end in aws_ec2.yml (or aws_ec2.yaml).

2. Specify this file as your inventory in ansible.cfg; it’ll make life easier (see the sketch after this list).

3. All my Ansible-managed EC2 instances are tagged Ansible:True. The filter on lines 3–5 means all other (read: legacy) hosts are ignored. This, if nothing else, makes it worth moving from ec2.py even if you don’t have to.

4. All hosts are tagged with their Type (web, db, etc.) and Env (dev, uat, live, etc.). Lines 6–10 create Ansible groups for each of these.

5. Finally the host is referenced by its Name tag.
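To expand on point 2, a minimal ansible.cfg might look something like this (the inventory path is an assumption; enable_plugins just makes the plugin choice explicit):

[defaults]
inventory = ./aws_ec2.yml

[inventory]
enable_plugins = amazon.aws.aws_ec2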

Here is a simple output from ansible-inventory:

ansible-inventory --graph
@all:
 |--@aws_ec2:
 |  |--ec2-bastion
 |  |--ec2-dev-db
 |  |--ec2-dev-web
 |  |--ec2-monitor
 |--@bastion:
 |  |--ec2-bastion
 |--@db:
 |  |--ec2-dev-db
 |--@dev:
 |  |--ec2-dev-db
 |  |--ec2-dev-web
 |--@infra:
 |  |--ec2-bastion
 |  |--ec2-monitor
 |--@monitor:
 |  |--ec2-monitor
 |--@web:
 |  |--ec2-dev-web
 |--@ungrouped
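These keyed groups can then be targeted like any hand-written group. For example (site.yml is a hypothetical playbook name):

# ping everything in the dev environment
ansible dev -m ping
# limit a playbook run to hosts in both the dev and web groups
ansible-playbook site.yml --limit 'dev:&web'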

Part Three: that problem from earlier

So let’s get back to our bastion problem. Currently we have an ssh configuration that expects us to be able to ssh to any of our hosts using ec2-bastion as a proxy. But if we try to ssh to ec2-monitor, for example, our ssh session times out on the bastion, as the session doesn’t know how to get to ec2-monitor from ec2-bastion. (I’m assuming here that ec2-monitor’s security group allows ssh from ec2-bastion inside their mutual VPC.)

In short, we want to tell bastion about all its new friends.

For this we install a hosts file on the bastion containing the following template fragment:

{% for host in groups['aws_ec2'] %}
{{hostvars[host].private_ip_address}} {{host}} {{host}}.{{domain}}
{% endfor %}

This template creates the lookup for the EC2 hostnames (and hostname.domain for completeness) using the internal IP address of each instance managed by Ansible.
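Rendered against the inventory above, the result looks something like this (the private IP addresses, and the domain, are of course illustrative):

10.0.1.5  ec2-bastion ec2-bastion.example.net
10.0.1.10 ec2-dev-db ec2-dev-db.example.net
10.0.1.11 ec2-dev-web ec2-dev-web.example.net
10.0.1.12 ec2-monitor ec2-monitor.example.net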

I leave it as an exercise for the reader to construct a playbook that deploys this template fragment into the bastion’s /etc/hosts file.
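That said, here is a minimal sketch of one way to do it, assuming the fragment is saved as hosts.j2 next to the playbook and that a domain variable is defined somewhere such as group_vars (the task name and marker text are my own choices):

- hosts: bastion
  become: true
  tasks:
    - name: Publish internal EC2 addresses on the bastion
      ansible.builtin.blockinfile:
        path: /etc/hosts
        # The template lookup renders the fragment on the controller,
        # where hostvars for every host in the aws_ec2 group are available.
        block: "{{ lookup('template', 'hosts.j2') }}"
        marker: "# {mark} ANSIBLE MANAGED EC2 HOSTS"

blockinfile is used here rather than template so the managed block can be updated in place without clobbering the rest of /etc/hosts.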

There you have it:

  • Ansible access to each EC2 instance using transparent ssh proxy via a bastion server.
  • Only one Elastic IP required.
  • Reduced maintenance of DNS records.
  • Reduced security surface area.

I hope you find this useful.

Andrew also posts on his Medium account – the original version of this post can be found there.