Using Ansible with a jump host

For an OpenStack project using Ansible, I recently had to figure out how to make Ansible work with a jump host. After an initial phase of total confusion, I finally found my way through the documentation and various sources and ended up with several working configurations. This post documents what I have learned on that journey to hopefully make your life a bit easier.

Setup

In my previous posts on Ansible, I have used a rather artificial setup – a set of hosts which all expose the SSH port on the network so that we can connect directly. In the real world, hosts are often hidden behind firewalls, and a pattern that you will see frequently is that only one host in a network – called a jump host or a bastion host – can be reached directly via SSH, and you need to SSH into all other hosts from there.

To be able to experiment with this situation, let us first create a lab environment which simulates this setup on Google’s cloud platform (but any other cloud platform that has a concept of a VPC should do as well).

First, we need a project in which our resources will live. For this lab, create a new project called terraform-project with a project ID like terraform-project-12345 (of course, you will not be able to use the exact same project ID as I did, as project IDs are supposed to be unique), for instance from the Google Cloud console under “Manage Resources” in the IAM & Admin tab.

Next, create a service account for this project and assign the role “Compute Admin” to this account (which is definitely not the most secure choice and clearly not advisable for production). Create a key for this service account, download the key in JSON format and store it as ~/gcp_terraform_service_account.json.
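
If you prefer the command line over the console, the gcloud CLI can achieve the same – here is a sketch, in which the service account name terraform-sa is just an example, and you would have to use your own project ID:

gcloud iam service-accounts create terraform-sa \
    --project=terraform-project-12345
gcloud projects add-iam-policy-binding terraform-project-12345 \
    --member="serviceAccount:terraform-sa@terraform-project-12345.iam.gserviceaccount.com" \
    --role="roles/compute.admin"
gcloud iam service-accounts keys create ~/gcp_terraform_service_account.json \
    --iam-account=terraform-sa@terraform-project-12345.iam.gserviceaccount.com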

In addition, you will need a private / public SSH key pair. You can reuse an existing key or create a new one using

ssh-keygen -P "" -b 2048 -t rsa -f ~/.ssh/gcp-default-key

Now we are ready to download and run the Terraform script. To do this, open a terminal on your local PC and enter

git clone https://github.com/christianb93/ansible-samples
cd ansible-samples/jumphost
terraform init
terraform apply -auto-approve

When opening the Google Cloud Console after the script has completed, you should be able to verify that two virtual networks with two machines on them have been created, with a topology as summarized by the following diagram.

[Figure: SSH jump host lab setup]

So we see that there is a target host which is connected to a private network only, and a jump host which has a public IP address and is attached to a public network.

One more hint: when playing with SSH, keep in mind that on the Ubuntu images used by GCE, sshguard is installed by default. It monitors the SSH log files and, if something that looks like an attack is identified, inserts a firewall rule into the filter table which blocks all incoming traffic (including ICMP) from the machine from which the suspicious SSH connections came. As playing around with some SSH features might trigger an alert, the Terraform setup script removes sshguard from the machines upon startup (though there would of course be smarter ways to deal with this, for instance adding our own IP to the sshguard whitelist).
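
If you wanted to keep sshguard running instead, whitelisting would look roughly as follows – note that the location of the whitelist file is the one used by the Ubuntu package and might differ on other distributions, and you would of course replace 1.2.3.4 with your own public IP address.

echo "1.2.3.4" | sudo tee -a /etc/sshguard/whitelist
sudo systemctl restart sshguard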

The SSH ProxyCommand feature

Before talking about SSH and jump hosts, we first have to understand some features of SSH (and when I say SSH here and in the following, I mean OpenSSH) that are relevant for such a configuration. Let us start with the ProxyCommand feature.

In an ordinary setup, SSH will connect to an SSH server via TCP, i.e. it will establish a TCP connection to port 22 of the server and will start to run the SSH protocol over this connection. You can, however, tell SSH to operate differently. In fact, SSH can spawn a process and write to STDIN of that process instead of writing to a TCP connection, and similarly read from STDOUT of this process. Thus we replace the TCP connection as communication channel by an indirect communication via this proxy process. In general, it is assumed that the proxy process in turn will talk to an SSH server in the background. The ProxyCommand flag tells SSH to use a proxy process to communicate with a server instead of a TCP connection and also how to start this process. Here is a diagram displaying the ordinary connection method (1) compared to the use of a proxy process (2).

[Figure: Direct SSH connection (1) vs. connection via a proxy process (2)]

To see this feature in action, let us play a bit with it using netcat. Netcat is an extremely useful tool which can establish a connection to a socket or listen on a socket, send its own input to this socket and print out whatever it sees on the socket.

Let us now open a terminal and run the command

nc -l 1234

which will ask netcat to listen for incoming connections on port 1234. In a second terminal window, run

ssh -vvv \
    -o "ProxyCommand nc localhost 1234" \
    test@172.0.0.1

Here the IP address at the end can be any IP address (in fact, SSH will not even try to establish a connection to this IP, as it uses the proxy process to communicate with the apparent server). The flag -vvv has nothing to do with the proxy, but just produces more verbose output so that we can better see what is going on. Finally, the ProxyCommand option tells SSH to use nc localhost 1234 as proxy process, i.e. an instance of netcat connecting to port 1234 (and thus to the instance of netcat in our second terminal).

When you run this, you should see a string similar to

SSH-2.0-OpenSSH_7.6p1 Ubuntu-4

on the screen in the second terminal, and in the first terminal, SSH will seem to wait for something. Copy this string and insert it again in the second terminal (in which netcat is running) below this output. At this point, some additional output should appear, starting with some binary garbage and then printing a few strings that seem to be key types.

This is confusing, but actually expected – let us try to understand why. When we start the SSH client, it will first launch the proxy process, i.e. an instance of netcat – let us call this nc1. This netcat instance receives localhost and 1234 as parameters, so it will happily try to establish a connection to port 1234 on localhost. As our second netcat instance – which we call nc2 – is listening on this port, nc1 will connect to nc2.

From the client’s point of view, the channel to the SSH server is now established, and the protocol version exchange as described in section 4.2 of RFC 4253 starts. Thus the client sends a version string – the string you see appearing first in the second terminal – to its communication channel. In our case, this is STDIN of the proxy process nc1, which takes that string and sends it to nc2, which in turn prints it on the screen.

The SSH client is now waiting for the server’s version string as the response. When we copied that string into the second terminal, we provided it as input to nc2, which in turn sent it to nc1, where it was printed on STDOUT of nc1. The SSH client sees this data as coming across the communication channel, i.e. from our fictitious SSH server. The client is happy and continues with the next phase of the protocol – the key exchange (KEX) phase described in section 7 of the RFC. Thus the client sends a list of the algorithms that it supports, in the packet format specified in section 6, and this is the garbage followed by some strings that we see. Nice…
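
If you do not want to copy and paste the version string manually, you can also let netcat send a fake version string right away. Stop both processes and restart the listening side as follows (the exact software version in the string does not really matter, as long as the format prescribed by the RFC is respected).

printf 'SSH-2.0-FakeServer\r\n' | nc -l 1234

When you now repeat the SSH command in the other terminal, the client will accept this as the server’s version string and proceed to the key exchange immediately, so the KEX packet shows up without any manual intervention (the session will of course still fail eventually, as there is no real server behind it).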

STDIO forwarding with SSH

Let us now continue to study a second feature of OpenSSH called standard input and output tunneling which is activated with the -W switch.

The man page is a bit short at this point, stating that this switch requests that standard input and output on the client be forwarded to host on port over the secure channel. Let us try to make this a bit clearer.

First, when you start the SSH client, it will, like any interactive process, connect its STDOUT and STDIN file descriptors to the terminal. When you use the option -W, no command will be executed on the SSH server; instead, the connection will remain open and the SSH daemon will establish a connection from the SSH server to the remote host and port that you specify along with the -W switch. Any input that you provide to the SSH client will then travel across the SSH connection and be fed into that second connection, whereas anything received from the remote host over this connection is sent back via the SSH connection to the client – a bit like a remote version of netcat.

[Figure: STDIO forwarding with ssh -W]

Again, let us try this out. To do this, we need a host and port combination that we can use with -W and that will provide some meaningful output. I have decided to use httpbin.org which, among other things, will give you back your own IP address. Let us first try this locally. We will use the shell’s built-in printf statement to prepare an HTTP GET request and feed that into netcat, which will connect to httpbin.org, send our request and read the response.

$ printf "GET /ip HTTP/1.0\r\n\r\n" | nc httpbin.org 80
HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Tue, 17 Dec 2019 18:20:20 GMT
Referrer-Policy: no-referrer-when-downgrade
Server: nginx
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
Content-Length: 45
Connection: Close

{
  "origin": "46.183.103.8, 46.183.103.8"
}

The last part is the actual body data returned with the response, which is our own IP address in JSON format. Now replace our local netcat with its remote version implemented via the SSH -W flag. If you have followed the setup described above, you will have provisioned a remote host in the cloud which we can use as SSH target, and a user called vagrant on that machine. Here is our example code.

$ printf "GET /ip HTTP/1.0\r\n\r\n" | ssh -i ~/.ssh/gcp-default-key -W httpbin.org:80 vagrant@34.89.221.226
HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Tue, 17 Dec 2019 18:22:17 GMT
Referrer-Policy: no-referrer-when-downgrade
Server: nginx
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
Content-Length: 47
Connection: Close

{
  "origin": "34.89.221.226, 34.89.221.226"
}

Of course, you will have to replace 34.89.221.226 with the public IP address of your cloud instance, and ~/.ssh/gcp-default-key with your private SSH key for this host. We see that this time, the IP address of the host is displayed. What happens is that SSH makes a connection to this host, and the SSH server on the host in turn reaches out to httpbin.org, opens a connection on port 80, sends the string received via the SSH client’s STDIN to the server, gets the response back and sends it back over the SSH connection to the client where it is finally displayed.

TCP/IP tunneling with SSH

Instead of tunneling standard input / output through an SSH connection, SSH can also tunnel a TCP/IP connection, using the flag -L. In this mode, you specify a local port, a remote host (reachable from the SSH server) and a remote port. The SSH daemon on the server will then establish a connection to the remote host and remote port, and the SSH client will listen on the local port. If a connection is made to the local port, the connection will be forwarded through the SSH tunnel to the remote host.

[Figure: TCP/IP tunneling with ssh -L]
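
As a quick illustration, we can redo our httpbin.org example with this mechanism. The first command below – again using the jump host from our setup – makes port 80 of httpbin.org available as local port 8080. Running the request against this local port in a second terminal should then return the public IP address of the jump host.

ssh -i ~/.ssh/gcp-default-key \
  -L 8080:httpbin.org:80 \
  vagrant@34.89.221.226

printf "GET /ip HTTP/1.0\r\n\r\n" | nc localhost 8080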

There is a very similar switch -R which establishes the same mechanism, but with the role of client and server exchanged. Thus the client will connect to the specified target host and port, and the server will listen on a local port on the server. Incoming connections to this port will then be forwarded via the tunnel to the connection held by the client.
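
Here is a sketch of this reverse variant with our setup, which would make the SSH daemon on your local machine reachable via port 2222 on the jump host – this assumes that an sshd is actually running locally, and keep in mind that by default, the server binds the forwarded port to the loopback interface only, unless the GatewayPorts option is enabled in its configuration.

ssh -i ~/.ssh/gcp-default-key \
  -R 2222:localhost:22 \
  vagrant@34.89.221.226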

Putting it all together – the five methods to use SSH jump hosts

We now have all the tools in our hands to use jump hosts with SSH. It turns out that these tools can be combined in five different ways to achieve our goal (and there might be even more, if you are creative enough). Let us go through them one by one.

Method one

We could, of course, simply place the private SSH key for the target host on the jump host and then use ssh to run ssh on the jump host.

scp -i ~/.ssh/gcp-default-key \
    ~/.ssh/gcp-default-key \
    vagrant@34.89.221.226:/home/vagrant/key-on-jump-host
ssh -t -i ~/.ssh/gcp-default-key \
  vagrant@34.89.221.226 \
    ssh -i key-on-jump-host vagrant@192.168.178.3

You will have to adjust the IP addresses to your setup – 34.89.221.226 is the public IP address of the jump host, and 192.168.178.3 is the IP address under which our target host is reachable from the jump host. Also note the -t flag which is required to make the inner SSH process feel that it is connected to a terminal.

This simple approach works, but has the major disadvantage that it forces you to store the private key on the jump host. This makes your jump host a single point of failure in your security architecture – not a good thing, as this host is typically exposed at a network edge. In addition, this can quickly undermine any serious attempt to establish a central key management in an organisation. So we are looking for methods that allow you to keep the private key on the local host.

Method two

To achieve this, there is a method which seems to be the “traditional” approach that you can find in most tutorials, using a combination of netcat and the ProxyCommand flag. Here is the command that we use.

ssh -i ~/.ssh/gcp-default-key \
    -o "ProxyCommand \
      ssh -i ~/.ssh/gcp-default-key vagrant@34.89.221.226 \
      nc %h 22" \
    vagrant@192.168.178.3

Again, you will have to adjust the IP addresses in this example as explained above. When you run this, you should be greeted by a shell prompt on the target host – and we can now understand why this works. SSH will first run the proxy command on the client machine, which in turn will invoke another “inner” SSH client establishing a session to the jump host. In this session, netcat will be started on the jump host and connect to the target host.

We have now established a direct channel from standard input / output of the second SSH client – the proxy process – to port 22 of the target host. Using this channel, the first “outer” SSH client can now proceed, negotiate versions, exchange keys and establish the actual SSH session to the target host.

[Figure: Jump host connection via netcat as proxy command]

It is interesting to use ps on the client and the jump host and netstat on the jump host to verify this diagram. On the client, you will see two SSH processes, one with the full command line (the outer client) and a second one, spawned by the first one, representing the proxy command. On the jump host, you will see the netcat process that the SSH daemon sshd has spawned, and the TCP connection to the SSH daemon on the target host established by the netcat process.
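
Here is a sketch of the commands that I used for this verification – the exact output will of course vary, and on newer distributions you might have to use ss instead of netstat.

# On the client: shows the outer SSH client and the spawned proxy process
ps axf | grep ssh
# On the jump host: shows the netcat process spawned by sshd
ps axf | grep "nc "
# On the jump host: shows the TCP connection from netcat to port 22 of the target
sudo netstat -tpn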

There is one more mechanism being used here – the symbol %h in the proxy command is an example of what the man page calls a token – a placeholder which is replaced by SSH at runtime. Here, %h is replaced by the host name to which we connect (in the outer SSH command!), i.e. the name or IP address of the target host. Instead of hardcoding the port number 22, we could also have used the %p token for the port number.
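
Tokens are especially useful in the SSH configuration file, where a single entry with a host pattern can cover all hosts on the private network – a sketch, assuming that the private network is 192.168.178.0/24 as in our setup:

Host 192.168.178.*
  User vagrant
  IdentityFile ~/.ssh/gcp-default-key
  ProxyCommand ssh -i ~/.ssh/gcp-default-key -W %h:%p vagrant@34.89.221.226

With this entry in place, a plain ssh 192.168.178.3 is all that is needed to reach the target host.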

Method three

This approach still has the disadvantage that it requires the extra netcat process on the jump host. To avoid this, we can follow the same approach using a stdin / stdout tunnel instead of netcat.

ssh -i ~/.ssh/gcp-default-key \
    -o "ProxyCommand \
      ssh -i ~/.ssh/gcp-default-key -W %h:%p vagrant@34.89.221.226" \
    vagrant@192.168.178.3

When you run this and then inspect processes and connections on the jump host, you will find that the netcat process is gone, and that the connection from the jump host to the target is initiated by a (child process of the) SSH daemon running on the jump host.

[Figure: Jump host connection via a stdio tunnel]

This method is described in many sources and also in the Ansible FAQs.

Method four

Now let us turn to the fourth method, which is available in recent versions of OpenSSH – the ProxyJump directive that can be used in the SSH configuration. This approach is very convenient when working with SSH configuration files, so let us take a closer look at it. Create an SSH configuration file (typically this is ~/.ssh/config) with the following content:

Host jump-host
  HostName 34.89.221.226
  IdentityFile ~/.ssh/gcp-default-key
  User vagrant 

Host target-host
  HostName 192.168.178.3
  IdentityFile ~/.ssh/gcp-default-key
  User vagrant 
  ProxyJump jump-host

To test this configuration, simply run

ssh target-host

and you should be taken directly to a prompt on the target host. What actually happens here is that SSH looks up the configuration for the host that you specify on the command line – target-host – in the configuration file. There, it will find the ProxyJump directive, referring to the host jump-host. SSH will follow that reference, retrieve the configuration for this host from the same file and use it to establish the connection.

It is instructive to run ps axf in a second terminal on the client after establishing the connection. The output of this command on my machine contains the following two lines.

  49 tty1     S      0:00  \_ ssh target-host
  50 tty1     S      0:00      \_ ssh -W [192.168.178.3]:22 jump-host

So what happens behind the scenes is that SSH simply starts a second session to open a stdin/stdout tunnel, as we have done manually before. Thus the ProxyJump option is nothing but a shortcut for what we did previously.

The equivalent of the ProxyJump directive in the configuration file is the switch -J on the command line. Using this switch directly without a configuration file has, however, the disadvantage that it is not possible to specify the SSH key to be used for connecting to the jump host. If you need this, you can always fall back to the -W option discussed above, which provides the same result.
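
Another way around this restriction is to load the key into your SSH agent first, so that both the connection to the jump host and the connection to the target host can pick it up – a sketch with the addresses from our setup (note that -J also accepts a comma-separated list of several jump hosts):

ssh-add ~/.ssh/gcp-default-key
ssh -J vagrant@34.89.221.226 vagrant@192.168.178.3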

Method five

Finally, there is another method that we could use – TCP/IP tunneling. On the client, we start a first SSH session that will listen on local port 8222 and establish a connection to port 22 of the target host.

ssh -i ~/.ssh/gcp-default-key \
  -L 8222:192.168.178.3:22 \
  vagrant@34.89.221.226

Then, in a second terminal on the client, we use this port as the target for an SSH connection. The connection request will then go through the tunnel, and we will actually establish a connection to the target host, not to the local host.

ssh -i ~/.ssh/gcp-default-key \
    -p 8222 \
    -o "HostKeyAlias 192.168.178.3" \
    vagrant@127.0.0.1

Why do we need the additional option HostKeyAlias here? Without this option, SSH will take the target host specified on the command line, i.e. 127.0.0.1, and use this host name to look up the host key in the database of known host keys. However, the host key it actually receives during the attempt to establish a connection is the host key of the target host. Therefore, the keys will not match, and SSH will complain that the host key is not known. The HostKeyAlias 192.168.178.3 instructs SSH to use 192.168.178.3 as the host name for the lookup, and SSH will find the correct key.
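
You can perform this lookup yourself using ssh-keygen, which has a switch -F to search the known hosts file for a given name. Assuming that you have connected to our two hosts before, so that their keys have been recorded, the following commands should print the matching entries.

ssh-keygen -F 34.89.221.226
ssh-keygen -F 192.168.178.3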

Ansible configuration with jump hosts

Let us now discuss the configuration needed in Ansible to make this work. As explained in the Ansible FAQs, Ansible has a configuration parameter ansible_ssh_common_args that can be used to define additional parameters to be added to the SSH command used to connect to a host. In our case, we could set this variable as follows.

ansible_ssh_common_args:  '-o "ProxyCommand ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/gcp-default-key -W %h:%p vagrant@34.89.221.226"'

Here the first few options are used to avoid issues with unknown or changed SSH host keys (you can also set them for the outer SSH connection if this is not yet done in your ansible.cfg file). There are several ways to set this variable. Like all variables in Ansible, it is a per-host variable, which we could set in the inventory or (as the documentation suggests) in a group_vars folder on the file system.
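
For instance, a minimal static inventory in YAML format could look like this – a sketch, again with the addresses from my setup, which you would have to replace:

all:
  hosts:
    target-host:
      ansible_host: 192.168.178.3
      ansible_user: vagrant
      ansible_ssh_private_key_file: ~/.ssh/gcp-default-key
      ansible_ssh_common_args: '-o "ProxyCommand ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/gcp-default-key -W %h:%p vagrant@34.89.221.226"'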

In my repository, I have created an example that uses the technique explained in an earlier post to build an inventory dynamically by capturing Terraform output. In this inventory, we set the ansible_ssh_common_args variable as above to be able to reach our target host via the jump host. To run the example, follow the initial configuration steps as explained above and then do

git clone https://github.com/christianb93/ansible-samples/
cd ansible-samples/jumphost
terraform init
ansible-playbook site.yaml

This playbook will not do anything except running Terraform (which will create the environment if you have not done this yet), capturing the output, building the inventory and connecting once to each host to verify that they can be reached.

There are many more tunneling features offered by SSH which I have not touched upon in this post – X-forwarding, for instance, or device tunneling which creates tun and tap devices on the local and the remote machine and therefore allows us to easily build a simple VPN solution. You might want to play with the SSH man pages (and your favorite search engine) to find out more on those features. Enjoy!
