Very often I refer to Salt as one of the most suitable tools for managing large scale deployment - either servers or network devices, but not limited to. While on a server one can install the regular Minion directly, and there are already many well known success stories in the industry, unfortunately, we don’t have the same flexibility on traditional network gear, as the network operating systems are generally extremely limited (to the end user) and closed.

For this reasoning, we have the Salt Proxy Minion, which is a process capable to run anywhere, as long as: 1. it is able to contact the Master, and 2. it can connect to the remote network device. One Proxy Minion per device. The theory sounds good so far.

The challenge

As mentioned above, and it is good to remember that for one device managed, you need to start up a Proxy Minion process. The reader will immediately notice that for 1000s or network devices you need to start and maintain 1000s of processes. And this not particularly trivial… when you do it manually. :-) But, as you’ll see below, with the right approach, it can be reduced very easily to a 10 liner Jinja and YAML with very little effort.

To fully understand the potential solutions I will present below, it is good to understand the Salt basics and be familiar with different concepts such as SLS, Pillar or State. Seth House and I have expanded on these topics (and others ) in our Network Automation at Scale book published at O’Reilly.

In the examples below, I will assume that you have a source of truth (either a database which you can use to ingest data into Salt via the External Pillar, or simply a SLS Pillar as plain file YAML / JSON / etc.). In other words, I will assume that under the devices (can be any other name, of course) Pillar key you have the data for the list of (network) devices to be managed.

Similarly, to fully understand the examples, make sure you have understood how the location of the files suggested relates to the file_roots option in the Master config. All the examples assume that you can have, at least, the directory /etc/salt/states listed as one of the file roots in the base environment, e.g.,

file_roots:
  base:
    - /etc/salt
    - /etc/salt/states

This option is important for the Salt fileserver to access the files, including the States I will be using as examples.

A Partial Solution

A very frequent approach is starting up all the processes on a single server. It is perhaps the least optimal among the solution I included in this post, but the easiest to understand and implement. It is just fine as long as you don’t have a very large amount of devices to manage.

A Proxy Minion in general consumes about 60MB of memory, though this can go sometimes beyond 80MB per process for SSH-based Proxies for which we didn’t find yet a multiprocessing solution (SSH-based Proxies are still limited to multithreading - the root cause goes somewhere into a library named Paramiko, and the subject is too broad to explore it right now). I guess is the price paid to receive the blessing of the event-driven automation, PCI compliance for free, a very flexible and powerful templating language and many others.

Anyway, either 60MB or 80MB of memory is not that much, and RAM is incredibly cheap nowdays. For example, if your network has 100 nodes, you only need 8GB of RAM, which - I assume any decent server should have. As a matter of fact, my old MacBook has 16GB RAM which could easily be used to automate a small to medium sized network, and I do know global networks much smaller than that; in other words, my personal computer would suffice to manage their entire infrastructure. :-)

Ingredients

The regular Minion to be installed and running on the server where the Proxy processes will be running.
(Derived from 1) Executing salt-call pillar.get devices on the regular Minion provides the list of devices to be managed. Again, this is up to you in what form, or from where to load this data (i.e., Pillar SLS file(s) or External Pillar, e.g., CSV file, SQL database, HTTP API etc.)
The service.running State function – check the documentation.

Leveraging the power of the SLS, the State is as simple as:

{%- for device in pillar.devices %}
Startup the Proxy for {{ device }}:
  service.running:
    - name: salt-proxy@{{ device }}
{%- endfor %}

Save this into an SLS file, say /etc/salt/states/start_proxies.sls, then all the Proxies can be started by simply executing salt-call state.apply start_proxies. This is the equivalent of you running N times systemctl start salt-proxy@<device_name> from the CLI of the server where N is the number of devices, and salt-proxy@<device_name> is the systemd service that manages the Proxy process for <device_name>.

That’s all. Both the command you execute and the content of the SLS will forever be the same regardless on how many devices you have: as data (which goes into the Pillar) is decoupled from the automation logic (into the State), you’ll only work with data from here on, i.e,, just update your source of truth.

A quick example

For completeness, I will also provide an example on how is the Pillar Top file structured, and the rest of the Pillars:

/etc/salt/pillar/top.sls

base:
  my-server:
    - all_my_network_devices
  netdev1:
    - netdev1_pillar
  netdev2:
    - netdev2_pillar

The regular Minion managing the server where the Proxy Minions will be running identified using the Minion ID my-server, should have (at minimum) the following Pillar data, as the list of network devices to be managed:

/etc/salt/pillar/all_my_network_devices.sls

devices:
  - netdev1
  - netdev2

Therefore, the my-server Minion will have the following data available:

$ salt 'my-server' pillar.get devices
my-server:
    - netdev1
    - netdev2

Executing salt-call state.apply start_proxies on the regular Minion, or from the Master, to invoke the start_proxies State earlier defined:

$ salt 'my-server' state.apply start_proxies
my-server:
----------
          ID: salt-proxy@netdev1
    Function: service.running
      Result: True
     Comment: Service salt-proxy@netdev1 has been enabled, and is running
     Started: 08:20:28.268712
    Duration: 791.067 ms
     Changes:
              ----------
              salt-proxy@netdev1:
                  True
----------
          ID: salt-proxy@netdev2
    Function: service.running
      Result: True
     Comment: Service salt-proxy@netdev2 has been enabled, and is running
     Started: 08:20:29.062585
    Duration: 781.164 ms
     Changes:
              ----------
              salt-proxy@netdev2:
                  True

Summary for my-server
------------
Succeeded: 2 (changed=2)
Failed:    0
------------
Total states run:     2
Total run time:   1.572 s

Distributed Proxies

While running all the Proxies on a single server can work fine, it does have the obvious limitation that you can’t run 1000s of processes on the same box, for a variety of reasons: not enough memory, single point of failure, limited number of available ports, and so on.

The same approach as above can be used to distribute the Proxies on multiple servers (not just one), by either inserting an additional check into the start_proxies.sls State, or play with the Pillar data distributed to each Minion. This depends very much on your environmental constraints, the type of the network, density of devices, the type of connection, and many other angles. There is no general solution, it really depends on you to figure out what’s the most suitable way for your network(s) and how to distribute the Proxy processes on what machines and where.

I will however provide a couple of very quick examples. It pretty much boils down to what Minions (servers) you spread the Pillar data.

A static, probably easier to follow approach

Consider we have two servers, and we want to have the Proxies distributed as follows: netdev1, netdev2, and netdev3 running on server1, while netdev4 and netdev5 on server2. In that case, we could have the Pillar Top file below:

base:
  server1:
    - proxies_on_server1
  server2:
    - proxies_on_server2

While the Pillars are:

/etc/salt/pillar/proxies_on_server1.sls

devices:
  - netdev1
  - netdev2
  - netdev3

/etc/salt/pillar/proxies_on_server2.sls

devices:
  - netdev4
  - netdev5

The State wouldn’t change in this case, as the automation logic remains the same, only the data has changed, therefore only the target is different: salt -L server1,server2 state.apply start_proxies which would provide a similar output as previously.

A more dynamic example

While the above is sufficient to understand the concept, I wouldn’t say it the way to go. First of all, you might want a smarter way to have the Pillar available everywhere than a static Top file. Secondly, the Pillar file itself doesn’t scale when it’s static. A good solution is having a single Pillar SLS (surely, an External Pillar would work very well too), that is available to all the Minions. In that case, the distribution of the Pillar data is made inside that SLS file. I know I keep repeating myself, but it’s good to remember that SLS is not just plain YAML, but much, much more powerful that that. For example:

/etc/salt/pillar/top.sls

base:
  'server*':
    - proxies_on_servers

That configuration in the Pillar Top file instructs Salt to render the proxies_on_servers.sls file and make it available to any Minion whose ID starts with server.

Consider the following Pillar SLS file:

/etc/salt/pillar/proxies_on_servers.sls

{%- set proxy_count = 100 %}
{%- set server_id = opts.id.replace('server', '') | int %}
{#
    get the ID of the server from the Minion ID
    server_id is therefore an integer
#}
{%- set proxy_start = (server_id - 1) * proxy_count %}
devices:
{%- for proxy_id in range(1, proxy_count+1) %}
  - netdev{{ proxy_start + proxy_id }}
{%- endfor %}

The above is a Pillar, yes: it provides the data to feed the start_proxies State. It is very custom for the particular case I exemplified earlier, the ID of the server (the server_id variable) being extracted from the Minion ID (opts.id). The for loop constructs the list of devices, given an arbitrary maximum number of Proxy Minions we start per server, e.g., 100.

Using the logic above, the Pillar data for server1 becomes a list of devices from 1 to 100, while the Pillar data for server2 is the list of devices from 101 to 200, and so on:

# salt server1 pillar.get devices
server1:
     - netdev1
     - netdev2
     ... snip ...
     - netdev99
     - netdev100
#
# salt server2 pillar.get devices
server2:
    - netdev101
    - netdev101
    ... snip ...
    - netdev199
    - netdev200

As mentioned above, the State SLS doesn’t need to change, as the automation logic is the same, decoupled from data.

Containers

Similarly to when running the Proxy Minion as system services, when managing them as (Docker) containers we can have them centralised or distributed to run in multiple places. The implementation is very similar, as only the state is slightly changed (the Pillars can remain the same as previously):

/etc/salt/states/docker_proxies.sls

{%- for device in pillar.devices %}
Startup the Docker container for {{ device }}:
  docker_container.running:
    - name: {{ device }}
    - image: 'mirceaulinic/salt-proxy:2017.7.5'
    - binds:
      - /etc/salt/proxy:/etc/salt/proxy
    - environment:
      - PROXYID: {{ device }}
{%- endfor %}

This would suffice to start the Docker containers for the local or remote Proxy Minions, when you need to manage 1, 100s or 1000s of network devices. In this example I used the Docker image I mentioned in one of my previous posts, mirceaulinic/salt-proxy. As the image I built expects the PROXYID environment variable, the state configures it with the name of the network device (Proxy Minion ID). The file /etc/salt/proxy is mounted to the same path inside the container.

All of the above assume that the targeted servers have all the dependencies for the docker_container State module properly installed.

You can probably implement the same using other flavours of containers, e.g., LXD, or other virtualisation systems.

Kubernetes

In-line with the previous paragraphs, Simon Metzger wrote a very nice blog post explaining how to manage your network using Kuberetes to orchestrate Proxy Minions as Docker containers. Kubernetes is similarly built for large scale applications, and there are plenty of good resources and blog posts on how to achieve that.

Cloud

No servers? No problem! I heard of several large organisations that have their entire automation platform moved into the cloud. I am not going to discuss whether this is a good idea, as it can be very subjective, and sometimes may even be a great idea!

Salt has very good support to help you manage cloud hosts from a variety of providers, including: Amazon AWS, Digital Ocean, Linode, Azure, HP Cloud, Google Compute Engine, etc.

The configuration is very easy; to have the right background, you can start reading from https://docs.saltstack.com/en/latest/topics/cloud/.

In this particular case I will be using Linode as my provider of choice - check the Salt Cloud documentation for Linode.

I have added the following configuration:

/etc/salt/cloud

start_action: state.highstate

/etc/salt/cloud.providers.d/linode.conf

my-linode-config:
  apikey: <my-api-key>
  password: <my-password>
  driver: linode

/etc/salt/cloud.profiles.d/minion1.conf

linode_minion1:
  provider: my-linode-config
  size: Nanode 1GB
  image: Debian 9
  location: London, England, UK

I have picked the smallest possible linode (Nanode 1GB) that would suffice to manage several network devices.

Executing: salt-cloud -p linode_minion1 server1 would bring up the new VM. When executing the command from the Salt Master, the key of server1 will automatically be signed. To make sure that the Proxy Minions are started once the new VM is up and running, have start_action: state.highstate in the cloud config file (/etc/salt/cloud), and the start_proxies included in the State Top file:

/etc/salt/states/top.sls

base:
  'server*':
    - start_proxies

Having start_proxies.sls referenced in the State Top file means that the State defined earlier is part of the Highstate.

With this simple setup, you can start up and manage multiple network devices without having to worry about maintaining your own infrastructure. This can be easily scaled to hundreds / thousands VMs managed by the Salt Cloud by moving the Cloud Profiles (i.e., the configuration from /etc/salt/cloud.profiles.d/) into the Pillar of a Salt Minion (see https://docs.saltstack.com/en/latest/topics/cloud/config.html#pillar-configuration) and the cloud Execution Module which can be used to spin as many instances as needed.

Which one is the best

In this post I put together a few approaches I could think of, or heard from the community. This list of possibilities is open ended, I do not exclude that other engineers might have though about other (better) implementations. Which one is the best? It depends: this is multi-dimensional problem that doesn’t have one single good answer - you have to evaluate which suits the best your needs, and fulfils the requirements. If you have another approach and you are happy to share it with the community, please do write about it, or contact me to discuss and I would gladly add it here.

PS: I have started writing this blog post a few months ago, and in the meanwhile I think I finally found a way to manage your network infrastructure without even having Minions – with some limitations, yet with many goodies Salt has to offer, including, of course, the event-driven infrastructure. I will soon have a dedicated post exploring this possibility!

Network automation using Salt for large scale deployments