Vault was the first component of Hashi@home, the keys to the castle if you will.
As such, I configured it as simply as possible, just to get something up and running.
It is a testament to the quality of the software that the simple file-backed single instance worked for months without a hitch.
Once I had become comfortable with operating Vault and had started using it to store actual secrets, murmurs of anxiety began to creep up on me.
After all, the SD card the Raspberry Pi runs from isn’t the most reliable bit of hardware out there!
And so it was that I began to explore the various Vault storage backends.
Options
In the spirit of Hashi@Home, I wanted to select the option which reduced the tool surface area.
Of the options available at the time (mid-to-late 2020, or Vault 1.4.x), the only ones which seemed viable were Consul and S3.
The others either required forking out some cash, or introduced an unacceptable technology overhead.
Between the two, Consul was clearly the better choice: it was already running locally, an integrated part of the Hashi stack, and of course free of cost.
The Consul backend was duly enabled and, voilà, Vault was now storing its secrets in the Consul KV store.
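The relevant stanza is a short one; something along these lines, assuming the default local Consul agent address (the values here are illustrative rather than my exact config):

```hcl
# Store Vault's data under the vault/ prefix in the local Consul agent's KV store
storage "consul" {
  address = "127.0.0.1:8500"
  path    = "vault/"
}
```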
Integrated storage
Starting in version 1.4, Vault supports integrated storage using the Raft protocol.
This represented a significant improvement from my point of view, because it removed the chicken-and-egg problem of which service to start first, Consul or Vault.
Vault could now be deployed on its own in high-availability mode, across several of the computatoms.
Indeed, this seemed like the perfect use case for the several Pi Zeros I had: since there is no Nomad client support for them, what better use than to act as voters in the Vault quorum?
Apart from the reduction in complexity that integrated storage brings, a further benefit is the increased resilience of the service against machine outages, thanks to the HA configuration.
The migration also gives me the ability to explicitly declare Vault’s service registration in Consul, so that the service can still be discovered there even though its data no longer lives in Consul.
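With the Consul storage backend that registration came for free; with raft it gets its own stanza, roughly like the following (the agent address is the assumed local default):

```hcl
# Register Vault as a Consul service for discovery, even though it no longer
# uses Consul for storage
service_registration "consul" {
  address = "127.0.0.1:8500"
}
```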
Fault tolerance
As it turns out, three of the Pi Zeros are attached to separate USB hubs. Yes, the power still comes through the house’s central AC, but at least I can turn them on and off independently, via the USB hub power switches.
It would be a nice way to demonstrate fault tolerance.
Applying the upgrade
Since I wanted Vault to have zero platform dependencies, its deployment would be defined as an Ansible playbook.
The first thing to do was to declare a new group in the inventory, vault_servers, which would include all the Pi Zeros and the Raspberry Pi 4 currently hosting the Sense HAT.
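In INI form the group looks something like the following; apart from sense and wallpi, the hostnames are placeholders:

```ini
[vault_servers]
# Raspberry Pi 4 hosting the Sense HAT, the intended raft leader
sense
# Pi Zero mounted on the wall
wallpi
# remaining Pi Zeros (placeholder names)
pizero2
pizero3
```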
Before applying anything, though, I made a snapshot of the Vault data to provide a backup in case of disaster.
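Since the data still lived in Consul’s KV store at this point, a Consul snapshot does the job (the filename is arbitrary):

```shell
# Back up the Consul datastore, which still holds Vault's data, before migrating
consul snapshot save vault-pre-raft-migration.snap
```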
Once this was done, I could apply the new configuration, via an Ansible template task in the role:
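A minimal sketch of that task; the paths, ownership and handler name are illustrative rather than my exact role:

```yaml
# Render the Vault server configuration from a Jinja2 template
- name: Deliver Vault server configuration
  ansible.builtin.template:
    src: vault.hcl.j2
    dest: /etc/vault.d/vault.hcl
    owner: vault
    group: vault
    mode: "0640"
  notify: Restart vault
```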
Since we do not assume the presence of Consul on these hosts, we cannot use consul-template to deliver the Vault configuration. Instead, it is delivered by an Ansible playbook run, which installs the desired version of Vault, the systemd units and the raft storage directories, as well as the templated configuration file.
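The template itself ends up looking roughly like this (addresses, ports and paths are illustrative, and TLS is still disabled at this stage):

```hcl
# Raft integrated storage: each node gets its own data path and node_id
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "{{ inventory_hostname }}"

  retry_join {
    # Hardcoded leader address: the Raspberry Pi 4, sense
    leader_api_addr = "http://sense:8200"
  }
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = true
}

api_addr     = "http://{{ inventory_hostname }}:8200"
cluster_addr = "http://{{ inventory_hostname }}:8201"
```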
Joining and unsealing new raft members
Shown in the Vault configuration template above is a hardcoded leader address: the Raspberry Pi 4, sense.
This is a bit of a special case, since I had decided a priori that this machine would be the Vault leader.
There would therefore not necessarily be an initial leader election for the Raft storage; the new raft members would simply join the existing one.
Upon starting the service on, e.g., the Pi Zero mounted on the wall (wallpi), we see the retry join of the raft cluster confirmed in the Vault log.
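The log can be followed via the systemd journal, assuming the unit installed by the playbook is simply called vault:

```shell
# Follow the Vault log on the new member
journalctl -u vault -f
```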
At this point, the vault is still sealed:
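A quick status check against the new node confirms it (the node address is illustrative):

```shell
# While sealed, vault status reports Sealed = true and exits with a non-zero code
VAULT_ADDR=http://wallpi:8200 vault status
```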
Using the unseal keys from the instance running on the Sense HAT machine, I unseal the service when it comes up:
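In practice that means pointing the CLI at the new node and repeating the unseal until the key threshold is reached (address illustrative):

```shell
# Provide one unseal key share per invocation, until the threshold is met
VAULT_ADDR=http://wallpi:8200 vault operator unseal
```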
At this point, we can see the new node in the raft cluster:
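For example, by listing the raft peers from the leader (address illustrative); wallpi should now be listed among the peers:

```shell
# Show the raft cluster membership as seen by the leader
VAULT_ADDR=http://sense:8200 vault operator raft list-peers
```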
Next steps
Now that the cluster is up, with secrets replicated across the Raft members, we have a somewhat more resilient and reliable setup.
The desired effect would be that even if the beefy leader went down, we would still be able to access our secrets via the other nodes, discovering them with the Consul DNS interface.
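That lookup is a one-liner against the local Consul agent, assuming the service registration described above is in place and DNS is served on the default port 8600:

```shell
# All Vault nodes registered in Consul
dig +short @127.0.0.1 -p 8600 vault.service.consul
# Tag query for just the current active node
dig +short @127.0.0.1 -p 8600 active.vault.service.consul
```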
This is not 100% foolproof though; to protect against true data loss, and not just a temporary outage, we need to back up raft snapshots so that we can restore from a known state in case of disaster.
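Taking such a snapshot is a single command against the active node; the destination path below is illustrative, and scheduling it is left for later:

```shell
# Save a point-in-time snapshot of the raft storage
VAULT_ADDR=http://sense:8200 vault operator raft snapshot save /backup/vault-$(date +%F).snap
```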
A further step will be to enable TLS for the listeners, using the Vault CA, which will be described in a future post.
For now, let us merely celebrate this small win and tell ourselves a small secret: