Automating Storage and Management of Terraform Variables
One issue I've encountered with Terraform whilst collaborating on deploys as a team is how we manage the `.tfvars` files. They have essentially the same problem as `.env` files when developing an application locally: if someone adds a feature or introduces a new variable, it needs to be shared with the team.
If Person A updates the `.tfvars` file on their local machine and deploys the infrastructure, odd things can happen when Person B's local machine doesn't have the same Terraform variables.
The first question is where to store these. We began by storing them in 1Password, or sharing non-secret variables via chat, but it always required someone to manually update (and remember to update) the variables in the remote store. The worst-case scenario is that someone runs a `terraform apply` without realising they have out-of-date variables, and accidentally reverts important infrastructure.
The first and easiest part of this solution was to store them in an AWS S3 bucket, an Azure Key Vault, or whichever store the project's cloud provider offers. All we need to do is add the Terraform variables file as a resource to be stored. Here, I've added it as an Azure Key Vault Secret (in a Key Vault that has already been created):
resource "azurerm_key_vault_secret" "tfvars" {
name = "${terraform.workspace}-tfvars"
value = base64encode(file(var.tfvar_filename))
key_vault_id = azurerm_key_vault.tfvars.id
content_type = "text/plain+base64"
}
The great thing about doing this is that the remote store will always contain the latest Terraform variables, because it is updated every time a `terraform apply` is run.
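For completeness, a teammate can pull the latest variables back down with the Azure CLI. This is the same command the script later in this post prints in its error message; the vault and secret names here are placeholders:
az keyvault secret download \
  --file staging.tfvars \
  --encoding base64 \
  --vault-name <vault_name> \
  --name staging-tfvars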
However, it still required the next team member to remember to download the latest Terraform variables before deploying. So we had to find a way to prevent someone from inadvertently reverting the remote store.
I found the simplest way of achieving this would be to compare the updated date/time of the Key Vault Secret with the modified date/time of the Terraform variables file stored on a person's local machine, and use that comparison as a condition to decide whether or not to update the remote store. This is by no means foolproof, but it's better than nothing 🙂
Unfortunately, the `azurerm_key_vault_secret` resource doesn't expose the updated time, nor does the `local_file` data source expose the modified time. So I decided to use the `local-exec` provisioner to run a custom script. Provisioners are mostly a 'last resort'... and this felt like one of those times.
We can get the updated time (`attributes.updated`) of the Key Vault Secret using `az keyvault secret list --vault-name <vault_name>`, which produces something like this:
[
  {
    "attributes": {
      "created": "2021-11-03T08:27:39+00:00",
      "enabled": true,
      "expires": null,
      "notBefore": null,
      "recoveryLevel": "Recoverable+Purgeable",
      "updated": "2021-11-03T08:27:39+00:00"
    },
    "contentType": "text/plain+base64",
    "id": "https://<vault_name>.vault.azure.net/secrets/secret1",
    "kid": null,
    "managed": null,
    "managedAttributes": null,
    "name": "staging-tfvars",
    "vaultUrl": "https://<vault_name>.vault.azure.net"
  }
]
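To pull out just that field for a given secret, we can pipe the listing through `jq`. This is the same filter the script below uses, with the secret name taken from the example output above:
az keyvault secret list --vault-name <vault_name> \
  | jq -r '.[] | select(.name=="staging-tfvars") | .attributes.updated'
2021-11-03T08:27:39+00:00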
And we can simply get the timestamp of the file using `date`:
date -r staging.tfvars +%s
1691679699
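One portability note: converting the secret's ISO timestamp to seconds works differently on macOS/BSD versus Linux. The script below uses the BSD/macOS form, since that's what I was working on; the GNU coreutils equivalent shown here is an assumption on my part and isn't used in the script:
# BSD/macOS: parse an ISO timestamp (offset stripped) into seconds since epoch
date -j -f "%Y-%m-%dT%H:%M:%S" "2021-11-03T08:27:39" "+%s"
# GNU coreutils equivalent
date -d "2021-11-03T08:27:39" +%s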
I came up with this script, which:
- Takes the 'Vault Name', 'Secret Name' and 'tfvars Filename' as parameters
- Checks the Key Vault exists, exiting cleanly if it doesn't
- Checks the Key Vault Secret exists, exiting cleanly if it doesn't
- Gets the updated date/time of the secret, converted to seconds
- Gets the modified date/time of the local file, converted to seconds
- If the file's timestamp is older than the secret's, exits with an error
#!/bin/bash

# exit on failures
set -e
set -o pipefail

usage() {
  echo "Usage: $(basename "$0") [OPTIONS]" 1>&2
  echo "  -h                          - help"
  echo "  -v <vault_name>             - Azure Key Vault name"
  echo "  -s <secret_name>            - Azure Key Vault Secret name"
  echo "  -f <local_tfvars_filename>  - Local tfvars file name"
  exit 1
}

# if there are no arguments passed, exit with usage
if [ $# -eq 0 ]
then
  usage
fi

while getopts "v:s:f:h" opt; do
  case $opt in
    v)
      VAULT_NAME=$OPTARG
      ;;
    s)
      SECRET_NAME=$OPTARG
      ;;
    f)
      LOCAL_TFVARS_FILE_NAME=$OPTARG
      ;;
    h)
      usage
      ;;
    *)
      usage
      ;;
  esac
done

if [[
  -z "$VAULT_NAME" ||
  -z "$SECRET_NAME" ||
  -z "$LOCAL_TFVARS_FILE_NAME"
]]
then
  usage
fi

# temporarily allow failures so we can capture any error output from az
set +e
KEY_VAULT_CHECK=$(az keyvault secret list --vault-name "$VAULT_NAME" 2>&1)
set -e

# if the output isn't valid JSON, the Key Vault doesn't exist (or isn't
# accessible) yet, so there is nothing to compare against
if ! jq -e . >/dev/null 2>&1 <<< "$KEY_VAULT_CHECK"
then
  exit 0
fi

# pull out the secret's 'updated' timestamp
SECRET_UPDATED=$(jq -r \
  --arg secret_name "$SECRET_NAME" \
  '.[] | select(.name==$secret_name) | .attributes.updated' \
  <<< "$KEY_VAULT_CHECK")

# if the secret doesn't exist yet, there is nothing to compare against
if [ -z "$SECRET_UPDATED" ]
then
  exit 0
fi

# strip the UTC offset and convert to seconds (BSD/macOS date syntax)
SECRET_UPDATED=$(echo "$SECRET_UPDATED" | cut -d'+' -f1)
SECRET_UPDATED_SECONDS=$(date -j -f "%Y-%m-%dT%H:%M:%S" "$SECRET_UPDATED" "+%s")

# compare the secret's updated time against the local file's modified time
if [ "$SECRET_UPDATED_SECONDS" -gt "$(date -r "$LOCAL_TFVARS_FILE_NAME" +%s)" ]
then
  echo ""
  echo ""
  echo "Error: Your local tfvars file is older than the remote!"
  echo ""
  echo "Ensure you have the latest tfvars by running:"
  echo ""
  echo "  mv $LOCAL_TFVARS_FILE_NAME $LOCAL_TFVARS_FILE_NAME.old"
  echo "  az keyvault secret download \\"
  echo "    --file $LOCAL_TFVARS_FILE_NAME \\"
  echo "    --encoding base64 \\"
  echo "    --vault-name $VAULT_NAME \\"
  echo "    --name $SECRET_NAME"
  echo ""
  echo "Or if you are sure your local tfvars are correct, just update the modified time by running:"
  echo ""
  echo "  touch $LOCAL_TFVARS_FILE_NAME"
  echo ""
  exit 1
fi
We can now run this script from a `null_resource` with a `local-exec` provisioner:
resource "null_resource" "check_key_vault_secret_age_against_local_tfvars" {
count = local.enable_tfvars_file_age_check ? 1 : 0
provisioner "local-exec" {
interpreter = ["/bin/bash", "-c"]
command = "${path.module}/scripts/check-key-vault-secret-age-against-local-tfvars.sh -v \"${azurerm_key_vault.tfvars.name}\" -s \"${local.resource_prefix}-tfvars\" -f ${local.tfvars_filename}"
}
}
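The `local` values referenced here aren't shown in this post; a minimal sketch of what they might look like follows, where `resource_prefix` mirroring the workspace prefix used for the secret name is my assumption:
locals {
  # toggle so the age check can be switched off
  enable_tfvars_file_age_check = true
  # assumption: mirrors the workspace prefix used in the secret name earlier
  resource_prefix = terraform.workspace
  # the tfvars file that gets uploaded to the Key Vault Secret
  tfvars_filename = var.tfvar_filename
}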
However, this will only run once, at the point Terraform recognises it as a new resource.
I did initially try running it on every deploy, by adding a `triggers` block to the resource that causes it to always run:
[...]
  triggers = {
    always_run = timestamp()
  }
[...]
But this would mean that if I successfully ran an apply, my next deploy would cause the script to fail unless I updated my local tfvars file (because it would now have a modified time older than the updated time of the Key Vault Secret).
I decided a better solution would be to set the trigger as the MD5 hash of the Terraform variables file:
[...]
  triggers = {
    tfvar_file_md5 = filemd5(local.tfvars_filename)
  }
[...]
This actually solved two issues:
- It ensured the script doesn't run if I haven't updated my local file
- And because the MD5 of the file is stored in the Terraform state, it causes the script to run if another team member attempts a deploy with a different file (whether they had updated it or not)...
Now I just needed to add a `depends_on` block to the `azurerm_key_vault_secret` resource:
resource "azurerm_key_vault_secret" "tfvars" {
name = "${terraform.workspace}-tfvars"
value = base64encode(file(var.tfvar_filename))
key_vault_id = azurerm_key_vault.tfvars.id
content_type = "text/plain+base64"
depends_on = [
null_resource.check_key_vault_secret_age_against_local_tfvars
]
}
This means that if the script fails (e.g. the local file is older than the remote Key Vault Secret), the file won't overwrite the remote store.
A complete PR I made for one of our projects can be found here: https://github.com/DFE-Digital/terraform-azurerm-key-vault-tfvars/pull/63/files