======================
Vault\* System Issues
======================

Here is a list of common Vault\* system issues and tips to resolve them.

Health Check
============

A good way to tell whether Vault is healthy is to look at its pod readiness.
All three instances should be ready, `3/3`.

.. code:: shell

   $ kubectl -n orch-platform get pods | grep -E 'vault-[0-9]+'
   vault-0   3/3   Running   0   23h
   vault-1   3/3   Running   0   23h
   vault-2   3/3   Running   0   23h

Common Failures
===============

Vault is sealed
---------------

Vault is sealed automatically after a pod or node restart. For security
reasons, it must be manually unsealed by the operator.

Symptom
~~~~~~~

Most of the time, a Vault pod showing `2/3` ready indicates that it is sealed.

.. code:: shell

   $ kubectl -n orch-platform get pods | grep -E 'vault-[0-9]+'
   vault-0   2/3   Running   0   23h
   vault-1   3/3   Running   0   23h
   vault-2   3/3   Running   0   23h

You can further check the Vault status to be sure. `$1` should be the name of
the Vault pod, for example, `vault-0`. `Sealed  true` confirms that the
instance is sealed.

.. code:: shell

   $ kubectl -n orch-platform exec -it $1 -- vault status | grep Sealed
   Sealed          true

Recovery method for environment **without auto unseal**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. note::
   Each Vault instance needs to be unsealed separately.

Run the following command and supply the unseal key to unseal Vault. `$1`
should be the name of the Vault pod, for example, `vault-0`.

.. code:: shell

   $ kubectl -n orch-platform exec -it $1 -- vault operator unseal

The unseal key must be securely stored during initial deployment.

Recovery method for environment **with auto unseal**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A Vault pod should be unsealed automatically when it is restarted. You can
restart the pod with the following command. `$1` should be the name of the
Vault pod, for example, `vault-0`.

.. code:: shell

   kubectl -n orch-platform delete pod $1

Alternatively, you can take the `recovery_keys` and unseal the instance using
the same method described above.

.. code:: shell

   kubectl -n orch-platform exec -it $1 -- vault operator unseal
   Unseal Key (will be hidden):

The recovery key must be securely stored during initial deployment.

Catastrophic Failures
=====================

Vault is unrecoverable
----------------------

In the event of a catastrophic failure, such as being unable to retrieve the
unseal keys, you must wipe the Vault databases and start over.

.. warning::
   Doing this will result in the loss of critical data, such as the root PKI,
   and therefore should only be performed in extreme situations where Vault
   is unrecoverable. Consult a Platform team member before you proceed.

Recovery Method
~~~~~~~~~~~~~~~

You first need to access the Postgres shell.

.. code:: shell

   kubectl -n orch-database exec -it postgresql-0 -- psql -U orch-platform-vault_user \
     -d orch-platform-vault

You will be prompted for a password, which can be found in the following
secret.

.. code:: shell

   kubectl -n orch-platform get secret vault-local-postgresql -o yaml | \
     yq .data.PGPASSWORD | base64 -d

Vault uses two tables, `vault_ha_locks` and `vault_kv_store`. Once you are in
the Postgres shell, run the following command to drop them.

.. code:: shell

   orch-platform-vault=> drop table vault_ha_locks, vault_kv_store;

Run the following command to initialize Vault. Unlike the unseal instruction,
this only needs to be done on one of the Vault instances.

.. code:: shell

   $ kubectl -n orch-platform exec -it vault-0 -- vault operator init \
       -key-shares=1 -key-threshold=1
   (unseal key and root token will be shown here)

Safely store the unseal key and root token in a secure location.
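Because the instances share the same storage backend, the other pods do not
need to be initialized, but every instance is still sealed at this point. If
the environment does not use auto unseal, unseal each instance with the newly
generated key, following the recovery procedure above. A minimal sketch,
assuming the default pod names used throughout this guide:

.. code:: shell

   # Unseal each instance with the key printed by "vault operator init".
   # Pod names are assumed to be the defaults used elsewhere in this guide.
   for pod in vault-0 vault-1 vault-2; do
     kubectl -n orch-platform exec -it "$pod" -- vault operator unseal
   done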
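Once every instance is unsealed, it is worth re-running the health check from
the beginning of this guide to confirm that the cluster has recovered. One
possible check:

.. code:: shell

   # All three pods should report 3/3 Ready, and each instance should report
   # "Initialized  true" and "Sealed  false".
   kubectl -n orch-platform get pods | grep -E 'vault-[0-9]+'
   for pod in vault-0 vault-1 vault-2; do
     kubectl -n orch-platform exec "$pod" -- vault status | grep -E 'Initialized|Sealed'
   done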
Seal Migration
==============

It is possible to switch between the **Shamir** seal (manual unseal) and the
**AWS KMS** seal (auto unseal) without reinitializing Vault.

Step 1: Update Vault configuration
----------------------------------

The seal method is configured in the Helm values under `.server.ha.config`.

- To use the Shamir seal (manual unseal):

  .. code::

     server:
       ha:
         config: |
           ...
           seal "shamir" {
           }

           seal "awskms" {
             disabled = "true"
           }

  The Shamir seal is the default and typically does not require any seal
  configuration. However, when doing a seal migration, you must explicitly
  specify `disabled = "true"` in the previous seal method.

- To use the AWS KMS seal (auto unseal):

  .. code::

     server:
       ha:
         config: |
           ...
           seal "awskms" {
           }

  Note that the `shamir` section must be completely removed, instead of using
  `disabled = "true"`, when switching to the AWS KMS seal.

Step 2: Kill all **standby** instances and let Kubernetes restart them
------------------------------------------------------------------------

.. code:: shell

   kubectl -n orch-platform delete pod ...

You can confirm whether each instance is active or standby by running the
following command. `$1` should be the name of the Vault pod, for example,
`vault-0`.

.. code:: shell

   kubectl -n orch-platform exec -it $1 -- vault status

Look for the **HA Mode** field.

Step 3: Unseal all restarted **standby** instances with the migration flag
----------------------------------------------------------------------------

Run the following command against each restarted **standby** instance and
supply the unseal key. `$1` should be the name of a standby Vault pod.

.. code:: shell

   kubectl -n orch-platform exec -it $1 -- vault operator unseal -migrate
   # supply unseal key

Confirm that the instance is in seal migration mode by running:

.. code:: shell

   kubectl -n orch-platform exec -it $1 -- vault status

Look for the **Seal Migration in Progress** field.

Step 4: Step down the **active** instance
------------------------------------------

One of the **standby** instances will take over and become **active**
automatically. `$1` should be the name of the active Vault pod.

.. code:: shell

   kubectl -n orch-platform exec -it $1 -- vault login
   # supply root token
   kubectl -n orch-platform exec -it $1 -- vault operator step-down

Step 5: Finally, restart the **original active** instance
-----------------------------------------------------------

`$1` should be the name of the original active Vault pod.

.. code:: shell

   kubectl -n orch-platform delete pod $1

Reference
---------

- `Seal migration `_