Why every team eventually has this conversation
The path is the same everywhere. A small team launches a service and puts the database password in
an environment variable. The second service shows up and the password ends up in a .env file checked into Git. By the third service, someone has noticed and added a .gitignore, but the file is still on every developer's laptop. By the fifth service
there's a private Confluence page with a copy. By the tenth, an engineer rotates the password and
four services break, and someone finally says: "we should put this in a real secret store."
The problem isn't that the team doesn't know they should use a vault. They know. The problem is that picking a vault, migrating to it, and not breaking deployments in the process is genuinely hard. Half the secrets-management projects we see started a year ago and stalled at 60% completion, with the production-critical secrets still in the old place because nobody wanted to risk the cutover.
This is the migration playbook we run on engagements. The decision points up front, the sequencing that has worked, and the patterns to avoid.
1. The tools, briefly
You'll be choosing between five categories of tool. Each has a sweet spot. Don't pick based on hype — pick based on which one fits your stack.
- HashiCorp Vault. The most flexible option. Self-hosted (the community edition moved from an open-source license to the Business Source License in 2023) or HCP-hosted. Supports dynamic secrets, secret leasing, just-in-time database credentials, and a broad set of auth methods. Worth the operational complexity if you have a security team to run it; overkill if you don't.
- Cloud KMS / Secret Manager. AWS Secrets Manager, AWS SSM Parameter Store (SecureString), Azure Key Vault, GCP Secret Manager. Native to your cloud, no servers to run, integrated with cloud IAM. The right choice if you're single-cloud and don't need dynamic secrets.
- Sealed Secrets (Bitnami). A Kubernetes-native pattern where secrets are encrypted with an asymmetric key and the encrypted blob lives in Git. The cluster has the decryption key. Good for GitOps, zero runtime dependency on a vault, but secrets are static and rotation is manual.
- External Secrets Operator (ESO). A Kubernetes operator that pulls secrets from a backing store (Vault, AWS Secrets Manager, Azure Key Vault, GCP Secret Manager, and others) and materializes them as Kubernetes Secrets. Lets you use cloud-native vaults with cluster-native consumption.
- SOPS (originally from Mozilla, now a CNCF project). File-level encryption for YAML/JSON/INI/ENV files using KMS, PGP, or age keys. The encrypted file lives in Git, decrypted at deploy time. Great for small teams and static config; doesn't scale to dynamic secrets or large environments.
2. The decision matrix
Here's the short version. None of these are absolute, but they're the heuristics we use to get a team to a defensible answer in one meeting.
- If you're single-cloud and don't have a dedicated platform team: use the cloud's native secret store (Secrets Manager, Key Vault, GCP Secret Manager). The integration with cloud IAM is the win. Don't run Vault if you don't have someone who'll own it.
- If you're multi-cloud or you need dynamic database credentials: Vault is probably right. The dynamic-secrets feature alone justifies the operational cost — short-lived database credentials issued just-in-time eliminate an entire class of leaked-credential incidents.
- If your workloads are entirely on Kubernetes and you want a GitOps-native pattern: External Secrets Operator on top of your cloud secret manager. Best of both worlds: secrets stored in a managed cloud service, consumed via the operator as native K8s Secrets.
- If you have a small team and mostly static config: Sealed Secrets or SOPS. Both are simple, both work, both are fine until you outgrow them. Don't let perfect be the enemy of good — these are real solutions, not toys.
- If you're an enterprise with compliance requirements: Vault, with HSM backing if HSMs are required. The audit logging and access control features are mature in a way the cloud-native options aren't.
3. Phase 0: stop the bleeding
Before you migrate anything, stop the bleeding. There are two specific things to do, both in the first week.
First, turn on push protection in your Git host. GitHub has it built in (Secret Scanning Push
Protection). GitLab has Secret Push Protection in the paid tiers. Bitbucket has equivalent
functionality. This blocks new secrets from being committed at the moment of git push, which means the problem stops getting worse from day one.
Second, run a Git history scan on every repository to find existing secrets. Tools: gitleaks (free, fast, good signal-to-noise), trufflehog (broader coverage,
good for verifying which secrets are actually live). Both work in CI; both can scan history.
Critical point: secrets found in Git history are compromised. They have to be rotated, not just removed from history. Removing them from history is hygiene. Rotating them is the actual fix. Plan for both.
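The scan output can feed the rotation backlog directly. Here's a sketch that groups findings from a gitleaks JSON report; it assumes the report produced by gitleaks detect --report-format json --report-path report.json, whose findings carry RuleID, File, and Commit fields. Treat it as a starting point, not a finished tool.

```python
import json
from collections import defaultdict

def rotation_backlog(report_path: str) -> dict:
    """Group gitleaks findings into a rotation backlog keyed by rule.

    Every finding here is a credential to rotate, not just to scrub
    from history.
    """
    with open(report_path) as f:
        findings = json.load(f)
    backlog = defaultdict(set)
    for finding in findings:
        # Record where each leaked value appeared; the value itself is
        # what has to be rotated.
        backlog[finding["RuleID"]].add((finding["File"], finding["Commit"]))
    return {rule: sorted(locations) for rule, locations in backlog.items()}
```

One backlog entry per rule, with every file/commit location listed, is usually enough to assign rotation owners.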
4. Phase 1: pick the tool, prove it on one workload
Don't migrate the whole org at once. Pick one workload — ideally a non-critical one with a team that's enthusiastic about being the pilot — and migrate it end to end. The goal of phase 1 is to prove the tooling works in your environment, not to make a dent in the migration backlog.
Set up the vault. Document the setup. Migrate the pilot workload's secrets into the vault. Update the pilot workload's deployment to read from the vault instead of from environment variables. Run it for a week. Measure: did it work, was the developer experience acceptable, what broke, what didn't break that you expected to.
Phase 1 should produce a written runbook: "to add a new secret, do these steps. To rotate a secret, do these steps. To grant a workload access to a secret, do these steps." If the runbook is more than a page, the tool choice may be wrong, or the integration is more complicated than it needs to be.
5. Phase 2: secrets inventory
Now you need to know what you have. Build an inventory of every secret in use across the organization. The categories to enumerate:
- Database credentials (per-environment, per-database)
- API keys for third-party services (Stripe, SendGrid, Twilio, etc.)
- OAuth client secrets and signing keys
- Cloud provider credentials (the ones you haven't moved to workload identity yet)
- Internal service-to-service auth tokens
- TLS private keys for internal CAs and cert signing
- SSH keys for deploy and CI
- GPG keys for signing
- Webhook signing secrets
- JWT signing keys
For each secret, record: where it currently lives (env var, file, Git, vault, manual), which workloads consume it, who owns the rotation, and when it was last rotated. This list is the migration backlog. It will be longer than you think.
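If it helps to make the inventory concrete, here's one way to shape a backlog row. The field names are our own convention, not a standard schema; a spreadsheet works just as well.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class SecretRecord:
    """One row of the migration backlog."""
    name: str
    category: str                 # e.g. "database", "api-key", "tls-key"
    current_location: str         # env var, file, Git, vault, manual
    consumers: list = field(default_factory=list)  # workloads that read it
    rotation_owner: str = "unassigned"
    last_rotated: Optional[date] = None  # None means never rotated
```

A secret with rotation_owner still "unassigned" or last_rotated still None after the inventory pass is itself a finding.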
6. Phase 3: migrate by criticality, not alphabetically
The order of migration matters. Two heuristics:
Migrate by blast radius, descending. The secrets with the largest blast radius if leaked go first. Cloud root credentials, database admin passwords, signing keys for your auth tokens. These are the secrets where a leak ends the company.
Then migrate by ease, ascending. Within a blast-radius bracket, take the easy ones first. Workloads that are well-tested, teams that have bandwidth, services that are about to deploy a normal release anyway. Build momentum.
Don't do alphabetical. Don't do "the workloads owned by the most enthusiastic team." Do criticality. The whole point is to reduce risk, and alphabetical doesn't reduce risk.
7. The cutover pattern that doesn't break things
For each workload, the safe cutover sequence is:
- Add the secret to the vault. The vault now has the value.
- Modify the workload's deployment to read from both the vault and the existing source (env var, file). If the vault returns a value, prefer it; otherwise fall back to the existing source. This is dual-read.
- Deploy. Verify the workload still works. The vault is now in the path but the legacy source is still the fallback.
- Verify in logs that the workload is reading from the vault, not the fallback. If it's still reading the fallback, debug.
- Remove the legacy source. Deploy again. The workload now reads only from the vault.
- Rotate the secret in the vault. This is the real test — if rotation works, the migration is complete.
The dual-read step is the one teams skip and regret. It costs you one extra deploy per secret, and it eliminates the "did the migration break anything" risk almost entirely.
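The dual-read step is small enough to sketch. This is a hedged example, not any specific vault client's API: vault_fetch stands in for whatever read call your vault exposes, and the names are placeholders.

```python
import os

def get_secret(name: str, env_var: str, vault_fetch) -> str:
    """Dual-read: prefer the vault, fall back to the legacy env var.

    vault_fetch is any callable that returns the secret value or raises;
    swap in your vault client's read call here.
    """
    try:
        value = vault_fetch(name)
        if value:
            # Log the source so step 4 of the cutover is checkable.
            print(f"secret {name}: read from vault")
            return value
    except Exception as exc:
        print(f"secret {name}: vault read failed ({exc}), using fallback")
    value = os.environ.get(env_var)
    if value is None:
        raise RuntimeError(f"secret {name} unavailable from vault or {env_var}")
    print(f"secret {name}: read from legacy env var {env_var}")
    return value
```

The log line is the point: without it, you can't verify which source the workload is actually using before you remove the fallback.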
8. Kubernetes-specific patterns
If your workloads are on Kubernetes, External Secrets Operator (ESO) is the pattern we recommend most often. The flow:
- Secrets live in your cloud secret manager (or Vault).
- An ESO SecretStore resource configures how the cluster authenticates to the backend. Use workload identity (IRSA / EKS Pod Identity, GKE Workload Identity Federation, Microsoft Entra Workload ID for Kubernetes) for this; no static credentials. Azure's older aad-pod-identity project is end-of-life and should not be used for new clusters.
- An ExternalSecret resource in each namespace declares which secrets to fetch and how to map them into a native Kubernetes Secret.
- The ESO controller watches these resources and creates/updates the corresponding Secret objects. Workloads consume the Secret normally, via env var or volume.
- Set the refresh interval. Common values: 1 hour for non-rotating secrets, 15 minutes for rotating ones. Workloads that need to pick up rotated secrets without restart need to be built to reload, which is a separate problem.
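As a concrete sketch, a minimal ExternalSecret for the flow above might look like this. Every name here is illustrative, and the SecretStore called aws-secrets-manager is assumed to already exist in the namespace.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payments-db
  namespace: payments
spec:
  refreshInterval: 15m          # rotating secret: poll the backend often
  secretStoreRef:
    name: aws-secrets-manager   # assumed SecretStore in this namespace
    kind: SecretStore
  target:
    name: payments-db           # the native K8s Secret ESO will create
  data:
    - secretKey: DB_PASSWORD    # key inside the K8s Secret
      remoteRef:
        key: prod/payments/db-password   # path in the backing store
```

The workload then references the payments-db Secret exactly as it would any other, which is what makes the cutover low-risk.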
The thing to know about ESO: the secret values still end up as native Kubernetes Secrets in etcd. That's fine if your etcd encryption-at-rest is on (it should be). But the model is "sync from a real vault into K8s," not "store secrets in a vault and never have them in etcd." If you need the second model, you need a sidecar pattern (Vault Agent Injector) or CSI Secret Store, both of which mount secrets directly into the pod without ever creating a K8s Secret object.
9. Rotation, the part everyone skips
Migrating secrets to a vault is half the work. The other half is rotation. A secret in a vault
that hasn't been rotated in two years is barely better than the same secret in a .env file — if it leaked at any point in those two years, it's still valid.
The model that works is automatic rotation, scheduled, with the vault driving it. AWS Secrets Manager has Lambda-based rotation for RDS out of the box; Azure Key Vault can fire Event Grid events near expiry to trigger a rotation function; GCP Secret Manager publishes scheduled rotation notifications to Pub/Sub for your rotation code to act on. Wire them up.
For secrets that don't have automatic rotation built in (third-party API keys, internal service tokens), rotation has to be manual but should still be scheduled. Pick a cadence — we recommend 90 days for high-value secrets, 12 months for low-value — and put it on the calendar. Build the runbook for each secret type. Test the runbook by actually rotating once.
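The cadence check itself is easy to automate even before rotation is. A minimal sketch, assuming an inventory where each secret carries a value tier and a last-rotated date; the tier names and cadences mirror the recommendation above and are adjustable.

```python
from datetime import date, timedelta

# Suggested cadences from this playbook; tune to your own policy.
CADENCE_DAYS = {"high": 90, "low": 365}

def overdue(secrets: list, today: date) -> list:
    """Return names of secrets past their rotation cadence.

    Each entry is a dict with: name, value_tier ("high"/"low"),
    and last_rotated (a date, or None for never rotated).
    """
    late = []
    for s in secrets:
        limit = timedelta(days=CADENCE_DAYS[s["value_tier"]])
        if s["last_rotated"] is None or today - s["last_rotated"] > limit:
            late.append(s["name"])
    return late
```

Run it weekly against the inventory and the output is the rotation to-do list; never-rotated secrets sort to the top by construction.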
The forcing function is that if you can't rotate a secret without breaking workloads, the workload has a bug and you need to fix it. The pain of the rotation drill is the pain of finding out your reload logic is wrong, and that's better to find on a Tuesday afternoon than during an incident.
10. Dynamic secrets, the endgame
The most defensible model for secrets is the one where most secrets don't exist as long-lived strings at all. Vault's dynamic-secrets engine for databases will create a per-workload database user with a 1-hour TTL on demand. The application requests credentials, gets a username and password back, uses them for the next hour, and they're revoked automatically. There's nothing to leak.
You don't need to start here, but it's the right endpoint. Once the static-secret migration is complete, the next project is: which secrets can be replaced with dynamic ones? Database credentials are the easy win. Cloud IAM with workload identity is the same idea, applied to cloud APIs. SSH certificates from a Vault SSH backend are the same idea, applied to infrastructure access.
The teams that get to this point stop having "leaked credential" incidents. The credentials just don't exist long enough to leak.
11. How to know it's working
Track three things, monthly:
- Secrets in Git history (decreasing). Run gitleaks weekly. The number should go down. When it hits zero, run a full history rewrite to remove the historical secrets.
- Workloads reading from the vault (increasing). Inventory percentage. Goal: 100%, but the curve matters more than the endpoint.
- Secrets that have been rotated in the last 90 days (increasing). Until rotation is automated, this metric is the canary for whether the program is mature.
The short version
Pick a tool that fits your stack — cloud-native secret manager for single-cloud teams, Vault for multi-cloud or dynamic-secret needs, ESO if you're Kubernetes-first, Sealed Secrets or SOPS for small teams. Stop the bleeding with push protection and history scanning. Pilot on one workload to prove the tooling. Inventory every secret. Migrate by blast radius descending, using a dual-read cutover that doesn't risk breaking deployments. Schedule rotation, automate it where you can, drill it where you can't. The endgame is dynamic secrets, where most credentials don't exist as static strings at all.
The migration takes 3–9 months for most teams. The teams that finish it stop having leaked-credential incidents. The teams that don't finish it keep having them.
Want us to lead the migration?
We've migrated teams off Git-stored secrets onto Vault, ESO, and cloud KMS. We pick the right tool for your stack, pair with your platform team, and hand it back working.