Why this matters
Look at any incident report from the last five years involving cloud data exposure and there's a good chance the root cause is the same thing: a long-lived credential ended up somewhere it shouldn't have. A leaked access key in a public repo. A CI build secret stolen from a compromised developer laptop. An old service account whose key was rotated four jobs ago and never revoked. The fix is well known — short-lived, identity-bound credentials issued at runtime. The migration is what almost no one finishes.
This is the playbook we use when we lead these migrations. It's vendor-neutral in shape and vendor-specific where it has to be. The big idea is the same on every cloud: stop minting credentials, start asserting identity, and let the cloud provider trade that identity for ephemeral credentials at the moment of use.
1. Inventory before you migrate
You cannot migrate what you cannot see. The first phase is always boring: build a list of every long-lived credential in your environment. The list is always longer than anyone expects, and about a third of the entries belong to systems no one remembers building.
- AWS: every IAM user with access keys, every access key on a root account (should be zero), every `~/.aws/credentials` file on a developer machine, every CI secret named anything like `AWS_*`, every key in Secrets Manager
- Azure: every service principal with a client secret or certificate, every app registration with a credential, every Run As account, every key in Key Vault that is itself a cloud credential
- GCP: every service account with a downloaded key file, every key in Secret Manager, every `GOOGLE_APPLICATION_CREDENTIALS` path in a CI config
For each credential, capture three things: who's using it, what permissions it has, and when it was last used. The "last used" field is the lever — anything that hasn't been used in 90 days is a delete candidate, not a migrate candidate. You will retire 20-40% of your inventory just by doing this honestly.
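The triage rule above can be sketched as a single pass over the inventory. The record fields here are assumptions for illustration, not a real export format from any provider:

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=90)  # unused for 90 days -> delete candidate


def triage(inventory, now=None):
    """Split credential records into delete vs. migrate candidates.

    Each record is a dict with at least 'name' and 'last_used'
    (a datetime, or None if the credential has never been used).
    """
    now = now or datetime.now(timezone.utc)
    delete, migrate = [], []
    for cred in inventory:
        last_used = cred.get("last_used")
        if last_used is None or now - last_used > STALE_AFTER:
            delete.append(cred)  # stale or never used: retire, don't migrate
        else:
            migrate.append(cred)
    return delete, migrate
```

Running this honestly against a real inventory is what produces the 20-40% retirement rate: never-used and long-idle credentials fall straight into the delete bucket.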
2. Categorize the workloads
Not every credential should be migrated the same way, because not every workload runs in the same place. Sort the inventory into four buckets — the migration path is different for each.
Workloads running inside the cloud provider
The easiest case. Anything running on the same cloud as the resources it's accessing — EC2, Lambda, ECS, Fargate, App Service, Cloud Run, GKE, AKS, EKS — should use the cloud-native workload identity primitive. There's no excuse for an EC2 instance to have an IAM user's access key in an environment variable.
- AWS: EC2 instance profiles, ECS task roles, IRSA / Pod Identity for EKS
- Azure: System-assigned or user-assigned managed identities; for AKS, use Microsoft Entra ID Workload Identity
- GCP: attached service accounts on Compute Engine (a dedicated per-workload account, not the default service account), Workload Identity for GKE
Workloads running on a different cloud or on-prem
The interesting case. You need to assert identity from outside the cloud provider's perimeter without minting a long-lived credential. Every major cloud now has a workload-identity federation primitive for this — they all do roughly the same thing under different names.
- AWS IAM Roles Anywhere: the workload presents a certificate signed by a trust anchor (your CA), AWS validates it, and returns temporary STS credentials
- Microsoft Entra ID Workload Identity Federation: the workload presents an OIDC token from a trusted issuer, Microsoft Entra ID trades it for an access token
- GCP Workload Identity Federation: the workload presents an OIDC or SAML token from a trusted external IdP, GCP trades it for short-lived tokens
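All three primitives reduce to the same exchange: validate an externally issued token against a configured trust, then mint an ephemeral credential. A schematic sketch of that exchange — not real SDK calls; the issuer, audience, and TTL values are illustrative:

```python
from datetime import datetime, timedelta, timezone
import secrets

# Trust configuration: which external issuer and audience we accept.
TRUSTED_ISSUER = "https://token.actions.githubusercontent.com"  # example issuer
TRUSTED_AUDIENCE = "sts.example.cloud"                          # example audience
TOKEN_TTL = timedelta(hours=1)


def exchange(oidc_claims):
    """Trade a validated external identity token for ephemeral credentials."""
    if oidc_claims.get("iss") != TRUSTED_ISSUER:
        raise PermissionError("untrusted issuer")
    if oidc_claims.get("aud") != TRUSTED_AUDIENCE:
        raise PermissionError("wrong audience")
    # Credentials are minted at the moment of use and expire on their own;
    # nothing long-lived is ever written down.
    return {
        "subject": oidc_claims["sub"],
        "token": secrets.token_urlsafe(32),
        "expires_at": datetime.now(timezone.utc) + TOKEN_TTL,
    }
```

In the real services the signature verification, claim mapping, and credential format are far richer, but the shape — assert identity, verify trust, receive something that expires — is the whole model.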
CI/CD pipelines
The highest-leverage migration target. CI systems hold long-lived cloud credentials so they can deploy on your behalf, and CI is the most-attacked surface in modern engineering orgs. Every major CI vendor now ships an OIDC issuer; every major cloud now accepts those tokens.
- GitHub Actions → AWS: configure GitHub's OIDC provider as a trust relationship on a target IAM role; the workflow assumes the role with no static secret
- GitHub Actions → Azure: federated credential on the app registration, scoped to a specific repo, branch, or environment
- GitHub Actions → GCP: Workload Identity Federation pool with a GitHub OIDC provider
- GitLab CI, CircleCI, Buildkite, Jenkins: all support OIDC issuance to varying degrees; check the docs and prefer this over storing keys in CI secrets
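For the GitHub Actions → AWS path, the scoping lives in the IAM role's trust policy, conditioned on the token's `aud` and `sub` claims. A sketch of the shape expressed as a Python dict — the account ID, org, repo, and branch are placeholders:

```python
# Trust policy for an IAM role assumed by a GitHub Actions workflow.
# The account ID, org/repo, and branch below are placeholders.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {
            # The GitHub OIDC provider must already be registered in the account.
            "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
        },
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
            "StringEquals": {
                "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
                # Exact repo and branch -- never a wildcard subject.
                "token.actions.githubusercontent.com:sub": "repo:example-org/example-repo:ref:refs/heads/main",
            }
        },
    }],
}
```

The `sub` condition is where the blast radius is decided: an exact match on repo and branch means a workflow anywhere else — including a fork's pull request — gets nothing.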
Developer laptops
Don't put long-lived credentials on developer machines. Use SSO-backed assume-role flows: developers authenticate to your IdP (Okta, Entra, Google), which gates access to a script or CLI that returns credentials with a short TTL.
- AWS: IAM Identity Center (AWS SSO), aws-vault, granted, leapp
- Azure: `az login` with conditional access policies enforced
- GCP: `gcloud auth login` with org-policy-enforced session length
3. Design the target state
Before you write any code, draw the target architecture. The migration is a graph reduction problem: every workload should end up with exactly one identity, every identity should map to exactly one purpose, and every credential should be issued at runtime.
- One identity per workload, not per environment: the prod and staging copies of the same service should share an identity definition, parameterized by environment, not have two hand-crafted roles that drift over time
- Trust relationships scoped as tightly as possible: a federated trust to GitHub Actions should be scoped to a single repo, ideally a single branch or environment, and a single workflow file path
- Permissions sized to actual usage, not feared usage: use IAM Access Analyzer (AWS), Azure Policy insights, or IAM Recommender (GCP) to sample the actual API calls and shrink the policy to fit
- No wildcards on resources: every policy should name the resources it operates on, not `*`. This is the single most common source of blast-radius regret
- Break-glass accounts isolated and audited: there are still good reasons for a long-lived credential to exist (vendor integrations, certain SaaS tools). Move them all into one labeled vault, alert on every use
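The no-wildcard rule is easy to enforce mechanically. A minimal lint over IAM-style policy documents — the policy shape follows the standard JSON layout; the function name and return format are ours:

```python
def wildcard_resources(policy):
    """Return the Action of every Allow statement that targets Resource '*'."""
    flagged = []
    for stmt in policy.get("Statement", []):
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]  # normalize single-string form to a list
        if stmt.get("Effect") == "Allow" and "*" in resources:
            flagged.append(stmt.get("Action"))
    return flagged
```

Wire a check like this into CI on your infrastructure-as-code repo and wildcard resources become a failed build instead of a future incident.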
4. Roll it out without breaking production
The hard part of this migration is not the technology. It's that you're changing the authentication model on running services, in production, without an outage. The pattern that works:
- Dual-credential phase: the workload supports both the old credential and the new identity simultaneously. Default to the new path, fall back to the old. Deploy to non-prod for at least a week
- Observability first: log every authentication attempt with which credential path was used. You need to be able to say "the service has not used the old credential for 7 days" before you delete it
- Disable, don't delete: when the metric clears, disable the old credential first and watch for fallout for a few days. Then delete it
- One workload at a time: resist the urge to do a "big bang" cutover. The blast radius of a misconfigured federation is "every workload that uses it." Migrate one service per change
- Have the rollback ready before the rollout: if you can't re-enable the old credential in 60 seconds, you don't have a rollback plan
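The dual-credential phase is a try-the-new-path, fall-back-and-log pattern. A sketch with hypothetical provider callables standing in for the real SDK flows:

```python
import logging

log = logging.getLogger("auth-migration")


def get_credentials(new_identity_path, old_static_path):
    """Prefer the workload-identity path; fall back to the legacy credential.

    Both arguments are callables that return credentials or raise on failure.
    Every attempt is logged so that "the old path has been unused for 7 days"
    is a queryable fact, not a guess.
    """
    try:
        creds = new_identity_path()
        log.info("auth_path=workload_identity")
        return creds
    except Exception:
        log.warning("auth_path=legacy_static_credential")  # alert on this
        return old_static_path()
```

When the `legacy_static_credential` log line has been silent for your observation window, you disable the old credential; the fallback branch then becomes your canary for anything you missed.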
5. Common mistakes we keep seeing
- Migrating the production workload to workload identity but leaving a developer's `aws_access_key_id` committed in a buried config file from 2022. Run a credential scan after the migration, not before
- Federating GitHub Actions to a single IAM role with cluster-admin and saying "we're done." A federated trust still needs a sized policy on the receiving end
- Setting the federated credential's audience or subject claim to a wildcard ("any branch in the repo"). Now any pull request from a fork can run a workflow that assumes your prod role
- Missing the long tail: third-party SaaS integrations that store an IAM user key in their platform. Many of them now support assume-role with external ID — check before assuming you're stuck
- Forgetting to rotate the certificate authority or trust anchor used by IAM Roles Anywhere. The new CA needs the same rotation discipline as the credentials it replaced
- Believing that "no long-lived keys in source control" is the same as "no long-lived keys." Check Secrets Manager, Key Vault, parameter stores, and any in-house secret broker
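The post-migration credential scan can start as a one-file grep. AWS access key IDs follow a fixed prefix-and-length format; the regex below covers the common `AKIA` (long-term) and `ASIA` (temporary) prefixes — extend it with other providers' formats as needed:

```python
import re

# Long-term (AKIA) and temporary (ASIA) AWS access key IDs:
# a 4-character prefix followed by 16 uppercase alphanumerics.
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def scan_text(text):
    """Return every substring of `text` that looks like an AWS access key ID."""
    return AWS_KEY_ID.findall(text)
```

Dedicated scanners (gitleaks, trufflehog, and the like) cover far more patterns and git history too; the point is that the check is cheap enough that there is no reason to skip it.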
6. Where this fits in a Zero Trust architecture
Workload identity is the foundational layer of any serious Zero Trust effort on cloud infrastructure. The principle "never trust, always verify" only works if the verification is per-request and tied to a strong identity. Static credentials are the opposite of that — they're a bearer token issued once and trusted forever, until someone rotates them or they leak.
Once workload identity is in place, you can start enforcing the rest of the Zero Trust controls that depend on it: per-request authorization with policy-as-code (Cedar, OPA), context-aware access (require MFA, require corporate device, require known network range), and service-to-service authentication via mTLS or signed tokens that carry the workload's identity rather than a shared secret. Trying to do any of these without workload identity first leads to a half-built Zero Trust system that still has long-lived keys at the bottom of it.
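Once each request carries a workload identity, per-request authorization becomes a pure function of identity, action, and context. A toy policy check in that shape — the rule set is illustrative, not Cedar or OPA syntax:

```python
POLICY = [
    # (identity, action, required context) -- illustrative rules only.
    {"identity": "svc-billing", "action": "db:read", "require_mfa": False},
    {"identity": "svc-admin", "action": "db:write", "require_mfa": True},
]


def authorize(identity, action, context):
    """Allow only if a rule matches identity+action and its context holds."""
    for rule in POLICY:
        if rule["identity"] == identity and rule["action"] == action:
            if rule["require_mfa"] and not context.get("mfa"):
                return False
            return True
    return False  # default deny
```

None of this is expressible when the caller is "whoever holds the shared key" — which is why workload identity has to land first.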
The short version
The end-state of this migration is measurable. After it's done you should be able to say:
- There are no long-lived cloud credentials in any source repository, CI secret, or developer machine
- The number of IAM users / app registrations / service accounts is in single digits
- Every credential that does still exist has a documented owner, a documented use case, and an alert fired on every use
- Every workload identity is scoped to a single purpose with a sized-to-actual-use permission set
- A new service inherits an identity by configuration, not by a security ticket and a credentials handoff
Want us to lead the migration?
Led by AZ-500 and AWS Security Specialty certified engineers. We map your current IAM, design the target state, and pair with your platform team to ship it.