Why this matters
Look at any incident report from the last five years involving cloud data exposure and there's a good chance the root cause is the same thing: a long-lived credential ended up somewhere it shouldn't have. A leaked access key in a public repo. A CI build secret stolen from a compromised developer laptop. An old service account whose key was rotated four jobs ago and never revoked. The fix is well known — short-lived, identity-bound credentials issued at runtime. The migration is what almost no one finishes.
This is the playbook we use when we lead these migrations. It's vendor-neutral in shape and vendor-specific where it has to be. The big idea is the same on every cloud: stop minting credentials, start asserting identity, and let the cloud provider trade that identity for ephemeral credentials at the moment of use.
1. Inventory before you migrate
You cannot migrate what you cannot see. The first phase is always boring: build a list of every long-lived credential in your environment. The list is always longer than anyone expects, and about a third of the entries belong to systems no one remembers building.
- AWS: every IAM user with access keys, every access key on a root account (should be zero), every `~/.aws/credentials` file on a developer machine, every CI secret named anything like `AWS_*`, every key in Secrets Manager
- Azure: every service principal with a client secret or certificate, every app registration with a credential, every Run As account, every key in Key Vault that is itself a cloud credential
- GCP: every service account with a downloaded key file, every key in Secret Manager, every `GOOGLE_APPLICATION_CREDENTIALS` path in a CI config
For each credential, capture three things: who's using it, what permissions it has, and when it was last used. The "last used" field is the lever — anything that hasn't been used in 90 days is a delete candidate, not a migrate candidate. You will retire 20-40% of your inventory just by doing this honestly.
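The triage rule above can be sketched as a single pass over the inventory. The record fields here are assumptions for illustration, not a real export format from any provider:

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=90)  # unused for 90 days -> delete candidate


def triage(inventory, now=None):
    """Split credential records into delete vs. migrate candidates.

    Each record is a dict with at least 'name' and 'last_used'
    (a datetime, or None if the credential has never been used).
    """
    now = now or datetime.now(timezone.utc)
    delete, migrate = [], []
    for cred in inventory:
        last_used = cred.get("last_used")
        if last_used is None or now - last_used > STALE_AFTER:
            delete.append(cred)  # stale or never used: retire, don't migrate
        else:
            migrate.append(cred)
    return delete, migrate
```

Running this honestly against a real inventory is what produces the 20-40% retirement rate: never-used and long-idle credentials fall straight into the delete bucket.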
2. Categorize the workloads
Not every credential should be migrated the same way, because not every workload runs in the same place. Sort the inventory into four buckets — the migration path is different for each.
Workloads running inside the cloud provider
The easiest case. Anything running on the same cloud as the resources it's accessing — EC2, Lambda, ECS, Fargate, App Service, Cloud Run, GKE, AKS, EKS — should use the cloud-native workload identity primitive. There's no excuse for an EC2 instance to have an IAM user's access key in an environment variable.
- AWS: EC2 instance profiles, ECS task roles, IRSA / Pod Identity for EKS
- Azure: System-assigned or user-assigned managed identities; for AKS, use Microsoft Entra ID Workload Identity
- GCP: attached service accounts on Compute Engine (a dedicated per-workload account, not the default service account), Workload Identity for GKE
Workloads running on a different cloud or on-prem
The interesting case. You need to assert identity from outside the cloud provider's perimeter without minting a long-lived credential. Every major cloud now has a workload-identity federation primitive for this — they all do roughly the same thing under different names.
- AWS IAM Roles Anywhere: the workload presents a certificate signed by a trust anchor (your CA), AWS validates it, and returns temporary STS credentials
- Microsoft Entra ID Workload Identity Federation: the workload presents an OIDC token from a trusted issuer, Microsoft Entra ID trades it for an access token
- GCP Workload Identity Federation: the workload presents an OIDC or SAML token from a trusted external IdP, GCP trades it for short-lived tokens
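All three primitives reduce to the same exchange: validate an externally issued token against a configured trust, then mint an ephemeral credential. A schematic sketch of that exchange — not real SDK calls; the issuer, audience, and TTL values are illustrative:

```python
from datetime import datetime, timedelta, timezone
import secrets

# Trust configuration: which external issuer and audience we accept.
TRUSTED_ISSUER = "https://token.actions.githubusercontent.com"  # example issuer
TRUSTED_AUDIENCE = "sts.example.cloud"                          # example audience
TOKEN_TTL = timedelta(hours=1)


def exchange(oidc_claims):
    """Trade a validated external identity token for ephemeral credentials."""
    if oidc_claims.get("iss") != TRUSTED_ISSUER:
        raise PermissionError("untrusted issuer")
    if oidc_claims.get("aud") != TRUSTED_AUDIENCE:
        raise PermissionError("wrong audience")
    # Credentials are minted at the moment of use and expire on their own;
    # nothing long-lived is ever written down.
    return {
        "subject": oidc_claims["sub"],
        "token": secrets.token_urlsafe(32),
        "expires_at": datetime.now(timezone.utc) + TOKEN_TTL,
    }
```

In the real services the signature verification, claim mapping, and credential format are far richer, but the shape — assert identity, verify trust, receive something that expires — is the whole model.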
CI/CD pipelines
The highest-leverage migration target. CI systems hold long-lived cloud credentials so they can deploy on your behalf, and CI is the most-attacked surface in modern engineering orgs. Every major CI vendor now ships an OIDC issuer; every major cloud now accepts those tokens.
- GitHub Actions → AWS: configure GitHub's OIDC provider as a trust relationship on a target IAM role; the workflow assumes the role with no static secret
- GitHub Actions → Azure: federated credential on the app registration, scoped to a specific repo, branch, or environment
- GitHub Actions → GCP: Workload Identity Federation pool with a GitHub OIDC provider
- GitLab CI, CircleCI, Buildkite, Jenkins: all support OIDC issuance to varying degrees; check the docs and prefer this over storing keys in CI secrets
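For the GitHub Actions → AWS path, the scoping lives in the IAM role's trust policy, conditioned on the token's `aud` and `sub` claims. A sketch of the shape expressed as a Python dict — the account ID, org, repo, and branch are placeholders:

```python
# Trust policy for an IAM role assumed by a GitHub Actions workflow.
# The account ID, org/repo, and branch below are placeholders.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {
            # The GitHub OIDC provider must already be registered in the account.
            "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
        },
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
            "StringEquals": {
                "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
                # Exact repo and branch -- never a wildcard subject.
                "token.actions.githubusercontent.com:sub": "repo:example-org/example-repo:ref:refs/heads/main",
            }
        },
    }],
}
```

The `sub` condition is where the blast radius is decided: an exact match on repo and branch means a workflow anywhere else — including a fork's pull request — gets nothing.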
Developer laptops
Don't put long-lived credentials on developer machines. Use SSO-backed assume-role flows: developers authenticate to your IdP (Okta, Entra, Google), which gates access to a script or CLI that returns credentials with a short TTL.
- AWS: IAM Identity Center (AWS SSO), aws-vault, granted, leapp
- Azure: `az login` with conditional access policies enforced
- GCP: `gcloud auth login` with org-policy-enforced session length
3. Design the target state
Before you write any code, draw the target architecture. The migration is a graph reduction problem: every workload should end up with exactly one identity, every identity should map to exactly one purpose, and every credential should be issued at runtime.
- One identity per workload, not per environment: the prod and staging copies of the same service should share an identity definition, parameterized by environment, not have two hand-crafted roles that drift over time
- Trust relationships scoped as tightly as possible: a federated trust to GitHub Actions should be scoped to a single repo, ideally a single branch or environment, and a single workflow file path
- Permissions sized to actual usage, not feared usage: use IAM Access Analyzer (AWS), Azure Policy insights, or IAM Recommender (GCP) to sample the actual API calls and shrink the policy to fit
- No wildcards on resources: every policy should name the resources it operates on, not `*`. This is the single most common source of blast-radius regret
- Break-glass accounts isolated and audited: there are still good reasons for a long-lived credential to exist (vendor integrations, certain SaaS tools). Move them all into one labeled vault, alert on every use
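The no-wildcard rule is easy to enforce mechanically. A minimal lint over IAM-style policy documents — the policy shape follows the standard JSON layout; the function name and return format are ours:

```python
def wildcard_resources(policy):
    """Return the Action of every Allow statement that targets Resource '*'."""
    flagged = []
    for stmt in policy.get("Statement", []):
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]  # normalize single-string form to a list
        if stmt.get("Effect") == "Allow" and "*" in resources:
            flagged.append(stmt.get("Action"))
    return flagged
```

Wire a check like this into CI on your infrastructure-as-code repo and wildcard resources become a failed build instead of a future incident.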
4. Roll it out without breaking production
The hard part of this migration is not the technology. It's that you're changing the authentication model on running services, in production, without an outage. The pattern that works:
- Dual-credential phase: the workload supports both the old credential and the new identity simultaneously. Default to the new path, fall back to the old. Deploy to non-prod for at least a week
- Observability first: log every authentication attempt with which credential path was used. You need to be able to say "the service has not used the old credential for 7 days" before you delete it
- Disable, don't delete: when the metric clears, disable the old credential first and watch for fallout for a few days. Then delete it
- One workload at a time: resist the urge to do a "big bang" cutover. The blast radius of a misconfigured federation is "every workload that uses it." Migrate one service per change
- Have the rollback ready before the rollout: if you can't re-enable the old credential in 60 seconds, you don't have a rollback plan
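The dual-credential phase is a try-the-new-path, fall-back-and-log pattern. A sketch with hypothetical provider callables standing in for the real SDK flows:

```python
import logging

log = logging.getLogger("auth-migration")


def get_credentials(new_identity_path, old_static_path):
    """Prefer the workload-identity path; fall back to the legacy credential.

    Both arguments are callables that return credentials or raise on failure.
    Every attempt is logged so that "the old path has been unused for 7 days"
    is a queryable fact, not a guess.
    """
    try:
        creds = new_identity_path()
        log.info("auth_path=workload_identity")
        return creds
    except Exception:
        log.warning("auth_path=legacy_static_credential")  # alert on this
        return old_static_path()
```

When the `legacy_static_credential` log line has been silent for your observation window, you disable the old credential; the fallback branch then becomes your canary for anything you missed.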
5. Common mistakes we keep seeing
- Migrating the production workload to workload identity but leaving a developer's `aws_access_key_id` committed in a buried config file from 2022. Run a credential scan after the migration, not before
- Federating GitHub Actions to a single IAM role with cluster-admin and saying "we're done." A federated trust still needs a sized policy on the receiving end
- Setting the federated credential's audience or subject claim to a wildcard ("any branch in the repo"). Now any pull request from a fork can run a workflow that assumes your prod role
- Missing the long tail: third-party SaaS integrations that store an IAM user key in their platform. Many of them now support assume-role with external ID — check before assuming you're stuck
- Forgetting to rotate the certificate authority or trust anchor used by IAM Roles Anywhere. The new CA needs the same rotation discipline as the credentials it replaced
- Believing that "no long-lived keys in source control" is the same as "no long-lived keys." Check Secrets Manager, Key Vault, parameter stores, and any in-house secret broker
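The post-migration credential scan can start as a one-file grep. AWS access key IDs follow a fixed prefix-and-length format; the regex below covers the common `AKIA` (long-term) and `ASIA` (temporary) prefixes — extend it with other providers' formats as needed:

```python
import re

# Long-term (AKIA) and temporary (ASIA) AWS access key IDs:
# a 4-character prefix followed by 16 uppercase alphanumerics.
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def scan_text(text):
    """Return every substring of `text` that looks like an AWS access key ID."""
    return AWS_KEY_ID.findall(text)
```

Dedicated scanners (gitleaks, trufflehog, and the like) cover far more patterns and git history too; the point is that the check is cheap enough that there is no reason to skip it.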
6. Where this fits in a Zero Trust architecture
Workload identity is the foundational layer of any serious Zero Trust effort on cloud infrastructure. The principle "never trust, always verify" only works if the verification is per-request and tied to a strong identity. Static credentials are the opposite of that — they're a bearer token issued once and trusted forever, until someone rotates them or they leak.
Once workload identity is in place, you can start enforcing the rest of the Zero Trust controls that depend on it: per-request authorization with policy-as-code (Cedar, OPA), context-aware access (require MFA, require corporate device, require known network range), and service-to-service authentication via mTLS or signed tokens that carry the workload's identity rather than a shared secret. Trying to do any of these without workload identity first leads to a half-built Zero Trust system that still has long-lived keys at the bottom of it.
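Once each request carries a workload identity, per-request authorization becomes a pure function of identity, action, and context. A toy policy check in that shape — the rule set is illustrative, not Cedar or OPA syntax:

```python
POLICY = [
    # (identity, action, required context) -- illustrative rules only.
    {"identity": "svc-billing", "action": "db:read", "require_mfa": False},
    {"identity": "svc-admin", "action": "db:write", "require_mfa": True},
]


def authorize(identity, action, context):
    """Allow only if a rule matches identity+action and its context holds."""
    for rule in POLICY:
        if rule["identity"] == identity and rule["action"] == action:
            if rule["require_mfa"] and not context.get("mfa"):
                return False
            return True
    return False  # default deny
```

None of this is expressible when the caller is "whoever holds the shared key" — which is why workload identity has to land first.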
The short version
The end-state of this migration is measurable. After it's done you should be able to say:
- There are no long-lived cloud credentials in any source repository, CI secret, or developer machine
- The number of IAM users / app registrations / service accounts is in single digits
- Every credential that does still exist has a documented owner, a documented use case, and an alert fired on every use
- Every workload identity is scoped to a single purpose with a sized-to-actual-use permission set
- A new service inherits an identity by configuration, not by a security ticket and a credentials handoff
Want us to lead the migration?
Led by AZ-500 and AWS Security Specialty certified engineers. We map your current IAM, design the target state, and pair with your platform team to ship it.