
The Cloud Incident Response Runbook: First 60 Minutes After a Compromised Key

The median time between an AWS access key appearing on GitHub and the first unauthorized API call is under four minutes. This is the 60-minute runbook we give every client — what to do, in what order, with the exact commands for AWS, Azure, and GCP.

Why you're reading this

A valid AWS access key lands in your Slack. Or CloudTrail shows login attempts from a datacenter you don't own. Or someone notices a Lambda function in production you don't remember shipping. You have two hours before your incident commander needs a containment plan, and you need to move fast without torching production.

This is a runbook your team runs first. Not a consulting report afterward — steps you execute immediately, with your own tools, in your own environment. CAASLABS doesn't do incident response; we help you build the prevention and detection layers that shrink the blast radius when this happens.


1. The signal that usually fires first

Attackers routinely exploit leaked AWS access keys within minutes of exposure — honeypot studies have documented the first unauthorized API call arriving in as little as one minute. You usually don't find it that way. Instead, CloudTrail shows API calls from an IP that doesn't match your CIDR blocks, or from a region where you don't operate. In Azure, it's the same — Activity Log shows login attempts from a geography outside your risk profile, or service principal access from an anomalous location.

Start here: log into your cloud provider's console or use the CLI to pull the last 90 days of CloudTrail events (AWS), Activity Log entries (Azure), or Audit Logs (GCP). Look for:

  • GetUser, ListUsers, ListAccessKeys — reconnaissance
  • AssumeRole calls targeting admin roles
  • Login events from unknown IPs or VPN providers
  • Failed authentication attempts followed by success from the same IP

Grep CloudTrail for the source IP and timestamp. You need that full context before you revoke anything.
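Once the logs are exported locally, that filtering step is a one-liner with jq. The IP, key ID, and file path below are placeholders, and a sample record set is embedded so the pipeline is runnable as-is; in a real incident, point jq at the files copied down from your trail bucket.

```shell
# Placeholder CloudTrail export (two sample events); in production, use the
# JSON files pulled from your CloudTrail S3 bucket instead.
cat > /tmp/trail-sample.json <<'EOF'
{"Records":[
 {"eventTime":"2024-05-01T12:03:11Z","eventName":"ListAccessKeys",
  "sourceIPAddress":"203.0.113.50",
  "userIdentity":{"accessKeyId":"AKIAIOSFODNN7EXAMPLE"}},
 {"eventTime":"2024-05-01T12:05:40Z","eventName":"DescribeInstances",
  "sourceIPAddress":"10.0.4.12",
  "userIdentity":{"accessKeyId":"AKIAINTERNALEXAMPLE"}}
]}
EOF

# Keep only events from the suspect IP; print timestamp, API call, key ID.
jq -r '.Records[]
       | select(.sourceIPAddress == "203.0.113.50")
       | "\(.eventTime) \(.eventName) \(.userIdentity.accessKeyId)"' \
   /tmp/trail-sample.json
```

The same filter works on `sourceIPAddress`, `userIdentity.accessKeyId`, or `eventName`, so you can pivot between "what did this IP do" and "what did this key do" without re-downloading anything.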

2. First 15 minutes — revoke the key, scope the blast radius

Disable the compromised access key immediately. Do not delete it yet — you need the key ID in CloudTrail queries. In AWS:

aws iam update-access-key-status --access-key-id AKIAIOSFODNN7EXAMPLE \
  --status Inactive --user-name suspected-user

In Azure, disable the service principal or managed identity. In GCP, revoke the key from the service account.
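The equivalent revocations, sketched with placeholder identifiers (exact flags can vary by CLI version, so verify against your installed tooling):

```shell
# Azure: block new token issuance for the compromised service principal.
# <APP_ID> is a placeholder object/app ID.
az ad sp update --id <APP_ID> --set accountEnabled=false

# GCP: disable the key rather than deleting it, so the key ID remains
# available for audit-log queries. Account and project are placeholders.
gcloud iam service-accounts keys disable KEY_ID \
  --iam-account=suspected-svc@my-project.iam.gserviceaccount.com
```

As with AWS, disable rather than delete: you will be querying logs by that key ID for the rest of the investigation.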

While that's happening, pull a credential report to see if other keys exist for the same user:

aws iam generate-credential-report
aws iam get-credential-report

Look at the access key age. If it's weeks old and you didn't issue it, someone had persistence for weeks. Note the creation timestamp — you'll need to scope your investigation to activity after that date.
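The credential report comes back as base64-encoded CSV. A minimal way to pull key ages out of it, shown against a simplified sample row so the pipeline is runnable without AWS access (in production, decode the output of get-credential-report instead, as noted in the comment):

```shell
# Placeholder credential report; the real one comes from:
#   aws iam get-credential-report --query Content --output text | base64 -d
cat > /tmp/credential-report.csv <<'EOF'
user,access_key_1_active,access_key_1_last_rotated,access_key_2_active,access_key_2_last_rotated
suspected-user,true,2024-02-14T09:21:00+00:00,true,2024-04-30T23:58:12+00:00
EOF

# Print each user's key rotation timestamps; any key you didn't issue marks
# the earliest possible start of the compromise window.
awk -F, 'NR>1 {print $1": key1="$3" key2="$5}' /tmp/credential-report.csv
```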

3. Enumerate what the key could reach

Run a simulation to see what this key's principal can do. AWS IAM has SimulatePrincipalPolicy, which reports whether a given principal is allowed to perform specific actions against a resource set. GCP's closest equivalent is the IAM Policy Troubleshooter (gcloud policy-troubleshoot iam); Azure has no direct simulator, so enumerate the principal's role assignments with az role assignment list --assignee instead. Do this for both inline and attached policies:

aws iam simulate-principal-policy --policy-source-arn arn:aws:iam::123456789012:user/suspected-user \
  --action-names ec2:DescribeInstances s3:GetObject lambda:InvokeFunction rds:DescribeDBInstances kms:Decrypt \
  --resource-arns "*"

Note: SimulatePrincipalPolicy requires fully qualified action names — wildcards like ec2:* are not supported. Test the specific actions relevant to your environment. If the principal has wildcard permissions or admin access, treat this as a full account compromise. Escalate immediately. If it's scoped to a particular service or resource tag, you've narrowed your blast radius.
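The simulation returns one EvaluationResults entry per action. A quick way to surface only what the principal is actually allowed to do; a sample of the response shape is embedded so the filter is runnable without calling AWS:

```shell
# Placeholder simulate-principal-policy response; in production, pipe the
# real CLI output into jq instead of this sample file.
cat > /tmp/simulation.json <<'EOF'
{"EvaluationResults":[
 {"EvalActionName":"s3:GetObject","EvalDecision":"allowed"},
 {"EvalActionName":"kms:Decrypt","EvalDecision":"implicitDeny"},
 {"EvalActionName":"lambda:InvokeFunction","EvalDecision":"allowed"}
]}
EOF

# List only the allowed actions; everything printed here is in scope for
# the blast-radius assessment.
jq -r '.EvaluationResults[]
       | select(.EvalDecision == "allowed")
       | .EvalActionName' /tmp/simulation.json
```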

Cross-reference with resource tagging. If your team tags sensitive resources with Environment: production or CostCenter: finance, enumerate all resources with those tags and assume they were touched.
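If you tag consistently, the Resource Groups Tagging API enumerates everything carrying a given tag in one call. The tag key and value here are examples; substitute your own scheme:

```shell
# List the ARN of every resource tagged Environment=production; treat each
# hit as potentially touched by the compromised key.
aws resourcegroupstaggingapi get-resources \
  --tag-filters Key=Environment,Values=production \
  --query 'ResourceTagMappingList[].ResourceARN' --output text
```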

4. Hunt for persistence

The most common persistence mechanism we see in cloud compromises is a new IAM user with console access, hidden in a service account naming convention — something like svc-cloudwatch-monitoring that looks legitimate in a listing. Check for:

  • New IAM users created after the key's creation date: aws iam list-users
  • New roles with trust policy modifications: aws iam list-roles and inspect each
  • Lambda functions created or modified: aws lambda list-functions and check the modified date
  • New EC2 key pairs: aws ec2 describe-key-pairs and cross-check against your deployment records (a key appended directly to an instance's ~/.ssh/authorized_keys won't appear here, so inspect accessed instances too)
  • Modified trust policies on roles: aws iam get-role and review the AssumeRolePolicyDocument for each role changed after the compromise start date
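The new-user check scripts cleanly: filter list-users output on CreateDate against the compromise start (ISO-8601 timestamps compare correctly as strings). A sample of the list-users output shape is embedded so the filter is runnable; the user names and dates are placeholders.

```shell
COMPROMISE_START="2024-04-01T00:00:00Z"

# Placeholder output; live equivalent: aws iam list-users > /tmp/users.json
cat > /tmp/users.json <<'EOF'
{"Users":[
 {"UserName":"deploy-bot","CreateDate":"2023-11-02T08:00:00Z"},
 {"UserName":"svc-cloudwatch-monitoring","CreateDate":"2024-04-18T03:14:00Z"}
]}
EOF

# Any user created after the key's birth date is a persistence suspect,
# however legitimate its name looks.
jq -r --arg start "$COMPROMISE_START" \
   '.Users[] | select(.CreateDate > $start)
             | "\(.CreateDate) \(.UserName)"' /tmp/users.json
```

The same pattern applies to list-roles (CreateDate) and lambda list-functions (LastModified).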

In Azure, check for new app registrations and new key credentials added to existing ones (listing them requires the Application.Read.All Graph permission), conditional access policy changes, and new managed identities. In GCP, audit service account key creation and workload identity pool bindings.

5. Log preservation before takedown

Before you blow away resources or change passwords, export everything to an immutable location: CloudTrail events to an S3 bucket with object lock enabled, or to a bucket in a separate AWS account. Your forensics team needs:

  • Full CloudTrail export (JSON): aws s3 cp s3://cloudtrail-bucket/ ./forensics/ --recursive
  • VPC Flow Logs for the entire investigation window
  • Application logs from compromised services (RDS query logs, ELB access logs, Lambda CloudWatch logs)
  • Snapshots of any EC2 instances that were accessed, and EBS volume snapshots if data exfiltration is suspected

Copy this to a read-only S3 bucket in a separate account, or to a WORM storage service outside AWS. Do this now, before you start deleting things.
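A hedged sketch of the immutable-copy step. Bucket names and the retention window are placeholders, and note that object lock is enabled at bucket creation:

```shell
# Create the forensics bucket with object lock enabled, then set a default
# WORM retention. In COMPLIANCE mode, no principal (including root) can
# shorten retention or delete locked versions until the window expires.
aws s3api create-bucket --bucket forensics-archive-example \
  --region us-east-1 --object-lock-enabled-for-bucket

aws s3api put-object-lock-configuration --bucket forensics-archive-example \
  --object-lock-configuration \
  '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":180}}}'

# Copy the evidence in; source bucket name is a placeholder.
aws s3 sync s3://cloudtrail-bucket/ s3://forensics-archive-example/cloudtrail/
```

Ideally run this from the separate forensics account so the compromised principal never has write access to the archive.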

6. Cloud-specific persistence patterns

AWS: check Lambda@Edge for persistence in CloudFront distributions (runs on edge locations, hard to audit), CloudFormation drift for infrastructure changes, and Systems Manager Session Manager sessions (are there session history logs for sessions you don't recognize?).

Azure: look for conditional access policy changes (especially allowing legacy auth), new key credentials on app registrations, and enterprise app assignments. Azure invalidates refresh tokens on password reset, but don't rely on password reset alone for incident response — always explicitly revoke all active sessions via Revoke-MgUserSignInSession (Microsoft Graph PowerShell) or the Entra ID portal to ensure complete coverage.

GCP: check Firestore/Datastore access (it's easy to miss in audit logs), cross-account IAM bindings (especially user-impersonation roles), and changes to Organization Policy constraints.

7. Communication and escalation order

Your escalation chain should be: incident commander → CISO/security lead → infrastructure team → legal/comms (if data exposure is suspected). In parallel, notify your cloud provider's abuse team if you suspect external access. AWS, Azure, and GCP all monitor for compromised credentials and will reach out if they detect abuse. Getting ahead of it matters.

Document every step: timestamp, what you found, what you changed, who you told. Screenshots of CloudTrail events and policy outputs. Your forensics and legal teams will need this.

8. Post-incident structural fixes

This runbook stops a compromise in motion. What stops the next one:

  • Workload identity (IAM roles for service accounts). EC2 instances and containers should not use long-lived access keys at all. Use instance profiles, IRSA (EKS), or Azure managed identities exclusively.
  • Short-lived credentials. Wherever you must use access keys, rotate them every 90 days. Better: use temporary credentials from STS (AWS), managed identities (Azure), or workload federation (GCP).
  • Conditional access and MFA. Azure's Conditional Access and AWS's contextual policy enforcement catch anomalous login patterns. Require MFA on all human-initiated console access.
  • CloudTrail/Activity Log/Audit Log centralization. Logs must be immutable (S3 object lock, Azure immutable storage), in a separate account/subscription, and monitored in real time.
  • Prevent public key sprawl. Use EC2 Instance Connect via IAM instead of downloadable SSH key pairs. If key pairs are required, rotate them manually via Systems Manager automation on a 90-day cycle — AWS does not provide native automatic rotation for EC2 key pairs.
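The short-lived-credentials item in practice, with placeholder ARNs: trade a role assumption for one-hour STS credentials instead of minting another long-lived key.

```shell
# Assume a narrowly scoped role; STS returns an AccessKeyId,
# SecretAccessKey, and SessionToken valid only for --duration-seconds
# (3600 here), after which they are useless to anyone who steals them.
aws sts assume-role \
  --role-arn arn:aws:iam::123456789012:role/deploy-readonly \
  --role-session-name incident-drill \
  --duration-seconds 3600
```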

The short version

Disable the compromised credential immediately, then scope what it could reach via policy simulation. Hunt for new IAM users, roles, and infrastructure created after the key's birth date — persistence is usually hiding in service account naming conventions. Export all logs and snapshots to immutable storage before you delete anything. Check cloud-provider-specific persistence vectors (Lambda@Edge in AWS, refresh tokens in Azure, cross-account bindings in GCP). Document everything for forensics, escalate to your incident commander, and fix the root cause by eliminating long-lived keys, enforcing workload identity, and centralizing logs.

Want a cloud setup where this would not have happened?

Led by AZ-500 and AWS Security Specialty-certified engineers. We migrate you off long-lived keys, wire up workload identity, and build the conditional access policies that make key compromise a non-event.