Why threat modeling tutorials usually fail
If you've ever tried to learn threat modeling from a book or a course, you know the format. The instructor walks you through STRIDE on a "sample system" — a coffee shop ordering app, a library catalog, a vending machine. You learn the vocabulary. You practice on the toy. Then you sit down to threat-model your real system, which has thirty services, two clouds, three identity providers, and a RAG pipeline, and the toy walkthrough is no help at all.
The format works for vocabulary but not for skill. Threat modeling is mostly judgment about where to look and what to ignore, and the toy systems don't have anywhere worth looking. So this post is the opposite: a STRIDE walkthrough on a realistic system, with actual trust boundaries and actual decisions about prioritization. We'll build a data flow diagram, enumerate threats, decide what matters, and end with the action list. If you've read other threat modeling content and walked away with the vocabulary but not the muscle memory, this is for you.
1. The system we're modeling
The system is a B2B SaaS application. It has the following components:
- A web frontend (React, served from a CDN).
- A REST API backend (Go service in Kubernetes), with a Postgres database for application state.
- A document upload feature: users upload PDFs which are stored in object storage and processed asynchronously.
- A retrieval-augmented chat feature: the document text is chunked, embedded, and indexed into a vector database. A chat endpoint takes user queries, retrieves relevant chunks, and sends them along with the user's prompt to a hosted LLM API for a response.
- An admin console for company admins to manage users, roles, and billing. Single sign-on via the customer's IdP (SAML).
- Background workers (Sidekiq-equivalent) for the document processing pipeline.
- Hosted on AWS, in a single account, with an EKS cluster for the application workloads.
This is a recognizable shape — a typical mid-stage SaaS with a few interesting wrinkles (file upload, RAG, third-party LLM call). It's complex enough to be worth threat modeling but simple enough to walk through.
2. Step 1: Build the data flow diagram
The first deliverable of any threat model is a diagram. Not a network diagram, not an architecture diagram from your wiki — a data flow diagram (DFD) showing how data moves between components, with trust boundaries marked.
The notation is small. Boxes are external entities. Circles are processes. Open-ended rectangles are data stores. Arrows show data flow. Dashed lines are trust boundaries (places where data crosses from one trust domain to another). That's it.
For our system, the entities are: the end user (browser), the customer admin (browser), the customer's IdP (SAML), and the LLM provider (third-party API). The processes are: the frontend, the API, the document processor worker, the embedding worker, the chat endpoint. The data stores are: Postgres, the object storage bucket, the vector database. The flows between them are the obvious ones — the user uploads a file to the API, the API writes to the bucket, the processor reads from the bucket, and so on.
The trust boundaries are where it gets interesting. We draw lines around:
- The browser. Anything the browser sends is untrusted and may be controlled by an attacker.
- The application backend (everything inside our cluster). Trusted internally but exposed to the browser through the API.
- The customer's IdP. Trusted to assert identity, but only the identity it asserts — we can't trust the user's claims about anything else.
- The third-party LLM API. Trusted to process queries, but everything we send leaves our environment, and the responses are influenced by the input we send (which can include untrusted content).
- The object storage bucket. Inside our account, but the contents are user-provided and untrusted.
- The vector index. Same property — the embeddings derive from user-provided documents.
The diagram doesn't have to be pretty. We draw it on a whiteboard for the first pass, then redraw in a tool (draw.io, Lucid, OmniGraffle, even Mermaid in a markdown file) for the final document.
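For a first pass, even a few lines of Mermaid are enough. A sketch of this system's DFD might look like the following — trust boundaries approximated as subgraphs, and only a subset of the flows shown:

```mermaid
flowchart LR
    subgraph browser[Browser boundary]
        U[End user]
        A[Customer admin]
    end
    IdP[Customer IdP]
    LLM[LLM provider]
    subgraph cluster[Application backend]
        FE((Frontend))
        API((API))
        DP((Doc processor))
        EW((Embedding worker))
        CHAT((Chat endpoint))
    end
    PG[(Postgres)]
    S3[(Object storage)]
    VDB[(Vector DB)]
    U --> FE --> API
    A --> FE
    IdP --> API
    API --> PG
    API --> S3
    DP --> S3
    DP --> EW --> VDB
    U --> CHAT
    CHAT --> VDB
    CHAT --> LLM
```

Squares are external entities, double circles render the processes, and the cylinder shapes stand in for data stores — close enough to the notation above for a working draft.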
3. Step 2: Apply STRIDE to each element
STRIDE is six categories of threat: Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege. For each element on your diagram, walk the six categories and ask "could this happen here?" Most elements will yield two or three candidate threats per category; some will yield none.
The trick is to be exhaustive in this step and prioritize later. If you start filtering while you enumerate, you'll miss things. Write down everything that's plausible and worry about severity in step 3.
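One way to keep the enumeration honest is to generate the full element-by-category grid before filling anything in, so no cell gets silently skipped. A minimal sketch in Python (element names from our DFD; the helper itself is hypothetical, not a real tool):

```python
# Generate the full element-by-category checklist so the enumeration
# step stays exhaustive. One row per (element, category) pair.
from itertools import product

STRIDE = [
    "Spoofing", "Tampering", "Repudiation",
    "Information disclosure", "Denial of service",
    "Elevation of privilege",
]

ELEMENTS = [
    "Frontend", "API backend", "Upload pipeline",
    "RAG chat endpoint", "Admin console + SSO",
]

def checklist(elements, categories=STRIDE):
    """One row per (element, category) pair, filled in during the walk."""
    return [
        {"element": e, "category": c, "threats": []}
        for e, c in product(elements, categories)
    ]

rows = checklist(ELEMENTS)
# 5 elements x 6 categories = 30 cells to walk before any filtering.
```

Thirty cells sounds like a lot; in practice many are dispatched in seconds with "not applicable here", and the grid's value is forcing you to say that out loud.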
Frontend (browser-side React app)
Spoofing: attacker hosts a clone of our frontend at a similar domain to phish credentials.
Tampering: attacker modifies the JavaScript bundle in transit (mitigated by HTTPS + CDN, but consider supply-chain compromise of bundled dependencies).
Repudiation: client-side actions don't have server-side audit trails; user denies having taken an action.
Information disclosure: sensitive data cached in browser storage; exposed via XSS; leaked in browser history.
Denial of service: the frontend is mostly a CDN problem; threats here are user-experience issues, not security ones.
Elevation of privilege: XSS in user-provided content allows an attacker to act as another user in the same tenant.
API backend
Spoofing: attacker forges a session token; missing or weak JWT verification; accepting tokens with the none algorithm.
Tampering: request body modification (mass-assignment, parameter pollution, IDOR), modifying records belonging to a different user by changing IDs in the request.
Repudiation: insufficient audit logging on write operations; logs don't include the actor; logs are mutable by the application itself.
Information disclosure: over-broad API responses returning fields the caller shouldn't see; verbose error messages leaking internal state; SQL injection (if parameterized queries aren't used everywhere); SSRF in any endpoint that accepts a URL.
Denial of service: unbounded query parameters causing expensive database scans; missing rate limits; resource-exhaustion via large request bodies.
Elevation of privilege: broken object-level authorization (BOLA) — the endpoint checks that the caller is authenticated but not that they own the resource; privilege escalation via role assignment endpoints; misconfigured RBAC.
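The none-algorithm threat from the spoofing row is worth seeing concretely. A stdlib-only Python sketch — a toy forger and two decoders, not a real JWT implementation; in practice use a vetted library and pin the accepted algorithms:

```python
# Why the token's own "alg" header must never be trusted: a naive
# decoder that honors "alg: none" accepts a forged token; a strict
# verifier pins the algorithm and recomputes the MAC.
import base64, hashlib, hmac, json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def pad(s: str) -> str:
    return s + "=" * (-len(s) % 4)

def forge_none_token(claims: dict) -> str:
    header = b64url(json.dumps({"alg": "none", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    return f"{header}.{payload}."  # empty signature

def naive_decode(token: str) -> dict:
    # BUG: trusts whatever algorithm the attacker put in the header.
    header_b64, payload_b64, _sig = token.split(".")
    header = json.loads(base64.urlsafe_b64decode(pad(header_b64)))
    if header["alg"] == "none":
        return json.loads(base64.urlsafe_b64decode(pad(payload_b64)))
    raise ValueError("unsupported alg")

def strict_verify(token: str, key: bytes) -> dict:
    # Pin HS256, recompute the MAC, compare in constant time.
    header_b64, payload_b64, sig_b64 = token.split(".")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(pad(sig_b64))):
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(pad(payload_b64)))

forged = forge_none_token({"sub": "admin"})
assert naive_decode(forged)["sub"] == "admin"   # spoofing succeeds
try:
    strict_verify(forged, b"server-secret")
    rejected = False
except ValueError:
    rejected = True   # forged token rejected
```

The audit question for the real API is the same question the sketch asks: does the verification path pin its algorithm, or does it read it from the token?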
Document upload and processing pipeline
Spoofing: not directly applicable.
Tampering: uploaded document type bypass — file extension says PDF but contents are an executable; storage bucket policy allows direct writes outside the API.
Repudiation: uploads not logged with the actor and timestamp.
Information disclosure: uploaded documents accessible to other tenants via predictable bucket key patterns; processed text containing PII not redacted; embeddings leaking content via embedding inversion attacks (low priority but worth noting).
Denial of service: upload of an extremely large document, exhausting worker memory or storage; "zip bomb" style content for any compressed types; PDF parsing exploits crashing the worker.
Elevation of privilege: uploaded PDF contains an exploit for the processing library, leading to RCE in the worker.
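The type-bypass and oversized-upload threats above both point at validation that runs before the worker ever touches the file. A hedged Python sketch — the size cap and helper name are illustrative, and magic-byte checks are a first filter, not a parser-safety guarantee:

```python
# Server-side upload validation: trust the file contents, not the
# extension, and cap size before handing off to the processing worker.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # illustrative cap

def validate_pdf_upload(filename: str, data: bytes) -> None:
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file too large")   # DoS: oversized upload
    if not data.startswith(b"%PDF-"):
        raise ValueError("not a PDF")        # Tampering: type bypass
    if not filename.lower().endswith(".pdf"):
        raise ValueError("extension mismatch")

validate_pdf_upload("report.pdf", b"%PDF-1.7 rest of file")
try:
    validate_pdf_upload("report.pdf", b"MZ\x90\x00")  # executable header
    bypassed = True
except ValueError:
    bypassed = False
```

Note this does nothing for a valid-looking PDF that exploits the parsing library — that threat is addressed by sandboxing the worker, covered in the mitigations below.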
RAG chat endpoint
This is where it gets interesting. RAG is a relatively new pattern and the threats are not well-covered by classic STRIDE.
Spoofing: not directly applicable.
Tampering: indirect prompt injection — content in the indexed documents overrides the chat system prompt; vector index poisoning by uploading documents specifically designed to be retrieved for adversarial queries.
Repudiation: chat queries and responses not logged for audit; users deny having asked particular questions.
Information disclosure: retrieval crossing tenant boundaries (a user in tenant A's chat retrieves chunks from tenant B's documents); LLM response containing data that should not have been retrievable; LLM training-data extraction (low priority for hosted models).
Denial of service: expensive prompts driving up LLM API costs (a known abuse pattern); vector queries returning enormous result sets.
Elevation of privilege: indirect injection causes the LLM to call a tool or take an action the user didn't authorize; injection causes the LLM to leak system instructions or other tenants' data.
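The cross-tenant retrieval threat is concrete enough to sketch. Assuming a metadata-filterable index — modeled here as an in-memory list, since real vector databases each have their own filter syntax — the key property is that a missing tenant filter fails closed:

```python
# Fail-closed tenant filtering for retrieval, against an in-memory
# stand-in for the vector index. The shape is illustrative.
INDEX = [
    {"tenant_id": "acme", "text": "acme roadmap"},
    {"tenant_id": "globex", "text": "globex pricing"},
]

def retrieve(query, tenant_id):
    if not tenant_id:
        # Fail closed: a missing filter is an error, never "all tenants".
        raise ValueError("tenant_id filter is required")
    return [c for c in INDEX if c["tenant_id"] == tenant_id]

chunks = retrieve("roadmap", "acme")
assert all(c["tenant_id"] == "acme" for c in chunks)
try:
    retrieve("roadmap", None)
    failed_open = True
except ValueError:
    failed_open = False
```

The failure mode to design against is the refactor that drops the filter argument: if the default behavior is "return everything", that refactor is a silent cross-tenant leak.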
Admin console + SSO
Spoofing: SAML response forgery if the assertion isn't signed and verified properly; replay of intercepted SAML assertions; session fixation in the SAML callback.
Tampering: modification of role assignments via the admin API by non-admin users (the BOLA pattern again).
Repudiation: admin actions (role changes, billing changes) not logged or logged without enough detail.
Information disclosure: the admin console returning more information than the admin should see (e.g., user data from other tenants if the multi-tenancy isolation has bugs).
Denial of service: not the highest concern here.
Elevation of privilege: a regular user finding a path to invoke admin APIs; an admin from one tenant accessing another tenant's data via a shared backend bug; SAML group attributes being trusted without validation.
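To make the SAML spoofing row concrete: the cheapest forgery of all is an assertion with no signature whatsoever. The Python sketch below only illustrates that presence check — real verification (signature validity, audience restriction, expiry, replay) belongs in a mature SAML library, never hand-rolled:

```python
# Illustration only: an assertion with no Signature element must be
# rejected before cryptographic verification even starts. Do NOT use
# this in place of a real SAML library.
import xml.etree.ElementTree as ET

DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"

def has_signature(assertion_xml: str) -> bool:
    root = ET.fromstring(assertion_xml)
    return root.find(f".//{{{DSIG_NS}}}Signature") is not None

unsigned = "<Assertion><Subject>alice</Subject></Assertion>"
signed = (f'<Assertion><Signature xmlns="{DSIG_NS}"/>'
          "<Subject>alice</Subject></Assertion>")
assert not has_signature(unsigned)   # must be rejected outright
assert has_signature(signed)         # may proceed to crypto checks
```

Several historical SAML bypasses amounted to exactly this gap — the library verified signatures when present but accepted assertions without one — which is why the configuration audit in the mitigations section checks that signature validation is mandatory, not best-effort.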
4. Step 3: Prioritize
A pure STRIDE walkthrough on this system produces about 40 threats. You can't fix all of them, and most of them aren't equally important. The prioritization framework we use has three columns: likelihood (could an attacker actually trigger this?), impact (what happens if they do?), and cost to fix (how much engineering work?).
Score each threat low/medium/high on the three columns. Then sort by impact, then by likelihood, then by inverse cost. The top of the list is what to fix first.
For our system, the threats that bubble to the top are:
- Broken object-level authorization in the API. High impact (cross-tenant data access), high likelihood (this is the most common API bug), low-medium cost.
- Cross-tenant retrieval in the RAG pipeline. High impact (data disclosure), medium likelihood (depends on how retrieval is filtered), low cost (add tenant_id to vector queries).
- Indirect prompt injection driving cross-tenant leakage. High impact, medium-high likelihood, medium cost (architectural change to capability sandboxing).
- Document processing exploits in the parsing library. High impact (RCE in worker), medium likelihood (depends on library), medium cost (sandbox the worker).
- SAML signature verification bypass. High impact (full impersonation), low likelihood if using a mature SAML library, low cost (verify configuration).
- Verbose error messages leaking internal state. Medium impact, medium likelihood, low cost.
- Audit log gaps on admin actions. Medium impact (for compliance and incident response), medium likelihood, low cost.
Notice the structure. The top three are issues that are both high-impact and likely. They go to the top of the engineering backlog. The lower-priority items still need to be tracked but can be scheduled around feature work. The very low-priority items go on a "noted, not addressing" list with rationale.
5. Step 4: Decide on mitigations
For each prioritized threat, you need a mitigation. Mitigations come in four flavors: avoid (remove the feature or capability), transfer (push the risk to a third party who handles it), mitigate (engineering work that reduces likelihood or impact), and accept (acknowledge the risk and document why you're not fixing it).
Worked examples for the top threats:
- BOLA in API: mitigate. Add an authorization layer that verifies the caller has access to the requested resource on every endpoint that takes an ID. Use a policy framework (Casbin, OPA, custom middleware) rather than scattering checks. Audit every endpoint by replaying tests with a different tenant's session and confirming 403s.
- Cross-tenant RAG retrieval: mitigate. Tag every vector with tenant_id at index time. Filter every query by the current user's tenant_id. Add an integration test that submits a query as tenant A and confirms zero results from tenant B's documents. Add a runtime check that fails closed if the filter is missing.
- Indirect prompt injection: mitigate via capability sandboxing (see the prompt injection defense post). The chat endpoint has access to retrieved chunks and the LLM API; nothing else. The LLM cannot call internal tools, cannot fetch external URLs, cannot trigger any side effects. If the model is tricked, the worst it can do is produce a bad response. Adopting the dual-LLM pattern is a future improvement; the architectural restriction is the immediate one.
- Document processing RCE: mitigate via sandboxing the worker. Run the document processing pod with no network access (NetworkPolicy default-deny except object-storage egress), drop all capabilities, run as non-root, read-only filesystem, seccomp default. If the parser is exploited, the attacker lands inside a container that can read the document and write back the processed text, and not much else.
- SAML verification: mitigate by audit. Use a mature SAML library (passport-saml, SAML2 in Go, etc.) and verify the configuration enforces signature validation, audience restriction, and assertion expiry. Don't write your own SAML handling.
- Verbose errors: mitigate. Centralize error handling. Return a generic error code to the client; log the detailed error server-side.
- Audit logging: mitigate. Add structured audit log entries for every mutating admin action, including actor, target, before/after state, and timestamp. Send to a tamper-evident log store (CloudWatch, ELK, Splunk).
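The first mitigation on the list — a uniform object-level authorization check — can be sketched in a few lines of Python. The resource model and names are illustrative, not a real framework; the point is one check applied everywhere, not per-endpoint improvisation:

```python
# Uniform object-level authorization: every ID-taking endpoint routes
# through one check that verifies tenant ownership.
RESOURCES = {
    "doc-1": {"tenant_id": "acme"},
    "doc-2": {"tenant_id": "globex"},
}

class Forbidden(Exception):
    pass

def authorize(session_tenant, resource_id):
    resource = RESOURCES.get(resource_id)
    if resource is None or resource["tenant_id"] != session_tenant:
        # Same error for "missing" and "not yours": don't leak existence.
        raise Forbidden(resource_id)
    return resource

assert authorize("acme", "doc-1")["tenant_id"] == "acme"
try:
    authorize("acme", "doc-2")   # another tenant's resource
    leaked = True
except Forbidden:
    leaked = False
```

The companion to the code is the test strategy from the BOLA bullet above: replay every endpoint's test suite with a different tenant's session and assert the 403s.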
6. Step 5: Write it down
The deliverable of a threat modeling exercise is a document. The format we use:
- System description. One page. What does this system do, who uses it, what data does it handle.
- Data flow diagram. Updated to reflect the discussion.
- Trust boundaries. Listed and described.
- Threats by element. The full STRIDE walk-through, even the low-priority ones, so the list is auditable later.
- Prioritized findings. The top items, with rationale and proposed mitigations.
- Decisions accepted as risk. Things you chose not to fix, and why.
- Action items. Concrete tickets, with owners and rough timelines.
The document is a living artifact. Update it when the architecture changes meaningfully. Re-run the exercise yearly.
7. Step 6: Revisit when the architecture changes
The biggest mistake teams make is treating threat modeling as a one-time event. The model is valid for the architecture you modeled. The architecture changes constantly. New features add new trust boundaries. New integrations create new third parties to model. Refactors merge or split components.
Rather than rerun the full exercise every quarter, run a focused threat model whenever:
- You add a new third-party integration.
- You add a new external entity (a public API, a partner system).
- You change how authentication or authorization works.
- You add or remove a major data store.
- You add an action path the model didn't have before — especially anything that touches money, sends communications, or modifies production state.
A focused threat model on a single change usually takes an hour. The full annual exercise takes a day. Both are cheap relative to the cost of finding the same issues in a pentest report or, worse, in an incident.
The short version
Threat modeling is a finite, repeatable workflow: build a data flow diagram, mark trust boundaries, walk STRIDE on each element, prioritize by likelihood and impact and cost, pick mitigations, write it down. The realistic version takes a day for a system the size of the one we just modeled, plus a few hours of follow-up to convert findings into tickets. The skill is in knowing where to look — which usually means knowing which categories of bug actually matter for your stack, and skipping the ones that don't.
Run it once on your system. The first time will surface things you didn't know were on the table. The second time, six months later, will surface much less, and that's the point.
Want us to threat model your system?
We facilitate the workshop, build the data flow diagrams, run the STRIDE pass, and hand you a prioritized action list your team can ship. Two-day engagement, no fluff.