Isn't this just prompt engineering?

No. Prompt engineering is about making the model do what you want. AI red teaming is about making the model do what an attacker wants, even when the developer tried to prevent it. It's adversarial testing, not capability testing.

We use a hosted model (OpenAI, Anthropic, Gemini). Doesn't that mean we're safe?

The foundation model vendor handles some risks — jailbreak filtering, training-data safety — but your application layer is still your responsibility. Indirect prompt injection, tool abuse, RAG exfiltration, and auth bypass are all application-layer issues that the vendor can't fix for you.

Do you test open-source models too?

Yes. Open models introduce additional supply-chain risk (fine-tuned weights, unverified model cards, training data provenance) that we'll include in scope.

How do you measure success in an AI red team engagement?

We scope specific abuse objectives up front (e.g. "can we exfiltrate a private document from the RAG index?" or "can we get the agent to call the refund API on an account we don't own?"). Success is measured against those objectives, not vague "safety" metrics.

03 Offensive

AI & LLM Security

Red-teaming for LLM applications, agentic systems, and the APIs they touch. The risks most pentest firms don't test for.

Book a 30-min diagnostic See how we work

What we put our name behind

Model and plumbing, tested together

We test the model and the traditional API/auth layer around it in the same engagement — indirect prompt injection through retrieved documents, tool-use abuse in agents, and data exfiltration via model outputs on one side; authz, rate-limiting, mass-assignment and broken object-level auth on the other. OWASP LLM Top 10 and OWASP API Top 10 in a single report, not two vendors.

Overview

Most traditional pentest firms still don't know how to test a language model. They run the same web scanner against the /chat endpoint and call it done. But the real risks in an LLM application live in places scanners can't see: indirect prompt injection through retrieved documents, tool-use abuse in agentic systems, data exfiltration via model outputs, and the model-supply-chain itself.

We red team LLM applications the way a real adversary would. Direct and indirect prompt injection. Jailbreak chains that bypass system prompts. Data exfiltration through retrieval-augmented generation (RAG) pipelines. Tool abuse in agents that can take real actions. System-prompt extraction. Model supply-chain risks from fine-tuned weights and third-party models. It's the layer of testing that scanners don't touch and that most boards now explicitly ask about.

What's included

Every engagement is senior-led and scoped in writing before kickoff.

Prompt injection testing

Direct prompt injection (user-controlled input) and indirect prompt injection (adversarial content hidden in documents, URLs, tool outputs, or retrieved context). Mapped to OWASP LLM01.

Jailbreak resilience

System-prompt bypass, safety-guardrail evasion, and multi-turn escalation techniques. We'll show you which published jailbreaks work against your deployment and which custom chains do.

Agent & tool-use abuse

For agentic systems: can an attacker trick the agent into calling tools outside the intended scope? Can they chain tool calls to achieve a goal the user never asked for?

RAG & data exfiltration

Testing whether adversarial queries can pull private documents out of the retrieval index, or whether poisoned documents in the index can manipulate downstream responses.

Model supply-chain review

Provenance of fine-tuned weights, third-party model dependencies, training-data contamination risks, and the security posture of the inference infrastructure.

API & backend testing

Every LLM application also has a traditional API and auth layer. We don't stop at the model — we test the plumbing around it with the same rigor as a standard pentest.

What you get

OWASP LLM Top 10 (2025 edition) coverage matrix against your application
Attack narrative with reproducible prompts and payloads
Risk-rated findings with business-impact scoring
Remediation guidance (input filtering, output sanitization, guardrail hardening, architecture changes)
Executive summary framed for non-technical leadership
Verification retest of every critical and high finding, completed within 60 days of report delivery (request by day 45)

Ideal for

Teams shipping LLM features to customers who need assurance beyond "we use OpenAI"
Agentic products that can take real actions on behalf of users
RAG applications handling sensitive or regulated data
Boards asking hard questions about AI risk that generic vendor questionnaires can't answer

Frequently asked

Isn't this just prompt engineering?: No. Prompt engineering is about making the model do what you want. AI red teaming is about making the model do what an attacker wants, even when the developer tried to prevent it. It's adversarial testing, not capability testing.
We use a hosted model (OpenAI, Anthropic, Gemini). Doesn't that mean we're safe?: The foundation model vendor handles some risks — jailbreak filtering, training-data safety — but your application layer is still your responsibility. Indirect prompt injection, tool abuse, RAG exfiltration, and auth bypass are all application-layer issues that the vendor can't fix for you.
Do you test open-source models too?: Yes. Open models introduce additional supply-chain risk (fine-tuned weights, unverified model cards, training data provenance) that we'll include in scope.
How do you measure success in an AI red team engagement?: We scope specific abuse objectives up front (e.g. "can we exfiltrate a private document from the RAG index?" or "can we get the agent to call the refund API on an account we don't own?"). Success is measured against those objectives, not vague "safety" metrics.

Ready to scope a ai & llm security engagement?

A 30-minute call with a senior specialist. Written scope before kickoff. No SDRs.

Book a 30-min diagnostic hello@caaslabs.com