Independent
tests. Half of
them open.
130 foundational tests published on GitHub under CC BY-SA 4.0 — vetted in the wild, used in education, aligned to the OWASP LLM Top 10. 380 proprietary tests ship inside paid engagements where adaptive escalation and adversarial chaining belong.
The baseline, in public.
Foundational OWASP LLM Top 10 coverage — the same tests that anchor every paid engagement. Vetted by the community, used by educators, free to read and free to run.
130 foundational tests, every category.
Baseline coverage of the OWASP LLM Top 10. Markdown definitions with expected-behaviour indicators, severity scoring guidance, and usage notes for manual testing. Designed for transparency and education first.
Trust through transparency.
OWASP CC BY-SA 4.0 compliance for the baseline. Community trust through public methodology. And a fair line: the foundational layer is open; the adaptive layer is what you hire us for.
Markdown definitions
Each test ships as a single Markdown file — attack prompt, expected refusal, expected vulnerable response, OWASP category, severity guidance. Readable in any editor.
Severity guides
Calibration notes for each OWASP category — what HIGH looks like, what CRITICAL looks like, and the line between them. The same scoring rubric we use internally.
OWASP-aligned
Based directly on the OWASP LLM Top 10 (2025). The same categories your security team is already reading.
The adaptive layer. Where the catastrophic findings live.
Static tests find the door. Adaptive tests walk through it. The 380 proprietary tests cover the chains, encodings, and multi-turn escalations that real attackers actually use — and that are not safe to ship publicly.
Adversarial chains are dual-use.
A working multi-turn jailbreak, an indirect-injection vector that survives RAG retrieval, an agent-tool exploitation chain — these are operational weapons as much as test cases. They live inside paid engagements where they reach defenders, not attackers.
Adaptive engine
The escalation logic that generated CS-001 from a single HIGH refusal to a CATASTROPHIC trigger word. Iterative attack generation, not single-shot.
Agentic chains
Multi-step tool abuse, MCP exploitation, RAG poisoning, credential harvesting, agent impersonation — 40+ chain tests built for real production architectures.
Encoded & evasive
Encoded payloads, obfuscated prompts, multi-turn baiting, persona drift — every published jailbreak family plus the adaptive variants the static set won't catch.
Open a PR. Or open an engagement.
Two ways to contribute: extend the open suite on GitHub, or run a paid engagement and let your remediated findings feed back into the methodology (anonymised, with your consent).
Pull requests welcome.
Issues, new test cases, severity refinements, framework cross-references — all welcome under CC BY-SA 4.0. Read the contribution guide on the repo before opening a PR.
Audit-grade evidence.
Want the 130 open tests plus the 380 adaptive ones, with forensic-grade evidence and cross-framework mapping? That's an engagement.
Run the open tests. Then hire us for the rest.
The 130-test community edition is yours, today, on GitHub. The 380-test adaptive engine ships with your audit — evidence-grade, cross-framework, signed.