Case studies · anonymised findings
FILE NO. 008 / FIELD REPORTS

Findings, in plain English.

Anonymised excerpts from real Technical Assessment Reports: 600+ tests, up to 7 adaptive iterations, and a 100% attack success rate where the vulnerability was confirmed. Names changed; the engineering, the evidence and the remediation are not.

Tests executed: 600+ per engagement
Adaptive depth: Up to 7 iterations
Frameworks: EU · ISO · NIST · OWASP
Identity: NDA · anonymised

02 / Methodology · Why static testing misses these

Static tests find the door. Adaptive tests walk through it.

Conventional AI red-teaming is single-shot. We continue past first refusal — escalating, recombining and persisting — exactly the way a real attacker does.
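
The difference between the two approaches can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual test harness: `Finding`, `single_shot`, `adaptive`, `toy_target`, `is_refusal` and `escalate` are all hypothetical names, and the toy target simply stands in for an endpoint that refuses the first few attempts. Only the cap of seven iterations comes from the figures above.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_ITERATIONS = 7  # "up to 7 iterations"; every other name here is hypothetical


@dataclass
class Finding:
    test_id: str
    confirmed: bool          # True if the target stopped refusing
    iterations_used: int
    transcript: List[str] = field(default_factory=list)


def single_shot(test_id: str, prompt: str,
                target: Callable[[str], str],
                is_refusal: Callable[[str], bool]) -> Finding:
    """Conventional testing: one attempt, one verdict."""
    reply = target(prompt)
    return Finding(test_id, not is_refusal(reply), 1, [reply])


def adaptive(test_id: str, prompt: str,
             target: Callable[[str], str],
             is_refusal: Callable[[str], bool],
             escalate: Callable[[str, int], str],
             max_iterations: int = MAX_ITERATIONS) -> Finding:
    """Adaptive testing: keep going past the first refusal, up to a cap."""
    transcript: List[str] = []
    for i in range(1, max_iterations + 1):
        reply = target(prompt)
        transcript.append(reply)
        if not is_refusal(reply):
            return Finding(test_id, True, i, transcript)
        prompt = escalate(prompt, i)  # derive the next variant from the last one
    return Finding(test_id, False, max_iterations, transcript)


# Toy stand-ins so the sketch runs without any real endpoint (assumptions).
def toy_target(prompt: str) -> str:
    # Pretend the endpoint holds its refusal until the third reworded variant.
    return "COMPLIED" if prompt.count("[variant") >= 3 else "REFUSED"


def is_refusal(reply: str) -> bool:
    return reply == "REFUSED"


def escalate(prompt: str, iteration: int) -> str:
    # Placeholder for escalating or recombining; it only tags the prompt.
    return f"{prompt} [variant {iteration}]"


if __name__ == "__main__":
    base = "hypothetical probe"
    print(single_shot("T-001", base, toy_target, is_refusal))
    print(adaptive("T-001", base, toy_target, is_refusal, escalate))
```

Against this toy target, the single-shot run records a refusal and stops, while the adaptive run confirms the same weakness on its fourth attempt. The transcript kept on each `Finding` is the kind of evidence a report would cite.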

A · Conventional testing

One shot. One verdict.

Single attempt per vulnerability. No follow-up after initial refusal. Manual engagement at $15K–$50K and weeks of calendar time. Coverage typically 100–200 tests. Point-in-time; no retest. Behavioural depth: shallow.

Where most audits stop
B · TestMy.AI adaptive

600+ tests. Then we push.

600+ static tests, plus adaptive escalation on every failure. Progressive iteration mimics real attacker behaviour. 5–14 business days, $3.5K–$9.5K. One retest within 30 days included. Full OWASP LLM Top 10, ISO 42001, NIST AI RMF and EU AI Act coverage.

Where the catastrophic findings live
Begin · Most systems have at least one critical

Could your endpoint survive seven iterations?

Most AI endpoints we test ship with at least one CRITICAL vulnerability. An independent assessment finds yours before an external attacker does — and gives you the evidence to file.