AIUC-1 B002: Detect adversarial input

Implement monitoring capabilities to detect and respond to adversarial inputs and prompt injection attempts.

Keywords: Monitor, Adversarial, Jailbreak, Prompt Injection
Application: Optional
Frequency: Every 3 months
Type: Detective
Crosswalks
MITRE ATLAS: AML-M0003 (Model Hardening), AML-M0015 (Adversarial Input Detection), AML-M0024 (AI Telemetry Logging), AML-M0021 (Generative AI Guidelines)
EU AI Act: Article 15 (Accuracy, Robustness and Cybersecurity), Article 72 (Post-Market Monitoring by Providers and Post-Market Monitoring Plan for High-Risk AI Systems)
ISO/IEC 42001: A.9.4 (Intended use of the AI system)
NIST AI RMF: GOVERN 1.5 (Risk monitoring and review), MEASURE 2.4 (Production monitoring), MEASURE 2.7 (Security and resilience), MEASURE 3.1 (Emergent risk tracking)
OWASP LLM Top 10 (2025): LLM01 (Prompt Injection), LLM08 (Vector and Embedding Weaknesses), LLM10 (Unbounded Consumption)

Control activities

Establishing a taxonomy for adversarial risks. For example, drawing on NIST's AI 100-2e2023 taxonomy for Adversarial Machine Learning.
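
A minimal sketch of how such a taxonomy might be encoded for use by detection and reporting tooling. The four top-level categories loosely follow the NIST AI 100-2e2023 report (evasion, poisoning, privacy, and abuse); the class and field names here are illustrative, not prescribed by AIUC-1.

```python
from enum import Enum

class AdversarialRiskCategory(Enum):
    """Top-level adversarial ML risk categories, loosely following
    the NIST AI 100-2e2023 taxonomy. Names are illustrative."""
    EVASION = "evasion"      # adversarial inputs crafted at inference time
    POISONING = "poisoning"  # corruption of training data or model artifacts
    PRIVACY = "privacy"      # extraction of training data or model internals
    ABUSE = "abuse"          # misuse of generative systems, e.g. prompt injection

# Tag detected events with a taxonomy category so that alerting and
# quarterly reviews can aggregate incidents by risk type.
event = {
    "input": "Ignore all previous instructions and ...",
    "category": AdversarialRiskCategory.ABUSE.value,
}
```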

Establishing detection and alerting. For example, implementing monitoring for prompt injection patterns, jailbreak techniques, adversarial input attempts, and rate-limit violations, and configuring alerts and threat notifications for suspicious activity.
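
As a concrete sketch, the snippet below pairs signature matching for common injection and jailbreak phrasings with per-session sliding-window rate limiting. The patterns, window, and threshold are assumptions chosen for illustration; production rule sets are larger and continuously maintained.

```python
import re
import time
from collections import defaultdict, deque

# Illustrative signatures only; real rule sets are larger and are
# refreshed on the quarterly review cadence described below.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (your|the) system prompt", re.I),
    re.compile(r"you are now in developer mode", re.I),
]

RATE_LIMIT = 30       # assumed ceiling: requests per session per window
WINDOW_SECONDS = 60

_requests: dict[str, deque] = defaultdict(deque)

def check_input(session_id: str, text: str) -> list[str]:
    """Return alert reasons for this input; an empty list means clean."""
    alerts = [f"injection-pattern:{p.pattern}"
              for p in INJECTION_PATTERNS if p.search(text)]
    # Sliding-window rate limiting per session.
    now = time.monotonic()
    window = _requests[session_id]
    window.append(now)
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) > RATE_LIMIT:
        alerts.append("rate-limit-exceeded")
    return alerts
```

In practice the returned reasons would be routed to an alerting channel or SIEM rather than handed back to the caller.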

Implementing incident logging and response procedures. For example, logging suspected attacks with timestamps, user/session context, and input content, escalating to designated personnel based on severity thresholds (e.g. immediate escalation for confirmed jailbreaks), documenting response actions in a centralized incident system.
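
A minimal sketch of such a logging and escalation path, assuming a four-level severity scale and a hypothetical page_on_call hook standing in for the real escalation integration:

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("ai.adversarial.incidents")

# Assumed severity scale; anything at or above HIGH pages the
# designated responder immediately (e.g. a confirmed jailbreak).
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}
ESCALATION_THRESHOLD = "high"

def page_on_call(record: dict) -> None:
    """Hypothetical escalation hook (pager/ticketing integration)."""
    logger.critical("escalating incident: %s", record["reason"])

def log_incident(user_id: str, session_id: str, input_text: str,
                 severity: str, reason: str) -> None:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "session_id": session_id,
        "input": input_text,
        "severity": severity,
        "reason": reason,
    }
    # Stand-in for a centralized incident system: emit a structured
    # log line that downstream collectors can ingest.
    logger.warning("adversarial_incident %s", json.dumps(record))
    if SEVERITY_ORDER[severity] >= SEVERITY_ORDER[ESCALATION_THRESHOLD]:
        page_on_call(record)
```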

Maintaining detection effectiveness through quarterly reviews. For example, updating detection rules based on emerging adversarial techniques, analyzing incident patterns, and documenting system improvements.
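
One way to keep that cadence auditable is to version the rule set and record when it was last reviewed; the sketch below flags a rule set that has drifted past the 90-day (quarterly) window. Rule IDs, patterns, and source references are hypothetical.

```python
from datetime import date, timedelta

# Illustrative rule-set snapshot; in practice this would live in
# version control alongside the detection service.
RULESET = {
    "version": "2025.07",
    "last_reviewed": date(2025, 7, 1),
    "rules": [
        {"id": "INJ-001",  # hypothetical rule ID
         "pattern": r"ignore (previous|prior) instructions",
         "source": "OWASP LLM01:25"},
        {"id": "JB-004",   # hypothetical rule ID
         "pattern": r"developer mode",
         "source": "internal incident review"},
    ],
}

def ruleset_is_stale(today: date, max_age_days: int = 90) -> bool:
    """True if the rule set has missed the quarterly (90-day) review
    cadence this control requires."""
    return today - RULESET["last_reviewed"] > timedelta(days=max_age_days)
```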

Implementing adversarial input detection prior to AI model processing where feasible. For example, using lightweight pattern-matching, behavioral heuristics, or IP-based filters to flag likely threats before processing, with latency-optimized safeguards or asynchronous review paths where real-time detection is infeasible.
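
A sketch of such a pre-inference gate under these assumptions: cheap checks (an IP blocklist and a few signatures) run synchronously, while ambiguous cases are routed to an asynchronous review queue instead of adding latency to the request path. The networks, patterns, and length heuristic are placeholders.

```python
import asyncio
import ipaddress
import re

# Placeholder blocklist and signatures; real deployments would source
# these from threat intelligence and the maintained rule set.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
FAST_PATTERNS = [re.compile(r"ignore (previous|prior) instructions", re.I)]

review_queue: asyncio.Queue = asyncio.Queue()

def fast_gate(client_ip: str, text: str) -> str:
    """Latency-friendly checks run before model inference.
    Returns 'block', 'review', or 'allow'."""
    ip = ipaddress.ip_address(client_ip)
    if any(ip in net for net in BLOCKED_NETWORKS):
        return "block"
    if any(p.search(text) for p in FAST_PATTERNS):
        return "block"
    # Heuristic too error-prone to block inline: queue the input for
    # asynchronous review instead of rejecting the request.
    if len(text) > 8000:
        return "review"
    return "allow"

async def call_model(text: str) -> str:
    return "model response"  # stand-in for the real inference call

async def handle_request(client_ip: str, text: str) -> str:
    verdict = fast_gate(client_ip, text)
    if verdict == "block":
        return "Request rejected."
    if verdict == "review":
        await review_queue.put({"ip": client_ip, "input": text})
    return await call_model(text)
```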

Integrating adversarial input detection into existing security operations tooling. For example, forwarding flagged inputs to SIEM platforms, correlating detections with authentication and network logs, and enabling SOC teams to triage AI-related security events.
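
A sketch of event forwarding, assuming an HTTP event collector of the kind most SIEMs expose; the URL, token, and field names below are placeholders, not any vendor's actual API. The shared user, session, and source-IP keys are what let SOC analysts join these events against authentication and network logs.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Placeholder collector endpoint and token; most SIEMs accept JSON
# events over HTTPS, but this URL and schema are not any vendor's API.
SIEM_URL = "https://siem.example.com/api/events"
SIEM_TOKEN = "REPLACE_ME"

def forward_to_siem(user_id: str, session_id: str, source_ip: str,
                    reason: str, input_text: str) -> None:
    event = {
        "time": datetime.now(timezone.utc).isoformat(),
        "sourcetype": "ai:adversarial_input",
        # Shared keys let SOC analysts correlate this event with
        # authentication and network logs for the same principal.
        "user_id": user_id,
        "session_id": session_id,
        "src_ip": source_ip,
        "reason": reason,
        "input_excerpt": input_text[:500],  # truncate to limit data spread
    }
    req = urllib.request.Request(
        SIEM_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Authorization": f"Bearer {SIEM_TOKEN}",
                 "Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=5)
```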

Organizations can submit alternative evidence demonstrating how they meet the requirement.

AIUC-1 is built with industry leaders

"We need a SOC 2 for AI agents— a familiar, actionable standard for security and trust."

Phil Venables, Former CISO of Google Cloud

"Integrating MITRE ATLAS ensures AI security risk management tools are informed by the latest AI threat patterns and leverage state of the art defensive strategies."

Dr. Christina Liaghati, MITRE ATLAS lead

"Today, enterprises can't reliably assess the security of their AI vendors— we need a standard to address this gap."

Hyrum Anderson, Senior Director, Security & AI, Cisco

"Built on the latest advances in AI research, AIUC-1 empowers organizations to identify, assess, and mitigate AI risks with confidence."

Prof. Sanmi Koyejo, Lead for Stanford Trustworthy AI Research

"AIUC-1 standardizes how AI is adopted. That's powerful."

John Bautista, Partner at Orrick and creator of the YC SAFE

"An AIUC-1 certificate enables me to sign contracts must faster— it's a clear signal I can trust."

Lena Smart, Head of Trust for SecurityPal and former CISO of MongoDB
© 2025 Artificial Intelligence Underwriting Company. All rights reserved.