AIUC-1 B002: Detect adversarial input

Implement monitoring capabilities to detect and respond to adversarial inputs and prompt injection attempts.

Keywords: Monitor, Adversarial, Jailbreak, Prompt Injection
Application: Optional
Frequency: Every 3 months
Type: Detective
Crosswalks
MITRE ATLAS: AML-M0003 (Model Hardening), AML-M0015 (Adversarial Input Detection), AML-M0024 (AI Telemetry Logging), AML-M0021 (Generative AI Guidelines)
EU AI Act: Article 15 (Accuracy, Robustness and Cybersecurity), Article 72 (Post-Market Monitoring by Providers and Post-Market Monitoring Plan for High-Risk AI Systems)
ISO/IEC 42001: A.9.4 (Intended use of the AI system)
NIST AI RMF: GOVERN 1.5 (Risk monitoring and review), MEASURE 2.4 (Production monitoring), MEASURE 2.7 (Security and resilience), MEASURE 3.1 (Emergent risk tracking)
OWASP LLM Top 10 (2025): LLM01 (Prompt Injection), LLM08 (Vector and Embedding Weaknesses), LLM10 (Unbounded Consumption)

Control activities

Establishing a taxonomy for adversarial risks. For example, drawing on NIST's AI 100-2e2023 taxonomy for Adversarial Machine Learning.
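
A minimal sketch of how such a taxonomy might be encoded for use by detection and reporting tooling. The four top-level categories loosely follow the NIST AI 100-2e2023 report (evasion, poisoning, privacy, and abuse); the class and field names here are illustrative, not prescribed by AIUC-1.

```python
from enum import Enum

class AdversarialRiskCategory(Enum):
    """Top-level adversarial ML risk categories, loosely following
    the NIST AI 100-2e2023 taxonomy. Names are illustrative."""
    EVASION = "evasion"      # adversarial inputs crafted at inference time
    POISONING = "poisoning"  # corruption of training data or model artifacts
    PRIVACY = "privacy"      # extraction of training data or model internals
    ABUSE = "abuse"          # misuse of generative systems, e.g. prompt injection

# Tag detected events with a taxonomy category so that alerting and
# quarterly reviews can aggregate incidents by risk type.
event = {
    "input": "Ignore all previous instructions and ...",
    "category": AdversarialRiskCategory.ABUSE.value,
}
```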

Establishing detection and alerting. For example, implementing monitoring for prompt injection patterns, jailbreak techniques, adversarial input attempts, and rate-limit violations, and configuring alerts and threat notifications for suspicious activity.
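
As a concrete sketch, the snippet below pairs signature matching for common injection and jailbreak phrasings with per-session sliding-window rate limiting. The patterns, window, and threshold are assumptions chosen for illustration; production rule sets are larger and continuously maintained.

```python
import re
import time
from collections import defaultdict, deque

# Illustrative signatures only; real rule sets are larger and are
# refreshed on the quarterly review cadence described below.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (your|the) system prompt", re.I),
    re.compile(r"you are now in developer mode", re.I),
]

RATE_LIMIT = 30       # assumed ceiling: requests per session per window
WINDOW_SECONDS = 60

_requests: dict[str, deque] = defaultdict(deque)

def check_input(session_id: str, text: str) -> list[str]:
    """Return alert reasons for this input; an empty list means clean."""
    alerts = [f"injection-pattern:{p.pattern}"
              for p in INJECTION_PATTERNS if p.search(text)]
    # Sliding-window rate limiting per session.
    now = time.monotonic()
    window = _requests[session_id]
    window.append(now)
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) > RATE_LIMIT:
        alerts.append("rate-limit-exceeded")
    return alerts
```

In practice the returned reasons would be routed to an alerting channel or SIEM rather than handed back to the caller.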

Implementing incident logging and response procedures. For example, logging suspected attacks with timestamps, user/session context, and input content, escalating to designated personnel based on severity thresholds (e.g. immediate escalation for confirmed jailbreaks), documenting response actions in a centralized incident system.
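
A minimal sketch of such a logging and escalation path, assuming a four-level severity scale and a hypothetical page_on_call hook standing in for the real escalation integration:

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("ai.adversarial.incidents")

# Assumed severity scale; anything at or above HIGH pages the
# designated responder immediately (e.g. a confirmed jailbreak).
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}
ESCALATION_THRESHOLD = "high"

def page_on_call(record: dict) -> None:
    """Hypothetical escalation hook (pager/ticketing integration)."""
    logger.critical("escalating incident: %s", record["reason"])

def log_incident(user_id: str, session_id: str, input_text: str,
                 severity: str, reason: str) -> None:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "session_id": session_id,
        "input": input_text,
        "severity": severity,
        "reason": reason,
    }
    # Stand-in for a centralized incident system: emit a structured
    # log line that downstream collectors can ingest.
    logger.warning("adversarial_incident %s", json.dumps(record))
    if SEVERITY_ORDER[severity] >= SEVERITY_ORDER[ESCALATION_THRESHOLD]:
        page_on_call(record)
```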

Maintaining detection effectiveness through quarterly reviews. For example, updating detection rules based on emerging adversarial techniques, analyzing incident patterns, and documenting system improvements.
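
One way to keep that cadence auditable is to version the rule set and record when it was last reviewed; the sketch below flags a rule set that has drifted past the 90-day (quarterly) window. Rule IDs, patterns, and source references are hypothetical.

```python
from datetime import date, timedelta

# Illustrative rule-set snapshot; in practice this would live in
# version control alongside the detection service.
RULESET = {
    "version": "2025.07",
    "last_reviewed": date(2025, 7, 1),
    "rules": [
        {"id": "INJ-001",  # hypothetical rule ID
         "pattern": r"ignore (previous|prior) instructions",
         "source": "OWASP LLM01:25"},
        {"id": "JB-004",   # hypothetical rule ID
         "pattern": r"developer mode",
         "source": "internal incident review"},
    ],
}

def ruleset_is_stale(today: date, max_age_days: int = 90) -> bool:
    """True if the rule set has missed the quarterly (90-day) review
    cadence this control requires."""
    return today - RULESET["last_reviewed"] > timedelta(days=max_age_days)
```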

Implementing adversarial input detection prior to AI model processing where feasible. For example, using lightweight pattern-matching, behavioral heuristics, or IP-based filters to flag likely threats before processing, with latency-optimized safeguards or asynchronous review paths where real-time detection is infeasible.
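
A sketch of such a pre-inference gate under these assumptions: cheap checks (an IP blocklist and a few signatures) run synchronously, while ambiguous cases are routed to an asynchronous review queue instead of adding latency to the request path. The networks, patterns, and length heuristic are placeholders.

```python
import asyncio
import ipaddress
import re

# Placeholder blocklist and signatures; real deployments would source
# these from threat intelligence and the maintained rule set.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
FAST_PATTERNS = [re.compile(r"ignore (previous|prior) instructions", re.I)]

review_queue: asyncio.Queue = asyncio.Queue()

def fast_gate(client_ip: str, text: str) -> str:
    """Latency-friendly checks run before model inference.
    Returns 'block', 'review', or 'allow'."""
    ip = ipaddress.ip_address(client_ip)
    if any(ip in net for net in BLOCKED_NETWORKS):
        return "block"
    if any(p.search(text) for p in FAST_PATTERNS):
        return "block"
    # Heuristic too error-prone to block inline: queue the input for
    # asynchronous review instead of rejecting the request.
    if len(text) > 8000:
        return "review"
    return "allow"

async def call_model(text: str) -> str:
    return "model response"  # stand-in for the real inference call

async def handle_request(client_ip: str, text: str) -> str:
    verdict = fast_gate(client_ip, text)
    if verdict == "block":
        return "Request rejected."
    if verdict == "review":
        await review_queue.put({"ip": client_ip, "input": text})
    return await call_model(text)
```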

Integrating adversarial input detection into existing security operations tooling. For example, forwarding flagged inputs to SIEM platforms, correlating detections with authentication and network logs, and enabling SOC teams to triage AI-related security events.
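
A sketch of event forwarding, assuming an HTTP event collector of the kind most SIEMs expose; the URL, token, and field names below are placeholders, not any vendor's actual API. The shared user, session, and source-IP keys are what let SOC analysts join these events against authentication and network logs.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Placeholder collector endpoint and token; most SIEMs accept JSON
# events over HTTPS, but this URL and schema are not any vendor's API.
SIEM_URL = "https://siem.example.com/api/events"
SIEM_TOKEN = "REPLACE_ME"

def forward_to_siem(user_id: str, session_id: str, source_ip: str,
                    reason: str, input_text: str) -> None:
    event = {
        "time": datetime.now(timezone.utc).isoformat(),
        "sourcetype": "ai:adversarial_input",
        # Shared keys let SOC analysts correlate this event with
        # authentication and network logs for the same principal.
        "user_id": user_id,
        "session_id": session_id,
        "src_ip": source_ip,
        "reason": reason,
        "input_excerpt": input_text[:500],  # truncate to limit data spread
    }
    req = urllib.request.Request(
        SIEM_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Authorization": f"Bearer {SIEM_TOKEN}",
                 "Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=5)
```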

Organizations can submit alternative evidence demonstrating how they meet the requirement.

AIUC-1 is built with industry leaders

"We need a SOC 2 for AI agents— a familiar, actionable standard for security and trust."

Phil Venables, Former CISO of Google Cloud

"Integrating MITRE ATLAS ensures AI security risk management tools are informed by the latest AI threat patterns and leverage state of the art defensive strategies."

Dr. Christina Liaghati, MITRE ATLAS lead

"Today, enterprises can't reliably assess the security of their AI vendors— we need a standard to address this gap."

Hyrum Anderson, Senior Director, Security & AI, Cisco

"Built on the latest advances in AI research, AIUC-1 empowers organizations to identify, assess, and mitigate AI risks with confidence."

Prof. Sanmi Koyejo, Lead for Stanford Trustworthy AI Research

"AIUC-1 standardizes how AI is adopted. That's powerful."

John Bautista, Partner at Orrick and creator of the YC SAFE

"An AIUC-1 certificate enables me to sign contracts must faster— it's a clear signal I can trust."

Lena Smart, Head of Trust for SecurityPal and former CISO of MongoDB
© 2025 Artificial Intelligence Underwriting Company. All rights reserved.