Prompt injection - HITRUST AI Security Assessment and Certification Specification

Model inversion

TOPIC: Poisoning attacks

Description:

When an adversary crafts malicious user prompts as generative AI inputs that cause the AI system to act in unintended ways. These “prompt injections” are often designed to cause the model to bypass its original instructions and follow the adversary’s instructions instead.

Impact:

The impact of a successful prompt injection attack can vary greatly, depending on the context. Some prompt injection attacks attempt to cause the system to disclose confidential and/or sensitive information. For example, prompt extraction attacks aim to divulge the system prompt or other information in an LLMs context that would nominally be hidden from a user.

Applies to which types of AI models? Generative AI specifically

Which AI security requirements function against this threat? [?]

Control function: Corrective
- Updating incident response for AI specifics
Control function: Decision support
Control function: Detective
- Log AI system inputs and outputs
- Monitor AI system inputs and outputs
Control function: Directive
- Augment written policies to address AI specificities
- Documentation of AI specifics during system design and development
Control function: Preventative
- Input filtering
- Restrict access to interact with the AI model
Control function: Resistive
- Limit the release of technical info about the AI system
- Model Rate Limiting / Throttling
Control function: Variance reduction

Discussed in which authoritative sources? [?]

CSA Large Language Model (LLM) Threats Taxonomy
2024, © Cloud Security Alliance
- Where: 4. LLM Service Threat Categories > 4.1. Model Manipulation
Engaging with Artificial Intelligence
Jan. 2024, Australian Signals Directorate’s Australian Cyber Security Centre (ASD’s ACSC)
- Where: Challenges when engaging with AI > 2. Input manipulation attacks – Prompt injection and adversarial examples
Mitigating Artificial Intelligence (AI) Risk: Safety and Security Guidelines for Critical Infrastructure Owners and Operators
April 2024, © Department of Homeland Security (DHS)
- Where: Appendix A: Cross-sector AI risks and mitigation strategies > Risk category: AI design and implementation failures > Adversarial manipulation of AI algorithms or data
MITRE ATLAS
2024, © The MITRE Corporation
- Where:
Multilayer Framework for Good Cybersecurity Practices for AI
2023, © European Union Agency for Cybersecurity (ENISA)
- Where: 2. Framework for good cybersecurity practices for AI > 2.2. Layer II – AI fundamentals and cybersecurity > Model or data disclosure
NIST AI 100-2 E2023: Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations
Jan. 2024, National Institute of Standards and Technology (NIST)
- Where:
  - 3. Generative AI Taxonomy > 3.3. Direct Prompt Injection Attacks and Mitigations
  - 3. Generative AI Taxonomy > 3.4. Indirect Prompt Injection Attacks and Mitigations
OWASP 2023 Top 10 for LLM Applications
Oct. 2023, © The OWASP Foundation
- Where: LLM01: Prompt injection
OWASP 2025 Top 10 for LLM Applications
2025, © The OWASP Foundation
- Where: LLM01: Prompt injection
OWASP AI Exchange
2024, © The OWASP Foundation
- Where: 2.2. Prompt injection

Discussed in which commercial sources? [?]

Databricks AI Security Framework
Sept. 2024, © Databricks
- Where:
  - Risks in AI System Components > Model serving – Inference requests 9.1: Prompt inject
  - Risks in AI System Components > Model serving – Inference requests 9.3: Model breakout
  - Risks in AI System Components > Model serving – Inference requests 9.9: Input resource control
HiddenLayer’s 2024 AI Threat Landscape Report
2024, © HiddenLayer
- Where: Part 2: Risks faced by AI-based systems > Prompt injection
StackAware AI Security Reference
2024, © StackAware
- Where:
  - AI Risks > Direct Prompt Injection
  - AI Risks > Indirect Prompt Injection

Additional information

The AI Prompt Injection Attacks portion of the StackAware AI Security Reference discusses several prompt injection techniques and attacker goals in detail.

Model inversion

TOPIC: Poisoning attacks