Description:

  • A class of attacks that seeks to reconstruct class representatives from an AI model's training data, producing semantically similar data rather than direct reconstructions of the training samples (i.e., extraction). (Source: NIST AI 100-2, section 2.4.1)
  • A machine learning model’s training data can be reconstructed by exploiting the confidence scores available via an inference API. By querying the API strategically, adversaries can back out potentially private information embedded in the training data. (Source: MITRE ATLAS)
  • Model inversion (or data reconstruction) occurs when an attacker reconstructs part of the training set through intensive experimentation, optimizing the input to maximize the confidence level indicated in the model’s output; a minimal sketch of this optimization loop follows this list. (Source: OWASP AI Exchange)
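
The sketch below illustrates the optimization loop described above. It assumes white-box gradient access to a stand-in classifier with random weights; the architecture, target_class, step count, and learning rate are all illustrative assumptions, not drawn from any cited source. Against a real black-box inference API, an attacker would instead have to estimate gradients from the returned confidence scores alone.

```python
# Minimal model-inversion sketch (illustrative only; untrained stand-in model).
import torch
import torch.nn.functional as F

# Hypothetical stand-in for the victim classifier; weights are random here.
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)
model.eval()

target_class = 3  # class whose representative we try to reconstruct (assumption)
x = torch.zeros(1, 1, 28, 28, requires_grad=True)  # start from a blank input
optimizer = torch.optim.Adam([x], lr=0.05)

for step in range(500):
    optimizer.zero_grad()
    confidence = F.softmax(model(x), dim=1)[0, target_class]
    loss = 1.0 - confidence  # push the model's confidence in target_class toward 1
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        x.clamp_(0.0, 1.0)  # keep the candidate input in a valid pixel range

# x now approximates a class representative: an input the model rates as highly
# typical of target_class, which can leak features of the underlying training data.
print(f"final confidence for class {target_class}: {confidence.item():.3f}")
```

In a black-box setting, the same loop can be driven by finite-difference gradient estimates computed from repeated API queries, which matches the strategic-querying scenario described in the MITRE ATLAS bullet.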

Impact:

  • Can lead to a confidentiality breach of sensitive or confidential model training data. Depending on the model, this training data may include personally identifiable information or other protected data.

Applies to which types of AI models? Predictive (non-generative) machine learning models.

Which AI security requirements function against this threat? [?]
Discussed in which authoritative sources? [?]
Discussed in which commercial sources? [?]