Description:

  • A class of attacks that seeks to reconstruct class representatives from an AI model's training data, producing semantically similar data rather than direct reconstructions of the training samples (i.e., extraction). (Source: NIST AI 100-2, section 2.4.1)
  • A machine learning model’s training data can be reconstructed by exploiting the confidence scores available via an inference API. By querying the API strategically, adversaries can back out potentially private information embedded in the training data. (Source: MITRE ATLAS)
  • Model inversion (or data reconstruction) occurs when an attacker reconstructs part of the training set through intensive experimentation, optimizing the input to maximize the confidence level indicated in the model’s output; a minimal sketch of this optimization loop follows this list. (Source: OWASP AI Exchange)
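
The sketch below illustrates the optimization loop described above. It assumes white-box gradient access to a stand-in classifier with random weights; the architecture, target_class, step count, and learning rate are all illustrative assumptions, not drawn from any cited source. Against a real black-box inference API, an attacker would instead have to estimate gradients from the returned confidence scores alone.

```python
# Minimal model-inversion sketch (illustrative only; untrained stand-in model).
import torch
import torch.nn.functional as F

# Hypothetical stand-in for the victim classifier; weights are random here.
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)
model.eval()

target_class = 3  # class whose representative we try to reconstruct (assumption)
x = torch.zeros(1, 1, 28, 28, requires_grad=True)  # start from a blank input
optimizer = torch.optim.Adam([x], lr=0.05)

for step in range(500):
    optimizer.zero_grad()
    confidence = F.softmax(model(x), dim=1)[0, target_class]
    loss = 1.0 - confidence  # push the model's confidence in target_class toward 1
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        x.clamp_(0.0, 1.0)  # keep the candidate input in a valid pixel range

# x now approximates a class representative: an input the model rates as highly
# typical of target_class, which can leak features of the underlying training data.
print(f"final confidence for class {target_class}: {confidence.item():.3f}")
```

In a black-box setting, the same loop can be driven by finite-difference gradient estimates computed from repeated API queries, which matches the strategic-querying scenario described in the MITRE ATLAS bullet.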

Impact:

  • Can lead to a confidentiality breach of sensitive or confidential model training data. Depending on the model, this training data may include personally identifiable information or other protected data.

Applies to which types of AI models? Predictive (non-generative) machine learning models.

Which AI security requirements function against this threat? [?]
Discussed in which authoritative sources? [?]
Discussed in which commercial sources? [?]