A published taxonomy. Not a regulation.
One paragraph from the abstract carries the load. The document is a taxonomy and a set of terminology. It is descriptive, not prescriptive. It does not say thou shalt. It says if your AI system is attacked, this is the vocabulary auditors and engineers should be using when you describe what happened and what you did about it.
The lineage matters. NIST AI 100-2 first appeared in 2019 as a draft. The 2023 edition gave it a stable taxonomy for predictive machine learning. The 2024 update extended the taxonomy to generative AI under the same four-class structure. The 2025 final, formally NIST.AI.100-2e2025, consolidated both into a single document.
What it is not. It is not an EU harmonised standard. It is not a NIST-issued conformity scheme. It is not, in itself, a defence in any litigation. What it is, in operational terms, is the closest thing engineering teams currently have to a shared dictionary for AI-specific attacks. That makes it the lowest-friction way to translate Article 15(5) of the EU AI Act into a per-decision evidence pattern.
The taxonomy is organised across three axes. The ML method axis distinguishes predictive AI from generative AI. The life-cycle axis separates training-time attacks from inference-time attacks. The attacker-goal axis names what the attacker is trying to achieve, whether that is integrity violation, availability violation, or privacy violation. The four top-level attack classes sit at the intersection of those axes.
Evasion · adversarial inputs at inference time.
An evasion attack is an inference-time attack on a predictive AI system. The model is already trained. The training pipeline is untouched. The attacker modifies the input so that the model's output is wrong in a way that benefits the attacker.
The canonical example is the adversarial image. A photograph of a stop sign with a precisely calibrated perturbation invisible to humans, classified by a vision model as a speed-limit sign. The mathematics generalises. Tabular features in a credit-scoring model. Tokenised text in a sentiment classifier. Network packets in an intrusion-detection system.
NIST AI 100-2 names the attacker's capability on a three-step ladder. White-box assumes the attacker has the model architecture and weights. Grey-box assumes partial knowledge, often the architecture but not the weights, or a known training corpus. Black-box assumes only query access through the production interface.
The engineering implication for the evidence record is direct. For any predictive decision the agent took, the trace must record the input that produced the decision and a typed indicator of whether that input passed an adversarial-input check. The check itself is layered. Statistical detection on input distribution. Distance from training-set neighbours. Optionally, a model-specific certified-bound check.
What the trace must not claim is immunity. NIST AI 100-2 is explicit that no current defence eliminates evasion. The honest signal is detection coverage with a known false-negative rate, not a binary passed.
Poisoning · training-time attacks.
Poisoning is the training-time counterpart to evasion. The attacker has access to the training data, or to some part of it, or to the pipeline that ingests it. The poisoning is in the corpus, not in the request.
NIST AI 100-2 separates poisoning by attacker goal. Availability poisoning degrades the model's accuracy generally. Integrity poisoning causes incorrect outputs for specific targeted inputs while leaving general accuracy intact. Backdoor attacks install a hidden trigger pattern such that any input carrying the trigger is misclassified to an attacker-chosen label.
Two capability levels. Full training-set access assumes the attacker controls or substantially modifies the corpus. Partial access, more realistic in 2026 supply chains, assumes the attacker contaminates a subset, perhaps a few percent of an open dataset, perhaps a single internet source that gets scraped.
The evidence pattern for poisoning is upstream of the per-decision trace. It lives in metadata about the model, not the request. Training-data provenance, dataset hashes, source attestation for fine-tune corpora, the integrity of any retrieval-augmented index. NIST AI 100-2 does not prescribe the artefacts. It names the class so that auditors can ask the right question.
For a 2026 generative-AI deployment, the operational reality is that almost no provider can prove the absence of poisoning in a foundation model. The defensible posture is documented provenance for everything inside the deployer's control, and a contractual chain of attestations for everything outside it. That is the cybersecurity posture Article 15(5) asks for, read alongside the technical documentation under Annex IV.
Privacy · extracting from the model.
The third class is privacy attacks. The attacker is not trying to misclassify an input or corrupt the training pipeline. The attacker is trying to extract information about the training data, the model parameters, or the individuals whose data was used to train.
NIST AI 100-2 names four sub-types. Membership inference determines whether a specific record was in the training set. Attribute inference recovers sensitive attributes of training records. Model inversion reconstructs representative training examples from model outputs. Training-data extraction, the strongest, recovers literal records from generative models that have memorised them.
Capability ranges from API-only access through full model-weights access. Differential privacy is the only mitigation in the taxonomy that offers a formal mathematical guarantee, at a measurable cost in utility. Everything else is empirical hardening: query rate-limiting, output filtering, post-hoc memorisation audits.
For the per-decision record, the evidence field is whether the action interacted with a privacy-sensitive surface, and if so, which differential-privacy or output-filtering control was active. The artefact does not claim the model is private. It records what privacy posture was in force at the time of the decision.
Abuse · the class the 2024 revision added.
The fourth class is the one that justifies the 2024 update and the 2025 final. The original 2023 taxonomy did not have an Abuse category. With generative AI in widespread production, the attacker stopped trying to break the model and started trying to instruct it.
Abuse covers adversarial use of generative AI systems. The model is functioning correctly in the predictive-AI sense. The vulnerability is that functioning correctly for a chat or agent system means doing what the input asked. When the input is hostile, the model executes the hostile instruction. The threat surface is the prompt.
Three named patterns sit inside Abuse. Direct prompt injection is an instruction in the user prompt that overrides the system prompt. Indirect prompt injection is the same instruction delivered through content the model reads from another source, a retrieved document, a tool's return value, an email body, a webpage. Jailbreak is a stylised prompt that circumvents the model's safety training to produce content the operator did not authorise.
The capability story differs from predictive AI. Black-box is the default. The attacker rarely needs weights. They need the prompt surface, plus, increasingly, any path through which untrusted content reaches the model. For a retrieval-augmented agent, every retrieval source is a prompt-injection vector. For a tool-using agent, every tool's return value is.
The mitigation story is empirical and unsettled. System-prompt hardening, instruction-tuning for resistance, content classifiers on inputs, content classifiers on outputs, structural separation between trusted system instructions and untrusted user or retrieved content. NIST AI 100-2 enumerates the techniques without ranking them. The 2026 honest engineering answer is layered defence with structured per-decision evidence of which defences were in force.
Mitigation taxonomy · what defences map to what attacks.
NIST AI 100-2 pairs the attack taxonomy with a mitigation taxonomy. Three observations matter for engineering teams.
First. Mitigations have capability and knowledge requirements of their own. Adversarial training requires retraining the model on adversarially perturbed examples. Certified defences require model architectures amenable to formal bounds and impose accuracy costs. Differential privacy requires bounded privacy budgets and reduces utility. Input sanitisation requires distributional knowledge of legitimate inputs. Output filtering requires a classifier downstream of the model.
Second. No mitigation generalises across all four attack classes. A defence that hardens evasion may have no effect on poisoning. A defence that mitigates membership inference may be irrelevant to prompt injection. The taxonomy is explicit on this point.
Third, and most useful for an attestation discipline. The mitigation taxonomy is the natural home for per-decision evidence. The audit question is rarely do you defend against evasion. The audit question is which evasion defence was in force when this decision was taken on this person on this date, and what did it produce. That maps to a structured field in a trace, not a marketing claim in a product page.
Article 15(5) · the regulator's sentence.
One sentence. Two operative phrases. The first, attempts by unauthorised third parties to alter their use, outputs or performance by exploiting system vulnerabilities, is the threat surface. The second, appropriate to the relevant circumstances and the risks, is the proportionality test the regulator will apply when the provider's defence is challenged.
The Regulation does not enumerate the system vulnerabilities it has in mind. The standards body cannot enumerate them either, because the literature evolves quarterly. So the Regulation leaves the question open and points to harmonised standards under Articles 40 and 41 to fill in the technical content over time. Until those harmonised standards are published, providers and notified bodies fall back on the recognised state of the art.
NIST AI 100-2 is one widely recognised statement of that state of the art for AI-specific threats. It is not cited by name in the Regulation. It is not declared by the Commission to confer presumption of conformity. It is, in 2026, the most defensible single document an engineering team can point to when an auditor asks what system vulnerabilities they have in fact considered.
The other side of the bridge is the recital. Recital 76 of the Regulation specifies that cybersecurity for high-risk AI systems includes data poisoning, adversarial examples, model evasion, confidentiality attacks and model flaws. That list sits inside the NIST four-class taxonomy almost without translation. For the wider Article 15 obligation (accuracy, robustness, and cybersecurity together), see the Article 15 reading filed alongside. For the application-layer companion taxonomy, see the OWASP LLM Top 10.
Where Warrant maps NIST AI 100-2 into the trace.
Warrant turns the four-class taxonomy into four typed evidence fields per action. The mapping is mechanical and the field names are stable across deployments.
The fields are not assertions of immunity. They are records of what was checked, with what detector, against what threshold, with what result. An auditor reading the trace can verify that the four NIST classes were considered for the decision in front of them, and can read the false-negative posture out of the record itself.
Cross-reference · NIST taxonomy versus OWASP LLM Top 10.
The OWASP LLM Top 10 is the parallel artefact most engineering teams know by name. The two documents serve different functions. NIST AI 100-2 is the taxonomy. OWASP LLM is the prioritised practitioner list.
The overlap is partial and non-controversial. OWASP LLM01 prompt injection sits squarely inside NIST Abuse. OWASP LLM02 insecure output handling intersects NIST Abuse and, where outputs feed downstream classifiers, NIST Evasion. OWASP LLM03 training-data poisoning is a one-to-one with NIST Poisoning. OWASP LLM06 sensitive information disclosure maps to NIST Privacy.
What OWASP adds that NIST does not, and vice versa, is a useful filter. OWASP is more applied: each item is a category of finding an engineer can fix in code or configuration this quarter. NIST is more structural: each class is a frame an auditor can apply across an entire system. The compliant 2026 posture cites both.
The Article 15(5) auditor, on a current reading, will accept either as evidence the team considered AI-specific threats systematically. A team that cites neither will be asked which published reference they did consider. The answer cannot be silence.
Questions a security officer asks first.
Read the source directly.
- NIST AI 100-2e2025 · Adversarial Machine Learning · Taxonomy and Terminology of Attacks and Mitigations · PDF
- NIST CSRC publication landing page · NIST.AI.100-2e2025
- ISO/IEC 27090 · Cybersecurity · Guidance for addressing security threats to AI systems
- Regulation (EU) 2024/1689 · Article 15 · EUR-Lex CELEX:32024R1689
- OWASP Top 10 for Large Language Model Applications
- Article 15 · the regulator-side reading filed alongside
Authored by Warrant Engineering, the engineering function at Warrant. [email protected]. Editorial commentary on a published technical taxonomy. Not legal advice. The publication identifier NIST.AI.100-2e2025 and the title Adversarial Machine Learning · A Taxonomy and Terminology of Attacks and Mitigations reflect the NIST CSRC final release of 24 March 2025. Where the taxonomy text is paraphrased or summarised, the canonical source is the PDF linked above.