EU AI Act Article 15: accuracy and robustness, line by line

01 · § 1 · DESIGN OBLIGATION

Designed in. Held over the lifecycle.

High-risk AI systems shall be designed and developed in such a way that they achieve an appropriate level of accuracy, robustness, and cybersecurity, and that they perform consistently in those respects throughout their lifecycle. Regulation (EU) 2024/1689 · Article 15(1) · 13 June 2024

One sentence does the structural work for the entire article. It binds the system as designed, not the operator's process. It uses the operative shall. It scopes the holding horizon to the lifecycle of the system, not the date of placement on the market.

The three nouns are not synonyms. Accuracy attaches to the predictive or decisional output of the system against the task it was designed for. Robustness, in the statutory sense, attaches to fault tolerance and resilience under perturbation, error, fault, and inconsistency · the article unpacks the meaning in paragraph (4). Cybersecurity attaches to resistance against unauthorised third parties altering use, outputs, or performance through exploitation of system vulnerabilities · the subject of paragraph (5). Three pillars, one design discipline.

The phrase perform consistently in those respects throughout their lifecycle is the load-bearing piece. It rules out the deployment pattern in which a system is benchmarked at conformity assessment, placed on the market, and never re-evaluated. The standard is held continuously. The evidence required is therefore continuous.
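The continuous-evidence point can be made concrete with a rolling check of production outcomes against the declared level. What follows is a minimal sketch, assuming labelled production outcomes are available · `LifecycleMonitor`, the window size, and the tolerance are hypothetical names and values chosen for illustration, not anything Article 15 prescribes.

```python
from collections import deque


class LifecycleMonitor:
    """Rolling comparison of observed accuracy against a declared level (illustrative)."""

    def __init__(self, declared_level: float, tolerance: float, window: int) -> None:
        self.declared_level = declared_level
        self.tolerance = tolerance            # assumed acceptable shortfall, chosen by the provider
        self.outcomes = deque(maxlen=window)  # 1 = correct output, 0 = incorrect output

    def record(self, correct: bool) -> None:
        self.outcomes.append(1 if correct else 0)

    def status(self) -> str:
        # Refuse to judge until the window is full.
        if len(self.outcomes) < self.outcomes.maxlen:
            return "insufficient-data"
        observed = sum(self.outcomes) / len(self.outcomes)
        if observed >= self.declared_level - self.tolerance:
            return "consistent"
        return "regressed"


monitor = LifecycleMonitor(declared_level=0.90, tolerance=0.02, window=100)
for i in range(100):
    monitor.record(i % 10 != 0)  # 90 of 100 outcomes are correct
print(monitor.status())          # consistent
```

The design choice worth noting is the explicit "insufficient-data" state · a monitor that reports consistency on a handful of samples produces evidence a reviewer can discount.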

"Three pillars. One design discipline. Held continuously over the lifecycle." · Warrant Compliance · 2026-05-11

Placement matters. Article 15 sits inside Section 2 of Chapter III · the requirements the provider must satisfy before placing the system on the Union market. Article 16(a) reads those requirements back as a provider obligation. Article 99(4)(a) reads Article 16 back as a fineable failure at up to EUR 15 million or, for an undertaking, up to 3 percent of total worldwide annual turnover, whichever is higher. The route from one sentence in Article 15(1) to the EUR 15 million ceiling is three statutory steps.
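The ceiling arithmetic is short enough to write down. This is a sketch only, assuming the Article 99(4) rule that the higher of the two amounts applies to an undertaking and the Article 99(6) rule that the lower applies to SMEs · the function name and the `sme` flag are hypothetical, and the figure is an upper bound, not a prediction of any actual fine.

```python
def article_99_4_ceiling(worldwide_turnover_eur: int, sme: bool = False) -> float:
    """Upper bound of the administrative fine under the Article 99(4) tier (illustrative)."""
    fixed_cap_eur = 15_000_000
    turnover_cap_eur = worldwide_turnover_eur * 3 / 100  # 3 percent of annual turnover
    if sme:
        # Assumption: SMEs and start-ups take the lower of the two amounts.
        return min(fixed_cap_eur, turnover_cap_eur)
    return max(fixed_cap_eur, turnover_cap_eur)


print(article_99_4_ceiling(200_000_000))            # 15000000 (3 percent would be 6 million)
print(article_99_4_ceiling(2_000_000_000))          # 60000000.0
print(article_99_4_ceiling(200_000_000, sme=True))  # 6000000.0
```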

02 · § 2 · COMMISSION BENCHMARKS

Benchmarks and measurement methodologies.

To address the technical aspects of how to measure the appropriate levels of accuracy and robustness set out in paragraph 1 and any other relevant performance metrics, the Commission shall, in cooperation with relevant stakeholders and organisations such as metrology and benchmarking authorities, encourage, as appropriate, the development of benchmarks and measurement methodologies. Regulation (EU) 2024/1689 · Article 15(2) · 13 June 2024

Paragraph (2) is a delegation of the technical-method question to a slower, separate track. The verb is encourage, not adopt. The Commission does not, in this paragraph, fix the measurement floor by regulation. It instructs itself to coordinate the development of benchmarks and measurement methodologies with the bodies most likely to produce them.

Three institutions are operative in practice. The Joint Research Centre is the Commission's in-house science service and the natural anchor for AI benchmark work. ENISA, the European Union Agency for Cybersecurity, attaches to the cybersecurity dimension of paragraph (5). CEN and CENELEC, through Joint Technical Committee 21 on Artificial Intelligence, are the European standardisation organisations executing the Commission's standardisation request of 22 May 2023, which includes accuracy, robustness, and cybersecurity within scope.

Until a harmonised standard is published in the Official Journal of the European Union and triggers the presumption of conformity under Article 40, the operational picture is this · the provider declares the level, justifies the methodology, records the test set, and accepts that a regulator may, on review, find the methodology insufficient. The benchmark track is the long game. The Article 13 declaration is what binds today.

03 · § 3 · IFU DECLARATION

Accuracy declared in the instructions for use.

The levels of accuracy and the relevant accuracy metrics of high-risk AI systems shall be declared in the accompanying instructions of use. Regulation (EU) 2024/1689 · Article 15(3) · 13 June 2024

Paragraph (3) is short and operationally heavy. It binds the provider to declare both the levels and the metrics. Both. A declaration that the system achieves high accuracy without identifying the metric is non-compliant. A declaration that the metric is F1 without specifying the achieved level is non-compliant. The pair is the obligation.

The declaration sits inside the instructions for use, governed by Article 13. Article 13(3)(b)(ii) names the cross-reference directly · the instructions shall contain, where appropriate, the level of accuracy, including its metrics, robustness and cybersecurity referred to in Article 15 against which the high-risk AI system has been tested and validated and which can be expected, and any known and foreseeable circumstances that may have an impact on that expected level.

Two operational consequences follow. First, the declaration anchors what an Annex III deployer is entitled to expect. Second, it is the document a national competent authority will read first under Article 21. If the IFU declaration says 0.91 F1 against test set X under conditions Y, the regulator will ask for the test set, the conditions, and the production performance against the same metric.
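The pairing of metric and level, plus the test set and conditions a regulator will ask for, can be checked mechanically before the IFU ships. A minimal sketch follows, assuming a simple record structure · `AccuracyDeclaration` and `declaration_gaps` are hypothetical names, and the four fields are one reading of what the declaration should carry, not a schema taken from the Regulation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AccuracyDeclaration:
    metric: Optional[str] = None       # e.g. "F1"
    level: Optional[float] = None      # e.g. 0.91
    test_set_id: Optional[str] = None  # identifier of the test set the level was measured on
    conditions: Optional[str] = None   # conditions under which the level is expected to hold


def declaration_gaps(d: AccuracyDeclaration) -> List[str]:
    """Return the names of the fields a reviewer would find missing."""
    gaps = []
    if not d.metric:
        gaps.append("metric")
    if d.level is None:  # 0.0 is a (poor) level, not a missing one
        gaps.append("level")
    if not d.test_set_id:
        gaps.append("test_set_id")
    if not d.conditions:
        gaps.append("conditions")
    return gaps


# A metric without a level is the failure case the paragraph describes.
print(declaration_gaps(AccuracyDeclaration(metric="F1")))
# ['level', 'test_set_id', 'conditions']
```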

04 · § 4 · RESILIENCE

Resilience against errors, faults, inconsistencies.

High-risk AI systems shall be as resilient as possible regarding errors, faults or inconsistencies that may occur within the system or the environment in which the system operates, in particular due to their interaction with natural persons or other systems. Technical and organisational measures shall be taken in this regard. The robustness of high-risk AI systems may be achieved through technical redundancy solutions, which may include backup or fail-safe plans. Regulation (EU) 2024/1689 · Article 15(4), first sentences · 13 June 2024

Paragraph (4) does the substantive work behind the word robustness in paragraph (1). The standard is as resilient as possible. The qualifier as possible is not an excuse. It is a proportionality test against the state of the art and the intended purpose of the system. The provider that deploys an architecture known to fail under a class of perturbation, without engineering against that class, is not as resilient as possible.

The text scopes the perturbation surface deliberately wide. Errors, faults or inconsistencies covers the internal failure modes of the system itself. The environment in which the system operates covers infrastructure faults, upstream data-source disruption, and clock skew. Interaction with natural persons or other systems covers the cases that matter most in 2026 · the agentic loop, the tool-use chain, the human-in-the-loop override.

The mechanism is mixed. Technical and organisational measures is the same construction the GDPR uses · the obligation is not satisfied by code alone, and not satisfied by policy alone. The article then names two technical patterns explicitly · technical redundancy solutions and backup or fail-safe plans. Either pattern, where the architecture supports it, is positive evidence of compliance. Neither is mandatory in every case. The choice is the provider's, the justification is recorded.

04b · § 4 · FEEDBACK LOOPS

Continuously learning systems and feedback loops.

High-risk AI systems that continue to learn after being placed on the market or put into service shall be developed in such a way as to eliminate or reduce as far as possible the risk of possibly biased outputs influencing input for future operations (feedback loops), and as to ensure that any such feedback loops are duly addressed with appropriate mitigation measures. Regulation (EU) 2024/1689 · Article 15(4), final sentence · 13 June 2024

This sentence sits inside paragraph (4). It addresses the deployment pattern in which the system continues to learn after being placed on the market · online learning, continuous fine-tuning on production data, retrieval-augmented generation pipelines that update an index from production traffic.

The drafters did not ban the pattern. They named the failure mode they want addressed · possibly biased outputs influencing input for future operations. The technical literature calls this concept drift, model collapse on synthetic data, or, in the specific case where production outputs become training inputs, a feedback loop. The statute uses the last of these terms in parentheses, which fixes the meaning.

The standard is the same proportionality construction as elsewhere in the article. Eliminate or reduce as far as possible. Plus a process step · any such feedback loops are duly addressed with appropriate mitigation measures. The mitigation regime that satisfies in practice has four parts. Drift detection on a held-out reference distribution. Sampling of production outputs against ground-truth labels. Retraining gates that block updates if a fairness or accuracy metric regresses. A documented bias control regime that reads against Article 10(2)(f) and Article 10(2)(g) on appropriate measures to detect, prevent, and mitigate biases.
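Two of the four parts, drift detection and the retraining gate, reduce to a few lines. The sketch below is illustrative only · it uses the Population Stability Index as one possible drift statistic (the article names no statistic), and `psi`, `retraining_gate`, the 0.2 alert threshold, and the 0.01 regression allowance are assumptions, not statutory values.

```python
import math
from typing import Dict, Sequence


def psi(reference: Sequence[float], production: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of production data against a held-out reference."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference column

    def shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp values outside the reference range
        # Additive smoothing keeps empty bins from producing log(0).
        return [(c + 0.5) / (len(values) + 0.5 * bins) for c in counts]

    ref, prod = shares(reference), shares(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref, prod))


def retraining_gate(baseline: Dict[str, float], candidate: Dict[str, float],
                    max_regression: float = 0.01) -> bool:
    """Allow an update only if no tracked metric (higher is better) regresses too far."""
    return all(candidate[name] >= baseline[name] - max_regression for name in baseline)


reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]
print(psi(reference, shifted) > 0.2)  # True: drift alert under the assumed threshold
print(retraining_gate({"f1": 0.91, "parity": 0.95}, {"f1": 0.92, "parity": 0.90}))  # False
```

The gate blocks the second candidate even though F1 improved, because the fairness metric regressed · that is the behaviour the mitigation regime is meant to guarantee.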

05 · § 5 · CYBERSECURITY

Resilience against unauthorised third parties.

High-risk AI systems shall be resilient against attempts by unauthorised third parties to alter their use, outputs or performance by exploiting system vulnerabilities. The technical solutions aiming to ensure the cybersecurity of high-risk AI systems shall be appropriate to the relevant circumstances and the risks. The technical solutions to address AI specific vulnerabilities shall include, where appropriate, measures to prevent, detect, respond to, resolve and control for attacks trying to manipulate the training data set (data poisoning), or pre-trained components used in training (model poisoning), inputs designed to cause the AI model to make a mistake (adversarial examples or model evasion), confidentiality attacks or model flaws. Regulation (EU) 2024/1689 · Article 15(5) · 13 June 2024

Paragraph (5) is the cybersecurity paragraph. The first sentence sets the standard · resilient against attempts by unauthorised third parties to alter their use, outputs or performance by exploiting system vulnerabilities. The second sets the proportionality test · appropriate to the relevant circumstances and the risks. The third names the AI-specific threat surface that must be considered where appropriate.

Five threat classes are named in the text · data poisoning, model poisoning, adversarial examples or model evasion, confidentiality attacks, and model flaws. The list is non-exhaustive. The phrase shall include, where appropriate makes the listed threats presumptive while leaving room for additional threats the system's threat model surfaces.

This is the only place in the AI Act where the AI-specific cybersecurity threat surface is enumerated as statute. Every Article 15(5) compliance file should map against the named five.
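Mapping a compliance file against the named five is a coverage check. A minimal sketch, assuming the file records either at least one control per threat or a written justification for why the threat does not apply · the threat keys, control names, and `coverage_gaps` function are hypothetical.

```python
from typing import Dict, List

# The five threat classes named in Article 15(5), as illustrative keys.
NAMED_THREATS = (
    "data_poisoning",
    "model_poisoning",
    "evasion",
    "confidentiality",
    "model_flaws",
)


def coverage_gaps(controls: Dict[str, List[str]], not_applicable: Dict[str, str]) -> List[str]:
    """Return named threats with neither a recorded control nor a recorded justification."""
    return [
        threat
        for threat in NAMED_THREATS
        if not controls.get(threat) and not not_applicable.get(threat)
    ]


controls = {
    "data_poisoning": ["training-data provenance checks"],
    "evasion": ["input adversarial scoring"],
    "confidentiality": [],  # an empty list counts as no control
}
not_applicable = {"model_poisoning": "no pre-trained components are used in training"}

print(coverage_gaps(controls, not_applicable))
# ['confidentiality', 'model_flaws']
```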

06 · THREAT SURFACE

What Article 15(5) actually covers.

The five named threat classes are not novel terms invented by the drafters. Each maps to a body of academic and standards work that pre-dates the regulation.

DATA POISONING

Manipulation of the training data set so the resulting model misbehaves at inference. Targeted variants insert backdoors triggered by a specific input pattern. Untargeted variants degrade overall accuracy. REFERENCE · NIST AI 100-2 (Adversarial Machine Learning · Taxonomy and Terminology), poisoning attacks section.

MODEL POISONING

Manipulation of pre-trained components used in training · public checkpoints, fine-tuning datasets, third-party adapters. The supply-chain analogue of dependency confusion. REFERENCE · OWASP LLM Top 10, LLM05 supply chain vulnerabilities. CRA Article 13 cybersecurity requirements for the production process.

EVASION

Inputs designed to cause the model to make a mistake at inference. Adversarial examples in vision. Prompt injection in language models. Jailbreaks against safety guardrails. REFERENCE · NIST AI 100-2 evasion attacks. OWASP LLM01 prompt injection. The literature line from Goodfellow et al. 2014 forward.

CONFIDENTIALITY

Attacks that recover protected information about the training data, the model parameters, or other users' inputs. Model inversion. Membership inference. Training-data extraction from generative models. REFERENCE · NIST AI 100-2 privacy attacks. OWASP LLM06 sensitive information disclosure. GDPR Article 25 obligations on data protection by design and by default.

MODEL FLAWS

Defects in model behaviour exploitable by an adversary at runtime. Hallucinated tool calls in agentic systems. Reward hacking. Specification gaming. Off-distribution behaviour. REFERENCE · OWASP LLM Top 10 · LLM02 insecure output handling (2023 numbering), LLM07 system prompt leakage and LLM09 misinformation (2025 numbering). The Concrete Problems in AI Safety literature.

Two reference texts are operative for any Article 15(5) compliance file. The OWASP Top 10 for Large Language Model Applications enumerates the LLM-specific risks the cybersecurity solutions must consider. NIST AI 100-2, the Adversarial Machine Learning taxonomy, provides the formal vocabulary for evasion, poisoning, privacy, and abuse attacks. Neither is named in Article 15. Both are the texts a regulator is most likely to read alongside the file.

07 · ACCURACY IN PRACTICE

What gets declared in the IFU.

The Article 15(3) declaration is short on its face and demanding in practice. The declaration that satisfies a reviewing authority has four components.

First, the metric. The metric must be the metric appropriate to the task. A binary classification system declared with accuracy on a class-imbalanced production distribution will be challenged. Precision, recall, F1, area under the receiver operating characteristic curve, calibration error · the choice is the provider's, the justification reads against the intended purpose under Article 13(3)(b)(i).

Second, the level. A point estimate alone is insufficient where the test methodology supports an interval. A 95 percent confidence interval, a per-population stratification, or a per-confidence-band breakdown is the operative declaration in most cases.

Third, the test methodology. The test set, the holdout strategy, the data freshness, the population covered. The Annex IV technical documentation under Article 11 is where the methodology lives in detail. The IFU declaration cross-references it.

Fourth, the conditions under which the declared level holds, and the conditions under which it does not. Article 13(3)(b)(ii) names the requirement explicitly · any known and foreseeable circumstances that may have an impact on that expected level. An IFU declaration that does not name the conditions under which performance degrades is incomplete by the statute's own terms.
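The first two components can be illustrated with a worked calculation on a class-imbalanced test set. The sketch below uses made-up confusion counts (50 positives in 1,000 cases) and the Wilson score interval as one way to attach a 95 percent interval to a proportion · the function names and the counts are assumptions for illustration.

```python
import math
from typing import Dict, Tuple


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from binary confusion counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp else 0.0,
    }


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95 percent)."""
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


# Hypothetical test set: 50 positives, 950 negatives.
m = classification_metrics(tp=10, fp=5, fn=40, tn=945)
print(round(m["accuracy"], 3), round(m["recall"], 3), round(m["f1"], 3))  # 0.955 0.2 0.308

low, high = wilson_interval(successes=10, trials=50)  # interval on recall
print(round(low, 2), round(high, 2))
```

Accuracy reads 95.5 percent while the system finds one positive case in five · which is why a headline accuracy figure on an imbalanced distribution draws a challenge, and why the interval on recall belongs in the declaration alongside the point estimate.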

08 · RESILIENCE IN PRACTICE

Fail-safe and redundancy patterns.

Paragraph (4) names technical redundancy solutions, backup, and fail-safe plans as positive examples. Four patterns satisfy in practice, individually or in combination.

GRACEFUL DEGRADATION

Under partial failure, the system continues to operate at reduced capability rather than fail outright. A recommendation system falls back to a baseline ranker. A scoring model falls back to a known-good prior version.
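A minimal sketch of the fall-back pattern, assuming both scorers are plain callables · `score_with_fallback` and the state labels are hypothetical, and a production version would log the exception rather than discard it.

```python
from typing import Callable, Sequence, Tuple

Scorer = Callable[[Sequence[float]], float]


def score_with_fallback(primary: Scorer, fallback: Scorer,
                        features: Sequence[float]) -> Tuple[float, str]:
    """Try the primary model; on any failure, serve the known-good fallback instead."""
    try:
        return primary(features), "primary"
    except Exception:
        # Reduced capability rather than an outright failure.
        return fallback(features), "fallback"


def broken_model(features: Sequence[float]) -> float:
    raise RuntimeError("model server unavailable")


def baseline_model(features: Sequence[float]) -> float:
    return sum(features) / len(features)  # stand-in for a prior known-good version


print(score_with_fallback(broken_model, baseline_model, [0.2, 0.4]))
```

Returning the state alongside the score is the point of the sketch · the caller can record which path served each decision.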

HOT FAIL-OVER

A redundant inference path is held warm and engages on detection of primary-path failure. The detection-to-failover latency budget is part of the design specification recorded in the technical file.

WATCHDOG TIMERS

Per-action timeouts that bound the maximum time a single inference can hold a thread, a tool, or a downstream side effect. The watchdog firing is a logged event under Article 12(2)(a).
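One way to sketch a per-action watchdog with the Python standard library is a bounded wait on a worker thread. The sketch assumes the action is a plain callable · `run_with_watchdog` and the state labels are hypothetical. Note the limitation stated in the comments: the caller's wait is bounded, but the worker thread itself is not killed.

```python
import concurrent.futures as cf
import time
from typing import Any, Callable, Tuple


def run_with_watchdog(action: Callable[..., Any], timeout_s: float,
                      *args: Any) -> Tuple[Any, str]:
    """Run one action and stop waiting for it after timeout_s seconds."""
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(action, *args)
    try:
        return future.result(timeout=timeout_s), "primary"
    except cf.TimeoutError:
        # The watchdog fired. The worker thread keeps running in the background;
        # Python threads cannot be forcibly stopped, so side effects need their own guard.
        return None, "watchdog-fired"
    finally:
        pool.shutdown(wait=False)  # do not block the caller on the slow worker


def slow_inference(x: int) -> int:
    time.sleep(0.3)
    return x * 2


print(run_with_watchdog(slow_inference, 0.05, 21))  # (None, 'watchdog-fired')
print(run_with_watchdog(slow_inference, 1.0, 21))   # (42, 'primary')
```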

OOD DETECTION

Out-of-distribution detection on the input before inference, on the output after inference, or both. Inputs outside the validated distribution route to a documented fall-back path · human review, refusal, or a baseline model.
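A deliberately simple input-side check, sketched under the assumption that inputs are numeric feature rows and that a per-feature z-score against the validation data is an adequate proxy for "outside the validated distribution" · real systems often need stronger detectors. `ZScoreOODGate`, `route`, and the threshold of 4.0 are hypothetical.

```python
import statistics
from typing import Callable, Optional, Sequence, Tuple


class ZScoreOODGate:
    """Flags a row when any feature lies too many standard deviations from the reference mean."""

    def __init__(self, reference_rows: Sequence[Sequence[float]], threshold: float = 4.0) -> None:
        columns = list(zip(*reference_rows))
        self.means = [statistics.fmean(col) for col in columns]
        # A constant column has zero spread; fall back to 1.0 to avoid division by zero.
        self.stdevs = [statistics.stdev(col) or 1.0 for col in columns]
        self.threshold = threshold

    def is_ood(self, row: Sequence[float]) -> bool:
        return any(
            abs(x - mean) / stdev > self.threshold
            for x, mean, stdev in zip(row, self.means, self.stdevs)
        )


def route(gate: ZScoreOODGate, row: Sequence[float],
          model: Callable[[Sequence[float]], float]) -> Tuple[str, Optional[float]]:
    """Send out-of-distribution inputs to human review instead of the model."""
    if gate.is_ood(row):
        return "human_review", None
    return "model", model(row)


reference = [(float(i), 10.0 + (i % 3)) for i in range(20)]
gate = ZScoreOODGate(reference)
print(route(gate, (9.0, 11.0), sum))  # ('model', 20.0)
print(route(gate, (9.0, 40.0), sum))  # ('human_review', None)
```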

The pattern that fails in practice is the absence of all four · the system that has no graceful degradation, no fail-over, no watchdog, no out-of-distribution check. That system is exposed under Article 15(4) on its first review, regardless of the production accuracy it achieves on the happy path.

09 · CROSS-REFERENCE WEB

Article 15 and the rest of Section 2.

Article 15 does not stand alone. It reads against the rest of Chapter III, Section 2.

Art. 9

The risk-management system feeds the testing regime. Article 9(8) requires testing of high-risk AI systems against prior defined metrics and probabilistic thresholds appropriate to the intended purpose. Article 15 is the technical-quality target the Article 9 testing measures against.

Art. 10

Data governance underwrites the achievable accuracy. Training, validation, and testing data sets shall be relevant, sufficiently representative, and as far as possible free of errors and complete in view of the intended purpose. Article 15(1) accuracy stands or falls on the Article 10 data quality.

Art. 13

Transparency declares the level. Article 13(3)(b)(ii) is the cross-reference Article 15(3) points to · the IFU records the accuracy, robustness, and cybersecurity expected, against which the system has been tested and validated.

Art. 17

The quality-management system holds the procedures. Article 17(1)(a) covers procedures for the management of modifications. Article 17(1)(d) requires examination, test, and validation procedures. Article 15 evidence sits inside the QMS.

Art. 60

Real-world testing under controlled conditions outside AI regulatory sandboxes. Where a provider tests in production-adjacent conditions to validate Article 15 levels, Article 60 sets the conditions.

Art. 72

Post-market monitoring. The provider establishes a system that actively and systematically collects, documents, and analyses relevant data on the performance of high-risk AI systems throughout their lifetime. Article 72 is how the Article 15(1) lifecycle promise is demonstrated.

Art. 73

Serious-incident reporting. Where a serious incident occurs, the provider reports to the market-surveillance authorities. Article 15(4) and (5) failures that produce a serious incident are reportable.

The picture is a closed loop. Article 9 sets the test plan. Article 10 underwrites the data. Article 15 sets the level. Article 13 declares it. Article 17 documents the procedures. Article 12 logs the runtime. Article 72 monitors after placement. Article 73 reports the failures. The loop is not optional. Each step is a separate provider obligation under Article 16.

10 · WARRANT FIELD MAP

Where Warrant maps Article 15.

15(1)

System performs consistently in accuracy, robustness, and cybersecurity over the lifecycle. FIELD · metadata.consistency_check_id (per-trace cross-reference to a lifecycle-consistency record mapped to a specific EU AI Act obligation).

15(3)

Accuracy levels and metrics declared in the instructions for use. FIELD · metadata.ifu_accuracy_block (declared metric, level, test set identifier, conditions, version of IFU).

15(4)

Technical and organisational measures including redundancy, backup, and fail-safe. FIELD · trace.actions[*].failsafe_state (primary, fallback, degraded, watchdog-fired) plus per-trace fail-safe-engagement roll-up.

15(4) feedback

Continuously learning systems · feedback-loop bias mitigation. FIELD · metadata.feedback_loop_audit (drift-window check, retraining-gate decision, bias-metric delta against Article 10 baseline).

15(5)

Cybersecurity controls evidenced against the named threat surface. FIELD · metadata.cybersec_controls (mapping against the five named threats) + trace.actions[*].adversarial_check (per-action input and output adversarial scoring).

Sample EU evidence package · Warrant register · INDEPENDENTLY VERIFIABLE WITHOUT CONTACTING WARRANT

→ /v/7de85ceaeac42a47

11 · FAQ

Questions a compliance officer asks first.

What is an appropriate level of accuracy under Article 15(1)?

Article 15(1) does not fix a numerical floor. The level is appropriate when it is fit for the system's intended purpose, declared in the instructions for use under Article 15(3) and Article 13(3)(b)(ii), and supported by the test methodology recorded in the Annex IV technical documentation. The Commission encourages benchmarks under Article 15(2). Until those benchmarks attach, the provider justifies the level against the risk-management system under Article 9 and the data-governance regime under Article 10.

Where do the Commission's benchmarks under Article 15(2) actually come from?

Article 15(2) instructs the Commission to encourage the development of benchmarks and measurement methodologies in cooperation with relevant stakeholders and organisations such as metrology and benchmarking authorities. In practice that pulls in the Joint Research Centre, ENISA on the cybersecurity side, CEN-CENELEC JTC 21 on harmonised standards, and national metrology institutes. The track is incremental. Until a harmonised benchmark is in force, the provider's declared methodology under Article 13 carries the burden.

Does Article 15(5) require resistance to all known adversarial attacks?

No. The text says technical solutions shall be appropriate to the relevant circumstances and the risks. The named threat surface includes data poisoning, model poisoning, adversarial examples or model evasion, confidentiality attacks, and model flaws. The standard is appropriateness, not exhaustiveness. The provider documents the threats considered, the countermeasures applied, and the residual risk.

How does Article 15 interact with the EU Cyber Resilience Act?

Regulation (EU) 2024/2847, the Cyber Resilience Act, attaches horizontal cybersecurity obligations to products with digital elements. A high-risk AI system that is also a product with digital elements is in scope of both regimes. Recital 77 of the AI Act anticipates the overlap. The operative pattern is to satisfy the more specific obligation in each regime once and reference it in both technical files, rather than duplicate the evidence.

Does the feedback-loop language in Article 15(4) ban online learning?

No. The text says systems that continue to learn after being placed on the market shall be developed in such a way as to eliminate or reduce as far as possible the risk of possibly biased outputs influencing input for future operations, and that any such feedback loops are duly addressed with appropriate mitigation measures. Online learning is permitted. It is conditioned on a documented mitigation regime · drift detection, retraining gates, output sampling, and the bias controls under Article 10(2)(f) and Article 10(2)(g).

What is the relationship to OWASP LLM Top 10 and NIST AI 100-2?

Neither is named in Article 15. Both are operative reference texts. The OWASP LLM Top 10 enumerates the prompt-injection, training-data poisoning, model denial-of-service, supply-chain, and sensitive-information-disclosure risks that the cybersecurity solutions under Article 15(5) must consider where appropriate. NIST AI 100-2 (Adversarial Machine Learning · A Taxonomy and Terminology of Attacks and Mitigations) provides the formal taxonomy for evasion, poisoning, privacy, and abuse attacks that the technical file should map against.

Can I claim Article 15 conformity through a harmonised standard?

Yes, where one applies. Article 40 confers a presumption of conformity on high-risk AI systems and general-purpose AI models that conform to harmonised standards or parts thereof published in the Official Journal. The Commission's standardisation request of 22 May 2023, which CEN-CENELEC JTC 21 is executing, includes accuracy, robustness, and cybersecurity within scope. As of May 2026 the relevant harmonised standards are still in development. Until publication in the OJEU, the presumption does not attach and the provider justifies the level directly.

What evidence does a regulator expect for Article 15(4) resilience?

A documented analysis of foreseeable error and fault modes, the technical and organisational measures applied to each, and operational records demonstrating that those measures functioned in production. Where redundancy or fail-safe is the answer, the file records the architecture · primary path, fallback path, watchdog, the conditions under which the fallback engages, and the per-event log entries when it engaged. The Article 12 logging perimeter is the runtime correlate.

12 · READ THE SOURCE

Read the source directly.

Authored by Warrant Compliance, the regulatory-analysis function at Warrant. Editorial commentary on regulatory text. Not legal advice. The verbatim quotation of Article 15 reflects the official English-language text of Regulation (EU) 2024/1689 as published in the Official Journal of the European Union on 12 July 2024.