The load-bearing claim.
Four verbs, one paragraph, the entire methodology. NIST chose verbs, not nouns, deliberately. GOVERN is not governance; it is the act of cultivating a culture in which AI risk is named and owned. MAP is not a mapping document; it is the act of establishing context per system. MEASURE is not metrics; it is the act of running tests, evaluations, verifications, and validations against the system. MANAGE is not a risk register; it is the act of allocating treatment to a measured risk. The verbs specify continuing obligations, not artefacts.
The single most-misread sentence in the entire RMF is the one that follows the four definitions in § 5. The four functions are not sequential. They are concurrent and interdependent. A firm that treats GOVERN as a one-time policy publication, MAP as a system-launch checklist, MEASURE as a quarterly metrics review, and MANAGE as a ticket queue has not implemented the RMF. The RMF requires the four functions to operate together against every AI system the firm runs, every release the firm cuts, and every decision the firm's AI produces.
The 19 categories and 72 subcategories under the four functions are NIST's expansion of what each verb means in operational practice. GOVERN has 6 categories. MAP has 5. MEASURE has 4. MANAGE has 4. Each subcategory is a discrete practice that an organisation can self-attest against. NIST does not score the attestation; the document is constructed for self-assessment. But the absence of a subcategory from a firm's self-attestation reads as an absence to a federal procurement officer reviewing the firm's RMF posture.
For an AI agent operating in production, the four functions translate into per-decision evidence categories. GOVERN is the audit trail of who decided what was acceptable risk and when. MAP is the per-trace context (purpose, jurisdiction, affected parties) that frames the decision. MEASURE is the eval-suite run record plus the per-decision residual-risk record. MANAGE is the per-decision rationale and the treatment-of-risk evidence. The four functions read against an AI agent in production become four columns in the per-decision evidence package.
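The four-column reading above can be sketched as a small record type. This is a minimal illustration, not Warrant's schema: the class name `DecisionEvidence`, its field names, and the helper `missing_functions` are all hypothetical, chosen only to show one column per RMF function.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DecisionEvidence:
    """One record per decision, one column per RMF function (hypothetical shape)."""
    decision_id: str
    govern: dict = field(default_factory=dict)       # who accepted the risk, and when
    map_context: dict = field(default_factory=dict)  # purpose, jurisdiction, affected parties
    measure: dict = field(default_factory=dict)      # eval-suite run reference, residual risk
    manage: dict = field(default_factory=dict)       # rationale and treatment-of-risk evidence

    def columns(self) -> dict:
        """Return the four evidence columns keyed by RMF function name."""
        return {
            "GOVERN": self.govern,
            "MAP": self.map_context,
            "MEASURE": self.measure,
            "MANAGE": self.manage,
        }


def missing_functions(record: DecisionEvidence) -> list:
    """List the RMF functions for which the record carries no evidence."""
    return [name for name, evidence in record.columns().items() if not evidence]
```

A record with an empty column is the per-decision analogue of a missing subcategory in a self-attestation: the gap is visible on the record itself.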
Culture, accountability, transparency.
GOVERN is the cross-cutting function. It is not the first stage of a pipeline; it is the soil in which the other three functions grow. NIST's framing in § 5.1 is that GOVERN cultivates a culture of AI risk management that is enabled by senior leadership, that is inclusive of diverse perspectives, and that is supported by clear policies, procedures, and practices. The function exists in the language of culture because the RMF authors observed that AI risk management without cultural reinforcement tends to collapse into a documentation exercise.
GOVERN 1 has seven subcategories. The load-bearing one is GOVERN 1.1: legal and regulatory requirements involving AI are understood, managed, and documented. For a firm running an AI agent across multiple jurisdictions, GOVERN 1.1 alone encompasses the entire EU AI Act mapping, the FCA Consumer Duty mapping, the SR 11-7 model-risk mapping, the SEBI retail-algo mapping, and any other regime that engages the firm's AI use. The RMF reads through to those regimes; it does not displace them.
For an AI agent in production, the GOVERN function is the audit trail of who decided what was acceptable risk and when. A per-release attestation that names the accountable Senior Manager, the regulatory regimes engaged, the residual risk accepted, and the date of acceptance is the operational discharge of GOVERN 1.1, GOVERN 2.1, and GOVERN 4 simultaneously. A GOVERN function that exists only as a published policy document and a quarterly steering-committee minute is the configuration the supervisor expects to fail.
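A per-release attestation of the kind described above might look like the sketch below. The four things it names (accountable signer, regimes engaged, residual risk accepted, date of acceptance) come from the text; the class name, field names, and the `attestation_gaps` helper are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReleaseAttestation:
    """Per-release GOVERN record (hypothetical field names)."""
    release: str
    accountable_signer: str
    regimes_engaged: tuple
    residual_risk_accepted: str
    accepted_on: date


def attestation_gaps(attestation: ReleaseAttestation) -> list:
    """Name the attestation fields that were left empty."""
    gaps = []
    if not attestation.accountable_signer.strip():
        gaps.append("accountable_signer")
    if not attestation.regimes_engaged:
        gaps.append("regimes_engaged")
    if not attestation.residual_risk_accepted.strip():
        gaps.append("residual_risk_accepted")
    return gaps
```

An attestation with no gaps is a dated, signed record per release, which is a different object from a policy document and a steering-committee minute.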
The absence of GOVERN evidence is the most common gap NIST cites in its consultation responses. Firms run MAP-style work in product launches and MEASURE-style work in evaluation suites; the gap is at GOVERN, where the cultural and accountability infrastructure should turn the technical work into a defensible record. The Warrant package addresses this gap: every per-decision record names the accountable signer and the release it belongs to, and the record is independently verifiable without contacting Warrant. That is the GOVERN evidence the technical pipelines do not by themselves produce.
Context and risks.
MAP is the function that establishes context per AI system. The function's five categories work together to answer five questions: what is this system for, what is it, what does it know, what does it depend on, and what does it affect. NIST's framing is that without an answer to all five questions the firm cannot meaningfully measure or manage the risks the system creates. The MAP function is where Warrant's classification stage produces its first evidence layer.
The MAP function is where a per-trace classification produces direct RMF evidence. Warrant maps each trace to a domain, a set of jurisdictions, the regulatory regimes engaged, and a risk tier; that output discharges MAP 1, MAP 2, and MAP 5 in a single record. The third-party-component answer for MAP 4 is encoded once at deployment time (the model identity, the embedding provider, the eval harness) and bound to every per-decision package by reference. The capability-and-benchmark answer for MAP 3 is the eval suite the firm runs against the system on a release cadence.
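The split between per-trace classification and deployment-time component data can be sketched as two record types joined by reference. All names here (`DeploymentComponents`, `TraceClassification`, `deployment_ref`, `resolve_components`) are hypothetical; only the listed contents (domain, jurisdictions, regimes, risk tier, model identity, embedding provider, eval harness) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentComponents:
    """MAP 4 answer, encoded once at deployment time (hypothetical shape)."""
    deployment_id: str
    model_identity: str
    embedding_provider: str
    eval_harness: str


@dataclass(frozen=True)
class TraceClassification:
    """Per-trace MAP record; third-party components are bound by reference only."""
    trace_id: str
    domain: str
    jurisdictions: tuple
    regimes_engaged: tuple
    risk_tier: str
    deployment_ref: str  # key into a registry of DeploymentComponents


def resolve_components(classification: TraceClassification, registry: dict) -> DeploymentComponents:
    """Follow the reference from a trace to its deployment-time component record."""
    return registry[classification.deployment_ref]
```

Binding by reference keeps each per-trace record small while still letting a reviewer reach the component answer from any single trace.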
For an AI agent making a customer-facing decision, MAP 5 is the operative category the supervisor will pull on examination. The supervisor's question is whether the firm characterised the impact of this action on this customer (or this protected group) before the action shipped. A characterisation that exists only at the system level, not the per-action level, does not survive the question. MAP 5 is per action, not per system, and the per-action record is the one the supervisor asks to see.
Analytics, metrics, validation.
MEASURE is the function that quantifies the risks the MAP function identified. Its four categories are the load-bearing instrumentation: appropriate methods and metrics, the test-evaluate-verify-validate (TEVV) regime, the mechanism for tracking risks over time, and the feedback loop that surfaces new risks as the system operates. MEASURE is also the function in which NIST writes most prescriptively about test methodology; MEASURE 2.7 is the explicit TEVV obligation.
MEASURE 2.7 is the subcategory that expands TEVV into operational practice. Uncertainty quantification means the model can distinguish high-confidence from low-confidence outputs, and the firm has evidence per output of which class applied. Robustness means the model's behaviour is stable under distribution shift, adversarial input, and edge-case input, and the firm has evidence the test cases were exercised. Accuracy means the model's outputs match the ground truth where ground truth exists, and the firm has evidence per release of the accuracy on the relevant test set. Reliability means the model degrades gracefully under failure modes, and the firm has evidence the failure modes were enumerated.
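One way to make the four properties above checkable is a per-release coverage test over the run record. This is a sketch under assumptions: the run record is modelled as a plain dictionary keyed by property name, and both `TEVV_PROPERTIES` and `uncovered_properties` are hypothetical names.

```python
# The four properties named in the paragraph above.
TEVV_PROPERTIES = ("uncertainty", "robustness", "accuracy", "reliability")


def uncovered_properties(run_record: dict) -> list:
    """Return the properties for which a release's run record holds no evidence reference."""
    return [name for name in TEVV_PROPERTIES if not run_record.get(name)]
```

A release whose run record leaves any property uncovered would not have the evidence this reading of MEASURE 2.7 expects.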
The Warrant 200-trace evaluation suite (the regulator-grade-evals work shipped February 2026) is a TEVV instantiation that satisfies MEASURE 2.7 directly. The suite is run against every release; the suite's run record is bound by reference to every evidence package the system produces post-release. The supervisor's question on MEASURE 2.7 (show me the TEVV evidence for the release that produced this decision) reduces to a record that is independently verifiable without contacting Warrant rather than a discovery exercise across observability tooling. /blog/regulator-grade-evals documents the suite's construction in full.
MEASURE 3 is the category that closes the per-decision loop. The RMF reads tracking as continuous, not periodic. A tracking mechanism that operates only at the cohort level and only on a quarterly cadence does not satisfy MEASURE 3 for an AI system that produces decisions every minute. The per-decision residual-risk record is the load-bearing piece, and the per-cohort rollup reads off the per-decision records, not the other way around. MEASURE 3 is what fails when an AI agent's monitoring records are rotated out along with the underlying observability data.
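The direction of derivation matters here: the cohort view is computed from per-decision records, never stored in place of them. A minimal sketch, assuming each per-decision record is a dictionary with a hypothetical `cohort` label and a numeric `residual_risk` score:

```python
from collections import defaultdict
from statistics import mean


def cohort_rollup(decision_records: list) -> dict:
    """Derive per-cohort summaries from per-decision residual-risk records."""
    scores_by_cohort = defaultdict(list)
    for record in decision_records:
        scores_by_cohort[record["cohort"]].append(record["residual_risk"])
    return {
        cohort: {
            "decisions": len(scores),
            "mean_residual_risk": mean(scores),
            "max_residual_risk": max(scores),
        }
        for cohort, scores in scores_by_cohort.items()
    }
```

Because the rollup is a pure function of the per-decision records, it can be recomputed at any cadence for as long as those records are retained.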
Prioritisation and treatment.
MANAGE is the function that takes the risks MAP identified and MEASURE quantified and decides what to do about them. Its four categories are: prioritise treatment based on assessment, develop strategies to maximise benefits, manage third-party risks, and document risk treatments and decisions. The MANAGE function produces the risk-treatment evidence record per decision.
MANAGE 4 is the category that lands hardest on per-decision evidence. The RMF expects the firm to document the risk treatment per identified risk and the decision basis on which the treatment was selected. For an AI agent producing decisions in real time, the per-decision rationale is the document the supervisor will pull. A rationale that exists only as a system-level policy document, not as a per-action record bound to the action's outputs, does not survive the question.
The Warrant per-action decision_rationale field is the direct answer to MANAGE 4. Per decision, Warrant produces an authorisation envelope: within_purpose, preconditions_met, human_oversight_appropriate, reversible, justification. The justification is the per-decision rationale that MANAGE 4 requires. Bound into a record that is independently verifiable without contacting Warrant, the field is retrievable per decision long after the action shipped, which is the survival property MANAGE 4 implicitly requires for a regime that the supervisor reads in years.
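The five envelope fields named above can be written down as a typed record. The field names are taken from the text; the types, the class name `AuthorisationEnvelope`, and the two helpers are assumptions made for illustration, and whether a False flag should block an action is a policy question this sketch does not answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorisationEnvelope:
    """Per-decision envelope; field names from the text, types assumed."""
    within_purpose: bool
    preconditions_met: bool
    human_oversight_appropriate: bool
    reversible: bool
    justification: str


def flags_false(envelope: AuthorisationEnvelope) -> list:
    """Names of the boolean fields recorded as False."""
    flags = {
        "within_purpose": envelope.within_purpose,
        "preconditions_met": envelope.preconditions_met,
        "human_oversight_appropriate": envelope.human_oversight_appropriate,
        "reversible": envelope.reversible,
    }
    return [name for name, value in flags.items() if not value]


def has_rationale(envelope: AuthorisationEnvelope) -> bool:
    """True when the justification text, the per-decision rationale, is non-empty."""
    return bool(envelope.justification.strip())
```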
Twelve risk categories specific to generative AI.
NIST published the Generative AI Profile (NIST AI 600-1) on 26 July 2024. The profile is the companion document to AI RMF 1.0 that addresses risks specific to generative AI systems. The structure follows the four functions of the parent RMF; the additions are the twelve generative-specific risk categories and the suggested actions per category.
Twelve risk categories. Each category has GOVERN, MAP, MEASURE, and MANAGE-style suggested actions, totalling more than 200 individual practices a generative-AI deployment can self-attest against. The profile is not a checklist; NIST is explicit that not every category applies to every deployment. A code-completion model in an enterprise IDE has a different risk profile than a customer-facing chatbot in a regulated industry, and the profile expects the firm to map the categories to the deployment's actual surface.
The twelve categories partition into three rough groups by how directly they engage AI evidence-of-record obligations. The first group (CBRN, dangerous content, obscene content) reaches outputs that should not have been produced; the evidence is the refusal record and the upstream filtering record. The second group (confabulation, information integrity, harmful bias, intellectual property) reaches output quality and trustworthiness; the evidence is the grounding record, the citation record, and the residual-risk record. The third group (data privacy, information security, environmental impacts, human-AI configuration, value chain and component integration) reaches operational and architectural concerns; the evidence is the supply-chain record, the human-oversight record, and the deployment-scope record.
The middle group is where Warrant's evidence shape lands hardest. A per-decision package that records the inputs, the retrieval-grounded sources, the output, and the residual risk is direct evidence against confabulation, information integrity, and harmful bias risks simultaneously. The package does not eliminate the risks; the RMF does not expect elimination. It produces the evidence record that the firm identified the risk, treated it, and accepted the residual.
Three of twelve that produce per-decision evidence.
Of the twelve generative-AI risk categories, three produce direct per-decision evidence shape that maps cleanly to the Warrant record. The remaining nine produce evidence at the system level (CBRN refusal policies), at the architecture level (information security controls), or at the supply-chain level (value chain and component integration). The three load-bearing categories for per-decision evidence are confabulation, information integrity, and human-AI configuration.
Confabulation is the generative-AI risk that has produced the most regulator commentary across 2024 and 2025. The supervisor's framing is that a model that asserts false content with the same confidence as true content places the entire information chain at risk; the firm cannot defend the deployment without evidence that the chain is grounded. The mitigation evidence is per-decision: the retrieval sources the model consulted, the citations bound to the output, and the refusal pattern where the model lacked sufficient grounding. A grounding record that exists at the architecture level but not the per-decision level does not answer the supervisor's per-customer question.
Information integrity is the second-order risk that follows confabulation at scale. NIST's framing is that even where any single confabulated output is recoverable, a population of confabulated outputs degrades trust in the entire information ecosystem. The mitigation is structural: every output the AI agent produces should carry a verifiable provenance trail, and the trail should be independently inspectable. The Warrant evidence package is the structural answer; each record is independently verifiable without contacting Warrant, which is the property NIST asks for under information integrity.
Human-AI configuration is the risk that lives in the boundary between automation and oversight. NIST's framing in AI 600-1 is that a generative-AI system's human-oversight model must be specified per decision class, not per system; some decisions warrant pure automation, others warrant human review, others warrant human-in-the-loop coordination. The mitigation evidence is the per-decision oversight check: did the trigger conditions fire, did a reviewer engage, what was the reviewer's identity and time-on-task. The Warrant authorisation envelope's human_oversight_appropriate boolean is the field that carries this evidence into the per-decision package.
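The three oversight questions (did the trigger fire, did a reviewer engage, who and for how long) can be sketched as a small record with a gap check. The class `OversightCheck`, its field names, and `oversight_gap` are hypothetical and are not presented as Warrant's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OversightCheck:
    """Per-decision oversight evidence (hypothetical field names)."""
    decision_class: str
    trigger_fired: bool
    reviewer_id: Optional[str] = None
    time_on_task_seconds: Optional[float] = None


def oversight_gap(check: OversightCheck) -> bool:
    """True when a review trigger fired but no reviewer is on record."""
    return check.trigger_fired and check.reviewer_id is None
```

A decision where the trigger never fired is not a gap in this sketch; whether the trigger conditions were themselves appropriate for the decision class is a separate question.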
RMF, ISO 42001, EU AI Act.
NIST has published official crosswalks from the AI RMF to ISO/IEC 42001:2023, to the OECD AI Principles, and to other major standards. The crosswalks are not for show; they are the operational answer to the question every multi-jurisdiction firm asks: do I need to run RMF, 42001, and EU AI Act compliance separately? The honest answer is that the three regimes overlap by 60 to 70 percent on substance, and a single per-decision evidence shape can satisfy all three.
The RMF-to-42001 crosswalk is the cleanest. The four RMF functions map to ISO 42001 management-system clauses with high fidelity. GOVERN maps to ISO 42001 Clauses 5 (leadership) and 7 (support); the cultural and accountability infrastructure the RMF describes is the management-system commitment ISO codifies. MAP maps to ISO 42001 Clause 6.1 (planning and risk assessment); the contextualisation work is the same in both regimes. MEASURE maps to ISO 42001 Clause 9 (performance evaluation); both regimes treat measurement as a continuing obligation. MANAGE maps to ISO 42001 Clauses 8 (operation) and 10 (improvement); the treatment-of-risk and continuous-improvement work is shared.
The RMF-to-EU-AI-Act crosswalk runs through Articles 9, 12, and 13 of the binding regulation. Article 9 (risk management system) maps to RMF MAP plus MANAGE; the EU AI Act's Annex IV documentation requirement reads directly into the RMF MAP and MANAGE outputs. Article 12 (logging) maps to MEASURE 3 (mechanisms for tracking risks over time); both regimes require per-event records that survive across the regulatory horizon. Article 13 (transparency) maps to MAP 5 (impacts characterised) plus GOVERN 5 (engagement with relevant AI actors); the transparency obligation reaches the affected-party characterisation and the stakeholder-engagement record together.
The meta-answer for a CTO running an AI agent across multiple regimes is that one evidence shape can satisfy three jurisdictions. The shape is the per-decision package that captures the four-function evidence in a single record, mapped to a specific obligation under each regime. The regime-specific obligations are then a binding step at the end: same per-action record, different external citations. /blog/one-agent-many-jurisdictions walks the binding step in detail.
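The binding step can be sketched as a lookup from RMF function to regime-specific citations. The clause and article numbers below are the ones given in the crosswalk paragraphs of this article, collapsed to function level; the dictionary name `CROSSWALK` and the helper `citations_for` are illustrative assumptions, not an official NIST crosswalk.

```python
# Function-level crosswalk as described in this article (not an official NIST table).
CROSSWALK = {
    "GOVERN": {"ISO/IEC 42001": ["Clause 5", "Clause 7"], "EU AI Act": ["Article 13"]},
    "MAP": {"ISO/IEC 42001": ["Clause 6.1"], "EU AI Act": ["Article 9", "Article 13"]},
    "MEASURE": {"ISO/IEC 42001": ["Clause 9"], "EU AI Act": ["Article 12"]},
    "MANAGE": {"ISO/IEC 42001": ["Clause 8", "Clause 10"], "EU AI Act": ["Article 9"]},
}

FUNCTION_ORDER = ("GOVERN", "MAP", "MEASURE", "MANAGE")


def citations_for(functions_present: set, regime: str) -> list:
    """Bind one record's RMF evidence to a regime's citations, without duplicates."""
    citations = []
    for function in FUNCTION_ORDER:
        if function not in functions_present:
            continue
        for citation in CROSSWALK[function].get(regime, []):
            if citation not in citations:
                citations.append(citation)
    return citations
```

The per-action record is the input in every case; only the `regime` argument changes between jurisdictions.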
The category-to-field map.
The table below names the RMF function or category, the evidence the AI agent must produce per action, and the Warrant evidence field that carries the record into the per-decision package. The mapping is the shape an accountable Senior Manager or Chief AI Officer can hand to a federal procurement officer or an internal-audit team without further engineering.
| RMF function · category | What evidence must show | Warrant evidence field |
|---|---|---|
| GOVERN 1.1 · legal req | Regulator-mapping per decision. | regulator_evidence.regimes_engaged |
| GOVERN 2.1 · roles | Named accountable signer per release. | trace.signed_off_by |
| MAP 2 · categorization | Risk-tier classification per trace. | classification.risk_tier |
| MAP 5 · impacts | Per-action affected-party check. | trace.actions[*].affected_parties |
| MEASURE 2 · TEVV | Eval-suite run record per release. | regulator_evidence.eval_suite_record |
| MEASURE 3 · track over time | Per-decision residual-risk record. | trace.actions[*].residual_risk_check |
| MANAGE 4 · decisions documented | Per-decision rationale. | trace.actions[*].decision_rationale |
| GenAI · confabulation | Retrieval-grounded responses + refusal. | trace.actions[*].retrieval_grounded_check |
| GenAI · information integrity | A record independently verifiable without contacting Warrant, with canonical citations. | trace.actions[*].citations |
| GenAI · human-AI config | Human-oversight trigger log. | trace.actions[*].oversight_trigger |
The mapping is reversible. Given a procurement officer's question on a specific RMF subcategory, the firm reads the column, retrieves the field, and produces the per-decision record. Given a specific customer or internal-audit case, the firm reads the per-decision record and produces the bound RMF subcategories. Either direction is one query against the per-decision package.
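The two directions of the lookup can be sketched with a dictionary and its inverse. The category labels and field paths are copied from the table above; the names `FIELD_FOR_CATEGORY`, `CATEGORY_FOR_FIELD`, and `categories_evidenced` are hypothetical.

```python
# Category-to-field pairs copied from the table above.
FIELD_FOR_CATEGORY = {
    "GOVERN 1.1": "regulator_evidence.regimes_engaged",
    "GOVERN 2.1": "trace.signed_off_by",
    "MAP 2": "classification.risk_tier",
    "MAP 5": "trace.actions[*].affected_parties",
    "MEASURE 2": "regulator_evidence.eval_suite_record",
    "MEASURE 3": "trace.actions[*].residual_risk_check",
    "MANAGE 4": "trace.actions[*].decision_rationale",
    "GenAI confabulation": "trace.actions[*].retrieval_grounded_check",
    "GenAI information integrity": "trace.actions[*].citations",
    "GenAI human-AI config": "trace.actions[*].oversight_trigger",
}

# The reverse direction: from a field found on a record back to its RMF category.
CATEGORY_FOR_FIELD = {path: category for category, path in FIELD_FOR_CATEGORY.items()}


def categories_evidenced(field_paths: list) -> list:
    """Given the field paths present on a record, list the RMF categories they evidence."""
    return [CATEGORY_FOR_FIELD[path] for path in field_paths if path in CATEGORY_FOR_FIELD]
```

The inversion is only safe because each category maps to a distinct field; two categories sharing one field would need a one-to-many reverse map instead.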
How a voluntary RMF became a federal procurement gate.
Executive Order 14110, signed 30 October 2023, is the document that turned the voluntary RMF into the de facto federal methodology. Section 4.1 of the order directed NIST to extend the AI RMF to address generative-AI risks (which produced AI 600-1 in July 2024) and tasked federal agencies with adopting RMF-aligned governance. The order is a policy instrument, not a statute, but it binds every executive-branch agency to the methodology by direction.
OMB Memorandum M-24-10, dated 28 March 2024, is where the order operationalised. The memorandum directs federal agencies to inventory their AI uses, conduct impact assessments, designate a Chief AI Officer with named accountability, and apply specific safeguards to safety-impacting and rights-impacting AI uses. The methodological backbone for those safeguards is the AI RMF. Where an agency cannot describe a deployment against the four RMF functions, the agency cannot complete the M-24-10 governance obligation.
For a SaaS vendor selling into US federal procurement in 2026, the implication is direct. The federal acquisition cycle increasingly references the RMF as baseline; the GSA AI procurement guidance, the DoD CDAO acquisition language, and the civilian-agency AI buying patterns all read the RMF as the methodological vocabulary. A vendor that cannot describe its product against the four functions, the relevant subcategories, and the GenAI Profile risks is a vendor that introduces friction at procurement evaluation. RMF alignment is a procurement-gate property in 2026, regardless of the voluntary label on the document.
The supervisory parallel runs in regulated industry as well. The SR 11-7 model-risk regime in US banking has been reading through to RMF concepts in supervisory examination across 2024 and 2025; the FDA's evolving guidance on AI-based medical devices reads RMF MAP and MEASURE language directly into the device's algorithm change protocol. /blog/sr-11-7-model-risk walks the SR 11-7 reading in detail. The pattern is that voluntary federal methodology becomes mandatory regulator expectation through the sectoral pathway, not the procurement pathway alone.
Framework, not artefact.
The RMF is honest about its limits. NIST writes the document as a methodology, not a binding standard. The RMF does not certify; there is no RMF-compliant badge. It does not pass-fail; the categories are self-attested. It does not produce an artefact; the firm running the methodology produces whatever artefacts its choice of tooling generates, and NIST stays out of that choice.
The honesty is also the gap. A federal procurement officer reading a vendor's RMF self-attestation has no independent way to verify the attestation's evidence base. An internal auditor reading the same attestation against the firm's actual operating posture has the same problem. The RMF asks the firm to produce evidence; it does not specify what the evidence looks like or how the firm should make the evidence portable across tooling, retention horizons, and personnel changes.
The Warrant evidence package is the evidence-of-decision instantiation that the RMF asks for but does not specify how to produce. The document is per-decision. It binds the four-function evidence (GOVERN signer, MAP classification, MEASURE eval record, MANAGE rationale) to a specific action a specific AI agent took at a specific time. It is independently verifiable without contacting Warrant, so the file's integrity survives any internal infrastructure decision the firm makes. The result is an artefact the firm controls and the supervisor can verify independently.
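This article does not say how independent verification is implemented. As one generic illustration of the property, and not a description of Warrant's mechanism, a verifier holding a record and a digest obtained through a separate channel can recompute the digest locally. The canonical-JSON scheme, the choice of SHA-256, and the function names below are all assumptions.

```python
import hashlib
import hmac
import json


def canonical_digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_record(record: dict, expected_digest: str) -> bool:
    """Recompute the digest locally and compare; no network call to any vendor."""
    return hmac.compare_digest(canonical_digest(record), expected_digest)
```

A digest alone shows the record is unchanged since the digest was issued; showing who issued it would additionally require a signature scheme, which this sketch leaves out.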
That gap closure is the load-bearing claim of Warrant against the RMF. The methodology asks for evidence. The per-decision package is the evidence the RMF does not by itself produce, and each record is independently verifiable without contacting Warrant. The four-function vocabulary becomes the evidence's organising principle; the per-decision package becomes the evidence's physical form. The RMF self-attestation rests on records the firm can produce on demand, at any horizon the procurement officer or the internal auditor names. The document tower the RMF asks for stands on a foundation the voluntary register did not, by itself, supply.
For the firm, the operational consequence is that an RMF self-attestation can be made over a population of per-decision packages rather than a population of internal narratives. The four-function language describes what the firm does; the packages prove what the firm did. Read together, the language and the packages form a posture a procurement officer can verify without negotiating retention, a supervisor can read without site-visit cooperation, and a board can sign without trusting that the underlying observability stack will retain its data through the next decade.
Read the source directly.
- NIST AI 100-1 · Artificial Intelligence Risk Management Framework (AI RMF 1.0) · January 2023 (PDF)
- NIST AI 600-1 · AI RMF Generative AI Profile · July 2024 (PDF)
- NIST AI RMF Playbook · companion suggested-actions reference
- NIST AI RMF Roadmap · ongoing development priorities
- Executive Order 14110 · Safe, Secure, and Trustworthy AI · 30 October 2023
- OMB Memorandum M-24-10 · 28 March 2024 (PDF)
- NIST AI RMF Crosswalks · ISO 42001, OECD, EU AI Act
- Per-category Warrant evidence field mapping
Authored by Warrant Compliance, the regulatory-analysis function at Warrant. Editorial commentary on the AI RMF and the Generative AI Profile. Not legal advice. The verbatim quotations of NIST AI 100-1 § 5, GOVERN 1, MEASURE 2.7, the MANAGE function summary, and the AI 600-1 risk-category list reflect the published NIST text in force on 9 May 2026.