The AI agent audit trail: NYDFS 500.6 + SR 26-2

Q: Do standard LLM inference logs satisfy a NYDFS 500.6(a)(2) audit trail?

No. 23 NYCRR 500.6(a)(2) requires audit trails designed to detect and respond to Cybersecurity Events that could materially harm the Covered Entity. The 16 October 2024 NYDFS Industry Letter applies that rule to AI without amending it. An LLM inference log records prompt tokens, completion tokens, the model id, and a redacted prompt body. It does not record what specific Nonpublic Information the agent accessed, under what authorization, or what decision the output influenced. It is traffic-shaped, not operation-shaped, so it does not answer the questions a 500.6(a)(2) audit trail must answer.

Q: Does SR 11-7 require evidence per decision or only at validation time?

Both. SR 11-7's ongoing-monitoring pillar directs validation activities to continue on an ongoing basis after a model goes into use. SR 26-2 (17 April 2026), which supersedes SR 11-7, carries the four-pillar discipline into a principles-based restatement and extends effective challenge to alternatives-considered logging at runtime, not only at validation. The comprehensive-documentation standard, read with the third-party replicability test, requires that a knowledgeable third party be able to reconstruct what the model actually did at the granularity of each tool call and each retrieval. That is a per-decision evidence obligation, not a quarterly aggregate.

Q: What is the penalty exposure for an inadequate AI audit trail?

Under NYDFS Part 500, exposure is civil money penalties under the Banking Law and the Financial Services Law, plus consent orders. Recent NYDFS settlements include PayPal at USD 2 million (27 January 2025) citing Part 500 failures and the combined Geico and Travelers settlement at USD 11.3 million (28 November 2023) citing 500.6 audit-trail gaps. Under SR 11-7, exposure runs through Matters Requiring Attention and Matters Requiring Immediate Attention findings and civil money penalties; recent enforcement includes the Wells Fargo model-risk actions exceeding USD 3 billion and the Citigroup USD 400 million order.

Q: Does the CISO certification under 500.17(b) cover the AI audit trail?

It covers the obligation, not the evidence. 500.17(b)(1)(i) requires the annual certification to rest on data and documentation sufficient to accurately determine and demonstrate material compliance. A CISO who certifies 500.6(a)(2) compliance for AI systems without an operation-level evidence trail is signing on faith. The Second Amendment certification under 500.17(b)(2) is submitted by the Covered Entity's highest-ranking executive, which exposes that officer to the same risk.

01 · WHY A LOG IS NOT A TRAIL

Why an inference log is not an audit trail.

"Each covered entity shall securely maintain systems that, to the extent applicable and based on its risk assessment ... (2) include audit trails designed to detect and respond to cybersecurity events that have a reasonable likelihood of materially harming any material part of the normal operations of the covered entity." 23 NYCRR § 500.6(a)(2) · Second Amendment · effective 1 November 2023

The 16 October 2024 NYDFS Industry Letter does not impose a new rule. It applies 23 NYCRR Part 500 to AI, including § 500.6(a)(2). Read against that clause, a standard AI deployment's logs come up short. The API gateway records timestamp, method, path, status, latency. The LLM inference log records prompt tokens, completion tokens, the model id, and a redacted prompt body. Both are traffic-shaped. A § 500.6(a)(2) audit trail is operation-shaped. The full statutory read of why standard logs fall short is in standard API call logs do not satisfy 23 NYCRR § 500.6.

US bank model risk guidance sets the same bar from the model-risk direction. SR 26-2, issued 17 April 2026 (OCC Bulletin 2026-13) by the Federal Reserve / OCC / FDIC, is the current interagency guidance; it supersedes and replaces SR 11-7 (2011) and SR 21-8, carrying the four-pillar discipline into a principles-based, risk-tailored restatement with explicit AI/ML scope. Its comprehensive-documentation standard requires that a knowledgeable third party reconstruct what the model did, when, and why, at the granularity of each tool call and each retrieval. The pillar-by-pillar reading is in SR 26-2 / SR 11-7, line by line.

Put the two regimes side by side and they converge on one shape: a record per consequential action that an examiner can read end-to-end. The question is what that record has to contain.

02 · THE FIVE QUESTIONS

The five questions a regulator asks.

An examiner reconstructing an AI agent's decision does not start from the architecture. The examiner starts from a single consequential action and asks five questions. Each question is anchored to a clause, and each clause is satisfied by a field in the per-action record.

What did the agent access? The specific Nonpublic Information element, not a request hash. NYDFS § 500.6(a)(1) reconstruction + § 500.1(k) NPI · SR 11-7 documentation pillar → trace.actions[*] (subject, inputs, outputs)

When did it happen? A timestamp the Covered Entity cannot retroactively change. NYDFS § 500.6(b) retention · § 500.17(a)(1) 72-hour clock → trace.actions[*].ts on the public record, fixed in time

Under what authority? The policy, role, and purpose under which the action was permitted. NYDFS § 500.7 access privileges · SR 11-7 governance pillar → authorization_envelope.* + within_purpose

Under what constraints? The preconditions, oversight, and reversibility that bounded the action. SR 11-7 use-context · effective challenge → authorization_envelope.preconditions_met + alternatives considered

With what result? The decision the action influenced, and the model that produced it. NYDFS § 500.11 third-party · SR 11-7 model inventory → trace.actions[*].outputs + model inventory binding
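The mapping from the five questions to record fields can be sketched as a data structure. This is a minimal Python sketch, not Warrant's actual schema. The names subject, inputs, outputs, ts, authorization_envelope, within_purpose, and preconditions_met come from the text above; the class names, the policy, role, and purpose fields, alternatives_considered, model_inventory_id, and the helper unanswered_questions are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class AuthorizationEnvelope:
    policy: str              # hypothetical: policy identifier and version
    role: str                # hypothetical: role the agent acted under
    purpose: str             # hypothetical: stated purpose limitation
    within_purpose: bool     # Q3: was the action inside that purpose?
    preconditions_met: bool  # Q4: were the required preconditions satisfied?


@dataclass
class Action:
    subject: str             # Q1: the specific data element touched
    inputs: Dict[str, Any]   # Q1: what flowed in
    outputs: Dict[str, Any]  # Q1 and Q5: what flowed out
    ts: str                  # Q2: timestamp (ISO 8601 here, by assumption)
    authorization_envelope: Optional[AuthorizationEnvelope] = None  # Q3, Q4
    alternatives_considered: List[str] = field(default_factory=list)  # Q4
    model_inventory_id: Optional[str] = None  # Q5: hypothetical binding


def unanswered_questions(action: Action) -> List[str]:
    """Return the examiner questions this record cannot answer."""
    gaps = []
    if not action.subject:
        gaps.append("what was accessed")
    if not action.ts:
        gaps.append("when")
    if action.authorization_envelope is None:
        # Without an envelope neither authority nor constraints are on record.
        gaps.append("under what authority")
        gaps.append("under what constraints")
    if not action.outputs or action.model_inventory_id is None:
        gaps.append("with what result")
    return gaps
```

A record that carries only a timestamp, which is roughly what a traffic-shaped log offers, leaves four of the five questions open under this check. The check only tests that a field is present; whether the timestamp is fixed in time is a separate matter.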

"An inference log answers none of the five. The per-action record answers all five." Warrant Compliance · 2026-06-04

03 · Q1 · WHAT WAS ACCESSED

What did the agent access.

The first question is the simplest to ask and the hardest for a standard log to answer. § 500.6(a)(1) requires systems designed to reconstruct material financial transactions. The reconstruction has to name the specific data element the agent touched, because Nonpublic Information under § 500.1(k) is defined broadly: business information whose tampering would materially impact operations, the classic combination-PII prong, and health information.

For an agent that fetched a customer's last twelve transactions, the record names those twelve specific transaction IDs, not "GET /transactions returned 200 in 84ms." It names the subject of each action and the inputs and outputs that flowed through it. In the per-action record this is trace.actions[*] carrying subject, inputs, and outputs. SR 11-7's documentation pillar reads the same requirement: a knowledgeable third party must be able to reconstruct what the model consumed and produced.

This is also where the agentic shape diverges sharply from a 2011-era model. A credit-score regression issued one decision and the trail was a single database row. An AI agent issues a sequence of tool calls and retrievals before the customer-facing decision, and the audit trail has to reconstruct each one. Anything coarser fails the third-party replicability test on examination.

04 · Q2 · WHEN

When did it happen.

Each covered entity shall maintain records ... for not fewer than five years [under (a)(1)] and ... for not fewer than three years [under (a)(2)], to reconstruct material financial transactions and to detect and respond to cybersecurity events. 23 NYCRR § 500.6(b) · retention · Second Amendment

The second question is about time, and time is where most logs quietly fail. § 500.6(b) sets a retention floor of five years for (a)(1) records and three years for (a)(2) records. An audit trail that lives inside a 30-day application-log rotation does not satisfy either floor. More to the point, the retention clock is meaningless if the timestamps inside the trail can be edited after a Cybersecurity Event.
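The retention arithmetic is simple enough to sketch. The following Python is illustrative only: the five-year and three-year floors come from the text above, while the function name, the "a1" and "a2" keys, the treatment of the exact anniversary as expired, and the leap-day handling are assumptions.

```python
from datetime import date

# Retention floors in years, as the text states them for § 500.6(b).
RETENTION_YEARS = {"a1": 5, "a2": 3}


def must_still_be_retained(record_date: date, kind: str, today: date) -> bool:
    """True if a record of the given kind is still inside its retention floor."""
    years = RETENTION_YEARS[kind]
    try:
        expiry = record_date.replace(year=record_date.year + years)
    except ValueError:
        # 29 February with a non-leap target year: roll to 1 March (assumption).
        expiry = date(record_date.year + years, 3, 1)
    return today < expiry


# A record written on 15 January 2024 must still exist 30 days later,
# which is exactly when a 30-day log rotation would have deleted it.
print(must_still_be_retained(date(2024, 1, 15), "a2", date(2024, 2, 14)))
```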

The § 500.17(a)(1) 72-hour notice clock starts at determination of a Cybersecurity Event, not at occurrence. To meet that clock, an investigator has to place the agent's actions at fixed points relative to the determination. A record whose timestamp is under the Covered Entity's own control cannot do that against an adversarial reading. The per-action trace.actions[*].ts is placed on the public record and fixed in time, so the placement is independently verifiable, without contacting Warrant, rather than asserted from the Covered Entity's own clock.
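The text does not describe how a record is fixed in time, so the following is not a description of Warrant's mechanism. It is a generic sketch of one common technique: compute a cryptographic digest of the record when it is written and publish that digest somewhere the Covered Entity cannot alter. Any later edit to the timestamp then fails the comparison. The function names and the sample record are hypothetical.

```python
import hashlib
import json


def record_digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record."""
    # Sorted keys and fixed separators make the encoding independent of key order.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_published_digest(record: dict, published: str) -> bool:
    """True if the record is unchanged since its digest was published."""
    return record_digest(record) == published


original = {"subject": "txn:1001", "ts": "2026-06-04T12:00:00Z"}
published = record_digest(original)  # assumed to be anchored externally at write time

edited = dict(original, ts="2026-06-01T12:00:00Z")  # a retroactive timestamp edit
print(matches_published_digest(original, published))  # unchanged record
print(matches_published_digest(edited, published))    # edited record
```

The design point is that verification needs only the record and the published digest, not the cooperation of whoever holds the record.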

05 · Q3 · UNDER WHAT AUTHORITY

Under what authorization.

The third question separates legitimate access from compromised access, and it is the one a § 500.6(a)(2) audit trail is built around. An agent that accessed an account number is in scope for § 500.7 access privileges, and the trail must record the authorization the access satisfied. Without it, an investigator cannot tell a permitted read from a breach.

In the per-action record this is the authorization_envelope together with the within_purpose determination: the policy, the role, and the purpose limitation under which the agent was permitted to take the action. Standard logs record that a request succeeded. They do not record whether the agent was allowed to make it.

SR 11-7 reads the same field from the governance pillar, which requires a named human officer accountable for the model's outputs; the agent does not displace that accountability, it inherits it. The record binds each action to the policy version current at decision time and to the accountable officer's role, so the authorization an examiner reads is the one that actually applied, not a reconstruction after the fact. The deployer-side accountability question, read against the EU regime, is in the Article 26 deployer obligations, line by line.
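Binding an action to the policy version current at decision time amounts to a lookup against the policy's effective dates. A minimal Python sketch follows, assuming a hypothetical in-memory policy history; the version names, dates, and the function policy_in_force are invented for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Hypothetical policy history: (effective_from, version), sorted by time.
POLICY_HISTORY = [
    (datetime(2026, 1, 1, tzinfo=timezone.utc), "access-policy-v1"),
    (datetime(2026, 4, 1, tzinfo=timezone.utc), "access-policy-v2"),
]


def policy_in_force(at: datetime) -> str:
    """Return the policy version that applied at the decision time."""
    starts = [start for start, _ in POLICY_HISTORY]
    # bisect_right finds the first entry that starts strictly after `at`,
    # so the entry just before it is the one in force.
    idx = bisect_right(starts, at) - 1
    if idx < 0:
        raise LookupError("no policy was in force at that time")
    return POLICY_HISTORY[idx][1]


decision_time = datetime(2026, 3, 31, 23, 59, tzinfo=timezone.utc)
print(policy_in_force(decision_time))
```

Recording the resolved version inside the action at write time, rather than resolving it later, is what keeps the authorization from being a reconstruction after the fact.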

06 · Q4 · UNDER WHAT CONSTRAINTS

Under what constraints.

An effective validation framework should include ... ongoing monitoring ... Validation activities should continue on an ongoing basis after a model goes into use, to track known model limitations and identify any new ones. SR 11-7 · ongoing monitoring and effective challenge

The fourth question is the one most logs never even attempt. SR 11-7 defines effective challenge as critical analysis by objective, qualified individuals who can identify model limitations and assumptions. Under SR 26-2 that expectation extends to runtime: the bank is expected to log, per decision, what alternatives the agent considered and why the chosen path was preferred. An agent that emits one path through one tool, with no record of the alternatives it weighed and discarded, presents the gap most likely to surface in the next examination cycle.

In the per-action record the constraints leg is authorization_envelope.preconditions_met together with the per-action capture of whether human oversight was appropriate, whether the action was reversible, and the alternatives considered. This is the same shape § 500.6(a)(2) reads from the cybersecurity direction: an action that stayed inside the preconditions its authorization required is legitimate; an action that did not is a Cybersecurity Event under § 500.1(f). The constraint record is what lets an examiner tell the two apart.
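The distinction an examiner draws here can be sketched as a filter over a trace. This Python sketch assumes each action is a plain dictionary with the subject and authorization_envelope.preconditions_met fields named in the text; the function name and the choice to treat a missing envelope as unmet are assumptions. Whether a flagged action is in fact a Cybersecurity Event remains a determination for the Covered Entity, not for this code.

```python
from typing import Dict, List


def actions_to_escalate(actions: List[Dict]) -> List[str]:
    """Return the subjects of actions that ran outside their preconditions."""
    flagged = []
    for action in actions:
        envelope = action.get("authorization_envelope") or {}
        # A missing envelope or missing flag is treated as unmet (assumption):
        # absence of evidence is itself the gap.
        if not envelope.get("preconditions_met", False):
            flagged.append(action["subject"])
    return flagged


trace = [
    {"subject": "txn:1001", "authorization_envelope": {"preconditions_met": True}},
    {"subject": "acct:77", "authorization_envelope": {"preconditions_met": False}},
    {"subject": "acct:78"},  # no envelope recorded at all
]
print(actions_to_escalate(trace))
```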

SR 11-7 also fixes when the constraints have to be re-established. A foundation-model swap, a prompt-template rewrite that broadens the use case, or a retrieval-corpus change that introduces new domains each read as triggers for re-validation. The per-action record carries the model and policy version in force at decision time, so an examiner can see whether the action ran under the constraints that were actually validated.
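The three triggers named above can be detected by comparing what was validated with what is running. The following Python is a deliberately conservative sketch: it flags any change to the foundation model, prompt template, or retrieval corpus for human review, and leaves the judgment of whether the change broadens the use case or introduces new domains to a reviewer. The key names and version strings are hypothetical.

```python
from typing import Dict, List

# The three components the text names as re-validation triggers.
TRACKED = ("foundation_model", "prompt_template", "retrieval_corpus")


def changed_components(validated: Dict[str, str], running: Dict[str, str]) -> List[str]:
    """List tracked components whose running version differs from the validated one."""
    return [name for name in TRACKED if validated.get(name) != running.get(name)]


validated = {
    "foundation_model": "provider-model-2026-01",
    "prompt_template": "underwriting-v4",
    "retrieval_corpus": "sbl-policies-2026Q1",
}
running = dict(validated, foundation_model="provider-model-2026-05")
print(changed_components(validated, running))
```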

07 · Q5 · WITH WHAT RESULT

With what result.

The fifth question closes the loop: what decision did the action influence, and which model produced it. § 500.6(a)(2) requires the trail to reconstruct a Cybersecurity Event end-to-end, not merely log its detection. SR 11-7 requires every material model in production to carry an inventory row with version, owner, last validation date, and residual risk. An LLM-driven decisioning agent without an inventory row is, on the operative SR 26-2 read, an unmanaged model, and "unmanaged model" is the most common phrasing in Matters Requiring Attention letters.

In the per-action record the result leg is trace.actions[*].outputs bound to the model inventory identifier of the version that produced it. An examiner pulling a single decision walks from the action, through the outputs, to the inventory row, to the model card and active validation. The walk takes seconds. The same walk for a firm that does not bind decisions to inventory rows takes weeks and often produces a partial answer, which is itself the gap finding.
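The walk from a decision to its inventory row is a keyed lookup once the binding exists. A minimal Python sketch, assuming a hypothetical in-memory inventory and a hypothetical model_inventory_id field on the action; the four row attributes are the ones the text lists, and every value shown is invented.

```python
from typing import Dict

# Hypothetical inventory, keyed by inventory identifier.
MODEL_INVENTORY: Dict[str, Dict[str, str]] = {
    "inv-0007": {
        "version": "underwriting-agent-2.3",
        "owner": "model-risk-officer",
        "last_validation": "2026-03-15",
        "residual_risk": "moderate",
    },
}


def inventory_row_for(action: Dict) -> Dict[str, str]:
    """Walk from one action to the inventory row of the model behind it."""
    inventory_id = action.get("model_inventory_id")
    if inventory_id is None:
        raise LookupError("action is not bound to a model inventory row")
    if inventory_id not in MODEL_INVENTORY:
        raise LookupError(f"no inventory row for {inventory_id!r}")
    return MODEL_INVENTORY[inventory_id]


action = {
    "subject": "application:5521",
    "outputs": {"decision": "refer to human underwriter"},
    "model_inventory_id": "inv-0007",
}
print(inventory_row_for(action)["last_validation"])
```

An action with no binding raises immediately, which is the code-level version of the partial answer described above.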

§ 500.11 third-party service provider governance attaches here too. A foundation-model provider that processes NPI on the Covered Entity's behalf is a third-party service provider, and the audit trail must record the model identity and the model provider per action. The result field carries the model that produced the output, so the third-party chain is on the record alongside the decision it shaped.

Sample US evidence package · small-business underwriting agent · INDEPENDENTLY VERIFIABLE · MAPPED TO § 500.6(a)(2) + SR 11-7

→ us-fintech.pdf

08 · THE EU CROSS-OVER

Where the EU AI Act crosses over.

The five questions are not unique to the US. An AI agent that evaluates the creditworthiness of natural persons or establishes their credit score is high-risk under Annex III point 5(b) of Regulation (EU) 2024/1689, which brings the Article 12 record-keeping obligation. Application of Article 12 to Annex III standalone high-risk systems is deferred from 2 August 2026 to 2 December 2027 by the Digital Omnibus, which replaces the earlier conditional trigger with a fixed date (adopted: European Parliament 16 June 2026, Council 29 June 2026). Non-compliance is reachable under Article 99(4) at up to EUR 15 million or 3 percent of global annual turnover.

So one credit-decisioning agent serving EU and US customers can carry three record obligations at once: NYDFS § 500.6(a)(2), SR 11-7 ongoing monitoring, and EU AI Act Article 12. The supervisors differ; the questions do not. Each asks what the agent accessed, when, under what authority, under what constraints, and with what result. That convergence is the point: one per-action record, mapped to a specific obligation under each regime. The classification reading for creditworthiness is in the high-risk classification Guidelines, read in full.

3 regimes · ONE CREDIT AGENT

NYDFS Part 500, SR 11-7 / SR 26-2, and EU AI Act Article 12 can all attach to a single credit-decisioning agent.

5 questions · ONE RECORD SHAPE

What, when, under what authority, under what constraints, with what result. The same per-action record answers all five across all three.

09 · FAQ

Questions a compliance officer asks first.

Do standard LLM inference logs satisfy a NYDFS 500.6(a)(2) audit trail?

No. § 500.6(a)(2) requires audit trails designed to detect and respond to Cybersecurity Events. An LLM inference log records prompt tokens, completion tokens, the model id, and a redacted prompt body. It does not record what specific Nonpublic Information the agent accessed, under what authorization, or what decision the output influenced. It is traffic-shaped, not operation-shaped, so it does not answer the questions a § 500.6(a)(2) audit trail must answer.

What are the five questions a regulator asks of an AI agent decision?

What the agent accessed, when, under what authorization, under what constraints, and what decision it influenced. Under NYDFS Part 500 these map to § 500.6(a)(2) audit trails, § 500.6(a)(1) reconstruction, § 500.7 access privileges, and § 500.11 third-party governance. Under SR 11-7 they map to the model inventory, ongoing monitoring, the use-context check, and the comprehensive-documentation replicability standard. The same per-action record answers all five.

Does SR 11-7 require evidence per decision or only at validation time?

Both. SR 11-7's ongoing-monitoring pillar directs validation to continue on an ongoing basis after a model goes into use. SR 26-2 expanded effective challenge to include alternatives-considered logging at runtime, not only at validation. The comprehensive-documentation standard, read with the third-party replicability test, requires that a knowledgeable third party be able to reconstruct what the model did at the granularity of each tool call and retrieval. That is a per-decision evidence obligation.

How does the EU AI Act creditworthiness rule cross over with NYDFS and SR 11-7?

An AI agent that evaluates the creditworthiness of natural persons is high-risk under Annex III point 5(b) of Regulation (EU) 2024/1689, which brings the Article 12 record-keeping obligation. The same agent inside a US bank is a material model under SR 11-7 and SR 26-2 and touches Nonpublic Information under § 500.1(k). One agent can carry three record obligations at once, and the five regulator questions are common to all three.

What is the penalty exposure for an inadequate AI audit trail?

Under NYDFS Part 500, exposure is civil money penalties plus consent orders. Recent settlements include PayPal at USD 2 million (27 January 2025) and the combined Geico and Travelers settlement at USD 11.3 million (28 November 2023) citing § 500.6 audit-trail gaps. Under SR 11-7, exposure runs through Matters Requiring Attention and Matters Requiring Immediate Attention findings and civil money penalties; recent enforcement includes the Wells Fargo model-risk actions exceeding USD 3 billion and the Citigroup USD 400 million order.

Does the CISO certification under 500.17(b) cover the AI audit trail?

It covers the obligation, not the evidence. § 500.17(b)(1)(i) requires the certification to rest on data and documentation sufficient to accurately determine and demonstrate material compliance. A CISO who certifies § 500.6(a)(2) compliance for AI without an operation-level evidence trail is signing on faith. The Second Amendment certification under § 500.17(b)(2) is submitted by the Covered Entity's highest-ranking executive, which exposes that officer to the same risk.

10 · READ THE SOURCE

Read the source directly.

Authored by Warrant Compliance, the regulatory-analysis function at Warrant. [email protected]. Editorial commentary on regulatory text. Not legal advice. SR 26-2 (17 April 2026, OCC Bulletin 2026-13) is the current interagency model risk guidance and supersedes SR 11-7 (2011) and SR 21-8. The five-question framing is Warrant's reading of 23 NYCRR § 500.6(a)(2) applied to AI per the 16 October 2024 Industry Letter and of SR 26-2 ongoing monitoring; the regulators did not write it as a numbered list. The verbatim quotations of § 500.6 and the SR 11-7 § V lineage are from the official texts cited above.
