Why an inference log is not an audit trail.
The 16 October 2024 NYDFS Industry Letter does not impose a new rule. It applies 23 NYCRR Part 500 to AI, including § 500.6(a)(2). Read against that clause, a standard AI deployment's logs come up short. The API gateway records timestamp, method, path, status, and latency. The LLM inference log records prompt tokens, completion tokens, the model id, and a redacted prompt body. Both are traffic-shaped. A § 500.6(a)(2) audit trail is operation-shaped. The full statutory read of why standard logs fall short is in "Standard API call logs do not satisfy 23 NYCRR § 500.6".
SR 11-7 sets the same bar from the model-risk direction. The Federal Reserve / OCC / FDIC interagency guidance, originally issued 4 April 2011, was carried forward by SR 26-2 in 2026 with explicit AI/ML scope. Its comprehensive-documentation standard requires that a knowledgeable third party be able to reconstruct what the model did, when, and why, at the granularity of each tool call and each retrieval. The pillar-by-pillar reading is in "SR 11-7 / SR 26-2, line by line".
Put the two regimes side by side and they converge on one shape: a record per consequential action that an examiner can read end-to-end. The question is what that record has to contain.
The five questions a regulator asks.
An examiner reconstructing an AI agent's decision does not start from the architecture. It starts from a single consequential action and asks five questions. Each question is anchored to a clause, and each clause is satisfied by a field in the per-action record.
What did the agent access: trace.actions[*] (subject, inputs, outputs)
When did it happen: trace.actions[*].ts on the public record, fixed in time
Under what authorization: authorization_envelope.* + within_purpose
Under what constraints: authorization_envelope.preconditions_met + alternatives considered
With what result: trace.actions[*].outputs + model inventory binding
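The mapping above can be sketched as a completeness check over a single action. This is a minimal illustration, not Warrant's schema: the field paths follow the names used in this article, while the flat keys alternatives_considered and model_inventory_id, the helper names, and the sample values are assumptions made for the example.

```python
from typing import Any

# Question -> record paths, following the article's mapping. The key spellings
# "alternatives_considered" and "model_inventory_id" are hypothetical.
QUESTION_FIELDS: dict[str, list[str]] = {
    "what did the agent access": ["subject", "inputs", "outputs"],
    "when did it happen": ["ts"],
    "under what authorization": ["authorization_envelope", "within_purpose"],
    "under what constraints": [
        "authorization_envelope.preconditions_met",
        "alternatives_considered",
    ],
    "with what result": ["outputs", "model_inventory_id"],
}


def lookup(record: dict[str, Any], path: str) -> Any:
    """Follow a dotted path; return None if any segment is missing."""
    node: Any = record
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def unanswered_questions(action: dict[str, Any]) -> list[str]:
    """List the questions for which at least one mapped field is absent."""
    return [
        question
        for question, paths in QUESTION_FIELDS.items()
        if any(lookup(action, path) is None for path in paths)
    ]


# A traffic-shaped gateway entry: it can only answer "when".
gateway_entry = {
    "ts": "2025-03-04T14:02:11+00:00",
    "method": "GET",
    "path": "/transactions",
    "status": 200,
    "latency_ms": 84,
}

# An operation-shaped action (illustrative values only).
per_action = {
    "subject": "customer:example-123",
    "inputs": {"tool": "list_transactions", "limit": 12},
    "outputs": {"element_ids": ["txn-001", "txn-002"]},
    "ts": "2025-03-04T14:02:11+00:00",
    "authorization_envelope": {"preconditions_met": True},
    "within_purpose": True,
    "alternatives_considered": ["use cached summary"],
    "model_inventory_id": "mdl-0042",
}

print(unanswered_questions(gateway_entry))
print(unanswered_questions(per_action))
```

The check only tests for presence. Whether the content of each field would satisfy an examiner is a separate question that a presence check cannot settle.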
What did the agent access.
The first question is the simplest to ask and the hardest for a standard log to answer. § 500.6(a)(1) requires systems designed to reconstruct material financial transactions. The reconstruction has to name the specific data element the agent touched, because Nonpublic Information under § 500.1(k) is defined broadly: business information whose tampering would materially impact operations, the classic combination-PII prong, and health information.
An agent that fetched a customer's last twelve transactions records those twelve specific transaction IDs, not "GET /transactions returned 200 in 84ms." The record names the subject of each action and the inputs and outputs that flowed through it. In the per-action record this is trace.actions[*] carrying subject, inputs, and outputs. SR 11-7 § III.B(4) reads the same requirement as the documentation pillar: a knowledgeable third party must be able to reconstruct what the model consumed and produced.
This is also where the agentic shape diverges sharply from a 2011-era model. A credit-score regression issued one decision and the trail was a single database row. An AI agent issues a sequence of tool calls and retrievals before the customer-facing decision, and the audit trail has to reconstruct each one. Anything coarser fails the third-party replicability test on examination.
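A short sketch of what "reconstruct each one" might look like in practice: given a trace, return every action in time order with the specific data elements it touched. The key element_ids and the sample trace are hypothetical, and timestamps are assumed to be ISO 8601 strings in one fixed UTC format so that string order matches time order.

```python
from typing import Any


def reconstruct(trace: dict[str, Any]) -> list[tuple[str, str, tuple[str, ...]]]:
    """Return (ts, subject, element IDs) per action, earliest first.

    Assumes every ts uses the same ISO 8601 UTC format, so sorting the
    strings sorts the actions chronologically.
    """
    ordered = sorted(trace["actions"], key=lambda action: action["ts"])
    return [
        (a["ts"], a["subject"], tuple(a["outputs"]["element_ids"]))
        for a in ordered
    ]


# Illustrative trace: two tool calls preceding one customer-facing decision,
# stored out of order on purpose.
trace = {
    "actions": [
        {
            "ts": "2025-03-04T14:02:13+00:00",
            "subject": "customer:example-123/credit-file",
            "outputs": {"element_ids": ["bureau-report-77"]},
        },
        {
            "ts": "2025-03-04T14:02:11+00:00",
            "subject": "customer:example-123/transactions",
            "outputs": {"element_ids": ["txn-001", "txn-002", "txn-003"]},
        },
    ]
}

for step in reconstruct(trace):
    print(step)
```

The contrast with a gateway log is that each line names the elements themselves, not the request that returned them.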
When did it happen.
The second question is about time, and time is where most logs quietly fail. § 500.6(b) sets a retention floor of five years for (a)(1) records and three years for (a)(2) records. An audit trail that lives inside a 30-day application-log rotation satisfies neither floor. More to the point, the retention clock is meaningless if the timestamps inside the trail can be edited after a Cybersecurity Event.
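The retention arithmetic is simple enough to write down. The sketch below uses the five-year and three-year floors stated above; measuring the period from the record's creation date, and rolling a 29 February forward to 1 March, are assumptions chosen for illustration and not a reading of the regulation.

```python
from datetime import date, timedelta

# Retention floors in years, as stated in the paragraph above.
RETENTION_YEARS = {"500.6(a)(1)": 5, "500.6(a)(2)": 3}


def add_years(d: date, years: int) -> date:
    """Add whole years; a 29 Feb start rolls forward to 1 Mar (assumption)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 Feb mapped into a non-leap year: keep the record one day longer.
        return date(d.year + years, 3, 1)


def retain_until(created: date, record_class: str) -> date:
    """Earliest date on which the record could age out of retention."""
    return add_years(created, RETENTION_YEARS[record_class])


def rotation_satisfies(rotation_days: int, created: date, record_class: str) -> bool:
    """True if a log kept for rotation_days outlives the retention floor."""
    return created + timedelta(days=rotation_days) >= retain_until(created, record_class)


created = date(2024, 10, 16)
print(retain_until(created, "500.6(a)(2)"))            # 2027-10-16
print(rotation_satisfies(30, created, "500.6(a)(2)"))  # False
```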
The § 500.17(a)(1) 72-hour notice clock starts at determination of a Cybersecurity Event, not at occurrence. To meet that clock, an investigator has to place the agent's actions at fixed points relative to the determination. A record whose timestamp is under the Covered Entity's own control cannot do that against an adversarial reading. The per-action trace.actions[*].ts is placed on the public record and fixed in time, so the placement can be verified independently, without contacting Warrant, rather than asserted from the Covered Entity's own clock.
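Two pieces of this paragraph lend themselves to a sketch: the 72-hour arithmetic, and why a timestamp that has been committed somewhere outside the Covered Entity's control cannot be edited quietly. The digest approach below is a generic illustration using SHA-256 over a canonical JSON form; it is an assumption for the example and not a description of how Warrant places records on the public record.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone


def notice_deadline(determined_at: datetime) -> datetime:
    """72 hours from determination of the event, not from occurrence."""
    return determined_at + timedelta(hours=72)


def offset_from_determination(action_ts: str, determined_at: datetime) -> timedelta:
    """Negative means the action preceded the determination."""
    return datetime.fromisoformat(action_ts) - determined_at


def action_digest(action: dict) -> str:
    """SHA-256 of a canonical JSON form; any edit to ts changes the digest."""
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


determined_at = datetime(2025, 3, 4, 9, 0, tzinfo=timezone.utc)
action = {"subject": "customer:example-123", "ts": "2025-03-03T22:15:00+00:00"}

published = action_digest(action)          # committed at the time of the action
edited = {**action, "ts": "2025-03-04T10:00:00+00:00"}

print(notice_deadline(determined_at))      # 2025-03-07 09:00:00+00:00
print(offset_from_determination(action["ts"], determined_at))
print(action_digest(edited) == published)  # False: the edit is detectable
```

The digest only proves anything if it was published where the Covered Entity cannot rewrite it, which is the point the paragraph makes about control of the clock.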
Under what authorization.
The third question separates legitimate access from compromised access, and it is the one a § 500.6(a)(2) audit trail is built around. An agent that accessed an account number is in scope for § 500.7 access privileges, and the trail must record the authorization the access satisfied. Without it, an investigator cannot tell a permitted read from a breach.
In the per-action record this is the authorization_envelope together with the within_purpose determination: the policy, the role, and the purpose limitation under which the agent was permitted to take the action. Standard logs record that a request succeeded. They do not record whether the agent was allowed to make it.
SR 11-7 reads the same field from the governance pillar. § V.A requires a named human officer accountable for the model's outputs; the agent does not displace that accountability, it inherits it. The record binds each action to the policy version current at decision time and to the accountable officer's role, so the authorization an examiner reads is the one that actually applied, not a reconstruction after the fact. The deployer-side accountability question, read against the EU regime, is in "The Article 26 deployer obligations, line by line".
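The difference between "the request succeeded" and "the agent was allowed to make it" can be sketched as a three-way classification. The envelope keys policy_version, role, purpose, and accountable_officer_role are hypothetical spellings of the elements named in this section; within_purpose is the determination the article pairs with the envelope.

```python
from typing import Any

# Hypothetical key names for the elements this section says the record binds.
REQUIRED_ENVELOPE_KEYS = ("policy_version", "role", "purpose", "accountable_officer_role")


def classify_access(action: dict[str, Any]) -> str:
    """Return 'permitted', 'out_of_purpose', or 'unverifiable'."""
    envelope = action.get("authorization_envelope")
    if not envelope or any(not envelope.get(k) for k in REQUIRED_ENVELOPE_KEYS):
        # A standard log lands here: success is recorded, permission is not.
        return "unverifiable"
    if action.get("within_purpose") is True:
        return "permitted"
    return "out_of_purpose"


envelope = {
    "policy_version": "policy-2025-02",
    "role": "credit-decisioning-agent",
    "purpose": "creditworthiness assessment",
    "accountable_officer_role": "head-of-model-risk",
}

print(classify_access({"status": 200}))
print(classify_access({"authorization_envelope": envelope, "within_purpose": True}))
print(classify_access({"authorization_envelope": envelope, "within_purpose": False}))
```

The "unverifiable" outcome is the one that matters for the argument: without the envelope, an investigator has no basis for telling a permitted read from a breach.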
Under what constraints.
The fourth question is the one most logs never even attempt. SR 11-7 § V.B defines effective challenge as critical analysis by objective, qualified individuals who can identify model limitations and assumptions. Under SR 26-2 that expectation extended to runtime: the bank is expected to log, per decision, what alternatives the agent considered and why the chosen path was preferred. The agent that emits one path through one tool with no record of the alternatives it weighed and discarded is the gap finding most likely to surface in the next examination cycle.
In the per-action record the constraints leg is authorization_envelope.preconditions_met together with the per-action capture of whether human oversight was appropriate, whether the action was reversible, and the alternatives considered. This is the same shape § 500.6(a)(2) reads from the cybersecurity direction: an action that stayed inside the preconditions its authorization required is legitimate; an action that did not is a Cybersecurity Event under § 500.1(f). The constraint record is what lets an examiner tell the two apart.
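As a rough sketch, the constraints leg can be checked per action and any gaps listed for review. The path authorization_envelope.preconditions_met comes from the article; the keys human_oversight_appropriate, reversible, and alternatives_considered are assumed spellings of the captures described above. The code only flags gaps and makes no legal determination.

```python
from typing import Any


def constraint_findings(action: dict[str, Any]) -> list[str]:
    """List what the constraints leg of one action fails to show."""
    findings: list[str] = []
    envelope = action.get("authorization_envelope", {})
    if envelope.get("preconditions_met") is not True:
        findings.append("preconditions not shown as met")
    if not action.get("alternatives_considered"):
        findings.append("no alternatives recorded")
    # Either value is acceptable for these flags; what matters is capture.
    for flag in ("human_oversight_appropriate", "reversible"):
        if flag not in action:
            findings.append(f"{flag} not captured")
    return findings


complete = {
    "authorization_envelope": {"preconditions_met": True},
    "alternatives_considered": ["decline and route to human reviewer"],
    "human_oversight_appropriate": True,
    "reversible": False,
}

print(constraint_findings(complete))  # []
print(constraint_findings({"authorization_envelope": {"preconditions_met": False}}))
```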
SR 11-7 § V.A.5 also fixes when the constraints have to be re-established. A foundation-model swap, a prompt-template rewrite that broadens the use case, or a retrieval-corpus change that introduces new domains each read as triggers for re-validation. The per-action record carries the model and policy version in force at decision time, so an examiner can see whether the action ran under the constraints that were actually validated.
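One way to picture the re-validation check is a version comparison between what was validated and what was in force at decision time. The component key names below are hypothetical. A version difference only identifies a candidate trigger: whether a prompt rewrite actually broadens the use case, or a corpus change introduces new domains, remains a human judgment.

```python
# Hypothetical key names for the three changes the paragraph lists.
VERSIONED_COMPONENTS = (
    "model_version",
    "prompt_template_version",
    "retrieval_corpus_version",
)


def revalidation_candidates(validated: dict[str, str], at_decision: dict[str, str]) -> list[str]:
    """Components whose version at decision time differs from the validated one."""
    return [
        component
        for component in VERSIONED_COMPONENTS
        if validated.get(component) != at_decision.get(component)
    ]


validated = {
    "model_version": "fm-2025-01",
    "prompt_template_version": "v7",
    "retrieval_corpus_version": "corpus-12",
}
at_decision = {**validated, "model_version": "fm-2025-06"}

print(revalidation_candidates(validated, validated))    # []
print(revalidation_candidates(validated, at_decision))  # ['model_version']
```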
With what result.
The fifth question closes the loop: what decision did the action influence, and which model produced it. § 500.6(a)(2) requires the trail to reconstruct a Cybersecurity Event end-to-end, not merely log its detection. SR 11-7 § IV.A requires every material model in production to carry an inventory row with version, owner, last validation date, and residual risk. An LLM-driven decisioning agent without an inventory row is, on the operative SR 26-2 read, an unmanaged model, and "unmanaged model" is the most common phrasing in Matters Requiring Attention letters.
In the per-action record the result leg is trace.actions[*].outputs bound to the model inventory identifier of the version that produced it. An examiner pulling a single decision walks from the action, through the outputs, to the inventory row, to the model card and active validation. The walk takes seconds. The same walk for a firm that does not bind decisions to inventory rows takes weeks and often produces a partial answer, which is itself the gap finding.
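The walk from action to inventory row reduces to a keyed lookup once the binding exists. In the sketch below the inventory row carries the four attributes named in this section (version, owner, last validation date, residual risk); the keys model_inventory_id and model_provider, the in-memory inventory, and all values are illustrative assumptions.

```python
from datetime import date
from typing import Any

# Illustrative in-memory inventory keyed by a hypothetical inventory ID.
inventory: dict[str, dict[str, Any]] = {
    "mdl-0042": {
        "version": "fm-2025-01",
        "owner": "head-of-model-risk",
        "last_validation": date(2025, 1, 20),
        "residual_risk": "moderate",
    }
}


def walk(action: dict[str, Any], inventory: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Follow one action through its outputs to the bound inventory row."""
    row = inventory.get(action.get("model_inventory_id"))
    if row is None:
        # No binding: the decision cannot be tied to a managed model.
        return {"status": "unmanaged model", "outputs": action.get("outputs")}
    return {
        "status": "bound",
        "outputs": action.get("outputs"),
        "model_provider": action.get("model_provider"),
        "inventory_row": row,
    }


bound = {
    "outputs": {"decision": "refer"},
    "model_inventory_id": "mdl-0042",
    "model_provider": "example-foundation-model-provider",
}

print(walk(bound, inventory)["status"])                                # bound
print(walk({"outputs": {"decision": "refer"}}, inventory)["status"])   # unmanaged model
```

Without the identifier on the action, the same question has to be answered by correlating deployment history against timestamps, which is the weeks-long version of the walk.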
§ 500.11 third-party service provider governance attaches here too. A foundation-model provider that processes NPI on the Covered Entity's behalf is a third-party service provider, and the audit trail must record the model identity and the model provider per action. The result field carries the model that produced the output, so the third-party chain is on the record alongside the decision it shaped.
Where the EU AI Act crosses over.
The five questions are not unique to the US. An AI agent that evaluates the creditworthiness of natural persons or establishes their credit score is high-risk under Annex III point 5(b) of Regulation (EU) 2024/1689, which brings the Article 12 record-keeping obligation. Application of Article 12 to Annex III standalone systems begins 2 August 2026, subject to a provisional deferral to 2 December 2027 under the May 2026 Digital Omnibus, pending publication in the Official Journal. Non-compliance is reachable under Article 99(4) at up to EUR 15 million or 3 percent of global annual turnover.
So one credit-decisioning agent serving EU and US customers can carry three record obligations at once: NYDFS § 500.6(a)(2), SR 11-7 ongoing monitoring, and EU AI Act Article 12. The supervisors differ; the questions do not. Each asks what the agent accessed, when, under what authority, under what constraints, and with what result. That convergence is the point: one per-action record, mapped to a specific obligation under each regime. The classification reading for creditworthiness is in "The high-risk classification Guidelines, read in full".
Read the source directly.
- 23 NYCRR Part 500, Second Amendment (1 November 2023, PDF) · § 500.6 audit trail
- NYDFS Industry Letter, Cybersecurity Risks Arising from Artificial Intelligence (16 October 2024)
- Federal Reserve SR 26-2, carry-forward of SR 11-7 with AI/ML scope (PDF)
- Federal Reserve SR 11-7, Supervisory Guidance on Model Risk Management (4 April 2011)
- Regulation (EU) 2024/1689 · EUR-Lex CELEX:32024R1689 · Annex III point 5(b) + Article 12
- Standard API call logs do not satisfy 23 NYCRR § 500.6 · the statutory read
- SR 11-7 / SR 26-2, line by line · the four pillars read for evidence
- NYDFS Part 500 · per-obligation Warrant evidence field mapping
- SR 11-7 / SR 26-2 · per-obligation Warrant evidence field mapping
Authored by Warrant Compliance, the regulatory-analysis function at Warrant. Editorial commentary on regulatory text. Not legal advice. The five-question framing is Warrant's reading of 23 NYCRR § 500.6(a)(2) applied to AI per the 16 October 2024 Industry Letter and of SR 11-7 / SR 26-2 ongoing monitoring; the regulators did not write it as a numbered list. The verbatim quotations of § 500.6 and SR 11-7 § V are from the official texts cited above.