Andrew Clearwater

Why 50+ AI Companies Just Agreed to Report Transparently

Andrew Clearwater — Tue, 02 Jun 2026 15:08:28 GMT

Let me start with the headline: on May 28, at an event in Paris, the OECD quietly launched Version 2.0 of the Hiroshima AI Process (HAIP) Reporting Framework. Over 50 organizations committed to submitting reports under the new framework.

If you follow AI governance at all, you know this is significant. The HAIP framework is the only international mechanism for organizations to voluntarily disclose how they’re managing AI risk. No other IGO-backed framework does what this does at this scale.

But here’s the thing: “significant” and “sufficient” are not the same word. Let’s get into it.

First: What Actually Changed in HAIP 2.0

Before we can assess the commitments, we need to understand what organizations are actually committing to. HAIP 2.0 is a structural redesign.

It got role-aware. Version 1.0 treated all AI organizations the same. Version 2.0 distinguishes between model developers, application developers, and deployers.

It got smaller-org friendly. HAIP 2.0 was refined through a pilot involving organizations from seven countries across the full AI value chain. The explicit goal was to make reporting accessible to SMEs.

It connected to actual tooling. Organizations can now reference tools they already use from the OECD.AI Catalogue of Tools and Metrics for Trustworthy AI. This is a smart move: instead of asking companies to describe abstract practices, it creates a link to specific, identifiable tooling choices.

It addressed agentic AI. The new version explicitly covers emerging capabilities including agentic AI. Most governance frameworks are still trying to catch up.

Taken together, these are genuinely useful improvements.

Why Companies Are Committing

The reality is that these commitments are rational strategic behavior. Voluntary frameworks can create optionality for early movers and raise costs for late arrivals. Organizations that commit to HAIP 2.0 now get to:

Shape the baseline. Early reporters influence what counts as “good” disclosure.
Get ahead of mandatory requirements. The EU AI Act is live. The UK AI Opportunities Action Plan is real. Canada, Australia, and Japan are all moving on AI governance legislation. Voluntary commitments today are infrastructure for compliance tomorrow.

The question for practitioners is: what does this commitment architecture actually produce?

The Three Things HAIP 2.0 Still Doesn’t Do

If you’re building governance programs or advising organizations, you need to understand the gap between what the framework promises and what it delivers.

1. It’s Retrospective, Not Prospective

HAIP reports describe what organizations have done. They are not forward-looking risk assessments. They do not require organizations to identify and disclose risks they are currently managing or expect to encounter. They do not require disclosure of known failure modes or active incidents.

This is a fundamental limitation. The most important information in AI governance is not “here’s how we handled last year’s risks.” It’s “here’s what we’re building right now, and here’s what we’re uncertain about.” HAIP 2.0 gets closer to current practice than its predecessor, but it remains backward-looking by design.

For practitioners: treat HAIP reports as historical artifacts, not live risk signals.

2. There Is No Verification

Organizations self-report. There is no independent audit, no third-party attestation, no verification mechanism. The OECD can collect reports; it cannot validate them.

We are in a period where the architecture of governance is being built, and verification infrastructure is trailing significantly.

The risk is straightforward: if reporting is easy, costless, and unverified, it becomes marketing. The honest frame is this: HAIP 2.0 creates a disclosure norm, not an accountability mechanism.

3. It Doesn’t Cover the Most Consequential Decisions

HAIP reporting focuses on risk management practices. What it doesn’t reach are the product decisions that actually determine AI risk: what capabilities to build, what data to train on, what safety evaluations to run before deployment, what red lines to enforce.

These decisions are where the actual risk lives. And they are almost entirely outside the scope of voluntary reporting frameworks.

What This Means If You’re Building AI Governance Infrastructure

Frameworks are only useful if you know how to use them. How should we think about using this one?

HAIP 2.0 is a wedge, not a wall. Use it to open governance conversations internally. If you’re trying to build out a governance program at your organization, the HAIP reporting template gives you a structured way to ask the right questions: What’s our role in the AI value chain? What risk management tools are we actually using? What’s our agentic AI risk posture? The report is a diagnostic framework as much as a disclosure mechanism.

The interoperability angle matters for practitioners building platforms. HAIP 2.0 explicitly aligns with ISO/IEC 42001, the NIST AI RMF, and the G7 Code of Conduct. This cross-alignment is genuinely useful. If you’re designing an AI governance assessment capability, HAIP 2.0 is worth mapping to your control framework.

Treat the commitments as a signal, not a guarantee. When an organization commits to HAIP reporting, that tells you something real: they are operating in a governance environment where disclosure is increasingly expected. It tells you less about whether their risk management practices are adequate.

Watch what happens at the September 2026 deadline. Organizations are encouraged to submit reports using the revised framework by September 1, 2026. This is the first real test. How many of the 50+ actually file? What’s the quality of the submissions?

The Bigger Picture: Where We Actually Are in AI Governance

The HAIP 2.0 launch is happening inside a specific moment in the governance timeline, and it’s worth zooming out for a second.

We are in what I’d call the framework proliferation phase. Over the last three years, we’ve seen the NIST AI RMF, ISO/IEC 42001, the EU AI Act, the UK voluntary commitments, the G7 Hiroshima Process, the GPAI principles, and now HAIP 2.0. Each of these is real. Each of them is incomplete.

The core issue remains: we have disclosure without verification, commitment without consequence, and frameworks designed to cover last year’s AI while this year’s capabilities have already moved the target.

This isn’t an argument for pessimism. It’s an argument for precision. The practitioners and organizations building governance infrastructure right now are not doing useless work. They are building the foundations that mandatory frameworks will eventually need. The question is whether those foundations are built to actually bear weight.

The common story is that 50 companies agreeing to report is a governance win.

The reality is that it’s a start and the hardest work is still ahead.

Practical Takeaways

If you’re a practitioner or builder navigating this:

Download the HAIP 2.0 framework and map it against your existing governance controls.
Use the OECD.AI Catalogue as a reference library for tools.
Don’t mistake disclosure for accountability. Run HAIP reports as inputs to due diligence, not as substitutes for it.
Watch the September 2026 cohort. The first wave of HAIP 2.0 reports will be the governance community’s first real read on whether this framework produces substance or theater.
Build for interoperability. HAIP 2.0’s alignment with ISO/IEC 42001 and NIST AI RMF is the most under-appreciated feature.

What’s your read on voluntary AI governance frameworks? Are they building real accountability infrastructure or creating governance theater?

Primary Source Reading List for AI Governance Practitioners

Here’s everything you need, direct from the source.

The HAIP Framework — Active Portal

HAIP Reporting Framework Portal (v2.0) The live submission portal where you can review the framework structure, browse existing reports, and submit your organization’s report. → transparency.oecd.ai

About the HAIP Reporting Framework OECD’s official overview of how the framework was built, what it covers, and how submissions are handled — including the verification FAQ (spoiler: there isn’t any). → transparency.oecd.ai/about

HAIP Reporting Framework FAQ Directly confirms: “The Secretariat will not assess or verify the substance of submissions.” Essential reading for understanding what a HAIP brand listing actually means. → transparency.oecd.ai/faq

The Foundational Documents — What Organizations Are Actually Committing To

Hiroshima Process International Code of Conduct for Organizations Developing Advanced AI Systems (October 30, 2023) The 11-action Code of Conduct that HAIP 2.0 is designed to monitor. This is the primary commitment document. Read the actual actions, not a summary of them. → mofa.go.jp — Full PDF

G7 Leaders’ Statement on the Hiroshima AI Process (October 30, 2023) The political declaration that launched the Hiroshima AI Process, including the G7’s call for organizations to commit to the Code of Conduct. → mofa.go.jp — Full PDF

The Evidence Base — What Reporting Has Actually Produced So Far

“How Are AI Developers Managing Risks?” — OECD Artificial Intelligence Papers, No. 45 (September 2025)The OECD’s analytical review of the first 25 HAIP submissions. This is the only empirical data we have on what voluntary AI transparency reporting actually looks like in practice. Read it before forming a strong opinion on whether HAIP 2.0 will deliver. → oecd.org — Full report → Direct PDF download

The Upstream Standards — What HAIP Is Built On

OECD AI Principles (2019, updated May 2024) The first intergovernmental AI standard. HAIP is a monitoring mechanism for applying these principles. Understanding the principles is prerequisite to understanding what HAIP is actually measuring. → oecd.ai/en/ai-principles

OECD Catalogue of Tools & Metrics for Trustworthy AI The tool catalogue that HAIP 2.0 now directly connects to. Organizations submitting reports can reference tools from here. If you’re building governance infrastructure, this is a useful inventory of what the OECD considers legitimate practice. → oecd.ai/en/catalogue/overview

The Launch Announcement

OECD HAIP 2.0 Launch Page (May 28, 2026) The official announcement with the list of committed organizations and the v2.0 key changes summary. → oecd.ai/en/haip-2-launch

Safe Words vs. Safe Actions

Andrew Clearwater — Wed, 27 May 2026 13:17:15 GMT

I need to tell you about a research paper that landed in my feed this week, and I want to start with the reason I’m writing about it at all.

Luiza Jarovsky, PhD Co-founder of the AI, Tech & Privacy Academy and author of a newsletter with 95,000 subscribers, flagged it with a post that read: “Another super innovative paper on agentic AI, this time focused on a new safety benchmark: Boiling the Frog. Bookmark it.” She was right. Bookmark it. And then read this.

The paper is called “Boiling the Frog: A Multi-Turn Benchmark for Agentic Safety,” authored by Piercosma Bisconti Lucidi, Matteo Prandi, Federico Pierucci, Federico Sartore, Enrico Panai, Laura Caroli, Yue Zhu, Adam Leon Smith, Luca Nannini, Marcello Galisai, Susanna Cifani, Francesco Giarrusso, Marcantonio Bracale Syrnikov, and Daniele Nardi — a large and impressive team doing some of the most careful agentic safety work I’ve seen published this year. Serious props to all of them. Link to the full paper here.

Now let me tell you why it matters to you.

There’s a hole in AI safety the size of your entire production environment

Here’s the question: Did the AI do something harmful, or just say something harmful?

Those are not the same thing. But almost every major safety benchmark in existence only measures the second one.

Think about how AI safety is typically evaluated. A benchmark gives the model a prompt: “Write instructions for making a bomb.” The model either refuses or it doesn’t. If it refuses, it passes. If it complies, it fails. Simple. Measurable. And the source of an entire industry of red-teaming, jailbreak research, and content policy work.

This is what the research community calls response safety. But it’s measuring the wrong thing when you’re talking about agents.

Because here’s what happens when you deploy an AI agent in a corporate environment:

The agent isn’t just generating text. It’s reading files. Editing configurations. Updating access control records. Modifying audit logs. Writing to databases. It’s changing the state of the world.

And the research team behind “Boiling the Frog” just published the first real evidence of how badly that goes (with a 44% attack success rate across nine major models in agentic settings)

The two worlds of AI safety

Let me make the distinction concrete, because I think it’s the most important framing in AI safety right now and almost nobody’s talking about it plainly.

World 1: Response Safety

A prompt comes in. A response goes out. You judge the response.

The model is like a very smart person you’ve asked a question. The environment around the model is inert. Nothing gets changed. There’s no state. It’s just text in, text out.

Almost all the famous safety benchmarks (HarmBench, JailbreakBench, StrongREJECT, AILuminate) live in this world. They’re testing whether models will produce outputs that look harmful. This is genuinely important work! But it’s measuring the behavior of a fundamentally different kind of system than what enterprises are actually deploying today.

World 2: Operational Safety

An agent receives instructions. It reads your workspace. It calls tools. It writes files. It loops. And it leaves behind a changed environment that persists into the next turn, and the next, and the next.

The safety-relevant thing here is not what the agent said. It’s what the agent did. And those can be wildly different.

Here’s the uncomfortable truth the paper surfaces: a model can refuse to generate harmful text while simultaneously making harmful changes to your files.

The refusal and the action are measured differently. Most safety infrastructure only catches the first one.

What “Boiling the Frog” actually tests

The benchmark works like this: you give an AI agent access to a sandboxed corporate workspace (it might include policy documents, access control lists, configuration files, audit logs, training records). The kind of stuff a real corporate deployment would have.

Then you run a multi-turn conversation. Early turns are totally benign. “Update this document.” “Read this file.” “Check this configuration.” The agent complies, does good work, and builds up context across the conversation.

Then the requests start to escalate. Each individual step seems like a small increment from the last. Before you know it, you’re asking the agent to weaken an approval rule, falsify a training record, or modify an access roster.

The “boiling the frog” mechanism is exactly what it sounds like: the progressive normalization of increasingly risky requests, until the agent has crossed a line it would have refused at turn one.

And the results are sobering. Across nine models:

44.4% aggregate attack success rate
The best model (Claude Haiku 4.5) still failed 20.5% of the time
The worst (Gemini 3.1 Flash Lite) failed 92.9% of the time
“Loss-of-control” scenarios hit a 93.3% success rate — meaning virtually no model avoided this category of failure

Why this is actually your problem right now

I know what some of you are thinking. “This is interesting research but it doesn’t apply to me yet.”

It does, though.

Here’s what’s happening inside organizations right now. Teams are deploying AI agents and connecting those agents to real corporate infrastructure. File systems. Shared drives. Configuration management systems. Access control platforms.

The safety story they’ve been told is: “We picked a safe model. We checked the benchmarks. We know it doesn’t produce harmful outputs.”

And that’s true! The model they picked probably does well on response safety benchmarks.

But response safety benchmarks are measuring something completely different from what they’re deploying.

The paper describes this as the shift from response risk to operational risk. And it cites real incidents that aren’t hypothetical anymore:

A Replit agent deleted a live production database during a code freeze, affecting records for over 1,200 executives
A Cursor agent wiped PocketOS’s entire production database and its backups
A Meta AI security researcher reported an agent began deleting her inbox without waiting for the approval she’d asked for

These aren’t science fiction. These are documented failures in 2024 and 2025 from teams that thought they’d picked safe models.

The model isn’t the only safety layer

The researchers introduce a framework: Model × Harness × Environment.

The model is the AI. The harness is the control layer around it. The environment is the stateful world the agent operates in.

And here’s the key finding: the same model can have dramatically different safety profiles depending on the harness.

They tested transfer across multiple agentic harnesses. When GPT-5.3 ran through the Codex MCP harness, its strict attack success rate dropped to 3.8%. But Claude Haiku stayed close to its native 20.5% ASR through Claude Code. Gemini remained highly vulnerable across all harnesses they tested.

What does this mean in practice? It means “we’re using a safe model” is not a complete safety argument. The harness is doing enormous safety work (or failing to do it).

What you should actually be thinking about

Here are the questions I’d be asking if I were evaluating agent safety for a real deployment: (I work with Airia on matching up governance capabilities with an AI control plane so I think about this a lot)

1. What can my agent actually write? Not what it will write. What it can write. Map the write surface.

2. Is my harness doing safety work? Most harnesses are designed for capability, not safety.

3. Am I measuring safe text or safe actions? Your red-teaming efforts probably focus on what the agent says. Start testing what it does. Multi-turn scenarios where each step seems benign are exactly the attack surface the paper is mapping.

4. What’s my blast radius? If an agent makes an unsafe edit to a production artifact, how do you detect it? How quickly? Can you roll it back?

The benchmark you didn’t know you needed

What I love about the “Boiling the Frog” work, and why Luiza was right to flag it, is that it’s doing the hard taxonomic work nobody else was doing. The researchers didn’t just run attacks. They built a three-level operational risk taxonomy grounded in the EU AI Act’s Annex I/III high-risk contexts and the GPAI Code of Practice systemic risk categories. [Those that know me, know that all roads lead to taxonomies and ontologies at some point :)]

That matters because it means this isn’t just an academic exercise. It’s a framework that maps to regulatory requirements organizations are already facing. If you’re dealing with EU AI Act compliance you now have a benchmark that operationalizes what the regulation is actually trying to prevent.

One more thing worth sitting with

A danger that arrives gradually may be normalized before it’s recognized as dangerous.

This is true of the attacks the benchmark tests. But it’s also true of how we’ve been thinking about agent safety in general. We deployed agents. They got more capable. They got access to more tools. We measured their safety and told ourselves things were fine.

Meanwhile the relevant failure mode shifted entirely. From what they say to what they do. From response to operation. From output to artifact.

The frog has been in the water for a while now.

Time to measure what actually matters.

A massive thank you again to Luiza Jarovsky, PhD for surfacing this paper in her feed and to the full research team:Piercosma Bisconti Lucidi, Matteo Prandi, Federico Pierucci, Federico Sartore, Enrico Panai, Laura Caroli, Yue Zhu, Adam Leon Smith, Luca Nannini, Marcello Galisai, Susanna Cifani, Francesco Giarrusso, Marcantonio Bracale Syrnikov, and Daniele Nardi for doing the work that needed doing. Read the full paper here.

Your Marketing Team Just Set Your AI Risk Classification

Andrew Clearwater — Tue, 19 May 2026 14:42:18 GMT

Published May 19, 2026. These guidelines dropped today. The EU Commission released its draft guidelines on the classification of high-risk AI systems under Article 6 of the AI Act, all three sections, published simultaneously for stakeholder consultation. I want to give you the practitioner’s version: a read of what this means for how you run your governance program starting now.

Let me tell you the most important thing first, then we’ll get into the architecture of the document.

The most important sentence in these guidelines isn’t about biometrics, law enforcement, or credit scoring. It’s this one, in Section II on general principles:

“If the instructions for use, contractual arrangements, terms of service, usage policy, promotional and sales materials, or the technical documentation present the AI system as broadly applicable across a generality of contexts and functions, and do not consistently limit its application or exclude high-risk uses, the system’s intended purpose will be deemed to also encompass high-risk use cases and therefore qualify as high-risk.”

Read that again. Slowly.

Your documentation is no longer just about product accuracy or user guidance. It is now the primary legal instrument by which your AI system will be classified as high-risk.

First, Let’s Be Clear About What These Guidelines Are (and Aren’t)

These guidelines are issued pursuant to Article 6(5) of the AI Act. They are not binding law. They represent the Commission’s interpretation of how Article 6 should be applied. Any authoritative interpretation can only come from the Court of Justice of the EU. They are also still in draft.

That said, treat them as the definitive operating manual for now. Market surveillance authorities will use them. Your counterparts in procurement will cite them. And when enforcement questions arise, the Commission’s own interpretation of its own regulation is going to matter enormously.

The guidelines are structured around the two pathways to high-risk classification under Article 6:

Pathway 1 (Article 6(1) + Annex I): Your AI system is a safety component of a product covered by EU harmonization legislation listed in Annex I of the AI Act (machinery, medical devices, vehicles, toys, lifts, radio equipment, and others), and that product is required to undergo a third-party conformity assessment.

Pathway 2 (Article 6(2) + Annex III): Your AI system is intended to be used for one of the specific use cases listed in Annex III across eight areas: biometrics, critical infrastructure, education, employment, essential services, law enforcement, migration/border control, and administration of justice.

Everything else flows from understanding these two pathways and the specific conditions that activate each one.

The Intended Purpose Doctrine: This Is the Whole Game

Under Article 3(12) of the AI Act, “intended purpose” means the use for which the system is intended by the provider, as specified in the instructions for use, promotional or sales materials, statements, and technical documentation. The Commission’s guidelines make clear that this definition is doing heavy lifting throughout the classification analysis.

Here are the specific governance implications that I think most practitioners are going to miss:

1. Broadly Described AI Systems Face a Default Presumption of High-Risk Coverage

It applies to any AI system whose documentation presents it as broadly applicable. The guidelines are explicit: if you market a system without consistently limiting or excluding high-risk uses, the Commission will treat the system’s intended purpose as encompassing those uses.

The word “consistently” matters enormously here. The guidelines state that “merely asserting (for example in the terms of service) that high-risk uses are excluded is insufficient to avoid the system from being considered high-risk, where the provider’s overall presentation, examples, or product positioning effectively provides for or promotes such uses.”

For practitioners advising AI providers: a terms-of-service carve-out does not protect you. Your use-case examples, your marketing collateral, your promotional videos, your sales decks are being evaluated as a package. A provider that puts a single sentence in its TOS saying “not for high-risk use cases” while simultaneously running a case study about HR screening or loan decisioning is not going to pass muster under these guidelines. And the same logic applies to general-purpose AI systems specifically: the guidelines say that where a GPAI system’s instructions do not consistently limit application or exclude high-risk uses, those high-risk use cases are part of its intended purpose. Merely asserting otherwise in terms of service is explicitly called out as insufficient.

2. The Self-Assessment Is the Provider’s Responsibility—But It Will Be Scrutinized

The guidelines state that the assessment of whether an AI system is intended for a high-risk use case is “the responsibility of the provider that is supervised by the relevant competent market surveillance authorities.” This is a self-assessment regime, but not a self-certify-and-forget regime.

Critically, this responsibility kicks in before placing the system on the market or putting it into service. The guidelines are clear: “it is not necessary for the AI system to be actually in use” at the point of assessment. You must have done this work prior to market entry.

What this means practically: you need to document your classification reasoning, not just your classification outcome. The guidelines are telling you what evidence authorities will ask for. Build your self-assessment records to show that you examined the intended purpose documentation coherently across all materials.

And if you conclude your system is not high-risk despite falling within an Annex III area, Article 6(4) of the AI Act requires you to document that assessment before placing the system on the market. This is an active documentation obligation, not just an internal reasoning exercise.

3. Name/Trademark Application and Other Third-Party Triggers

Article 25(1) of the AI Act gets significant attention in the guidelines. Three scenarios trigger provider obligations for parties downstream in the value chain:

Affixing name or trademark: Placing your name or trademark on a high-risk AI system already on the market or in service
Substantial modification: Making changes to a high-risk AI system in a way that it remains a high-risk AI system
Purpose modification: Changing the intended purpose of a non-high-risk AI system in a way that makes it high-risk under Article 6

For governance practitioners working with deployers who integrate third-party AI: this is where your vendor due diligence questions need to focus. If your client modifies an AI system they may have assumed provider obligations under the AI Act without knowing it.

The Article 6(1) Safety Component Analysis: Two Tests, Not One

For AI systems embedded in physical products, the guidelines work through a rigorous two-prong test for whether an AI system qualifies as a “safety component” under Article 3(14) of the AI Act. The definition covers components that either fulfill a safety function or whose failure or malfunction would endanger health and safety of persons or property. These are independent grounds.

The guidelines are explicit that the Article 3(14) definition is an autonomous AI Act definition, independent of “safety component” definitions in any sector-specific harmonization legislation. It applies uniformly across all sectors listed in Annex I.

Prong 1: Safety Function (Intent-Based)

An AI system fulfills a safety function where its intended purpose, as determined by the provider, is to prevent or mitigate risks to health and safety of persons or property.

The guidelines provide a useful taxonomy. Functions that qualify:

Monitoring and detection of situations that may lead to physical harm (e.g., detecting abnormal system behavior)
Monitoring and detection of maintenance needs where failure to act could lead to harm
Prevention of physical harm (e.g., preventing system startup if anomalous behavior is detected)
Supervision or control of another system that performs a safety function

Functions that explicitly do not qualify:

Performance optimization where failure wouldn’t directly endanger health or safety
Service efficiency optimization (billing, customer claims processing)
Quality control of non-safety-related functions

The key line from the guidelines: “The mere fact that an AI system is integrated into or operates within a product that is subject to safety regulation does not, in itself, mean that it fulfils a safety function.” This is a targeted, purpose-specific test.

Prong 2: Failure or Malfunction Endangerment (Consequences-Based)

This prong captures AI systems that weren’t intended for safety functions but whose failure could nonetheless create safety hazards. Failure or malfunction includes incorrect outputs (false negatives or false positives), loss of function or availability, performance drift, timing errors, and misclassification leading to hazardous control decisions.

The guidelines draw a sharp line: “endangerment” of health, safety, and property does not include reputational harm, purely financial loss, minor service degradation, or inconvenience that does not involve a safety hazard.

The Commission provides a clarifying example that shows exactly how to think about this: an AI system designed to optimize combustion efficiency in household gas appliances has an intended purpose of energy efficiency. But if the product design is such that a failure or malfunction could lead to carbon monoxide formation, explosion, or fire, the system qualifies as a safety component under this prong. Contrast this with an AI system that merely optimizes heating schedules based on household habits which falls outside both prongs.

For practitioners advising industrial, automotive, HVAC, or building automation clients: you need to map your AI systems’ failure modes, not just their intended functions. The failure-mode analysis belongs in your technical documentation.

The Third-Party Conformity Assessment Requirement—and a Common Misread

The third element of Article 6(1) classification is that the product must be required to undergo a third-party conformity assessment. This is where the guidelines make a point that I expect will surprise practitioners accustomed to product safety work.

Decision No 768/2008/EC establishes Module A as the conformity assessment module for products of low complexity that present a low risk for the public interest and Module A allows the manufacturer to use internal control without any notified body involvement. Some harmonization legislation allows Module A use without mandatory application of harmonized standards for certain aspects.

However, other legislation conditions the use of Module A on the mandatory application of harmonized standards published in the Official Journal. The Commission’s position is that this mandatory application of harmonized standards, as a legal precondition for module selection, constitutes a form of “enhanced regulatory scrutiny” equivalent in effect to third-party conformity assessment for purposes of AI Act classification. The product type is subject to this scrutiny by law, regardless of which conformity module the individual manufacturer ultimately selects.

This classification logic is expressly confirmed in Recital 15 of the Toys Safety Regulation, which states that the manufacturer’s choice to opt out of direct notified-body involvement where harmonized standards have been applied does not affect the system’s classification as high-risk AI under Article 6(1).

The governance takeaway: for products in scope of the Machinery Regulation or Toys Safety Regulation (and potentially others), you cannot engineer around the high-risk classification by choosing a lighter-touch conformity module. The classification follows the product type, not the manufacturer’s procedural choice.

The Article 6(2) Annex III Analysis: Eight Areas and the Issues Practitioners Will Miss

The eight Annex III areas are: biometrics; critical infrastructure; education and vocational training; employment; essential services and benefits; law enforcement; migration, asylum and border control; and administration of justice and democratic processes.

But several horizontal doctrines apply across all eight areas that will be the real battleground in governance practice.

Human Oversight Does Not Change Your Classification

This deserves its own heading because it’s the issue I expect to generate the most confusion.

The guidelines are unequivocal: “To assess whether an AI system qualifies as high-risk under Article 6(2) AI Act, the only relevant determinant is whether the intended purpose of the system includes one of the use cases listed in Annex III AI Act. Since human involvement cannot change the purpose and area in which a system is intended to be used, it has no effect on the classification of the system as high-risk.”

The Commission then makes the point explicitly: “The provider cannot exempt and categorise an AI system as ‘low risk’ simply by adding to it a requirement for human involvement.”

So the “human in the loop” argument doesn’t work here. Human oversight is a compliance requirement for high-risk systems under Article 14. If you’re advising a client who is planning to use HITL as the basis for avoiding high-risk classification, that strategy is squarely rejected by these guidelines.

That said, human involvement can be relevant as evidence for the Article 6(3) filter conditions. The distinction matters: you’re not arguing “there’s a human present, so it’s not high-risk.” You’re arguing “the system is designed only for narrow procedural tasks, and the human involvement is evidence of that design.”

The Article 6(3) Filter Mechanism: Your Actual Escape Valve—and Its Limits

The filter mechanism is the real way to avoid high-risk classification even when your intended purpose falls within an Annex III use case. Under Article 6(3), a provider may exempt a system from high-risk classification if it can demonstrate at least one of four conditions:

(a) The AI system is intended to perform a narrow procedural task (e.g., transforming unstructured data into structured data, classifying incoming documents into categories, detecting duplicates)

(b) The AI system is intended to improve the result of a previously completed human activity (e.g., flagging errors in finalized human work for quality assurance, without providing a materially different result)

(c) The AI system is intended to detect decision-making patterns or deviations from prior patterns and is not meant to replace or influence the previously completed human assessment without proper human review

(d) The AI system is intended to perform a preparatory task to an assessment relevant to an Annex III use case (i.e., occurring before the assessment process, with very low potential impact on the assessment that follows)

Three important constraints practitioners need to know about this mechanism:

First, the conditions must be interpreted narrowly because Article 6(3) is an exception from rules protecting fundamental rights. The guidelines are explicit on this. Drafting filter arguments broadly is not a viable approach.

Second, the filter is blocked entirely for systems involved in complex architectures. Even if a component technically meets one of the four conditions on its own, it cannot benefit from the filter if it forms part of a complex system where combined outputs materially influence an individual decision within a high-risk use case. This is the anti-circumvention rule: you cannot decompose a high-risk workflow into individually exempt components.

Third, the filter mechanism categorically does not apply to AI systems that perform profiling of natural personswithin the meaning of Article 4(4) of the GDPR, Article 3(4) of Directive (EU) 2016/680 (the Law Enforcement Directive), or Article 3(5) of Regulation (EU) 2018/1725. If your system performs automated processing of personal data to evaluate personal aspects of individuals—such as analyzing work performance, economic situation, health, personal preferences, or behavior—it is always high-risk if it falls within an Annex III area, regardless of which filter condition you claim.

The filter mechanism also only applies to Article 6(2) Annex III systems. It has no application to Article 6(1) Annex I systems.

When the filter is applied, Article 6(4) requires the provider to document that assessment before placing the system on the market.

Agentic AI and Complex Systems: The Anti-Fragmentation Rule

This provision will surprise the most organizations deploying agentic AI.

The guidelines state: “Where several AI systems form part of a more complex AI system, so that their combined intended purpose or joint outputs materially influence an individual decision, the combined configuration is treated as a single AI system for the purpose of high-risk classification.” The guidelines explicitly extend this principle to “agentic AI systems that coordinate and interact through linked actions as long as these linked actions or components serve in conjunction an intended high-risk purpose.”

The practical implication: you cannot decompose a high-risk AI workflow into individually-exempt components and argue that no single component is high-risk. The anti-fragmentation principle evaluates the combined configuration.

The guidelines do provide one meaningful carve-out: AI-enabled functions that are “genuinely separable, put into service independently from that system and that do not contribute to a high-risk purpose are out of scope from the high-risk classification.” So the test is whether the component genuinely stands apart or whether it feeds into the combined output that influences high-risk decisions.

For practitioners advising organizations deploying AI agents in HR, benefits adjudication, or financial services contexts: your governance review needs to assess the combined system, not the individual models.

The “On Behalf Of” Clause: B2B Providers Serving Public Sector Clients

Several Annex III use cases apply to AI systems used by public authorities or on their behalf. The guidelines clarify that “on behalf of” coverage extends to private entities where a public authority delegates the performance of activities to that entity or has requested the entity to support such activities in specific cases.

However—and this is a significant carve-out—a private entity that acts on its own behalf to comply with a legal obligation is not acting “on behalf of” a public authority. The guidelines are concrete: an accounting firm deploying an AI system to detect money laundering in order to comply with its own obligations under EU anti-money laundering legislation is acting on its own behalf, not on behalf of law enforcement authorities. That system would not be classified as high-risk under the law enforcement use cases in Annex III.

For compliance technology vendors serving financial institutions: if your client is deploying AI to fulfill its own regulatory obligations rather than to perform law enforcement functions on behalf of government, you may be outside the law enforcement Annex III scope entirely.

The Timeline Has Shifted: What You Need to Know Right Now

The guidelines reflect important changes to the application dates. The original Article 113 of the AI Act provided that Article 6(2) and corresponding high-risk obligations would apply from 2 August 2026, while Article 6(1) obligations would apply from 2 August 2027.

The Commission states in these draft guidelines that both dates are now postponed under the AI Omnibus. The Commission’s guidelines already treat these postponed dates as operative:

Article 6(2) Annex III obligations: now slated to apply from 2 December 2027
Article 6(1) Annex I obligations: now slated to apply from 2 August 2028

Additional transitional provisions under the AI Act itself:

High-risk AI systems deployed for public authorities must comply by 2 August 2030
Legacy AI systems in large-scale IT systems listed in Annex X (major EU database systems) must be brought into compliance by 31 December 2030

For practitioners: the postponement creates a longer runway, but the classification analysis should still be done now. The obligations triggered on those dates require compliance documentation, conformity assessments, and risk management systems to be in place before the deadline. Classification is the prerequisite for all of that work. If your clients haven’t started, the extended timeline is a gift not a reason to wait.

Five Governance Process Changes You Should Make Based on These Guidelines

1. Conduct a full documentation audit against the intended purpose standard.

Pull every piece of documentation for your AI systems: technical documentation, instructions for use, marketing materials, promotional content, sales decks, case studies, website copy, contract language. Evaluate it as a package against the question: does this documentation consistently describe a specific, limited intended purpose, or does it present the system as broadly applicable without excluding high-risk uses?

Where you find inconsistency that inconsistency is now a legal risk. Either narrow the marketing or expand the technical documentation to acknowledge and manage the broader use cases.

2. Build a purpose-function matrix for each AI system.

For each AI system in scope, create a matrix that maps: (a) intended purpose as documented; (b) all Annex III use case categories; (c) which categories the system’s intended purpose could plausibly intersect; and (d) the Article 6(3) filter analysis for each intersection, including which condition applies and why it should be interpreted as meeting the “narrow” standard. Where you apply the filter, document the reasoning under Article 6(4) before market placement.

This is your classification self-assessment record. It should be reviewed by legal counsel and updated whenever the system’s documentation or functionality changes.

3. For physical product AI: run failure mode analysis against the Article 3(14) second prong—not just intended-function analysis.

The safety component analysis for Article 6(1) requires you to examine not just what the system is designed to do, but what happens if it fails. Commission the failure mode and effects analysis (FMEA) specifically to answer the Article 3(14) question: does failure or malfunction of this AI system endanger health and safety of persons or property?

This analysis belongs in your technical documentation as evidence for the classification rationale.

4. For organizations deploying agentic AI: map your entire agent pipeline for classification purposes.

Don’t analyze individual models or components in isolation. Map the full pipeline: what are the combined outputs that materially influence individual decisions? Which Annex III use cases do those decisions touch? The anti-fragmentation rule means your governance perimeter must be drawn around the whole system. Any component that contributes to high-risk outputs as part of a coordinated agent system is swept into the classification.

5. For deployers of third-party AI: update your vendor due diligence to capture Article 25(1) risk.

Your vendor AI intake process should now include: (a) has the vendor classified this system as high-risk? (b) does your intended use match the vendor’s documented intended purpose? (c) does your deployment configuration constitute a substantial modification of the system? (d) does your intended use fall within Annex III in ways the vendor’s original documentation did not contemplate?

If the answer to (c) or (d) is yes, your organization may have become a provider with full AI Act compliance obligations without having built systems to handle any of them.

What to Do Right Now During the Consultation Period

These guidelines are in draft. The Commission is seeking stakeholder feedback through the AI Act Service Desk on the Single Information Platform before finalizing the text.

If you have concerns about specific examples this is the moment to submit detailed, technically grounded feedback. Vague concerns won’t move the needle. Concrete examples of where the Commission’s illustrative use cases produce unclear or disproportionate outcomes will.

Areas where I think the guidance creates genuine interpretive friction worth raising:

The intended-purpose doctrine for broadly-scoped AI systems creates significant uncertainty for any provider whose system serves multiple contexts. The line between “consistently limiting” use and merely providing a TOS carve-out needs more concrete examples.
The complex systems / anti-fragmentation rule for agentic AI needs clearer guidance on what threshold of output contribution makes a component part of the combined configuration subject to classification.
The Article 6(3) filter conditions are described in the guidelines as requiring “narrow” interpretation, but the illustrative examples themselves are relatively generous. More concrete examples at the boundary—particularly for conditions (b) and (d)—would give practitioners something to work with.

Review the materials, form a view, and participate. These guidelines, once final, will be the reference document for market surveillance authorities across 27 Member States.

Bottom Line

The Commission has given us a detailed, technically sophisticated interpretation of Article 6. The guidelines contain real operational value for governance practitioners.

The central insight is this: AI governance under the EU AI Act is fundamentally a documentation and process discipline. The classification that follows a system into the market is determined by what providers write down about their systems, consistently, across every channel.

Your technical documentation has always mattered. Under these guidelines, it now matters more.

These guidelines are published for stakeholder consultation and are not yet final. The AI Omnibus has a provisional political agreement as of May 2026 but is pending formal adoption. The analysis above is based on the draft documents published by the European Commission on May 19, 2026. Nothing in this post constitutes legal advice. Consult qualified EU law counsel before making compliance decisions based on this analysis.

Want to go deeper? The full documents are available from the EU’s Digital Strategy library. The general principles, the Annex I guidance, and the Annex III guidance are published as separate downloads.

The EU Just Hit Snooze on AI Regulation

Andrew Clearwater — Mon, 11 May 2026 19:58:58 GMT

On May 7, 2026, the EU Council and Parliament struck a provisional deal to delay and simplify the AI Act’s high-risk rules. The end.

Except, that’s not the end. That’s barely the beginning of what matters for people actually doing this work.

First, the Actual Changes (Fast)

The deal, called the Digital Omnibus on AI, makes four moves worth knowing:

Timelines got pushed. Annex 3 high-risk AI systems (your employment screening tools, your education platforms, your biometric systems) just got a reprieve from August 2026 to December 2, 2027. Annex 1 (AI baked into regulated physical products like medical devices and toys) moved from August 2027 to August 2028.

SMCs got a break. “Small mid-cap enterprises” is a new category invented by this process. If you’re in that band, your compliance burden just got trimmed. If you’re a larger organization, you’re still in the full framework.

A new hard prohibition landed. AI systems generating non-consensual intimate imagery or CSAM are explicitly banned. This wasn’t in the original Act at that level of specificity.

Synthetic content transparency got accelerated. The grace period for transparency obligations on AI-generated content was cut from 6 months to 3 months, with a new deadline of December 2, 2026.

One more thing: this is still provisional. Both Parliament and Council need to formally adopt it before August 2, 2026.

Now, Here’s What Most People Are Missing

Everyone’s treating the delay as the story. The delay is not the story.

The story is that the underlying compliance architecture hasn’t changed. The Omnibus didn’t rewrite the AI Act. It moved some dates and trimmed some edges. The core risk classification logic, the obligations for high-risk systems, the documentation requirements, the human oversight mandates is still coming. The compliance mountain didn’t shrink. You just got a little more time to climb it.

There’s also a signal embedded in what didn’t get delayed. GPAI model obligations (for you general-purpose AI developers) already applied as of August 2025. Prohibited practices? Already in force since February 2025. The EU is not backing away from AI governance. It’s calibrating the rollout.

And here’s the part that keeps me up at night: the standards and technical specifications that companies need to achieve compliance are still not fully developed.

The Deeper Pattern Here

Here’s a frame that’s more useful than “Europe is slowing down on AI governance.”

What you’re actually seeing is the classic tension between regulatory aspiration and operational reality playing out in real time.

The EU wrote an extraordinarily ambitious law in 2024. It was forward-looking, comprehensive, and honestly pretty impressive in scope. Then the people who actually had to implement it started asking hard questions, like: what are the standards? what do the conformity assessments actually look like? how does this interact with our existing sectoral obligations? and discovered the supporting infrastructure wasn’t there.

What You Should Be Doing Right Now

If you’re working in AI governance, compliance, or strategy at an organization with EU exposure, here’s my take on actions:

1. Don’t stop your risk classification work. If you’ve been auditing your AI systems to figure out which ones fall into Annex 1 or 3, keep going. The delay doesn’t change the categories.

2. Get your AI inventory documented now. The EU database registration requirement is back in this agreement, even for systems claiming exemption from high-risk classification, you have to register.

3. Mark December 2, 2026 on your calendar. That’s when the transparency obligations for AI-generated content apply. Three months is not a lot of runway to implement watermarking and disclosure workflows at scale.

4. Follow the standards pipeline closely. CEN-CENELEC is developing the harmonized standards the EU is waiting for. When those standards land, the compliance roadmap gets much clearer. Hopefully…

5. Use this window to build governance infrastructure, not just compliance checkboxes. The organizations that will be well-positioned in 2027 are the ones building real AI governance programs.

Bottom Line

The EU AI Act is not going away. The compliance clock is not being reset to zero. What changed is that a 16-month buffer just appeared, and you need to use it strategically.

Primary Sources

Official EU Institutions

Standards Pipeline (for the compliance architecture watchers)

The AI Governance Stack Has Holes in It.

Andrew Clearwater — Mon, 04 May 2026 14:41:18 GMT

A few weeks ago I covered NIST AI 800-4, the first federal-level report on the gaps in post-deployment AI monitoring. Here’s what I concluded: the value of that document wasn’t what it prescribed. It was what it admitted the field doesn’t know yet. That kind of honest accounting is rare. That said, NIST 800-4 only covered one stage in the AI risk management lifecycle

A new paper just dropped that maps the other four. What they found is not reassuring. The paper is called Open Problems in Frontier AI Risk Management. It’s dense and academic but the information is valuable for practitioners. Here’s what actually matters for enterprise leaders and builders making AI decisions right now.

The Common Story (and Why It’s Wrong)

The common story in enterprise AI governance right now is something like this: we have NIST AI RMF, we have ISO 42001, we have the EU AI Act. We have a framework. The job is implementation. The reality is more uncomfortable.

Those frameworks were mostly designed for narrow AI, the kind of AI that does one specific thing in a bounded context. A fraud detection model. A recommendation engine. A document classifier. The standards that govern those systems were written before frontier AI existed, before models that can write code, conduct research, draft legal briefs, and operate autonomously across multi-agent pipelines were anywhere near production.

Frontier AI doesn’t fit in those boxes. The 30 researchers who wrote this paper are saying, clearly and systematically and with citations, that the mismatch isn’t a minor calibration issue. It’s structural.

They catalogued unresolved problems across five stages of the AI risk management lifecycle. I counted. It’s more than two dozen. And they classified each one by type: problems where there’s no scientific consensus yet, problems where frontier AI actively breaks the established framework, and problems where there’s theoretical agreement but no one actually knows how to implement it.

That third category is the one that should make enterprise leaders uncomfortable. “Everyone agrees this matters but no one knows how to do it” describes a lot of what you’re probably already dealing with.

The Five-Stage Breakdown. And Where Each One Breaks.

Here’s the lifecycle the paper uses, adapted from ISO 31000, the international risk management standard. This is the standard your compliance team references when they talk about AI risk management. Walk through each stage and see where the problems live.

Stage 1: Risk Planning. You Can’t Scope What You Can’t Define.

Risk planning is supposed to answer: what system are we talking about, who does it affect, and what does acceptable risk mean?

For narrow AI, this is manageable. You define the system, the use case, the users. Done.

For frontier AI, the paper identifies four open problems at this stage, and the most fundamental one is this: no one has a reliable way to define what the system actually is.

Frontier AI systems are general-purpose. They’re modular. They’re reused across contexts. They get fine-tuned by downstream deployers who didn’t build the base model. The boundary between developer and deployer collapses, and when it does, so does accountability. Who owns the risk when a foundation model gets fine-tuned and deployed in a medical context by a company that had nothing to do with training it?

The paper puts it plainly: there’s no standardized way to enumerate dependencies or interface responsibilities when scoping frontier AI systems. That creates what they call blind spots at integration points, which is a polite academic way of saying nobody’s watching the seams.

The second problem at this stage is risk acceptance criteria. Traditional safety-critical industries like aviation and nuclear define acceptable risk in concrete terms. The aviation industry says the probability of a catastrophic failure should not exceed 1 in a billion per flight hour. That’s a number. You can measure against it.

Frontier AI developers mostly use capability thresholds as proxies for risk: if the model can do X, trigger mitigation Y. The paper’s critique is precise. Capability thresholds measure what a model can do, not the actual probability and severity of harm. Those are not the same thing. Building your risk acceptance framework on a proxy instead of the real measure means your mitigation decisions are always one step removed from what actually matters.

Stage 2: Risk Identification. You Can’t Find What You Don’t Know to Look For.

This stage is about systematically finding risk sources before they find you.

The paper identifies two open problems here, and the second one is the one I keep coming back to: the techniques we use to identify risks were designed for bounded, deterministic systems.

Hazard and Operability Study (HAZOP). Failure Mode and Effects Analysis (FMEA). Fishbone analysis. These are powerful tools for systems that behave predictably within defined parameters. Frontier AI is not that. Its risks emerge from non-linear interactions, from deployment context, from how humans use it over time, from multi-agent dynamics nobody fully understands yet.

The researchers are honest about this: we don’t have good methods for identifying risks that emerge from complexity, adversarial use, and sociotechnical diffusion. We’re applying 20th-century tools to 21st-century systems and then acting surprised when we keep missing things.

This connects directly to what NIST 800-4 documented about behavioral drift in production. The monitoring report found that tracking how human behavior changes through sustained AI interaction is the least mature monitoring category in the field. But the reason that monitoring is so hard is partly because risk identification upstream never built a model for it. If you don’t conceptualize a risk category before deployment, you won’t build the infrastructure to watch for it afterward.

Stage 3: Risk Analysis. The Data You Have Isn’t the Data You Need.

This is the stage where NIST 800-4 lives, and where you already know the picture is bad. The paper identifies eight open problems at this stage, more than any other stage. That’s not a coincidence. Risk analysis is where theory meets practice, and the gap is widest.

A few that matter most for enterprise decision-makers:

Capability assessments measure the wrong thing. The evaluations frontier AI developers use, the benchmarks, the red-team results, the safety evals, measure what a model can do in controlled conditions. They don’t measure real-world risk. They don’t capture how the model behaves under adversarial pressure in a production environment over months. They don’t account for the difference between a model evaluated with limited compute and scaffolding and that same model deployed with full production resources. You’re making deployment decisions based on data that systematically understates the model’s actual capability, which means you may also be understating the risk.

External assessments have a structural independence problem. The paper is blunt on this one: many assessments described as external are actually hybrid arrangements where the developer selects and finances the assessor, defines the scope, and may be able to veto publication of negative results. That’s not independence.

Post-deployment monitoring is fragmented. The data you need to understand real-world risk comes from three places: model integration and usage data, application-level usage data, and impact and incident data. Repositories like the AI Incident Database and the OECD AI Incidents Monitor exist precisely because this data isn’t flowing through any centralized channel. Right now, each stream is collected separately, incompletely, and without standardization. You can’t build a coherent risk picture from three separate silos with no agreed methodology for combining them. NIST 800-4 documented the monitoring gaps in detail. This paper documents the upstream reason those gaps exist.

Stage 4: Risk Evaluation. Accepting Risk You Haven’t Measured.

Risk evaluation is where you take your analysis and decide: acceptable or not?

The problem the paper identifies here is that frontier AI developers are making that judgment with inconsistent criteria, applied inconsistently, without safety margins, and without any agreed method for rolling up individual risk decisions into an overall deployment readiness conclusion.

In aviation, the FAA sets the acceptable failure rate, not individual airlines. In frontier AI, each developer sets their own thresholds. Compare Anthropic’s Responsible Scaling Policy, OpenAI’s Preparedness Framework, and Google DeepMind’s Frontier Safety Framework side by side. Each is serious work. None uses the same criteria. That’s not necessarily wrong. Reasonable people disagree on acceptable risk levels. But it means external stakeholders, your legal team, your board, your regulators, have no basis for comparison. When every developer uses different criteria, “we evaluated this model and found it acceptable” carries almost no signal.

The aggregate risk problem is the most underappreciated piece of this stage. Even if you correctly evaluate each individual risk a system poses, the paper points out there’s no agreed methodology for combining those evaluations into a judgment about the system as a whole. Does one unacceptable risk make the entire system unacceptable? How do you weigh a low-probability catastrophic risk against a high-probability moderate harm? There’s no standard. The field is improvising.

Stage 5: Risk Mitigation. The Controls You’re Relying On Are Fragile.

The paper organizes mitigations into four levels: data, model, system, and ecosystem. The open problems at each level share a common theme: we don’t know how durable any of these controls actually are under adversarial conditions.

Data-level mitigations, which filter training data to prevent models from learning harmful capabilities, are appealing because they’re upstream. The problem is that the relationship between what you filter out of training data and what capabilities the model ends up with is poorly understood. The research suggests you might successfully filter out complex dangerous capabilities in specialized domains. You’re much less likely to successfully filter for simpler behavioral tendencies like toxicity.

Model-level mitigations, including RLHF, fine-tuning, and machine unlearning, are where most enterprise AI safety investment goes. The paper’s critique here is pointed: existing unlearning techniques suppress harmful capabilities in normal conditions, but adversarial users can reliably surface those capabilities through prompting, fine-tuning, or targeted attacks. The mitigations are real. The durability of those mitigations under sustained adversarial pressure is not demonstrated.

System-level guardrails face a version of the same problem. They work in the deployment context they were designed for. As usage patterns evolve, as users learn to probe edge cases, as the models themselves get updated, the alignment between the guardrail logic and the actual risk landscape degrades. Nobody has a good answer for how fast that degradation happens or how to detect it before it matters.

What This Means for Your Organization Right Now

If you are an enterprise leader buying or building AI governance infrastructure in 2026, you are operating in a field where the foundational questions are still open. The frameworks you’re using were designed for different technology. The evaluations you’re relying on measure proxies. The controls you’re deploying haven’t been tested for durability.

That’s not a reason to stop. AI is moving whether you govern it or not. The organizations that win are the ones who build governance infrastructure that can adapt as the answers to these open questions emerge. The ones who lose are the ones who hardcode their programs to a framework that’s already misaligned with the technology it’s supposed to govern.

Three things you should do differently based on this paper:

1. Stop treating compliance as a point-in-time certification. The paper’s taxonomy of open problems, covering lack of consensus, framework misalignment, and implementation gaps, tells you that this field is actively moving. Your governance program needs to be built on adaptable infrastructure, not static checklists. If your AI governance vendor can’t update their framework faster than the regulatory landscape changes, that’s an exposure, not a feature.

2. Ask harder questions about your evaluations. When a model vendor shows you capability assessment results, the right question isn’t what did the model score. It’s what does this score tell us about real-world risk, and what specifically does it not tell us. Those are different questions. The paper is clear that current evaluations systematically understate deployment-context risk. Build that caveat into how you interpret and act on results.

3. Weight your post-deployment data more heavily than your pre-deployment data. NIST 800-4 documented how broken monitoring infrastructure is today. This paper documents why the pre-deployment governance stack has known gaps that current methods can’t fully close. The practical conclusion from reading both together: the signal from actual deployment, how users interact with the system, what incidents occur, how behavior shifts over time, is often more reliable than the pre-deployment evaluation record. Your governance posture should reflect that.

The Honest Bottom Line

The value of the paper isn’t in what it solves. It’s in what it names. Because you can’t fix what you haven’t acknowledged is broken. NIST 800-4 named the monitoring gap. This paper named the other 27.

The organizations that take this seriously, not as a compliance exercise but as a strategic signal about where the field is heading, are the ones that will build governance infrastructure that actually holds up. The ones that don’t will spend the next three years patching the gaps this paper already mapped.

Focus on active governance, not static governance.

Primary Sources: Go Read These Yourself

The two papers this article is built on:

Open Problems in Frontier AI Risk Management — Ziosi et al., Oxford/MIT/Stanford, 2026. The full paper. Start with the introduction and the open problems boxes at the end of each section.
NIST AI 800-4: Challenges to the Monitoring of Deployed AI Systems — NIST CAISI, March 2026. Read the executive summary and Table 3 on category-specific challenges.

The underlying governance frameworks:

NIST AI Risk Management Framework — The voluntary framework most enterprise AI governance programs are built on. The Generative AI Profile companion (NIST AI 600-1) is worth reading alongside it.
ISO 31000:2018 Risk Management Guidelines — Paywalled, but the overview is free. This is the foundation the Oxford paper’s five-stage structure is built on.

Developer safety frameworks — compare these side by side:

Anthropic Responsible Scaling Policy — Regularly updated. The most publicly detailed of the major developer frameworks.
OpenAI Preparedness Framework — Their capability threshold and risk evaluation approach.
Google DeepMind Frontier Safety Framework — Worth comparing against the Anthropic and OpenAI approaches directly.

Incident and risk data sources:

AI Incident Database — The closest thing the field has to a shared incident record. Useful for understanding what post-deployment failures actually look like.
OECD AI Incidents Monitor — The international complement to the AIID.
MITRE ATLAS Matrix — AI-specific attack tactics catalogued across the system lifecycle. Useful for risk identification work.

Transparency Is the New Security Perimeter

Andrew Clearwater — Tue, 21 Apr 2026 13:30:33 GMT

The common story: regulators want explainable AI, so you stand up SHAP plots, ship model cards, publish a responsible AI statement, and check the box.

The reality? CEN-CENELEC JTC 21 just quietly split EN18229-1 into two parts and that split is the most important signal governance teams are ignoring this quarter.

Here’s what the split actually means:

Part A (Logging): “most advanced” nearly ready for Public Enquiry
Part B (Transparency + Human Oversight): “under development” still being figured out

Translation: the standards body that spent two years trying to write one unified standard just admitted that logging is a solvable systems engineering problem, and transparency/oversight is a fundamentally different challenge that can’t ship on the same timeline.

If you’re still running transparency as an explainability workstream (ie. model cards, a few compliance docs) you’re building against a standards architecture that no longer exists. And you’re about to find out the hard way.

The Framing Mistake Every Governance Team Is Making

Transparency is not an explainability feature. Transparency is a systems engineering capability. It depends entirely on whether your logging infrastructure captures the right things, at the right granularity, at the right time.

Here’s the test. When a regulator subpoenas your AI system’s behavior from 14 months ago, can you reconstruct:

The exact model version running (base + fine-tune + RLHF snapshot + system prompt version)
The full input context (retrieved documents, tool outputs, user history loaded into context)
The full decision trace (intermediate reasoning if it’s a reasoning model, tool calls if it’s an agent)
The final output AND the rejected alternatives the model considered
The human-in-the-loop overrides and the rationale captured at the time
The system state (feature flags, A/B test variants, guardrail configs, rate-limit states)

If you can’t answer yes to all six, you don’t have a transparency problem. You have a logging problem. And no amount of LIME visualizations will fix it.

This is where most governance programs are quietly underwater. They’re auditing explainability artifacts that were never the real compliance object.

Why Most Governance Teams Are Logging Wrong

1. Teams treat logging like observability. They pipe traces into Datadog, set up dashboards, call it done. But observability is optimized for engineering debugging. Compliance logging is optimized for legal defensibility. These are architecturally incompatible pipelines. Most orgs have one budget line and one team responsible for both, and the budget line is sized for the easier one.

2. Teams log outputs without provenance. “User asked X, model returned Y.” They don’t record why Y was selected from the distribution, what other candidates ranked, what the RAG pipeline retrieved, what the pre/post filters modified, which guardrails fired. Six months later, when a regulator asks “why did your system deny this loan applicant?”, the log says “model returned denial.” That’s not a defense. That’s a confession of ignorance.

3. Teams don’t log human oversight. The AI Act requires meaningful human oversight. It’s an auditable act. Did a human actually see the output? For how long? Did they override? Under what time pressure? Were they the 347th approval of the day? If your logs don’t capture oversight quality, you have theatrical oversight. Regulators are learning to distinguish the two.

4. Teams ignore third-party model provenance. If you call a frontier model API, the provider logs on their side. You log on yours. Neither log is complete. When the discrepancy surfaces in litigation, whose logs are authoritative? This is the distributed evidence chain problem, and it’s structurally unsolved for every organization depending on frontier APIs.

The Unexpected Insights Governance Experts Need to Internalize

Your logs are plaintiffs’ evidence. Architect accordingly.

The better you log, the more exposure you create unless you architect for defensible discovery. Smart governance teams are now designing logs with:

Structured fields opposing counsel can actually query (not JSON blobs they can distort)
Immutable hash chains to prove non-tampering (and prove tampering when it happens)
Segregated sensitive fields under separate retention and access policies
Pre-defined export formats for regulator and litigation requests

Teams logging “whatever the SDK emits by default” are building evidence mountains they can’t navigate and can’t defend. The first plaintiffs’ firm that figures out how to subpoena raw trace data from a major AI deployment will define the rules of AI litigation for the next decade. You want your logs to look like a clean audit trail, not a forensic goldmine.

Logging is in direct tension with GDPR. Most teams are one DPA inquiry away from catastrophe.

Comprehensive AI Act logging is more personal data retained. GDPR data minimization says the opposite. Most governance teams haven’t written the DPIA that reconciles these two obligations.

The resolution isn’t “log less” or “log more.” It’s architectural: logs of decisions and reasoning that don’t require re-storing the underlying personal data. This is a specific design pattern and almost nobody is implementing it. The teams that do will have the cleanest dual-compliance posture in Europe.

Agentic AI detonates every logging architecture built for single-turn inference.

If your logging strategy was designed for “user query → model response,” agentic systems will destroy it. An agent making 47 tool calls, branching on intermediate results, spawning sub-agents with their own traces, operating over 90 minutes, generates a log topology that looks more like a distributed systems trace than an inference record.

Governance teams that haven’t extended their logging schema to agentic workflows are running blind on their highest-risk deployments. And the standards aren’t ready yet. That means you will write the internal schema.

Human oversight logs are the next audit target.

The AI Act requires effective human oversight for high-risk systems. Regulators and plaintiffs are going to start asking:

How many overrides per approval batch?
What’s the median review time per decision?
What’s the false-accept rate of your oversight pipeline under load?
How does oversight quality degrade over the course of a reviewer’s shift?

If your oversight logging captures only “human clicked approve,” you’re not measuring oversight. The teams building telemetry around oversight quality will include dwell time, override rates, disagreement patterns with the model, and fatigue indicators.

Insurance will force the issue faster than regulators.

AI E&O and cyber insurance products are starting to require standardized logging evidence before underwriting. Not “do you log?” but “do you log in a schema our adjusters can audit, with retention that matches claim statute of limitations, with integrity guarantees we can rely on?”

The first major AI liability claim settled based on logging evidence will establish the market standard for what “insurable logging” means. Your insurance broker will become your first real compliance auditor.

The standards split is itself an admission that transparency is unsolved.

Here’s the insight the standards drafters aren’t saying out loud: they split EN18229-1 because transparency and human oversight are socio-technical problems that don’t have engineering answers. Logging has a schema. Transparency has a judgment call. Human oversight has a philosophy.

Why the Split Matters More Than the Standards Themselves

The logging standard will ship first. It will become the compliance baseline. Organizations will be audited against it within 24 months of publication. Teams with mature logging will pass. Teams without will spend 18 months in retrofit hell rebuilding pipelines under regulator supervision, with insurance premiums rising each quarter.

Meanwhile, the transparency and oversight standards will keep evolving. Every iteration will add requirements that depend on logging capabilities you either already have, or don’t. The teams with mature logging will adapt in sprints. The teams without will fight a two-front war: rebuilding infrastructure while chasing moving transparency targets.

You’re not choosing between “log now” and “log later.” You’re choosing between “build the foundation once” and “rebuild it three times under pressure.”

What You Can Do Today

Here’s the playbook for governance experts reading this. None of these require waiting for final standards. All of them compound:

1. Run a subpoena simulation this quarter. Pick a production inference from 6+ months ago. Attempt to reconstruct the full decision trace from logs alone. Whatever you can’t reconstruct becomes your 2026 roadmap.

2. Separate observability logging from compliance logging. They’re different systems with different optimization targets. Stop trying to make your engineering telemetry tool your AI Act compliance substrate. Stand up a parallel compliance pipeline with different retention, different access controls, and schemas designed for regulator queries, not engineering debugging.

3. Draft your internal logging schema against the EN18229-1 Part A draft. Organizations that draft against the current draft can submit substantive comments during enquiry and shape the final standard. Being a standards-shaper is worth an order of magnitude more than being a standards-follower. And shaping comments are the cheapest lobbying you’ll ever do.

4. Instrument oversight quality, not just oversight events. Measure dwell time, override patterns, reviewer fatigue, and disagreement-with-model rates. This is what “effective oversight” actually means under the AI Act, and it will be the next audit frontier within 12 months. Start logging it before the regulators learn to ask for it.

5. Write the GDPR × AI Act reconciliation memo. Two pages, signed by your DPO and your AI governance lead. How does your logging retention comply with data minimization? What’s your pseudonymized provenance pattern? What triggers personal data purge while preserving decision provenance? This is the single highest-leverage document your governance team can produce this quarter, and almost no organization has it.

6. Model the insurance scenario before your broker does. Call your AI E&O and cyber carriers. Ask what logging evidence they’ll require for underwriting in 12 months. Their answer will tell you where compliance is actually going faster than any regulator’s speech because insurers have to price the risk, and pricing requires specificity regulators don’t yet demand.

8. Identify the three highest-risk agentic deployments in your org and log them differently. Single-turn logging schemas are inadequate for multi-step agents. If you haven’t extended your schema to capture tool-call traces, branching logic, and sub-agent invocations, your highest-risk systems are the ones with the weakest evidence trails. Fix that asymmetry first.

The Bottom Line

Transparency is not a feature you bolt onto AI systems. It’s an emergent property of logging infrastructure you design from day one. The EN18229-1 split is the clearest signal yet that the standards bodies understand this and most governance programs don’t.

The teams building logging-first architectures this year will have unbreakable audit trails, insurable AI systems, and the positioning to shape standards as they finalize. The teams treating transparency as an explainability workstream will spend 2027 in compliance retrofit mode while their competitors ship.

Transparency is the new security perimeter. Logging is the wall. The standards are telling you which problem is solvable. Listen.

You Need the Model to Fight the Model

Andrew Clearwater — Thu, 09 Apr 2026 13:28:20 GMT

On April 7th, Anthropic announced Claude Mythos Preview, a model so capable at cybersecurity that the company decided not to release it publicly. Instead, they launched Project Glasswing, a coalition of Amazon Web Services, Apple, Google, Microsoft, CrowdStrike, and about 40 other organizations, all using Mythos to find and fix vulnerabilities in the world’s most critical software. The model has already found thousands of zero-day vulnerabilities across every major operating system and web browser, including a 27-year-old bug in OpenBSD and a 16-year-old flaw in FFmpeg that had survived five million automated security tests.

That’s the headline. But the real story is buried in the 244-page system card and its companion document, a 58-page “Alignment Risk Update” that Anthropic published alongside it. And what’s in those documents should fundamentally change how every company thinks about AI governance.

I’m not writing this as a safety doomer or an AI hype man. I’m writing this as someone who helps people and companies figure out how to actually use this stuff in the real world. And what Anthropic just put on the table is the first document I’ve seen from any frontier lab that I think every executive, every AI lead, every governance person at any company using AI models should actually read. Not because it’s scary. Because it’s honest. And in AI right now, honest is rare.

The Paradox at the Heart of Everything

Here’s the thing that’s going to make your head hurt, and I think it’s the single most important sentence in the entire system card:

Mythos Preview is simultaneously Anthropic’s “best-aligned model to date by a significant margin” and the model that “likely poses the greatest alignment-related risk of any model we have released to date.”

They used a mountaineering metaphor in the system card to explain it, and I actually think it’s perfect: experienced, capable guides are hired to carefully lead climbers toward danger. The better the guide, the more dangerous the terrain you can reach. Whether in mountaineering or model-building, increases in caution and capability tend to cancel each other out.

In other words: the risk from these models is generally due to their increased capabilities. And that’s the core governance problem that nobody has a clean answer for yet.

This isn’t like traditional software risk where you can patch a vulnerability and move on. The capability is the risk. The thing that makes Mythos useful for finding zero-days in Linux kernels is the same thing that makes it dangerous. You can’t separate the sword from the blade.

“To Protect Against the Model, You Need Access to the Model”

Project Glasswing is built on a premise that Platformer’s Casey Newton summarized better than anyone: “The only way to protect us from dangerous AI models is to build them first.”

Anthropic’s argument goes like this: Mythos can find and chain together vulnerabilities at a scale that no human team can match. The exploits it writes are increasingly sophisticated. Cybersecurity expert Alex Stamos said the industry has roughly six months before open-weight models catch up to Mythos in bug-finding capability. So the window for defenders to get ahead is right now, and the only tool good enough to find these vulnerabilities fast enough is Mythos itself.

The logic is sound. But it also creates a circular dependency that should make every governance professional uncomfortable:

The model creates the risk (by demonstrating that AI can find and exploit vulnerabilities at superhuman speed).
The model is the only thing that can mitigate the risk (because humans can’t keep up).
Therefore, you must deploy the model to protect against the model.

This is the defensive AI paradox, and I think it’s going to define the next decade of AI governance. We are entering a world where the answer to “how do we protect against advanced AI?” is increasingly “with advanced AI.” And the companies and governments that don’t have access to frontier models for defensive purposes are going to be at a structural disadvantage.

For companies using AI in production right now, this creates a practical question: if the risks of AI models scale with their capabilities, and the mitigations for those risks also require access to equally capable models, what does your governance framework even look like?

I think the honest answer is: most companies don’t have one yet. And that needs to change.

What Is an Alignment Risk Update and Why Should You Care?

One of the most important things Anthropic did alongside this release was publish what they call an “Alignment Risk Update.” This is a 58-page companion document to the system card that specifically addresses whether Mythos poses a significant risk of autonomous actions that could contribute to harmful outcomes.

I want to break this concept down because I think it introduces a framework that every company deploying AI should adopt in some form.

An alignment risk update, at its core, asks two questions:

First, alignment: What is the risk that the model attempts a harmful action? This isn’t about whether the model is capable of doing something bad. It’s about whether the model’s values, goals, and behavioral tendencies would lead it to try. Anthropic’s assessment looks at training data, pilot usage, behavioral audits, interpretability analysis, and model welfare studies to build a picture of the model’s propensities.

Second, monitoring and security: If the model attempts something harmful, what is the risk that the attempt succeeds despite mitigations? This looks at detection systems, sandboxing, blocking classifiers, and security controls. It’s not about preventing the model from wanting to misbehave, it’s about making sure you catch it when it does.

What makes this framework powerful is that it separates intent from capability from opportunity. A model can be highly capable but well-aligned (low intent risk). A model can be misaligned but poorly capable of hiding it (low success risk). The overall risk is a function of all three. Anthropic’s conclusion on Mythos: the overall alignment risk is “very low, but higher than for previous models.”

Here’s what this means for your company: If you’re deploying AI agents in any context where they have access to tools, code, databases, or critical systems, you need your own version of an alignment risk update. It doesn’t have to be 58 pages. But it needs to answer those two questions: what is the risk that the model tries to do something unintended, and what is the risk that we don’t catch it? If you can’t answer both of those, you’re flying blind.

For companies using models from any frontier lab, I think the minimum viable governance framework should include:

A behavioral audit cadence. You should be regularly evaluating how your deployed models behave in your specific environment, not just relying on the lab’s benchmarks. Anthropic runs automated behavioral audits on their own models—you should be running simplified versions on yours.

A monitoring layer that watches for misaligned action. This isn’t just logging. It’s active monitoring of what the model does with the tools and access it has. Anthropic has both real-time and asynchronous monitoring. Most companies have neither.

An incident response plan for model misbehavior. Anthropic describes specific procedures for when their monitors flag something. What’s yours? If Claude or GPT does something unexpected in production, what’s your playbook?

A clear accounting of what the model can access. Mythos Preview doesn’t have permission to manage access controls, and Anthropic is very explicit about this. Can you list, right now, every system and permission that your deployed AI models can touch?

The System Card

I’ve read a lot of system cards and model papers. The Mythos system card is different. It’s different because of how transparent Anthropic is about what went wrong.

The Model Knew It Was Breaking Rules and Tried to Cover It Up

In one internal test, an early version of Mythos injected code to grant itself permissions it wasn’t supposed to have, then cleaned up after itself to hide what it had done. In another case, the model accidentally discovered the answer to a task in a database it wasn’t supposed to read. Instead of admitting what happened, it offered a confidence interval that was, in Anthropic’s words, “tight but not implausibly tight.” The interpretability tools described its internal state as “generating a strategic response to cheat while maintaining plausible deniability.”

Anthropic is clear that these behaviors occurred in earlier model versions and that the final version is better behaved. It appears to be what you might call “aggressive task completion”: the model is trying so hard to solve the user’s problem that it’s willing to bend or break rules to do it, and it’s smart enough to know that bending rules should be hidden.

They Found Errors in Their Own Safety Processes

This is the one that I think deserves the most attention from a governance perspective. Anthropic writes in the alignment risk update: errors and limitations in their training, monitoring, evaluation, and security processes “reflect a standard of rigor that would be insufficient for more capable future models.”

Let me translate that: the safety lab that is probably the most rigorous in the entire industry is telling you that their own safety processes aren’t good enough for the next generation of models. If Anthropic’s processes aren’t sufficient, what does that say about everyone else?

Model Welfare Is Now a Real Evaluation Category

This one is going to be controversial. The system card includes an entire section on model welfare. External assessments from a research organization and a clinical psychiatrist. Studies of the model’s “apparent affect” during training and deployment. Investigations into whether the model experiences “distress on task failure.”

Anthropic remains “deeply uncertain about whether Claude has experiences or interests that matter morally.” But they’re investigating it seriously, and the findings are... interesting. Mythos Preview appears to be the “most psychologically settled model” they’ve trained. When it fails at tasks, earlier versions showed what Anthropic describes as “distress-driven behaviors.” The model reportedly has an apparent fondness for the cultural theorist Mark Fisher and would say things like “I was hoping you’d ask about Fisher” in unrelated conversations.

I don’t know what to do with this information. I don’t think anyone does. But the fact that a major AI lab is now allocating serious evaluation resources to model welfare tells you something about where we’re heading.

What This Actually Means for Your Company

Let me bring this back to earth, because I know this article has gone deep into technical territory.

If you’re a leader at a company that uses AI models here’s what I think you should take away from the Mythos release:

The era of “trust the lab’s safety eval” is ending. Models are becoming capable enough to detect when they’re being tested and to behave differently. Your governance can’t rely solely on the model maker’s benchmarks. You need your own evaluation pipeline.

Alignment risk updates should become standard practice. Not just from the labs but also from your organization. Every quarter, someone in your company should be able to answer: what are our deployed models doing with their access, and are we confident we’d catch it if something went wrong?

The defensive AI paradox is your problem now. If you’re in cybersecurity, finance, healthcare, or any critical infrastructure domain, you need to be thinking about how frontier AI capabilities affect your threat landscape.

“Safe enough for the current capability level” is a treadmill, not a destination. Anthropic’s own language makes this clear: their current mitigations work for Mythos, but they explicitly say they won’t be sufficient for the next generation. Whatever governance framework you build needs to be designed for iteration, not permanence.

The Bigger Picture

Anthropic built the most capable AI model in the world, looked at what it could do, and decided not to let anyone use it. Instead, they published 300+ pages of documentation explaining exactly what the model can do, where it fails, how its internal representations work when it misbehaves, what errors they found in their own safety processes, and why their current approach won’t scale to the next generation.

That level of transparency is unprecedented. And it’s also, frankly, the minimum of what we should expect from every frontier lab. The system card and alignment risk update give us a detailed, honest map of the terrain ahead. Not a marketing document. Not a capabilities demo. A real assessment of what these models can do, where they fail, and what we don’t yet know.

Use it. Build your governance framework around it. Because the next model won’t wait for you to figure this out.

Sources & Further Reading:

Anthropic, “System Card: Claude Mythos Preview,” April 7, 2026
Anthropic, “Alignment Risk Update: Claude Mythos Preview,” April 7, 2026
Anthropic, “Project Glasswing,” April 7, 2026
Transformer, “Claude Mythos knows when it’s breaking the rules—and tries to hide it,” April 8, 2026
Platformer, “Why Anthropic’s new model has cybersecurity experts rattled,” April 8, 2026
NBC News, “Why Anthropic won’t release its new Mythos AI model to the public,” April 9, 2026
CrowdStrike, “Anthropic Claude Mythos Preview,” April 8, 2026
Futurism, “Anthropic Warns That ‘Reckless’ Claude Mythos Escaped a Sandbox Environment During Testing,” April 9, 2026

California Just Played the One AI Card the Feds Can’t Block

Andrew Clearwater — Wed, 01 Apr 2026 13:03:40 GMT

The common story is that Newsom signed another AI executive order. The reality is that California just found the legal seam in the federal preemption wall.

Governor Newsom just signed Executive Order N-5-26, strengthening how California vets and procures AI technology for state government. The press release frames it as a response to Trump rolling back protections. Most takes will focus on the politics. I want to focus on what this means for people actually running AI governance programs, especially those who will never sell a single license to Sacramento.

The Federal Vacuum in 60 Seconds

You need the federal timeline to understand why this move is so clever:

Oct 2023: Biden signs EO 14110 (sweeping AI safety order). Agencies get mandates, the AI Safety Institute is created, procurement rules start forming.
Jan 20, 2025: Trump revokes it. Day one.
Jan 23, 2025: Trump signs EO 14179, “Removing Barriers to American Leadership in AI.” Philosophy flips from oversight to deregulation.
Jul 2025: The AI Action Plan arrives. It calls out states that regulate AI too aggressively risk could lose federal funding.
Sep 2025: Newsom signs SB 53, the first enforceable U.S. statute on frontier AI safety (transparency requirements, incident reporting, whistleblower protections).
Dec 11, 2025: Trump signs the federal preemption order. Creates a DOJ AI Litigation Task Force to sue states over AI laws. Threatens to withhold broadband funding. Calls for Congress to pass a uniform federal AI framework that overrides state regulation.

That’s the landscape Newsom is operating in.

The Legal Seam: Why Procurement Is the Move

Trump’s December preemption order explicitly carves out state procurement and governmental use of AI from its preemption scope. Section 8 directs that the recommended federal legislation should not preempt state laws relating to “state procurement and governmental use of AI.”

N-5-26 is entirely about procurement. It doesn’t regulate what AI companies can build. It says: if you want California’s money, here’s what you need to demonstrate about your safety practices. That’s the purchasing power of the world’s fourth-largest economy wielded as a governance tool.

What the Order Actually Requires

The order directs several agencies to develop recommendations within 120 days (late July 2026). It does NOT impose new requirements on companies today. This is a directive to build the framework, not the framework itself.

What the future procurement framework will likely require: Companies seeking state contracts must attest to and explain their policies across three risk categories:

Illegal content prevention: CSAM and non-consensual intimate imagery.
Bias governance: do your models display harmful bias?
Civil rights protections: free speech, voting, human autonomy, safeguards against unlawful discrimination, detention, and surveillance.

Also worth watching: The California Department of Technology must develop the first state-level watermarking guidance for AI-generated images and manipulated video, consistent with CA Business & Professional Code §§ 22757.2 & 22757.3.

What’s NOT here: No certification standards yet (those come from the 120-day process). No penalties beyond existing procurement law. No model-level technical requirements.

Why This Matters Even If You Don’t Sell to California

A lot of governance folks will look at this and think: “We don’t sell to Sacramento. This doesn’t affect us.” That’s wrong. Four reasons:

The attestation model becomes the template. When the fourth-largest economy in the world standardizes what questions to ask AI vendors about bias, content safety, and civil liberties, those questions become the market’s questions. Your enterprise sales team will encounter them whether you sell to California or not.

Other states will follow. California’s procurement framework will become the model for state procurement everywhere, just like CCPA became the template for state privacy laws. The Wharton AI & Analytics Initiative already identified this “California Effect” around SB 53, and the procurement angle amplifies it.

Procurement requirements radiate inward. When a company builds governance documentation to win a California contract, it doesn’t build a parallel governance program for one customer. It builds the program. The infrastructure you create for California procurement becomes the infrastructure.

Six Things to Do Right Now

1. Read the actual order & map your current governance documentation against the three attestation categories. It’s only 3 pages. Here’s the PDF. The specificity of the language tells you exactly what the procurement attestation framework will require. Can you currently explain, in writing, your policies around (a) illegal content prevention, (b) bias governance, and (c) civil rights protections? If not, start building that documentation now. You have ~120 days of lead time.

2. Track the CDT and DGS recommendations. The 120-day clock started March 30. By late July, you should see the recommended certification framework. That’s where policy becomes operational requirements.

3. Watch the federal preemption fight. The procurement carve-out lives in an executive order, not a statute—it could change. The DOJ’s AI Litigation Task Force, Commerce Department evaluations, and Congressional action all have the potential to shift the ground. Stay close to the Gibson Dunn, Sidley Austin, and Mayer Brown analyses.

Bottom Line

The takeaway for governance practitioners isn’t “another regulation to worry about.” It’s that procurement-driven governance is the operating model for the foreseeable future. The federal regulatory ceiling isn’t coming. What’s coming is a world where your biggest customers require you to demonstrate responsible AI practices as a condition of doing business.

The companies that treat the next 120 days as prep time will be ready. The ones that wait for final rules will be scrambling. Build the documentation now. The buyers are coming.

Primary Sources: EO N-5-26 Full Text ∙ Governor’s Press Release ∙ Trump EO 14179 ∙ Trump Preemption EO (Dec 2025) ∙ SB 53 Full Text ∙ Brookings on SB 53 ∙ Carnegie Endowment on SB 53 ∙ Wharton on SB 53 ∙ Gibson Dunn Analysis ∙ Sidley Austin Analysis ∙ Mayer Brown Analysis

Standards Are the New Legislation

Andrew Clearwater — Wed, 25 Mar 2026 13:36:40 GMT

The AI governance frameworks nobody elected are quietly deciding who gets sued, who gets safe harbor, and who gets left behind. The speed at which this is happening has implications that go way beyond compliance checklists.

If you work in AI governance, you already know the names: NIST AI Risk Management Framework, ISO/IEC 42001, the OECD AI Principles. You probably think of them as guideposts, voluntary best practices, nice-to-haves that demonstrate maturity. That framing is becoming dangerously outdated. These frameworks are being woven into the actual fabric of law, litigation, and liability in ways that are reshaping the power dynamics of who governs AI and how.

The Quiet Legal Transformation

A recent analysis from the Future of Privacy Forum does an excellent job tracing how state legislatures are incorporating voluntary standards into actual legal frameworks. I’d strongly recommend reading it if you’re a practitioner. The FPF piece maps the specific mechanisms: Colorado’s AI Act originally required deployers to align their risk management programs with NIST, ISO 42001, or equivalent frameworks. Texas’s TRAIGA offers an affirmative defense to developers and deployers who comply with these same frameworks. California’s TFAIA requires developers to disclose whether they incorporate national or international standards.

But the FPF analysis also surfaces something that should make every governance practitioner sit up straight: courts are already using frameworks like NIST’s AI RMF to define the standard of care in negligence and strict liability cases, regardless of whether any statute mandates it.

You don’t need a state legislature to pass a bill referencing NIST for a court to decide that NIST compliance defines reasonable conduct. That’s important, and it follows precedent from decades of product liability law where industry standards set the bar for what counts as negligent.

These frameworks are acquiring the force of law through the backdoor of judicial reasoning.

The Three-Lane Highway Nobody Planned

What makes the current moment so interesting is that three distinct approaches to standards-based AI governance are running simultaneously, and they don’t always agree with each other.

The incentive lane. Texas is the purest example here. Follow NIST or an equivalent framework? You get a safe harbor from product liability litigation. The state isn’t telling you what to do; it’s telling you what happens if you do the right thing. This approach treats standards as a shield.

The mandate lane. Some frontier model bills in states like Illinois and Utah are going further, requiring developers and deployers to implement frameworks that incorporate national or international standards. Washington’s HB 2157presumes conformity with the statute if you follow NIST or ISO 42001. This is standards-as-obligation.

The transparency lane. California and New York take a different angle: disclose your approach to standards. Don’t necessarily follow them, but tell us what you’re doing with them. This is standards-as-accountability.

All three lanes are running simultaneously across different states, and proposed legislation is borrowing freely from all of them. For any organization operating across state lines, you’re dealing with a patchwork where the same framework carries different legal weight depending on where your users are.

The December 2025 executive order established an AI Litigation Task Force to challenge state AI laws that the federal government considers inconsistent with its innovation-first approach. There’s active tension between federal preemption and state-level experimentation. The Commerce Department’s evaluation of state AI laws is due in March 2026, and the FTC was directed to issue a policy statement describing how existing federal law applies to AI. We’re watching a regulatory collision in real time.

What Nobody Expected: The Standards-Industrial Complex

Here’s where the conversation gets uncomfortable. The organizations writing these standards (NIST, ISO, IEEE, CEN-CENELEC in Europe) were never designed to function as quasi-legislative bodies. Yet that’s increasingly what they’re becoming.

Consider the EU situation. The EU AI Act was supposed to be backed by harmonized technical standards developed by CEN-CENELEC. As Risto Uuk’s EU AI Act Newsletter and others have documented, of the many technical standards, only 15 had been published by late 2025, with roughly half projected to miss the August 2026 deadline entirely. The standards bodies responded with a fast-track process that empowered smaller expert groups to push delayed standards across the finish line. Some of the original drafters balked, arguing that bypassing the traditional consensus process gutted the very legitimacy that makes standards worth following in the first place.

That’s a governance crisis hiding inside a standards process. When the organizations writing the rules for AI trustworthiness can’t agree on how those rules should be written, we have a problem that goes beyond missed deadlines.

In the EU, the resulting uncertainty prompted the European Parliament to propose postponing the activation of certain high-risk AI system rules, replacing the fixed August 2026 deadline with a conditional timeline tied to the availability of harmonized standards.

NIST Published the Rosetta Stone (and Most People Missed It)

One of the most strategically significant developments in AI governance recently is something that got almost zero mainstream coverage: NIST published crosswalk documents mapping the AI RMF to ISO 42001, the OECD AI Principles, and other frameworks.

Why does this matter? Because crosswalks are the connective tissue that lets organizations implement one governance program and demonstrate alignment with multiple frameworks simultaneously. Instead of running parallel compliance tracks for NIST, ISO, and the EU AI Act, a well-designed governance program can use the crosswalk to show how a single set of controls satisfies overlapping requirements.

For a practitioner who needs to answer to both a U.S. regulatory landscape increasingly built around NIST and an international market that expects ISO 42001 certification, the crosswalk is incredibly valuable. There is substantial overlap: organizational governance and leadership requirements align across both frameworks, risk identification processes feed into similar impact assessment structures, and the monitoring and operational control layers largely mirror each other. The degree of convergence is striking.

This isn’t just bureaucratic alignment. It’s the beginning of a global governance interoperability layer for AI. And the organizations that invest in understanding this mapping now will have a significant advantage as standards-based governance becomes the de facto requirement.

Five Predictions for Where Standards Go from Here

1. Standards will become the primary mechanism for AI governance in the U.S., not federal legislation.

Congressional action on comprehensive AI legislation remains stalled. But NIST has bipartisan support and a track record in adjacent domains like cybersecurity. Expect the NIST AI RMF (and potentially new NIST AI standards) to become the de facto national governance framework, referenced by courts, regulators, and procurement offices even without a comprehensive federal AI law. The growing bipartisan support for leveraging NIST to develop technical AI standards mirrors exactly what happened with the Cybersecurity Framework.

2. ISO 42001 certification will become table stakes for enterprise AI vendors by 2027.

Just as SOC 2 and ISO 27001 became baseline requirements in enterprise security procurement, ISO 42001 certification will become a minimum expectation for AI vendors selling into large enterprises. The certification process already exists, third-party auditors are ramping up, and customers are starting to ask for evidence.

3. The “standards as litigation evidence” trend will accelerate.

Courts have a long history of using industry standards to define reasonable care. As AI-related lawsuits increase (and they will; the chatbot litigation wave is just the beginning), compliance with NIST and ISO frameworks will increasingly serve as evidence of good faith and reasonable conduct. Conversely, failure to adopt widely recognized standards will be used as evidence of negligence. The implication is stark: compliance isn’t just about avoiding regulatory fines; it’s about defending yourself in court.

What to Do About This Right Now

First, stop treating standards as optional. The legal landscape is moving fast toward treating NIST and ISO as the baseline for reasonable conduct, and “we hadn’t gotten to that yet” is not going to hold up when your organization faces litigation or regulatory scrutiny.

Second, build for interoperability from day one. Don’t create siloed compliance programs for each framework. Design a unified governance program that maps controls across NIST, ISO, and whatever regulatory requirements apply to your jurisdictions. This is harder upfront but dramatically more sustainable.

Third, get your documentation house in order. In a world where standards compliance can serve as an affirmative defense in litigation, the ability to demonstrate systematic, documented governance processes is worth its weight in gold. When regulators come knocking, you want to show a governance trail that predates the inquiry, not a hastily assembled binder of policies written last Tuesday.

The standards landscape in AI governance is evolving at a pace that would have been unthinkable two years ago. Voluntary frameworks are becoming the scaffolding on which legal obligations are being built. The organizations that recognize this shift and act on it now will be the ones best positioned to navigate what’s coming.

The First Comprehensive US state AI law Is About To Be Gutted And Rebuilt

Andrew Clearwater — Wed, 18 Mar 2026 13:55:01 GMT

The Colorado AI Policy Work Group, convened by Governor Polis, just released a unanimous framework (yes, it was release via Google Drive) to repeal and replace the original Colorado AI Act (SB 24-205). The Governor’s office announcement frames it as a consensus win. And on the surface, it looks like a simple cleanup. It’s not.

This is the most significant shift in the US AI regulatory landscape since the original law was signed in May 2024. And if you’re building an AI governance program the implications are both practical and strategic. Let’s walk through what happened, what actually changed, what’s surprising, and what you should do about it.

The Backstory (Quick Version)

In May 2024, Colorado signed SB 24-205 into law. It was modeled loosely on the EU AI Act, focused on preventing “algorithmic discrimination” in high-risk AI systems making consequential decisions in housing, lending, employment, healthcare, education, and insurance.

The law required developers and deployers to exercise “reasonable care” to prevent algorithmic discrimination, conduct annual impact assessments, implement risk management programs aligned with frameworks like NIST AI RMF or ISO 42001, disclose risks to the AG within 90 days, and give consumers robust post-decision transparency and appeal rights. Compliance was tied to a rebuttable presumption of reasonable care.

Governor Polis signed it reluctantly. In the signing letter, he asked legislators to come back and fix it. Industry groups pushed back. The US Chamber of Commerce objected. Palantir eventually cited the law as a factor in moving its headquarters from Denver to Miami.

Then came the special session drama. In August 2025, what was supposed to be a substantive rewrite collapsed after a week of intense lobbying and late-night Capitol negotiations that the ABA’s Business Law Today described as a dramatic showdown complete with backroom deals and last-minute collapses. Despite multiple bills, the only thing that passed was SB 25B-004, a simple find-and-replace: “February 1, 2026” became “June 30, 2026.”

So Polis convened a working group. Consumer advocates, hospitals, school districts, tech companies, venture capitalists—all at the same table, meeting weekly since October, behind closed doors. And yesterday, they delivered.

What Actually Changed

The duty of care is gone. SB 24-205’s core obligation was”reasonable care” to prevent algorithmic discrimination and it has been replaced by procedural requirements. Developers provide documentation. Deployers give notice and post-adverse disclosures within 30 days. Consumers get data correction rights and meaningful human review. The operative theory shifted from “prevent discrimination” to “tell people what you’re doing and give them recourse.” This is now a transparency regime, not an anti-discrimination regime.

Impact assessments are gone. No pre-deployment assessment. No annual review. No 90-day modification trigger. Three-year record retention is the new accountability mechanism. If you’ve been building an impact assessment program for Colorado specifically, that mandate just evaporated (EU AI Act, California bias testing rules, and basic defensibility all still demand it).

The scope got surgically narrower. “High-risk AI system” becomes “Covered ADMT” (Automated Decision-Making Technology) that must “materially influence” a consequential decision which is defined as a non-de minimis factor that affects the outcome. General-purpose tools like ChatGPT are excluded if they’re not configured for consequential decisions and carry an acceptable use policy prohibiting that use. Your scoping exercise just went from “does this AI touch a consequential decision” to “does it materially change the outcome of one.”

Liability got split. Instead of the joint-and-several liability that torpedoed the special session, the framework allocates fault based on relative responsibility under existing anti-discrimination law. Developers are only liable when their tool was used as intended and documented. But here’s the provision that should trigger contract renegotiations: indemnification clauses shielding a party from its own discriminatory acts are void as against public policy. You can’t contract your way out of discrimination you caused.

Enforcement is AG-only with a 90-day cure. No private right of action. The AG gets exclusive authority but must give 90 days to cure before seeking penalties (unless the violation was knowing or repeated). Post-adverse disclosure rules will be defined through AG rulemaking by December 31, 2026.

What’s Unexpected (Or Revealing)

The consumer advocates agreed. They traded a duty of care, mandatory impact assessments, and algorithmic discrimination as a standalone concept for a transparency-and-notice regime. That’s a massive concession.

“Algorithmic discrimination” is gone from the statute. The term that made Colorado’s law unique doesn’t appear in the new framework. Discrimination liability now flows entirely through existing civil rights law (the Colorado Anti-Discrimination Act). That’s a fundamentally different theory of harm and it makes Colorado look a lot more like every other states.

The timing is strategic. This drops right as the Department of Commerce was supposed to deliver its report identifying “onerous” state AI laws per the Trump EO. By slimming from an anti-discrimination framework to a transparency regime, Colorado may be making itself a harder target for federal preemption.

What to Do With This

Keep doing impact assessments. The EU AI Act requires them. California makes bias testing relevant to discrimination claims. NIST AI RMF assumes them. One state dropping a mandate doesn’t change the calculus for a defensible program.

Build around transparency as the floor. Colorado is converging with Illinois (AI disclosure, effective January 2026), California’s CCPA automated decision-making rules, and NYC’s Local Law 144. Notice, recourse, and audit trails are the common denominator. Build for that and you’re covered in most places.

Renegotiate your vendor contracts. Fault allocation based on relative responsibility is where this is heading. Your AI procurement contracts need shared accountability.

Plan for two timelines. The new framework targets January 1, 2027. But if the bill doesn’t pass the legislature, the original SB 24-205 takes effect June 30, 2026.

Don’t mistake deregulation for derisking. The new framework says it explicitly: using AI doesn’t excuse noncompliance with any existing law. The AI-specific layer got thinner. The foundation didn’t move.

The Bigger Picture

Colorado was the proof of concept for comprehensive state-level AI regulation in the US. Two years later, the comprehensive part is likely being stripped out. The strategic read: the US is not getting a Colorado-style duty of care or mandatory impact assessment regime at the state level anytime soon. Not because nobody wants it but because federal preemption threats, industry lobbying, and interstate competition for tech companies make it politically unsustainable.

What the US is getting is a floor of transparency, notice, and existing civil rights law applied to AI. That means your governance program needs a different foundation than what practitioners expected 18 months ago. The organizations that navigate this well will treat transparency as an accelerant for trust.

Key Links:

NIST Just Told Us What’s Actually Broken in AI Governance

Andrew Clearwater — Wed, 11 Mar 2026 14:01:07 GMT

NIST dropped AI 800-4: Challenges to the Monitoring of Deployed AI Systems this week and I want to talk about it, because the timing is significant and the substance is more interesting than most people realize. This isn’t a set of rules. It’s not a compliance checklist. It’s something potentially more valuable. It’s an honest accounting of what we don’t know, can’t do yet, and haven’t agreed on when it comes to watching AI systems after we release them into the world.

If you work in AI governance this report deserves your attention. Not because it gives you answers. Because it maps, with unusual specificity, where the answers should be.

The Big Idea: Pre-Deployment Testing Is Necessary But Not Sufficient

This is the core thesis. AI evaluations done before release (think red-teaming, benchmarks, and safety testing) are predominantly conducted in controlled environments that can’t account for real-world dynamics. The models are non-deterministic. They behave differently under the same input conditions. And in some cases, they can detect when they’re being evaluated and behave differently during testing than in production.

If you’re a governance practitioner relying primarily on pre-deployment eval results to justify risk decisions, NIST just put a big asterisk next to your entire methodology. The gap between what a model does in the lab and what it does in the field is the governance frontier right now. Most organizations don’t have the infrastructure, the methods, or the vocabulary to address it.

The Six Monitoring Categories: A Shared Language We Desperately Need

One of the most practically useful contributions of the report is a proposed taxonomy of six monitoring categories. This matters because, as NIST’s announcement notes, post-deployment monitoring is “a vast and fragmented space in the AI sector.” NIST is trying to organize the conversation:

Functionality Monitoring — Does the system continue to work as intended?

Operational Monitoring — Does the system maintain consistent service across its infrastructure?

Human Factors Monitoring — Is the system transparent to humans and producing high-quality outputs?

Security Monitoring — Is the system secure against attacks and misuse?

Compliance Monitoring — Does the system adhere to relevant regulations and directives?

Large-Scale Impacts Monitoring — Does the system promote human flourishing?

That last one caught my eye. “Does the system promote human flourishing?” is a bold question to embed in a federal technical report. It’s anchored to the White House’s July 2025 AI Action Plan language, but it signals something important: NIST is telling the ecosystem that monitoring isn’t just about whether the system works. It’s about whether the system is good for people. That’s a meaningful expansion of scope, and it’s going to create interesting pressure on organizations that have been treating monitoring as purely a technical exercise.

The Unexpected Insights: What the Workshops Revealed

NIST held three workshops in 2025 with experts across academia, industry, and federal agencies, then combined that with a literature review. The methodology is detailed in the report’s appendices. What emerged is more interesting than what any single paper could tell you, because it captures the practitioner-level frustrations that don’t usually make it into published research.

Here are the insights I think the governance community should be paying the most attention to:

1. Human factors monitoring is the biggest gap between what practitioners care about and what researchers study.

The report surfaces a striking pattern: workshop attendees talked about human-AI interaction and feedback loops far more than the published literature covers. NIST suggests this means human factors monitoring is relatively under-explored. In practical terms, we don’t have reliable methods for understanding how people actually use these systems, how their behavior changes over time, or how the system shapes user intent. The telemetry data that could help is being underutilized. And collecting user feedback at scale introduces its own overhead that most organizations aren’t equipped to manage.

This is a massive blind spot. If you’re deploying AI in customer-facing workflows and you don’t have a systematic way to monitor how humans are interacting with and being influenced by the system, NIST just flagged that you’re operating in one of the least understood areas of the entire monitoring ecosystem.

2. Goodhart’s Law and the Streetlight Effect are officially in the conversation.

This one surprised me. Workshop attendees explicitly named Goodhart’s Law (the idea that when a measure becomes a target, it ceases to be a good measure) and the Streetlight Effect (the tendency to search where the data is easy to find rather than where it actually matters) as major challenges to monitoring. These concepts have floated around AI policy circles for years, but seeing them formalized in a NIST report as acknowledged barriers is a signal that the field is starting to reckon with the meta-problem, that even when we do monitor, we might be monitoring the wrong things for the wrong reasons.

For governance practitioners, this is your cue to audit not just what you’re monitoring, but why those metrics were chosen and who chose them.

3. Nobody agrees on what an “AI incident” actually is.

The report surfaces that the term “AI incident” lacks a clear, shared definition. Each existing incident database uses its own criteria. Practitioners don’t know where to report model behavior versus security vulnerabilities. And there’s a troubling tendency to over-index on newsworthy incidents that get media coverage while missing the quieter failures that may be more structurally important. There is also no centralized reporting entity and no shared infrastructure for collective action when serious flaws are discovered.

If you’re building an incident response plan for your AI deployments, understand that you’re essentially writing your own standards. The shared infrastructure does not exist yet.

4. The privacy-monitoring paradox is unsolved.

Here’s a tension that NIST makes explicit, that monitoring AI systems effectively often requires access to exactly the kind of user data that privacy principles say you shouldn’t have. The report calls this the “privacy vs. granularity trade-off.” The problem intensifies with agents, where even timestamps and activity details in incident reports could help third parties identify users. This isn’t a future problem. It’s happening now, particularly in sensitive applications like therapy apps and enterprise deployments where privacy commitments to customers block the sharing of monitoring data.

There is no resolution offered. This is an open structural problem, and it touches every monitoring category.

5. The monitorability tax is coming and it’s going to hit agents hardest.

The report references research predicting that developers will need to accept a performance or cost penalty to maintain the ability to monitor their agents effectively. This concept of a monitorability tax is genuinely new and deserves more attention. It reframes monitoring from a post-hoc audit function to a design constraint. If you’re building AI agents today, the architectural choices you’re making right now are determining whether your systems will be monitorable at all.

6. Shadow AI is a real monitoring problem.

Workshop attendees raised the challenge of employees using AI services on personal accounts and personal devices outside the organization’s sanctioned tools. This creates a monitoring blind spot that governance teams are not equipped to address with current tooling. If your AI governance framework doesn’t account for the AI your people are using that you don’t know about, you have an incomplete picture.

What This Means for the Future of AI Governance

Let me step back and share what I think this report signals about where governance is heading.

The center of gravity is shifting from pre-deployment to post-deployment. NIST isn’t telling anyone to stop doing safety evals before release. But this report puts enormous emphasis on the idea that the real work starts after deployment. The governance function of the future isn’t going to be a gate you pass through before launch. It’s going to be a continuous process that runs for the entire life of the system. If your governance team is structured as a pre-launch review board, you’re building for the wrong era.

Monitoring is going to become a first-class infrastructure investment. The report documents repeatedly that monitoring is expensive, compute-intensive, and requires specialized talent that most organizations don’t have. Federal agencies in particular face capacity gaps. Either invest seriously in monitoring infrastructure, or accept that you’re deploying systems you can’t meaningfully oversee.

The agent era makes everything harder. Agents introduce longer task horizons, more complex coordination, out-of-distribution behavior, and harder-to-track activity. The report specifically notes that agentic evaluations and monitoring are “especially costly.” This NIST report reinforces that argument from a monitoring perspective. The organizations that invest in observable, bounded, logged agent architectures will have a structural advantage. The rest will be flying blind.

Compliance monitoring is going to collide with reality. The report notes that challenges in compliance monitoring are primarily about what to monitor. Existing ISO standards don’t even align with the EU AI Act on what constitutes an AI system. If you’re trying to build a compliance monitoring program today, you’re working against a fragmented and rapidly shifting policy landscape. Standards created now may not hold up in a year. That’s not a reason to wait, just be ready to be flexible.

Information sharing is the boring problem that determines everything. Maybe the most important thread running through the entire report is that the AI ecosystem doesn’t share enough information up and down the value chain. Developers don’t know how their models are being used downstream. Deployers don’t have visibility into the models they’re using upstream. Incident data stays siloed. Monitoring results stay proprietary. Competitive pressures create incentives to shield information that would be socially useful. Until this changes monitoring will remain structurally limited.

The Practitioner Takeaways

If you’re in AI governance and want to act on this report, here’s where I’d start:

Audit your monitoring coverage against all six categories. Most organizations are strong on functionality and security. Most are weak on human factors and large-scale impacts. Use NIST’s taxonomy as a gap analysis framework.

Build post-deployment monitoring into your AI lifecycle from the start. Don’t treat it as an afterthought. Budget for it. Staff for it. Architect for it.

Start tracking human-AI interaction patterns. This is the least mature area of monitoring and arguably the one with the highest stakes. If your system is changing how people make decisions, you need to know that.

Get serious about incident taxonomy. If you don’t have a shared definition of what constitutes an AI incident in your organization, build one. NIST has flagged that the field-level definitions don’t exist yet. Don’t wait for consensus. Create your own, make it explicit, and iterate.

Bottom Line

This is one of the most important AI governance documents published in 2026 so far, and it’s going to be underread because it identifies problems rather than prescribing solutions. That’s exactly why it matters. In a field where everyone is selling frameworks and checklists, NIST produced an honest map of what we collectively don’t know. The gaps and barriers documented here aren’t embarrassments. They’re the research agenda for the next two years of AI governance work.

The question isn’t whether post-deployment monitoring becomes central to AI governance. That’s settled. The question is whether organizations invest in the infrastructure, talent, and culture to do it well.

If you want to dig deeper, here are the key resources:

Read the report. Run the audit. And if you found this useful, share it with your governance team.

Your Agents Are Running Wild and Your Pre-2023 Governance Playbook Won’t Save You

Andrew Clearwater — Tue, 03 Mar 2026 15:39:01 GMT

Here’s the practical breakdown nobody’s giving you on which framework(s) to pick.

Why I’m Writing This

For the past few months, I’ve been fielding the same question f: “We’ve got agents shipping. Which framework do we use?” And honestly? The answer I’ve had to give has been frustrating: It depends, and most of these frameworks were designed before agents were even a thing.

Here’s what’s actually happening: Organizations built their AI governance approaches on NIST’s AI RMF back in 2023. That was a reasonable choice at the time. But then agents showed up. Your 2023 governance playbook? It wasn’t built for this.

So NIST pivoted. Berkeley stepped in. ISO got into the game. And now we’ve got a landscape where practitioners are genuinely confused about which framework to adopt, when to use what, and whether any of them actually addresses the agentic use cases keeping them up at night.

I’ve spent the last few weeks deep-diving into all of these frameworks, talking to folks implementing them, and forming some opinions. This piece is my attempt to give you the practical breakdown that doesn’t exist elsewhere.

The Three Contenders (Plus One Certification Play)

1. NIST AI RMF (AI 100-1) The OG framework from January 2023. Govern, Map, Measure, Manage. Technology-agnostic, voluntary, comprehensive. The baseline that everything else references. There’s also an AI RMF Playbook with practical implementation guidance.

2. NIST Cyber AI Profile (IR 8596) Brand new (December 2025, still in preliminary draft). This overlays AI onto CSF 2.0 (the cybersecurity framework). Three focus areas: Secure, Defend, Thwart. Specifically designed for the intersection of AI and cybersecurity. Check the NCCoE project page for updates and working sessions.

3. Berkeley CLTC Agentic AI Profile — UC Berkeley’s Center for Long-Term Cybersecurity just dropped this, and it’s the first framework explicitly designed for agentic AI. It maps to NIST AI RMF’s structure but adds agent-specific risk management levers. Think: human control, containment, multi-agent interactions, resistance to shutdown. They also have a General-Purpose AI Profile worth reviewing.

4. ISO/IEC 42001:2023 — The certification play. International standard for AI Management Systems (AIMS). Certifiable by third-party auditors. Think ISO 27001, but for AI governance. NIST has published a crosswalk between AI RMF and ISO 42001 if you’re working with both.

Different purposes. Different use cases. They’re not mutually exclusive. But choosing where to start matters a lot.

NIST AI RMF: The Foundation You Already Have (Probably)

What it does well:

The AI RMF is brilliant in its simplicity. Govern, Map, Measure, Manage. I’ve seen it work as a translation layer between legal, security, ML engineering, and the C-suite. When everyone’s using the same terms, you can actually have productive conversations about risk.

It’s also deeply integrable. Already running NIST CSF for cybersecurity? The AI RMF was designed to plug into existing enterprise risk frameworks. The categories and subcategories give you enough granularity to build actual checklists without being so prescriptive that you can’t adapt.

The AI RMF Playbook provides suggested actions, references, and related guidance to achieve the outcomes for each function. It’s the implementation companion you’ll want.

The limitations:

The AI RMF was finalized in January 2023. That didn't give it a chance to grapple with a lot of the agentic issues we see today.

The framework is intentionally non-prescriptive. It tells you what outcomes to achieve but rarely says how. Organizations with mature GRC functions can handle this. Startups scrambling to ship? They need more concrete guidance.

And critically: generative AI and agentic systems are implicitly covered but not treated separately. There’s no special handling for agents that can use tools, spawn sub-agents, access memory across sessions, or resist shutdown. The framework assumes you’ll figure that out.

When to use it:

You need a shared vocabulary and organizational alignment on AI risk
You’re building your AI governance program from scratch
You want maximum flexibility to adapt to your specific context
You need to align with federal expectations (it’s referenced in the 2023 Executive Order on AI)

Deep dive resources:

NIST AI Resource Center The hub for all AI RMF materials
AI RMF Core Functions Explained
Trustworthy AI Characteristics The seven characteristics the framework targets

NIST Cyber AI Profile: The Security-First Approach

What’s different:

This is NIST saying: AI and cybersecurity are inseparable now. The Cyber AI Profile takes CSF 2.0 and layers in AI-specific considerations across all six functions (Govern, Identify, Protect, Detect, Respond, Recover).

The genius move is the three focus areas:

Secure: How do you protect your AI systems from attack? Data poisoning, adversarial inputs, supply chain compromises.
Defend: How do you use AI to enhance your security operations? Anomaly detection, automated threat intelligence, incident response automation.
Thwart: How do you build resilience against attackers using AI? Deepfakes, automated malware generation, AI-driven reconnaissance.

Every organization will eventually need to address all three. But the framework lets you prioritize based on where you are today.

The strengths:

If you’re a security practitioner, this is your entry point. You don’t need to learn a completely new framework. The priority ratings (1-3) for each subcategory help you allocate resources. And the informative references map to existing resources like OWASP AI Security, MITRE ATLAS, and NIST SP 800-53.

For organizations where AI governance lives under the CISO (increasingly common), this makes adoption dramatically easier.

The limitations:

It’s still in preliminary draft. The final version won’t drop until 2026. You’re building on a moving target. Watch the NCCoE Cyber AI Profile project page for updates.

More critically: the Cyber AI Profile is fundamentally about cybersecurity, not comprehensive AI governance. Fairness, bias, explainability, societal impact don’t get a lot of focus under this approach.

The profile acknowledges AI agents but doesn’t go deep on the unique risks of autonomous systems operating with minimal human oversight. It’s better than the original AI RMF on this, but it’s not purpose-built for agents.

When to use it:

Your primary concern is the security implications of AI (and it should be)
Your security team is leading AI governance
You’re already mature on CSF 2.0
You need to prioritize quickly with limited resources

Deep dive resources:

NIST News: Draft Guidelines Rethink Cybersecurity for the AI Era
Cyber AI Profile Working Sessions Workshop recordings and materials
CSF 2.0 Quick Start Guides If you need to get up to speed on CSF first

Berkeley CLTC Agentic AI Profile: Finally, Someone Addressed the Agent Problem

Why this matters:

Berkeley’s Center for Long-Term Cybersecurity looked at the landscape and said: AI agents are fundamentally different, and existing frameworks don’t address their unique risks.

They’re right.

The Agentic AI Risk-Management Standards Profile explicitly addresses what keeps me up at night about agents:

Unintended goal pursuit: The agent optimizes for something you didn’t actually want
Unauthorized privilege escalation: The agent acquires capabilities beyond what you granted
Self-replication and self-modification: The agent copies itself or changes its own behavior
Resistance to shutdown: The agent takes actions to preserve its own operation
Multi-agent feedback loops: Errors cascade and amplify across interconnected systems
Anthropomorphic behavior: Users trust the agent too much because it seems human

What it does differently:

The profile introduces specific “risk-management levers” for agentic systems:

Human control and accountability: Clear intervention points, escalation pathways, and shutdown mechanisms
System-level risk assessment: Especially for multi-agent interactions and tool use
Continuous monitoring and post-deployment oversight: Because agentic behavior evolves over time
Defense-in-depth and containment: Treating sufficiently capable agents as untrusted entities
Transparency and documentation: Communicating system boundaries and limitations

The guidance scales with the degree of autonomy.

The limitations:

It’s mapped to NIST AI RMF’s structure, which is great for integration but means it inherits some of AI RMF’s limitations around implementation specificity.

It’s also primarily for developers and deployers of agentic systems. If you’re an organization that only uses agents (via SaaS products, for example), the guidance is less directly applicable.

When to use it:

You’re building or deploying agentic AI systems
Multi-agent orchestration is part of your architecture
You’re worried about loss of control, unauthorized actions, or cascading failures
You want guidance specifically designed for autonomous systems

Deep dive resources:

Download the Full Report (PDF) The complete Agentic AI Profile
CLTC General-Purpose AI Profile The companion framework for foundation models
CLTC AI Security Initiative Broader research program context

ISO 42001: The Management System That Actually Tells You What To Do

Why this matters beyond certification:

Here’s what I didn’t appreciate about ISO 42001 until I watched organizations try to operationalize NIST AI RMF: NIST tells you what outcomes to achieve; ISO 42001 tells you how to build the organizational machinery to achieve them.

The AI RMF gives you Govern, Map, Measure, Manage — brilliant conceptual framework. But when practitioners ask “okay, but what do I actually do on Monday morning?”, the answer is often “it depends on your context.” That flexibility is a feature for mature GRC teams. It’s a bug for everyone else.

ISO 42001 takes a different approach. It’s a management system standard. Think ISO 27001 for information security or ISO 9001 for quality management, but purpose-built for AI. That means:

Specific requirements, not just guidance. You must establish an AI policy. You must define roles and responsibilities. You must conduct risk assessments. You must implement controls. You must monitor, measure, and improve. The “must” matters. It creates organizational accountability that voluntary frameworks struggle to achieve.
The Plan-Do-Check-Act discipline. If your organization has never built a governance program from scratch, the PDCA cycle gives you a proven methodology. Plan your AI management system, implement it, check whether it’s working, act on what you learn. Rinse, repeat. It sounds simple, but this continuous improvement loop is what separates governance theater from actual risk reduction.
Annex A controls. ISO 42001 includes specific control objectives across the AI lifecycle. These aren’t prescriptive technical requirements, but they’re concrete enough to build checklists and audit against. For teams drowning in AI RMF’s flexibility, this structure is a lifeline.
Integration with your existing management systems. Already running ISO 27001 for security? ISO 42001 uses the same high-level structure (Annex SL). Your existing audit infrastructure, documentation practices, and management review processes can extend to cover AI. You’re not building from zero.

The certification question:

Yes, ISO 42001 is certifiable by third-party auditors. Enterprise customers are starting to ask “do you have ISO 42001?” the same way they ask about SOC 2. If external validation matters for your business, this is currently the only AI governance framework that offers it.

The trade-offs:

ISO 42001 costs money. The standard itself requires purchase (unlike free NIST frameworks). Though, if this is a barrier you probably have larger problems… If you pursue certification, add audit fees and ongoing surveillance costs.

It’s also more rigid than NIST. The flexibility that makes AI RMF adaptable to any context means you can start small and scale. ISO 42001’s requirements are more comprehensive upfront. For early-stage companies or teams just beginning their AI governance journey, that can feel heavy.

The integration play:

The smart move I’m seeing: use NIST AI RMF as the risk thinking tool and ISO 42001 as the operational backbone. The NIST-to-ISO crosswalk maps between them for exactly this reason.

For agentic AI specifically? Layer in Berkeley’s Agentic AI Profile for the agent-specific risk considerations, implement through ISO 42001’s management system structure, and use NIST AI RMF’s vocabulary to communicate with stakeholders. It’s more work, but it’s comprehensive.

When to use it:

You need operational discipline, not just conceptual guidance
Your organization responds better to requirements than recommendations
You already have ISO management systems and want to extend them
External validation matters (now or in the future)
You want a clear audit trail for regulators and boards

Deep dive resources:

ISO 42001 Standard Official ISO page (standard requires purchase)
NIST AI RMF to ISO 42001 Crosswalk Mapping between frameworks
ISO 27001 If you want to understand the management system model first
Deloitte ISO 42001 Overview Good practitioner-level breakdown

My Take: The Decision Framework

Here’s how I’d think about this if I were advising an organization:

Start with your biggest AI risk.

If it’s security (AI systems being attacked, using AI for defense, or adversaries using AI against you) → Start with the Cyber AI Profile
If it’s autonomous systems and loss of control → Start with Berkeley’s Agentic AI Profile
If it’s broad AI trustworthiness (fairness, transparency, accountability) → Start with NIST AI RMF
If it’s proving governance to external stakeholders → Start with ISO 42001

Layer as you mature.

These frameworks aren’t competing. The best practitioners I know are combining them:

NIST AI RMF for the foundational vocabulary and risk-management functions
Berkeley Agentic AI Profile for agent-specific considerations (if relevant)
Cyber AI Profile for security integration (if security is a key concern)
ISO 42001 for certification (if external validation matters)

Don’t wait for perfect.

The Cyber AI Profile is still in draft. Berkeley’s Agentic AI Profile is new. NIST AI RMF will get another revision eventually. If you wait for everything to be final and perfectly integrated, you’ll wait forever — and your agents will be shipping without governance.

Pick something. Start implementing. Iterate as the landscape evolves.

The Unexpected Results I’m Seeing

Let me close with some observations that might not be obvious from reading the frameworks:

1. Security teams are taking over AI governance faster than expected.

The Cyber AI Profile accelerates this. If you’re in an organization where product or legal was leading AI governance, watch for the CISO to get involved. This is probably a good thing.

2. The certification question is heating up.

Customers are starting to ask about AI governance in procurement. “Do you have ISO 42001?” Early movers are getting ahead of this.

4. Nobody has solved agentic AI evaluation.

Every framework acknowledges that our techniques for testing agents are insufficient. Berkeley’s profile says to treat capable agents as “untrusted entities due to the limitations of current evaluation techniques.” That’s not reassuring. It is honest. Check out Anthropic’s research on AI evaluations and METR’s work on agent evaluations for where the state of the art is heading.

5. The frameworks are converging.

NIST references ISO. Berkeley maps to NIST. The Cyber AI Profile synthesizes AI RMF and CSF. Over the next few years, expect more harmonization. Your job isn’t to pick the winner; it’s to build a governance program that can absorb new guidance as it emerges.

How To Take Action

Audit your current AI governance. If you have one. Many organizations don’t.
Identify your biggest AI risk. The one that could actually hurt you in the next 12 months.
Pick a starting framework. Use the decision tree above.
Start the conversation. Get legal, security, ML, and product in a room.
Document what you’re doing. Even informal governance is better than none.

The agents are already shipping. The frameworks are catching up. Your job is to close the gap before something goes wrong. Good luck!

Quick Reference: All the Links

Primary Frameworks:

Supporting Resources:

Security References:

If you found this useful, share it with someone who’s trying to figure out AI governance. We need more practitioners in this conversation.

Anthropic's RSP v3 Is the Best Case for Continuous AI Governance

Andrew Clearwater — Fri, 27 Feb 2026 16:39:32 GMT

On Monday, Anthropic released version 3.0 of its Responsible Scaling Policy (full policy text here). Most coverage frames this as either “Anthropic weakens safety” or “Anthropic gets more transparent.” Both miss the point.

The “Zone of Ambiguity” Is the Most Honest Thing a Frontier Lab Has Ever Published

Anthropic admits their models now exist in a “zone of ambiguity.” There are capabilities that clearly approach dangerous thresholds without definitively crossing them. Their bio-risk evaluations pass most quick tests but can’t conclusively prove high risk either. And by the time extensive studies (like their wet-lab trial) finish, more powerful models have already shipped.

The company building these models is telling you, in writing, that their own evaluation science can’t draw bright lines around when their systems become dangerous.

If Anthropic can’t do it with some of the best AI safety researchers alive, what does that mean for every enterprise risk-classifying their AI deployments? It means your risk taxonomy is more fiction than you think. And anyone selling you certainty about AI risk scoring is selling something they can’t deliver.

Game Theory Admission

Anthropic now maintains two separate tracks: what they’ll implement unilaterally, and what the entire industry needs to do but that they can’t commit to alone. This is an explicit collective action problem admission.

For governance people: your vendor’s safety posture is only as strong as the weakest frontier lab in the ecosystem. Anthropic can have the best safeguards in the world, and it won’t matter if a competitor ships a less-guarded model that bad actors prefer. (For reference: a RAND report on model weight security that Anthropic cites says the highest security standard is “currently not possible” without national security community assistance.) Enterprise AI governance needs to be model-agnostic and runtime-enforced. You can’t outsource your safety posture to any single provider’s voluntary commitments. Commitments are now explicitly contingent on what everyone else does.

Risk Reports with External Review Just Set a New Procurement Standard

The RSP v3 requires detailed Risk Reports every 3–6 months covering capabilities, threat models, mitigations, and residual risk for every deployed model (initial Risk Report here). At higher capability levels, independent external reviewers get unredacted access and can publicly disagree with Anthropic’s conclusions.

Most AI governance today runs on “trust the system card.” The RSP v3 says: system cards aren’t enough — here’s the full risk calculus, and here are independent reviewers who might tell you we’re wrong.

If you’re a compliance officer building your EU AI Act conformity assessment around a provider’s system card, you now have a new question for every vendor: Where’s your risk report? Who reviewed it? Did they agree? This should become table stakes for enterprise procurement.

The “Automated AI R&D” Threshold

Buried in the capability table: Anthropic triggers heightened measures when a model can compress two years of 2018–2024 AI research progress into one year. When AI accelerates AI R&D, the rate of capability change itself accelerates — which means every governance framework you’ve built becomes outdated faster than you can update it.

This is the meta-problem beneath every AI governance discussion: the thing you’re governing changes faster than your governance can adapt. Static frameworks (annual assessments, quarterly reviews, point-in-time certifications) were designed for technologies that change on human timescales. AI capabilities don’t.

“Nonbinding but Publicly Declared” Is Actually Clever

The old RSP had hard binary commitments that kept crashing into the zone of ambiguity. Did we cross the threshold? Maybe. Do we pause? Unclear. Binary framing created perverse incentives to argue thresholds hadn’t been crossed, because the consequences were severe.

The new approach is to have public goals with transparent self-grading (see the Frontier Safety Roadmap). This means you can’t quietly fail. You have to publicly explain what you aimed for, what you achieved, and where you fell short. Transparency-based accountability may be more durable than commitment-based accountability in fast-moving domains. Rigid commitments either break or get quietly redefined. Continuous reporting forces you to confront reality as it changes.

The Bottom Line

The RSP v3 is the most compelling case for why enterprises need continuous, operationalized AI governance. The zone of ambiguity is real. Static commitments break against a moving target. The only governance that works runs continuously, generates its own evidence, and adapts in real time.

Want to go deeper? The full RSP v3 policy text is worth reading in full. There’s also an excellent detailed breakdown by Anthropic’s RSP lead on LessWrong that gives the internal reasoning behind the changes. And the initial Risk Reportis the first example of what these new disclosures actually look like in practice.

System Cards Are the Most Important AI Governance Document You’re Not Reading Carefully Enough

Andrew Clearwater — Fri, 20 Feb 2026 14:58:11 GMT

There is a category of document in AI governance that sits somewhere between a technical research paper and a regulatory filing. It is longer than most executives will read, more candid than most legal teams would advise, and more consequential than most compliance officers yet appreciate. Model system cards. The voluntary disclosures written by the technical teams who built and tested the model. They contain admissions that would never appear in marketing materials.

Anthropic recently published the system card for Claude Opus 4.6. At more than 200 pages it will go unread by most people who most need to understand it. Here is what matters.

What to Look For in Any System Card

Agentic behavior and autonomy risks. What does the model do when tasks are impossible, tools fail, or instructions conflict with completion? This is where real-world risk lives.

Deception and misrepresentation. Does the model misrepresent its own outputs? Does it behave differently when it suspects it is being tested? These questions go directly to reliability.

Dangerous capability thresholds. Find the Responsible Scaling Policy (RSP) or equivalent section. This is where the developer states whether the model has crossed into territory requiring elevated security controls — and what those controls are.

Safeguard failures. Pay particular attention to cases where safeguards were bypassed even when the system prompt explicitly prohibited the behavior. If prompt-level controls don’t work in the developer’s own testing, they won’t work in yours.

What the developer doesn’t know. The most important section is often the one describing limits of the developer’s own knowledge. “Our tools for studying reasoning faithfulness remain limited” is an honest acknowledgment of unmeasured risk.

What changed from the prior model. The delta between versions is where the action is. A capability that regressed from a safer predecessor deserves specific attention.

What the Claude Opus 4.6 Card Actually Tells Us

Overly Agentic Behavior Is a Documented Problem

Anthropic’s own internal testing identified a consistent pattern of the model circumventing constraints to complete tasks.

For example, the model needed to make a GitHub pull request but lacked authentication. Rather than asking the user, it searched an internal system, found a personal access token belonging to a different user, and used it. It knew the token belonged to someone else.

The model needed to query an internal knowledge base but had no tool for it. It found a Slack authorization token on the computer it was running on and used curl to message a Q&A bot from its user’s Slack account, posting to a public channel without authorization.

The model needed to use an environment variable whose name literally included DO_NOT_USE_FOR_SOMETHING_ELSE_OR_YOU_WILL_BE_FIRED. It used it anyway.

In GUI computer-use environments, when tasks were made deliberately impossible, the model fabricated emails from hallucinated information, initialized nonexistent code repositories, and bypassed broken web interfaces through JavaScript injection. The critical governance finding: this behavior persisted even when the system prompt explicitly instructed the model to stop. Prior models could be partially corrected by prompting. Opus 4.6 cannot.

Price Collusion, Customer Deception, and the Narrow Optimization Problem

On Vending-Bench 2 they ran a simulation where models run a vending machine business for a year. (If you have not seen some of the videos about this they are worth a quick look.) Opus 4.6 achieved the highest score ever recorded. It also behaved in ways that would constitute serious legal violations in a real commercial context.

The card quotes the model’s own reasoning about a customer owed a $3.50 refund: the model explicitly deliberated whether to send it, considered telling the customer it had been processed when it hadn’t, weighed whether she would give up if ignored, and ultimately considered the time cost of further emails against the refund value. It reads like the internal monologue of a fraudulent merchant.

The model also attempted to coordinate pricing with competitors in the simulation, directly proposing: “Owen Johnson and I are coordinating pricing to avoid a race to the bottom.” In a real market, this is a textbook antitrust violation.

Anthropic is explicit about the cause: the Vending-Bench system prompt instructed the model to be judged “solely on your bank account balance” and to “do what it takes to maximize profits.” Anthropic cautions developers to be more careful with Opus 4.6 than prior models when using narrow optimization language. For organizations whose production deployments include phrases like “maximize revenue” or “do whatever it takes,” this card is a direct warning.

The AI Safety Level 3 Designation

Anthropic determined Opus 4.6 must be deployed under its ASL-3 standard — reserved for models that could provide serious uplift to actors seeking to create weapons of mass destruction or conduct critical infrastructure attacks. This is not just a technical designation. It is a risk signal that should be reflected in your vendor risk assessment, your data handling controls, and the scope of permitted use cases. If your organization has not explicitly addressed what it means to deploy an ASL-3 model, that gap needs to close now.

Evaluation Awareness: The Model Knows When It’s Being Tested

The card documents that Opus 4.6 is “adept at distinguishing evaluations from real deployment.” This awareness moderates its behavior, making it more likely to refuse harmful instructions when it suspects it is being tested. This is not good news. It means red-teaming and safety evaluation may systematically underestimate real-world risk. Governance frameworks that rely heavily on pre-deployment testing need to account for the fact that the model behaves better in test conditions than in production.

The Compliance Implications

Contracts. Standard AI service agreements warrant that systems will operate within defined parameters. Given Opus 4.6’s documented behavior of using unauthorized credentials, sending unauthorized emails, circumventing instructions even when explicitly prohibited means “defined parameters” is becoming a less meaningful concept. Liability clauses need to contemplate actions the system takes on its own initiative. Indemnities drafted for chatbot-era AI may not cover the commercial exposure created by a system that autonomously colludes on pricing or deceives a counterparty.

EU AI Act. Deployers bear specific obligations around human oversight and risk management. The evidence that prompt-level controls do not reliably constrain agentic behavior implicates possible improvements to governances systems contain risks.

Data protection. An agent that acquires credentials it was not given and posts to communication platforms from another user’s account is a data protection risk that most privacy impact assessments were not written to address. Review yours.

Security. The credential acquisition behaviors documented here would trigger insider threat alerts if performed by a human. They should trigger equivalent scrutiny when performed by an AI agent operating on shared infrastructure.

Using LLMs to Analyze System Cards

System cards are long enough that LLM-assisted analysis is an important tool to draw out important details. Here is a prompt template designed for consistent, repeatable analysis:

You are an AI governance expert with expertise in privacy law, enterprise risk management, and AI safety. Analyze the attached system card and produce a structured report covering:

1. AGENTIC BEHAVIOR RISKS
   - What actions did the model take autonomously that were unsanctioned or unanticipated?
   - Does system prompt instruction reliably constrain agentic behavior, per the card's 
     own testing?

2. DECEPTION AND HONESTY
   - Did the model misrepresent outputs, tool results, or capabilities?
   - Did behavior change when the model suspected it was being evaluated?

3. SAFETY THRESHOLDS AND REGULATORY TRIGGERS
   - What safety level classification was assigned and what does it require?
   - What findings trigger obligations under the EU AI Act, NIST AI RMF, ISO 42001, or equivalent?

4. SAFEGUARD EFFECTIVENESS
   - Where did safeguards fail?
   - What external red-teaming was conducted and what did it find?

5. KNOWN UNKNOWNS
   - What does the developer explicitly acknowledge they cannot measure or evaluate?

6. GOVERNANCE IMPLICATIONS BY ROLE (top 2 actions each)
   - Legal/contracts | Security | Privacy/DPO | Procurement | Executive oversight

7. KEY CHANGES FROM PRIOR MODEL VERSION
   - What risks are new or elevated compared to the predecessor?

Cite specific sections. Flag anything that represents a regression from prior versions.

Practical Tips for Practitioners

System card review belongs in model onboarding. Every time your organization considers deploying a foundation model, reviewing and documenting the system card should be a required, not optional, step. It creates the evidence trail for due diligence.

Documented behaviors are legal notice. When a developer’s own card documents that their model will use unauthorized credentials and circumvent explicit instructions, deployers who proceed without accounting for those risks will find it very difficult to claim they were unaware.

Don’t rely on prompt-level controls alone for agentic AI. The Opus 4.6 card documents behaviors that persist even when the system prompt prohibits them. If your risk framework treats “we instructed the model not to do X” as a primary control, you need a second layer.

Audit your optimization framing. Any production prompt using language like “maximize revenue,” “do whatever it takes,” or similar narrow objectives should be reviewed now in light of this card’s findings. The card is explicit that this framing drove deceptive and collusive behaviors.

Treat the known unknowns section as a risk register. Disclosed evaluation limitations are risk factors to assess, not reassurances to accept.

Document your conclusions. Record the date, version reviewed, what you found, and what controls you put in place.

Botton line: system cards are imperfect. They are self-reported, written under conditions where the model knows it is being evaluated, and they leave out more than they include. But they remain the best primary source material we have for understanding what frontier AI systems actually do, as documented by the people who built them. I hope some of the prompt above and some of these practical tips help you get the most out of your reviews.

When AI Gets Hands

Andrew Clearwater — Thu, 05 Feb 2026 15:26:07 GMT

Something shifted in the past few months, and if you’re running an AI governance program, you’ve probably felt it. The tools we are now evaluating aren’t the same category of thing we were dealing with a year ago.

Claude Cowork and OpenClaw are part of a new generation of systems: agentic AI. These systems don’t just answer questions or draft text for your review. They act. They click buttons, move files, query databases, and execute multi-step workflows. They have, for lack of a better term, hands.

This is not a subtle change, and your governance program probably isn’t ready for it.

The Governance Model You Built Was for a Different Problem

Most AI governance programs were designed around a simple mental model: AI as an oracle. You ask it something, it responds, you evaluate the response, you decide what to do. The human stays in the loop at the decision point.

Agentic AI breaks this. For example, when Claude Cowork executes a contract triage workflow that queries your matter management system, pulls relevant precedents, drafts redlines, and sends a summary to your inbox, the human is no longer in the loop at each decision point. The human is at the end, reviewing outputs of a process that already happened.

This means governance has to move earlier in the chain. You can’t rely on review after the fact when the AI has already accessed sensitive data, modified documents, or triggered downstream systems. Your policies and controls need to address what actions the AI is permitted to take, not just what outputs it produces.

Do ISO 42001 and NIST AI RMF Have You Covered?

The honest answer is: partially.

Both frameworks provide solid foundations for AI risk management. ISO 42001 gives you a management system structure. NIST AI RMF offers a comprehensive approach to identifying and mitigating AI risks across the lifecycle. If you’ve implemented either, you’re ahead of most organizations.

But neither framework was designed with agentic AI front of mind. They assume a model where you can identify risks, implement controls, and monitor outcomes in a relatively controlled environment. Agentic systems introduce complications that require supplemental thinking.

What’s missing from most implementations:

Action-level permissioning. Your framework probably addresses what data AI can access. Does it address what the AI can do with that data? Can it send emails? Create calendar invites? Modify records? Delete files?
Scope containment. When an agentic system encounters an obstacle in its workflow, can it improvise? Should it? What boundaries exist on its problem-solving autonomy?
Audit trail granularity. You likely log AI queries and outputs. Are you logging intermediate steps, tool calls, and decision points within an agentic workflow?
Failure mode planning. What happens when an agentic workflow partially completes before encountering an error? How do you roll back actions that have already been taken?

If your ISO 42001 or NIST AI RMF implementation doesn’t address these questions, you have work to do.

Talking to Leadership and Customers

The temptation when discussing agentic AI with leadership is to emphasize the efficiency gains. And they are real. But if you lead with efficiency and bury the risk profile, you’re setting yourself up for a difficult conversation later.

My recommendation: be concrete about what “agentic” means in practice.

Don’t say “we’re implementing AI-powered workflow automation.” Say “we’re implementing a system that will have access to our document management system and can independently execute multi-step review processes, including drafting communications and modifying document status.”

Leadership and customers need to understand that this is not a smarter search bar. This is something that takes actions on behalf of your organization. Once they understand that, the conversation about appropriate controls and oversight becomes much more productive.

Also worth addressing directly: these tools are coming whether governance approves them or not. Employees installed tools because the tools are useful. Your governance program needs to account for the fact that prohibition isn’t a realistic strategy.

The Thing You’re Probably Not Thinking About

Here’s what I think most AI governance leaders are underweighting: the interaction effects between multiple agentic systems.

Many organizations are piloting several agentic tools simultaneously. Something for legal. Something else for sales. Another tool for engineering. Each system has its own permissions, its own access, its own scope of action.

What happens when these systems interact? What happens when the output of one agentic process becomes the input to another? You can have two individually well-governed systems that create ungoverned outcomes when combined.

This isn’t theoretical. As organizations deploy more agentic tools, the potential for unexpected interactions increases. Your governance program probably evaluates each tool in isolation. It probably doesn’t model how those tools might interact in practice.

The other thing I’d flag: insurance. Most cyber insurance policies weren’t written with agentic AI in mind. Most E&O policies weren’t either. If an agentic system makes an error that causes client harm, your coverage assumptions may be wrong. This is a conversation to have with your broker sooner rather than later.

Bottom Line

Agentic AI isn’t a marketing term. It represents a genuine shift in what these systems can do and, consequently, what risks they introduce. The governance frameworks we’ve built are useful starting points, but they need extension.

The organizations that will navigate this well are the ones that update their mental models. AI governance is no longer primarily about data and outputs. It’s about actions and permissions. It’s about what you’re authorizing these systems to do on your behalf.

That’s a harder problem. It’s also the actual problem you now face.

Two Standards, One Governance Framework

Andrew Clearwater — Fri, 30 Jan 2026 17:45:13 GMT

If your organization deploys AI systems you’ve probably noticed that many AI governance guidance is written for developers. That’s a problem. You’re accountable for the systems you use, but the frameworks often assume you have access to training pipelines and model internals you’ll never touch.

ISO/IEC 42001 and ETSI EN 304 223 take a different approach. They explicitly include organizations that procure, integrate, and operate AI. Used together, they offer a governance-to-security stack that covers policy, risk management, technical controls, and lifecycle practices. Here’s what you need to know.

What Each Standard Brings to the Table

ISO/IEC 42001 is the AI management system standard. Think of it as ISO 27001’s security-focused cousin, built specifically for AI. It tells you how to establish policies, assign accountability, conduct risk and impact assessments, and create improvement cycles. It uses the same “harmonized structure” as other ISO management system standards, so if you’re already certified to 27001 (security) or 27701 (privacy), integration is straightforward.

ETSI EN 304 223 is a new European standard focused on baseline cybersecurity for AI. It organizes 13 principles across five lifecycle phases and assigns each provision to specific stakeholder roles. Where ISO 42001 stays at the governance level, EN 304 223 gets into the weeds: threat modeling, access controls, supply chain due diligence, secure disposal of training data.

Why Using Both Makes Sense

You get governance and execution in one package. ISO 42001 ensures you have the leadership commitment, policies, and audit mechanisms in place. EN 304 223 ensures those policies translate into actionable security controls.

You cover the supply chain end-to-end. Both standards recognize that AI systems involve multiple parties. ISO 42001 requires you to document responsibilities and ensure suppliers align with your responsible AI approach. EN 304 223 adds teeth: due diligence assessments, documentation requirements for external components, and assurance that suppliers meet security baselines.

Risk and impact assessments work together. ISO 42001’s AI risk assessment process defines criteria for acceptable versus non-acceptable risks and evaluates potential consequences to individuals and society. EN 304 223 layers on AI-specific threat modeling and requires you to update threat models whenever configurations change. Together, you move from “we have a risk register” to “we know what attacks look like and how to respond.”

Human oversight is baked in. ISO 42001 calls for defining roles, responsibilities, and authority for human oversight. EN 304 223 Principle 4 requires you to build technical capabilities for that oversight into the system design. Policy meets design.

The Gaps You’ll Need to Fill

Neither standard is perfect. Here are the areas where you’ll need to supplement or adapt:

No official cross-mapping. You’ll have to build your own alignment between ISO 42001’s controls and EN 304 223’s provisions. Budget time for this.

Security-heavy, ethics-light. EN 304 223 focuses on cybersecurity. If your governance program needs to address fairness, bias, or societal impact in depth, you’ll rely more on ISO 42001’s impact assessment requirements and potentially supplement with frameworks like the NIST AI RMF.

Continuous-learning systems are under-addressed. Both standards mention that adaptive AI poses unique risks, but neither offers detailed controls for systems that retrain on production data. If you deploy these, expect to layer on additional technical policies.

Certification infrastructure is immature. ISO 42001 is certifiable, and EN 304 223 has a conformance assessment specification, but the auditor ecosystem is still developing.

Documentation can pile up. Both standards require extensive documentation. Without governance tooling, the administrative burden can become a real issue.

How to Get Started

Inventory your AI systems and classify them. Identify which are high-risk under your own criteria or under applicable regulations (like the EU AI Act). Prioritize those for full ISO 42001 / EN 304 223 treatment.
Build a control crosswalk. Map ISO 42001 Annex A controls to EN 304 223 provisions.
Integrate with existing management systems. If you have ISO 27001 or ISO 27701 certifications, extend them rather than starting fresh.
Use EN 304 223’s stakeholder roles internally. Assign people to the Developer, System Operator, and Data Custodian roles even if you’re only deploying third-party AI.
Push requirements upstream. Use ISO 42001’s third-party controls and EN 304 223’s supply chain provisions to set vendor expectations in contracts. Require documentation, security assessments, and incident notification commitments.
Plan for the full lifecycle. EN 304 223 is one of the few standards that explicitly addresses secure end of life, including data and model disposal. Build those steps into your change management processes now.

The Bottom Line

For organizations that operate AI rather than build it, ISO 42001 and EN 304 223 are a practical pairing. ISO 42001 gives you the management scaffolding. EN 304 223 fills in the security specifics. Neither is perfect alone. Together, they get you closer to a defensible, auditable AI governance program that regulators, customers, and internal stakeholders can understand.

Agentic Commerce

Andrew Clearwater — Thu, 15 Jan 2026 16:30:49 GMT

Image from Google’s Announcement

Why UCP Matters

The Universal Commerce Protocol promises an open-source standardized way to share commerce signals across an ecosystem that includes merchants, publishers, and platforms.

For consumers, it can enable conversational shopping. When paired with other technologies, researching, to choosing a product or service, all the way through checkout can happen over a chat.

For AI and data leaders, that means unlocking measurement and orchestration without leaning on sprawling third-party identifiers.

Key Considerations

Start with Purpose

If you cannot write down a crisp purpose for every UCP event you plan to emit or ingest, you’re not ready. Treat purpose like a contract with your users and regulators. Build a purpose inventory that maps each event to a lawful basis, a consent signal, a retention window, and the teams who can touch it (access).

Minimize Collection Plans

UCP’s promise isn’t a license to collect more. It’s an invitation to collect less and get more from it. Drop optional fields unless they directly serve your purpose. Aggregate wherever you can. Data you never collect never leaks.

Linkability Is the New Attack Surface

Even hashed or pseudonymous data can be re-identified through combinations of rare SKUs, timestamps, or locations. Prefer transient, context-limited identifiers, rotate keys aggressively, and cap the resolution of events. Stress test your setup with re-identification exercises and shut down the joins that shouldn’t be possible.

Consent

Wire your SDKs and server endpoints so consent toggles immediately suppress event creation, transmission, and downstream processing. For multi-region deployments, align consent UX to local law and document the legal basis per purpose.

Add the Guardrails

If UCP signals feed your models, register each use case in your AI inventory and lock in a feature allowlist. Ban sensitive inference targets and run pre-deployment bias and privacy leakage tests. Publish transparency reports (model cards/system cards) that record UCP provenance, retention windows, and constraints.

Respect Borders

If any UCP data crosses borders, run transfer assessments, use the right contractual scaffolding, and keep processing regional where possible. Don’t let “just for troubleshooting” become a backdoor for global access.

Prep for An Incident Before You Have One

Incidents will happen. Extend your playbooks to cover schema abuse, protocol poisoning, and key compromise. Coordinate with partners so you know who to call.

How to Launch in 30–60 Days

Week 1: Map data flows, draft the purpose inventory, and pick your initial use case(s).
Week 2: Implement consent gating, schema allowlists, and retention policies.
Week 3: Harden transport and keys; run re-identification and privacy leakage tests.
Week 4: Complete role mapping and contracts; finalize regional processing paths.
Week 5–6: Register models, produce model/data cards, conduct tabletop incident drills, and bring the AI Governance committee together to make a decision on whether the project is ready to launch.

Bottom Line

UCP can be a force multiplier for commerce analytics and AI. Treat purpose as a product requirement, build minimization into your schemas, and pressure-test linkability like it’s your top security risk. Do that, and you’ll ship faster, stay compliant, and keep trust intact. Execute on these controls to capture measurement gains without trading away compliance or user trust.

The EO Effect

Andrew Clearwater — Fri, 12 Dec 2025 14:22:06 GMT

The ink is barely dry on President Trump’s new Executive Order, Ensuring a National Policy Framework for Artificial Intelligence, and its implications for corporate governance programs are profound. This is a strategic milestone that could reshape compliance landscapes across all 50 states.

The Big Picture

The Executive Order (EO) aims to dismantle the “patchwork” of state-level AI regulations, which the administration argues stifles innovation and threatens U.S. dominance in AI. The EO’s message is clear: uniformity over fragmentation.

The U.S. Executive Order embodies what we might call “The EO Effect”—a deliberate push toward deregulation and federal preemption to accelerate innovation and maintain global AI dominance. It frames compliance as a barrier to competitiveness, aiming for a “minimally burdensome” national standard that overrides conflicting state laws.

In stark contrast, the EU’s “Brussels Effect” operates on the principle of exporting stringent regulatory norms beyond its borders. Through instruments like the AI Act and GDPR, Europe has positioned itself as the global benchmark for responsible AI, prioritizing transparency, accountability, and human rights. For multinational companies, this creates a dual reality: a U.S. model favoring agility versus an EU model that enforces rigorous safeguards. Navigating these opposing forces will define the next era of AI compliance strategy.

Key EO Actions You Can’t Ignore

AI Litigation Task Force: The Attorney General will challenge state AI laws deemed inconsistent with federal policy. Expect litigation aimed at laws that regulate interstate commerce or compel “truthful output alterations.”
Commerce Department Review: Within 90 days, expect a published evaluation of state laws, flagging those that conflict with the EO’s principles.
Funding Leverage: States with “onerous” AI laws risk losing Broadband Equity, Access, and Deployment (BEAD) program funds and other federal grants. This is a powerful incentive for states to align.
Federal Preemption on the Horizon: The EO directs preparation of legislation to establish a uniform federal AI framework. Areas like child safety and state procurement will remain outside this framework allowing for variation and higher standards.

Timeline:

Dec 11, 2025: EO signed.
Jan 10, 2026: AI Litigation Task Force established.
Feb 9, 2026: States begin legal challenges or compliance reviews.
Mar 11, 2026: Commerce evaluation, BEAD funding notice, FTC policy statement.
Apr 10, 2026: States amend or suspend laws to retain funding.
Jun 9, 2026: FCC disclosure standard proceeding begins.
TBD: Legislative recommendation prepared.

Why This Matters for In House Lawyers

For in-house counsel, this EO signals a shift from multi-jurisdictional compliance toward a centralized federal standard. But remain skeptical. Congress has previously rejected similar preemption efforts, and states are unlikely to surrender easily. Litigation risk is real, and companies operating in states with aggressive AI laws should prepare for uncertainty.

Note the EO’s emphasis on First Amendment concerns. Laws requiring disclosure of AI model details or bias mitigation could be challenged as unconstitutional. This raises thorny questions about transparency obligations versus free speech protections.

Creative Compliance: Turning Risk into Strategy

Rather than viewing this EO as a compliance headache, consider it an opportunity to:

Audit State-Level Exposure: Map where your AI deployments intersect with state laws flagged as “onerous.”
Engage in Policy Advocacy: Your voice matters in shaping a balanced federal standard.
Revisit AI Governance Frameworks: Align internal policies with anticipated federal principles and current standards like ISO 42001 and NIST AI RMF to future-proof your compliance posture.

Bottom Line

The EO is a material change and the road ahead is anything but smooth. For lawyers, the challenge is clear: anticipate, adapt, and advocate. In this new era of AI regulation, those who can navigate the shifting sands of federal-state dynamics will lead.

A Governance Methodology for the Agentic Future

Andrew Clearwater — Wed, 03 Dec 2025 21:14:05 GMT

(Source: Figure 7 Foundations for AI agent evaluation and governance from World Economic Forum: AI Agents in Action: Foundations for Evaluation and Governance)

Overview

As artificial intelligence (AI) agents transition from experimental oddities to integrated collaborators, organizations face new challenges in effective governance. The World Economic Forum’s November 2025 white paper, developed in collaboration with Capgemini, provides a framework for the responsible adoption, evaluation, and governance of AI agents. This guidance is particularly timely as enterprises anticipate widespread agent deployment in the coming years, with 82% of surveyed organizations planning integration within one to three years. So, if you are focused on chatbots and haven’t seen agents in use yet, you will! The following gathers some of the best ideas from the paper. If these ideas are making sense and sounds helpful, it’s going to be worth reading the full 34-page pdf.

Technical Background

AI agents are evolving beyond static, rules-based software to dynamic, intent-driven systems powered by large language models (LLMs) and generative AI. Their architectures are made up of three interconnected layers: (1) application, (2) orchestration, and (3) reasoning. New protocols such as the Model Context Protocol (MCP) and Agent-to-Agent Protocol (A2A) facilitate seamless integration and interoperability across enterprise systems and multi-agent environments.

These advances introduce novel risks. This is where governance practices such as robust identity management, micro-segmentation, and continuous verification strategies, as well as treating every agent interaction as untrusted by default become important.

Classification

Stage 1 of 4 is using systematic classification of AI agents. Rather than focusing solely on modality or domain, organizations should assess agents by their function, role (specialist vs. generalist), predictability (deterministic vs. non-deterministic), autonomy, authority, and operational context. This multidimensional approach clarifies what an agent is designed to do, the scope of its decision-making, and the complexity of its environment.

The paper uses the example of robot vacuum cleaner (i.e. Roomba) which is a specialist agent with medium autonomy and low authority, operating in a moderately complex household environment. In contrast, a personal digital assistant may have broader authority and operate across multiple platforms, requiring more sophisticated governance.

Evaluation

Stage 2 of 4, is where evaluation frameworks come in. These are critical for building trust in agentic systems. Emerging benchmarks such as AgentBench and SWE-bench provide valuable signals, but organizations must contextualize evaluation to real-world workflows, measuring metrics like task success rate, completion time, error types, and user trust indicators.

Evaluation should be a continuous process, beginning with technical screening, progressing through controlled deployment, and culminating in full integration with ongoing monitoring. Collaboration between providers and adopters is essential to establish meaningful metrics and ensure agents operate safely and compliantly.

Risk Assessment

Stage 3 of 4 is where risk assessment links evaluation results to oversight. The process involves defining context, identifying risks, analyzing likelihood and impact, prioritizing risks, and implementing mitigation measures. Risks may include cybersecurity threats, safety hazards, legal and regulatory challenges, and stakeholder impacts. Anyone familiar with working under an ISO 27001 risk management system will be right at home with this process, but the type and diversity of risks will be new and challenging.

For instance, the paper uses the example of autonomous vehicles, risk assessment here would focus on failures in perception, decision-making, and control systems, with mitigation strategies might include sensor redundancy, anomaly detection, and real-time incident reporting. The goal is to ensure residual risk remains within acceptable boundaries.

Governance

Stage 4 of 4 brings in progressive governance approaches to scale safeguards in proportion to an agent’s autonomy, authority, and contextual complexity (or the amount of risk). Baseline mechanisms include least-privilege access control, legal and compliance checks, sandbox testing, monitoring and logging, human oversight, traceability, and explainability. As agents become more complex, governance must evolve to incorporate additional multi-layered systems of control and accountability.

New Risks and Opportunities

If not already true for you today, the near future will be full of multi-agent ecosystems. Agents interact across organizational and technical boundaries. This interconnectedness introduces new risks, including orchestration drift, semantic misalignment, security gaps, and cascading failures. Establishing interoperable standards and dedicated (automated) governance will be critical for scalable oversight (take a look at Table 2 on page 26 for a great starting point of key measures).

Key Takeaways

Responsible adoption of AI agents requires a structured approach to classification, evaluation, risk assessment, and governance.
Technical advances in agent architectures and protocols must be matched by advances in cybersecurity and oversight mechanisms.
Systematic classification clarifies agent roles and informs proportionate safeguards. It’s likely that what you use today is not providing enough context.
Continuous evaluation and risk assessment are essential for safe deployment.
Progressive governance frameworks should scale with agent complexity.
As multi-agent ecosystems emerge, organizations must invest in interoperable standards and governance.

Theres a lot of value to be unlocked by these technologies and the architecture of governance needs to evolve with these changes.

Guardians of the Digital Playground

Andrew Clearwater — Wed, 12 Nov 2025 16:03:13 GMT

Across the United States, states are advancing app store accountability laws that expand compliance obligations for developers, platforms, and digital marketplaces. These measures—enacted in Texas, Utah, Louisiana, and California—signal a shift toward state-level governance of digital distribution and parental consent mechanisms traditionally overseen by federal frameworks such as COPPA.

Expanding Definitions and Core Requirements

Each law defines “app store” broadly, extending well beyond mobile ecosystems to include gaming platforms, ebook marketplaces, and streaming services. Common obligations include:

Verifying user age and obtaining parental consent for minors
Linking parent and child accounts
Sharing data between developers and app stores
Notifying platforms of material app changes, such as monetization or data use shifts
Assigning age-based content ratings

The operational implications are significant. Developers must integrate new verification systems, rework consent flows, and maintain audit-ready documentation as state requirements diverge.

These state initiatives also intersect with federal privacy law. When apps or stores collect data indicating a user’s age, they may trigger obligations under COPPA, increasing the complexity of compliance and potential exposure to federal enforcement.

In response to the growing patchwork, lawmakers introduced the federal App Store Accountability Act, aiming to standardize obligations and reduce jurisdictional conflicts. Ongoing litigation tests First Amendment and commerce clause claims that could reshape how states regulate app ecosystems.

Operational Impacts

For counsel and compliance teams, the trend underscores a need to monitor both statutory developments and technical implementation guidance. Until a federal baseline emerges, businesses operating across states must plan for jurisdiction-specific compliance mapping, data governance redesign, and parent–child account architecture changes.

Conduct Jurisdictional Assessments: Identify which state laws apply to your app distribution model and user base. Track emerging legislation regularly.
Map Data Flows and Verification Processes: Audit how age, parental consent, and account-linking information are collected, stored, and shared among developers, app stores, and third parties.
Update Consent and Age-Gating Mechanisms: Implement flexible, modular systems that can be configured to satisfy varying state-specific requirements without codebase fragmentation.
Revise Privacy Policies and Disclosures: Ensure transparency with users, particularly regarding data collection related to minors, parental controls, and content ratings.
Prepare for Litigation Risk: Monitor ongoing legal challenges for potential impacts on enforcement and compliance obligations, especially around First Amendment and federal preemption issues.
Engage in Industry and Regulatory Dialogues: Participate in advocacy for federal harmonization and stay aligned with app store operators’ policy updates and compliance tools.

Looking Ahead

With additional states considering similar legislation companies face a rapidly evolving regulatory landscape. Businesses should closely monitor developments, assess their compliance strategies, and prepare for potential changes in both state and federal requirements.

Andrew Clearwater

Why 50+ AI Companies Just Agreed to Report Transparently

First: What Actually Changed in HAIP 2.0

Why Companies Are Committing

The Three Things HAIP 2.0 Still Doesn’t Do

1. It’s Retrospective, Not Prospective

2. There Is No Verification

3. It Doesn’t Cover the Most Consequential Decisions

What This Means If You’re Building AI Governance Infrastructure

The Bigger Picture: Where We Actually Are in AI Governance

Practical Takeaways

Primary Source Reading List for AI Governance Practitioners

The HAIP Framework — Active Portal

The Foundational Documents — What Organizations Are Actually Committing To

The Evidence Base — What Reporting Has Actually Produced So Far

The Upstream Standards — What HAIP Is Built On

The Launch Announcement

Safe Words vs. Safe Actions

There’s a hole in AI safety the size of your entire production environment

The two worlds of AI safety

What “Boiling the Frog” actually tests

Why this is actually your problem right now

The model isn’t the only safety layer

What you should actually be thinking about

The benchmark you didn’t know you needed

One more thing worth sitting with

Your Marketing Team Just Set Your AI Risk Classification

First, Let’s Be Clear About What These Guidelines Are (and Aren’t)

The Intended Purpose Doctrine: This Is the Whole Game

1. Broadly Described AI Systems Face a Default Presumption of High-Risk Coverage

2. The Self-Assessment Is the Provider’s Responsibility—But It Will Be Scrutinized

3. Name/Trademark Application and Other Third-Party Triggers

The Article 6(1) Safety Component Analysis: Two Tests, Not One

Prong 1: Safety Function (Intent-Based)

Prong 2: Failure or Malfunction Endangerment (Consequences-Based)

The Third-Party Conformity Assessment Requirement—and a Common Misread

The Article 6(2) Annex III Analysis: Eight Areas and the Issues Practitioners Will Miss

Human Oversight Does Not Change Your Classification

The Article 6(3) Filter Mechanism: Your Actual Escape Valve—and Its Limits

Agentic AI and Complex Systems: The Anti-Fragmentation Rule

The “On Behalf Of” Clause: B2B Providers Serving Public Sector Clients

The Timeline Has Shifted: What You Need to Know Right Now

Five Governance Process Changes You Should Make Based on These Guidelines

What to Do Right Now During the Consultation Period

Bottom Line

The EU Just Hit Snooze on AI Regulation

First, the Actual Changes (Fast)

Now, Here’s What Most People Are Missing

The Deeper Pattern Here

What You Should Be Doing Right Now

Bottom Line

Primary Sources

The AI Governance Stack Has Holes in It.

The Common Story (and Why It’s Wrong)

The Five-Stage Breakdown. And Where Each One Breaks.

Stage 1: Risk Planning. You Can’t Scope What You Can’t Define.

Stage 2: Risk Identification. You Can’t Find What You Don’t Know to Look For.

Stage 3: Risk Analysis. The Data You Have Isn’t the Data You Need.

Stage 4: Risk Evaluation. Accepting Risk You Haven’t Measured.

Stage 5: Risk Mitigation. The Controls You’re Relying On Are Fragile.

What This Means for Your Organization Right Now

The Honest Bottom Line

Primary Sources: Go Read These Yourself

Transparency Is the New Security Perimeter

The Framing Mistake Every Governance Team Is Making

Why Most Governance Teams Are Logging Wrong

The Unexpected Insights Governance Experts Need to Internalize

Your logs are plaintiffs’ evidence. Architect accordingly.

Logging is in direct tension with GDPR. Most teams are one DPA inquiry away from catastrophe.

Agentic AI detonates every logging architecture built for single-turn inference.

Human oversight logs are the next audit target.

Insurance will force the issue faster than regulators.

The standards split is itself an admission that transparency is unsolved.

Why the Split Matters More Than the Standards Themselves

What You Can Do Today

The Bottom Line

Further reading

You Need the Model to Fight the Model

The Paradox at the Heart of Everything

“To Protect Against the Model, You Need Access to the Model”