Skip to main content
Governance11 min readPublished May 8, 2026

AI Governance Checklist for Operations Teams

Before your AI system touches a client file, a financial record, or a regulated workflow, there are 15 governance questions you need to answer: data flow, model allowlists, prompt versioning, audit logs, human-in-the-loop gates, and breach-notification posture. This checklist walks through each one.

Key takeaways

  • NIST AI 600-1 (July 2024) names four obligations: governance, content provenance, pre-deployment testing, and incident disclosure.
  • EU AI Act prohibitions and AI literacy requirements have applied since 2 February 2025; high-risk obligations bind from 2 August 2026.
  • ISO/IEC 42001:2023 is the first international AI management system standard. Certifications are now achievable.
  • BAA scope is product-specific. ChatGPT consumer is not covered; OpenAI API under a signed BAA is.
  • The minimum-viable governance pack is 5 documents and ships in a week. There is no excuse for not having it before production.

AI governance, in 2026, is no longer optional. The NIST AI Risk Management Framework has been published since January 2023, with a Generative AI Profile (NIST AI 600-1) since July 2024. The EU AI Act's prohibitions and AI literacy obligations have applied since 2 February 2025. ISO/IEC 42001 (the first international AI management system standard) was published in December 2023. HIPAA's audit-control requirements at 45 CFR 164.312(b) apply to AI systems handling PHI. The regulatory floor is established.

What is still optional, in practice, is whether a given firm chooses to operate above the floor. Two cases set the public benchmark for what happens when they do not. Moffatt v. Air Canada (B.C. Civil Resolution Tribunal, 2024) held the airline liable for negligent misrepresentation when its chatbot fabricated a bereavement-fare policy and rejected the argument that the chatbot was a separate legal entity. Mata v. Avianca (S.D.N.Y., 2023) sanctioned two attorneys $5,000 for filing a brief with fabricated citations generated by ChatGPT. The lesson in both: the deployer owns the output. Full stop.

Below are 15 governance questions, organised into five themes (Data, Model, Process, People, Incident), with a one-line minimum-viable answer for each. At the end is the five-document governance pack we ship with every Acme Consulting engagement.

Theme 1: Data

Q1. Where does PHI or PII enter the AI pipeline, and where does it leave?

Why it matters: data flow gaps are how regulated content ends up in third-party training corpora. The fix is documentation, not posture.

Minimum-viable answer: a one-page data flow diagram listing every system, vendor, region, and retention window the data touches (input, embeddings, prompts, outputs, logs).

Q2. Do you have a signed BAA with every AI vendor that touches PHI, and is the specific product or endpoint in scope?

Why it matters: BAA scope is product-specific. A vendor's BAA may cover one product line and not another. ChatGPT consumer is not covered; OpenAI API under a signed BAA is. NotebookLM is not covered by Google Workspace's BAA.

Minimum-viable answer: a vendor matrix listing product, endpoint, BAA execution date, and the named services in the BAA's covered list.

Q3. Is zero-data-retention or training-data exclusion enabled and contractually confirmed for every endpoint?

Why it matters: default API behaviour at major providers retains prompts for up to 30 days for abuse monitoring. ZDR is opt-in via enterprise agreement. If you have not explicitly arranged it, you do not have it.

Minimum-viable answer: ZDR confirmation email or contract clause filed per provider, plus a periodic API health check that asserts the expected retention behaviour.

Theme 2: Model

Q4. Is there a model allowlist with named versions, and a process to admit new models?

Why it matters: silent model upgrades change behaviour and break evaluations. 'GPT-4' is not a version. 'gpt-4.1-2025-04-14' is.

Minimum-viable answer: an allowlist file in version control with pinned snapshots, plus a one-page admission checklist (eval pass, BAA scope, cost ceiling, intended use cases) that a new model has to clear.

Q5. Are prompts versioned and tied to a commit hash or template ID stored with every output?

Why it matters: you cannot reproduce or defend an output if you cannot recover the exact prompt that produced it. Defence in litigation, regulatory inquiry, or an internal incident review all start here.

Minimum-viable answer: prompts in git, template ID logged on every call, change reviews required.

Q6. Have you run pre-deployment evaluations and adversarial red-team tests for your specific use case, not generic benchmarks?

Why it matters: NIST AI 600-1 names pre-deployment testing as one of four core obligations. Generic MMLU scores tell you nothing about your contracts workflow.

Minimum-viable answer: 50 to 100 task-specific evaluation cases with golden answers, plus a red-team pass covering prompt injection, jailbreak, PHI exfiltration, and hallucination on out-of-scope queries.

Theme 3: Process

Q7. What outputs require human-in-the-loop sign-off before they reach a client, patient, or regulator?

Why it matters: EU AI Act Article 14 requires effective human oversight for high-risk systems. The Air Canada and Mata cases both stem from removed humans.

Minimum-viable answer: a written list of decision types that cannot ship without a named human approver, with the approval logged.

Q8. Is every AI interaction captured in an audit log with the full reproducibility tuple?

Why it matters: HIPAA at 45 CFR 164.312(b) requires audit controls. Defending a bad output requires the ability to reconstruct it. 'We logged the question' is not enough.

Minimum-viable answer: logs containing input, output, model and version, prompt template ID, retrieval context IDs, evaluator scores, human reviewer and decision, timestamp, user and tenant. Retain six years for HIPAA-covered workflows.

Q9. Is there a data retention and deletion policy that applies to vector stores, caches, and logs, not just the source database?

Why it matters: embeddings are derivative PHI. Most teams forget the vector index when honouring deletion requests, and silently fail their own retention policy.

Minimum-viable answer: a retention table per data store (source, embedding index, prompt cache, output log, eval set) with deletion SLAs and a quarterly purge job.

Theme 4: People

Q10. Who owns AI risk, and do they have authority to halt a deployment?

Why it matters: ISO/IEC 42001 requires top-management commitment and a defined AI accountability structure. A committee is not an owner.

Minimum-viable answer: a named accountable executive (often COO or CISO), an AI review committee that meets monthly, and a documented kill-switch path.

Q11. Have staff who use AI on regulated workflows completed AI literacy training, and is it documented?

Why it matters: EU AI Act Article 4 requires AI literacy for staff operating AI systems, effective 2 February 2025.

Minimum-viable answer: a 30-minute training module covering acceptable use, prompt-injection awareness, what not to paste, and escalation. Signed acknowledgement retained.

Q12. Have you run vendor due diligence on AI providers (SOC 2, ISO 42001, sub-processor list, jurisdiction)?

Why it matters: a BAA is necessary but not sufficient. You still own breach liability when your vendor fails.

Minimum-viable answer: a short vendor questionnaire covering certifications, sub-processors, data residency, breach SLA, and termination data return.

Theme 5: Incident

Q13. Do you have an AI-specific incident classification taxonomy and a runbook?

Why it matters: a hallucinated diagnosis, a prompt injection that exfiltrates a record, and a model outage are three different incidents with three different responses.

Minimum-viable answer: 5 to 7 incident types (hallucination causing harm, PHI leakage, successful prompt injection, model regression, vendor outage), each with first-30-minute steps.

Q14. What is your breach-notification posture if an LLM vendor has an incident affecting your data?

Why it matters: HIPAA's Breach Notification Rule (45 CFR 164.400-414) requires notification within 60 days of discovery. You have to know what your vendor will tell you and when.

Minimum-viable answer: vendor notification SLA recorded in the BAA, plus an internal triage path that maps vendor incidents to your 60-day clock.

Q15. Do you log and review AI incidents, and feed lessons back into evaluations and prompts?

Why it matters: NIST AI 600-1 names incident disclosure as a core GenAI obligation. ISO 42001 demands continual improvement (Plan-Do-Check-Act). Without a feedback loop, you keep failing in the same ways.

Minimum-viable answer: a simple incident log, plus a monthly review that closes the loop by adding a regression evaluation case for every confirmed incident.

The minimum-viable governance pack (five documents, one week)

There is no good reason to be unable to produce the documents below before production. The pack covers all five themes above, can be assembled in a week, and gives auditors and risk committees everything they typically ask for in the first conversation.

  1. AI Use Policy (2 pages): approved use cases, prohibited inputs (PHI/PII rules), approved vendors and endpoints, escalation contacts.
  2. Vendor and BAA Register (spreadsheet): vendor, product, endpoint, BAA date, ZDR status, sub-processors, data residency, breach SLA.
  3. Model and Prompt Allowlist (markdown in git): pinned model versions, prompt template IDs with owners, change-review rule.
  4. Audit Log Specification and Retention Schedule (1 page): the nine-field log tuple (input, output, model, prompt ID, retrieval IDs, eval scores, reviewer, timestamp, tenant), retention windows per data store, deletion SLA.
  5. AI Incident Runbook (2 pages): five to seven incident types, first-30-minute actions, notification tree (internal, vendor, regulator), post-incident eval-addition requirement.

Why we ship the governance pack on day one

The five-document pack is a deliverable on every Acme Consulting engagement, alongside the evaluation set and the acceptance criteria. We ship it on day one because retrofitting governance after a pilot is in production is the most expensive, and most demoralising, work in AI consulting. The pack is small. The cases (Air Canada, Mata, the privilege-loss decisions of 2025 and 2026) are large. There is no version of this you can put off.

If you can answer all 15 questions in writing today, you are running a governed AI operation. If you cannot, the pack above is the shortest path to running one.

Sources

Want to talk through this?

Book a 30-minute discovery call

We will review your specific workflow against the framework above and tell you whether a Readiness Assessment would pay off. No pitch, no obligation.

Book a free discovery call →

Related guides

AI Governance Checklist for Operations Teams | Autra