Canvas Medical SAIL 2026 · Puerto Rico

How We Govern Clinical AI

Governance is the continuous practice of monitoring, evaluating, iterating, and re-evaluating clinical AI systems throughout their operational lifecycle. It is distinct from point-in-time evaluation. It requires architectural choices made at design time, not compliance activities added after deployment.

Four Evaluation Dimensions

We integrate four evaluation dimensions within a single continuous system. While governance frameworks have been proposed in the literature, no published work has demonstrated operational governance of a deployed clinical AI agent with empirical evidence.

Dimension 1: Rubric Validation

Case-specific rubrics authored by expert clinicians encode what correct documentation should contain for each clinical encounter.
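A case-specific rubric can be represented as weighted criteria checked against the generated note. This is a minimal illustrative sketch, not Canvas Medical's implementation; the `Criterion` structure and weights are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    """One rubric item an expert clinician expects the documentation to contain."""
    description: str
    weight: float
    satisfied: bool = False

def score_rubric(criteria: list[Criterion]) -> float:
    """Weighted fraction of satisfied criteria, in [0, 1]."""
    total = sum(c.weight for c in criteria)
    met = sum(c.weight for c in criteria if c.satisfied)
    return met / total if total else 0.0

rubric = [
    Criterion("Chief complaint documented", 1.0, True),
    Criterion("Medication reconciliation noted", 2.0, True),
    Criterion("Follow-up plan specified", 1.0, False),
]
print(score_rubric(rubric))  # 0.75
```

Because the score is a single number per encounter, rubric results can be aggregated and compared across system versions.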

Dimension 2: Live Clinician Feedback

Failure detection in real use: clinicians report issues through in-workflow mechanisms during live patient encounters.
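An in-workflow report can be captured as a small structured event tied to the encounter. The field names and categories below are hypothetical, shown only to illustrate the shape of such a mechanism.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class FeedbackEvent:
    """One clinician-reported issue, attributed to a specific encounter."""
    encounter_id: str
    category: str  # e.g. "omission", "hallucinated-finding" (assumed taxonomy)
    note: str
    reported_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

reports: list[FeedbackEvent] = []

def report_issue(encounter_id: str, category: str, note: str) -> FeedbackEvent:
    """Record a report without interrupting the clinical workflow."""
    event = FeedbackEvent(encounter_id, category, note)
    reports.append(event)
    return event

report_issue("enc-001", "omission", "Allergy list missing from the note")
```

Structured events like this can later be joined against rubric scores and technical logs for the same encounter.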

Dimension 3: Technical Performance

Latency, failure rates, and reliability monitored through tiered logging architecture with per-stage attribution.
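Per-stage attribution means each pipeline stage records its own latency rather than only an end-to-end total. A minimal sketch of that idea, assuming named stages like "transcription" (the stage names are illustrative):

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Latency samples keyed by pipeline stage name.
stage_latencies: dict[str, list[float]] = defaultdict(list)

@contextmanager
def timed_stage(name: str):
    """Record wall-clock latency for one pipeline stage, even on failure."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_latencies[name].append(time.perf_counter() - start)

with timed_stage("transcription"):
    time.sleep(0.01)  # stand-in for real stage work
with timed_stage("note-generation"):
    time.sleep(0.01)

for stage, samples in stage_latencies.items():
    print(f"{stage}: {max(samples):.3f}s")
```

With samples keyed by stage, a latency regression surfaces at the stage that caused it instead of as an opaque end-to-end slowdown.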

Dimension 4: Cost Tracking

Economic sustainability of governance. Token attribution, compute costs, and clinician time tracked across every governance activity.
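Token attribution can be implemented as a running ledger keyed by governance activity. The prices and activity names below are hypothetical placeholders; real per-token prices vary by model and vendor.

```python
from collections import defaultdict

# Hypothetical per-1K-token prices (USD); not real vendor pricing.
PRICE_PER_1K = {"input": 0.003, "output": 0.015}

# Accumulated cost per governance activity.
costs: dict[str, float] = defaultdict(float)

def record_usage(activity: str, input_tokens: int, output_tokens: int) -> None:
    """Attribute token spend to the governance activity that incurred it."""
    costs[activity] += input_tokens / 1000 * PRICE_PER_1K["input"]
    costs[activity] += output_tokens / 1000 * PRICE_PER_1K["output"]

record_usage("rubric-evaluation", 12_000, 1_500)
record_usage("live-inference", 4_000, 800)
print(dict(costs))
```

The same ledger pattern extends to compute costs and clinician time by adding further rate tables.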

These dimensions are connected by controlled experimentation that gates every engineering change. Candidate system versions are tested against the full benchmark before deployment. No change ships without quantitative evidence.
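A gate that blocks a release without quantitative evidence can be stated in a few lines. This sketch assumes one rubric score per benchmark case and a "no regression on any case" policy; other gating policies (e.g. aggregate thresholds) are equally possible.

```python
def gate_release(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.0,
) -> bool:
    """Ship only if the candidate ran the full benchmark and regressed nowhere."""
    if candidate.keys() < baseline.keys():
        return False  # candidate must be scored on every benchmark case
    return all(
        candidate[case] >= baseline[case] - tolerance for case in baseline
    )

baseline = {"case-01": 0.82, "case-02": 0.91}
print(gate_release(baseline, {"case-01": 0.85, "case-02": 0.91}))  # True
print(gate_release(baseline, {"case-01": 0.85}))  # False: incomplete run
```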

Design Principles for Governability

Governability is an architectural property: the degree to which a system's design enables governance. These design choices, made during system development, determine whether governance is tractable.

Structured Outputs

Typed, schema-defined objects at every pipeline stage enable automated validation and cross-version comparison.
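A typed stage output rejects malformed model responses at the boundary instead of letting them propagate. The schema below is an invented example (field names and constraints are assumptions, not the system's actual schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AssessmentSection:
    """Schema for one pipeline stage's output; invalid objects never construct."""
    diagnosis: str
    icd10_code: str
    confidence: float

    def __post_init__(self):
        if not self.icd10_code:
            raise ValueError("icd10_code is required")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")

section = AssessmentSection("Essential hypertension", "I10", 0.92)
```

Because every stage emits the same typed objects across versions, outputs from two system versions can be compared field by field.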

Explicit Intermediate Reasoning

Inspectable reasoning layers enable failure attribution to specific pipeline stages.

EHR-Bounded Action Space

Constrained to predefined clinical actions validated against the patient chart.

Computable Performance Objective

Case-specific rubrics produce quantitative scores enabling controlled version comparison.
