Deployers managing multiple high-risk AI systems need a portfolio register, cross-system risk analysis, and quarterly executive reporting. Operator escalation guides, break-glass testing checklists, and seven key monitoring metrics with defined thresholds provide the operational framework for ongoing compliance.
Portfolio management for deployers of multiple high-risk AI systems requires a register tracking system identity, provider, risk classification, Annex III category, FRIA status, compliance record currency, monitoring status, and next review date. Cross-system risk analysis identifies patterns across the portfolio, including multiple systems from the same provider exhibiting similar issues, common data quality problems, and regulatory developments affecting entire categories. The operator escalation guide defines graduated responses, from immediate break-glass activation for imminent risk, through pattern-based escalation for suspected group unfairness, to observation logging for isolated anomalies. Break-glass testing verifies system halt within 30 seconds, preservation of pending cases for manual decision, provider notification within five minutes, AI Governance Lead alert within five minutes, DPO notification within 15 minutes where personal data is involved, and senior management briefing within one hour. Seven monitoring metrics provide operational oversight with defined thresholds: output distribution shift (investigate above 0.10), override rate (investigate potential automation bias below 2 per cent), review dwell time (investigate declines beyond 20 per cent), complaint volume (escalate on sustained increases above 50 per cent), error rate (escalate on increases above two percentage points), subgroup divergence (escalate when any subgroup exceeds 1.5 times the aggregate), and calibration case performance (retrain below 80 per cent).
Deployers operating multiple high-risk AI systems require portfolio-level governance that extends beyond managing each system individually. A portfolio register tracks all high-risk systems with their system identity and provider, risk classification and Annex III category, FRIA status, compliance record currency, monitoring status, and next review date. This register provides the AI Governance Lead with a single view of the organisation's AI compliance posture.
Cross-system risk analysis monitors for patterns across the portfolio that individual system monitoring would miss. Multiple systems from the same provider exhibiting similar issues may indicate a provider-level problem rather than system-specific deficiencies. Common data quality problems across several systems may point to a shared data source issue. Regulatory developments such as new guidance from the AI Office or enforcement actions against comparable systems may change the compliance posture of an entire category of systems simultaneously.
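A minimal sketch of what a portfolio register entry and a cross-system pattern check might look like in code. The field names, `RegisterEntry` class, and `providers_with_repeat_issues` function are illustrative assumptions, not a prescribed schema; the fields follow the register contents described above.

```python
# Illustrative portfolio register entry and cross-system pattern check.
# All names here are hypothetical, not part of any mandated schema.
from dataclasses import dataclass
from datetime import date
from collections import Counter

@dataclass
class RegisterEntry:
    system_id: str
    provider: str
    risk_class: str           # e.g. "high-risk"
    annex_iii_category: str
    fria_status: str          # e.g. "complete", "due"
    record_current: bool      # compliance record currency
    monitoring_status: str
    next_review: date

def providers_with_repeat_issues(entries, issues, threshold=2):
    """Flag providers linked to similar issues across multiple systems.

    `issues` maps system_id -> issue tag; `threshold` or more systems
    from the same provider sharing a tag suggests a provider-level
    problem rather than system-specific deficiencies.
    """
    provider_of = {e.system_id: e.provider for e in entries}
    by_provider = Counter()
    for system_id, tag in issues.items():
        by_provider[(provider_of[system_id], tag)] += 1
    return [key for key, n in by_provider.items() if n >= threshold]
```

Keeping the register as structured data rather than free text makes the cross-system checks and quarterly reporting described below straightforward to automate.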
The eight-module deployer compliance record maps each deployer obligation to a discrete documentation area with defined review frequencies. Module D1 covers system identification and provider reference including the Declaration of Conformity receipt, reviewed at onboarding and on provider updates. Module D2 covers intended purpose and deployment context, reviewed quarterly and on context changes. Module D3 contains the Fundamental Rights Impact Assessment under Article 27, reviewed annually and on material changes.
Module D4 documents human oversight arrangements, training records, and break-glass procedures under Articles 26(2), 14, and 4, reviewed quarterly and on operator changes. Module D5 covers monitoring, incidents, and provider communications under Articles 26(4), 26(5), and 73, with monitoring reviewed monthly and incidents tracked continuously. Module D6 addresses data protection including the DPIA, lawful basis, and data subject rights under GDPR, reviewed annually and on processing changes. Module D7 covers EU database registration for public authority deployers under Article 49(3). Module D8 documents the review schedule, Article 25 reassessment, and version history, reviewed quarterly.
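The eight-module structure above can be encoded as a simple lookup so that review scheduling can be automated. This is a hypothetical encoding of our own; the module scopes and frequencies follow the text, but the dictionary shape is an assumption.

```python
# Hypothetical encoding of the eight-module deployer compliance record.
# Scopes and review frequencies follow the text above; structure is ours.
REVIEW_SCHEDULE = {
    "D1": {"scope": "system identification, provider reference, DoC receipt",
           "review": "at onboarding and on provider updates"},
    "D2": {"scope": "intended purpose and deployment context",
           "review": "quarterly and on context changes"},
    "D3": {"scope": "Fundamental Rights Impact Assessment (Article 27)",
           "review": "annually and on material changes"},
    "D4": {"scope": "human oversight, training, break-glass (Arts 26(2), 14, 4)",
           "review": "quarterly and on operator changes"},
    "D5": {"scope": "monitoring, incidents, provider communications (Arts 26(4), 26(5), 73)",
           "review": "monitoring monthly; incidents continuously"},
    "D6": {"scope": "data protection: DPIA, lawful basis, data subject rights (GDPR)",
           "review": "annually and on processing changes"},
    "D7": {"scope": "EU database registration for public authorities (Art 49(3))",
           "review": None},  # frequency not specified in the text above
    "D8": {"scope": "review schedule, Article 25 reassessment, version history",
           "review": "quarterly"},
}
```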
Operators at Level 2 of the oversight pyramid need a clear, simple decision framework for responding to unexpected system behaviour. If anyone is at immediate risk, the operator activates the break-glass procedure immediately without waiting for approval or further analysis.
Seven monitoring metrics provide the quantitative foundation for deployer-level oversight, each with threshold guidance and escalation paths through the oversight pyramid.
Output distribution shift measured by Population Stability Index should trigger investigation above 0.10 and escalation above 0.25, routed from Level 1 to Level 3. Override rate should trigger investigation below 2 per cent for potential automation bias or above 30 per cent for potential calibration issues, escalating from Level 2 to Level 3. Review dwell time should trigger investigation when average time per case declines more than 20 per cent from the baseline. Complaint volume should trigger escalation on sustained increases above 50 per cent from Level 3 to Level 4.
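The Population Stability Index mentioned above has a standard formulation: for matched output bins, sum the difference in bin proportions multiplied by the log of their ratio. A minimal sketch, with the 0.10 and 0.25 thresholds from the text applied as comments:

```python
import math

def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """Standard PSI over matched output bins.

    Interpretation per the thresholds above:
      < 0.10         stable
      0.10 to 0.25   trigger investigation
      > 0.25         escalate (Level 1 -> Level 3)
    """
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Clamp proportions to avoid log(0) on empty bins.
        e_pct = max(e / e_total, eps)
        a_pct = max(a / a_total, eps)
        psi += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return psi
```

Identical baseline and current distributions yield a PSI of zero; the larger the drift between them, the larger the index.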
The break-glass test checklist should be executed regularly with results documented in Module D4. The checklist verifies system halt, case preservation, notification chains, and documentation completeness.
Override rates persistently below 2 per cent suggest operators are accepting outputs without adequate scrutiny. Investigation should determine whether operators have the training, tools, and incentives to exercise genuine oversight.
Quarterly executive reporting covers the number and classification of systems, aggregate compliance status, approaching review deadlines, open issues, resource constraints, and regulatory developments affecting the portfolio.
Deployers manage multiple high-risk systems through a portfolio register tracking all systems, cross-system risk analysis for common patterns, and quarterly executive reporting on compliance status and resource constraints.
Operators respond on a graduated scale: immediate risk triggers break-glass activation; patterns of unfair treatment escalate to management and the AI Governance Lead; poor performance escalates with examples; isolated observations are logged for monitoring.
Break-glass testing verifies system halt within 30 seconds, case preservation, provider notification within five minutes, AI Governance Lead alert within five minutes, and senior management briefing within one hour.
Seven metrics anchor monitoring: output distribution shift, override rate, review dwell time, complaint volume, error rate, subgroup divergence, and calibration case performance, each with defined escalation thresholds.
Quarterly portfolio reporting to executive leadership summarises the number and classification of systems in the portfolio, the aggregate compliance status across all systems, approaching review deadlines that require resource allocation, open issues requiring attention or escalation, resource constraints affecting the organisation's ability to maintain compliance, and regulatory developments that may affect the portfolio.
If no one is at immediate risk, the operator assesses whether the observation is a single unusual output or a pattern. A single unexpected output should be documented, the operator should apply professional judgement, and the observation should be logged for monitoring. A pattern of unusual outputs requires further assessment: if the pattern suggests unfair treatment of a particular group, the operator escalates to their manager and the AI Governance Lead immediately. If the system appears to be performing poorly or behaving differently from its documented purpose without a fairness dimension, the operator escalates to their manager with documented examples. Observations that do not fit these categories are logged and monitored for recurrence.
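The operator decision framework above is a small decision tree and can be sketched as such. The function name, parameters, and returned action strings are illustrative assumptions; the branching logic follows the text.

```python
def triage(immediate_risk: bool, pattern: bool,
           fairness_concern: bool = False,
           performance_concern: bool = False) -> str:
    """Sketch of the Level 2 operator decision framework.

    Branch order mirrors the text: immediate risk first, then the
    single-output vs pattern distinction, then the fairness dimension.
    Action strings are illustrative, not prescribed wording.
    """
    if immediate_risk:
        return "activate break-glass immediately"
    if not pattern:
        return "document output, apply judgement, log for monitoring"
    if fairness_concern:
        return "escalate to manager and AI Governance Lead immediately"
    if performance_concern:
        return "escalate to manager with documented examples"
    return "log and monitor for recurrence"
```

Encoding the framework this way also makes it easy to exercise in operator training: known scenarios can be replayed against the tree and compared with the operator's actual response.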
Error rate should trigger escalation when the rate increases more than 2 percentage points, from Level 1 to Level 3. Subgroup divergence, measuring error rate differences across protected characteristic subgroups, should trigger escalation when any subgroup exceeds 1.5 times the aggregate rate, routing directly to Level 4. Calibration case performance measuring operator accuracy on known-answer cases should trigger retraining below 80 per cent and suspension below 60 per cent.
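The remaining three thresholds can be checked mechanically. A minimal sketch, assuming error-rate change is measured in percentage points and calibration score as a fraction; function and parameter names are ours:

```python
def metric_alerts(error_rate_delta_pp: float,
                  subgroup_rates: dict,
                  aggregate_rate: float,
                  calibration_score: float) -> list:
    """Apply the escalation thresholds described above (illustrative).

    error_rate_delta_pp : error rate increase in percentage points
    subgroup_rates      : error rate per protected-characteristic subgroup
    aggregate_rate      : error rate across all cases
    calibration_score   : operator accuracy on known-answer cases (0-1)
    """
    alerts = []
    if error_rate_delta_pp > 2.0:
        alerts.append("error rate: escalate Level 1 -> Level 3")
    if any(r > 1.5 * aggregate_rate for r in subgroup_rates.values()):
        alerts.append("subgroup divergence: escalate directly to Level 4")
    if calibration_score < 0.60:
        alerts.append("calibration: suspend operator")
    elif calibration_score < 0.80:
        alerts.append("calibration: retraining required")
    return alerts
```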
Break-glass procedures should be tested annually against a defined checklist: system halts within 30 seconds, pending cases are queued for manual decision rather than lost, provider and AI Governance Lead are notified within 5 minutes, the system remains halted until criteria are met and restart is authorised, and all records are documented in Module D4.
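The checklist above lends itself to an automated pass/fail check over measured test results. A sketch, assuming timings are recorded in seconds and results are collected into a dict whose keys are our own invention:

```python
def breakglass_test_passed(results: dict) -> bool:
    """Check annual break-glass test results against the checklist above.

    `results` is a hypothetical record of measured outcomes; the timing
    limits (30 s halt, 5 min notifications) are those stated in the text.
    """
    return (results["halt_seconds"] <= 30
            and results["cases_preserved"]            # queued, not lost
            and results["provider_notified_seconds"] <= 300
            and results["lead_notified_seconds"] <= 300
            and results["restart_authorised"]         # halt held until criteria met
            and results["documented_in_D4"])
```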
CTO of Standard Intelligence. Leads platform engineering and contributes to the PIG series technical content.