The Three Questions Every Board Is Asking About AI (And the Three It Should Be)

What Boards Get Wrong About AI ⊹ Blog ⊹ BN Digital

A Governance Gap That Looks Like Governance

Financial services boards are paying more attention to AI than at any previous point in the technology's commercial history. The agenda time has increased. The briefing materials have thickened. The non-executive directors who could not confidently define a large language model two years ago can now discuss AI risk in terms that satisfy the audit committee's expectation of engagement. By any observable measure, AI governance at board level has improved.

By the more important measure — whether the board's oversight of AI is actually reducing the firm's exposure to AI-related risk — the improvement is considerably more modest. Most boards are asking more questions about AI. They are not yet asking the questions that produce useful answers.

The gap is not about competence or engagement. It is about the questions themselves. The questions most boards are asking are the right ones for understanding the firm's strategic position relative to peers, its regulatory posture, and its investment levels. They are the wrong ones for understanding whether the AI systems currently deployed are performing safely, whether the governance infrastructure built around them is functioning, and whether the firm would know about an AI-related problem before it became a public event. These are different questions, and answering them requires different information.

The Three Questions Boards Are Asking

"Are we investing enough?" This question generates peer comparison data: what competitors are reporting in terms of AI investment, what industry surveys suggest about typical spend, where the firm sits in that distribution. The data is interesting and largely useless for governance purposes. AI investment figures reported externally are not comparable across firms — they capture different cost categories, include different proportions of infrastructure spend, and are often communicated with strategic positioning in mind rather than accounting precision. More fundamentally, investment levels tell a board nothing about deployment quality. The firm that has spent twice as much as its peers and deployed AI that performs poorly is not better governed than the firm that has spent half as much and deployed AI that performs well. The investment question is a strategy question. It is not a governance question.

"Are we compliant with the AI Act?" This question generates legal opinions. The legal opinions describe what steps have been taken toward compliance — which systems have been risk-classified, which documentation has been prepared, which assessments are underway. What the legal opinions do not typically address is whether the AI systems classified as compliant are continuing to behave in the ways that justified the classification, or whether the compliance assessments conducted at deployment remain valid as the systems operate in the real world. AI Act compliance is not a state that is achieved at deployment and maintained automatically. It is a condition that must be demonstrated continuously through ongoing monitoring, incident reporting, and periodic reassessment. The board that has received confirmation of initial compliance and treats this as ongoing compliance has misunderstood the regulatory framework it is being asked to govern.

"Are our competitors ahead of us?" This question generates competitive intelligence: public announcements from peer firms, conference presentations, press coverage of AI initiatives, analysis from sector consultants. It produces a picture of what competitors are saying about their AI ambitions, which differs considerably from what they are actually deploying. The observable AI landscape in financial services — the announcements, the pilots, the partnership deals — is not an accurate representation of the operational AI landscape, which is where the genuine competitive differentiation is accumulating. Boards oriented toward the observable landscape are governing against a reference point that does not accurately describe the competitive reality.

The Three Questions Boards Should Be Asking

"What are the error rates of AI systems currently in production, and are they within the parameters accepted at deployment?"

This question is more specific than anything currently on most board agendas, and its specificity is the point. It presupposes three things: that explicit error rate parameters were established before deployment; that monitoring infrastructure exists to measure actual error rates in production; and that someone has compared the actual rates against the accepted parameters and can report whether they remain aligned.
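For readers who want to see how small the underlying comparison is, the following is a minimal sketch in Python, not a description of any particular firm's tooling. The class name `DeploymentParameters`, the system name, and the figures (60 errors in 1,200 reviewed outputs against a 4% tolerance) are all hypothetical and chosen purely for illustration. The sketch assumes a sample of production outputs has been reviewed and labelled as correct or incorrect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentParameters:
    """Error-rate tolerance agreed before go-live (hypothetical structure)."""
    system_name: str
    max_error_rate: float  # e.g. 0.04 means up to 4% of outputs may be wrong


def observed_error_rate(errors: int, reviewed: int) -> float:
    """Share of reviewed production outputs that were judged incorrect."""
    if reviewed <= 0:
        raise ValueError("no reviewed outputs: the error rate is unknown")
    return errors / reviewed


def status_line(params: DeploymentParameters, errors: int, reviewed: int) -> str:
    """One-line status that a board pack could carry for a single system."""
    rate = observed_error_rate(errors, reviewed)
    verdict = "WITHIN" if rate <= params.max_error_rate else "OUTSIDE"
    return (f"{params.system_name}: observed {rate:.1%} vs accepted "
            f"{params.max_error_rate:.1%} -> {verdict} accepted parameters")


# Illustrative figures only: 60 errors found in 1,200 reviewed outputs.
params = DeploymentParameters("document-classifier", max_error_rate=0.04)
print(status_line(params, errors=60, reviewed=1200))
```

The arithmetic is trivial. The hard part is that each input corresponds to one of the three presuppositions: a tolerance that was written down, a reviewed sample that exists, and someone who ran the comparison. Note that the function refuses to report a rate when nothing has been reviewed, rather than defaulting to a reassuring zero.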

Asking the question is a test of all three presuppositions simultaneously. The management team that receives this question and returns a clear answer — here are the error rates, here are the parameters we accepted, here is the current status — has demonstrated the existence of a governance infrastructure that is genuinely operating. The management team that returns a reassuring but non-specific answer — our AI systems are performing well, we are satisfied with the quality — has demonstrated the absence of that infrastructure, which is itself the governance finding.

Most financial services firms deploying AI in production have not established explicit error rate parameters for those deployments. The systems were validated against test samples, found to perform acceptably, and deployed. The question of what error rate is acceptable in production, and what monitoring exists to detect when that rate is exceeded, was not asked before deployment and has not been answered since. This governance gap is invisible to boards because it is an absence of information rather than a problem in information that has been reported.

"How are we detecting AI system performance degradation between validation events?"

AI systems degrade in production in ways that conventional software does not. The model that performed within validated parameters at deployment may, six months later, be encountering a distribution of inputs that differs from the testing distribution — producing outputs that are technically within the model's capability but increasingly misaligned with what the validation was designed to ensure. This degradation can be gradual, invisible in any individual output, and significant in its cumulative effect on the quality of decisions the firm is making on the basis of AI-generated outputs.
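One common way to make this kind of input shift measurable is the Population Stability Index (PSI), which compares the share of inputs falling into each bucket at validation time with the share seen in production. The sketch below is an illustration under stated assumptions, not a recommended implementation: the function names, the bucket edges, and the sample values are hypothetical, and the frequently quoted rule of thumb (below roughly 0.1 stable, above roughly 0.25 a significant shift) is a convention that a firm would need to calibrate for its own systems.

```python
import math
from typing import Sequence


def bucket_shares(values: Sequence[float], edges: Sequence[float]) -> list[float]:
    """Fraction of values in each bucket defined by ascending bucket edges."""
    if not values:
        raise ValueError("cannot compute shares for an empty sample")
    counts = [0] * (len(edges) + 1)
    for v in values:
        # Bucket index = number of edges the value has reached or passed.
        counts[sum(1 for e in edges if v >= e)] += 1
    # Floor each share at a tiny value so empty buckets do not give log(0).
    return [max(c / len(values), 1e-6) for c in counts]


def population_stability_index(
    baseline: Sequence[float],
    production: Sequence[float],
    edges: Sequence[float],
) -> float:
    """PSI = sum over buckets of (prod - base) * ln(prod / base)."""
    base = bucket_shares(baseline, edges)
    prod = bucket_shares(production, edges)
    return sum((p - b) * math.log(p / b) for b, p in zip(base, prod))


# Hypothetical input feature, scaled to the range 0..1.
edges = [0.25, 0.5, 0.75]
validation_inputs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
production_inputs = [0.6, 0.7, 0.7, 0.8, 0.8, 0.85, 0.9, 0.9, 0.95]

psi = population_stability_index(validation_inputs, production_inputs, edges)
print(f"PSI = {psi:.2f}")
```

A PSI of zero means the production inputs are distributed across the buckets exactly as the validation inputs were. No individual production input in this example is unusual on its own; the shift only becomes visible when the distributions are compared, which is why this form of degradation escapes review of individual outputs.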

The monitoring infrastructure required to detect this degradation is not optional for a firm that intends to govern AI responsibly. It requires ongoing measurement of output quality against defined standards, automated alerting when quality metrics drift outside acceptable ranges, and a defined process for investigating and responding to detected degradation before it produces a material operational consequence.
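The three elements described above (ongoing measurement, an automated alert, and a record that supports investigation) can be sketched in a few lines. This is a deliberately simplified illustration: the class `QualityDriftAlert`, the rolling-window approach, and the threshold and window values are assumptions made for the example, and a production system would also need to address sampling, labelling delay, and statistical noise.

```python
from collections import deque


class QualityDriftAlert:
    """Rolling error-rate monitor with a fixed alert threshold (hypothetical)."""

    def __init__(self, threshold: float, window: int) -> None:
        self.threshold = threshold           # maximum tolerated error rate
        self.recent = deque(maxlen=window)   # most recent reviewed outcomes
        self.alerts: list[tuple[str, float]] = []  # audit trail of alerts

    def record(self, output_id: str, is_error: bool) -> bool:
        """Record one reviewed output; return True if an alert is raised."""
        self.recent.append(is_error)
        if len(self.recent) < self.recent.maxlen:
            return False  # window not yet full: too little evidence to alert
        rate = sum(self.recent) / len(self.recent)
        if rate > self.threshold:
            # Keep the triggering output and the rate for later investigation.
            self.alerts.append((output_id, rate))
            return True
        return False


# Illustrative run: alert when more than 20% of the last 5 outputs are wrong.
monitor = QualityDriftAlert(threshold=0.20, window=5)
outcomes = [False, False, False, False, False, True, True]
for i, is_error in enumerate(outcomes):
    if monitor.record(f"output-{i}", is_error):
        print(f"ALERT at output-{i}: rolling error rate above threshold")
```

The `alerts` list matters as much as the alert itself. It is the record of alerts raised, and with investigation notes attached it becomes the evidence that lets management answer the board's question with specifics rather than reassurance.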

Asking this question forces a concrete answer about what monitoring exists and what it has found. The management team that answers with reference to a specific monitoring framework, a defined alert threshold, and a record of alerts investigated is demonstrating operational governance. The management team that answers with reference to the vendor's support agreement is demonstrating something different.

"What would it take for an AI-related incident to reach us before it became a public event?"

This question is the governance equivalent of a fire drill. It tests whether the escalation pathways that the board presumably believes exist have actually been defined, communicated, and tested. The answer most boards receive, if they ask the question carefully enough to require a specific answer rather than a general reassurance, is that the escalation pathway has not been explicitly designed for AI incidents — that the existing technology incident response framework would apply, but that no one has tested whether that framework is calibrated to the failure modes of AI systems, which differ materially from the failure modes of conventional software.

The specific failure modes of AI systems that require tailored escalation pathways include: gradually degrading performance that produces no discrete incident but generates cumulative decision-making errors over time; statistically normal but individually consequential errors that occur within accepted error rate parameters; outputs that are technically within the model's specification but that interact with real-world conditions in ways the validation did not anticipate; and reputational events triggered not by system failure but by the public disclosure of AI use in contexts where clients or regulators expected human judgment.

None of these failure modes is well served by an escalation framework designed for system outages. All of them are predictable in advance. The board that has not asked whether the firm's incident response framework is calibrated to these failure modes has not governed the risk.

The Non-Executive Director Preparation Problem

Effective AI oversight at board level requires more than engagement. It requires the ability to evaluate management's answers to specific technical questions — to distinguish a response that demonstrates genuine governance capability from a response that is reassuring, well-delivered, and substantively empty.

This distinction requires a level of AI literacy that most non-executive directors have not developed, and that the preparation programmes available to them do not consistently produce. The one-day AI board training, the briefing from the chief technology officer, the consultant's presentation on AI risk — these provide the vocabulary for AI governance conversations without providing the depth to conduct them effectively. A non-executive who cannot articulate the difference between model validation and model monitoring, or who does not know what a performance drift alert is and why it matters, is not positioned to evaluate whether management's monitoring framework is functioning or merely described.

This is not a criticism of non-executive directors, who are generalists asked to govern an increasingly specialised operational landscape. It is an observation about the preparation infrastructure that boards need to build — and that most have not. The firms whose boards have specifically developed AI evaluation capability — through targeted technical education, through access to independent technical expertise rather than solely management briefing, and through structured evaluation of management AI reporting against external benchmarks — are exercising materially more effective oversight than those whose AI governance relies entirely on management's self-reporting.

The audit committee is the natural home for this capability, because AI system reliability is an operational risk question with direct financial exposure implications. The audit committee that has developed the technical depth to evaluate AI monitoring reports, to probe incident response capability, and to assess whether the firm's AI governance infrastructure is proportionate to its AI deployment exposure is doing the governance work that the question of AI risk actually requires.

What Good Board-Level AI Reporting Looks Like

The board report that supports effective AI governance is not the report that most firms currently produce. The current report typically covers: AI investment levels and project status; regulatory compliance posture; competitive positioning; strategic AI initiatives underway or planned. These are strategy and compliance inputs, not governance inputs.

The governance inputs are different. They are: performance data on AI systems currently in production, expressed against the parameters accepted at deployment; monitoring alert data (how many alerts were generated, what triggered them, and how they were investigated and resolved); incident data (any AI-related operational failures, near-misses, or performance degradation events that occurred in the reporting period); shadow AI detection data (any identified instances of AI tool use outside the approved framework); and a forward-looking assessment of material AI risks in the coming period, with the specific monitoring and mitigation measures in place.

This report does not exist in most financial services firms because the underlying monitoring infrastructure does not exist. Building the report requires building the infrastructure first — which is an investment in governance capability rather than AI capability, and one that most AI investment proposals do not include.

The firms that build this infrastructure are creating something more valuable than better board reporting. They are creating the operational capability to detect AI system failures before they become consequential — which reduces the probability of the high-visibility AI incident that no board wants to explain to regulators, clients, or journalists. The board that treats AI governance reporting as a compliance formality and the board that treats it as a genuine operational discipline are making different bets about whether the first major AI-related incident at their firm will be caught internally or discovered externally.

The latter discovery mode is, in the experience of firms that have been through it with other classes of operational risk, considerably more expensive.

The Independent Assurance Gap

Most financial services firms obtain independent assurance on their financial controls, their technology risk frameworks, and their operational resilience arrangements. The assurance function — whether internal audit, external auditor, or specialist third-party reviewer — provides the board with a perspective on governance that is independent of management's self-reporting, and that is specifically designed to identify gaps between documented governance and actual practice.

AI governance is not consistently subject to equivalent independent assurance. The board receives management reporting on AI activity, management assessment of AI compliance posture, and management assurance that AI governance frameworks are functioning as intended. The independent verification that would identify gaps between documented governance and operational practice — the equivalent function to what internal audit performs for financial controls — is largely absent.

This absence has a specific consequence: the board cannot know what it does not know about AI governance gaps, because the only source of information about those gaps is the management function that is also responsible for closing them. The AI policy that is formally documented but informally unenforced, the monitoring framework that is described in board papers but not fully operational in practice, the escalation pathway that is defined but untested — these gaps would be identified by an independent assurance function and are unlikely to be identified by management self-reporting.

Several large financial services firms have begun extending their internal audit scope to include AI governance specifically: assessing whether the monitoring infrastructure described in governance policies is actually operational, whether the documented escalation pathways have been tested, whether the error rate data being reported to senior management accurately reflects production AI system performance. The early findings of these extended audit programmes are, in most cases, not reassuring. The gap between documented AI governance and actual AI governance practice is larger than management reporting suggests — not because management is being misleading, but because the documentation has been produced faster than the operational infrastructure it describes.

The board that has not asked whether its AI governance is subject to independent assurance is making decisions about AI risk on the basis of information that has not been independently verified. This is an unusual position for a financial services board, accustomed as it is to requiring independent assurance on financial, operational, and technology risk. Extending that requirement to AI governance is not a new principle. It is the application of an existing principle to a new risk category.

The Question Nobody Is Asking

There is a fourth question that sits beneath all three of the questions boards should be asking, and that goes largely unasked: "Do we have the right governance framework for a system whose outputs are probabilistic by design?"

The entire edifice of financial services governance — board oversight, audit committee review, regulatory reporting, management accountability — was built for deterministic systems and deterministic processes. The payroll calculation is either right or wrong. The trade was either executed or not. The regulatory filing either meets the standard or it does not. Accountability structures, investigation processes, and remediation frameworks all assume that errors are discrete events with identifiable causes.

AI systems are different in kind. Their errors are statistical, not discrete. A model that produces incorrect outputs at a four percent rate is not malfunctioning. It is performing as designed, within a specified tolerance. The question of who is accountable for the specific output that falls within that four percent — the output that happened to be consequential — does not have a clean answer in governance frameworks built for deterministic systems. The answer requires a different kind of accountability structure: one that addresses the organisational decision to deploy a system with a specified error rate, the design of human oversight processes to manage the expected error volume, and the monitoring infrastructure to detect when actual errors exceed the expected rate.
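The arithmetic behind this can be made concrete. The sketch below takes the four percent rate from the paragraph above and assumes, purely for illustration, a volume of 10,000 outputs in a reporting period and independent errors. Under those assumptions the number of errors follows a binomial distribution. The function names and the observed counts are hypothetical.

```python
import math


def expected_errors(volume: int, error_rate: float) -> float:
    """Errors the firm has implicitly accepted at a given volume."""
    return volume * error_rate


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Each term is computed in log space so large n does not overflow.
    """
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log(1 - p))
        total += math.exp(log_pmf)
    return total


volume, accepted_rate = 10_000, 0.04
print(f"Expected errors: {expected_errors(volume, accepted_rate):.0f}")

# How surprising would a given observed error count be if the true
# error rate were still the accepted one?
for observed in (410, 460):
    tail = prob_at_least(observed, volume, accepted_rate)
    print(f"{observed} errors: probability of at least this many = {tail:.4f}")
```

Two governance points follow from the sketch. First, deploying at this rate and volume means accepting around 400 incorrect outputs per period as normal operation, so human oversight has to be sized for that volume. Second, a count modestly above 400 is unremarkable while a count far above it is very unlikely under the accepted rate, and that difference is what distinguishes a system operating within tolerance from one that has exceeded it.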

This is a governance framework that most financial services boards have not yet built. It exists in outline form in the emerging regulatory guidance: the FCA's model risk management guidance, the ECB's supervisory expectations for AI in banks, and the EU AI Act's requirements for high-risk AI system governance. The common thread across these frameworks is an expectation that the organisation can demonstrate ongoing evidence of AI system performance within validated parameters, not merely evidence that the system was validated before deployment. The boards that have understood this expectation and built their governance reporting to meet it are ahead of both the regulatory curve and the operational risk curve.

The organisations that have built this governance capability — specific error rate monitoring, defined alert thresholds, tested escalation pathways, and independent assurance on AI governance practice — report a secondary benefit that is less often discussed: internal confidence in AI deployment increases. Teams that know their AI systems are monitored, that performance degradation will be detected before it produces material operational consequences, and that the escalation pathway for AI incidents has been designed and tested are more willing to deploy AI in consequential contexts. The governance infrastructure is not just a risk management tool. It is an enabler of more ambitious and more operationally embedded AI deployment — which is where the strategic value lies.

The AI systems are in production. The framework will need to exist before the first material incident makes its absence visible. The boards that build it proactively will be better positioned than those that build it in response.
