There is a phrase that has become the security blanket of every bank board presentation and every regulatory submission concerning artificial intelligence in finance: human in the loop. It sounds reassuring. It implies oversight, accountability, the steady hand of experienced judgement guiding the machine. We have heard it invoked in Brussels, in Basel, in the boardrooms of Canary Wharf and Wall Street alike. It is, in our view, becoming dangerously misleading, not because the principle is wrong but because the phrase has decoupled from any operational meaning, and that decoupling is now the most important unacknowledged risk in how the financial sector governs its own adoption of automated decision-making.

The difficulty is not one of principle but of practice. Human oversight is, in the abstract, the correct response to the deployment of consequential automated systems. The question is what "human in the loop" means when a credit scoring system processes hundreds of thousands of applications per day, when a trading algorithm executes in microseconds, when a fraud detection system evaluates every card transaction in the payment network in real time. The honest answer, which almost no institution is willing to articulate in its regulatory submissions, is that "human in the loop" at this scale means something categorically different from "human in the loop" at the scale at which the phrase was originally meaningful: the loan officer reviewing an individual credit file, the compliance officer reading a suspicious transaction report, the risk manager studying a position report. The humans are still there. They are no longer reviewing individual decisions. They are reviewing processes, outputs, and exceptions, which is an entirely different cognitive task with entirely different failure modes.
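
To make the scale problem concrete, a back-of-envelope calculation is enough; the volume and review-time figures below are illustrative assumptions of our own, not any institution's reported numbers.

```python
# Back-of-envelope: what "a human reviews every decision" implies at scale.
# All figures are illustrative assumptions.

applications_per_day = 500_000      # assumed daily credit-application volume
minutes_per_review = 5              # assumed time for one meaningful human review
working_minutes_per_day = 8 * 60    # one reviewer's working day

reviewer_days_needed = applications_per_day * minutes_per_review / working_minutes_per_day
print(f"Full review requires ~{reviewer_days_needed:,.0f} reviewer-days per day")
# -> ~5,208 reviewer-days per day: a standing staff of more than five
#    thousand people doing nothing but case-by-case review.
```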

The European Union's AI Act, which entered into force in August 2024, requires human oversight for high-risk AI systems [1]. Credit scoring of natural persons falls squarely within the high-risk category; fraud detection, notably, is expressly carved out of that listing, and AI-assisted investment advice sits largely outside it, which makes credit scoring the cleanest test of what the oversight requirement means in practice. Article 14 of the Act specifies that high-risk AI systems must be designed and developed in such a way that natural persons can "duly monitor" their operation and intervene where necessary [1]. This is a genuine requirement, seriously intended, and we do not dispute its intent. What we do dispute is the interpretation that has already emerged in implementation: "human in the loop" rendered as exception handling, audit sampling, and periodic model performance review, with the human's role defined entirely by what the system flags for human attention. The human is not in the loop. The human is at the end of the loop, reviewing whatever the system decided merited review, in a role whose scope and depth are defined by the system's own logic. That is not oversight. That is ratification with extra steps.

JPMorgan's COiN (Contract Intelligence) programme, deployed to review commercial credit agreements, is one of the more honestly described examples. The system reviews in seconds documents whose manual review previously consumed an estimated 360,000 hours of legal work each year [2]. The human lawyer who "oversees" the COiN review does not read the documents that COiN has processed. She reviews COiN's outputs, exceptions, and flagged items. The cognitive task she performs is entirely different from the task she performed before the system existed, and the validity of the oversight claim depends entirely on the quality of COiN's own judgement about what to flag and what to pass. If the system has a systematic blind spot, the human reviewer who sees only what the system surfaces cannot detect it. The human is in the loop only insofar as the loop includes her; the boundary of that inclusion is set by the machine.
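
The structural point, that a reviewer who sees only flagged items can never observe what the system silently passes, can be stated in a few lines of code. This is a schematic sketch, not a description of COiN's architecture; the flagging logic, the `cases` data, and the blind-spot condition are all invented for illustration.

```python
import random

def system_flags(case: dict) -> bool:
    """Hypothetical flagging logic with a systematic blind spot:
    it never flags one particular contract type."""
    if case["contract_type"] == "cross-border":   # the blind spot
        return False
    return case["anomaly_score"] > 0.8

cases = [
    {"contract_type": random.choice(["domestic", "cross-border"]),
     "anomaly_score": random.random()}
    for _ in range(100_000)
]

# Exception review: the human's universe is exactly what the system surfaces.
human_queue = [c for c in cases if system_flags(c)]
print(any(c["contract_type"] == "cross-border" for c in human_queue))  # False, always

# Statistical oversight: the sample is drawn independently of the system's
# flags, so the blind spot is at least observable in principle.
audit_sample = random.sample(cases, 500)
print(any(c["contract_type"] == "cross-border" for c in audit_sample))  # True, almost surely
```

No amount of diligence in the first queue detects the blind spot; only a sample drawn outside the system's own logic can.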

What the Regulation Actually Requires

The regulatory frameworks that govern AI in financial services are, in their technical requirements, more sophisticated than their critics give them credit for. The European Banking Authority's guidelines on internal governance [3] and the Basel Committee's principles for operational resilience [4] both contemplate the possibility that human oversight of automated systems requires fundamentally different organisational structures from human oversight of human decision-making. The problem is the gap between these technical requirements and the practical implementation they receive. Banks routinely satisfy the letter of oversight requirements by appointing a "model risk manager" whose remit includes AI systems, by establishing an AI ethics committee that meets quarterly, and by implementing audit logging of system decisions. None of these mechanisms, individually or in combination, constitutes the kind of ongoing, operationally embedded oversight that would detect a systematic bias in a credit scoring model before that bias propagates across hundreds of thousands of decisions.

The Federal Reserve's SR 11-7 guidance on model risk management, issued in 2011 [5], is the most substantive attempt by a major regulator to specify what meaningful oversight actually requires: independent model validation, ongoing performance monitoring against defined benchmarks, clear escalation procedures for model degradation, and explicit limits on model autonomy that trigger human review at defined thresholds. These are genuine requirements, and institutions that meet them are closer to actual oversight than institutions that do not. But SR 11-7 was written for models that a trained quantitative analyst can read, understand, and challenge. The guidance has not yet been updated for the generation of machine-learning models whose internal representations are not interpretable even to their developers, and whose behaviour in edge cases cannot be reliably predicted from their training performance. The gap between the regulatory standard and the actual architecture of deployed systems is widening, and the phrase "human in the loop" is filling that gap without closing it.
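
In practice, "defined thresholds" of the kind SR 11-7 contemplates usually reduce to a monitoring job of roughly the following shape. This is a minimal sketch under assumed metric names and threshold values; in a real institution the limits come from the model's validation report and the actions feed a formal escalation workflow.

```python
from dataclasses import dataclass

@dataclass
class Threshold:
    metric: str
    warn: float      # triggers review by the model owner
    halt: float      # triggers escalation and suspension of model autonomy

# Illustrative limits only; real values are set at validation.
THRESHOLDS = [
    Threshold("auc_drop_vs_benchmark", warn=0.02, halt=0.05),
    Threshold("population_stability_index", warn=0.10, halt=0.25),
    Threshold("approval_rate_shift", warn=0.03, halt=0.08),
]

def evaluate(observed: dict[str, float]) -> list[tuple[str, str]]:
    """Compare observed monitoring metrics against the defined limits and
    return the escalation actions the oversight policy requires."""
    actions = []
    for t in THRESHOLDS:
        value = observed.get(t.metric, 0.0)
        if value >= t.halt:
            actions.append((t.metric, "HALT: suspend autonomous decisions, escalate to validation"))
        elif value >= t.warn:
            actions.append((t.metric, "WARN: route to model owner for review"))
    return actions

print(evaluate({"auc_drop_vs_benchmark": 0.03, "population_stability_index": 0.31}))
```

Note what such a job presupposes: that degradation shows up in the metrics someone thought to define. A model whose failure mode was not anticipated at validation sails through every threshold.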

A Clearer Framework

We would propose that the financial services industry adopt, and regulators enforce, a more precise taxonomy than the binary human/automated distinction that currently dominates the conversation. There is a meaningful difference between decisions where a human reviews every case before action (full review), decisions where a human reviews a statistically representative sample of cases and has authority to halt the system if the sample reveals problems (statistical oversight), decisions where a human reviews flagged exceptions and the system operates autonomously on all other cases (exception review), and decisions where the system operates fully autonomously with retrospective human audit only (retrospective audit). These are categorically different levels of oversight. They have different failure mode profiles, different regulatory implications, and different accountability structures. Calling all of them "human in the loop" obscures the distinctions that matter most for understanding what can go wrong and who is responsible when it does.
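
The taxonomy is precise enough to be written down as configuration rather than prose. What follows is a minimal sketch, with invented system names and a hypothetical `DEPLOYED_SYSTEMS` register, of how an institution might declare the oversight regime that actually governs each system, instead of hiding all four behind the generic phrase.

```python
from enum import Enum

class OversightLevel(Enum):
    FULL_REVIEW = "human reviews every case before action"
    STATISTICAL_OVERSIGHT = "human reviews a representative sample, may halt the system"
    EXCEPTION_REVIEW = "human reviews flagged exceptions; all else is autonomous"
    RETROSPECTIVE_AUDIT = "fully autonomous; human audit after the fact"

# Hypothetical register: each deployed system states, explicitly and
# auditably, which of the four regimes governs it.
DEPLOYED_SYSTEMS = {
    "retail_credit_scoring": OversightLevel.EXCEPTION_REVIEW,
    "card_fraud_screening": OversightLevel.RETROSPECTIVE_AUDIT,
    "commercial_contract_review": OversightLevel.STATISTICAL_OVERSIGHT,
}

def pre_action_human_gate(level: OversightLevel, flagged: bool) -> bool:
    """Return True if a human must act on this case before the
    decision takes effect, under the declared oversight regime."""
    if level is OversightLevel.FULL_REVIEW:
        return True
    if level is OversightLevel.EXCEPTION_REVIEW:
        return flagged
    return False  # statistical oversight and retrospective audit act after the fact

level = DEPLOYED_SYSTEMS["retail_credit_scoring"]
print(pre_action_human_gate(level, flagged=False))  # False: the case never reaches a human
```

The value of the declaration is not the code but the admission it forces: under three of the four regimes, an unflagged case is decided without any human seeing it before it takes effect.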

The financial sector will not resolve the practical tension between the scale of automated decision-making and the capacity for meaningful human review by invoking a phrase that papers over the difficulty. The regulatory frameworks that govern this space are moving toward greater specificity, as they should. What is required of the industry is an equivalent specificity in its own governance: an honest account of what human oversight of a given system actually consists of, what it can and cannot detect, and what the residual risks are that fall outside the scope of any oversight mechanism that is consistent with operating at scale. The alternative is a regime of comfortable fictions, which will endure precisely until the first serious failure of a consequential automated system whose operators can credibly claim they had a human in the loop, without any of the accountability that the phrase is understood to imply.

References
  1. European Parliament and Council. "Regulation (EU) 2024/1689 on Artificial Intelligence (AI Act), Article 14: Human Oversight." Official Journal of the European Union. 12 July 2024. eur-lex.europa.eu
  2. JPMorgan Chase. "How JPMorgan Chase uses AI." JPMorgan Chase Technology. 2024. jpmorganchase.com
  3. European Banking Authority. "Guidelines on Internal Governance (EBA/GL/2021/05)." European Banking Authority. 2 July 2021. eba.europa.eu
  4. Basel Committee on Banking Supervision. "Principles for Operational Resilience." Bank for International Settlements. March 2021. bis.org
  5. Federal Reserve / OCC. "SR 11-7: Supervisory Guidance on Model Risk Management." Federal Reserve Board of Governors. April 2011. federalreserve.gov