On the twenty-ninth of January, 2025, the Bank for International Settlements published a document whose thirty-six pages commanded rather less public attention than the subject warranted. Titled "Governance of AI Adoption in Central Banks," it was the outcome of deliberations among member central banks of the Americas and proposed, with characteristic BIS diffidence, a framework of ten actions for the responsible deployment of artificial intelligence within the institutions responsible for price stability and financial order.1 The document was precise, measured, and procedural. It was also, read carefully, an admission of something that practitioners in the field have understood for some years but that official communiqués rarely acknowledge with such candour: machine learning models have arrived at the centre of monetary policy, and no one has yet worked out who bears responsibility when they are wrong.
This is not a small question. It is, in one sense, the central question of our age.
The Weight of Discretion
To appreciate the novelty of the present situation, one must recall what monetary policy was, not so long ago, in its essential character. When Walter Bagehot published "Lombard Street" in 1873, he described the Bank of England's management of the money market as an exercise of continuous, personal, and ultimately unjustifiable judgment.2 The Governor and his Court felt the state of credit the way a physician might feel a pulse; they could not reduce their diagnoses to a formula, and they did not pretend otherwise. The Bank Rate moved when the men in Threadneedle Street decided that it must, and accountability, such as it was, resided entirely in those men and in the institution they embodied.
The twentieth century institutionalised that discretion in various ways, producing committees, voting records, published minutes, and eventually the elaborate apparatus of inflation targeting that defines contemporary central banking. The Federal Open Market Committee's Summary of Economic Projections, published quarterly since 2007, represents the apotheosis of this tradition: a formal, named, individually attributed statement of what the members of the world's most powerful monetary institution believe about the future of prices, growth, and interest rates. Each participant is on record. Each projection can be compared against the outcome. When the record is poor, the record is visible.
What machine learning introduces is something categorically different from this tradition, and the difference is not merely technical.
What the Machines Now Do
The scope of algorithmic involvement in central bank operations has expanded with considerable speed over the past five years. At the Federal Reserve Bank of St. Louis, researchers have constructed large language models capable of generating conditional inflation forecasts that can be compared directly against the Survey of Professional Forecasters.3 At the ECB, staff routinely use machine learning to nowcast euro area inflation, web-scraping price data and applying models that capture non-linearities standard linear specifications cannot.4 The Bank of England, in its 2024 survey of artificial intelligence in UK financial services, found that seventy-five per cent of firms were already using some form of AI in their operations, with the Bank's own research division applying what it terms a fusion of machine learning with economic theory to identify non-linear drivers of consumer price dynamics that linear models, calibrated on data from 1997 to 2024, systematically miss.5
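The non-linearity point can be made concrete with a toy example. Everything below is invented for illustration and has nothing to do with the ECB's actual nowcasting models: a kinked relationship between a stand-in predictor and inflation, which a straight line cannot fit but which a single added hinge term captures.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 500)   # stand-in predictor (e.g. a scraped price-change series)
# Kinked relationship: the slope steepens above x = 0.5, plus observation noise
y = 0.5 * x + 2.5 * np.maximum(x - 0.5, 0) + 0.1 * rng.standard_normal(500)

# Linear specification: y ~ a + b*x
A_lin = np.column_stack([np.ones_like(x), x])
coef_lin, *_ = np.linalg.lstsq(A_lin, y, rcond=None)
err_lin = np.abs(A_lin @ coef_lin - y).mean()

# Same regression with a hinge term max(x - 0.5, 0), so the fit can bend at the kink
A_kink = np.column_stack([np.ones_like(x), x, np.maximum(x - 0.5, 0)])
coef_kink, *_ = np.linalg.lstsq(A_kink, y, rcond=None)
err_kink = np.abs(A_kink @ coef_kink - y).mean()

# The linear model's in-sample error should be markedly larger
print(err_lin, err_kink)
```

The hinge term is the simplest possible non-linear feature; neural networks and tree ensembles discover such bends automatically rather than requiring the analyst to specify the kink location in advance.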
The performance record is, in certain respects, impressive. An International Monetary Fund working paper published in September 2024, drawing on a broad range of machine learning approaches applied to inflation forecasting, found that penalised regression models, and in particular the LASSO specification, systematically outperformed the benchmark models during 2022 and 2023.6 More strikingly, the IMF authors observed that the forecasting errors of the LASSO model appeared smaller than those of professional forecasters in 2022-23, the two years in which the professional consensus, including the committees of the world's major central banks, was most spectacularly wrong. It is a finding that deserves to be read slowly.
As the preceding chart illustrates, the divergence between the FOMC's successive median projections and the actual path of inflation between 2021 and 2022 was not a matter of marginal error; it was a structural failure of the analytical framework the committee employed. The San Francisco Federal Reserve, in a November 2024 assessment of FOMC forecasting performance, found that the average forecast error in the 2020-24 period was approximately three times larger than in the pre-pandemic period, a deterioration affecting households, professional forecasters, and policymakers alike.7 The December 2020 Summary of Economic Projections placed the median expectation for 2021 PCE inflation at 1.8 per cent; the outturn was 5.5 per cent, a miss of 370 basis points for a projection made only twelve months prior.
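The basis-point arithmetic in the paragraph above can be checked in a couple of lines, using only the figures quoted in the text:

```python
# Illustrative arithmetic only, using the figures quoted in the text above.
projection = 1.8   # December 2020 SEP median for 2021 PCE inflation, per cent
outturn = 5.5      # realised 2021 PCE inflation as quoted, per cent

# One percentage point equals 100 basis points.
miss_bp = round((outturn - projection) * 100)
print(miss_bp)  # → 370
```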
The Failure, and Its Particular Character
It would be convenient, from the perspective of those now advocating greater algorithmic involvement in policy, to conclude from this record that the machines would have done better. The conclusion is partially true and substantially misleading.
The IMF's LASSO results, along with parallel work by the Central Bank of Brazil and others, demonstrate that certain machine learning architectures are capable of adapting more rapidly to shifting data relationships than classical vector autoregression or dynamic factor models. The capacity of neural networks to capture non-linear interactions between variables, and the agility of penalised regression in identifying which among a large set of candidate predictors carry genuine forecasting signal, confer genuine advantages in periods of structural change.6
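The mechanism behind the LASSO's variable-selection agility is the soft-thresholding step, which shrinks weak predictors exactly to zero. A minimal coordinate-descent sketch on synthetic data follows; the predictor count, true coefficients, and penalty value are illustrative assumptions, not drawn from the IMF paper:

```python
import numpy as np

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with predictor j's current contribution added back
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r / n
            # Soft-thresholding: coefficients below the penalty are set exactly to zero
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

rng = np.random.default_rng(0)
n, p = 400, 30                             # 400 observations, 30 candidate predictors
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[[0, 1, 2]] = [1.5, -1.0, 0.75]   # only three carry genuine signal
y = X @ true_beta + 0.5 * rng.standard_normal(n)

beta_hat = lasso_coordinate_descent(X, y, lam=0.1)
selected = np.flatnonzero(np.abs(beta_hat) > 1e-6)
print(selected)   # the sparse solution should recover roughly the true set
```

The practical appeal in a forecasting context is exactly this sparsity: when the set of candidate predictors is large and shifting, the penalty discards those that carry no signal in the recent data, which a fixed-specification linear model cannot do.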
But it bears emphasis that the machine learning literature on inflation forecasting is, in its most honest forms, careful about the limits of its claims. Models trained on pre-pandemic data had, by definition, no experience of pandemic-era demand and supply dislocations. The extraordinary fiscal transfers of 2020 and 2021, the supply chain distortions that followed, and the interaction of pent-up demand with constrained productive capacity represented a combination of shocks for which no historical training set could have prepared a statistical model, however sophisticated. The IMF paper's LASSO model performed well in 2022 and 2023 partly because, by that point, the data from 2020 and 2021 was available as inputs; the prospective challenge, to forecast the pandemic's inflationary consequences before they materialised, would have defeated the algorithm as thoroughly as it defeated the committee.
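The distinction between retrospective and prospective performance can be made concrete with a stylised pseudo-out-of-sample exercise. The series, the structural break, and the AR(1) forecaster below are all invented assumptions, not the IMF's setup; the point is only that a model fitted on pre-break data is blind to the break itself:

```python
import numpy as np

rng = np.random.default_rng(2)
# Stylised monthly inflation series with a structural break at t = 120
t = np.arange(160)
infl = np.where(t < 120, 2.0, 6.0) + 0.3 * rng.standard_normal(160)

def ar1_forecast(history):
    """One-step AR(1) forecast fitted by least squares on the history so far."""
    y_lag, y_now = history[:-1], history[1:]
    b, a = np.polyfit(y_lag, y_now, 1)
    return b * history[-1] + a

# Pseudo-out-of-sample: at each date, fit only on data available at that date
errors = np.array([infl[i] - ar1_forecast(infl[:i]) for i in range(100, 160)])
pre_break = np.abs(errors[:20]).mean()     # forecasts for t = 100..119
post_break = np.abs(errors[20:23]).mean()  # the first forecasts after the break
print(pre_break, post_break)
```

The first post-break errors dwarf the pre-break ones: no amount of fitting on the tranquil regime prepares the model for the jump, just as no pre-pandemic training set could have anticipated 2021.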
This is not a counsel of despair. It is a counsel of precision. The appropriate question is not "would the algorithm have made fewer errors?" but rather "given that errors are inevitable, who is accountable for them, and how does that accountability function when the decision-maker is a programme rather than a person?"
The Accountability Gap
Constitutional arrangements differ, but the general architecture of central bank accountability in advanced economies follows a recognisable pattern. The legislature delegates monetary authority to an independent institution. The institution is governed by a board or committee of appointed officials who are individually and collectively responsible for the decisions they make. When those decisions result in outcomes diverging substantially from the institution's mandate, the officials are subject to scrutiny: legislative hearings, published correspondence, the reputational discipline of the academic and professional community, and, in extremis, the possibility of removal.
When Jerome Powell testified before the Senate Banking Committee in March 2022, explaining why the Federal Reserve had described inflation as "transitory" for the better part of a year while prices rose at rates unseen in four decades, he was performing a function that the constitutional order requires, and that the constitutional order cannot perform through an algorithm. The committee members' questions were often ill-informed and occasionally absurd; the process was nonetheless essential. Accountability is not merely about accurate forecasting. It is about the capacity of democratic societies to interrogate the exercise of delegated power.
The BIS governance framework, to its credit, identifies this problem with some clarity, calling for clear accountability as one of its central principles and emphasising the need for traceability from data to decision.1 The ECB, in a speech delivered by a senior board member in July 2024, was similarly explicit about the institution's intention to keep human judgment at the forefront of policy decisions even as AI tools proliferate within its analytical infrastructure.4 The Bank of England's supervisory publication FS2/23, issued in October 2023, proposed that AI models used in capital adequacy calculations should be subject to the same validation standards as the internal ratings-based models mandated by Basel III, a requirement that implicitly acknowledges the accountability problem even as it addresses only one of its dimensions.8
What these frameworks have in common is a structural choice: human beings retain formal decision authority, and the machine's output is treated as an input to that human decision rather than as the decision itself. The FOMC votes. The Governing Council votes. The governor, in the Japanese case, retains the authority to break a deadlock. The algorithm advises; the committee decides; the committee is accountable.
This arrangement is sensible. It is also under pressure from a direction that the governance frameworks have not yet adequately addressed.
The figure above places these considerations in quantitative context. The LASSO model's mean absolute forecasting error in 2022-23 appears materially lower than that of the professional forecaster consensus during the same period, itself considerably smaller than the FOMC's own error rate over the full 2020-24 window. If this pattern were to persist, and there are structural reasons related to the scalability of data ingestion and the absence of institutional inertia that suggest it might, the political economy of central bank decision-making would face a question it has not confronted before: at what point does the superior forecasting performance of an algorithm make human decision authority, and human accountability, simply indefensible on grounds of outcomes?
What the Law Does Not Yet Know
The legal architecture of central bank accountability was constructed entirely without reference to algorithmic agents, and it shows. The Federal Reserve Act specifies that the members of the FOMC are responsible for open market operations; it is silent on the question of what happens when a member votes in accordance with a machine learning recommendation, in circumstances where the recommendation was mechanically derived from a process that no member fully understands.
A detailed treatment of this problem appeared in the Journal of Central Banking Law and Institutions, which described what its authors termed the AI paradox in central banking: institutions that deploy AI gain analytical power but simultaneously dilute the traceability that democratic accountability requires.9 The paper observed that civil liability principles, including organisational fault and potential product liability on the part of the model developer, apply in theory but that the field remains one where legal norms are substantially underdeveloped. This is, to put the matter charitably, an understatement of the problem. No court has yet adjudicated a claim arising from a central bank's algorithmic forecast. No legislature has yet amended its central bank statute to address the governance of machine-assisted policy.
The analogy that presents itself is to the episode of the Bank of England's gold reserve management in the years before the First World War. The Bank operated, then as now, under a framework designed for an earlier era of financial organisation, deploying instruments that had been devised for a system of bilateral credit rather than the increasingly securitised and international capital market that had grown up around it. The Bank was not negligent; it was operating in good faith within a framework that had not kept pace with the evolution of the system it was meant to govern. The result, as Liaquat Ahamed documented in his account of the interwar central banking disasters, was a series of decisions that were technically defensible under the prevailing framework and catastrophic in their consequences.10
A Counter-Argument Worth Taking Seriously
There is a case to be made that algorithmic decision-making is not less accountable than human committee deliberation but more so; and it is a case that deserves engagement rather than dismissal.
The objection runs as follows. A machine learning model, unlike a committee member, produces a recommendation that is in principle fully auditable. Every parameter, every training observation, every weighting can be examined. The ECB's compliance framework, with its emphasis on traceability from data to decision, reflects precisely this aspiration. A human governor, by contrast, may change his mind for reasons he cannot fully articulate, may be influenced by information not part of the formal record, and may make decisions under the influence of cognitive biases, career incentives, and institutional pressures that are neither documented nor visible. The black box complaint about machine learning can be directed with equal force, and considerably more empirical support, against the human committee.
This is a serious argument. It is also, ultimately, an argument about instrument properties rather than about accountability structures. Even if a machine learning model were fully auditable, which in the case of complex deep learning architectures it is not, auditability is not the same as accountability. Accountability requires not merely that a decision can be traced but that someone can be held responsible for it, in the sense of bearing consequences, answering to legitimate authority, and being subject to removal or sanction. A model cannot be held responsible; it can only be adjusted or decommissioned. The accountability for the model's deployment, and for the weight given to its recommendations, reverts inevitably to the human beings who chose to build it, deploy it, and follow it.
What to Watch
The BIS governance framework of January 2025 is not the end of this conversation but its opening move. The European Union's AI Act, as it extends its reach into financial services applications, will impose mandatory explainability requirements on AI systems used in credit decisions; whether analogous requirements will apply to the internal analytical tools of central banks, whose formal decisions remain in the hands of human committees, is a question that has not yet been definitively resolved.11 The Federal Reserve's draft supervisory guidance, circulated in early 2026, requiring banks over $100 billion in assets to maintain a centralised model inventory and report production AI systems quarterly, represents a precedent for the kind of transparency that, applied to the Fed's own internal modelling practices, would substantially change the information available to Congressional oversight.12
The Bank of Japan's experience is instructive as a cautionary tale about the limits of any framework, human or algorithmic, when confronted with genuinely novel conditions. The yield curve control policy, introduced in 2016 and subjected to successive forced modifications between 2022 and its effective abandonment in March 2024, represented precisely the kind of rule-based framework that its designers imagined would be transparent and accountable.13 The policy's rigidity, however, produced market distortions that accumulated until the BOJ was compelled to act against its stated intentions, widening the fluctuation band five times before Governor Ueda acknowledged that the risk of being forced to abandon the policy against the bank's will was, in his words, not zero. The lesson is not that rules are bad; it is that any framework, however well designed, can be overwhelmed by the complexity of the system it attempts to govern. Machine learning adds a new layer of complexity to that already complex system. It does not resolve it.
The Last Human Decision
Walter Bagehot, in a passage that students of monetary policy have been quoting for a century and a half, described the ideal central banker as one who should sit still and make no movement that is not necessary. The aphorism captures something essential: the virtue of restraint in the exercise of power, the wisdom of not mistaking activity for judgment. It is worth considering whether the present enthusiasm for algorithmic augmentation of monetary policy reflects the opposite tendency, a desire to substitute the appearance of rigour, quantified and optimised, for the more difficult and more valuable quality of wisdom that Bagehot had in mind.
Machine learning will improve central bank forecasting. It already has. It will also, in failing, fail differently from human committees, in ways that existing accountability frameworks are not designed to handle. The last human decision in monetary policy will not be the decision to set the rate; that decision may already be more machine-assisted than the published minutes suggest. The last human decision will be the decision to accept responsibility, before the legislature and before the public, for whatever the algorithm recommended and whatever consequences followed from following it.
That decision has no algorithmic solution. It has only the old-fashioned kind.