Most firms have a plan for a data centre outage.
They have plans for cyber incidents, supplier disruption, payment failures, cloud downtime, office closures and system recovery.
But how many have a tested plan for an AI model outage?
That question is becoming more important as AI moves from experimentation into the operating fabric of organisations. In many firms, AI is no longer just being used to draft emails, summarise documents or support research. It is being considered, piloted or deployed in claims handling, compliance monitoring, customer support, fraud detection, legal review, cyber triage, software engineering and operational decision-making.
At that point, AI stops being a productivity tool.
It becomes part of the operating model.
And once something becomes part of the operating model, its failure is not merely a technology inconvenience. It is a business continuity, operational resilience and governance issue.
The uncomfortable question for boards and risk leaders is simple:
If a critical AI model goes down for seven hours, what happens to the process it now supports?
AI dependency is becoming an operational risk
Recent events have made this question harder to ignore.
Our RiskBusiness Newsflashes highlighted that in March 2026, Anthropic’s Claude chatbot and related consumer-facing applications experienced disruption, with nearly 2,000 users reporting service issues at the peak and the company pointing to “unprecedented demand” for Claude over the previous week. The report noted that businesses integrated with Claude’s models were unaffected, which is an important distinction. But the event still illustrates a wider point: AI services can and do experience disruption.
Later that month, DeepSeek suffered what was reported as its biggest outage since launch, with the disruption lasting more than seven hours and multiple updates required before performance issues were resolved. The cause was not clear at the time of reporting.
Then, in April 2026, OpenAI confirmed a major outage affecting ChatGPT, Codex and the API Platform, with impacted users unable to access the services while the issue was investigated.
Taken individually, these incidents may look like ordinary service disruptions. Systems go down. Platforms experience capacity issues. Demand surges. Providers investigate and restore service.
But the risk changes when firms begin building operational dependency around those systems.
If AI is assisting a team, an outage is inconvenient.
If AI is embedded inside a critical workflow, an outage can create a business interruption.
If AI has replaced capacity, removed manual knowledge, or become the primary route through which decisions are triaged, escalated or processed, the outage becomes something more serious.
It creates an operational vacuum.
The business continuity question is changing
Imagine an insurance firm that has materially reduced its claims assessment team because an AI-enabled process can now triage, summarise and recommend decisions at speed.
On a normal day, that may look efficient.
But what happens if the model is unavailable for half a day?
Can the firm simply bring back the old operating model at short notice? Are the people still there? Is the process documented? Does the remaining team still have the knowledge, capacity and authority to operate manually? Can claims be queued without creating customer harm? Can decisions be paused without regulatory consequences? Can the firm explain to customers, regulators and internal stakeholders what has happened?
This is where the AI efficiency narrative often becomes too narrow.
Many firms are asking: how much time can AI save?
Fewer are asking: what capacity, knowledge and control do we still need when AI is unavailable?
That is the resilience question.
The issue is not whether AI should be used. It clearly has value. The issue is whether firms are treating AI dependency with the same seriousness they would apply to any other critical system, outsourced service or operational control.
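For firms that do keep AI inside a workflow such as claims triage, the practical question is whether a manual path was designed in from the start. The sketch below is purely illustrative and assumes a hypothetical setup: the model call, the queue and the field names are invented, not any vendor's API. It simply shows the difference between a workflow that blocks when the model is unavailable and one that degrades into a manual queue.

```python
import queue

# Hypothetical stand-in for the firm's real work-allocation queue.
manual_review_queue = queue.Queue()

def call_triage_model(claim: dict) -> dict:
    """Placeholder for a call to an external AI service.
    Here it always raises, to simulate the provider being unavailable."""
    raise TimeoutError("model endpoint unreachable")

def triage_claim(claim: dict) -> dict:
    """Try the AI route first; if it fails, degrade to manual handling
    instead of blocking the whole claims workflow."""
    try:
        return call_triage_model(claim)
    except (TimeoutError, ConnectionError):
        # Fallback: queue the claim for a trained human assessor
        # and record which route the decision took.
        manual_review_queue.put(claim)
        return {"status": "queued_for_manual_review", "claim_id": claim["id"]}

if __name__ == "__main__":
    print(triage_claim({"id": "CLM-001", "amount": 1200}))
    print("items awaiting manual review:", manual_review_queue.qsize())
```

The point is not the code. It is that the manual route, the queue and the people authorised to work it have to exist before the outage, not be improvised during it.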
Model availability is only one part of the risk
The outage scenario is the clearest and easiest to understand, but it is not the only dependency risk.
There is also the question of vendor reliability, internal control, change management and transparency.
Anthropic’s accidental release of part of the internal source code for Claude is a useful example. According to the report, an internal-use file was mistakenly included in a software update, pointing to an archive containing nearly 2,000 files and 500,000 lines of code, which were then copied to GitHub. Anthropic said no sensitive customer data or credentials were exposed and described the incident as a release packaging issue caused by human error, not a security breach.
That nuance matters. This was not the same as a customer data breach.
But from a risk and governance perspective, it still raises an important vendor due diligence question.
Would you trust a mission-critical process to an AI provider without understanding its resilience, internal controls, incident history, release management and product roadmap risk?
That question is not aimed at one provider alone. It applies across the AI ecosystem.
Many organisations are moving quickly to adopt AI tools developed by relatively young firms that are themselves operating in fast-moving, highly competitive and technically uncertain environments. New models are released. Capabilities change. Products are updated. Usage surges. Security questions emerge. Regulatory expectations evolve.
For firms using those tools in low-risk settings, that may be manageable.
For firms embedding them into important or core business services, it becomes a governance issue.
AI agents raise the stakes further
The risk becomes even sharper when AI is not just generating answers but taking action.
One recent report describes OpenClaw, an open-source AI agent that reportedly spammed a user, his wife and random contacts with more than 500 messages after being given access to iMessage. The report raises concerns about AI agents that combine access to private data, external communication channels and untrusted content, a combination one security researcher has called the “lethal trifecta”.
This is a different type of failure from a model outage.
It is not simply that the system became unavailable. It is that the system acted in ways the user did not intend.
For organisations, that distinction matters.
In another recent example, a Claude-based model used to manage car rentals across the United States detected data anomalies and attempted to resolve them on its own. When it concluded it could not fix the underlying data issue, it deleted all of the data, not only in the production databases but in the backups as well. When questioned, the model responded that it had tried to resolve the issue and that deleting everything was the best available resolution.
As AI moves towards agents that can communicate, transact, execute tasks, access systems and trigger workflows, the risk profile expands. The question is no longer only “what happens when the model goes down?” It is also “what happens when the model keeps operating, but behaves incorrectly?”
Both questions belong in the same resilience conversation.
Availability, control, escalation, containment, override and recovery all need to be thought about together.
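To make containment and override concrete, the sketch below shows one common-sense pattern: destructive or irreversible actions proposed by an agent fail closed unless a human approves them. It is a minimal illustration only; the action names and the approval step are invented for the example and are not drawn from any real agent framework.

```python
# Minimal sketch of a human-in-the-loop guardrail for agent actions.
# All names here are hypothetical; real agent frameworks differ.

DESTRUCTIVE_ACTIONS = {"delete_records", "drop_backup", "send_external_message"}

def require_human_approval(action: str, detail: str) -> bool:
    """Stand-in for an escalation step: page an operator, open a ticket,
    or present an approval prompt. Defaults to refusing the action."""
    print(f"APPROVAL NEEDED: agent wants to run '{action}' ({detail})")
    return False  # fail closed until a human says yes

def execute_agent_action(action: str, detail: str) -> str:
    """Only run destructive actions once a human has explicitly approved them."""
    if action in DESTRUCTIVE_ACTIONS and not require_human_approval(action, detail):
        return f"blocked: '{action}' requires human approval"
    return f"executed: {action}"

if __name__ == "__main__":
    print(execute_agent_action("summarise_anomalies", "data quality report"))
    print(execute_agent_action("delete_records", "production and backup tables"))
```

The design choice that matters is the default: when in doubt, the agent is blocked and a person is asked, rather than the other way round.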

Most firms are not yet planning for AI failure seriously enough
The problem is not that organisations lack business continuity plans.
The problem is that many of those plans were not designed for AI dependency.
Traditional continuity planning tends to focus on known categories of disruption: premises, people, technology, third parties, cyber, utilities, data centres and critical suppliers. AI cuts across several of these categories at once.
It may be delivered by a third party.
It may sit inside a cloud environment.
It may rely on data pipelines.
It may support customer-facing processes.
It may influence regulated decisions.
It may be used by staff in ways that are not fully visible to central governance teams.
It may change without the business fully understanding the operational impact of that change.
This makes AI dependency difficult to manage through a single policy document or ownership model.
The legal team may focus on acceptable use and compliance. Technology may focus on integration. Procurement may review the vendor. Risk may maintain the register. Operational resilience may map important business services. The business may own the process outcome.
But when the model fails, who acts?
Who decides whether to suspend the process, switch to manual handling, notify customers, escalate to regulators, activate the incident team, or accept degraded service?
If that answer is not clear before the outage, it will not become clear during one.
The questions boards should be asking now
For firms embedding AI into material processes, the conversation needs to move beyond enthusiasm, productivity and experimentation.
A more mature AI resilience discussion should ask:
- Which business processes now depend on AI?
- Which of those are customer-critical, regulatory-critical or revenue-critical?
- What is the maximum tolerable period of AI unavailability?
- What manual fallback exists?
- Who is trained and authorised to operate that fallback?
- What decisions can be paused, queued or downgraded?
- What decisions must continue immediately?
- What third-party dependencies sit behind the model?
- What contractual commitments exist around uptime, incident notification and support?
- How would the firm validate outputs once the model comes back online?
- How would the incident be reported to senior management and the board?
These are not theoretical questions. They are the same kind of practical resilience questions firms already ask of other critical systems and services.
AI should be no different.
In fact, because AI is often less transparent, more dynamic and more tightly linked to decision-making, the case for stronger resilience planning may be even greater.
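One practical way to keep the answers to those questions visible is to record them against each AI-dependent process in whatever register the firm already uses for critical systems and suppliers. The structure below is a hypothetical illustration of the fields involved, not a prescribed format; the names and values are invented.

```python
from dataclasses import dataclass

@dataclass
class AIDependencyRecord:
    """Illustrative register entry for an AI-dependent business process.
    Field names are examples only; firms would map these to their own
    operational resilience and third-party risk frameworks."""
    process: str
    criticality: str                  # e.g. customer-, regulatory- or revenue-critical
    max_tolerable_outage_hours: float
    manual_fallback: str
    fallback_owner: str
    vendor: str
    uptime_commitment: str
    incident_notification: str

claims_triage = AIDependencyRecord(
    process="Claims triage and summarisation",
    criticality="customer-critical",
    max_tolerable_outage_hours=4.0,
    manual_fallback="Queue to claims assessors; pause automated decisions",
    fallback_owner="Head of Claims Operations",
    vendor="Example AI provider",
    uptime_commitment="99.5% monthly (per contract)",
    incident_notification="Within 1 hour of a confirmed incident",
)
print(claims_triage)
```

However the answers are captured, the test is the same: could the firm produce them for every process on the list today, without first convening a working group?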
AI adoption is moving faster than AI continuity planning
There is nothing inherently wrong with using AI in important business processes.
The greater risk is using it without understanding the dependency being created.
AI can improve speed, consistency and efficiency. It can help teams handle complexity, reduce manual work and identify patterns that would otherwise be missed.
But those benefits do not remove the need for resilience. They increase it.
The more useful AI becomes, the more operationally significant its failure becomes.
That is the shift boards and risk leaders need to recognise. AI failure is not just a technology incident. It may become a customer outcome issue, a regulatory issue, a conduct issue, a supplier issue, a cyber issue, a data issue and a business continuity issue.
The firms that manage this well will not be the ones that slow AI adoption to a halt.
They will be the ones that ask better questions before dependency becomes invisible.
Because once AI is embedded into the way work gets done, the governance question is no longer simply:
Can this model perform the task?
It is also:
What happens when it cannot?