The SOC Cannot Be Retrofitted with AI. It Has to Be Redesigned.

Part four of a ten-part series on the operational implications of AI-accelerated attacks.

Four structural problems with bolt-on AI — all consequences of the same architectural choice.

The problem the SOC is being redesigned for

Two developments across late 2025 and early 2026 should set the terms of this conversation before any architectural argument begins.

The first was Gambit Security’s report on an AI-assisted campaign against Mexican government infrastructure. Gambit described a single operator using commercial AI platforms as operational tooling across nine Mexican government organisations, with the campaign affecting around 195 million identities and detailed tax records. The important point is not that the AI was magic. It is that ordinary attacker work (reconnaissance, command generation, exploit support, scripting, data processing and lateral movement) could be scaled by one operator against the kinds of gaps most real environments already have. Mandiant’s M-Trends 2026 commentary cautions that 2025 should not yet be read as the year in which breaches resulted directly from AI; the Gambit campaign is better understood as evidence of how AI compresses the work, not yet as proof that AI is the cause of major incidents.

The second was Anthropic’s announcement of Project Glasswing and Claude Mythos Preview, a gated frontier model that Anthropic says has already identified thousands of zero-day vulnerabilities across critical infrastructure, including vulnerabilities in every major operating system and every major browser. Anthropic’s Frontier Red Team also reports that Mythos Preview can identify and develop many related exploits autonomously in controlled testing. That points to the next phase of the problem: AI does not only compress the time required to exploit what defenders already know is broken. It also increases the probability that exploitable flaws will be found before defenders know they exist.

These are two different problems, and they are converging. Gambit-style attacks compress the time defenders have to fix what is already known to be broken. Mythos-class capability expands the attack surface to include flaws that may be found and exploited before defenders know they need to patch them.

The defensive consequences follow directly. IOC- and signature-only detection loses durability when AI-assisted attackers can generate, adapt or discard tooling per engagement. That does not make IOCs useless: they still matter for scoping, containment, retrospective hunting and intelligence sharing. But the signal that survives longer is behaviour: TTPs, causal chains and deviations from normal execution. Lateral movement still looks like lateral movement. Credential abuse still looks like credential abuse. Exfiltration still looks like exfiltration.
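As a minimal sketch of why behaviour outlasts indicators, the Python below compares a hash-based IOC check with a simple behavioural rule for lateral movement. Everything in it is hypothetical: the ToolRun fields, the placeholder hash values, the fan-out threshold and the rule itself are invented for illustration and are not drawn from any product or from the reports cited here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolRun:
    # Hypothetical, simplified record of one tool execution (assumed fields).
    tool_sha256: str                 # hash of the attacker tooling that ran
    account: str                     # identity the tooling ran as
    hosts_authenticated: frozenset   # hosts the account then logged on to

# Hypothetical IOC list: hashes of tooling seen in an earlier engagement.
KNOWN_BAD_HASHES = {"aaa111"}

# Assumed threshold for "one account fanning out to many hosts".
FAN_OUT_THRESHOLD = 3

def ioc_match(run: ToolRun) -> bool:
    """Signature-style check: fires only if this exact artefact was seen before."""
    return run.tool_sha256 in KNOWN_BAD_HASHES

def behaviour_match(run: ToolRun) -> bool:
    """Behaviour-style check: fires on the fan-out pattern, whatever the tool hash."""
    return len(run.hosts_authenticated) >= FAN_OUT_THRESHOLD

targets = frozenset({"fs01", "db01", "dc01", "hr02"})
original = ToolRun("aaa111", "svc-backup", targets)
regenerated = ToolRun("bbb222", "svc-backup", targets)  # same behaviour, new build

for label, run in [("original tooling", original), ("regenerated tooling", regenerated)]:
    print(f"{label}: ioc={ioc_match(run)} behaviour={behaviour_match(run)}")
```

The regenerated build slips past the hash list while the fan-out rule still fires. A real behavioural detection would be far richer than a single threshold, but the asymmetry is the point.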

This is the problem the SOC is being redesigned for. Not simply more alerts to investigate. Not simply automating analysts. The redesign is for a threat profile where attacks move faster than human reasoning can comfortably catch up, where each attack can be bespoke enough to weaken pre-built signatures, and where the most durable defensive signal is what the attacker does once they are in. That requires AI to reason at the speed and depth of the attack, on data structured to support that reasoning. Many current AI-in-the-SOC products are still built primarily for the alert queue.

Reasoning from the wrong end of the book

The current generation of AI-in-the-SOC products rests on a single assumption, and most of the modernisation spend in the category depends on that assumption being correct: that the analyst’s job, whether performed by a human or by an AI agent, begins when the alert lands.

The alert lands in the queue. The investigator picks it up. The AI assists with triage, enrichment, recommendation and sometimes containment. The framing is intuitive because it matches the workflow analysts have used for two decades. It also starts from the wrong end of the problem.

The alert is not the start of the story. It is the end of one. By the time the alert exists, the events that produced it have already happened. They live upstream in raw telemetry, process trees, identity authentication flows, network sessions, package execution traces, API calls, cloud control-plane activity and endpoint state. The causal chain runs from those events through detection logic to the alert, not the other way around. The investigator who picks up the alert is being asked to reconstruct chapters one through nineteen of the book from chapter twenty, working backwards through indexes optimised for storage and search rather than reasoning.

Bolt-on AI reasons backwards from the alert. A redesigned SOC reasons forward on the event stream, with state preserved at decision time.

To be fair to alerts: a good correlation rule is itself a reasoning artefact. It has already collapsed many events into a structured finding, and a well-tuned rule represents real detection engineering work. The problem is that the bounds of that finding are fixed at write time. Anything an investigator needs to know outside the rule’s original window has to be re-fetched, re-tokenised and re-reasoned about. The alert is not a lossy thumbnail. It is a bounded one — and investigation almost always needs to break those bounds.

For a human analyst, this has always been the job. Investigators learn to be good at it. They build mental models of the environment, develop the judgement to piece together a coherent narrative from fragmentary evidence, and know when the data they have is not the data they need. The work is hard, but it is what experienced analysts do.

The complication is that this way of working does not transfer cleanly to an AI investigator, and the reasons are structural, not incidental.

Four structural problems with bolt-on AI

The first is data resolution at investigation time. Enterprises already ingest enough telemetry to cover most MITRE ATT&CK techniques in the abstract. CardinalOps’ Fifth Annual State of SIEM Detection Risk Report (2025) found roughly 90% potential MITRE ATT&CK coverage from ingested data, against only 21% actual detection coverage in the production sample, with 13% of existing detection rules broken and never firing. The AI’s problem is different again. When it queries backwards from an alert, the raw events at the resolution it needs — the unaggregated process tree, the unsampled DNS stream, the full network session rather than its five-tuple summary, endpoint telemetry at full fidelity rather than a downsampled subset — may have been aggregated, summarised, normalised, tiered or moved to colder storage. The view is not necessarily the full causal chain. It is whatever fraction survived the storage economics, plus the alert itself.

The data is there. The detections are not. CardinalOps 2025: 90% potential coverage, 21% actual detection coverage, 13% of rules broken.

The second is latency. Each query an AI investigator runs against an indexed store takes time. Sometimes it is seconds. Sometimes longer. A non-trivial investigation can require dozens of lookbacks, joins, pivots, enrichments and timeline reconstructions before the AI has enough context to reason well. Meanwhile, attacker timelines are compressing. Mandiant’s M-Trends 2026 reports that the median time between initial access and handoff to a secondary threat group fell to twenty-two seconds in 2025, down from over eight hours in 2022. CrowdStrike’s 2026 Global Threat Report puts the average eCrime breakout time at twenty-nine minutes, with the fastest observed breakout at twenty-seven seconds; in one intrusion, data exfiltration began within four minutes of initial access. These numbers should not be read as proof that every acceleration is AI-driven. They should be read as evidence that the defender’s decision window is shrinking.

The clocks the defender does not share. Defender pipeline measured in tens of minutes; attacker breakout averages 29 minutes, with handoffs and worst-case events in seconds.
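The latency argument can be put as rough arithmetic. The sketch below takes the attacker clocks quoted above and asks how many sequential queries an AI investigator could complete inside each window. The per-query latencies are assumptions chosen for illustration, not measurements of any platform, and the calculation ignores model inference time, which would only tighten the budget.

```python
import math

def sequential_queries_in_window(window_seconds: float, seconds_per_query: float) -> int:
    """Number of back-to-back queries that complete before the window closes."""
    return math.floor(window_seconds / seconds_per_query)

# Attacker clocks quoted in the text above, in seconds.
WINDOWS = {
    "average eCrime breakout (29 min)": 29 * 60,
    "exfiltration after initial access (4 min)": 4 * 60,
    "fastest observed breakout (27 s)": 27,
    "median handoff (22 s)": 22,
}

# Assumed per-query latencies against an indexed store (illustrative only).
ASSUMED_QUERY_LATENCIES = [2, 10, 30]

for label, window in WINDOWS.items():
    for latency in ASSUMED_QUERY_LATENCIES:
        count = sequential_queries_in_window(window, latency)
        print(f"{label}: {count} queries at {latency}s each")
```

At an assumed thirty seconds per query, the twenty-nine-minute average leaves room for 58 sequential queries before any reasoning time is counted, and the twenty-seven-second case leaves room for none.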

The third is token cost. An AI investigator that repeatedly queries and re-reads the same underlying data to rebuild context at each step pays for that work in tokens. For a small case the cost is trivial. For a larger case it moves from a few pence to several pounds per investigation. At production SOC scale — realistic alert volumes, long-running incidents, model upgrades, retrieval and caching infrastructure — the economics become material. Five-figure monthly inference bills are not difficult to model for a busy enterprise SOC. The issue is not the cost of any one investigation. The issue is an architecture that forces the AI to rediscover context every time.
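A simple cost model shows why re-reading is the expensive part. The sketch below assumes that each investigation step re-sends every earlier query result to the model, so input tokens grow with the square of the step count, and compares that with sending a compact pre-built context on each step. The step counts, token sizes and the price per million tokens are all hypothetical placeholders, not vendor pricing, and output tokens are ignored.

```python
def reread_input_tokens(steps: int, tokens_per_result: int) -> int:
    """Input tokens when every step re-sends all earlier query results.

    Step k carries k results, so the total is tokens_per_result * steps * (steps + 1) / 2.
    """
    return tokens_per_result * steps * (steps + 1) // 2

def prestructured_input_tokens(steps: int, context_tokens: int, tokens_per_step: int) -> int:
    """Input tokens when each step sends a fixed compact context plus a small delta."""
    return steps * (context_tokens + tokens_per_step)

def cost_gbp(tokens: int, price_per_million_gbp: float) -> float:
    """Convert a token count into pounds at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million_gbp

# Assumed price, for illustration only; substitute a real rate card.
ASSUMED_PRICE_PER_MILLION_GBP = 2.0

cases = {
    "small case, re-read (5 steps)": reread_input_tokens(5, 2_000),
    "large case, re-read (40 steps)": reread_input_tokens(40, 5_000),
    "large case, pre-structured (40 steps)": prestructured_input_tokens(40, 8_000, 500),
}

for label, tokens in cases.items():
    cost = cost_gbp(tokens, ASSUMED_PRICE_PER_MILLION_GBP)
    print(f"{label}: {tokens:,} input tokens, about GBP {cost:.2f}")
```

Under these assumed figures the small case costs about GBP 0.06, the large re-read case about GBP 8.20 and the pre-structured case about GBP 0.68. The absolute numbers matter less than the shape: the re-read cost grows quadratically with investigation length, while the pre-structured cost grows linearly.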

The fourth, and the one most often missed, is decision context. When an event happens on the stream, the system can know what immediately preceded it: the process that spawned it, the identity that authenticated, the network session it came over, the patch level of the host, the asset’s classification and the recent behavioural pattern around the user or workload. At that moment, these are not expensive query results. They are state. Once the event is in an indexed store, that state has to be reconstructed from disparate logs, joined across schemas and reasoned about as if the AI were seeing it for the first time. Reasoning quality degrades not because the model is weak, but because the substrate has dropped the context that would have made the reasoning straightforward.
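The difference between state and a query result can be sketched in a few lines. The hypothetical StreamState class below keeps a small rolling view of the stream and attaches the spawning process's action, the asset's classification and the user's recent actions to each event at the moment it arrives. The class, the field names and the asset labels are assumptions made for illustration, not a description of any product's internals, and the sketch ignores real-world concerns such as PID reuse and state expiry.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Event:
    # Hypothetical normalised event; field names are assumptions.
    host: str
    pid: int
    parent_pid: Optional[int]
    user: str
    action: str

@dataclass(frozen=True)
class EnrichedEvent:
    event: Event
    parent_action: Optional[str]   # what the spawning process last did
    asset_class: str               # classification of the host
    recent_user_actions: tuple     # the user's actions just before this event

class StreamState:
    """Rolling state kept alongside the stream, read at decision time."""

    def __init__(self, asset_classes: dict, history: int = 5):
        self.asset_classes = asset_classes
        self.history = history
        self.process_actions = {}   # (host, pid) -> last action seen
        self.user_actions = {}      # user -> deque of recent actions

    def observe(self, event: Event) -> EnrichedEvent:
        recent = self.user_actions.setdefault(event.user, deque(maxlen=self.history))
        parent_action = None
        if event.parent_pid is not None:
            parent_action = self.process_actions.get((event.host, event.parent_pid))
        # Snapshot the context before this event updates the state.
        enriched = EnrichedEvent(
            event=event,
            parent_action=parent_action,
            asset_class=self.asset_classes.get(event.host, "unknown"),
            recent_user_actions=tuple(recent),
        )
        self.process_actions[(event.host, event.pid)] = event.action
        recent.append(event.action)
        return enriched

state = StreamState({"db01": "crown-jewel"})
first = state.observe(Event("db01", 100, None, "svc-app", "logon"))
second = state.observe(Event("db01", 101, 100, "svc-app", "spawn_shell"))
print(second.parent_action, second.asset_class, second.recent_user_actions)
```

The second event arrives with its context already attached. Recovering the same three facts later from an indexed store would mean at least three separate lookups and a join.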

What modern SIEMs already do, and why it does not close the gap

It is worth being fair to modern SIEM and security analytics platforms. Microsoft Sentinel has UEBA. Elastic has anomaly detection. Splunk has UBA. These are real capabilities, and the engineering behind them matters. Behavioural baselines, statistical anomaly detection, peer-group analysis and identity risk scoring should not be dismissed as marketing claims.

The architectural question is narrower and more important: where does the AI reason, and what state is available at the moment of reasoning? Does the system reason over preserved causal context as the events occur, or does it reconstruct context after the fact from data structured for storage, search and retention?

Public product material does not justify reducing every vendor implementation to a single design pattern, so the safer claim is not that these platforms lack behavioural analytics. They clearly do not. The safer claim is that behavioural analytics alone does not answer the substrate question. Anomaly detection on indexed or post-ingested data may still leave the AI reconstructing causality after the fact.

The same caution applies to EDR-native investigators such as CrowdStrike Charlotte AI, SentinelOne Purple AI and Microsoft Security Copilot in Defender contexts. They sit closer to endpoint and XDR telemetry than a SIEM-bolted copilot, which helps. But the diagnostic test remains the same: is the AI reasoning over data structured for that purpose when the events occurred, or is it rebuilding a partial story afterwards?

The category error

These four problems are not solved by better prompts, better models, longer context windows or SOAR integrations alone. They are consequences of where the AI has been inserted.

The SIEM was designed for a different job: retention, search, compliance evidence, rule evaluation and forensic reconstruction. It does that job well. The mismatch appears when that indexed store becomes the only substrate for a continuously reasoning AI investigator.

The image worth holding in mind is the one of reading a book backwards. The alert is the last chapter. The AI is being asked to reconstruct the earlier chapters by paging back through an index, on data that may no longer be available at the resolution required, paying token cost for every page it has to re-read, while the events the book is about continue to unfold faster than the reading.

The intuition that should follow is that the architecture has to invert. Reasoning has to happen earlier, closer to the event stream as events occur, rather than only backwards from the alert after the fact. That is not a feature change. It is a foundational change to where in the stack reasoning happens, what data is available to it and how that data is structured.

What a redesigned SOC actually requires

A SOC designed from the beginning around AI-assisted reasoning looks structurally different from a SOC that has had AI added to it.

Reasoning happens closer to the event stream, as events occur, rather than only against an indexed store after the fact. The AI has access to events as they happen, with upstream state available as context rather than as a later query result.

Behavioural detection runs continuously and produces structured context as a by-product of normal operation, rather than reconstructing that context on demand. When an alert is produced, the chain of events that led to it is already in a form the AI can reason over because the structuring happened upstream rather than at investigation time.
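One way to picture context as a by-product is a detection that carries its own causal chain. In the sketch below, a hypothetical ChainTracker records process starts as they stream past and, when a rule fires, emits the finding together with the ancestry that led to it. The process names, the rule and the finding format are illustrative assumptions, and the sketch tracks a single host and does not handle PID reuse.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ProcStart:
    # Hypothetical process-start event on a single host.
    pid: int
    ppid: Optional[int]
    name: str

# Illustrative rule inputs, not a recommended detection.
DOCUMENT_APPS = {"winword.exe", "excel.exe"}
SHELLS = {"powershell.exe", "cmd.exe"}

class ChainTracker:
    """Keeps process ancestry on the stream so findings ship with their chain."""

    def __init__(self):
        self.procs = {}  # pid -> ProcStart

    def ancestry(self, pid: int) -> list:
        """Return process names from the oldest known ancestor down to pid."""
        chain, seen = [], set()
        current = self.procs.get(pid)
        while current is not None and current.pid not in seen:
            seen.add(current.pid)  # guard against malformed parent loops
            chain.append(current.name)
            current = self.procs.get(current.ppid) if current.ppid is not None else None
        return list(reversed(chain))

    def observe(self, proc: ProcStart) -> Optional[dict]:
        """Record the process; return a finding with its chain if the rule fires."""
        self.procs[proc.pid] = proc
        chain = self.ancestry(proc.pid)
        ancestors = chain[:-1]
        if proc.name in SHELLS and any(name in DOCUMENT_APPS for name in ancestors):
            return {"finding": "shell_from_document_app", "pid": proc.pid, "chain": chain}
        return None

tracker = ChainTracker()
findings = []
for proc in [
    ProcStart(1, None, "explorer.exe"),
    ProcStart(2, 1, "winword.exe"),
    ProcStart(3, 2, "powershell.exe"),
    ProcStart(4, 1, "powershell.exe"),  # shell launched directly, no document parent
]:
    result = tracker.observe(proc)
    if result is not None:
        findings.append(result)

print(findings)
```

Only the shell spawned beneath the document application produces a finding, and that finding already contains the chain an investigator would otherwise have to rebuild from the store.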

The AI has access to the upstream causal chain because the chain has been preserved and represented for that purpose. Latency is lower because the system is not repeatedly asking the storage layer to recreate history. Token cost is bounded because the AI reasons over pre-structured context rather than raw, redundant data that has to be re-fetched and re-tokenised on each query.

Detection engineering does not disappear in this model. It moves. Rules express patterns on the stream as well as against the store. Hunts become continuous behavioural assessments rather than point-in-time investigations. Sigma-style detections and query languages such as KQL and ES|QL remain useful where retrospective search, compliance retention and the SIEM’s system-of-truth role still apply.

This is not a call to rip out the SIEM. The SIEM remains the system of truth for compliance retention, forensic search, regulator-facing evidence and the long tail of investigations that span weeks rather than minutes. What changes is where AI reasoning happens. It happens on the stream alongside the SIEM, not only inside it.

The analyst is still central

The redesign is about what the AI reasons over, not whether a human remains in the loop. The AI does the work it is good at: fast correlation, exhaustive lookups, hypothesis generation, causal timeline drafting and surfacing the next best question. The analyst makes the calls that require judgement, context and accountability: escalation decisions, containment decisions, customer communication and regulatory reasoning. In a regulated environment, that is not optional. It is the model.

What changes is the quality of the work the AI can do under the analyst’s direction. With state preserved at decision time and the upstream chain available in a form the AI can reason over, the AI’s contribution becomes more substantive. Without those things, the AI is doing the same backwards-reconstruction work the analyst has always done, only with less environmental intuition and more confidence than it may have earned.

Stream-native reasoning does not displace evidence preservation. The events the AI reasons over still land in the SIEM as the system of truth, retained on the schedule the regulator requires and available for audit and forensic continuity. The architecture preserves both clocks: the speed needed for detection and the durability needed for assurance.

A note on vendor claims

There are good reasons to be cautious about claims of architectural reinvention in this market. The incentive to overstate the difference between genuinely new architecture and existing architecture with new branding is significant, and buyers are right to be sceptical. Many products currently positioned as AI-native SOC platforms may, on close examination, be AI copilots layered on top of conventional data paths.

The diagnostic question worth asking is straightforward, and it cuts through most of the marketing noise: when the AI investigates an incident, is it reasoning over data that was structured for that purpose at the moment the events occurred, or is it reconstructing context after the fact from data that was structured for another purpose?

That answer separates redesign from retrofit.

The argument is not that bolt-on AI is useless. It has clear value for bounded triage tasks, enrichment, reporting and first-pass classification — work where the structural limitations are tolerable because the task is bounded and the cost ceiling is known. The argument is that bolt-on AI alone cannot be the foundation of an AI-native SOC, because the foundation is in the wrong place. The redesign required is not a near-term product feature. It is a different architecture, and it has to be built from the data plane up.

What that architecture looks like in practice — the cost model, the deployment shape, the migration path from an installed SIEM, the way the analyst’s role changes inside it, and where regulated retention requirements continue to live — is the subject of part five.

This argument has shaped how CUMULO and the operating model around it have been built at e2e-assure. Part five takes it into the operational specifics.

 

Sources: Gambit Security technical report on the Mexico campaign; Anthropic Project Glasswing and Claude Mythos Preview materials; Anthropic Frontier Red Team Mythos Preview write-up; Mandiant M-Trends 2026; CrowdStrike 2026 Global Threat Report; CardinalOps Fifth Annual State of SIEM Detection Risk (2025); Microsoft Sentinel, Elastic Security and Splunk UBA product documentation.
