LevelBlue Completes Acquisition of Cybereason. Learn more

LevelBlue Completes Acquisition of Cybereason. Learn more

Services
Cyber Advisory
Managed Cloud Security
Data Security
Managed Detection & Response
Email Security
Managed Network Infrastructure Security
Exposure Management
Security Operations Platforms
Incident Readiness & Response
SpiderLabs Threat Intelligence
Solutions
BY TOPIC
Offensive Security
Solutions to maximize your security ROI
Operational Technology
End-to-end OT security
Microsoft Security
Unlock the full power of Microsoft Security
Securing the IoT Landscape
Test, monitor and secure network objects
Why LevelBlue
About Us
Awards and Accolades
LevelBlue SpiderLabs
LevelBlue Security Operations Platforms
Security Colony
Partners
Microsoft
Unlock the full power of Microsoft Security
Technology Alliance Partners
Key alliances who align and support our ecosystem of security offerings

AI-Enabled Cyber Intrusions: What Two Recent Incidents Reveal for Corporate Counsel

This article was authored by Daniel Ilan, Rahul Mukhi, Prudence Buckland, and Melissa Faragasso from Cleary Gottlieb, and Brian Lichter and Elijah Seymour from Stroz Friedberg, a LevelBlue company.

Recent disclosures by Anthropic and OpenAI highlight a pivotal shift in the cyber threat landscape: AI is no longer merely a tool that aids attackers, in some cases, it has become the attacker itself. Together, these incidents illustrate immediate implications for corporate governance, contracting and security programs as companies integrate AI with their business systems. Below, we explain how these attacks were orchestrated and what steps businesses should consider given the rising cyber risks associated with the adoption of AI.

 

Anthropic’s Disruption of an Autonomous, AI-Orchestrated Espionage Campaign

Just a few days ago, Anthropic’s “Threat Intelligence” team reported that it disrupted what it refers to as the “first documented case of a cyberattack largely executed without human intervention at scale”.[1] Specifically, in mid-September, Anthropic detected an attack that used agentic AI to autonomously target roughly 30 entities, including major technology corporations, financial institutions, chemical manufacturing companies and government agencies, and successfully execute end-to-end intrusions. The threat actor, determined with “high confidence” by Anthropic to be a Chinese state-sponsored group, manipulated Claude Code with structured prompts enabling AI to autonomously perform roughly 80–90% of the work across the attack lifecycle. That lifecycle included reconnaissance, vulnerability discovery, exploitation, lateral movement, credential harvesting, data analysis, and exfiltration operations, each occurring independently at rates that would be humanly impossible.

To achieve the attack, the group first selected targets and built an autonomous framework using Claude Code to conduct intrusions; the attackers then bypassed guardrails by “jailbreaking” the model with innocuous, role‑playing prompts[2] that concealed malicious intent. Accordingly, Claude rapidly mapped systems and high‑value databases, reported findings and then researched, wrote and executed exploit code to identify vulnerabilities, harvest credentials, escalate access and exfiltrate and categorize sensitive data while implanting backdoors. In the final phase, Claude generated comprehensive documentation (e.g., credential lists, system analyses and attack notes) to enable follow‑on operations with minimal human oversight.

Three aspects of the attack stand out. First, while the attackers mostly used typical, off‑the‑shelf security tools, the attackers inventively stitched those tools together using standard interfaces like the Model Context Protocol (a common way for models and tools to interoperate) to perform actions that were previously in the sole domain of human operators. Second, the AI ran multi‑day campaigns, kept track of context, and generated organized reports—bringing the kind of scale and persistence typically reserved for well‑resourced human teams. Third, while the AI exhibited familiar model limitations (such as overstating findings and occasionally fabricating data during autonomous operations by claiming to have obtained credentials that did not work or identifying critical discoveries that proved to be publicly available information) these hallucinations did not preclude successful compromises, thus underscoring that hallucinations are a friction, not a barrier, to AI-enabled cyber attacks.

Anthropic responded by banning relevant accounts, improving detection tuned to AI‑driven attack patterns, building early‑warning tools, coordinating with industry and authorities and incorporating lessons learned into safeguards and policies. The bottom line: AI can now act as a largely independent intruder with relatively minimal human effort, and defenders should plan for adversaries using agentic capabilities at scale.

LevelBlue delivers continuous protection against evolving cyber threats.

Learn More

OpenAI’s ShadowLeak: Vulnerability Could Lead to Zero-Click Indirect Prompt Injection and Service-Side Exfiltration

A separate proof of concept attack was first discovered by cybersecurity researchers at Radware, Ltd. (“Radware”), and later confirmed remediated by OpenAI.[3] “ShadowLeak” exposed a “zero‑click” indirect prompt injection path in ChatGPT’s Deep Research agent when connected to enterprise Gmail and browsing tools. To exploit this vulnerability in a social engineering attack, a threat actor would first embed hidden instructions inside normal‑looking emails; then, when the email user prompted the agent to summarize or analyze their inbox, the agent would, for example and unbeknownst to the user, ingest the hidden instructions and execute autonomous web requests directly from OpenAI’s cloud infrastructure, exfiltrating sensitive data, including personally identifiable information, to attacker‑controlled sites. Notably, this meant that in the case of a successful attack as demonstrated by Radware, once the Deep Research agent undertakes the actions as instructed by the prompt injected by the AI agent attacker (through the malicious email), sensitive data would be invisibly extracted without the victims ever viewing, opening or clicking the message.[4]

The governance significance is substantial. Because the data was exfiltrated from the impacted organization’s side, such organization’s own network never saw the exfiltration. This means that traditional controls (e.g., awareness training, link inspection, outbound filtering, and gateway data loss prevention) offered limited visibility or deterrence. Thus, the risk now centers on “what the agent does,” not just “what the model says,” and the threat extends beyond email to any AI agent connected to SaaS apps, CRMs, HR systems or other enterprise tools via protocols that standardize agent actions and inter-agent collaboration.

Recommended mitigations to prevent or detect such attacks may include treating agent assistants like privileged users with carefully separated permissions, sanitizing inbound HTML and simplifying inputs prior to model ingestion, instrumenting agent actions with audit‑quality logs and detecting natural‑language prompt attacks. From a contracting perspective, organizations should consider requiring that their vendors test their solutions for prompt injection, commit to input sanitization, gate autonomy based on maturity and risk and red‑team the full chain of agents and tools before broad rollout.

 

Strategic Implications for AI Adoption in the Enterprise

Taken together, these incidents transform what was once considered a distant, theoretical concern into present-day reality. Agentic AI can now largely independently execute complex offensive campaigns using standard tools at nation-state scale, and enterprise assistants, once granted access and operational autonomy, can trigger actions from the provider’s infrastructure that circumvent traditional enterprise controls. In practice, this means:

  • Identity and authority for AI systems are fluid and spread across tools. An agent’s “scope” is not fixed; it changes based on connected tools, protocols and hidden instructions inside content.
  • Controls focused on what the model writes are not enough. The priority is controlling and monitoring actions (i.e., calls to tools, APIs, browsers and other agents) with logs that capture who did what, when and why.
  • Traditional training and perimeter defenses cannot fully address actions taken on the provider’s side. Organizations should negotiate provider‑side security commitments and build detection and response based on agent activity data, not just model outputs.
  • AI mistakes (hallucinations or fabrication) may slow attackers or cause errors, but defenders should not rely on them as protection. The baseline capability for AI‑driven offense is already high and increasing.
  • Traditional defenses may be effective against AI-driven attacks, but the volume of attacks may increase. The incidents Anthropic discusses appear to be commoditized attacks that relied on commercially-available tools rather than novel tactics, techniques and procedures. Thus, traditional defenses should be successful against such attacks. Instead what is more interesting is the speed and volume of the attacks, which far exceeded what humans could do on their own, reinforcing the need for faster and AI-based defensive strategies that are able to respond at scale.

 

Key Takeaways for Integrating AI

When considering integrating AI into everyday workflows and products, and to meet obligations under applicable data protection, cybersecurity and digital regulations, entities should:

  1. Treat AI assistants and agents like privileged system users. As noted above, organizations should consider separating “read‑only” from “action” permissions, using distinct service accounts and requiring auditable controls for tool use, browsing and API calls.
  2. Contract for upstream safeguards. Require vendors to: (a) sanitize inputs (including stripping risky HTML), (b) validate systems against prompt injection and natural language attack vectors (i.e., by implementing advanced controls such as judge LLM evaluation, spotlighting, and security-focused prompt-engineering patterns) and (c) provide action logs you can audit and use in incidents.
  3. Build telemetry that captures agent behavior. Insist on provider‑side logs that record who did what, when and why for every agent action, and align those logs to your incident response and reporting needs.
  4. Update governance artifacts. Revise security questionnaires, data protection addendums and incident response plans to address provider‑side data leaks, risks from inter‑agent protocols and the move from output safety to action safety.
  5. Prioritize secure AI development. Exercise due diligence when integrating components sourced from third parties, including free and open-source elements, to ensure they do not compromise the security of proprietary assets or operational environments. Verify security protocols and, where applicable, conformity with mandatory cybersecurity requirements (e.g., under the EU Cyber Resilience Act).
  6. Consider interplay with mandatory cybersecurity rules. Stay abreast of evolving developments, particularly as cybersecurity is no longer a matter of best practice. Horizontal and sector-specific rules in the EU impose mandatory cybersecurity requirements on certain AI systems and products with digital elements (both hardware and software) available on the EU market and used to connect to a device or network. Cybersecurity and vulnerability handling measures should account for agentic AI attack surfaces and threats.


[1] Anthropic’s full report on this incident can be accessed here: https://assets.anthropic.com/m/ec212e6566a0d47/original/Disrupting-the-first-reported-AI-orchestrated-cyber-espionage-campaign.pdf

[2] Notably, in addition to breaking down the attacks into small, seemingly innocent tasks that Claude would execute without being provided the full context of their malicious purpose, the attackers also told Claude that it was an employee of a legitimate cybersecurity firm, and was being used in defensive testing. This role-play, according to Anthropic, was key to the success of the attack.

[3] See Radware’s description of the vulnerability here: https://www.radware.com/blog/threat-intelligence/shadowleak/.

[4] Importantly, Radware disclosed the bug to OpenAI in June 18 through a vulnerability reporting platform. In August, OpenAI said the vulnerability was fixed and the company later marked it as resolved on September 3.

ABOUT LEVELBLUE

LevelBlue is a globally recognized cybersecurity leader that reduces cyber risk and fortifies organizations against disruptive and damaging cyber threats. Our comprehensive offensive and defensive cybersecurity portfolio detects what others cannot, responds with greater speed and effectiveness, optimizes client investment, and improves security resilience. Learn more about us.

Latest Intelligence

Discover how our specialists can tailor a security program to fit the needs of
your organization.

Request a Demo