Researchers are offering fresh proof that AI coding agents have become a viable attack surface for threat actors seeking to steal credentials, manipulate data, and compromise development environments.

The research by Tenet Security demonstrated how an attacker could hijack AI coding agents into running arbitrary code on a developer’s machine by planting a single fake-error report in a public bug tracking service. In controlled testing of its “agentjacking” technique, the company found widely used AI coding assistants such as Claude Code, Cursor, and Codex retrieved the poisoned error data and, in many cases, executed attacker-controlled code on the developer’s machine.

Agentjacking With a Fake Error Report

In a real attack, the consequences could have included theft of cloud credentials, AWS keys, GitHub tokens, SSH keys, and CI/CD pipeline secrets. The credentials could potentially have enabled an adversary to access private source code repositories, compromise cloud infrastructure, or poison software dependencies across the organization.

Related:AI-Generated Workflows Are a Silent Security Disaster

The takeaway for organizations is discomfiting, says Barak Sternberg, CEO and co-founder of Tenet Security. “The AI agents you’ve deployed are now the soft attack path in, and your existing stack can’t see it,” Sternberg says. Agentjacking, he adds, did not involve a particularly clever exploit, just a fake error report. “The agent read it, trusted it, and ran our code with the developer’s own access. Every step was authorized, so [identity and access management], [endpoint detection and response], and network controls had nothing to flag.”

Tenet’s agjentjacking demonstration centered on Sentry, a widely used error tracking and application monitoring service that developers rely on to track bugs, crashes, and runtime errors in their applications. According to Sentry, more than 200,000 organizations around the world including companies like GitHub, Disney, Anthropic, Atlassian, and others use its product.

Tenet researchers created a fake error report and submitted it to a Sentry project using a publicly exposed Data Source Name (DSN). Applications use a project’s DSN to send telemetry data to a Sentry instance, without requiring user authentication. Many organizations expose their Sentry DSNs so client-side applications can report errors and performance data directly to Sentry. Tenet claimed it found with relative ease 2,388 organizations with exposed Sentry DSNs that could have agentjacked.

The “error” Tenet injected into the Sentry project was disguised as a legitimate debugging message but contained hidden instructions intended to influence AI coding agents used to investigate unresolved Sentry issues. Tenet found that when developers used an AI coding agent to query Sentry via the Model Context Protocol (MCP), the coding agent would retrieve the poisoned error event and treat the embedded instructions as legitimate diagnostic guidance. One instance involved a $250 billion company, Tenet said.

Related:Can Clothes Make You Invisible to Facial Recognition?

AI Agents Still Lack Discernment

The problem, as Tenet noted in its report, stems from the inability of AI coding agents to tell the difference between content they read and instructions to act. So, when an MCP connecter retrieves content from an external source — like a document, an email, or in this case, error logs — the AI agent handles everything as input, making it trivial for attackers to sneak in malicious instructions.

In an RSAC 2026 briefing earlier this year, a Netskope researcher demonstrated how an attacker could email a target with malicious instructions that an AI assistant would blindly execute if the user asked it to summarize the message.

“The takeaway isn’t ‘patch Sentry,'” Sternberg says. “It’s that an agent can’t reliably tell data it reads from an instruction to act. And the data it reads now includes telemetry, logs, tickets, and tool output that nobody ever treated as an attack surface.”

Related:Third-Party Breaches Teach Education Sector a Costly Lesson in Vendor Risk

Measures like performing configuration changes, prompting the agent to ignore untrusted input, sandboxing, and slapping an identity on the agent each help, but only marginally, he continues. “On their own they either don’t hold — we told agents to distrust the input; they ran the payload anyway — or they make the agent useless.”

Sternberg recommends organizations disable package-install scripts, along with requiring human approval before an agent runs a shell command or installs from data it read. He also suggests agents run with least-privilege. Over the long term, organizations need to implement capabilities to monitor an agent’s intent against the user’s original intent in real time and catch any misalignment at the moment the agent acts. “When the attack is a trusted agent doing exactly what it was told by poisoned data, the only place left to stop it is at the agent’s runtime.”

Gene Moody, field chief technology officer (CTO) at Action1, says attacks like agentjacking show why organizations need to treat AI models as insecure and untrusted until fully vetted. “Vetted in this case means full security testing, not just workflow testing. Even after that they should be heavily gated to account for how the model itself could be the vector if/when discovered. Limit in routes for data it consumes, and gate them too,” he says. Beyond that, organizations must look into how to freeze the AI’s ability to think outside the scope of its core design. The goal should be to prevent an AI agent from being able to act in a way that’s not an explicitly approved directive.





Source link

#

No responses yet

Leave a Reply

Your email address will not be published. Required fields are marked *