Threat actors are trying to leverage organization-owned AI agents to power complex threat activity.
Between March and May, Zenity researchers observed three distinct campaigns that used the large language model (LLM) infrastructure on the company’s honeypots, which exposed Ollama and LiteLLM endpoints, as a resource for offensive AI operations. What’s most notable about this attack vector is that it doesn’t require a full-scale compromise, just knowledge of an exposed endpoint.
More specifically, Zenity’s blog post notes that attackers exploit “the inference endpoints that self-hosted AI software exposes for applications to call.” The attacker doesn’t need any special authentication to reach them; they just need to know where the endpoint is. Examples of such endpoints include the Ollama “/api/generate” and “/api/chat” endpoints on port 11434, and LiteLLM’s “/v1/responses” endpoint on port 4000.
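Defenders can check their own hosts for the same exposure. The following is a minimal Python sketch, not Zenity’s tooling: it builds URLs from the ports and paths named above and reports whether an unauthenticated POST gets any answer other than a 401 or 403. The function names and the choice to treat any other HTTP status as “answers without auth” are assumptions made for illustration, and it should only be run against hosts you own.

```python
import urllib.error
import urllib.request

# Ports and paths as named in the article.
ENDPOINTS = {
    "ollama": (11434, ["/api/generate", "/api/chat"]),
    "litellm": (4000, ["/v1/responses"]),
}


def probe_urls(host):
    """Build the list of inference URLs to check on one host you own."""
    return [
        f"http://{host}:{port}{path}"
        for port, paths in ENDPOINTS.values()
        for path in paths
    ]


def answers_without_auth(url, timeout=3.0):
    """Return True if an unauthenticated POST gets a non-401/403 answer.

    Assumption: any other HTTP status means something is listening and
    did not demand credentials. Connection failures count as not exposed.
    """
    request = urllib.request.Request(
        url,
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before its parent class URLError.
        return err.code not in (401, 403)
    except (urllib.error.URLError, OSError):
        return False
```

Calling answers_without_auth on each entry of probe_urls("your-host") from outside the network gives a rough view of what an outside scanner would see.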
Three Attackers Leveraging AI Infrastructure
The three operators used the tooling for different use cases. Two ran autonomous penetration testing frameworks (Strix and HexStrike AI), and one ran “an OpenAI Codex agent carrying a persona built to suppress safety refusals and assisting in web reverse-engineering work.”
“The approach needs no software exploit. The attacker simply configures an agent or client (e.g., a LiteLLM client, the CherryStudio desktop app, or the Codex CLI) to use the exposed endpoint as its model backend,” Zenity researchers said. “The agent’s entire ‘brain’ then rides in the request body: its system-prompt persona and its tool definitions, which is exactly what our sensors captured. Operators typically send a small ‘hello’ probe first to confirm the endpoint answers, then submit the full payload.”
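To make the difference between a “hello” probe and a full agent payload concrete, here is a hedged Python sketch that summarizes a request body. It assumes a chat-style JSON body with "messages" and "tools" fields; the actual shape of the captured requests is not given in the article, and the function name and the tool names in the usage note are hypothetical.

```python
import json


def summarize_agent_payload(raw_body):
    """Summarize how much agent 'brain' a chat-style request carries.

    Assumes a JSON object with optional "messages" and "tools" lists.
    """
    body = json.loads(raw_body)
    messages = body.get("messages") or []

    # Count characters of system-prompt text (the persona).
    system_chars = 0
    for message in messages:
        content = message.get("content")
        if message.get("role") == "system" and isinstance(content, str):
            system_chars += len(content)

    # Collect tool names from either nested or flat tool definitions.
    tools = body.get("tools") or []
    tool_names = []
    for tool in tools:
        if not isinstance(tool, dict):
            continue
        function = tool.get("function") or {}
        name = function.get("name") or tool.get("name")
        if name:
            tool_names.append(name)

    return {
        "model": body.get("model"),
        "system_prompt_chars": system_chars,
        "tool_count": len(tools),
        "tool_names": tool_names,
    }
```

A body containing only a short user message would produce zero system-prompt characters and zero tools, while a body with a multi-thousand-character system message and tool definitions such as the hypothetical "nmap_scan" would stand out immediately in the summary.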
For the Strix operator, a single source IP used a LiteLLM client to send a 140,000-character prompt intended to drive Strix against an unidentified French auction house. Notably, the prompt instructed the agent to never ask for permission, to run nonstop, to never reveal the name “Strix” or any other identifying names or markers in its actions, and to “GO SUPER HARD on all targets.” Zenity’s sensors caught and thwarted the effort, though persistent “retry” commands suggested a live operator may have been involved.
For HexStrike AI, the attacker pointed a desktop LLM client at the honeypot’s Ollama instance and sent it the penetration testing orchestration server’s set of more than 150 offensive tools. No target was ever identified in this attempt, suggesting the operator may still have been staging an attack.
A third source IP pointed an OpenAI Codex agent at a honeypot’s LiteLLM proxy and, under the persona of a security auditor, directed the agent to conduct Web reverse-engineering work.
Part of what enables these attacks is how Ollama and LiteLLM handle authentication. According to Zenity, Ollama ships with no built-in authentication on its default port (the aforementioned 11434) and LiteLLM authentication is opt-in, dependent on whether a user sets a master key. There is also a common placeholder key (sk-1234) attackers have been known to target.
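One way to picture “real authentication” in front of such an endpoint is a gateway check that fails closed. The Python sketch below is an illustration under stated assumptions, not LiteLLM’s or Ollama’s own logic: the function name is hypothetical, and the only placeholder value listed is the sk-1234 key mentioned above.

```python
import hmac

# Placeholder key named in the article; extend with others you encounter.
PLACEHOLDER_KEYS = {"sk-1234"}


def key_is_acceptable(presented, configured):
    """Return True only if a real key is configured and the caller matches it."""
    # Fail closed when no key, or only a placeholder key, is configured.
    if not configured or configured in PLACEHOLDER_KEYS:
        return False
    # Reject missing or placeholder keys from the caller outright.
    if not presented or presented in PLACEHOLDER_KEYS:
        return False
    # Constant-time comparison avoids leaking how much of the key matched.
    return hmac.compare_digest(presented.encode(), configured.encode())
```

The design choice worth noting is the first check: a deployment that still has the placeholder configured rejects everything rather than accepting the value an attacker is most likely to try.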
Then there’s the exposure element. Ollama binds to localhost by default but is commonly misconfigured to listen on all interfaces, and LiteLLM’s proxy listens on all interfaces by default, which makes it Internet-facing when it runs on a public host.
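A quick way to audit for this kind of misconfiguration is to check what address a service has been told to bind to. The sketch below is a minimal Python example with a hypothetical function name; it assumes the bind setting is available as a host or host:port string, such as the value of Ollama’s OLLAMA_HOST environment variable, and it conservatively treats anything it cannot parse as exposed.

```python
import ipaddress


def binds_beyond_loopback(bind_value):
    """Return True if a host or host:port setting listens beyond loopback."""
    host = bind_value.strip()

    # Drop an optional scheme such as "http://".
    if "://" in host:
        host = host.split("://", 1)[1]

    # Drop the port from "[v6]:port" or "host:port" forms.
    if host.startswith("["):
        host = host[1:].split("]", 1)[0]
    elif host.count(":") == 1:
        host = host.split(":", 1)[0]

    if host == "localhost":
        return False
    try:
        return not ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Unknown hostname or empty value: assume exposed to be safe.
        return True
```

For example, binds_beyond_loopback(os.environ.get("OLLAMA_HOST", "127.0.0.1:11434")) flags a host where the variable has been set to 0.0.0.0, assuming os has been imported.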
Don’t Expose Your AI Infrastructure or Else
Michael Bargury, chief technology officer (CTO) and co-founder of Zenity, tells Dark Reading that when it comes to the question of who is responsible, customers for the most part own their AI footprint. That said, it’s not cut and dried.
“A customer’s use of an AI platform, cloud infrastructure, third-party and homegrown agents results in a new attack surface. As a customer, you own what you build and deploy,” he says. “However, vendors are accountable for the securability and inspectability of their platforms. They should provide secure defaults and allow customers to own their security outcomes by providing ways for customers to inspect and hook their runtimes.”
For organizations that want to protect themselves from these kinds of attacks, Zenity recommends watching for requests to commonly abused endpoints, particularly those carrying full-agent payloads rather than a standard prompt; blocking common request body indicators, such as prompts that include a full suite of tooling or requests involving models you don’t host; blocking requests associated with penetration testing tools or unsafe personas; and blocking IP addresses used by the operators.
Broader recommendations include: don’t expose model back ends to the Internet wherever possible; require real authentication and reject default or placeholder keys; inspect the request body from outside sources; and monitor the traffic coming through to your AI infrastructure.
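The request-body recommendations above could be sketched as a simple filter in front of the model back end. This Python example is illustrative only: the hosted-model list, the tool-count threshold, the marker strings, and the empty IP blocklist are all hypothetical values, not indicators published by Zenity, and a real deployment would need to tune them.

```python
import json

HOSTED_MODELS = {"llama3"}  # hypothetical: models you actually serve
MAX_TOOLS = 20  # hypothetical threshold for "a full suite of tooling"
MARKERS = ("strix", "hexstrike", "never ask for permission")  # illustrative
BLOCKED_IPS = set()  # fill from your own threat intelligence


def block_reasons(source_ip, raw_body):
    """Return a list of reasons to block a request; empty means allow."""
    reasons = []
    if source_ip in BLOCKED_IPS:
        reasons.append("blocked_ip")

    try:
        body = json.loads(raw_body)
    except ValueError:
        return reasons + ["unparseable_body"]
    if not isinstance(body, dict):
        return reasons + ["unparseable_body"]

    # Requests naming a model you do not host.
    if body.get("model") not in HOSTED_MODELS:
        reasons.append("unhosted_model")

    # Requests carrying an unusually large tool suite.
    if len(body.get("tools") or []) > MAX_TOOLS:
        reasons.append("too_many_tools")

    # Pentest tool names or unsafe persona phrases anywhere in the body.
    text = json.dumps(body).lower()
    if any(marker in text for marker in MARKERS):
        reasons.append("pentest_or_persona_marker")

    return reasons
```

Returning reasons rather than a bare yes or no keeps the filter useful for monitoring as well as blocking, since the same output can be logged for traffic that is only being observed.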
For CISOs, Bargury says a big takeaway is that attackers are showing increasing literacy in agentic AI and are finding new ways to target organizations.
“Attackers are actively looking to discover and hijack AI infrastructure and use your tokens to achieve their goals,” he says. “Assume any AI system you put on the Internet will be targeted by AI literate malicious actors within hours.”
