Chinese companies released two new AI models in the past month that have pushed the boundaries of the nation’s vulnerability-discovery capabilities and raised concerns among some cybersecurity experts.
On June 13, Chinese firm Zhipu AI released an open-weight model, GLM 5.2, that subsequent testing found outperforms Anthropic’s Opus and OpenAI’s GPT-5.5 on some bug-finding benchmarks, at a cost of only $0.17 per vulnerability found. Two weeks later, another firm, 360 Security Technology, released a frontier-model-based security tool, Tulongfeng (aka “Dragon Saber”), which its founder touted as China’s version of Mythos, claiming it had already found more than 3,400 vulnerabilities, according to a Reuters report.
While the software surrounding a model makes a critical difference in vulnerability discovery, the fact that an open-weight model performed so well on benchmarks shows that defenders need to pay off their security debt as quickly as possible, says Chris Inglis, the former US National Cyber Director and a strategic advisor for ransomware-defense firm Halcyon.
“Commodity models now can run circles around defenses, and so defenses need to get serious about knowing their architecture, prioritizing the weaknesses within that architecture, and, in rapid priority order, ruthlessly patch and fix your configurations,” he says. “I think that’s possible. … I don’t think we’re past the point of actually saving ourselves.”
AI systems are getting better at finding vulnerabilities, and attackers are increasingly using these capabilities to improve their offense. In April, the Cloud Security Alliance warned that the release of frontier models, most notably Mythos, could lead to an “AI vulnerability storm.” In May, Google revealed it had detected the first AI-created exploit being used by an attacker. Some researchers have warned that even the scant details published in monthly patch releases could lead to quick exploitation of vulnerabilities, which currently takes three hours on average.
Chinese models do not just perform well; they also offer good cost-performance, says Margaret Cunningham, vice president of security and AI strategy at Darktrace, an AI cybersecurity platform.
“Better models tend to be more reliable, but reliability has to be weighed against cost, access, speed, and ease of deployment,” she says. “In practice, both attackers and defenders make economic decisions.” A model, she adds, just needs to be good enough to justify its use.
China’s Caught Up in Ways That Matter
Open weights cut both ways. Because some Chinese models, such as GLM 5.2, have open weights, meaning they can be installed on local hardware, defenders have a strong reason to adopt them, while attackers can experiment with escaping any alignment that prevents offensive use. In addition, for companies that need to keep data inside their network, a high-performing open-weight model can be a better choice than a cloud-hosted frontier model, says John Gallagher, vice president at Viakoo, a provider of automated Internet of Things (IoT) cyber hygiene.
“Chinese models are designed to be downloaded and run on private hardware, optimized for low cost, and be customized,” he says. “Right now for OT and critical infrastructure there are advantages to defenders from the Chinese approach because of data sovereignty and leakage risks of requiring use of cloud APIs for the leading US models.”
Yet in many ways, the model itself, whether a large language model (LLM) or a variant such as the mixture-of-experts (MoE) architecture used by GLM 5.2, has become the least significant part of the equation. Modern AI systems are now good enough, especially with the right harnesses (supporting software), to find vulnerabilities in two of the three buckets of security debt that most companies deal with in their environments, says Inglis: known but unpatched vulnerabilities, and unknown but easily discoverable ones.
“It actually doesn’t matter whether it’s a frontier model or a trailing model, we can be taken to the cleaners by them,” he says. “And so I worry more about the state of the defense than I do about the capability of the offense, because any one of those models can today run circles around most of the defenses.”
Frontier AI models are needed only to find issues in the third bucket: zero-day vulnerabilities that involve complex flaws or complicated attack chains, he says.
Less About the Model, More About Integration
In its testing, cybersecurity firm Semgrep found that GLM 5.2 performed the best of all standard models, with an F1 score of 39%. F1 is a common metric for LLMs and other machine-learning systems that balances precision (the share of reported findings that are real) against recall (the share of real vulnerabilities that are found).
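For readers unfamiliar with the metric, here is a minimal Python sketch of how an F1 score is computed from counts of true positives, false positives, and false negatives. True negatives never enter the formula. The function name `f1_score` and the example counts are hypothetical illustrations, not Semgrep's methodology or data.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall.

    Hypothetical helper for illustration; true negatives are not used.
    Equivalent to 2*TP / (2*TP + FP + FN).
    """
    reported = true_positives + false_positives  # everything the tool flagged
    actual = true_positives + false_negatives    # every real vulnerability present
    if reported == 0 or actual == 0:
        return 0.0
    precision = true_positives / reported  # share of findings that are real
    recall = true_positives / actual       # share of real bugs that were found
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 50 reported findings, 20 of them real,
    # and 25 real vulnerabilities missed.
    score = f1_score(true_positives=20, false_positives=30, false_negatives=25)
    print(f"{score:.3f}")  # prints 0.421
```

Because F1 is a harmonic mean, a tool cannot score well by flooding analysts with findings (high recall, low precision) or by reporting only a handful of sure things (high precision, low recall).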
The fact that a Chinese company has developed the model is less important than the fact that it’s widely available, says Darktrace’s Cunningham.
“Whether a model originates in the US, China, or elsewhere is often less important than whether or not security teams can integrate AI into their operations in a meaningful way,” she says. “Most organizations still have work to do around visibility, workflows, governance, and decision-making. Those factors will determine defensive effectiveness long before marginal differences between leading models.”

No responses yet