
Illustrated by Jeff Prymowicz
Signature-based checks, rule engines, policy evaluation, static queries, and reproducible detections are the foundation of modern security tooling. They are dependable, relatively cheap, and capable of operating at the scale real organizations need. Even advanced security products are usually just more elaborate versions of the same idea: if you can define a condition precisely enough, you can detect it reliably.
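That deterministic idea is easy to see in miniature. The sketch below is illustrative only (the signature pattern and function name are invented for this example, not taken from any product): a "signature" is just a precisely defined condition, and the same input always produces the same finding.

```python
import re

# A "signature" is a precisely defined condition: here, a regex that
# flags eval() applied to request-derived input in Python source.
EVAL_SIGNATURE = re.compile(r"\beval\s*\(\s*request\.")

def scan(source: str) -> list[int]:
    """Return the 1-indexed line numbers that match the signature."""
    return [
        i
        for i, line in enumerate(source.splitlines(), start=1)
        if EVAL_SIGNATURE.search(line)
    ]

code = "x = 1\nresult = eval(request.args['expr'])\n"
print(scan(code))  # deterministic: re-running never changes the result
```

Run it twice, or a thousand times, and the output is identical. That repeatability is exactly the property the rest of this post contrasts against.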
LLMs are different.
They are slower, more expensive, and markedly less predictable than deterministic systems. Run the same analysis twice and you may not get the exact same output. Change the prompt, the context window, or the tools available to the model and the result may shift. This nondeterminism is part of what makes these systems fundamentally different from the security tools most teams are used to.
So where do LLMs and AI tools fit into existing workflows? This post is about understanding what kinds of AI security tools exist right now and what teams are actually using them for.
The same technology that is expanding the attack surface is also giving security teams new ways to analyze code, reason about risk, and scale security work. Attackers are already benefiting from the same shift. AI can accelerate code generation, compress reconnaissance and research time, and lower the cost of producing plausible exploits, phishing content, and other offensive outputs. At the same time, organizations are shipping AI-assisted code, AI-enabled features, and new model-driven workflows faster than their security practices are adapting. The result is a larger, faster-moving attack surface. Defenders should be empowered by the same class of tools, not left relying on workflows designed for a slower and more deterministic era.

Before the rest of this series gets into agents, hands-on review workflows, and direct comparisons with traditional tooling, it is worth understanding what this category actually looks like today.
The “AI Security Tool” Spectrum
At one end are general-purpose coding agents and code assistants that were not built specifically for security, but are already being used for security work. This includes tools like Codex and Claude Code. These tools can read code, answer questions about behavior, explain logic, propose fixes, and help reviewers reason through complex flows. In practice, a strong coding agent in the hands of a security engineer is already a security tool, whether the vendor markets it that way or not.
A second category is security tooling with AI layered into an existing detection or review workflow. These tools usually combine a model with more traditional security signals: static analysis, rule engines, code scanning, alerting, or policy enforcement. In many cases, the AI is not replacing the existing engine so much as extending it. Much of this category lives inside existing frameworks or homegrown glue; perhaps one of your engineers builds a script that uses AI to vet results from Semgrep. In general, tools in this category help explain findings, prioritize noise, draft fixes, or reason about issues that fall outside clean pattern matching.
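As a sketch of that glue-script pattern, the fragment below assumes the shape of Semgrep's `--json` output (`results`, `check_id`, `path`, `start.line`, `extra.message`); `ask_llm` is a hypothetical stand-in for whatever model API a team actually uses, not a real library call.

```python
import json

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to your team's model API."""
    raise NotImplementedError

def build_triage_prompt(finding: dict) -> str:
    # Each Semgrep finding carries a rule id, file path, and match location.
    return (
        f"Rule: {finding['check_id']}\n"
        f"File: {finding['path']} (line {finding['start']['line']})\n"
        f"Message: {finding['extra']['message']}\n"
        "Is this likely a true positive? Answer briefly with reasoning."
    )

def triage(semgrep_json: str) -> list[str]:
    """Turn `semgrep --json` output into one triage prompt per finding."""
    findings = json.loads(semgrep_json)["results"]
    return [build_triage_prompt(f) for f in findings]
```

The model never replaces the scanner here: Semgrep still decides what is a finding, and the LLM only helps a human decide which findings deserve attention first.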

Then there is the emerging category that gets the most attention and deserves the most caution: agentic systems that attempt more autonomous security work. These include agents for code review, multi-step analysis, and, in the case of AWS’s own Security Agent service, live application assessment. Some of these workflows are promising. Some are real. Some are still much closer to research projects than reliable production controls.
The array of tooling is broad, but the underlying pattern is simple: some tools assist human analysis, some augment existing security controls with AI, and some push toward more autonomous security workflows.
What Teams Are Actually Doing
From Fortune 500s to VC-backed startups, we work across some of the most complex and heavily regulated industries in the world. And despite the AI hype cycle, what we consistently see is this: security teams are not adopting AI uniformly across their work. Certain use cases have proven meaningfully more practical than others, and the gap is widening.
The strongest and most obvious workflow is code review. This is where LLM-based tooling already feels useful. Many models are proficient at reading code, tracking intent across files, explaining suspicious behavior, and helping reviewers investigate issues that do not reduce neatly to a single known pattern. That makes them valuable for both dedicated security reviews and general engineering workflows where security is one dimension of a broader code assessment.
The second common use case is threat modeling assistance. AI is well-suited to first-draft work: enumerating abuse cases, surfacing trust boundaries, proposing architectural questions, and helping teams turn vague concerns into something reviewable. It does not replace real design review, but it can compress the distance between "we should think about this" and "here is a usable threat model draft."
The third is policy and documentation work. Security teams spend a surprising amount of time converting expertise into artifacts other people can consume: secure coding guidance, exception language, review checklists, internal standards, decision logs, and explanatory notes. This work matters, but it rarely gets the attention or staffing it needs. LLMs are often strong here because the task is less about perfect precision than about quickly producing a credible first draft that an expert can refine.
Incident triage and investigation support are also increasingly common use cases. AI can summarize evidence, connect findings across sources, help frame likely impact, and generate useful next questions. In a mature team, that does not eliminate analyst judgment. It reduces the amount of time spent getting to the point where judgment becomes useful.
These keep showing up because they align with what current models are actually good at: reasoning over code and text, synthesizing context, and accelerating expert interpretation.

Tooling Maturity
The easiest way to cut through the noise is to separate what is working now from what is still mostly aspirational.
The most mature use case is AI-assisted analysis of static artifacts: source code, pull requests, architecture notes, internal policies, incident summaries, design docs, and other text-heavy inputs. This is where the models have the clearest strengths and where teams are already getting practical value. Security reviews, guided remediation, threat-model drafting, and triage support all live here. General-purpose LLM systems are already empowering engineers who understand the strengths and limits of the technology.
What is less mature is anything that depends on strong repeatability, precise autonomy, or complex interaction with live systems. This is where expectations need to be managed. AI systems can be impressive when analyzing code in context or helping an expert reason through a problem. They are much less settled when asked to operate as reliable autonomous security workers. Multi-step website crawling, dynamic testing, long-running agent workflows, and unsupervised bug hunting all remain much less mature than the surrounding marketing suggests.
That does not mean there is no progress in those domains; there is. But it does mean teams need to distinguish between "helpful with supervision" and "dependable enough to trust as a control." Those are not the same thing.

Summary
The current AI security tooling landscape is not a leaderboard. AI technology does not invalidate traditional determinism-based tools, products, and workflows. AI represents an opportunity to complement those well-established security workflows and uncover more risks with deeper, context-aware analysis, but it won't magically find all of the risks, all of the time. Strong code analysis does not automatically mean strong automated dynamic testing. A helpful assistant is not the same thing as a trustworthy reviewer. An impressive one-off result, and those are frequent with modern AI, is not the same thing as an operationally sound workflow.
In the next chapter, we move from the tooling landscape to the systems built on top of it: skills, agents, and the emerging workflow layer that is turning base models into something closer to a repeatable security workforce. After that, we get more concrete: how to use these tools for real security reviews, and how they compare with the deterministic tooling security teams already depend on.
About the Author
Alex is a Principal Security Consultant at Cloud Security Partners. He has extensive experience helping engineering and product teams identify real attack paths, strengthen critical applications and cloud infrastructure, and build security practices that actually scale. He brings a blend of offensive security expertise and full-stack engineering experience, with work spanning web applications, containerized services, thick-client applications, custom network protocols, and more.
Over the past decade, Alex has led deep technical security assessments, built tooling to automate large-scale reconnaissance and code review, reverse engineered proprietary systems, and supported incident response on high-impact issues. He's worked closely with organizations to reduce real-world risk without slowing down delivery.
Before his consulting career, Alex was a competitive member of the University of Central Florida's Knightsec capture-the-flag (CTF) team, where he helped drive the team to multiple national-level wins. He has also taught hands-on workshops at BSides Orlando focused on helping students learn how to approach CTF competitions and grow practical security skills.
Outside of client work, Alex spends his time building security-focused tools, experimenting with large-scale scanning and data pipelines, taming LLMs, and flip-flopping between tearing apart game engines and building them.
