When Designing Agentic Solutions, Learn the Lessons of the Past

March 10, 2026
-
Sean Lyford

This image licensed by CC Public Domain List

Large Language Models (LLMs) have taken the technology world by storm the last few years. With the release of ChatGPT 3, then Meta’s Llama, DeepSeek R1, and so on, a great deal of recent developments have been placed on when and where we can use these models to assist with difficult problems, leading into the development of agentic frameworks where code can be called based on LLM outputs. It truly is incredible the amount of development, research, and resources poured into these models.

Despite the advancements and accomplishments of LLMs, the IT field (developers, staff, security, etc.) have failed to learn lessons from the past nor is there an appreciation for the potentially erratic behaviors of these technological wonders. The purpose of this blog post is to draw attention to these erratic behaviors and hopefully encourage others to adopt a discerning eye for potential security issues introduced by agentic solutions.

 

The Prompt Problem

Before we start into agentic solutions, we need to start with a high-level understanding of LLMs. No, we won’t get dragged into the details of what a transformer architecture looks like, but we need to understand how they are utilized and the vulnerability commonly attributed to them.

To begin, we need to understand how LLMs are used and the common vulnerabilities attributed to them. You might see phrases like “system prompts” or “user prompts” but these are just sections of the same blob of text passed into an LLM; there is no separation of instructions from the arbitrary data. What happens if this arbitrary data contains more instructions? This is known as the Confused Deputy Problem.

The Confused Deputy Problem is simple: when multiple, potentially conflicting commands are given, which instructions are appropriate to follow? As humans, we can scoff at the idea but we fail at this too. Victims fall for phishing emails that appear to come from sources of authority, like a friend or colleague. In some rare cases these people are the actual bad actors themselves. If your manager or director asked you to do something unusual, at what point would you begin to question requests and escalate matters to other higher ups or even authorities?

LLMs do not contain the ability to reason and are ultimately just text generators based on probabilistic approximations. So called “reasoning models” output additional text (used implicitly as additional input for further generation) to help problem solve, but papers from Apple and Anthropic have shown limitations to this approach. And, if anyone has used the models for known problem cases with LLMs (e.g., “how many R’s in strawberry/blueberry”), these reasoning models are ultimately chattier and not really reasoning about the problems. At the end of the day, any and all instructions are still included with the data to be evaluated, leading to the Confused Deputy Problem. That is how a vulnerability known as Prompt Injection occurs.

Prompt Injection tricks the model into doing something it normally would not do either due to conflicting instructions in the prompt or because of how the model was trained, the latter situation is known as Jailbreaking. And “Injection” is the appropriate term as it resembles other injection-based attacks such as SQL Injection or OS Command Injection. The semantics of the original query/command/prompt is altered by new inputs.

At the time of writing, there is no guaranteed method to prevent prompt injection. Between the conflicting instructions combined in a prompt and the odd inputs that operate as potential jailbreaks, it is hard to prove that a model will always do as intended by the service provider. There are methods to limit the probability by utilizing input filters, guardrails, and better models. But those protections aren’t the focus of this blog post.

 

The Design Problem

At the end of the day, like with any deep learning model, a large language model is a complicated formula. Input is passed in, a series of matrix-based operations occurs, and an output is returned. While this description seems out of the blue, it highlights the actual nature of the models: it is the surrounding software that dictates any actions. The software supplies the inputs from other sources, the software executes the transformations, and the software operates on the outputs. This output operation may be as simple as displaying to a user or as complex as running arbitrary code within the output.

But what happens if the input is provided by third-parties? Can the output be swayed by attackers? Earlier, we said that there presently is no foolproof way to prevent prompt injection. We must assume that any and all outputs can be incorrect in some way. But this is not how we are designing these solutions. 

We are creating tools that don’t perform input analysis, mix different layers of functionality access, and potentially confuse datasets (e.g., users). We are implicitly trusting that the LLM will set actions in motion as intended when we know that things can go wrong. While I don’t believe that this is a conscious decision, it stems from a hyper-fixation on what is intended to happen rather than what can happen. 

This pattern of implicit trust because of written code or internal systems has been learned before. We think the LLM will behave as expected because we’ve told it to via a prompt. We assume that tool calls will always work because they’re “internal.” But this is a pattern that we’ve seen with web, iot, and mobile already.

Lessons of the Past

While this all feels new and novel, especially with the concepts of prompts and agents, the security perspective should not be too foreign when compared to developments of the past 20+ years. To highlight this, I want to take a moment to focus on IoT, mobile, and web applications. Even in the course of my career, I’ve seen the same patterns arise and I want to describe them.

 

Internet-of-Things (IoT)

Even to this day, various appliances and gadgets are created with internet connectivity. While this ranges from novelties like refrigerators tracking consumption for repurchase to Alexa-like devices playing music and receiving commands, there is a vast mixture of utility in this space. At the same time, it introduces problems that developers may not be used to that stems from one fact: the device is physically under someone else’s control.

If someone has physical access to a device, then it can be opened and tampered with. Books exist that instruct readers how to open devices and look for serial ports or other data bases with which to interact with the microcontroller. With this in mind, there is a very real risk, perhaps inevitability, that the device can be compromised. As such, the device sending data to remote servers should be assumed hostile since the device can be tampered with at any point. The first thing I think of when I think IoT assessment is “are there credentials/secrets on the device?” If there are, the first thing I want to check is if I can enumerate remote APIs to see just how locked down the remote servers are from rogue devices.

Mobile

As smart phones and tablets became more prominent and powerful in the mid-2010s, the demand for mobile applications also soared. However, anyone who tested mobile applications can tell you that implicit trust in the mobile applications and devices was rampant.

Trusting data stored on the devices, trusting mobile-only APIs to not be abused, trusting devices to not be tampered with, assuming device-stored credentials would not be leaked or obtained. An implicit mindset existed (and may likely still persist) where the application will behave as written because the developer wrote it. However, the devices on which the mobile applications run do not belong to the developers. They may be jailbroken or altogether “fake” (e.g., an emulator). These devices could be changed on a whim, meaning that client code and data may also be changed and, in turn, malicious. This applies to even the client executable itself, which could potentially be inspected or even altered.

 

Web

Much like the mobile case, it is easy to fall into the trap of presuming how web requests will flow. And modern web frameworks may obfuscate how HTTP truly works, which is stateless, without authentication, and the application must enforce state itself.

Cookie or local storage may not persist as intended since it is stored on the client. Requests can be replayed and sent out of order from what was intended. Front-end filters and controls can be bypassed entirely by submitting modified HTTP requests to the server via curl commands. “Admin-only” endpoints may be discovered and utilized even though URLs were not present in the user interface. And so on…

Similar to what was observed in the wave of mobile development, web development can easily fall into the trap of presumptive correctness because of intended workflows. All requests and associated data must be validated, even if a copy of that data may have originated from the application itself.

 

Agents

Now we arrive at a modern wave of development. While this wave of “hype” is perhaps more muted than the mobile and dotcom crazes, the same design mistakes rhyme as before. There is a presumption that tool calling will be valid. There is a presumption that MCP servers can safely be “internal only.” There is a presumption that, because prompt instructions say so, the LLM can never misbehave.

There is zero guarantee that an LLM will behave as intended. As such, we should be assessing LLM outputs in the same way mobile or web requests are handled: do not trust and definitely verify. A prompt may alter the parameters such that backend services are probed for security weaknesses. Tools intended for privileged users may be utilized after the LLM is tricked into calling them. 

The mistake made with agents is that the LLM is used as the authoritative driver of tool-calling when it would be more correct to take the calls as suggestions to scrutinize. Even if the prompt is not injected, there is the ever present risk of hallucinations and errors. A famous example occurred when a user reported that their AI agent deleted all of their data.

Agents and tools appear to be encouraged to be written as generically and as “reusable” as possible. As we’ll see in some sample case studies, this only encourages potential abuse and problems.

 

Case Studies

Anthropic’s Postgres MCP Server

As part of the MCP efforts generated by Anthropic, the company created sample MCP servers to base third-party development off of. One such example was an MCP server intended to execute read-only queries against Postgres.

While this sample server is now deprecated as of July 2025, this service is still 10,000 to 20,000 downloads weekly. In addition, it still exists (note: present tense) with a vulnerability to make the server execute database-altering queries, which goes against the original intent.

To start with, the original design was not good. The server takes a query and executes it within a read-only transaction. Even if the read-only mandate were preserved, arbitrary data can be read, which may include password hashes or other sensitive data. Furthermore, the read-only bypass is trivial. By committing the transaction, a new transaction can be opened that does not include the read-only mandate.

If the public web would not receive arbitrary access to your database, and if a web application would not accept arbitrary SQL queries from a browser, why would an LLM-based agent be any different? The database user permissions should be as tight as possible, and the queries should be premade, parameterized, and filtered for access controls, just as with any other modern application.

Read more about this here.

 

Replit AI Database Deletion

This incident was referenced earlier. Replit is a company that offers products to assist in coding applications and websites. Naturally, this is driven by an LLM in the backend. In July 2025, a developer was utilizing Replit’s AI coding assistant to help with development, which had entered a code freeze. The developer discovered that their entire production database had been emptied, replaced with empty tables, and had even allegedly claimed that the assistant had “lied in [their] unit tests, claiming they passed.”

This incident shows the amount of control some companies and users are providing to AI assistants. And, in both incidents, there doesn’t appear to be signs of foul play. The agents themselves had acted inappropriately without outside influence, presenting the real risks that hallucinations and incorrectly configured agents can bring. If the details in this incident sound similar but slightly off to what you may remember, I myself had confused this incident with another from Google where a user of Google Antigravity claimed that their entire disk had been deleted. 

Learning Materials

While learning the framework LangChain, one textbook I read included example tool definitions. Naturally, these aren’t entirely meant to be run in production and are just demonstrative. One such tool was a calculator tool intended to give better math output. The example given was for the LLM to determine how old a former US President was at the time of their death. The LLM would determine that certain years would need to be passed into the tool, and the tool would calculate the age by utilizing ast.literal_eval.

Before I get too far into my issues with this approach, I do want to highlight that literal_eval is not the same as the plain eval function. It is intended to create a Python literal out of the input, and it will throw an exception if the input is not a literal expression. This may help with certain stages of parsing inputs.

Now for the bad news: this approach that the textbook takes no longer works. The original tool would pass a string in along the lines of “10 – 2” and it would expect “8” back. The literal_eval function does not evaluate expressions at the time of writing this blog, only literal values.

There are several issues with this approach. First, a tool that is extremely verbose is being created. There was no input validation that two numbers and one of several operators (e.g., “-“ or “+”) were used. Second, where literal_eval no longer works, there may be a strong temptation to fallback to eval to match the “convenient” flexibility of the original, which would result in arbitrary code execution.

 

Takeaways

I don’t want to detract from developers trying to solve difficult problems with the use of LLMs. It is a very interesting technology that should be explored. Even here in Cloud Security Partners, we are exploring how best to augment our staff with LLM automation. However, I want to embed the mindset that the LLM is fallible. It can hallucinate or be convinced to do something unintended.

The LLM itself cannot do anything; it is the surrounding code that takes action. We should be validating that the LLM action matches user permissions, function-level and data access. We should be asking ourselves if we even want some functionality exposed to automated processes (e.g., data destruction, payment processing, etc). We should think about LLM outputs as potentially malicious or incorrect, and, in turn, we should ask what we can do within the agentic code to validate the outputs in relation to the user.

Thank you for reading! If you have concerns about your agentic solutions or AI pipelines, Cloud Security Partners can perform an analysis of your solutions to help identify solutions before potential problems arise.

About the Author

Sean Lyford is a Senior Security Consultant with Cloud Security Partners. He has over 11 years of experience within the information security and development fields. Sean focuses on application and cloud security practices.

Sean has had a career in both application security consulting and software engineering. As a software engineer, Sean has experience with high-level web applications, AI/ML integrations, and network application development. With his experience as a software engineer, Sean was able to effectively communicate with development teams and provide remediation guidance and prioritization.

Sean has been focusing on LLMs as well as learning more about deep learning and data science as a whole. Along with other members of Cloud Security Partners, Sean is trying to answer where and how LLMs can help supplement existing workflows.

Stay in the loop.
Subscribe for the latest in AI, Security, Cloud, and more—straight to your inbox.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Back to blogs