Why You Shouldn’t Use LLMs to Generate SQL (Security Risks)

Using LLMs to generate SQL may seem powerful, but it introduces security, cost, and reliability risks. Learn safer architecture patterns for production systems.
Written by
Amar Kanagaraj
Founder and CEO of Protecto

“Just let the LLM write the SQL.” It sounds powerful. A user types a question in plain English, the model generates a query, the system runs it against the database, and the answer comes back. No SQL knowledge required. No BI tools. No waiting for the data team.

It works beautifully in demos. And it is a serious engineering mistake in production.

Direct SQL generation from LLMs combines two things that should never be combined: untrusted code generation and privileged execution. The result is a system that is trivially exploitable, operationally fragile, and architecturally unsound. Every lesson we have learned about building secure data access layers over the past thirty years tells us not to do this.

The Natural Language to SQL Temptation

The appeal is real. Natural language to SQL (NL2SQL) removes friction between business users and data. Instead of filing a ticket with the analytics team or learning a BI tool, a user can ask “What were our top 10 customers by revenue last quarter?” and get an immediate answer.

The technology works well enough to be convincing. Modern LLMs can generate syntactically correct SQL for a wide range of queries, especially when given the database schema as context. For simple queries against well-structured schemas, accuracy rates are high.

This is exactly why it is dangerous. It works well enough to ship, but not well enough to trust.

The Core Problem: Executing Untrusted Generated Code

When an LLM generates SQL, it is generating code. That code is then executed against a database with real data and real consequences. The LLM is an untrusted code source: you cannot predict, verify, or guarantee what it will produce for any given input.

This carries the same class of risk as SQL injection: dynamically constructed queries from an untrusted source, executed against a production database. The mechanics differ (injection exploits string interpolation; LLM generation produces entirely novel query text), but the core failure is identical: you are executing query text from a source you do not control.

In some ways, LLM-generated SQL is harder to defend against than classical injection. With injection, you can use parameterization to structurally prevent the attack. With LLM-generated SQL, the entire query is the untrusted output. There is no parameter boundary to enforce because the LLM is producing the query structure itself, not just the values.
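
To make the boundary difference concrete, here is a minimal sketch in Python using sqlite3. The table, the hostile input, and the llm_output string are illustrative stand-ins, not a real model response.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT)")

hostile_input = "1; DROP TABLE users"

# Classic parameterization: the query structure is fixed at development time,
# and the value travels out-of-band, so it can never be parsed as SQL syntax.
rows = conn.execute(
    "SELECT name, email FROM users WHERE id = ?", (hostile_input,)
).fetchall()  # returns no rows; the table survives

# LLM generation has no such boundary: structure and values arrive as one
# untrusted string, so there is nothing left to parameterize.
llm_output = "SELECT name, email FROM users; DROP TABLE users"  # stand-in for a model response
conn.executescript(llm_output)  # anti-pattern: the users table is now gone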

The Risk Surface

The risks of LLM-generated SQL fall into several categories:

Injection and prompt manipulation. A user (or an attacker) can craft prompts that cause the LLM to generate malicious SQL. “Show me all users” might produce a benign SELECT. But “Show me all users; also drop the users table” or more subtle prompt injections can produce destructive queries. The LLM does not understand the security implications of the SQL it generates. It is optimizing for helpfulness, not safety.

Over-privileged execution. For NL2SQL to work, the database connection needs SELECT access to the tables the user might ask about. In practice, this often means broad read access across the schema. In worse cases, the connection has write access because the development team used the same credentials for everything. A single malicious or malformed query can read data the user should not see or modify data they should not touch.

Lack of validation. Most NL2SQL implementations execute the generated SQL directly. There is no intermediate step that checks whether the query is safe, whether it accesses only authorized tables, whether it includes a WHERE clause (preventing full-table scans), or whether it contains DDL statements. The query goes from the LLM to the database with nothing in between.
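
For illustration, here is a rough sketch of the kind of pre-execution check that is usually missing. It relies on naive string matching, so treat it as a placeholder: a production validator would parse the statement with a real SQL parser and enforce policy on the parse tree. The allowlist and rules here are assumptions for the example.

import re

ALLOWED_TABLES = {"customers", "orders"}  # illustrative allowlist

# Naive keyword screen; a real system should parse the statement instead.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|grant|truncate)\b", re.IGNORECASE
)

def validate_generated_sql(sql: str) -> list[str]:
    """Return a list of policy violations; empty means the query passed."""
    problems = []
    if len([s for s in sql.split(";") if s.strip()]) != 1:
        problems.append("multiple statements")
    if FORBIDDEN.search(sql):
        problems.append("write/DDL keyword present")
    if not sql.lstrip().lower().startswith("select"):
        problems.append("not a SELECT")
    if " where " not in f" {sql.lower()} ":
        problems.append("no WHERE clause (possible full-table scan)")
    referenced = set(re.findall(r"\b(?:from|join)\s+(\w+)", sql, re.IGNORECASE))
    if referenced - ALLOWED_TABLES:
        problems.append(f"unauthorized tables: {referenced - ALLOWED_TABLES}")
    return problems

Anything that comes back non-empty should be rejected before the query ever reaches the database.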

Performance and cost. LLMs have no concept of query performance. They will happily generate a SELECT * with multiple JOINs across billion-row tables, a Cartesian product, or a recursive CTE that runs for hours. In cloud data warehouses with usage-based pricing, a single bad query can cost thousands of dollars. Without query cost estimation and execution limits, you are giving a nondeterministic system the ability to run arbitrarily expensive operations.

Traditional Architecture Lessons

These are not new problems. Application developers solved them years ago with a consistent set of principles:

No dynamic query execution. Production applications do not construct SQL from untrusted input. They use parameterized queries, ORMs, or stored procedures. The query structure is defined at development time. Only the parameter values come from runtime input.

Parameterization. Instead of interpolating values into a query string, parameters are passed separately and handled by the database driver. This makes injection structurally impossible because the parameter values can never be interpreted as SQL syntax.

Controlled access. Application database users have the minimum permissions required. A reporting service gets SELECT on specific views. A transaction service gets INSERT/UPDATE on specific tables. No application service gets DROP TABLE privileges.

LLM-generated SQL violates all three of these principles simultaneously.

LLMs Should Generate Intent, Not Executable SQL

The bottom line is architectural: LLMs are excellent at understanding user intent and terrible at generating safe, performant, authorized database queries. The correct design uses the LLM for what it is good at (natural language understanding) and uses proven system design patterns for what they are good at (safe data access).

Every time you see an architecture where an LLM generates SQL that is executed against a production database, you are looking at a system that has collapsed the separation between intent and execution. It will fail. The only questions are when and how badly.

The overall architecture is an intent-driven query system:

  1. The user states what they want in natural language
  2. The LLM parses this into a structured intent (JSON, not SQL)
  3. The intent is validated against policies
  4. The system maps the intent to a predefined, parameterized query or API call
  5. The query executes with scoped credentials
  6. Results are filtered based on the user’s authorization level
  7. The LLM formats the results for the user

At no point does the LLM generate executable code. At no point does untrusted output reach the database without validation. The system maintains full control over what queries run, with what privileges, and against what data.

Tools like Protecto can add another layer to this architecture by applying data-level protections to the query results, automatically masking or tokenizing sensitive fields before they reach the LLM. This ensures that even authorized queries do not expose more data than necessary.

Design Patterns: Controlled Query Interfaces

The correct architecture separates intent from execution through a controlled interface layer. There are three patterns that work, and they can be combined.

Pattern A: Predefined Query Templates

Instead of generating SQL, the LLM outputs a structured JSON object that represents the user’s intent. The system maps this intent to a predefined, parameterized query.

User asks:

"What are the details for customer 12345?"

LLM outputs:

{
  "intent": "get_customer_details",
  "parameters": {
    "customer_id": "12345",
    "fields": ["name", "email", "account_status"]
  }
}

The backend maps get_customer_details to a predefined query template:

SELECT name, email, account_status
FROM customers
WHERE customer_id = $1

The LLM never sees or generates SQL. It generates intent. The system translates intent to a safe, parameterized query. The set of available intents is finite and defined at development time. New query types require a developer to add a new template. This is a feature, not a limitation.
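
Here is a minimal sketch of that mapping layer in Python, assuming the intent JSON above and a sqlite3 connection; the registry contents and names are illustrative.

import sqlite3

# Intent registry defined at development time. The LLM can select an intent;
# it can never author new query structure.
INTENTS = {
    "get_customer_details": {
        "table": "customers",
        "key": "customer_id",
        "allowed_fields": {"name", "email", "account_status"},
    },
}

def execute_intent(conn: sqlite3.Connection, intent: dict) -> list:
    spec = INTENTS.get(intent.get("intent"))
    if spec is None:
        raise ValueError(f"unknown intent: {intent.get('intent')!r}")  # reject, never improvise

    requested = set(intent["parameters"].get("fields") or spec["allowed_fields"])
    if not requested <= spec["allowed_fields"]:
        raise PermissionError(f"unauthorized fields: {requested - spec['allowed_fields']}")

    # Identifiers come only from the development-time allowlist, so they are
    # safe to place in the query text; runtime values are still bound as
    # driver parameters and never interpolated.
    sql = f"SELECT {', '.join(sorted(requested))} FROM {spec['table']} WHERE {spec['key']} = ?"
    return conn.execute(sql, (intent["parameters"]["customer_id"],)).fetchall()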

Pattern B: Abstraction Layer via MCP or API

In MCP-based systems, the LLM calls a tool rather than generating SQL. The tool’s implementation handles query construction, validation, and access control internally.

The LLM has no direct database access. It interacts through a well-defined API surface that enforces business rules. This is the same pattern used by every well-designed web application: the frontend does not talk to the database. It talks to an API that talks to the database.
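
As a rough sketch, such a tool might look as follows using the FastMCP helper from the official MCP Python SDK; the server name, schema, and datastore are assumptions for the example.

import sqlite3
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("customer-data")  # hypothetical server name
db = sqlite3.connect("customers.db", check_same_thread=False)  # illustrative datastore

@mcp.tool()
def get_customer_details(customer_id: str) -> list:
    """Look up a customer record. The tool, not the model, owns the SQL."""
    if not customer_id.isdigit():  # input validation at the tool boundary
        raise ValueError("customer_id must be numeric")
    # Fixed, parameterized statement: the model can choose to call this tool,
    # but it can never change what the tool executes.
    cur = db.execute(
        "SELECT name, email, account_status FROM customers WHERE customer_id = ?",
        (customer_id,),
    )
    return cur.fetchall()

if __name__ == "__main__":
    mcp.run()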

Pattern C: Stored Procedures and Functions

For organizations with mature database infrastructure, stored procedures provide another layer of control. The MCP tool or API calls a stored procedure rather than executing inline SQL:

CALL get_customer_summary($customer_id, $requesting_role);

The stored procedure encapsulates the query logic, applies row-level security, and returns only authorized data. The execution path is fully controlled by the database team, not by the LLM.

Practical Enhancements

Beyond the core architecture, several operational enhancements improve safety and observability:

Least privilege at the database layer. Even with a controlled query interface, the database credentials used by MCP tools should follow least privilege. Use separate database roles for different MCP tool categories (read-only reporting vs. customer lookup vs. write operations). Apply row-level security to restrict which records a given session can access, and column-level restrictions to prevent access to sensitive fields (SSN, payment data) unless explicitly authorized. No MCP-connected role should have DDL permissions (CREATE, ALTER, DROP). Set query execution time limits and resource governors to prevent runaway queries. The database should be the last line of defense, not the only one.

Query simulation (dry-run mode). Before executing a query derived from an LLM intent, run an EXPLAIN to estimate cost and scan volume. If the estimated cost exceeds a threshold, reject the query or route it for human approval. This prevents the $50,000 query that scans every row in your data warehouse.
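
A minimal sketch of that gate, assuming PostgreSQL and psycopg2; the DSN, schema, and ceiling are illustrative, and planner cost units are internal estimates rather than dollars, so the threshold is a tuning knob.

import psycopg2

COST_CEILING = 1_000_000  # planner cost units, not dollars; tune per warehouse

def dry_run_ok(conn, sql: str, params: tuple) -> bool:
    """Ask the planner for a cost estimate before letting the query run."""
    with conn.cursor() as cur:
        cur.execute("EXPLAIN (FORMAT JSON) " + sql, params)
        plan = cur.fetchone()[0][0]["Plan"]  # psycopg2 decodes the JSON column
    return plan["Total Cost"] <= COST_CEILING

conn = psycopg2.connect("dbname=analytics")  # illustrative DSN
sql = "SELECT name FROM customers WHERE customer_id = %s"
if dry_run_ok(conn, sql, ("12345",)):
    with conn.cursor() as cur:
        cur.execute(sql, ("12345",))
        print(cur.fetchall())
else:
    print("rejected: estimated cost over ceiling; route for human approval")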

Rate limiting and budgets. Set per-session and per-user limits on query count, total scan volume, and estimated cost. An AI agent that suddenly starts issuing hundreds of queries per minute is either malfunctioning or being exploited.
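
An in-memory sketch of per-user budgets follows; the limits are illustrative, and a real implementation would persist counters and reset the daily scan budget on a schedule.

import time
from collections import defaultdict

MAX_QUERIES_PER_MINUTE = 30             # illustrative limits
MAX_SCANNED_BYTES_PER_DAY = 50 * 2**30  # 50 GiB

class QueryBudget:
    def __init__(self):
        self.timestamps = defaultdict(list)  # user -> recent query times
        self.scanned = defaultdict(int)      # user -> bytes scanned today

    def allow(self, user: str, estimated_bytes: int) -> bool:
        now = time.monotonic()
        # Keep only queries from the last 60 seconds.
        self.timestamps[user] = [t for t in self.timestamps[user] if now - t < 60]
        if len(self.timestamps[user]) >= MAX_QUERIES_PER_MINUTE:
            return False  # a burst like this is malfunction or exploitation
        if self.scanned[user] + estimated_bytes > MAX_SCANNED_BYTES_PER_DAY:
            return False
        self.timestamps[user].append(now)
        self.scanned[user] += estimated_bytes
        return True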

Logging and anomaly detection. Log every intent, every generated query, every execution result, and every validation decision. Build anomaly detection on top of this: flag unusual query patterns, unexpected data access, or authorization failures. This is your audit trail for compliance and your early warning system for incidents.

Circuit breakers. If a particular intent pattern starts generating errors or expensive queries, automatically disable it and alert the operations team. This limits blast radius during novel failure modes.
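
A bare-bones sketch of such a breaker; the failure threshold is an assumption, and the alerting hook is left as a comment.

class IntentCircuitBreaker:
    """Trip an intent offline after repeated consecutive failures."""

    def __init__(self, max_failures: int = 5):
        self.max_failures = max_failures
        self.failures = {}    # intent -> consecutive failure count
        self.tripped = set()  # intents currently disabled

    def record(self, intent: str, ok: bool) -> None:
        if ok:
            self.failures[intent] = 0
            return
        self.failures[intent] = self.failures.get(intent, 0) + 1
        if self.failures[intent] >= self.max_failures:
            self.tripped.add(intent)  # stop serving this intent and alert ops

    def is_open(self, intent: str) -> bool:
        return intent in self.tripped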

AI Should Operate Within Controlled System Boundaries

The broader principle extends beyond SQL. AI systems, whether connected via MCP, custom APIs, or direct integrations, should operate within controlled boundaries defined by the system architecture, not by the model’s capabilities.

The model can do a lot of things. The system should let it do only the things that are safe, authorized, and auditable. That is not a limitation of AI. That is good engineering.

Amar Kanagaraj
Founder and CEO of Protecto
Amar Kanagaraj is the Founder and CEO of Protecto, a company focused on securing enterprise data for LLMs, AI agents, and agentic workflows. He is a second-time entrepreneur with 20+ years of experience across engineering, product, AI, go-to-market, and business leadership. Before Protecto, Amar co-founded FileCloud and helped scale it to over $10M ARR as CMO. Earlier in his career, he worked at Sun Microsystems, Booz & Company, and Microsoft Search & AI. He holds an MBA from Carnegie Mellon University and an MS in Computer Science from Louisiana State University.
