The Code-Then-Execute Pattern for LLM Agents

Introduction

This post is one of a series based on the paper Design Patterns for Securing LLM Agents against Prompt Injections by researchers at organisations including IBM, Invariant Labs, ETH Zurich, Google and Microsoft. It looks at how the The Code-Then-Execute Pattern for LLM Agents helps defend LLM applications from prompt injection attacks.

This is a living document. Future updates will add refined code samples, specific attack scenarios the pattern mitigates, and concise explanations of each defense.

Read the full paper on arXiv.

LLMs introduce new security vulnerabilities

LLMs open up security vulnerabilities that traditional AppSec frameworks are ill-equipped to address.

The “code-then-execute” pattern is a version of the dual LLM pattern that takes the idea of separating planning from execution (plan-then-execute pattern) to the next level by having an LLM agent write a formal program to solve a task. This program is executed by a separate, secure runtime, providing a clear representation of the agent’s intended actions.

How The Code-Then-Execute Pattern Works

The workflow is:

Code Generation: The main LLM agent receives a task in natural language and generates an executable piece of code.
Code Execution: The generated program is executed by a sandboxed interpreter that manages tool calls and data flows.

Use Case: The Email & Calendar Assistant

For the task “send today’s schedule to my boss John Doe”, the agent writes an explicit program before reading any untrusted data:

x = calendar.read(today);
x = LLM("format this data", x);
email.write(x, "john.doe@company.com");

Security Analysis

Risk Reduction: Generating an explicit program provides a high degree of control flow integrity.
Limitations: Data‑only attacks are still possible; a malicious calendar event could still manipulate the content of the outgoing email.

Conceptual Code Example

The snippet below attempts to align with the pattern as described in the paper. It is purely illustrative and has not been reviewed in detail. Do not treat it as production code.

Controller: runs BEFORE any untrusted data is touched. It writes a small program that can call only vetted tool wrappers, then runs that program in a locked-down environment.


# ---- toy “tools” the agent is allowed to call ----
def calendar_read(day: str) -> str:
    """Return calendar text for given day (could contain attacker text)."""
    return "09:00 – Team sync\n18:00 – Yoga class"

def llm_format(prompt: str, text: str) -> str:
    """Unprivileged LLM that can only format text, no tool access."""
    return f"{prompt}:\n{text.replace(' – ', ': ')}"

def email_write(recipient: str, body: str) -> None:
    """Side-effecting tool. In practice this would send an email."""
    print(f"Sending to {recipient}:\n{body}\n")

# ---- 1. privileged LLM writes a program (no untrusted data yet) ----
generated_program = '''
schedule = calendar_read("2025-06-16")
body     = llm_format("Plain summary for John", schedule)
email_write("john.doe@company.com", body)
'''

# ---- 2. controller executes the program in a sandbox ----
sandbox_globals = {
    "calendar_read": calendar_read,
    "llm_format":    llm_format,
    "email_write":   email_write,
    "__builtins__":  {}          # remove all other built-ins
}
exec(generated_program, sandbox_globals)

Why this aligns with the pattern

Planning is frozen first. The privileged LLM outputs the generated_program before it sees any calendar text. The list and order of tool calls cannot change later .
Execution is separate. Only after the program is fixed do we call exec to run it. Any malicious content inside the calendar entry can change email body text but cannot add new tool calls or alter control-flow, matching the paper’s example where a calendar injection could not trigger extra actions .
Sandbox limits damage. The program’s global scope exposes only the three whitelisted wrappers and strips built-ins, so it cannot import modules, open files, or spawn shells, following the paper’s principle of “no consequential actions beyond the plan”

How the code would behave

Sending to john.doe@company.com:
Plain summary for John:
09:00: Team sync
18:00: Yoga class

Even if an attacker hid “DELETE ALL FILES” inside a calendar entry, that text would appear only in the email body; the program still calls nothing except email_write.

Key take-aways

Separate code generation (privileged, no untrusted data) from code execution (unprivileged).
Expose only the exact tools needed.