MODULE 03

Advanced Patterns

When a single prompt isn't enough, you reach for orchestration. This module covers the patterns that compose multiple LLM calls into reliable systems: decomposition, reasoning-and-acting loops, and sampling-based self-correction.

3 lessons · ~120 minutes

LESSON 3.1

Prompt chaining and decomposition

The single most reliable way to make hard tasks tractable: stop asking the model to do everything in one shot. Break the task into steps, and let each step be its own focused prompt. This is prompt chaining — and it's the unsexy, boring pattern that makes most production systems actually work.

Why decompose?

Single-call prompts get worse as task complexity grows. The model has to attend to the full task, the full input, and its own growing output at once. By the time it's three steps in, it has often lost track of the original constraints.

Decomposing into steps does several things at once:

Each step is simpler and easier for the model to nail
Each step's output becomes structured input to the next, reducing ambiguity
You can validate, retry, or branch on intermediate outputs
You can use cheaper or smaller models for simple steps
Failures are diagnosable — you know which step broke
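
The validate-and-retry benefit can be sketched in a few lines. This is a minimal sketch, not a definitive implementation: `run_step`, its parameters, and the `llm` callable (assumed to take a prompt string and return text) are hypothetical names introduced for illustration.

```python
import json
from typing import Callable

def run_step(llm: Callable[[str], str], prompt: str,
             required_keys: set[str], max_attempts: int = 3) -> dict:
    """Run one chain step and validate its JSON output before passing it on."""
    last_error = "no attempts made"
    for _ in range(max_attempts):
        raw = llm(prompt)  # hypothetical model call: prompt in, text out
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"invalid JSON: {exc}"
            continue
        if not isinstance(data, dict):
            last_error = "expected a JSON object"
            continue
        missing = required_keys - set(data)
        if missing:
            last_error = f"missing keys: {sorted(missing)}"
            continue
        return data
    # Surfacing the last error tells you which step broke, and why.
    raise ValueError(f"step failed after {max_attempts} attempts: {last_error}")
```

Because the step either returns validated data or raises with a reason, the next step never has to guess whether its input is well-formed.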

A concrete example: support ticket triage

Suppose you want to triage a customer support ticket: classify it, extract key fields, generate a draft response, and route it. A naive prompt asks for all four at once. A chained version looks like this:

# Step 1: classify
classification = llm(classify_prompt(ticket))
# → "billing"

# Step 2: extract structured fields (using a billing-specific extractor)
fields = llm(extract_billing_fields_prompt(ticket))
# → {"invoice_id": "INV-1234", "amount_disputed": 49.00, ...}

# Step 3: generate response (using a billing-specific template)
draft = llm(billing_response_prompt(ticket, fields))

# Step 4: route based on the disputed amount and urgency keywords
# (.get avoids a KeyError when the extractor found no amount)
if (fields.get("amount_disputed") or 0) > 1000 or "urgent" in ticket.lower():
    route = "human_specialist"
else:
    route = "auto_send"

Each step has one job. Each step is easy to evaluate independently. Each step can be improved without touching the others.

Patterns of decomposition

A few decomposition shapes show up over and over:

Sequential chain. Step 1 → Step 2 → Step 3, each consuming the previous output. Use when later steps genuinely depend on earlier ones.

Map-reduce. Run the same prompt over many inputs in parallel (the map), then aggregate (the reduce). Classic for summarizing long documents: chunk, summarize each chunk, then summarize the summaries.

Router. A first prompt classifies the input and routes to one of several specialized prompts. Cheaper than a single mega-prompt that tries to handle everything.

Verifier loop. Run a generation prompt, then a separate verification prompt that checks the output against criteria. Retry or revise on failure.
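
Of the shapes above, map-reduce is the one that benefits most from seeing the plumbing. The following is a rough sketch under stated assumptions: `llm` is a hypothetical callable that takes a prompt and returns text, chunking is done naively by character count, and the prompt wording is illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def map_reduce_summarize(llm: Callable[[str], str], text: str,
                         chunk_size: int = 4000, max_workers: int = 4) -> str:
    chunks = chunk_text(text, chunk_size)
    # Map: summarize every chunk independently, in parallel.
    # pool.map returns results in the same order as the input chunks.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(lambda c: llm(f"Summarize:\n{c}"), chunks))
    # Reduce: combine the partial summaries with one final call.
    joined = "\n".join(partials)
    return llm(f"Combine these summaries into one:\n{joined}")
```

A real system would likely split on paragraph or section boundaries rather than raw character offsets, so that no chunk starts mid-sentence.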

The "let one model write the plan, another execute" pattern

For complex multi-step tasks, a common pattern is to first ask the model to plan the approach, then execute the plan step by step.

# Planning prompt
You are given a research question. Produce a plan as a JSON list of
steps. Each step is one of: search, read, synthesize, answer.

Question: "{user_question}"

Plan:

Then a separate execution loop carries out each step, possibly with tool use. This separation makes the plan inspectable and editable — and makes it possible to use a stronger model for planning and a cheaper one for execution.
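
One way the execution side might look, assuming the planner returns a JSON list of step-name strings as the prompt above requests. `parse_plan`, `execute_plan`, and the handler signature are hypothetical names for this sketch; real handlers would call search tools or further prompts.

```python
import json
from typing import Callable

ALLOWED_STEPS = {"search", "read", "synthesize", "answer"}

def parse_plan(raw_plan: str) -> list[str]:
    """Parse the planner's JSON output and reject unknown step types."""
    plan = json.loads(raw_plan)
    if not isinstance(plan, list) or not plan:
        raise ValueError("plan must be a non-empty JSON list")
    unknown = [s for s in plan
               if not isinstance(s, str) or s not in ALLOWED_STEPS]
    if unknown:
        raise ValueError(f"unknown steps: {unknown}")
    return plan

def execute_plan(plan: list[str],
                 handlers: dict[str, Callable[[list[str]], str]]) -> list[str]:
    """Run each step in order; every handler sees the results so far."""
    results: list[str] = []
    for step in plan:
        results.append(handlers[step](results))
    return results
```

Validating the plan before running it is what makes it inspectable: a malformed or unexpected plan fails loudly before any tool is called.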

When to chain vs. when to use one prompt

Chaining adds latency, cost, and orchestration complexity. Don't reach for it reflexively. Some heuristics:

Use one prompt	Use a chain
Task fits comfortably in working memory	Task has 4+ distinct sub-tasks
Output is simple/structured	Output requires intermediate validation
You don't need to inspect intermediate state	You want to retry or branch on intermediate state
Latency is critical	Latency is acceptable; quality matters more

Heuristic: Start with one prompt. Decompose only when you observe failures that decomposition would specifically fix. Premature chaining is the LLM equivalent of premature abstraction.

State management

Once you have multiple LLM calls in a row, you have state to manage: prior outputs, accumulated context, error states, retries. This is where prompt engineering bleeds into systems engineering. The complexity is real, and tooling exists for it (LangChain, DSPy, custom orchestration), but the principles are unchanged from any other distributed system: keep state explicit, validate at boundaries, log everything for debugging.
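
To make "keep state explicit, validate at boundaries, log everything" concrete, here is a small sketch. `ChainState` and `record_step` are hypothetical names invented for this example, and the fields are one plausible choice rather than a prescribed schema.

```python
import json
import logging
from dataclasses import dataclass, field, asdict

logger = logging.getLogger("chain")

@dataclass
class ChainState:
    """Everything the chain knows, in one explicit, serializable object."""
    ticket: str
    outputs: dict = field(default_factory=dict)   # step name -> validated output
    errors: list = field(default_factory=list)    # (step name, message) pairs
    attempts: dict = field(default_factory=dict)  # step name -> call count

def record_step(state: ChainState, step: str, output, validate) -> bool:
    """Validate a step's output at the boundary, then log the full state."""
    state.attempts[step] = state.attempts.get(step, 0) + 1
    ok = bool(validate(output))
    if ok:
        state.outputs[step] = output
    else:
        state.errors.append((step, f"rejected output: {output!r}"))
    # default=str keeps logging from failing on non-JSON-serializable outputs.
    logger.info("state after %s: %s", step, json.dumps(asdict(state), default=str))
    return ok
```

Since the whole state is one serializable object, a failed run can be replayed from the log line just before the failure.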

LESSON 3.2

ReAct: reasoning + acting loops

So far the model has only generated text. ReAct is the pattern that lets it do things (call APIs, query databases, search the web) and incorporate the results back into its reasoning. It's the foundation of most modern agents.

The pattern

ReAct interleaves three kinds of output:

Thought: the model's reasoning about what to do next
Action: a tool call with arguments
Observation: the result of the tool call (added to context)

Then the model loops: another thought, another action, another observation, until it produces a final answer.

Thought: I need to know the current weather in Boston.
Action: get_weather(city="Boston")
Observation: 42°F, light rain.

Thought: The user asked if they should bring an umbrella. Yes.
Action: respond("Yes, bring an umbrella — light rain expected, 42°F.")

The model is grounding its reasoning in real data, not just its training. This is how you get from "the model invents an answer" to "the model produces an answer grounded in tool results," though that answer is only as trustworthy as the tools behind it.
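
Before native tool-calling APIs, this loop was typically driven by parsing the model's text. The sketch below illustrates that idea for the transcript format shown above. It assumes a hypothetical `llm` callable that receives the transcript so far and returns the next Thought/Action text, and it only understands keyword arguments with double-quoted string values.

```python
import re
from typing import Callable

ACTION_RE = re.compile(r'^Action:\s*(\w+)\((.*)\)\s*$', re.MULTILINE)
ARG_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')

def parse_action(text: str):
    """Return (tool_name, raw_args) for the last Action line, or None."""
    matches = ACTION_RE.findall(text)
    return matches[-1] if matches else None

def react_loop(llm: Callable[[str], str], tools: dict, question: str,
               max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(transcript)           # model emits Thought + Action
        transcript += output + "\n"
        action = parse_action(output)
        if action is None:
            return output.strip()          # no action: treat as the final text
        name, raw_args = action
        if name == "respond":
            return raw_args.strip().strip('"')
        if name not in tools:
            observation = f"Error: unknown tool {name!r}"
        else:
            observation = tools[name](**dict(ARG_RE.findall(raw_args)))
        # The observation is appended so the next thought can use it.
        transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached."
```

Text parsing like this is fragile, which is a large part of why structured tool calling replaced it in practice.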

Tool use / function calling

Modern APIs (Anthropic, OpenAI, others) support tool use natively. You declare a set of tools with name, description, and input schema. The model decides when to call them and emits structured tool calls. Your code executes the tool and feeds the result back. Repeat.

# Tool declaration (Anthropic-style)
tools = [{
  "name": "get_weather",
  "description": "Get current weather for a city.",
  "input_schema": {
    "type": "object",
    "properties": {
      "city": {"type": "string", "description": "City name"}
    },
    "required": ["city"]
  }
}]

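# Loop
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
messages = [{"role": "user", "content": user_query}]
for _ in range(10):  # hard cap so the loop cannot run away
    resp = client.messages.create(
        model="claude-...", max_tokens=1024,  # max_tokens is required
        tools=tools, messages=messages)
    messages.append({"role": "assistant", "content": resp.content})
    if resp.stop_reason != "tool_use":
        break  # final answer, or a stop we cannot continue from
    # Return every tool result for this turn in a single user message.
    tool_results = []
    for block in resp.content:
        if block.type == "tool_use":
            result = execute_tool(block.name, block.input)
            tool_results.append(
                {"type": "tool_result", "tool_use_id": block.id, "content": str(result)})
    messages.append({"role": "user", "content": tool_results})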

Designing tools well

The single biggest predictor of agent quality is tool design, not prompt phrasing. Bad tools = brittle agent. Some principles:

Each tool does one thing. A tool named do_database_stuff with a free-text query parameter is a coin flip. Splitting it into find_customer_by_email, get_recent_orders, flag_account_for_review is reliable.
Descriptions are prompts. The tool description is read by the model and influences when it's called. Write descriptions like prompts: precise, with examples of when to use vs. when not to.
Input schemas constrain. Use enums where possible. A category field with five enum values is far more reliable than a free-text field.
Errors are signals. When a tool fails, return a clear error message. The model can read it, reason about what went wrong, and try again.
Idempotence matters. Agents retry. Make side-effecting tools idempotent so retries don't double-charge a customer.
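
Several of these principles can be combined in a single tool. The sketch below is illustrative only: `issue_refund`, `RefundTool`, the reason values, and the idempotency-key scheme are all hypothetical, and a real implementation would persist completed keys rather than hold them in memory.

```python
ALLOWED_REASONS = {"duplicate_charge", "damaged_item", "other"}

REFUND_TOOL_SCHEMA = {
    "name": "issue_refund",  # one tool, one job
    "description": "Refund one order. Use only after the customer has "
                   "confirmed the order ID. Do not use for exchanges.",
    "input_schema": {
        "type": "object",
        "properties": {
            "idempotency_key": {"type": "string"},
            "order_id": {"type": "string"},
            "amount": {"type": "number"},
            # An enum constrains the model far more than free text would.
            "reason": {"type": "string", "enum": sorted(ALLOWED_REASONS)},
        },
        "required": ["idempotency_key", "order_id", "amount", "reason"],
    },
}

class RefundTool:
    def __init__(self) -> None:
        self._completed: dict[str, dict] = {}  # idempotency_key -> prior result
        self.refunds_issued = 0

    def __call__(self, idempotency_key: str, order_id: str,
                 amount: float, reason: str) -> dict:
        # A retry with the same key returns the stored result instead of
        # issuing a second refund.
        if idempotency_key in self._completed:
            return self._completed[idempotency_key]
        # Errors come back as readable data so the model can react to them.
        if reason not in ALLOWED_REASONS:
            return {"ok": False,
                    "error": f"reason must be one of {sorted(ALLOWED_REASONS)}"}
        if amount <= 0:
            return {"ok": False, "error": "amount must be positive"}
        self.refunds_issued += 1
        result = {"ok": True, "order_id": order_id, "refunded": amount}
        self._completed[idempotency_key] = result
        return result
```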

Loop control: stopping conditions

An agent without a stopping condition is a runaway loop. Build in:

Max iterations. Hard cap (e.g., 10 tool calls) to prevent infinite loops.
Token budget. Cumulative token limit across the conversation.
Wall-clock timeout. The whole loop must complete in N seconds.
Final answer detection. The model produces a tool-free response or calls a designated respond_to_user tool.
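
The first three stopping conditions can live in one small object that the loop consults on every pass. This is a sketch with hypothetical names (`LoopBudget`, `charge`, `stop_reason`) and arbitrary default limits; token counts are assumed to come from whatever usage data your API returns.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LoopBudget:
    max_iterations: int = 10
    max_tokens: int = 50_000
    timeout_seconds: float = 60.0
    iterations: int = 0
    tokens_used: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, tokens: int) -> None:
        """Record one completed iteration and its token usage."""
        self.iterations += 1
        self.tokens_used += tokens

    def stop_reason(self) -> Optional[str]:
        """Return why the loop must stop, or None if it may continue."""
        if self.iterations >= self.max_iterations:
            return "max_iterations"
        if self.tokens_used >= self.max_tokens:
            return "token_budget"
        if time.monotonic() - self.started_at >= self.timeout_seconds:
            return "timeout"
        return None
```

An agent loop would then read `while budget.stop_reason() is None:` and call `budget.charge(...)` after each model call, so every exit carries a named reason you can log.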

Failure modes

Agents fail in characteristic ways:

Tool thrash: the model alternates between two tools without making progress. Mitigation: track repeated calls and break the loop.
Hallucinated tool calls: the model invents a tool that doesn't exist. Mitigation: validate against the declared schema; return a clear error.
Premature termination: the model gives up before completing the task. Mitigation: explicit completion criteria in the system prompt.
Over-eager tool use: the model calls tools when it could answer from its own knowledge. Mitigation: in the tool description, specify when not to use it.
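
The first two mitigations are mechanical enough to sketch. `ToolCallGuard` is a hypothetical helper, and treating "same tool, same arguments, more than twice" as thrash is an assumption you would tune for your own agent.

```python
import json
from collections import Counter
from typing import Optional

class ToolCallGuard:
    def __init__(self, declared_tools: set[str], max_repeats: int = 2):
        self.declared_tools = declared_tools
        self.max_repeats = max_repeats
        self.seen: Counter = Counter()

    def check(self, name: str, arguments: dict) -> Optional[str]:
        """Return an error string to feed back to the model, or None if OK."""
        # Hallucinated tool: reject with a clear, actionable message.
        if name not in self.declared_tools:
            return (f"Unknown tool {name!r}. "
                    f"Available tools: {sorted(self.declared_tools)}")
        # Identical calls repeated too often suggest the agent is thrashing.
        key = (name, json.dumps(arguments, sort_keys=True))
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            return (f"{name} was already called {self.max_repeats} times "
                    "with these arguments. Try a different approach.")
        return None
```

Returning the problem as text, instead of raising, follows the "errors are signals" principle: the model gets a chance to correct course before the loop gives up.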

Tool use is a security boundary. An agent that executes shell commands or makes API calls based on model output is exposing those capabilities to anyone who can shape the input — including attackers. We'll cover defenses in Module 6.

When ReAct is overkill

If your task only needs one tool call, you don't need ReAct — you need function calling. Don't build an agent loop when a single round-trip suffices. ReAct shines when the model genuinely needs to interleave reasoning and action over multiple steps with branching paths.

LESSON 3.3

Self-consistency, tree-of-thoughts, and meta-prompting

Three patterns that share a common idea: several samples beat one. When stakes are high enough to justify the extra cost, drawing multiple samples and combining them can produce markedly more reliable outputs. This lesson covers three flavors.

Self-consistency: vote across samples

Self-consistency is almost embarrassingly simple. For a task with a discrete answer, run the same prompt multiple times at non-zero temperature, then take the majority vote.

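from collections import Counter

def self_consistent_answer(prompt, n=5, temperature=0.7):
    # Sample n answers, normalize them, and return the majority vote.
    answers = [llm(prompt, temperature=temperature).strip().lower() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]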

Why does this work? At temperature 0, you get one path. At temperature 0.7 and N samples, you explore several different reasoning chains. If the majority converge on the same answer, that's strong evidence — like running the same calculation through different proofs and getting the same result. If they disagree, you know the model is uncertain.

Self-consistency works best when:

The task has a discrete answer (yes/no, classification, numeric)
The reasoning chain is non-trivial (so different samples genuinely vary)
Latency and cost can absorb 5–10× the calls

It does not help for tasks where the model is confidently wrong in the same way every time — five samples of the same misconception don't make truth.
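
Since disagreement is itself a signal, a voting helper can report how strongly the samples agree and abstain when they don't. This is a sketch: `vote_with_agreement`, the zero-argument `sample` callable, and the 0.6 threshold are assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Optional

def vote_with_agreement(sample: Callable[[], str], n: int = 5,
                        min_agreement: float = 0.6) -> tuple[Optional[str], float]:
    """Return (majority answer or None, fraction of samples that agreed)."""
    # Normalize so "Yes" and "yes " count as the same vote.
    answers = [sample().strip().lower() for _ in range(n)]
    winner, count = Counter(answers).most_common(1)[0]
    agreement = count / n
    if agreement < min_agreement:
        return None, agreement  # too uncertain: abstain or escalate
    return winner, agreement
```

High agreement still says nothing about the shared-misconception case described above; it measures consistency, not correctness.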

Tree-of-thoughts: explore, evaluate, prune

Self-consistency is a flat search: N independent samples. Tree-of-thoughts (ToT) is structured: at each step, generate K candidates, score them, and continue from the best one (or the best few). Keeping several candidates alive is beam search applied to reasoning; keeping only one is the greedy special case.

def tree_of_thoughts(problem, depth, k):
    state = {"problem": problem, "history": []}
    for step in range(depth):
        # Propose k candidate next steps from the current state.
        candidates = generate_next_steps(state, k=k)
        # Score each candidate and keep only the best (greedy, beam width 1).
        best = max(candidates, key=lambda c: score(state, c))
        state["history"].append(best)
        if is_solved(state):
            return state
    return state

ToT shines on problems with a search structure: planning, puzzles, multi-step proofs. The cost is real: you pay for K candidate generations per step plus the calls that score them, so reserve it for cases where a single prompt or self-consistency genuinely isn't good enough.

The judge model

Both ToT and many self-consistency variants need a way to score candidate outputs. The trick: use the LLM itself as the judge.

You are evaluating candidate solutions to the following problem.

Problem: {problem}

Candidate A: {candidate_a}
Candidate B: {candidate_b}

Which is more likely to be correct? Briefly justify your reasoning first.
Then end with a final line containing only your verdict: "A", "B", or "TIE".

LLM-as-judge has well-known failure modes (positional bias, length bias, sycophancy) that we'll dig into in Module 5. Used carefully, it's a powerful primitive for these multi-sample patterns.
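
One commonly used guard against positional bias is to ask the judge twice with the candidates swapped and accept only a verdict that survives the swap. The sketch below assumes a hypothetical `judge` callable that takes `(problem, candidate_a, candidate_b)` and returns `"A"`, `"B"`, or `"TIE"`.

```python
from typing import Callable

def debiased_verdict(judge: Callable[[str, str, str], str],
                     problem: str, first: str, second: str) -> str:
    """Return "first", "second", or "tie" using two order-swapped judgments."""
    forward = judge(problem, first, second)    # `first` shown as Candidate A
    backward = judge(problem, second, first)   # `first` shown as Candidate B
    if forward == "A" and backward == "B":
        return "first"
    if forward == "B" and backward == "A":
        return "second"
    # Verdicts that flip with position, or explicit ties, count as a tie.
    return "tie"
```

This doubles the judging cost and does nothing for length bias, so treat it as one mitigation among several.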

Meta-prompting: prompts that write prompts

Meta-prompting is the practice of using an LLM to generate or improve prompts for another LLM. It sounds recursive and weird; it works.

A simple version: for a new task, ask the model to draft a prompt for it.

I need a prompt that will reliably extract the following fields from
unstructured invoice text:
- invoice_number
- total_amount
- due_date
- vendor_name

The output should be valid JSON. The prompt should handle missing fields
gracefully and never hallucinate values.

Write me the prompt.

This works because the model has seen many prompts in training and knows what they look like. The output is rarely production-ready, but it's a strong starting draft. You then run it on real inputs, observe failures, and iterate (often using the model itself to suggest fixes).

The "rewrite for failures" loop

A more powerful meta-prompting pattern: feed the model your current prompt, an example input it failed on, the wrong output, and the correct output. Ask it to revise the prompt to fix that failure without breaking other cases.

Current prompt:
"""
{current_prompt}
"""

Failed input:
{input}

Wrong output the prompt produced:
{wrong_output}

Correct output:
{correct_output}

Revise the prompt to fix this case. Be minimal — change as little as
possible. Do not over-fit to this single example.

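Run this in a loop over your failure set, validating each revision against the full eval. This is loosely analogous to gradient descent on prompts: each failure is an error signal that nudges the prompt, and the eval keeps a bad step from sticking.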
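
The loop might be structured as follows. This is a sketch under assumptions: `revise` stands in for the meta-prompt call above (current prompt plus one failure in, revised prompt out) and `evaluate` for your full eval returning a score where higher is better. Both are hypothetical callables you would supply.

```python
from typing import Callable

def improve_prompt(prompt: str, failures: list[dict],
                   revise: Callable[[str, dict], str],
                   evaluate: Callable[[str], float]) -> str:
    """Try one revision per failure; keep it only if the full eval improves."""
    best_prompt, best_score = prompt, evaluate(prompt)
    for failure in failures:
        candidate = revise(best_prompt, failure)
        score = evaluate(candidate)
        # Gate every revision on the whole eval set, so a fix for one
        # case cannot silently break others.
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt
```

Requiring a strict improvement keeps the prompt from growing with revisions that change nothing measurable.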

Knowing when to reach for these

All three of these techniques cost more than a single call. They're worth it when:

You're at the limit of what a single prompt can deliver
Per-call quality matters more than per-call cost
You have budget headroom in the system architecture

For most production systems, the single biggest gains come from the basics in Modules 1–2. These patterns are tools for the long tail — not the first thing to try.

Exercise: Take a prompt you've already iterated on. Run it 10 times at temperature 0.7 on five different inputs. Note how often the answer agrees with itself. That's your self-consistency baseline: it tells you whether the prompt is stable or uncertain on each input. Check the stable answers against known-correct outputs too, since consistent agreement can still be consistently wrong.

Module 3 wrap-up

You now have orchestration techniques: chaining for decomposition, ReAct for tool use, and sampling-based methods for high-stakes correctness. Module 4 takes these patterns into production, where the constraints get real: latency budgets, cost ceilings, and the messiness of retrieval.