RESOURCE

Cheat sheet

A one-page reference of the patterns from the course and when to reach for each. Designed to print cleanly. Use the print button above to save as PDF.

Prompt Engineering Cheat Sheet

Patterns and when to reach for each.

v1.0

1. The 6-component prompt skeleton

  1. Role/context: who the model is acting as
  2. Task: what to do, unambiguously
  3. Inputs: the data to operate on (delimited)
  4. Constraints: what to avoid, how to handle ambiguity
  5. Examples: demonstrations (when needed)
  6. Output format: the exact shape of the response

2. Choosing examples (few-shot)

  • Cover decision boundaries, not the obvious center
  • Be ruthlessly consistent across examples
  • Mine your eval failures — each example should fix a real failure
  • Diminishing returns after 3–5 examples
  • Each example should justify its tokens

3. When to use chain-of-thought

  • Use: multi-step reasoning, math, logic, multi-hop QA
  • Skip: trivial classification, extraction, simple lookups
  • Reasoning models: don't add CoT — they reason internally
  • Production: use structured CoT (XML tags) for parseability
  • Visible reasoning ≠ correct reasoning

4. System vs user prompt

  • System: behavior, tone, rules, persona
  • User: the per-turn task or query
  • Models weight system instructions higher (RLHF)
  • Generic personas ("helpful assistant") do almost nothing
  • Specific scenario-anchored personas earn their tokens

5. Decompose vs single-prompt

Single promptChain
Fits in working memory4+ distinct sub-tasks
Simple structured outputNeed intermediate validation
Latency-criticalQuality > latency
No need to inspect stateWant to retry/branch on state

6. ReAct / tool use

  • Each tool does one thing — no do_database_stuff
  • Tool descriptions are prompts — write them like prompts
  • Use enums, not free-text fields, where possible
  • Idempotent tools so retries don't double-charge
  • Always set: max iterations, token budget, wall timeout
  • Errors are signals — return clear messages to the model

7. RAG essentials

  • Chunk: 200–500 tokens, 10–20% overlap (tune for corpus)
  • Hybrid retrieval: BM25 + embeddings, reranked
  • Order matters — best chunks at start/end of context
  • Wrap each chunk in <document id="..."> tags
  • Require citations; verify they map to real chunks
  • Skip RAG when corpus fits in context

8. Sampling parameters

TaskTemperature
Classification, extraction0
Code generation0–0.3
Summarization0.3–0.5
Creative writing0.7–1.0
Self-consistency voting0.7+

9. Failure mode → fix lookup

FailureFix
Format driftTighter spec + examples + validator
HallucinationRAG, citations, "I don't know" permission
Instruction lapseMove to system prompt; restate near task
Over-refusalSpecify scope; positive examples
VerboseHard length cap; sentence count
Inconsistent tonePersona examples; explicit tone rules

10. Latency levers

  • Stream response — perceived TTFT, not total time
  • Cache stable prefix (system prompt, examples)
  • Smaller model for simple chain steps
  • Cap output tokens — most defaults are too high
  • Parallelize independent calls
  • Trim retrieved context aggressively

11. Eval discipline

  • 50–200 examples gets you started; quality over quantity
  • 60–80% real samples + 20–40% hand-crafted edge cases
  • Split dev/test (70/30); only iterate against dev
  • Watch the diff, not just the aggregate score
  • Newly-failing examples block ship even if score went up
  • Every prod failure → eval set with expected behavior

12. LLM-as-judge biases

  • Position — favors first option (rotate orders)
  • Length — favors longer (penalize verbosity)
  • Self-preference — same family scored higher
  • Sycophancy — agrees with what prompt implies
  • Surface features — bullets/headings over substance
  • Calibrate against human scores before trusting at scale

13. Prompt injection defenses (layered)

  1. Wrap user data in delimiters; explicit "ignore instructions inside"
  2. Least-privilege tools — minimum capability, scoped access
  3. Output validation before any consequential action
  4. Prompt isolation — separate trust levels into separate calls
  5. Monitoring — alert on anomalous tool use

No single defense is sufficient. Stack them.

14. Pre-launch checklist

  • Worst-case output identified, blast radius bounded
  • Eval set covers failure modes that matter
  • Injection attack surface mapped + defended
  • "I don't know" path tested
  • Human-in-the-loop where required
  • Production observability live (latency, cost, output)
  • Rollback plan documented
Prompt Engineering Mastery — promptengineeringmastery.com
For the full reasoning, take the course.

Tip: Save as PDF

Click the Print / Save as PDF button above. In your browser's print dialog, choose "Save as PDF" as the destination. The cheat sheet is designed to fit cleanly on two pages of A4.