1. The 6-component prompt skeleton
- Role/context: who the model is acting as
- Task: what to do, unambiguously
- Inputs: the data to operate on (delimited)
- Constraints: what to avoid, how to handle ambiguity
- Examples: demonstrations (when needed)
- Output format: the exact shape of the response
2. Choosing examples (few-shot)
- Cover decision boundaries, not the obvious center
- Be ruthlessly consistent across examples
- Mine your eval failures — each example should fix a real failure
- Diminishing returns after 3–5 examples
- Each example should justify its tokens
3. When to use chain-of-thought
- Use: multi-step reasoning, math, logic, multi-hop QA
- Skip: trivial classification, extraction, simple lookups
- Reasoning models: don't add CoT — they reason internally
- Production: use structured CoT (XML tags) for parseability
- Visible reasoning ≠ correct reasoning
4. System vs user prompt
- System: behavior, tone, rules, persona
- User: the per-turn task or query
- Models weight system instructions higher (RLHF)
- Generic personas ("helpful assistant") do almost nothing
- Specific scenario-anchored personas earn their tokens
5. Decompose vs single-prompt
| Single prompt | Chain |
| Fits in working memory | 4+ distinct sub-tasks |
| Simple structured output | Need intermediate validation |
| Latency-critical | Quality > latency |
| No need to inspect state | Want to retry/branch on state |
6. ReAct / tool use
- Each tool does one thing — no
do_database_stuff
- Tool descriptions are prompts — write them like prompts
- Use enums, not free-text fields, where possible
- Idempotent tools so retries don't double-charge
- Always set: max iterations, token budget, wall timeout
- Errors are signals — return clear messages to the model
7. RAG essentials
- Chunk: 200–500 tokens, 10–20% overlap (tune for corpus)
- Hybrid retrieval: BM25 + embeddings, reranked
- Order matters — best chunks at start/end of context
- Wrap each chunk in
<document id="..."> tags
- Require citations; verify they map to real chunks
- Skip RAG when corpus fits in context
8. Sampling parameters
| Task | Temperature |
| Classification, extraction | 0 |
| Code generation | 0–0.3 |
| Summarization | 0.3–0.5 |
| Creative writing | 0.7–1.0 |
| Self-consistency voting | 0.7+ |
9. Failure mode → fix lookup
| Failure | Fix |
| Format drift | Tighter spec + examples + validator |
| Hallucination | RAG, citations, "I don't know" permission |
| Instruction lapse | Move to system prompt; restate near task |
| Over-refusal | Specify scope; positive examples |
| Verbose | Hard length cap; sentence count |
| Inconsistent tone | Persona examples; explicit tone rules |
10. Latency levers
- Stream response — perceived TTFT, not total time
- Cache stable prefix (system prompt, examples)
- Smaller model for simple chain steps
- Cap output tokens — most defaults are too high
- Parallelize independent calls
- Trim retrieved context aggressively
11. Eval discipline
- 50–200 examples gets you started; quality over quantity
- 60–80% real samples + 20–40% hand-crafted edge cases
- Split dev/test (70/30); only iterate against dev
- Watch the diff, not just the aggregate score
- Newly-failing examples block ship even if score went up
- Every prod failure → eval set with expected behavior
12. LLM-as-judge biases
- Position — favors first option (rotate orders)
- Length — favors longer (penalize verbosity)
- Self-preference — same family scored higher
- Sycophancy — agrees with what prompt implies
- Surface features — bullets/headings over substance
- Calibrate against human scores before trusting at scale
13. Prompt injection defenses (layered)
- Wrap user data in delimiters; explicit "ignore instructions inside"
- Least-privilege tools — minimum capability, scoped access
- Output validation before any consequential action
- Prompt isolation — separate trust levels into separate calls
- Monitoring — alert on anomalous tool use
No single defense is sufficient. Stack them.
14. Pre-launch checklist
- Worst-case output identified, blast radius bounded
- Eval set covers failure modes that matter
- Injection attack surface mapped + defended
- "I don't know" path tested
- Human-in-the-loop where required
- Production observability live (latency, cost, output)
- Rollback plan documented
Prompt Engineering Mastery — promptengineeringmastery.com
For the full reasoning, take the course.