Anthropic's Prompt Engineering Best Practices, Distilled

Prefer to watch or listen? ▶ YouTube ♫ Spotify ✈ Telegram

Slide deck — 14 slides

The whole argument compressed into a deck. Use ◂ ▸ or arrow keys.

Slide 1 / 14

Prompt engineering has a tips problem. Every tutorial gives you ten clever tricks and leaves it to you to guess which ones survive contact with production. Twitter threads add another ten. By the time you've read a dozen, you've collected fifty heuristics, conflicts between them, and no idea which ones the model's actual designers consider load-bearing.

Anthropic — the team that trained Claude — has consolidated their guidance into a single reference page. It runs long. Half of it is migration trivia about parameter changes between model versions. The other half is the durable part: the techniques that actually move outputs, organized by the problem each one solves.

This post pulls out the durable half. Every technique here is either in Anthropic's official guidance or in Zack Witten's hour-long Prompt Workshop — both linked at the bottom. No third-party tricks, no folk wisdom, no "I read it on a thread."

Six themes, organized roughly in the order you'd reach for them when a prompt isn't behaving:

Clarity — the prompt actually says what you mean
Structure — Claude can find the parts of your prompt
Examples — show, don't tell
Thinking — let the model reason before answering
Output control — get back exactly the shape you need
Agentic systems — the new chapter, for multi-step tool use

1. Clarity

Be clear and direct

Anthropic's golden rule: show your prompt to a colleague with minimal context on the task and ask them to follow it. If they'd be confused, Claude will be too. Claude is described, in Anthropic's own framing, as "a brilliant but new employee who lacks context on your norms and workflows."

Vague prompt:

Create an analytics dashboard

Clear prompt:

Create an analytics dashboard. Include as many relevant features and interactions as possible. Go beyond the basics to create a fully-featured implementation.

The change is not magic. It's two added clauses that close off the model's freedom to under-deliver.

Add context to your instructions

Telling Claude why an instruction matters generalizes better than the instruction alone.

Less effective:

NEVER use ellipses

More effective:

Your response will be read aloud by a text-to-speech engine, so never use ellipses — the engine will not know how to pronounce them.

Once Claude knows the reason, it generalizes. It won't just avoid ...; it'll avoid other characters TTS engines mangle.

Use prescriptive limits, not vague directives

"Be concise" gets you a paragraph. "One to two sentences, never more than three" gets you a sentence or two. Replace soft adjectives with hard constraints whenever you can count something.

Mimic the desired tone in your prompt itself

The prompt's own register leaks into the response. Verbose prompts produce verbose answers. Formal prompts produce academic answers. If you want a clipped operational tone in the output, write the prompt that way.

Write the prompt in the language you know best

Stilted phrasing in a second language quietly degrades outputs. Write the prompt in your strongest language and translate the output if needed.

Avoid negative prompting; tell Claude what to do

"Do not use markdown" is weaker than "Respond in smoothly flowing prose paragraphs." Negation requires the model to hold the forbidden behavior in mind to refuse it — a kind of reverse psychology effect. State the desired behavior positively.

Proofread your prompts

Capitalization and typos do affect output quality. Anthropic calls it "prompt hygiene". Treat the prompt like the production string it is.

2. Structure

Use XML tags to separate the parts of your prompt

This is the single most repeatable upgrade you can make. Claude's training data was full of XML, so it parses tags natively. Once your prompt has more than one "kind" of content — instructions, context, examples, variables — wrap each kind in a tag.

<instructions>
Summarize the article below in three bullet points.
</instructions>

<article>
{{ARTICLE_TEXT}}
</article>

Best practices: use consistent tag names across prompts, nest tags when content has natural hierarchy (<documents> containing multiple <document index="n"> blocks).

For long context, put the data at the top

When you're working with 20k+ tokens of input, place the long documents above your question, instructions, and examples — not below. Anthropic reports up to a 30% quality lift on multi-document tasks just from this reordering. The query and instructions go at the end, closest to where the answer will be generated.

Structure documents with metadata

For multi-document prompts, wrap each document with both its content and its source:

<documents>
  <document index="1">
    <source>annual_report_2023.pdf</source>
    <document_content>
      {{ANNUAL_REPORT}}
    </document_content>
  </document>
  <document index="2">
    <source>competitor_analysis_q2.xlsx</source>
    <document_content>
      {{COMPETITOR_ANALYSIS}}
    </document_content>
  </document>
</documents>

Analyze both documents. Identify strategic advantages and recommend Q3 focus areas.

Ground long-document answers in quotes

For document-heavy tasks where hallucination is a risk, ask Claude to extract relevant quotes first, before doing the actual task. This is the single most effective hallucination mitigator Anthropic ships in their docs:

Find quotes from the patient records relevant to the symptoms. Place them in <quotes> tags. Then, based on those quotes, list information that would help the doctor diagnose the patient. Place your diagnostic information in <info> tags.

The model commits to source material before it can drift, and you get the auditable trail of where each conclusion came from.

Give Claude a role, but keep it in the system prompt

Setting a role focuses tone and behavior. Even a single sentence makes a measurable difference:

system="You are a helpful coding assistant specializing in Python."

Anthropic's guidance: the system prompt is for role and tool definitions; substantive task instructions belong in the user message. This is a shift from older "stuff everything into system" practice.

3. Examples

Use multishot examples

Few-shot (or multishot) prompting — showing 3 to 5 worked examples in the prompt — is one of the most reliable ways to steer output format, tone, and structure. Examples are stronger than instructions when the desired output is hard to describe in words.

When you write examples, make them:

Relevant — mirror your actual use case
Diverse — cover edge cases, vary enough that Claude doesn't latch onto an irrelevant pattern
Structured — wrap them in <example> tags (multiple inside <examples>)

<examples>
  <example>
    <input>The meeting is on Tuesday at 3pm.</input>
    <output>{"day": "Tuesday", "time": "15:00"}</output>
  </example>
  <example>
    <input>Let's sync next Mon morning around 9.</input>
    <output>{"day": "Monday", "time": "09:00"}</output>
  </example>
</examples>

Include negative examples

For subjective tasks — tone evaluation, quality grading, style matching — Claude does better when you show what good looks like and what bad looks like, ideally on the same input. Contrasting pairs about the same document teach the rubric faster than instructions ever will.

Ask Claude to generate or critique your examples

Anthropic explicitly recommends asking Claude to evaluate your example set for relevance and diversity, or to generate additional ones based on what you have. The same model that consumes the examples can audit them.

4. Thinking

Use chain-of-thought before the answer, not after

The order matters. Reasoning that comes before the final answer steers the answer. Reasoning that comes after is post-hoc justification — the answer was already locked in. When you ask for thinking, ask for it first.

Think through the problem step by step inside <thinking> tags, then give your final answer in <answer> tags.

Use adaptive thinking on Claude 4.6 and 4.7

The latest models (Opus 4.6, 4.7, Sonnet 4.6) support adaptive thinking — Claude dynamically decides when and how much to think, calibrated by the effort parameter and the query's complexity.

client.messages.create(
    model="claude-opus-4-7",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[{"role": "user", "content": "..."}],
)

Anthropic's internal evals say adaptive thinking outperforms manual budget control. If you're still pinning budget_tokens, that path is deprecated.

Calibrate effort by workload

Anthropic's effort levels are not interchangeable:

Effort	Use for
`max`	Hardest intelligence-bound tasks; risk of overthinking
`xhigh`	Best for coding agents and most agentic work
`high`	Default for intelligence-sensitive tasks
`medium`	Cost-sensitive workloads with less reasoning
`low`	Short scoped tasks, latency-sensitive paths

If you see shallow reasoning, raise effort before you reach for prompt tricks. Most "the model is being lazy" reports are actually effort-too-low.

Multishot examples work inside thinking

You can put <thinking> blocks inside your few-shot examples to show Claude the reasoning style you want. It generalizes that style to its own extended-thinking traces.

Ask Claude to self-check before finishing

A reliable error-catcher, especially on code and math:

Before you finish, verify your answer against [test criteria].

Append-only, no extra structure. Catches a lot.

5. Output control

Tell Claude what format you want, not what to avoid

✗ Do not use markdown in your response.
✓ Your response should be composed of smoothly flowing prose paragraphs.

Use XML format indicators in your instruction

You can name the desired wrapper directly:

Write the prose sections in <smoothly_flowing_prose_paragraphs> tags.

The model produces the tags and you parse them out. Cleaner than relying on the model to "just stop using bullets."

Match the prompt style to the desired output style

If your prompt is full of bullet points, your output will be full of bullet points. If you want flowing prose, remove markdown from the prompt itself. The output mirrors the input register more than most people expect.

For structured outputs, use Structured Outputs — not prefill

Starting with Claude 4.6, prefilled assistant responses on the last turn are no longer supported (the API returns 400). The old trick of prefilling { to force JSON is dead. Use the Structured Outputs feature or tool-calling with a schema instead.

For simple "skip the preamble" cases, this in the system prompt is enough:

Respond directly without preamble. Do not start with phrases like "Here is...", "Based on..." etc.

Stop sequences for hard-stop cases

When you absolutely need generation to halt at a token (e.g., closing tag, end-of-JSON), the stop_sequences API parameter is more reliable than asking the model to stop politely.

Tame markdown explicitly if you don't want it

Anthropic's own sample for prose-heavy reports — paste this verbatim and adjust:

<avoid_excessive_markdown_and_bullet_points>
When writing reports, documents, technical explanations, analyses, or any long-form content, write in clear, flowing prose using complete paragraphs and sentences. Use standard paragraph breaks for organization and reserve markdown primarily for `inline code`, code blocks, and simple headings. Avoid **bold** and *italics*.

DO NOT use ordered or unordered lists unless: (a) you're presenting truly discrete items where a list format is the best option, or (b) the user explicitly requests a list or ranking.

Instead of listing items with bullets, incorporate them naturally into sentences.
</avoid_excessive_markdown_and_bullet_points>

Turn off LaTeX if you don't want it

Claude's latest models default to LaTeX for math. Disable explicitly:

Format your response in plain text only. Do not use LaTeX, MathJax, or markup notation such as \( \), $, or \frac{}{}. Write all math expressions using standard text characters.

6. Agentic systems

This is the chapter that didn't exist three years ago. With the latest models running long tool-use loops, a new set of techniques applies.

Be explicit about wanting action, not suggestions

A subtle one. "Can you suggest some changes to improve this function?" gets you suggestions. "Change this function to improve its performance" gets you a changed function. Claude takes "suggest" literally.

For agents that should act by default:

<default_to_action>
By default, implement changes rather than only suggesting them. If the user's intent is unclear, infer the most useful likely action and proceed, using tools to discover missing details instead of guessing.
</default_to_action>

For agents that should be conservative:

<do_not_act_before_instructions>
Do not jump into implementation or change files unless clearly instructed. When intent is ambiguous, default to information, research, and recommendations rather than action.
</do_not_act_before_instructions>

Use parallel tool calls

The newest models excel at parallel tool execution and the success rate is already high without prompting. You can push it to ~100% with:

<use_parallel_tool_calls>
If you intend to call multiple tools and there are no dependencies between them, make all of the independent tool calls in parallel. If some tool calls depend on previous calls for their parameters, call them sequentially. Never use placeholders or guess missing parameters.
</use_parallel_tool_calls>

Add reversibility-awareness for risky actions

Without guidance, recent models will sometimes take hard-to-reverse actions. If your agent has destructive capabilities (git, file deletion, external posts), add:

Consider the reversibility and potential impact of your actions. Take local, reversible actions freely, but for actions that are hard to reverse, affect shared systems, or could be destructive, ask the user before proceeding.

Examples that warrant confirmation:
- Destructive: deleting files/branches, dropping tables, rm -rf
- Hard to reverse: git push --force, git reset --hard, amending published commits
- Visible to others: pushing code, commenting on PRs, sending messages

Manage long-horizon work across multiple context windows

The latest models can save state to files and continue across context refreshes. Useful patterns:

A different prompt for the first context window — set up scaffolding (tests, init scripts) the first time, iterate on a todo list thereafter.
Tests in a structured file (e.g. tests.json) — Claude can read and update the file across windows without re-deriving state.
Setup scripts (init.sh) — eliminate the re-discovery tax when starting fresh.
Trust the filesystem over compaction — when a window clears, the latest models reliably re-derive state from disk. Be prescriptive: "Call pwd, review progress.txt and tests.json, run a fundamental integration test before adding new features."

Tell the model its context will be compacted

If your harness compacts context (as Claude Code does), tell the model. Otherwise it tends to wrap up prematurely as it senses the limit approaching.

Your context window will be automatically compacted as it approaches its limit, allowing you to continue working indefinitely. Do not stop tasks early due to token budget concerns. As you approach the limit, save current progress to memory before the window refreshes. Complete tasks fully.

Counter overengineering

The newest models will sometimes overengineer — extra files, premature abstractions, hypothetical-future flexibility. The corrective prompt:

Avoid over-engineering. Only make changes that are directly requested or clearly necessary.

- Scope: Don't add features, refactor code, or make "improvements" beyond what was asked.
- Documentation: Don't add docstrings or comments to code you didn't change.
- Defensive coding: Don't add error handling or validation for scenarios that can't happen. Trust internal code; only validate at system boundaries.
- Abstractions: Don't create helpers for one-time operations. Don't design for hypothetical future requirements.

Make hallucinations harder

For agentic coding specifically, anchor the model to actually reading the code:

<investigate_before_answering>
Never speculate about code you have not opened. If the user references a specific file, you MUST read it before answering. Investigate relevant files BEFORE answering questions about the codebase. Give grounded, hallucination-free answers.
</investigate_before_answering>

Chain complex prompts only when you need intermediates

Adaptive thinking and subagent orchestration handle most multi-step reasoning internally. Explicit prompt chaining — breaking a task into sequential API calls — is still useful when you need to inspect intermediate output, log it, or branch on it. The most common pattern is self-correction: generate a draft, review it against criteria, refine. Three API calls, three logged steps.

What the docs don't teach you

A few things experience adds that the official guidance doesn't:

Iterate against evals, not vibes. Anthropic flags this upfront — every technique on this page lands or doesn't depending on whether you're measuring. A four-question rubric run on twenty inputs catches more regressions than a thousand re-reads.
Start broad, then constrain. Most prompts that fail in production fail because they were over-constrained for their training set and brittle elsewhere. Loosen first, then tighten when you see specific failure modes.
Newer models punish over-prompting. Many of the aggressive phrasings older prompts used ("CRITICAL: You MUST use this tool when…") cause overtriggering on Claude 4.6 and 4.7. Dial them back when you migrate.
Keep the system prompt small. Long system prompts can cause models to think more often than you want, inflating thinking-token costs. If you see this, trim the system prompt before reaching for thinking budget caps.

The honest summary

The techniques on this page are not a list of forty clever tricks. They are six families of thing-to-pay-attention-to:

Say what you mean (clarity).
Give the prompt a parseable shape (structure).
Show, don't tell (examples).
Let the model reason first (thinking).
Be explicit about the output shape (output control).
For agents, name the failure modes you don't want (agentic).

Almost every prompt-engineering "tip" you'll see in a Twitter thread is a special case of one of those six. Most of the time, the fix isn't a new technique — it's noticing which family the failing prompt is missing from.

Sources

Anthropic — Prompting best practices: docs.anthropic.com — the canonical reference page, kept current with each model release.
Zack Witten — Building with Anthropic Claude: Prompt Workshop (YouTube, 24 min) — workshop format covering 40 techniques, mostly overlapping with the page above. Worth watching for the live worked examples.
GitHub — anthropics/prompt-eng-interactive-tutorial (repo) — example-filled hands-on companion.

If you read all three, you'll be ahead of about 95% of the "prompt engineering" content on the open web.