The average production system prompt contains 40–60% redundant tokens. Here's how to find and remove them.
Token bloat is the silent cost multiplier in LLM applications. It doesn't show up as an error. It doesn't cause obvious failures. It just quietly inflates your API bill every time you make a call — usually by 40–60% compared to what the same behavior would cost with a tightly structured prompt.
Here's how to audit your system prompts and what to cut.
Most bloated system prompts share the same anti-patterns. They grew organically — someone added a clarification here, a constraint there, a few examples "just in case" — and nobody ever went back to refactor them.
```
You are a helpful, knowledgeable, and professional customer support
agent for Acme Corp. As a customer support agent, your job is to help
customers with their questions. You should be helpful and professional
at all times, as this is a customer support context.
```
The model understood "customer support agent" after the first sentence. Everything after that is paid redundancy. Cut to:

```
You are a customer support agent for Acme Corp.
```
```
Never answer questions about competitors. Do not mention competitor
products. If a user asks about a competitor, redirect them. When
discussing competitors, don't provide details. Avoid talking about
other companies' products.
```
This says the same thing five times. One clear constraint is enough:

```
Do not discuss competitor products. Redirect those questions to our support page.
```
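Repetition like this is easy to catch mechanically. The sketch below is a crude, hypothetical heuristic (not part of any tool mentioned in this post): it splits a prompt into sentences and flags content words that recur in three or more of them. The stop-word list, the length cutoff, and the plural stripping are all illustrative assumptions.

```python
import re
from collections import Counter

# Illustrative filler words that would otherwise dominate the counts.
STOP_WORDS = {"about", "never", "other", "should", "their", "there", "which", "would"}


def repeated_terms(prompt: str, min_sentences: int = 3, min_len: int = 5) -> dict[str, int]:
    """Return content words that appear in at least `min_sentences` sentences.

    Crude heuristic: short words and stop words are ignored, and a trailing
    "s" is stripped so "competitor" and "competitors" count as one term.
    """
    sentences = [s for s in re.split(r"[.!?]+", prompt) if s.strip()]
    counts = Counter()
    for sentence in sentences:
        words = {w.lower() for w in re.findall(r"[A-Za-z']+", sentence)}
        stems = {
            w[:-1] if w.endswith("s") else w
            for w in words
            if len(w) >= min_len and w not in STOP_WORDS
        }
        counts.update(stems)  # each term counts at most once per sentence
    return {term: n for term, n in counts.items() if n >= min_sentences}


competitor_rules = """Never answer questions about competitors. Do not mention competitor
products. If a user asks about a competitor, redirect them. When
discussing competitors, don't provide details. Avoid talking about
other companies' products."""

print(repeated_terms(competitor_rules))  # {'competitor': 4}
```

A hit is only a prompt to look closer: a term can legitimately recur, but four sentences about competitors is usually one constraint written four times.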
```
Important: Always be accurate. Make sure your responses are correct.
If you are unsure, say so. Do not make up information. Be honest
about what you know and don't know. Accuracy is important.
```
This tells the model things it already does by default. You're paying tokens to state the obvious.
```
For example, if a user asks "How do I reset my password?" you should
respond "To reset your password, go to Settings > Security > Reset.
If you need further help, contact [email protected]." For example,
if a user asks "What are your hours?" you should respond "We are
open 9am-5pm EST, Monday through Friday."
```
These examples don't teach the model anything about tone or structure that the instructions don't already cover — they're just burning tokens.
| Prompt type | Before (tokens) | After (tokens) | Reduction |
|---|---|---|---|
| Support agent | 847 | 312 | 63% |
| Code review | 612 | 198 | 68% |
| Document summarizer | 534 | 241 | 55% |
| Email drafter | 723 | 287 | 60% |
Same outputs. Same model. Same behavior. Just without the waste.
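The Reduction column is simply the share of tokens removed. A short check using the table's own numbers:

```python
# Before/after token counts copied from the table above.
RESULTS = {
    "Support agent": (847, 312),
    "Code review": (612, 198),
    "Document summarizer": (534, 241),
    "Email drafter": (723, 287),
}


def reduction(before: int, after: int) -> int:
    """Share of tokens removed, rounded to a whole percentage."""
    return round(100 * (before - after) / before)


for name, (before, after) in RESULTS.items():
    print(f"{name}: {before} -> {after} tokens ({reduction(before, after)}% smaller)")

# Aggregate across all four prompts.
total_before = sum(b for b, _ in RESULTS.values())
total_after = sum(a for _, a in RESULTS.values())
print(f"All four: {total_before} -> {total_after} tokens "
      f"({reduction(total_before, total_after)}% smaller)")
```

Each row reproduces the table's percentage. Taken together, the four prompts drop from 2,716 to 1,038 tokens, about 62%.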
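Run through each section of your system prompt and ask whether it restates the role, repeats a constraint, spells out behavior the model already has by default, or adds examples that the instructions already cover.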
The goal isn't minimal prompts — it's prompts where every token is earning its cost. Some prompts genuinely need 600 tokens. Most don't.
The reason prompts bloat is that they're freeform strings. There's no enforcement mechanism for conciseness, no review process, no way to see the token count before it ships.
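Even without templates, a pre-merge check can at least surface token counts before a prompt ships. This is a minimal sketch under a loud assumption: it estimates tokens as characters divided by 4, a common rule of thumb for English text, not the model's real tokenizer. The function names and the `*.txt` layout are hypothetical.

```python
from pathlib import Path

# Rough rule of thumb for English prose; real counts come from the model's tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate tokens as characters / 4, rounded up. Deliberately crude."""
    return -(-len(text) // CHARS_PER_TOKEN)


def over_budget(prompts: dict[str, str], budget: int) -> dict[str, int]:
    """Return {name: estimated_tokens} for every prompt above `budget`."""
    estimates = {name: estimate_tokens(text) for name, text in prompts.items()}
    return {name: n for name, n in estimates.items() if n > budget}


def load_prompts(directory: str, pattern: str = "*.txt") -> dict[str, str]:
    """Read every prompt file matching `pattern` (hypothetical repo layout)."""
    return {str(p): p.read_text(encoding="utf-8") for p in sorted(Path(directory).glob(pattern))}
```

In CI, something like `over_budget(load_prompts("prompts"), 400)` returning a non-empty dict would fail the build; swap `estimate_tokens` for your provider's token-counting endpoint when exact numbers matter.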
YAML prompt templates enforce a different discipline. The schema pushes you to separate role from task from examples. Sections with headers make redundancy visible. And because the file lives in git, changes are reviewable:
```yaml
name: support-agent
description: Customer support agent for Acme Corp
variables:
  - name: question
    required: true
body: |
  ## Role
  Customer support agent for Acme Corp.

  ## Constraints
  - Do not discuss competitor products
  - Escalate billing issues to [email protected]
  - If unsure, say so and offer to escalate

  ## Question
  {{question}}
```
This template is 312 tokens vs the 847-token original — and produces identical behavior on real inputs.
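The `variables` block and the `{{question}}` placeholder imply a render step that fills the body at call time. Below is a minimal sketch of how such rendering could work, assuming mustache-style `{{name}}` substitution; it is hypothetical and not promptctl's implementation. The template is written as a Python dict (with the constraints section omitted) so the sketch needs no YAML library.

```python
import re

# Parsed form of the support-agent template (constraints omitted for brevity).
TEMPLATE = {
    "name": "support-agent",
    "variables": [{"name": "question", "required": True}],
    "body": "## Role\nCustomer support agent for Acme Corp.\n\n## Question\n{{question}}\n",
}

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: dict, values: dict[str, str]) -> str:
    """Fill {{name}} placeholders, failing loudly on missing required variables."""
    missing = [
        v["name"]
        for v in template.get("variables", [])
        if v.get("required") and v["name"] not in values
    ]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    # Unknown placeholders are left as-is so they stay visible in review.
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template["body"])


print(render(TEMPLATE, {"question": "How do I reset my password?"}))
```

Declaring `required: true` pays off here: a missing variable becomes an error at render time instead of a literal `{{question}}` sent to the model.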
Token bloat compounds with volume. A 500-token system prompt sent 50,000 times a day on Claude Sonnet 4 costs roughly $75/day in system prompt tokens alone (25M input tokens at $3 per million). Cut it to 200 tokens and you save $45/day, or $1,350/month, without changing a single line of application code.
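The arithmetic behind these figures fits in one small function. The $3 per million input tokens rate is an assumption (Claude Sonnet 4's list input price, which reproduces the numbers above); the sketch also assumes a 30-day month and ignores output tokens and any caching discounts.

```python
# Assumed USD list price per million input tokens (Claude Sonnet 4).
PRICE_PER_MILLION_INPUT = 3.00


def daily_prompt_cost(prompt_tokens: int, calls_per_day: int,
                      price_per_million: float = PRICE_PER_MILLION_INPUT) -> float:
    """Daily spend on the system prompt alone (input tokens only)."""
    return prompt_tokens * calls_per_day * price_per_million / 1_000_000


before = daily_prompt_cost(500, 50_000)
after = daily_prompt_cost(200, 50_000)
saved_per_day = before - after

print(f"${before:.2f}/day -> ${after:.2f}/day, saving ${saved_per_day:.2f}/day "
      f"(${saved_per_day * 30:,.0f}/month)")
# $75.00/day -> $30.00/day, saving $45.00/day ($1,350/month)
```

Because cost is linear in both prompt length and call volume, every token removed from a hot prompt saves the same fraction of that prompt's bill, whatever the traffic.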
The fastest LLM cost reduction is almost always prompt compression, not a model downgrade. A well-structured prompt on a capable model usually beats a bloated prompt on a cheaper one.
Take your most-called system prompt, pass it to `promptctl create`, and see what the structured version looks like. The tool will extract variables, remove redundancy, and organize the remaining content by function. Compare the token counts: that gap is money.
```bash
promptctl create "$(cat your-system-prompt.txt)"

# Compare token counts:
promptctl info your-prompt
```
Or paste the prompt into the browser tool and see the structured version in under 30 seconds.