System Prompt Token Bloat: Why Your LLM Calls Cost More Than They Should

Cost Optimization May 26, 2026 6 min read

Token bloat is the silent cost multiplier in LLM applications. It doesn't show up as an error. It doesn't cause obvious failures. It just quietly inflates your API bill every time you make a call — usually by 40–60% compared to what the same behavior would cost with a tightly structured prompt.

Here's how to audit your system prompts and what to cut.

The anatomy of a bloated system prompt

Most bloated system prompts share the same anti-patterns. They grew organically — someone added a clarification here, a constraint there, a few examples "just in case" — and nobody ever went back to refactor them.

Anti-pattern 1: Redundant role restatement

You are a helpful, knowledgeable, and professional customer support
agent for Acme Corp. As a customer support agent, your job is to help
customers with their questions. You should be helpful and professional
at all times, as this is a customer support context.

The model understood "customer support agent" after the first sentence. Everything after that is paid redundancy. Cut to: You are a customer support agent for Acme Corp.

Anti-pattern 2: Over-specified constraints

Never answer questions about competitors. Do not mention competitor
products. If a user asks about a competitor, redirect them. When
discussing competitors, don't provide details. Avoid talking about
other companies' products.

This says the same thing five times. One clear constraint is enough: Do not discuss competitor products. Redirect those questions to our support page.

Anti-pattern 3: Defensive padding

Important: Always be accurate. Make sure your responses are correct.
If you are unsure, say so. Do not make up information. Be honest
about what you know and don't know. Accuracy is important.

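Most of this restates behavior the model already aims for by default, so you're paying tokens to state the obvious. If one instruction does change behavior in your tests, such as admitting uncertainty, keep that single line and cut the rest.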

Anti-pattern 4: Inline examples that belong in the template

For example, if a user asks "How do I reset my password?" you should
respond "To reset your password, go to Settings > Security > Reset.
If you need further help, contact [email protected]." For example,
if a user asks "What are your hours?" you should respond "We are
open 9am-5pm EST, Monday through Friday."

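These examples don't teach the model anything about tone or structure that the instructions don't already cover, so as examples they're just burning tokens. If the facts inside them matter (the reset path, the opening hours), state each one once as a plain fact instead of a scripted exchange.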

A real before/after audit

Prompt type	Before (tokens)	After (tokens)	Reduction
Support agent	847	312	63%
Code review	612	198	68%
Document summarizer	534	241	55%
Email drafter	723	287	60%

Same outputs. Same model. Same behavior. Just without the waste.
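The Reduction column is simple arithmetic on the two token counts. A minimal sketch that recomputes it from the table's numbers (the function name `reduction_pct` is illustrative, not part of any tool):

```python
# Token counts copied from the audit table: (before, after).
AUDIT = {
    "Support agent": (847, 312),
    "Code review": (612, 198),
    "Document summarizer": (534, 241),
    "Email drafter": (723, 287),
}


def reduction_pct(before: int, after: int) -> int:
    """Share of tokens removed, rounded to the nearest whole percent."""
    return round(100 * (before - after) / before)


for name, (before, after) in AUDIT.items():
    pct = reduction_pct(before, after)
    print(f"{name}: {before} -> {after} tokens ({pct}% fewer)")
```

Running the same calculation on your own before/after counts gives you a number you can track per prompt over time.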

How to audit your own prompts

Run through each section of your system prompt and ask:

Does this sentence constrain behavior the model doesn't already default to? If no, cut it.
Is this said somewhere else in the prompt? If yes, cut the duplicate.
Would removing this change the model's output on any real input? Test it before assuming yes.
Is this example teaching format/tone, or just filling space? If the latter, cut it.

The goal isn't minimal prompts — it's prompts where every token is earning its cost. Some prompts genuinely need 600 tokens. Most don't.
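The "is this said somewhere else?" question can be partly automated. The sketch below is a crude heuristic, not a real redundancy detector: it splits a prompt into sentences and reports content words that recur in three or more of them, which often points at a constraint being restated. The stopword list, the plural-stripping `stem` helper, and the threshold are all assumptions chosen for illustration; the sample text is the competitor prompt from Anti-pattern 2.

```python
import re
from collections import defaultdict

# Small, hand-picked stopword list (an assumption; extend it for your prompts).
STOPWORDS = {
    "a", "an", "the", "to", "of", "and", "or", "if", "is", "are", "be",
    "do", "not", "don't", "about", "them", "when", "you", "your", "should",
}

SAMPLE = (
    "Never answer questions about competitors. Do not mention competitor\n"
    "products. If a user asks about a competitor, redirect them. When\n"
    "discussing competitors, don't provide details. Avoid talking about\n"
    "other companies' products."
)


def split_sentences(prompt: str) -> list[str]:
    """Naive split on '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", prompt.strip()) if s]


def stem(word: str) -> str:
    """Strip a trailing 's' so 'competitors' and 'competitor' match."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def repeated_terms(prompt: str, min_sentences: int = 3) -> dict[str, list[int]]:
    """Map each recurring content word to the sentence numbers it appears in."""
    seen: dict[str, set[int]] = defaultdict(set)
    for number, sentence in enumerate(split_sentences(prompt), start=1):
        for word in re.findall(r"[a-z']+", sentence.lower()):
            if word not in STOPWORDS:
                seen[stem(word)].add(number)
    return {t: sorted(ids) for t, ids in seen.items() if len(ids) >= min_sentences}


print(repeated_terms(SAMPLE))  # {'competitor': [1, 2, 3, 4]}
```

A flagged term is only a prompt for a human to look: some words legitimately recur, so treat the output as a list of places to check, not a list of lines to delete.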

Structure helps enforce this discipline

The reason prompts bloat is that they're freeform strings. There's no enforcement mechanism for conciseness, no review process, no way to see the token count before it ships.

YAML prompt templates enforce a different discipline. The schema pushes you to separate role from task from examples. Sections with headers make redundancy visible. And because the file lives in git, changes are reviewable:

name: support-agent
description: Customer support agent for Acme Corp
variables:
  - name: question
    required: true
body: |
  ## Role
  Customer support agent for Acme Corp.

  ## Constraints
  - Do not discuss competitor products
  - Escalate billing issues to [email protected]
  - If unsure, say so and offer to escalate

  ## Question
  {{question}}

The structured version of the support agent prompt comes to 312 tokens versus the 847-token original, and it produces identical behavior on real inputs.
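To make the role of the `variables` section concrete, here is a minimal sketch of how such a template could be rendered. It is not how promptctl works internally; the `render` function and the hand-written `TEMPLATE` dict (standing in for the parsed YAML) are assumptions for illustration. The constraints list is shortened to keep the example small.

```python
import re

# Hand-written stand-in for the parsed YAML file (a YAML loader would
# produce a similar dict). Constraints are abbreviated for the example.
TEMPLATE = {
    "name": "support-agent",
    "variables": [{"name": "question", "required": True}],
    "body": (
        "## Role\n"
        "Customer support agent for Acme Corp.\n\n"
        "## Constraints\n"
        "- Do not discuss competitor products\n"
        "- If unsure, say so and offer to escalate\n\n"
        "## Question\n"
        "{{question}}"
    ),
}


def render(template: dict, **values: str) -> str:
    """Fill {{name}} placeholders, failing fast on missing required variables."""
    missing = [
        v["name"]
        for v in template["variables"]
        if v.get("required") and v["name"] not in values
    ]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")

    def substitute(match: re.Match) -> str:
        # Unknown optional placeholders render as an empty string.
        return str(values.get(match.group(1), ""))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template["body"])


print(render(TEMPLATE, question="How do I reset my password?"))
```

Declaring variables up front means a missing input fails loudly at render time instead of silently shipping a prompt with a literal `{{question}}` in it.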

The compounding effect at scale

Token bloat compounds with volume. A 500-token system prompt sent 50,000 times/day is 25 million input tokens per day; on Claude Sonnet 4, at $3 per million input tokens, that costs roughly $75/day in system prompt tokens alone. Cut the prompt to 200 tokens and you save $45/day, or about $1,350 over a 30-day month, without changing a single line of application code.
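The arithmetic is easy to rerun with your own numbers. A minimal sketch, assuming a flat input price of $3 per million tokens (the rate the figures above imply; check your provider's current pricing) and a 30-day month; it ignores output tokens and any caching discounts:

```python
def daily_prompt_cost(prompt_tokens: int, calls_per_day: int,
                      usd_per_million_tokens: float) -> float:
    """Daily spend on system prompt tokens alone, in USD."""
    return prompt_tokens * calls_per_day / 1_000_000 * usd_per_million_tokens


PRICE = 3.00      # assumed USD per million input tokens
CALLS = 50_000    # calls per day, from the example above

before = daily_prompt_cost(500, CALLS, PRICE)
after = daily_prompt_cost(200, CALLS, PRICE)
saved_per_day = before - after
saved_per_month = saved_per_day * 30  # assumes a 30-day month

print(f"before: ${before:.2f}/day")            # before: $75.00/day
print(f"after:  ${after:.2f}/day")             # after:  $30.00/day
print(f"saved:  ${saved_per_day:.2f}/day, ${saved_per_month:.2f}/month")
```

Swap in your own prompt sizes, call volume, and price to see whether a given prompt is worth auditing first.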

The fastest LLM cost reduction is usually prompt compression, not model downgrading. A well-structured prompt on a capable model almost always beats a bloated prompt on a cheaper one.

Where to start

Take your most-called system prompt, pass it to promptctl create, and inspect the structured version. The tool will extract variables, remove redundancy, and organize the remaining content by function. Compare the token counts. That gap is money.

promptctl create "$(cat your-system-prompt.txt)"
# Compare token counts: promptctl info your-prompt
