Token waste is rarely in the content of your prompts — it's in their structure. Here's how to fix it systematically.
If you're calling Claude, GPT-4o, or Gemini in production, your API bill is probably 2–3× higher than it needs to be. Not because you're asking the wrong questions — but because of how you're sending them.
After analyzing prompt logs across dozens of LLM-backed workflows, the same patterns appear almost every time. This post breaks down where the waste comes from and how structured prompt templates fix it.
Most LLM API token waste comes from four sources:
{{customer_name}}, you inline the full sentence around it each time.None of these are hard to fix. They just require treating prompts as structured data rather than freeform strings.
A structured prompt separates the invariant parts (role, instructions, examples) from the variable parts (the actual input). Here's a before/after example for a code review prompt:
You are an expert software engineer specializing in Go and TypeScript.
Your job is to review pull request diffs and identify security issues,
performance bottlenecks, and code style violations. Be concise. Only
flag real issues — don't nitpick. Format your response as a list.
Here is the diff to review:
[2,400 token diff pasted here]
name: code-review
description: Review a PR diff for security, perf, and style issues
variables:
- name: diff
description: The git diff to review
required: true
body: |
## Role
Expert software engineer (Go, TypeScript). Review PR diffs.
## Task
Identify: security issues, performance bottlenecks, style violations.
Be concise. Only flag real issues. Format as a list.
## Diff
{{diff}}
When you run promptctl send code-review --var diff="$(git diff HEAD~1)", promptctl substitutes the variable, applies the structure, and sends the minimal token payload. The role and instructions are cached after the first call.
Consider a support automation workflow making 10,000 calls/day with a 500-token system prompt and an average 200-token user message. Using Claude Sonnet 4 pricing ($3/M input tokens):
The more your prompts repeat static content across calls, the larger the savings.
The fastest path to structured prompts is letting a tool generate the initial template from your existing prompt text.
# Install
brew tap prompt-ctl/tap && brew install --cask prompt-ctl/tap/promptctl
# Convert your existing prompt to a structured template
promptctl create "you are a helpful assistant that reviews code..."
# Or describe what you want
promptctl create "code review for Go and TypeScript PRs"
# Send with variables
promptctl send code-review --var diff="$(git diff HEAD~1)"
The generated template is a plain YAML file you can commit to your repo, version-control, and share with your team. Every change is tracked. You can run regression tests against previous outputs with promptctl benchmark.
Structured prompts aren't just cheaper — they produce more consistent results. When you separate role, task, and input into explicit sections:
promptctl benchmark code-review-v1 code-review-v2The prompts that cost the most are almost always the ones nobody owns. They were written once, never refactored, and get pasted into every new integration.
LLM API costs are almost always reducible without changing what the model does — just how you send requests. The key moves:
cache_control, OpenAI cached prefix)The first three steps alone typically cut costs by 50–70%. The fourth prevents new costs from appearing when someone "improves" a prompt.
Paste any prompt idea into the browser tool and get a structured template in under 30 seconds — no install needed.
Try promptctl free Read the docs