Blog

How to Reduce LLM API Costs by 55–71%
with Structured Prompts

Token waste is rarely in the content of your prompts — it's in their structure. Here's how to fix it systematically.

← All posts
Cost Optimization May 26, 2026 8 min read

If you're calling Claude, GPT-4o, or Gemini in production, your API bill is probably 2–3× higher than it needs to be. Not because you're asking the wrong questions — but because of how you're sending them.

After analyzing prompt logs across dozens of LLM-backed workflows, the same patterns appear almost every time. This post breaks down where the waste comes from and how structured prompt templates fix it.

55–71%
Token reduction
<30s
To structure a prompt
v1.0
Free CLI, open source

Where the tokens go

Most LLM API token waste comes from four sources:

  • Re-explaining context every call. You write "you are a helpful assistant that specializes in..." in every request instead of extracting it into a reusable role definition.
  • Verbose variable embedding. Instead of referencing a variable like {{customer_name}}, you inline the full sentence around it each time.
  • No example deduplication. Few-shot examples are copy-pasted into every call even when the task doesn't change between invocations.
  • No prompt caching. Modern APIs (Claude's prompt caching, OpenAI's cached prefixes) let you cache a static prefix and only pay for the dynamic suffix — but only if your prompt is structured to separate static from dynamic content.

None of these are hard to fix. They just require treating prompts as structured data rather than freeform strings.

What structured prompts look like

A structured prompt separates the invariant parts (role, instructions, examples) from the variable parts (the actual input). Here's a before/after example for a code review prompt:

Before: inline string

You are an expert software engineer specializing in Go and TypeScript.
Your job is to review pull request diffs and identify security issues,
performance bottlenecks, and code style violations. Be concise. Only
flag real issues — don't nitpick. Format your response as a list.

Here is the diff to review:

[2,400 token diff pasted here]

After: YAML template

name: code-review
description: Review a PR diff for security, perf, and style issues
variables:
  - name: diff
    description: The git diff to review
    required: true
body: |
  ## Role
  Expert software engineer (Go, TypeScript). Review PR diffs.

  ## Task
  Identify: security issues, performance bottlenecks, style violations.
  Be concise. Only flag real issues. Format as a list.

  ## Diff
  {{diff}}

When you run promptctl send code-review --var diff="$(git diff HEAD~1)", promptctl substitutes the variable, applies the structure, and sends the minimal token payload. The role and instructions are cached after the first call.

The math on token savings

Consider a support automation workflow making 10,000 calls/day with a 500-token system prompt and an average 200-token user message. Using Claude Sonnet 4 pricing ($3/M input tokens):

  • Without structure: 7,000,000 tokens/day × $3/M = $21/day
  • With prompt caching (static prefix cached after first call): Only the 200-token variable portion is billed at full rate, the 500-token prefix at cache-hit rate (~10% of input price) → $5.40/day
  • Savings: 74%

The more your prompts repeat static content across calls, the larger the savings.

How to get started

The fastest path to structured prompts is letting a tool generate the initial template from your existing prompt text.

# Install
brew tap prompt-ctl/tap && brew install --cask prompt-ctl/tap/promptctl

# Convert your existing prompt to a structured template
promptctl create "you are a helpful assistant that reviews code..."

# Or describe what you want
promptctl create "code review for Go and TypeScript PRs"

# Send with variables
promptctl send code-review --var diff="$(git diff HEAD~1)"

The generated template is a plain YAML file you can commit to your repo, version-control, and share with your team. Every change is tracked. You can run regression tests against previous outputs with promptctl benchmark.

Beyond cost: why structure matters for quality too

Structured prompts aren't just cheaper — they produce more consistent results. When you separate role, task, and input into explicit sections:

  • The model attends to each part correctly (role instructions don't bleed into task context)
  • You can tune individual sections without rewriting the whole prompt
  • You can run A/B benchmarks: promptctl benchmark code-review-v1 code-review-v2
  • Team members can read and modify prompts like code, not magic strings

The prompts that cost the most are almost always the ones nobody owns. They were written once, never refactored, and get pasted into every new integration.

Summary

LLM API costs are almost always reducible without changing what the model does — just how you send requests. The key moves:

  1. Extract static context (role, instructions) into a reusable template
  2. Parameterize the dynamic parts with named variables
  3. Enable prompt caching (Claude's cache_control, OpenAI cached prefix)
  4. Version your templates in git and run regressions before deploying changes

The first three steps alone typically cut costs by 50–70%. The fourth prevents new costs from appearing when someone "improves" a prompt.

Try it on your own prompt

Paste any prompt idea into the browser tool and get a structured template in under 30 seconds — no install needed.

Try promptctl free    Read the docs