Cost Per Million Tokens: What It Really Means for Your App

LLM API providers quote prices per million tokens. Translating that into actual monthly cost requires understanding what counts as a token.

Every major large-language-model provider (OpenAI, Anthropic, Google, and the services that host open-weight models) prices its API the same way: a rate in dollars per million input tokens, and a higher rate per million output tokens. The pricing looks simple in the documentation. Translating it into a monthly bill for a real application takes some non-obvious arithmetic.

What is a token, actually?

A token is roughly three-quarters of an English word. The phrase "Return Measure builds zero-friction calculators" is 6 words (counting "zero-friction" as two) and roughly 8 tokens. A 500-word email is about 650 tokens. A typical web page worth of context is 1,500 to 3,000 tokens. A whole book chapter is 10,000 to 20,000 tokens.

For non-English languages, the ratio is different — sometimes much worse. Languages with rich morphology (German, Russian, Korean) often produce 30 to 50 percent more tokens per word than English. For some Asian languages, individual characters can be one or more tokens each, which dramatically inflates the cost.
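The rule of thumb above can be turned into a quick estimator. This is only a rough sketch: the function name and the `overhead` parameter are illustrative, and a real count should come from the provider's own tokenizer.

```python
# Rule of thumb from the text: one token is about 0.75 English words.
WORDS_PER_TOKEN = 0.75

def estimate_tokens(word_count, overhead=0.0):
    """Rough token estimate from a word count.

    overhead is an assumed fractional penalty for non-English text,
    e.g. 0.4 for a language that yields about 40 percent more tokens.
    """
    return round(word_count / WORDS_PER_TOKEN * (1 + overhead))

english_email = estimate_tokens(500)           # in line with the ~650 quoted above
german_email = estimate_tokens(500, 0.4)       # same email with a 40% penalty
print(english_email, german_email)
```

Treat the result as a planning number only. Tokenizers differ between providers, so the same text can produce noticeably different counts.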

Input versus output pricing

Output tokens are typically 3 to 5 times more expensive than input tokens. This matters because of how most LLM applications are structured: a long system prompt and conversation history go in (lots of input tokens), and a short response comes out (few output tokens). Engineers new to LLM pricing often assume input cost dominates because there is so much more of it. In a typical chat application, that turns out to be roughly correct. In an app that produces long outputs from short prompts, such as drafting or content generation, output cost quickly dominates.
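To see which side dominates for a given request shape, it helps to compute the split directly. The sketch below assumes illustrative prices of $2.50 and $10 per million tokens (a 4x output premium), and the two request shapes are hypothetical.

```python
def cost_split(input_tokens, output_tokens,
               input_price_per_m=2.50, output_price_per_m=10.00):
    """Return (input_share, output_share) of one request's cost."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    total = input_cost + output_cost
    return input_cost / total, output_cost / total

# Hypothetical chat request: long prompt in, short answer out.
chat_in, chat_out = cost_split(2_200, 400)

# Hypothetical generation request: short prompt in, long draft out.
gen_in, gen_out = cost_split(300, 1_500)

print(f"chat: input {chat_in:.0%}, output {chat_out:.0%}")
print(f"generation: input {gen_in:.0%}, output {gen_out:.0%}")
```

Even with five times as many input tokens, the chat request pays only a little more than half its cost on input, because each output token is priced four times higher.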

Building a real monthly estimate

To estimate a monthly LLM bill honestly, you need four numbers:

The average input tokens per request (system prompt plus any retrieved context plus the user message).
The average output tokens per request.
The number of requests per active user per month.
The number of active users.

For example: a customer-support bot with a 2,000-token system prompt, an average 200-token user question, 400 tokens of output, 30 conversations per active user per month, and 5,000 active users. That is 150,000 requests a month, or 330 million input tokens and 60 million output tokens. At GPT-4-class pricing (roughly $2.50 per million input tokens and $10 per million output tokens), the monthly cost is approximately $1,425: $825 for input and $600 for output.

Change two assumptions, bumping the system prompt to 4,000 tokens because of a richer knowledge base and adding a second turn of conversation per session, and the same app costs roughly $4,600 a month, assuming the second turn resends the first exchange as history.
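The arithmetic behind both scenarios can be sketched in a few lines. The function names are illustrative, and the model assumes every turn resends the system prompt plus the full conversation history, which is how most chat APIs are typically used.

```python
def session_tokens(system_prompt, user_msg, reply, turns=1):
    """Total (input, output) tokens for one session.

    Assumes each turn resends the system prompt and all prior
    messages as input.
    """
    total_input = 0
    history = 0
    for _ in range(turns):
        total_input += system_prompt + history + user_msg
        history += user_msg + reply
    return total_input, turns * reply

def monthly_cost(input_tokens, output_tokens, sessions_per_user, users,
                 input_price_per_m=2.50, output_price_per_m=10.00):
    """Monthly spend in dollars, given per-session token totals."""
    sessions = sessions_per_user * users
    input_cost = input_tokens * sessions / 1_000_000 * input_price_per_m
    output_cost = output_tokens * sessions / 1_000_000 * output_price_per_m
    return input_cost + output_cost

# Base case: 2,000-token prompt, single turn.
base = monthly_cost(*session_tokens(2_000, 200, 400), 30, 5_000)

# Richer case: 4,000-token prompt, two turns per session.
richer = monthly_cost(*session_tokens(4_000, 200, 400, turns=2), 30, 5_000)

print(f"base: ${base:,.0f}  richer: ${richer:,.0f}")
```

Most of the increase in the second scenario comes from input: the larger system prompt is paid for twice per session, and the second turn also carries the first exchange as history.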

The costs nobody mentions

Failed and retried requests. Production apps retry on timeouts and errors. Plan on 5 to 15 percent overhead for retries.
Streaming partial completions. If users abandon a streaming response halfway, you still pay for the tokens that were generated before they left.
Vector embeddings. If your app uses retrieval-augmented generation, every document you ingest costs embedding tokens, and every query costs another embedding plus the LLM call.
Prompt caching credits. Several providers now offer steep discounts for repeated input. If your system prompt is 80 percent of your input tokens, prompt caching can cut the input side of the bill by roughly half, depending on the discount the provider applies to cached tokens.
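Two of these adjustments are easy to layer on top of a base estimate. In the sketch below, the 10 percent retry rate and the 50 percent cache discount are assumed values for illustration, not any provider's published terms, and the dollar figures are example inputs.

```python
def with_retries(base_cost, retry_rate):
    """Inflate a cost by the share of requests that get retried."""
    return base_cost * (1 + retry_rate)

def cached_input_cost(input_cost, cacheable_share, cache_discount):
    """Input cost after discounting the cacheable share of input tokens.

    cacheable_share: fraction of input tokens that repeat (e.g. 0.8).
    cache_discount: assumed discount on those tokens (e.g. 0.5 for 50% off).
    """
    return input_cost * (1 - cacheable_share * cache_discount)

input_cost, output_cost = 825.0, 600.0  # example monthly figures in dollars

# Caching only touches input; retries inflate the whole request.
discounted_input = cached_input_cost(input_cost, 0.8, 0.5)
total = with_retries(discounted_input + output_cost, 0.10)
print(f"input after caching: ${discounted_input:,.0f}, total with retries: ${total:,.0f}")
```

Note that caching reduces input cost only. If output is a large share of the bill, the overall saving is smaller than the input saving suggests.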

The levers that actually cut the bill

Once you have an honest estimate, the largest savings almost always come from a handful of structural changes rather than from negotiating a lower per-token rate. Model tiering is the biggest one: route easy requests to a small, cheap model and reserve the expensive flagship model for the requests that genuinely need it. In many applications, eighty percent of traffic can be served by a model that costs a tenth as much, which alone can cut the bill by half or more. Prompt trimming is the second lever — every unnecessary sentence in a system prompt is paid for on every single request, forever, so a prompt that is twice as long as it needs to be doubles that portion of the cost.
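The tiering claim is easy to check with a blended-cost calculation. This sketch assumes the cheap and expensive requests are otherwise the same size, which is a simplification; the function name is illustrative.

```python
def blended_cost_ratio(cheap_share, cheap_price_ratio):
    """Cost after tiering, as a fraction of the all-flagship cost.

    cheap_share: fraction of traffic routed to the small model.
    cheap_price_ratio: small-model price relative to the flagship.
    """
    return cheap_share * cheap_price_ratio + (1 - cheap_share)

# 80 percent of traffic on a model that costs a tenth as much.
ratio = blended_cost_ratio(0.8, 0.1)
print(f"bill is {ratio:.0%} of the original, a saving of {1 - ratio:.0%}")
```

The remaining 20 percent of traffic on the flagship model still accounts for most of the post-tiering bill, so the routing threshold matters more than the cheap model's exact price.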

Why per-request cost matters more than total spend

When you are deciding whether an AI feature is viable as a business, the number that matters is the cost per request compared to the revenue per request, not the total monthly bill. A feature that costs two cents per request and is used by a customer paying a flat monthly fee can quietly become unprofitable if a small number of heavy users each generate thousands of requests. Modeling the cost per active user, and then the cost of your heaviest ten percent of users specifically, tells you whether you need usage limits, a metered pricing tier, or a cheaper model before the feature ever ships.
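A per-user view of this can be sketched as follows. The two-cent request cost comes from the example above; the $20 flat fee and the usage numbers are hypothetical.

```python
def heavy_user_report(requests_per_user, cost_per_request, flat_fee):
    """Summarize per-user cost against a flat monthly fee."""
    costs = sorted((r * cost_per_request for r in requests_per_user),
                   reverse=True)
    top_n = max(1, len(costs) // 10)  # heaviest ten percent, at least one user
    top = costs[:top_n]
    return {
        "avg_cost_per_user": sum(costs) / len(costs),
        "avg_cost_top_10pct": sum(top) / len(top),
        "unprofitable_users": sum(1 for c in costs if c > flat_fee),
        "break_even_requests": flat_fee / cost_per_request,
    }

# Hypothetical month: nine light users and one heavy user.
usage = [100] * 9 + [3_000]
report = heavy_user_report(usage, cost_per_request=0.02, flat_fee=20.0)
for key, value in report.items():
    print(f"{key}: {value}")
```

In this made-up sample the average user looks comfortably profitable, while the single heavy user costs three times the fee. That gap is the signal to look for before choosing between usage limits, metered pricing, or a cheaper model.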

Our AI Compute Cost Calculator models monthly LLM API spend across the major providers using these inputs.
