Large language model API costs scale with two numbers: input tokens (what you send to the model) and output tokens (what it sends back). Providers and model tiers price each of these differently, and the cost of a heavy production workload can easily run into thousands of dollars a month before anyone notices. This tool estimates your monthly spend across the major providers given your daily request volume, average input tokens per request, and average output tokens per request.
A token is roughly four characters of English text, or about three-quarters of a word. Each provider publishes input and output prices per million tokens. The tool takes your daily request volume, multiplies by your average input and output token counts per request, scales to a monthly total, and multiplies by each provider's input and output rates. The result is a comparison grid of estimated monthly cost.
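If you do not yet have measured token counts, the rule of thumb above can be turned into a quick estimator. The following is a minimal sketch, assuming English text; the function names and constants are illustrative and are not part of this tool. A provider's own tokenizer will give different and more accurate counts.

```python
import math

# Rules of thumb from the text above (English prose only).
CHARS_PER_TOKEN = 4.0    # roughly four characters per token
WORDS_PER_TOKEN = 0.75   # roughly three-quarters of a word per token


def estimate_tokens_from_chars(text: str) -> int:
    """Approximate token count from character length, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(text: str) -> int:
    """Approximate token count from whitespace-separated words, rounded up."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


if __name__ == "__main__":
    sample = "Summarize the following support ticket in two sentences."
    print("by characters:", estimate_tokens_from_chars(sample))
    print("by words:", estimate_tokens_from_words(sample))
```

The two estimates will usually disagree slightly. For budgeting, taking the larger of the two is the more cautious choice.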
Suppose you expect 5,000 user requests per day, each with about 1,500 input tokens (including a 600-token system prompt) and 400 output tokens. That is 7.5 million input tokens and 2 million output tokens daily, or 225 million input and 60 million output per 30-day month. On a model priced at 3 dollars per million input tokens and 15 dollars per million output tokens, that is 675 dollars of input plus 900 dollars of output, or 1,575 dollars per month. Swap to a model at 0.25 dollars input and 1.25 dollars output per million tokens and the same workload costs 56.25 dollars of input plus 75 dollars of output, or about 131 dollars per month.
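The calculation behind this example can be sketched in a few lines. This is an illustration of the arithmetic described above, not the tool's actual source; the `Pricing` class, the `monthly_cost` function, and the labels `model_a` and `model_b` are hypothetical names, and the prices are the two price points from the example rather than any specific provider's rates.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """Prices in dollars per million tokens."""
    input_per_million: float
    output_per_million: float


def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 pricing: Pricing, days: int = 30) -> float:
    """Estimated monthly spend in dollars for one model."""
    monthly_input = requests_per_day * input_tokens * days
    monthly_output = requests_per_day * output_tokens * days
    input_cost = monthly_input / TOKENS_PER_MILLION * pricing.input_per_million
    output_cost = monthly_output / TOKENS_PER_MILLION * pricing.output_per_million
    return input_cost + output_cost


# Hypothetical labels; prices are the two price points from the example above.
MODELS = {
    "model_a": Pricing(input_per_million=3.00, output_per_million=15.00),
    "model_b": Pricing(input_per_million=0.25, output_per_million=1.25),
}

# Comparison grid: one estimated monthly cost per model.
grid = {name: monthly_cost(5_000, 1_500, 400, p) for name, p in MODELS.items()}

if __name__ == "__main__":
    for name, cost in grid.items():
        print(f"{name}: ${cost:,.2f} per month")
```

Running this reproduces the figures in the example: 1,575.00 dollars for the first price point and 131.25 dollars for the second.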
Are token counts the same across providers?
Close, but not identical. Each provider uses its own tokenizer. Differences are usually within five percent for English text but can be larger for code, non-Latin languages, or unusual formatting.
How do I count tokens for an image input?
Vision models price images either by fixed token equivalents per image or by pixel-count tiers. Check the provider's vision pricing page; this tool covers text-only workloads.
Should I budget for retries and failures?
Yes. A safe planning rule is to add 10 to 15 percent to your raw forecast to cover retries, malformed responses, and prompt iteration during development.
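As a rough sketch of this rule, the buffer can be applied as a simple multiplier on the raw forecast. The function name `with_buffer` is illustrative, and the 1,575 dollar figure is the raw monthly estimate from the worked example.

```python
def with_buffer(raw_monthly_cost: float, buffer: float = 0.15) -> float:
    """Add a planning buffer (for example 0.10 to 0.15) to a raw forecast."""
    if buffer < 0:
        raise ValueError("buffer must be non-negative")
    return raw_monthly_cost * (1 + buffer)


raw = 1575.0  # raw monthly estimate from the worked example
low = with_buffer(raw, 0.10)
high = with_buffer(raw, 0.15)

if __name__ == "__main__":
    print(f"Budget between ${low:,.2f} and ${high:,.2f} per month")
```

For the example workload this gives a planning range of roughly 1,732 to 1,811 dollars per month.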
Are batch APIs cheaper?
Most providers offer a batch tier at roughly 50 percent of the standard price in exchange for asynchronous turnaround (usually within 24 hours). For non-realtime workloads, the savings are substantial.
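To see what moving part of a workload to a batch tier might save, a blended estimate is enough. The sketch below assumes the batch tier costs a fixed fraction of the standard price (0.5 here, matching the rough figure above) and that a known share of your traffic can tolerate asynchronous turnaround. The function name and the 40 percent share are assumptions for illustration only.

```python
def blended_cost(standard_cost: float, batch_share: float,
                 batch_price_ratio: float = 0.5) -> float:
    """Monthly cost when `batch_share` of the workload runs on a batch tier.

    batch_price_ratio is the batch price as a fraction of the standard price.
    """
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    realtime_part = standard_cost * (1 - batch_share)
    batch_part = standard_cost * batch_share * batch_price_ratio
    return realtime_part + batch_part


# Example workload at 1,575 dollars per month, with an assumed 40 percent
# of requests that do not need a realtime response.
estimate = blended_cost(1575.0, batch_share=0.4)

if __name__ == "__main__":
    print(f"Blended monthly cost: ${estimate:,.2f}")
```

Under these assumptions the monthly bill drops from 1,575 to about 1,260 dollars.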
What about caching?
Several providers offer prompt caching that reduces input cost on the cached portion by 50 to 90 percent. If your system prompt or context is stable across requests, caching is often the single biggest cost lever.
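The effect of caching on the input side can be estimated the same way. This is a simplified sketch: it assumes every request hits the cache and ignores any extra charge a provider may apply when the cache is first written, so treat the result as a best case. The function name is illustrative. The 40 percent cached share comes from the worked example, where 600 of the 1,500 input tokens are a stable system prompt.

```python
def cached_input_cost(input_cost: float, cached_share: float,
                      discount: float) -> float:
    """Input cost after applying a caching discount to the cached share.

    cached_share: fraction of input tokens served from cache (0 to 1).
    discount: price reduction on cached tokens (0.5 to 0.9 per the text above).
    Best-case estimate: assumes every request is a cache hit.
    """
    if not 0.0 <= cached_share <= 1.0:
        raise ValueError("cached_share must be between 0 and 1")
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    cached_part = input_cost * cached_share * (1 - discount)
    uncached_part = input_cost * (1 - cached_share)
    return cached_part + uncached_part


input_cost = 675.0           # monthly input cost from the worked example
cached_share = 600 / 1500    # stable system prompt as a share of input tokens

best_case = cached_input_cost(input_cost, cached_share, discount=0.9)
modest_case = cached_input_cost(input_cost, cached_share, discount=0.5)

if __name__ == "__main__":
    print(f"Input cost with 90% discount: ${best_case:,.2f}")
    print(f"Input cost with 50% discount: ${modest_case:,.2f}")
```

For the example workload, the 675 dollar input bill falls to roughly 432 dollars at a 90 percent discount and 540 dollars at a 50 percent discount. Output cost is unaffected.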
This page is for general educational information only. It is not financial, tax, legal, or medical advice. Consult a qualified professional before making decisions based on this tool.