How to Count Tokens for LLMs: Complete Guide
What Are Tokens?
Tokens are the basic units that large language models use to process text. A token is not a word; it is a chunk of text that the model's tokenizer has learned to recognize as a meaningful unit. A single English word might be one token, or it might be split into several.
Common English words like “the,” “and,” and “hello” are typically one token each. Longer or less common words get split: “tokenization” becomes [“token”, “ization”] (2 tokens), and “unbelievable” might become [“un”, “believ”, “able”] (3 tokens).
As a rough rule of thumb, 1 token is approximately 0.75 English words, or equivalently, 100 tokens is about 75 words. But this varies significantly by language, content type, and the specific tokenizer used.
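The rule of thumb above can be turned into a quick back-of-the-envelope estimator. A minimal sketch (the `estimate_tokens` helper is hypothetical, not part of any library, and is an approximation only; use a real tokenizer for exact counts):

```python
def estimate_tokens(text: str) -> int:
    """Estimate token count from the ~0.75 words-per-token heuristic,
    i.e. roughly 4/3 tokens per English word."""
    words = len(text.split())
    return round(words / 0.75)

print(estimate_tokens("The quick brown fox jumps over the lazy dog"))  # 9 words -> 12
```

This is fine for budgeting English prose, but it will drift badly for code, JSON, or non-English text, as the sections below explain.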
Why Token Counts Matter
API Costs
LLM APIs charge per token. If you are building an application that makes thousands of API calls per day, even small differences in token counts per request add up. Knowing your token usage helps you budget accurately and optimize costs.
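To see how small per-request differences compound, here is a sketch of the arithmetic. The prices are hypothetical placeholders (USD per million tokens); check your provider's current price sheet:

```python
# Hypothetical placeholder rates, USD per million tokens.
PRICE_IN_PER_M = 2.50
PRICE_OUT_PER_M = 10.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call at the placeholder rates above."""
    return (input_tokens * PRICE_IN_PER_M + output_tokens * PRICE_OUT_PER_M) / 1_000_000

# 1,000 requests/day at 2,000 input + 500 output tokens each:
daily = 1000 * request_cost(2000, 500)
print(f"${daily:.2f}/day")  # $10.00/day
```

Trimming even 500 input tokens per request at these rates saves over a dollar a day at this volume, and proportionally more at scale.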
Context Window Limits
Every model has a maximum context window measured in tokens. GPT-4o supports 128K tokens, Claude 3.5 Sonnet supports 200K, and Gemini 1.5 Pro supports up to 1M. If your prompt plus the expected output exceeds the context window, the API will reject the request or truncate your input.
Response Quality
Stuffing a context window with unnecessary text can degrade response quality. Models perform better when the relevant information is concise and well-organized. Understanding token counts helps you design efficient prompts.
How Tokenization Works
BPE (Byte Pair Encoding)
Most modern LLMs use Byte Pair Encoding or a variant of it. BPE starts with individual bytes and iteratively merges the most frequent adjacent pairs into new tokens. The result is a vocabulary where common words and subwords are single tokens while rare text is split into smaller pieces.
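The merge step at the heart of BPE is simple enough to sketch in a few lines. This toy version works on characters rather than bytes and learns merges from a single string, so it is an illustration of the mechanism, not a real tokenizer:

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most frequent one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("low lower lowest")
for _ in range(3):  # three merge rounds
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
print(tokens)
```

After a few rounds, the frequent substring "low" has fused into a single token while the rarer suffixes remain split, which is exactly the common-words-are-one-token behavior described above.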
OpenAI uses a BPE variant called cl100k_base for GPT-4 and GPT-4o models, and o200k_base for newer models. Each tokenizer has its own vocabulary, so the same text produces different token counts with different models.
SentencePiece
Google’s models (Gemini, PaLM) use SentencePiece, which treats the input as a raw byte stream and learns subword units directly. The resulting tokenization can differ significantly from BPE, especially for non-English text.
Claude’s Tokenizer
Anthropic has not published their exact tokenizer, so token counts for Claude models are estimates based on observed behavior. In practice, Claude’s tokenization is similar to OpenAI’s for English text, typically within 5-10% difference.
Counting Tokens by Model
OpenAI Models
OpenAI provides the tiktoken library for exact token counting:
import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")
tokens = enc.encode("Hello, how are you?")
print(len(tokens)) # 6
In JavaScript, use the js-tiktoken package:
import { encodingForModel } from "js-tiktoken";
const enc = encodingForModel("gpt-4o");
const tokens = enc.encode("Hello, how are you?");
console.log(tokens.length); // 6
Claude Models
Since Anthropic does not provide a public tokenizer, count tokens using the API’s usage response field, or estimate using a BPE tokenizer with a similar vocabulary size. Counts are typically close to OpenAI’s for English text.
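One pragmatic way to work with an estimate is to take a count from a BPE tokenizer and widen it into a range using the observed 5-10% deviation. A sketch (the band is an assumption drawn from observed behavior, not an Anthropic-published figure, and the helper name is hypothetical):

```python
def claude_token_range(bpe_count: int, deviation: float = 0.10) -> tuple[int, int]:
    """Widen an OpenAI-style BPE count into a (low, high) estimate
    for Claude, using an assumed +/-10% deviation."""
    low = round(bpe_count * (1 - deviation))
    high = round(bpe_count * (1 + deviation))
    return low, high

print(claude_token_range(1000))  # (900, 1100)
```

When you need the exact number, the usage field returned by the API remains the authoritative source.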
Gemini Models
Google provides a countTokens API endpoint that returns exact counts for any text:
import google.generativeai as genai
model = genai.GenerativeModel("gemini-pro")
result = model.count_tokens("Hello, how are you?")
print(result.total_tokens)
What Counts as Tokens in an API Call
A complete API call’s token usage includes:
- System prompt tokens — Your system instructions
- Conversation history tokens — All previous messages in the chat
- User message tokens — The current user input
- Output tokens — The model’s response
The total context is: system prompt + conversation history + user message + output. All four components count toward the context window limit, and you pay for all of them.
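That budget arithmetic can be made explicit. A minimal sketch (the function name and error behavior are illustrative, not any SDK's API):

```python
def output_budget(context_limit: int, system: int, history: int,
                  user: int, max_output: int) -> int:
    """Tokens left for the model's response after the input components.
    Raises if the input alone already fills the context window."""
    used = system + history + user
    if used >= context_limit:
        raise ValueError(f"input uses {used} of {context_limit} tokens")
    return min(max_output, context_limit - used)

# 128K window, 1.5K system prompt, 20K history, 2K user message:
print(output_budget(128_000, 1_500, 20_000, 2_000, 4_096))  # 4096
```

A check like this, run before each call, catches context overflows in your own code instead of as API errors, and tells you when old conversation history needs trimming or summarizing.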
Token Counting Tips
Images
Multimodal models tokenize images differently from text. OpenAI charges based on image dimensions and the detail setting: a 1024x1024 image at high detail costs 765 tokens. Smaller images cost fewer tokens.
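The 765 figure falls out of the tile-based formula OpenAI has published for GPT-4o-class vision pricing: a flat base cost plus a per-tile cost after the image is scaled down. A sketch of that formula (treat it as an estimate; the constants may change between model generations):

```python
import math

def openai_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate vision token cost from OpenAI's published tile formula:
    85 base tokens + 170 per 512px tile after downscaling."""
    if detail == "low":
        return 85  # flat cost regardless of size
    # Scale to fit within 2048x2048, then so the short side is <= 768.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    short = min(w, h)
    if short > 768:
        w, h = w * 768 / short, h * 768 / short
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles

print(openai_image_tokens(1024, 1024))  # 765
```

A 1024x1024 image is downscaled to 768x768, which covers four 512px tiles: 85 + 4 x 170 = 765 tokens.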
Code
Code tends to use more tokens per line than prose because variable names, operators, and syntax characters each consume tokens. A 100-line Python file might use 500-800 tokens.
Non-English Text
Languages with non-Latin scripts (Chinese, Japanese, Korean, Arabic) often require more tokens because fewer subword units in the vocabulary match those characters. A Chinese text might use 1.5-2x as many tokens as an equivalent English text.
JSON and Structured Data
JSON formatting adds significant token overhead. The braces, brackets, quotes, and key names all consume tokens. Consider using more compact formats when token usage matters.
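You can often reclaim some of that overhead without changing formats at all, just by serializing compactly. Character count is only a crude proxy for token count, but the direction of the comparison holds:

```python
import json

record = {"name": "Ada", "age": 36, "city": "London"}

pretty = json.dumps(record, indent=2)                 # whitespace-heavy
compact = json.dumps(record, separators=(",", ":"))   # no extra whitespace

# Indentation and spacing inflate the payload without adding information:
print(len(pretty), len(compact))
```

For repeated records with identical keys, flatter formats such as CSV remove the per-record key names entirely and cut token usage further.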
Count Your Tokens Now
Stop guessing and start measuring. The tokencalc Token Counter gives you exact token counts for OpenAI models and accurate estimates for Claude and Gemini. Paste your text, select your model, and see the results instantly, all in your browser, completely free.
For cost projections based on your token usage, try the Pricing Calculator to compare costs across all major providers.