How to Count Tokens for LLMs: Complete Guide
What Are Tokens?
Tokens are the basic units that large language models use to process text. A token is not a word; it is a chunk of text that the model's tokenizer has learned to recognize as a meaningful unit. A single English word might be one token, or it might be split into several.
Common English words like “the,” “and,” and “hello” are typically one token each. Longer or less common words get split: “tokenization” becomes [“token”, “ization”] (2 tokens), and “unbelievable” might become [“un”, “believ”, “able”] (3 tokens).
As a rough rule of thumb, 1 token is approximately 0.75 English words, or equivalently, 100 tokens is about 75 words. But this varies significantly by language, content type, and the specific tokenizer used.
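The rule of thumb above can be turned into a quick back-of-the-envelope estimator. A minimal sketch (the `estimate_tokens` helper is hypothetical, not part of any library, and is an approximation only; use a real tokenizer for exact counts):

```python
def estimate_tokens(text: str) -> int:
    """Estimate token count from the ~0.75 words-per-token heuristic,
    i.e. roughly 4/3 tokens per English word."""
    words = len(text.split())
    return round(words / 0.75)

print(estimate_tokens("The quick brown fox jumps over the lazy dog"))  # 9 words -> 12
```

This is fine for budgeting English prose, but it will drift badly for code, JSON, or non-English text, as the sections below explain.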
Why Token Counts Matter
API Costs
LLM APIs charge per token. If you are building an application that makes thousands of API calls per day, even small differences in token counts per request add up. Knowing your token usage helps you budget accurately and optimize costs.
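To see how small per-request differences compound, here is a sketch of the arithmetic. The prices are hypothetical placeholders (USD per million tokens); check your provider's current price sheet:

```python
# Hypothetical placeholder rates, USD per million tokens.
PRICE_IN_PER_M = 2.50
PRICE_OUT_PER_M = 10.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call at the placeholder rates above."""
    return (input_tokens * PRICE_IN_PER_M + output_tokens * PRICE_OUT_PER_M) / 1_000_000

# 1,000 requests/day at 2,000 input + 500 output tokens each:
daily = 1000 * request_cost(2000, 500)
print(f"${daily:.2f}/day")  # $10.00/day
```

Trimming even 500 input tokens per request at these rates saves over a dollar a day at this volume, and proportionally more at scale.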
Context Window Limits
Every model has a maximum context window measured in tokens. GPT-4o supports 128K tokens, Claude 3.5 Sonnet supports 200K, and Gemini 1.5 Pro supports up to 1M. If your prompt plus the expected output exceeds the context window, the API will reject the request or truncate your input.
Response Quality
Stuffing a context window with unnecessary text can degrade response quality. Models perform better when the relevant information is concise and well-organized. Understanding token counts helps you design efficient prompts.
How Tokenization Works
BPE (Byte Pair Encoding)
Most modern LLMs use Byte Pair Encoding or a variant of it. BPE starts with individual bytes and iteratively merges the most frequent adjacent pairs into new tokens. The result is a vocabulary where common words and subwords are single tokens while rare text is split into smaller pieces.
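The merge step at the heart of BPE is simple enough to sketch in a few lines. This toy version works on characters rather than bytes and learns merges from a single string, so it is an illustration of the mechanism, not a real tokenizer:

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most frequent one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("low lower lowest")
for _ in range(3):  # three merge rounds
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
print(tokens)
```

After a few rounds, the frequent substring "low" has fused into a single token while the rarer suffixes remain split, which is exactly the common-words-are-one-token behavior described above.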
OpenAI uses a BPE variant called cl100k_base for GPT-4 and GPT-4o models, and o200k_base for newer models. Each tokenizer has its own vocabulary, so the same text produces different token counts with different models.
SentencePiece
Google’s models (Gemini, PaLM) use SentencePiece, which treats the input as a raw byte stream and learns subword units directly. The resulting tokenization can differ significantly from BPE, especially for non-English text.
Claude’s Tokenizer
Anthropic has not published their exact tokenizer, so token counts for Claude models are estimates based on observed behavior. In practice, Claude’s tokenization is similar to OpenAI’s for English text, typically within 5-10% difference.
Counting Tokens by Model
OpenAI Models
OpenAI provides the tiktoken library for exact token counting:
import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")
tokens = enc.encode("Hello, how are you?")
print(len(tokens)) # 6
In JavaScript, use the js-tiktoken package:
import { encodingForModel } from "js-tiktoken";
const enc = encodingForModel("gpt-4o");
const tokens = enc.encode("Hello, how are you?");
console.log(tokens.length); // 6
Claude Models
Since Anthropic does not provide a public tokenizer, count tokens using the API’s usage response field, or estimate using a BPE tokenizer with a similar vocabulary size. Counts are typically close to OpenAI’s for English text.
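One pragmatic way to work with an estimate is to take a count from a BPE tokenizer and widen it into a range using the observed 5-10% deviation. A sketch (the band is an assumption drawn from observed behavior, not an Anthropic-published figure, and the helper name is hypothetical):

```python
def claude_token_range(bpe_count: int, deviation: float = 0.10) -> tuple[int, int]:
    """Widen an OpenAI-style BPE count into a (low, high) estimate
    for Claude, using an assumed +/-10% deviation."""
    low = round(bpe_count * (1 - deviation))
    high = round(bpe_count * (1 + deviation))
    return low, high

print(claude_token_range(1000))  # (900, 1100)
```

When you need the exact number, the usage field returned by the API remains the authoritative source.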
Gemini Models
Google provides a countTokens API endpoint that returns exact counts for any text:
import google.generativeai as genai
model = genai.GenerativeModel("gemini-pro")
result = model.count_tokens("Hello, how are you?")
print(result.total_tokens)
What Counts as Tokens in an API Call
A complete API call’s token usage includes:
- System prompt tokens — Your system instructions
- Conversation history tokens — All previous messages in the chat
- User message tokens — The current user input
- Output tokens — The model’s response
The total context is: system prompt + conversation history + user message + output. All four components count toward the context window limit, and you pay for all of them.
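That budget arithmetic can be made explicit. A minimal sketch (the function name and error behavior are illustrative, not any SDK's API):

```python
def output_budget(context_limit: int, system: int, history: int,
                  user: int, max_output: int) -> int:
    """Tokens left for the model's response after the input components.
    Raises if the input alone already fills the context window."""
    used = system + history + user
    if used >= context_limit:
        raise ValueError(f"input uses {used} of {context_limit} tokens")
    return min(max_output, context_limit - used)

# 128K window, 1.5K system prompt, 20K history, 2K user message:
print(output_budget(128_000, 1_500, 20_000, 2_000, 4_096))  # 4096
```

A check like this, run before each call, catches context overflows in your own code instead of as API errors, and tells you when old conversation history needs trimming or summarizing.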
Token Counting Tips
Images
Multimodal models tokenize images differently from text. OpenAI charges based on image dimensions and the detail setting: a 1024x1024 image at high detail costs 765 tokens. Smaller images cost fewer tokens.
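The 765 figure falls out of the tile-based formula OpenAI has published for GPT-4o-class vision pricing: a flat base cost plus a per-tile cost after the image is scaled down. A sketch of that formula (treat it as an estimate; the constants may change between model generations):

```python
import math

def openai_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate vision token cost from OpenAI's published tile formula:
    85 base tokens + 170 per 512px tile after downscaling."""
    if detail == "low":
        return 85  # flat cost regardless of size
    # Scale to fit within 2048x2048, then so the short side is <= 768.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    short = min(w, h)
    if short > 768:
        w, h = w * 768 / short, h * 768 / short
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles

print(openai_image_tokens(1024, 1024))  # 765
```

A 1024x1024 image is downscaled to 768x768, which covers four 512px tiles: 85 + 4 x 170 = 765 tokens.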
Code
Code tends to use more tokens per line than prose because variable names, operators, and syntax characters each consume tokens. A 100-line Python file might use 500-800 tokens.
Non-English Text
Languages with non-Latin scripts (Chinese, Japanese, Korean, Arabic) often require more tokens because fewer subword units in the vocabulary match those characters. A Chinese text might use 1.5-2x as many tokens as an equivalent English text.
JSON and Structured Data
JSON formatting adds significant token overhead. The braces, brackets, quotes, and key names all consume tokens. Consider using more compact formats when token usage matters.
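You can often reclaim some of that overhead without changing formats at all, just by serializing compactly. Character count is only a crude proxy for token count, but the direction of the comparison holds:

```python
import json

record = {"name": "Ada", "age": 36, "city": "London"}

pretty = json.dumps(record, indent=2)                 # whitespace-heavy
compact = json.dumps(record, separators=(",", ":"))   # no extra whitespace

# Indentation and spacing inflate the payload without adding information:
print(len(pretty), len(compact))
```

For repeated records with identical keys, flatter formats such as CSV remove the per-record key names entirely and cut token usage further.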
Count Your Tokens Now
Stop guessing and start measuring. The tokencalc Token Counter gives you exact token counts for OpenAI models and accurate estimates for Claude and Gemini. Paste your text, select your model, and see the results instantly, all in your browser, completely free.
For cost projections based on your token usage, try the Pricing Calculator to compare costs across all major providers.