llm4s-core/org.llm4s/org.llm4s.context/org.llm4s.context.tokens/TokenizerMapping

TokenizerMapping

org.llm4s.context.tokens.TokenizerMapping

Maps LLM model names to appropriate tokenizers for accurate token counting.

Different LLM providers use different tokenization schemes, and even within a provider, newer models may use updated tokenizers. This object provides the mapping logic to select the correct tokenizer for a given model.

==Supported Providers==

| Provider   | Model Pattern           | Tokenizer      | Accuracy |
|------------|-------------------------|----------------|----------|
| OpenAI     | gpt-4o, o1-*            | o200k_base     | Exact    |
| OpenAI     | gpt-4, gpt-3.5          | cl100k_base    | Exact    |
| OpenAI     | gpt-3 (legacy)          | r50k_base      | Exact    |
| Azure      | (same as OpenAI)        | (inherited)    | Exact    |
| Anthropic  | claude-*                | cl100k_base    | ~75%     |
| Ollama     | *                       | cl100k_base    | ~80%     |
| Unknown    | *                       | cl100k_base    | Unknown  |

==Model Name Formats==

The mapper accepts various model name formats:

Plain: gpt-4o, claude-3-sonnet
Provider-prefixed: openai/gpt-4o, anthropic/claude-3-sonnet
Azure: azure/my-gpt4o-deployment
Ollama: ollama/llama2

==Accuracy Considerations==

For non-OpenAI models, token counts are '''approximations'''. Claude uses a proprietary tokenizer that may differ 20-30% from cl100k_base. Always check TokenizerAccuracy to understand the expected accuracy.

Attributes

See also: ConversationTokenCounter.forModel for the recommended entry point

TokenizerAccuracy for accuracy information
Graph
Supertypes: class Object

trait Matchable

class Any
Self type: TokenizerMapping.type

Members list

Value members

Concrete methods

Get accuracy information for a model's tokenizer mapping.

This helps callers understand how reliable the token counts will be. For OpenAI models, counts are exact. For other providers, they are approximations that may differ from actual usage.

Value parameters

modelName: The model identifier

Attributes

Returns: Accuracy information including description and estimated accuracy

Get the appropriate tokenizer ID for a given model name.

Value parameters

modelName: The model identifier (e.g., "gpt-4o", "openai/gpt-4o")

Attributes

Returns: The recommended tokenizer ID for this model

Check if the tokenizer mapping is exact or approximate for a model.

Exact mappings are available for OpenAI and Azure OpenAI models. Other providers use approximations.

Value parameters

modelName: The model identifier

Attributes

Returns: True if token counts will be exact, false if approximate

In this article

Generated with