TokenizerMapping

org.llm4s.context.tokens.TokenizerMapping

Maps LLM model names to appropriate tokenizers for accurate token counting.

Different LLM providers use different tokenization schemes, and even within a provider, newer models may use updated tokenizers. This object provides the mapping logic to select the correct tokenizer for a given model.

==Supported Providers==

| Provider   | Model Pattern           | Tokenizer      | Accuracy |
|------------|-------------------------|----------------|----------|
| OpenAI     | gpt-4o, o1-*            | o200k_base     | Exact    |
| OpenAI     | gpt-4, gpt-3.5          | cl100k_base    | Exact    |
| OpenAI     | gpt-3 (legacy)          | r50k_base      | Exact    |
| Azure      | (same as OpenAI)        | (inherited)    | Exact    |
| Anthropic  | claude-*                | cl100k_base    | ~75%     |
| Ollama     | *                       | cl100k_base    | ~80%     |
| Unknown    | *                       | cl100k_base    | Unknown  |

==Model Name Formats==

The mapper accepts various model name formats:

  • Plain: gpt-4o, claude-3-sonnet
  • Provider-prefixed: openai/gpt-4o, anthropic/claude-3-sonnet
  • Azure: azure/my-gpt4o-deployment
  • Ollama: ollama/llama2

==Accuracy Considerations==

For non-OpenAI models, token counts are '''approximations'''. Claude uses a proprietary tokenizer that may differ 20-30% from cl100k_base. Always check TokenizerAccuracy to understand the expected accuracy.

Attributes

See also

ConversationTokenCounter.forModel for the recommended entry point

TokenizerAccuracy for accuracy information

Graph
Supertypes
class Object
trait Matchable
class Any
Self type

Members list

Value members

Concrete methods

def getAccuracyInfo(modelName: String): TokenizerAccuracy

Get accuracy information for a model's tokenizer mapping.

Get accuracy information for a model's tokenizer mapping.

This helps callers understand how reliable the token counts will be. For OpenAI models, counts are exact. For other providers, they are approximations that may differ from actual usage.

Value parameters

modelName

The model identifier

Attributes

Returns

Accuracy information including description and estimated accuracy

def getTokenizerId(modelName: String): TokenizerId

Get the appropriate tokenizer ID for a given model name.

Get the appropriate tokenizer ID for a given model name.

Value parameters

modelName

The model identifier (e.g., "gpt-4o", "openai/gpt-4o")

Attributes

Returns

The recommended tokenizer ID for this model

def isExactMapping(modelName: String): Boolean

Check if the tokenizer mapping is exact or approximate for a model.

Check if the tokenizer mapping is exact or approximate for a model.

Exact mappings are available for OpenAI and Azure OpenAI models. Other providers use approximations.

Value parameters

modelName

The model identifier

Attributes

Returns

True if token counts will be exact, false if approximate