org.llm4s.context.tokens
Members list
Type members
Classlikes
Attributes
- Supertypes: class Object, trait Matchable, class Any
Attributes
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
Attributes
- Supertypes: class Object, trait Matchable, class Any
- Self type: Tokenizer.type
Represents the accuracy of tokenizer mapping for a model.
This sealed trait hierarchy captures three levels of accuracy:
- '''Exact''': Native tokenizer available, counts are precise
- '''Approximate''': Using a similar tokenizer, counts may differ
- '''Unknown''': Unknown model, using fallback tokenizer
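A three-level hierarchy like this can be modelled as a sealed trait with one case object per level. The following is a minimal sketch, not the library's definition: `AccuracySketch`, its `description` field, and `isPrecise` are hypothetical names chosen for illustration, and the real TokenizerAccuracy variants may carry different members.

```scala
// Hypothetical sketch: names mirror the three documented levels, but the
// actual TokenizerAccuracy definition in llm4s may differ.
sealed trait AccuracySketch {
  def description: String
}

object AccuracySketch {
  case object Exact extends AccuracySketch {
    val description = "native tokenizer available, counts are precise"
  }
  case object Approximate extends AccuracySketch {
    val description = "similar tokenizer in use, counts may differ"
  }
  case object Unknown extends AccuracySketch {
    val description = "unknown model, fallback tokenizer in use"
  }

  // Because the trait is sealed, the compiler can warn when a match
  // forgets one of the three levels.
  def isPrecise(a: AccuracySketch): Boolean = a match {
    case Exact       => true
    case Approximate => false
    case Unknown     => false
  }
}
```

Sealing the trait keeps the set of levels closed, so callers that branch on accuracy get exhaustiveness checking from the compiler.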
Attributes
- See also
- Companion: object
- Supertypes: class Object, trait Matchable, class Any
- Known subtypes
Accuracy level variants for tokenizer mappings.
Attributes
- Companion: trait
- Supertypes: trait Sum, trait Mirror, class Object, trait Matchable, class Any
- Self type: TokenizerAccuracy.type
Maps LLM model names to appropriate tokenizers for accurate token counting.
Different LLM providers use different tokenization schemes, and even within a provider, newer models may use updated tokenizers. This object provides the mapping logic to select the correct tokenizer for a given model.
==Supported Providers==
| Provider | Model Pattern | Tokenizer | Accuracy |
|------------|-------------------------|----------------|----------|
| OpenAI | gpt-4o, o1-* | o200k_base | Exact |
| OpenAI | gpt-4, gpt-3.5 | cl100k_base | Exact |
| OpenAI | gpt-3 (legacy) | r50k_base | Exact |
| Azure | (same as OpenAI) | (inherited) | Exact |
| Anthropic | claude-* | cl100k_base | ~75% |
| Ollama | * | cl100k_base | ~80% |
| Unknown | * | cl100k_base | Unknown |
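The table rows can be read as an ordered set of rules. The sketch below illustrates that reading under stated assumptions: `MappingSketch.lookup` is a hypothetical helper, not the TokenizerMapping API; it assumes patterns are matched as case-insensitive prefixes, and it omits the Azure row because how deployment names are resolved is not described here.

```scala
// Hypothetical sketch of the table above, not the library's implementation.
object MappingSketch {
  // Returns (tokenizer name, accuracy label) for a model name.
  // Assumption: patterns in the table are treated as prefixes.
  def lookup(model: String, provider: Option[String] = None): (String, String) = {
    val m = model.toLowerCase
    if (provider.contains("ollama")) ("cl100k_base", "~80%")
    // More specific prefixes come first: "gpt-4o" also starts with "gpt-4".
    else if (m.startsWith("gpt-4o") || m.startsWith("o1-")) ("o200k_base", "Exact")
    else if (m.startsWith("gpt-4") || m.startsWith("gpt-3.5")) ("cl100k_base", "Exact")
    // Checked after "gpt-3.5" so only legacy gpt-3 names reach this branch.
    else if (m.startsWith("gpt-3")) ("r50k_base", "Exact")
    else if (m.startsWith("claude-")) ("cl100k_base", "~75%")
    // Anything unrecognised falls back to cl100k_base.
    else ("cl100k_base", "Unknown")
  }
}
```

Rule order matters in this sketch: swapping the `gpt-4o` and `gpt-4` checks would silently route newer models to the older tokenizer.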
==Model Name Formats==
The mapper accepts various model name formats:
- Plain: gpt-4o, claude-3-sonnet
- Provider-prefixed: openai/gpt-4o, anthropic/claude-3-sonnet
- Azure: azure/my-gpt4o-deployment
- Ollama: ollama/llama2
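One way to handle these formats is to split an optional provider prefix from the model name before matching. This is a minimal sketch under the assumption that the first `/` separates provider from model; `ModelNameSketch.split` is a hypothetical helper and the library's actual parsing may differ.

```scala
// Hypothetical sketch: separates "provider/model" into its two parts.
object ModelNameSketch {
  // Plain names yield (None, name); prefixed names yield (Some(provider), model).
  def split(name: String): (Option[String], String) =
    name.indexOf('/') match {
      case -1 => (None, name)
      case i  => (Some(name.substring(0, i).toLowerCase), name.substring(i + 1))
    }
}
```

For example, `ModelNameSketch.split("azure/my-gpt4o-deployment")` separates the provider `azure` from the deployment name, which can then be resolved independently.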
==Accuracy Considerations==
For non-OpenAI models, token counts are '''approximations'''. Claude uses a proprietary tokenizer, so its actual counts may differ by 20-30% from counts computed with cl100k_base. Always check TokenizerAccuracy to understand the expected accuracy before relying on a count.
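When a count is only approximate, a caller can pad it with a safety margin before comparing it against a context window. The sketch below shows the arithmetic only; `BudgetSketch`, `padded`, and `fits` are hypothetical names, and the 30% margin used in the example is taken from the upper end of the range quoted above rather than from any library default.

```scala
// Hypothetical sketch: pads an approximate token count before a budget check.
object BudgetSketch {
  // Inflates the estimate by marginPercent, rounding up, using integer
  // arithmetic to avoid floating-point rounding surprises.
  def padded(estimate: Int, marginPercent: Int): Int =
    ((estimate.toLong * (100 + marginPercent) + 99) / 100).toInt

  // True when the padded estimate still fits in the context window.
  def fits(estimate: Int, marginPercent: Int, contextWindow: Int): Boolean =
    padded(estimate, marginPercent) <= contextWindow
}
```

With a 30% margin, an estimate of 1000 tokens is treated as 1300, so `BudgetSketch.fits(1000, 30, 1200)` is false even though the raw estimate is under the limit.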
Attributes
- See also:
  - ConversationTokenCounter.forModel for the recommended entry point
  - TokenizerAccuracy for accuracy information
- Supertypes: class Object, trait Matchable, class Any
- Self type: TokenizerMapping.type