org.llm4s.context.tokens

Members list

Type members

Classlikes

Attributes

Supertypes
class Object
trait Matchable
class Any
case class Token(tokenId: Int)

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
Show all
object Tokenizer

Attributes

Supertypes
class Object
trait Matchable
class Any
Self type
Tokenizer.type
sealed trait TokenizerAccuracy

Represents the accuracy of tokenizer mapping for a model.

Represents the accuracy of tokenizer mapping for a model.

This sealed trait hierarchy captures three levels of accuracy:

  • '''Exact''': Native tokenizer available, counts are precise
  • '''Approximate''': Using a similar tokenizer, counts may differ
  • '''Unknown''': Unknown model, using fallback tokenizer

Attributes

See also
Companion
object
Supertypes
class Object
trait Matchable
class Any
Known subtypes
class Approximate
class Exact
class Unknown

Accuracy level variants for tokenizer mappings.

Accuracy level variants for tokenizer mappings.

Attributes

Companion
trait
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type

Maps LLM model names to appropriate tokenizers for accurate token counting.

Maps LLM model names to appropriate tokenizers for accurate token counting.

Different LLM providers use different tokenization schemes, and even within a provider, newer models may use updated tokenizers. This object provides the mapping logic to select the correct tokenizer for a given model.

==Supported Providers==

| Provider   | Model Pattern           | Tokenizer      | Accuracy |
|------------|-------------------------|----------------|----------|
| OpenAI     | gpt-4o, o1-*            | o200k_base     | Exact    |
| OpenAI     | gpt-4, gpt-3.5          | cl100k_base    | Exact    |
| OpenAI     | gpt-3 (legacy)          | r50k_base      | Exact    |
| Azure      | (same as OpenAI)        | (inherited)    | Exact    |
| Anthropic  | claude-*                | cl100k_base    | ~75%     |
| Ollama     | *                       | cl100k_base    | ~80%     |
| Unknown    | *                       | cl100k_base    | Unknown  |

==Model Name Formats==

The mapper accepts various model name formats:

  • Plain: gpt-4o, claude-3-sonnet
  • Provider-prefixed: openai/gpt-4o, anthropic/claude-3-sonnet
  • Azure: azure/my-gpt4o-deployment
  • Ollama: ollama/llama2

==Accuracy Considerations==

For non-OpenAI models, token counts are '''approximations'''. Claude uses a proprietary tokenizer that may differ 20-30% from cl100k_base. Always check TokenizerAccuracy to understand the expected accuracy.

Attributes

See also

ConversationTokenCounter.forModel for the recommended entry point

TokenizerAccuracy for accuracy information

Supertypes
class Object
trait Matchable
class Any
Self type