Tokenizer
org.llm4s.context.tokens.Tokenizer
object Tokenizer
Factory for obtaining StringTokenizer instances backed by jtokkit BPE encodings.
Encodings are looked up in the jtokkit default registry, which bundles the standard TikToken vocabularies (cl100k_base, o200k_base, etc.). Looking up an unknown tokenizerId, one whose name does not match any bundled vocabulary, returns None; callers must handle the absent case explicitly.
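The lookup-or-None behaviour described above can be illustrated directly against jtokkit. This is a minimal sketch, not the Tokenizer object's actual implementation: the names TokenizerLookupSketch, lookupEncoding, and countTokens are hypothetical, and it assumes jtokkit is on the classpath and Scala 2.13 or later (for scala.jdk.OptionConverters).

```scala
import com.knuddels.jtokkit.Encodings
import com.knuddels.jtokkit.api.{Encoding, EncodingRegistry}
import scala.jdk.OptionConverters._

// Hypothetical helper, for illustration only; not part of llm4s.
object TokenizerLookupSketch {

  // The default registry bundles the standard vocabularies such as cl100k_base.
  private val registry: EncodingRegistry = Encodings.newDefaultEncodingRegistry()

  // jtokkit returns java.util.Optional for lookups by name;
  // convert it to a Scala Option so an unknown name becomes None.
  def lookupEncoding(name: String): Option[Encoding] =
    registry.getEncoding(name).toScala

  // Count tokens only when the vocabulary exists; otherwise propagate None.
  def countTokens(name: String, text: String): Option[Int] =
    lookupEncoding(name).map(_.countTokens(text))
}

object TokenizerLookupSketchDemo extends App {
  // Callers handle the absent case explicitly rather than assuming success.
  TokenizerLookupSketch.countTokens("cl100k_base", "hello world") match {
    case Some(n) => println(s"token count: $n")
    case None    => println("unknown vocabulary")
  }
}
```

Returning an Option keeps the unknown-vocabulary case visible in the type, so a misspelled vocabulary name surfaces at the call site instead of as an exception deeper in the pipeline.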
Attributes
- See also
  - org.llm4s.identity.TokenizerId for vocabulary name constants
  - org.llm4s.context.tokens.TokenizerMapping for the model → tokenizer mapping
- Supertypes
  - class Object
  - trait Matchable
  - class Any
- Self type
  - Tokenizer.type
Members list