massOfai

Tokenizer

NLP & Text

Converts text to tokens the model understands

What is a Tokenizer?

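Tokenizers split text into words, subwords, or bytes. The chosen granularity determines the size of the model's vocabulary and the length of the token sequences it must process, so it directly affects performance.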
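
To make the granularity trade-off concrete, here is a minimal sketch that splits the same string at the word level and at the byte level. The function names `word_tokens` and `byte_tokens` are hypothetical, chosen only for this illustration, and the regular expression is one simple way to separate words from punctuation, not a standard.

```python
import re

def word_tokens(text):
    # Hypothetical helper: runs of word characters become one token each,
    # and every punctuation character becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

def byte_tokens(text):
    # Hypothetical helper: each UTF-8 byte becomes one token id in 0..255,
    # so the vocabulary is fixed at 256 entries.
    return list(text.encode("utf-8"))

sample = "Hello, world!"
print(word_tokens(sample))       # few tokens, but an open-ended vocabulary
print(len(byte_tokens(sample)))  # more tokens, but a tiny fixed vocabulary
```

Word-level splitting keeps sequences short but cannot represent unseen words, while byte-level splitting can represent any input at the cost of longer sequences. Subword methods sit between the two.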

Real-World Examples

  • Byte-Pair Encoding (BPE), WordPiece
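
The core idea of BPE can be sketched in a few lines: start from single characters and repeatedly merge the most frequent adjacent pair of symbols. The code below is a toy illustration under simplifying assumptions (no end-of-word marker, no pre-tokenization, ties broken by first occurrence), and the names `merge_pair`, `learn_bpe`, and `encode` are hypothetical. It is not the implementation used by any particular library. WordPiece follows a similar merge loop but, as commonly described, picks merges by a likelihood-based score rather than raw pair frequency; that variant is not shown here.

```python
from collections import Counter

def merge_pair(symbols, pair):
    # Replace every adjacent occurrence of `pair` with one joined symbol.
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words, num_merges):
    # Each word starts as a tuple of single characters, weighted by its count.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair (first seen wins on ties).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the merge to every word in the corpus.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges

def encode(word, merges):
    # Tokenize a new word by replaying the learned merges in order.
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)
print(encode("lowly", merges))
```

Because merges are replayed in the order they were learned, an unseen word such as "lowly" still decomposes into known pieces instead of falling outside the vocabulary.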