Tokenization
NLP & Text
Breaking text into smaller units (tokens)
What is Tokenization?
Tokenization splits text into words, subwords, or characters. It is the first step in most NLP pipelines.
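As a rough illustration of the word and character granularities, here is a minimal Python sketch. The function names `word_tokenize` and `char_tokenize` and the regular expression are assumptions made for this example, not any particular library's API; production tokenizers handle many more cases (contractions, URLs, Unicode edge cases).

```python
import re

def word_tokenize(text):
    # Hypothetical helper: a token is either a run of word characters
    # or a single punctuation character; whitespace is discarded.
    return re.findall(r"\w+|[^\w\s]", text)

def char_tokenize(text):
    # Hypothetical helper: every character, including spaces, is a token.
    return list(text)

print(word_tokenize("Hello world!"))  # ['Hello', 'world', '!']
print(char_tokenize("Hi!"))           # ['H', 'i', '!']
```

Word-level tokens keep the sequence short but cannot represent unseen words; character-level tokens cover any input at the cost of much longer sequences. Subword schemes sit between the two.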
Real-World Examples
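- Word-level tokenization: "Hello world!" → ["Hello", "world", "!"]
- Sentence splitting: dividing a passage into individual sentences
- WordPiece tokenization: breaking words into smaller subword units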
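The last two examples above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: `split_sentences` uses a naive punctuation rule that will mis-split abbreviations such as "Dr.", and `wordpiece_tokenize` is a greedy longest-match-first routine in the style of WordPiece, run against a tiny made-up vocabulary (`TOY_VOCAB`) rather than a trained one. All names here are hypothetical.

```python
import re

def split_sentences(text):
    # Naive rule (assumption): a sentence ends at '.', '!' or '?'
    # followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    # Greedy longest-match-first: repeatedly take the longest prefix of the
    # remaining characters that is in the vocabulary. Pieces that do not
    # start the word carry a "##" continuation prefix.
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No vocabulary entry fits: the whole word becomes unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces

# Toy vocabulary, invented for this example only.
TOY_VOCAB = {"token", "##ization", "un", "##break", "##able"}

print(split_sentences("Tokenization comes first. Models come later!"))
# ['Tokenization comes first.', 'Models come later!']
print(wordpiece_tokenize("tokenization", TOY_VOCAB))
# ['token', '##ization']
```

In practice the vocabulary is learned from a large corpus, and the routine is applied to each word after an initial word-level split.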
When to Use This
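Use tokenization whenever raw text must be fed to an ML model. Models operate on discrete units, typically mapped to integer IDs, rather than on raw strings, so tokenization is an essential first step.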