massOfai

Tokenization

NLP & Text

Breaking text into smaller units (tokens)

What is Tokenization?

Tokenization splits text into words, subwords, or characters, depending on the tokenizer. It is the first step in most NLP pipelines.
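A minimal sketch of word-level and character-level tokenization in Python. It assumes a simple regex rule: a run of word characters is one token, and each punctuation mark stands alone. The function names are hypothetical, and real tokenizers handle contractions, URLs, and Unicode far more carefully.

```python
import re

def word_tokenize(text: str) -> list[str]:
    # A run of word characters (letters, digits, underscore) is one token;
    # every other non-space character becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

def char_tokenize(text: str) -> list[str]:
    # Every character, including spaces, becomes a token.
    return list(text)

print(word_tokenize("Hello world!"))  # ['Hello', 'world', '!']
print(char_tokenize("Hi!"))           # ['H', 'i', '!']
```

Word-level tokens stay readable but produce a large vocabulary. Character-level tokens need only a tiny vocabulary but produce much longer sequences. Subword schemes sit between the two.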

Real-World Examples

  • "Hello world!" → ["Hello", "world", "!"]
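  • Sentence splitting: "Hi. Bye." → ["Hi.", "Bye."]
  • WordPiece subword tokenization (used by BERT), which breaks rare words into smaller known pieces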
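The last two examples can be sketched as follows. The sentence splitter is a naive regex that breaks after ".", "!", or "?" followed by whitespace, so it will mis-split abbreviations such as "e.g.". The subword function follows the greedy longest-match-first procedure that BERT-style WordPiece tokenizers apply at inference time, marking word-internal pieces with "##". The tiny vocabulary is invented for illustration. Real WordPiece vocabularies are learned from a large corpus.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Naive rule: split on whitespace that directly follows ., ! or ?
    return re.split(r"(?<=[.!?])\s+", text.strip())

def wordpiece(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first subword split of a single word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # non-initial pieces carry the ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces

# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"token", "##ization", "play", "##ing", "##s"}

print(split_sentences("Hello world! How are you? Fine."))
# ['Hello world!', 'How are you?', 'Fine.']
print(wordpiece("tokenization", TOY_VOCAB))  # ['token', '##ization']
print(wordpiece("playing", TOY_VOCAB))       # ['play', '##ing']
print(wordpiece("xyz", TOY_VOCAB))           # ['[UNK]']
```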

When to Use This

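Use it as the first step whenever raw text goes into an ML model. Models operate on sequences of token IDs, not raw strings, so text must be tokenized before it can be mapped to IDs and embedded.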
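A minimal sketch of the step that follows tokenization: mapping tokens to integer IDs through a vocabulary. The special tokens "[PAD]" and "[UNK]" and the helper names are assumptions for illustration. Real libraries build vocabularies from large corpora and define their own special tokens.

```python
def build_vocab(tokens: list[str], specials: tuple[str, ...] = ("[PAD]", "[UNK]")) -> dict[str, int]:
    # Special tokens get the lowest IDs; other tokens are numbered in first-seen order.
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def encode(tokens: list[str], vocab: dict[str, int], unk: str = "[UNK]") -> list[int]:
    # Tokens missing from the vocabulary map to the [UNK] ID.
    return [vocab.get(tok, vocab[unk]) for tok in tokens]

vocab = build_vocab(["Hello", "world", "!", "Hello"])
print(vocab)                                   # {'[PAD]': 0, '[UNK]': 1, 'Hello': 2, 'world': 3, '!': 4}
print(encode(["Hello", "there", "!"], vocab))  # [2, 1, 4]
```

Unseen tokens such as "there" collapse to the [UNK] ID. This is one reason subword schemes like WordPiece are popular: they can represent a rare word as known pieces instead of losing it entirely.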