What do you understand by tokenization?
Tokenization is the process of breaking a string of text into smaller pieces, called tokens, such as words, keywords, phrases, and symbols.
- Tokens can be individual words, phrases, or even whole sentences. During tokenization, some characters, such as punctuation marks, are typically discarded.
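
A minimal sketch of word-level tokenization in Python, using a simple regular expression; the pattern shown is one illustrative choice among many possible tokenizers:

```python
import re

def tokenize(text: str) -> list[str]:
    """Split text into word tokens, discarding punctuation and whitespace."""
    # Keep runs of letters, digits, and apostrophes; everything else is dropped.
    return re.findall(r"[A-Za-z0-9']+", text)

sentence = "Tokenization splits text into tokens, doesn't it?"
print(tokenize(sentence))
# ['Tokenization', 'splits', 'text', 'into', 'tokens', "doesn't", 'it']
```

Note how the comma and question mark are discarded, while words (including the contraction) are kept as individual tokens.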