Famous lyrics by »
Lexical tokenization is conversion of a text into (semantically or syntactically) meaningful lexical tokens belonging to categories defined by a "lexer" program. In case of a natural language, those categories include nouns, verbs, adjectives, punctuations etc. In case of a programming language, the categories include identifiers, operators, grouping symbols and data types. Lexical tokenization is not the same process as the probabilistic tokenization, used for large language model's data preprocessing, that encode text into numerical tokens, using byte pair encoding.
0 fans
Share your thoughts on the Lexer Band with the community:
Report Comment
We're doing our best to make sure our content is useful, accurate and safe.
If by any chance you spot an inappropriate comment while navigating through our website please use this form to let us know, and we'll take care of it shortly.
Attachment
You need to be logged in to favorite.
Log In