CS 15-212: Fundamental Structures of Computer Science II
Many applications require some form of tokenization or lexical analysis as a preprocessing step. Examples include compiling programming languages, processing natural languages, and manipulating HTML pages to extract structure. The computational framework for lexical analysis is best described using finite state machines. After recalling and extending our previous use of finite state machines and regular expressions, we study an example of a lexical analyzer for a simple language of arithmetic expressions.
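To make the idea concrete before the details, here is a minimal sketch of a lexer written as an explicit two-state machine. It is illustrative only and is not the analyzer developed in these notes: the token names, the `tokenize` function, and the choice of unsigned integers, the four arithmetic operators, and parentheses as the token set are all assumptions made for this example.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, lexeme); the kind names are illustrative

DIGITS = "0123456789"

# Single-character tokens of the assumed expression language.
SYMBOLS = {"+": "PLUS", "-": "MINUS", "*": "TIMES", "/": "DIVIDE",
           "(": "LPAREN", ")": "RPAREN"}

def tokenize(text: str) -> List[Token]:
    """Scan text with a two-state machine: START and IN_NUMBER."""
    tokens: List[Token] = []
    state = "START"
    digits = ""              # lexeme accumulated while in IN_NUMBER
    for ch in text + "\0":   # sentinel forces a final flush of pending digits
        if state == "IN_NUMBER":
            if ch in DIGITS:
                digits += ch
                continue
            # Any non-digit ends the number; then handle ch from START.
            tokens.append(("NUM", digits))
            digits = ""
            state = "START"
        if ch in DIGITS:
            digits = ch
            state = "IN_NUMBER"
        elif ch in SYMBOLS:
            tokens.append((SYMBOLS[ch], ch))
        elif ch.isspace() or ch == "\0":
            pass             # whitespace and the sentinel produce no token
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens
```

For instance, `tokenize("12+3*(4-5)")` yields a `NUM` token for `12`, then `PLUS`, a `NUM` for `3`, `TIMES`, and so on. The machine stays in `IN_NUMBER` for as long as it reads digits, which is how it groups `12` into a single token rather than two.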