Compiler Construct Projects
In the first part of the project, you need to implement a simple lexical analyzer in C++.
1. Input is a text file with source code in it.
2. Spaces, tabs, newlines, and comments (/* …. */) must be ignored by the lexical analyzer.
3. The lexical analyzer should be able to identify:
a) Integer literals, e.g. 34
b) Following keywords: for, while, do, if, else, public, private
c) Any user-defined name, e.g. balance, a, b
d) Other single-character punctuation and symbols, e.g. %, +, =, ; and so on
e) Special multi-character symbols, including ==, <=, >= (only these three)
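The recognition rules in items 2 and 3 can be sketched as a small hand-written scanner. This is only one possible shape, not a required design: the names `Kind`, `Lexeme`, `skipIgnored`, and `nextLexeme` are hypothetical, and the sketch assumes that a user-defined name starts with a letter or underscore and continues with letters, digits, or underscores (the assignment does not spell this out). It also assumes that an unterminated comment runs to the end of the input.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical lexeme categories matching items a) through e).
enum class Kind { Number, Keyword, Identifier, SingleSymbol, MultiSymbol, End };

struct Lexeme {
    Kind kind;
    std::string text;
};

// Advance pos past whitespace and /* ... */ comments.
// Assumption: an unterminated comment swallows the rest of the input.
void skipIgnored(const std::string& src, std::size_t& pos) {
    while (pos < src.size()) {
        if (std::isspace(static_cast<unsigned char>(src[pos]))) {
            ++pos;
        } else if (src.compare(pos, 2, "/*") == 0) {
            std::size_t close = src.find("*/", pos + 2);
            pos = (close == std::string::npos) ? src.size() : close + 2;
        } else {
            break;
        }
    }
}

// Return the next lexeme starting at pos and advance pos past it.
Lexeme nextLexeme(const std::string& src, std::size_t& pos) {
    static const std::set<std::string> keywords = {
        "for", "while", "do", "if", "else", "public", "private"};

    skipIgnored(src, pos);
    if (pos >= src.size()) return {Kind::End, ""};

    const std::size_t start = pos;
    const unsigned char c = static_cast<unsigned char>(src[pos]);

    if (std::isdigit(c)) {  // a) integer literal
        while (pos < src.size() &&
               std::isdigit(static_cast<unsigned char>(src[pos]))) ++pos;
        return {Kind::Number, src.substr(start, pos - start)};
    }
    if (std::isalpha(c) || c == '_') {  // b) keyword or c) user-defined name
        while (pos < src.size() &&
               (std::isalnum(static_cast<unsigned char>(src[pos])) ||
                src[pos] == '_')) ++pos;
        std::string word = src.substr(start, pos - start);
        return {keywords.count(word) ? Kind::Keyword : Kind::Identifier, word};
    }
    // e) only ==, <= and >= are treated as multi-character symbols
    if ((c == '=' || c == '<' || c == '>') &&
        pos + 1 < src.size() && src[pos + 1] == '=') {
        pos += 2;
        return {Kind::MultiSymbol, src.substr(start, 2)};
    }
    ++pos;  // d) any other single character
    return {Kind::SingleSymbol, std::string(1, src[start])};
}
```

Calling `nextLexeme` repeatedly until it returns `Kind::End` walks the whole input. Checking for the two-character symbols before falling back to the single-character case is what keeps `>=` from being split into `>` and `=`.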
4. In the program, tokens for all the keywords and user-defined names need to be stored in a simple table. No matter how many times a keyword or user-defined name appears in the code, its token should appear only once in the table.
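One way to get the "only once" behaviour of item 4 is to key the table on the word itself, so that a repeated lookup returns the existing entry instead of adding a new one. The sketch below is an illustration under assumptions: `WordToken`, `TokenTable`, and `intern` are hypothetical names, and the table is pre-loaded with all seven keywords (an interpretation of "tokens for all the keywords"; storing only the keywords that actually occur in the input would be an equally reasonable reading).

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical table entry: tag 257 for keywords, 258 for user-defined names.
struct WordToken {
    int tag;
    std::string s;
};

class TokenTable {
public:
    TokenTable() {
        // Assumption: every keyword is reserved in the table up front.
        static const char* const kKeywords[] = {
            "for", "while", "do", "if", "else", "public", "private"};
        for (const char* kw : kKeywords)
            table_.emplace(kw, WordToken{257, kw});
    }

    // Return the entry for word, creating a name entry (tag 258) only if
    // the word is not in the table yet. emplace leaves an existing entry
    // untouched, so each word is stored at most once.
    const WordToken& intern(const std::string& word) {
        auto result = table_.emplace(word, WordToken{258, word});
        return result.first->second;
    }

    std::size_t size() const { return table_.size(); }

private:
    // std::map keeps entries sorted by word, which gives a stable listing order.
    std::map<std::string, WordToken> table_;
};
```

With the keywords reserved first, a single `intern` call both classifies a word (the returned tag is 257 or 258) and guarantees uniqueness. For example, calling `intern("balance")` twice returns a reference to the same entry both times.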
5. Token object structure and display format
                | Integer literal | Keyword         | User-defined name | Single-character symbol | Multi-character symbol
Object contents
tag (integer)   | 256             | 257             | 258               | ASCII code              | 259, 260, 261 for ==, <=, >= respectively
v (integer)     | Numeric value   | N/A             | N/A               | N/A                     | N/A
s (string)      | N/A             | Keyword string  | Name string       | N/A                     | N/A
Display format
Display format  | <num,34>        | <keyword, “if”> | <id, “balance”>   | <+>                     | <==>
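The token layout and display format described in this item could be written roughly as follows. This is a sketch, not a prescribed class design: the enumerator names and the `formatToken` helper are hypothetical, only the numeric tag values come from the table, and the code assumes plain ASCII double quotes around keyword and name strings.

```cpp
#include <string>

// Tag values are taken from the table; the enumerator names are illustrative.
enum Tag { NUM = 256, KEYWORD = 257, ID = 258, EQ = 259, LE = 260, GE = 261 };

struct Token {
    int tag;        // 256..261, or the ASCII code of a single-character symbol
    int v;          // numeric value, meaningful only when tag == NUM
    std::string s;  // keyword or name string, meaningful only for KEYWORD and ID
};

// Render one token in the display format from the table.
std::string formatToken(const Token& t) {
    switch (t.tag) {
        case NUM:     return "<num," + std::to_string(t.v) + ">";
        case KEYWORD: return "<keyword, \"" + t.s + "\">";
        case ID:      return "<id, \"" + t.s + "\">";
        case EQ:      return "<==>";
        case LE:      return "<<=>";
        case GE:      return "<>=>";
        // Any other tag is assumed to be the ASCII code of a single character.
        default:      return "<" + std::string(1, static_cast<char>(t.tag)) + ">";
    }
}
```

Starting the named tags at 256 keeps them clear of the ASCII range, so a single-character symbol can use its own character code as its tag without colliding with the other token kinds. For example, `formatToken(Token{'+', 0, ""})` yields `<+>`.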
6. Output to screen:
a) List the tokens in the table
b) Display the token sequence derived from the input file on the screen following the format in the table above.
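The input and output steps might be wired together along these lines. Both helpers are hypothetical: `readSource` loads the whole input file into a string, and `printReport` writes the table listing followed by the token sequence, taking tokens that are already rendered as strings. The section labels "Token table:" and "Token sequence:" are an assumption, since the assignment does not prescribe any headings.

```cpp
#include <cstddef>
#include <fstream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read the entire source file into memory.
// Throws std::runtime_error if the file cannot be opened.
std::string readSource(const std::string& path) {
    std::ifstream in(path);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Print a) the tokens stored in the table, one per line, then
// b) the token sequence in input order, separated by single spaces.
void printReport(std::ostream& out,
                 const std::vector<std::string>& tableEntries,
                 const std::vector<std::string>& sequence) {
    out << "Token table:\n";
    for (const std::string& entry : tableEntries) out << "  " << entry << '\n';

    out << "Token sequence:\n";
    for (std::size_t i = 0; i < sequence.size(); ++i) {
        if (i != 0) out << ' ';
        out << sequence[i];
    }
    out << '\n';
}
```

Passing `std::cout` as the first argument sends the report to the screen. Taking a `std::ostream&` rather than writing to `std::cout` directly also makes the output easy to capture in a test.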