Structure of a Compiler¶
1. What is a compiler?¶
1.1. Translator¶
Definition:
Examples:
Preprocessor: If you have a language that has while statements but no for statements, construct a new language that allows for statements and use preprocessor to translate new language into old language:
for i=1 to n do
(stmts)
end for
i = 1
while (i <= n) do
(stmts)
i = i + 1
end while
2. Language Processing System¶
Comments:
* Preprocessor - C preprocessor replaces#include
statements with files.* Preprocessor - macro preprocessorExample: In LaTeX, define a macro\newcommand\TODO[1]{{\color{red}[#1]}}
then use\TODO{}
wherever you want.
NOTE: Today all these are transparent and are packaged into a "compiler".
2.1. Compiler¶
3. Overview of General Compiler¶
lexical analysis - read the program character by character grouping into atomic units called tokens
syntax analysis - accepts tokens, checks if program is syntactically correct, generates a parse tree.
intermediate code generation - walk through parse tree producing simple assembly code
code optimization - transform intermediate code to "better" code (faster)
code generation - transform intermediate code to machine code (assembler)
symbol table
- lexical analysis: enter identifier into table,
- syntax analysis: - type of identifier and usage,
- code generation: storage locations bound to names at runtime
error handling - lex: lot of errors will pass thru (
while
is typed aswh ile
)
4. Phases of Compilation¶
4.1. Lexical Analysis (Scanner)¶
Purpose: Read the same program character by character grouping them into atomic units called tokens.
Tokens:
depend on language and compiler writer
Examples:
reserved word:if
,for
operators:+, -, <, =
constants:0, 4.89
punctuation:(, }, [
identifiers:i, myNode
Treated as a pair:
token.type
andtoken.value
token.type
is a (mnemonic) integer- some tokens have no
token.value
Example 1
if (x <= 0) x = y + z
when put through lexical analyzer produces:
How does one build a scanner?
- from scratch
- lex
Preview of Lex
idea: tokens described by regular expressions
basic syntax:
regular expression, action
basic semantics:
if match regular expression, then do action.
Example:
- Besides returning token types and values, the lexical analyzer might
- print error messages
- insert identifiers in the symbol table
Difficult to differentiate sometimes: When does lexical analysis stop and parsing start? Example, consider keywords AND and OR. Are they tokens of type AND and OR, or are they RELOP tokens with values AND and OR?
4.2. Syntax Analysis (Parsing)¶
Purpose: Accepts the sequence of tokens generated by the lexical analyzer, checks whether the program is syntactically correct, and generates a parse tree.
Syntax: formally described by a context free grammar.
Parse Tree
if (x <= 0) x = y + z
How does one build a parser?
- from scratch
- using a parser generator such as yacc
4.3. 1.3.3 Intermediate Code Generator¶
Purpose: Traverse the parse tree, producing simple intermediate code.
Three-Address Code: Sequence of instructions, each has at most three operands. (like assembly in which each memory location can act like a register).
Instructions:
1.id := id op id
2.goto label
3.if condition goto label
Example 2
if (x <= 0) x = x + z
if (x <= 0) goto L1
goto L2
L1: x := y + z
L2:
Some compilers combine syntax analysis and intermediate code generation (i.e. no parse tree is generated)
4.4. 1.3.4 Intermediate Code Generation¶
Purpose: Transform the intermediate code into "better" code.
Examples:
- Rearrangement of Code
if (x <= 0) goto L1 if (x$>$0 goto L2
goto L2 ==> x = y $+$ z
L1: x = y + z L2:
L2:
- Redundancy Elimination
a = w + x + y T1 = x + y
==> a = w + T1
b = x + y + z b = T1 + z
- Strength Reduction
x^2 ==> x*x
expensive ==> cheap
operator operator
- Frequency Reduction
for (i=1; i<n; i=i+1) T1 = sqrt(26)
x = sqrt(26) ==> for (i=1; i<n; i=i+1)
} x = T1
}
Remarks:
- Main criteria for optimization is speed.
- Optimization takes time; hence it
- is optional
- may not be desirable (in low level CS class)
4.5. Code Generation¶
Purpose: Transform intermediate code to machine code (assembler)
Example: a = b + c
mov b, R1
add c, R1
mov R1, a
Remarks
- completely machine dependent whereas other phases are not
- "register allocation" is the most difficult task
- idea - use registers (fast access) to avoid memory use (slow access)
- problem - only a finite number of registers (during intermediate code phase, one assumes an infinite number)
4.6. Symbol Table¶
Purpose: record information about various objects in the source program
Examples
- procedure - number and type of arguments
- simple variable - type
- array - type, size
Use - information is required during
- parsing (for type checking)
- code generation (for generating the correct operand, allocating memory)
4.7. Error Handler¶
Errors - all errors should be
- detected
- detected correctly
- detected as soon as possible
- reported at the appropriate place and in a helpful manner
Purpose
- report errors
- "error recovery" - Be able to proceed with processing
Note: Errors can occur in each phase
- misspelled token
- wrong syntax
- improper procedure call
- statements that cannot be reached