Structure of a Compiler¶
1. What is a compiler?¶
Todo
- type: Diagrams
Look at Susan’s original notes (sectcomdes) to see what these are supposed to look like (she used cline to help build boxes, but MathJax cannot support it).
1.1. Translator¶
Definition:
Examples:
Preprocessor: If you have a language that has while statements
but no for
statements, construct a new language that allows for
statements and use the preprocessor to translate the new language into
the old language:
for i = 1 to n do
(stmts)
end for
i = 1
while (i <= n) do
(stmts)
i = i + 1
end while
2. Language Processing System¶
Comments:
* Preprocessor - C preprocessor replaces#include
statements with files.* Preprocessor - macro preprocessorExample: In LaTeX, define a macro\newcommand\TODO[1]{{\color{red}[#1]}}
then use\TODO{}
wherever you want.
NOTE: Today all these are transparent and are packaged into a “compiler”.
2.1. Compiler¶
3. Overview of General Compiler¶
lexical analysis - read the program character by character grouping into atomic units called tokens
syntax analysis - accepts tokens, checks if program is syntactically correct, generates a parse tree.
intermediate code generation - walk through parse tree producing simple assembly code
code optimization - transform intermediate code to “better” code (faster)
code generation - transform intermediate code to machine code (assembler)
symbol table
lexical analysis: enter identifier into table,
syntax analysis: - type of identifier and usage,
code generation: storage locations bound to names at runtime
error handling - lex: lot of errors will pass thru (
while
is typed aswh ile
)
4. Phases of Compilation¶
4.1. Lexical Analysis (Scanner)¶
Purpose: Read the same program character by character grouping them into atomic units called tokens.
Tokens:
depend on language and compiler writer
Examples:
reserved word:if
,for
operators:+, -, <, =
constants:0, 4.89
punctuation:(, }, [
identifiers:i, myNode
Treated as a pair:
token.type
andtoken.value
token.type
is a (mnemonic) integersome tokens have no
token.value
Example 1
if (x <= 0) x = y + z
when put through lexical analyzer produces:
How does one build a scanner?
from scratch
lex
Preview of Lex
idea: tokens described by regular expressions
basic syntax:
regular expression, action
basic semantics:
if match regular expression, then do action.
Example:
- Besides returning token types and values, the lexical analyzer might
print error messages
insert identifiers in the symbol table
Difficult to differentiate sometimes: When does lexical analysis stop and parsing start? Example, consider keywords AND and OR. Are they tokens of type AND and OR, or are they RELOP tokens with values AND and OR?
4.2. Syntax Analysis (Parsing)¶
Purpose: Accepts the sequence of tokens generated by the lexical analyzer, checks whether the program is syntactically correct, and generates a parse tree.
Syntax: formally described by a context free grammar.
Parse Tree
if (x <= 0) x = y + z
How does one build a parser?
from scratch
using a parser generator such as yacc
4.3. 1.3.3 Intermediate Code Generator¶
Purpose: Traverse the parse tree, producing simple intermediate code.
Three-Address Code: Sequence of instructions, each has at most three operands. (like assembly in which each memory location can act like a register).
Instructions:
1.id := id op id
2.goto label
3.if condition goto label
Example 2
if (x <= 0) x = x + z
if (x <= 0) goto L1
goto L2
L1: x := y + z
L2:
Some compilers combine syntax analysis and intermediate code generation (i.e. no parse tree is generated)
4.4. 1.3.4 Intermediate Code Generation¶
Purpose: Transform the intermediate code into “better” code.
Examples:
Rearrangement of Code
if (x <= 0) goto L1 if (x$>$0 goto L2
goto L2 ==> x = y $+$ z
L1: x = y + z L2:
L2:
Redundancy Elimination
a = w + x + y T1 = x + y
==> a = w + T1
b = x + y + z b = T1 + z
Strength Reduction
x^2 ==> x*x
expensive ==> cheap
operator operator
Frequency Reduction
for (i=1; i<n; i=i+1) T1 = sqrt(26)
x = sqrt(26) ==> for (i=1; i<n; i=i+1)
} x = T1
}
Remarks:
Main criteria for optimization is speed.
Optimization takes time; hence it
is optional
may not be desirable (in low level CS class)
4.5. Code Generation¶
Purpose: Transform intermediate code to machine code (assembler)
Example: a = b + c
mov b, R1
add c, R1
mov R1, a
Remarks
completely machine dependent whereas other phases are not
“register allocation” is the most difficult task
idea - use registers (fast access) to avoid memory use (slow access)
problem - only a finite number of registers (during intermediate code phase, one assumes an infinite number)
4.6. Symbol Table¶
Purpose: record information about various objects in the source program
Examples
procedure - number and type of arguments
simple variable - type
array - type, size
Use - information is required during
parsing (for type checking)
code generation (for generating the correct operand, allocating memory)
4.7. Error Handler¶
Errors - all errors should be
detected
detected correctly
detected as soon as possible
reported at the appropriate place and in a helpful manner
Purpose
report errors
“error recovery” - Be able to proceed with processing
Note: Errors can occur in each phase
misspelled token
wrong syntax
improper procedure call
statements that cannot be reached