Compiler Construction - CS606

CS606 - Compiler Construction - Lecture Handout 01

Course Organization

The course is organized around theory and a significant amount of practice. The practice will be in the form of homework assignments and a project. The project is the highlight of the course: you will build a full compiler for a subset of a Java-like language. The implementation will be in C++, and you will generate Intel x86 assembly language code. The project will be done in six parts; each will be a programming assignment.

CS606 - Compiler Construction - Lecture Handout 03

A parse can be represented by a tree, called a parse tree or syntax tree. For example, here is the parse tree for the expression x+2-y:

[Figure: parse tree for the expression x+2-y]

CS606 - Compiler Construction - Lecture Handout 04

The CISC architecture provided a rich set of instructions and addressing modes, but it made the job of the compiler harder when it came to generating efficient machine code. The RISC architecture simplified this problem.

Register Allocation

Registers in a CPU play an important role in providing high-speed access to operands. Memory access is an order of magnitude slower than register access. The back end attempts to have each operand value in a register when it is used. However, the back end has to manage a limited set of resources when it comes to the register file. The number of registers is small, and some registers are pre-allocated for specialized uses, e.g., the program counter, and thus are not available to the back end. Optimal register allocation is NP-complete.
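Because optimal allocation is NP-complete, practical back ends fall back on heuristics such as coloring an interference graph. Below is a minimal greedy-coloring sketch; the function name, graph encoding, and first-fit policy are illustrative assumptions, and production allocators in the Chaitin/Briggs family add spill-cost ranking and coalescing on top of this idea.

```cpp
#include <set>
#include <utility>
#include <vector>

// Greedy graph coloring over an interference graph: node = virtual
// register, edge = two values live at the same time. Returns a color
// (physical register index) per node, or -1 if a node must be spilled.
std::vector<int> allocateRegisters(int numNodes,
                                   const std::vector<std::pair<int,int>>& edges,
                                   int numPhysRegs) {
    std::vector<std::set<int>> adj(numNodes);
    for (auto& e : edges) {
        adj[e.first].insert(e.second);
        adj[e.second].insert(e.first);
    }
    std::vector<int> color(numNodes, -1);
    for (int n = 0; n < numNodes; ++n) {
        std::set<int> used;                       // colors of colored neighbors
        for (int m : adj[n])
            if (color[m] != -1) used.insert(color[m]);
        for (int c = 0; c < numPhysRegs; ++c)
            if (!used.count(c)) { color[n] = c; break; }  // first free register
        // color[n] stays -1 => no register free: spill to memory
    }
    return color;
}
```

On a triangle of three mutually interfering values, two physical registers force one spill, while three registers suffice.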

CS606 - Compiler Construction - Lecture Handout 05

Lexical Analysis

The scanner is the first component of the front end of a compiler; the parser is the second.

CS606 - Compiler Construction - Lecture Handout 06

How to Describe Tokens?

Regular languages are the most popular way of specifying tokens because they are concise to write and because efficient recognizers (finite automata) can be constructed for them.

CS606 - Compiler Construction - Lecture Handout 07

Table Encoding of FA

An FA can be encoded as a table, called a transition table. The following example shows an FA encoded as a table.
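As a sketch of the idea, here is a hand-written transition table for a toy automaton accepting a(b)* over the alphabet {a, b}; the automaton itself is an illustrative assumption, not the handout's figure.

```cpp
#include <string>

// A DFA encoded as a transition table: rows are states, columns are
// input symbols. State 1 is the only accepting state; state 2 is a
// dead (error) state that absorbs all bad input.
bool acceptsTable(const std::string& input) {
    // columns: 0 = 'a', 1 = 'b'
    static const int table[3][2] = {
        /* state 0 (start)     */ {1, 2},
        /* state 1 (accepting) */ {2, 1},
        /* state 2 (dead)      */ {2, 2},
    };
    int state = 0;
    for (char ch : input) {
        int col = (ch == 'a') ? 0 : 1;
        state = table[state][col];     // one table lookup per character
    }
    return state == 1;
}
```

The scanner's inner loop is then nothing but an array lookup per input character, which is why table-driven scanners are fast.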

CS606 - Compiler Construction - Lecture Handout 08

NFA → DFA Construction

The algorithm is called subset construction. In the transition table of an NFA, each entry is a set of states. In the transition table of a DFA, each entry is a single state. The general idea behind NFA-to-DFA construction is that each DFA state corresponds to a set of NFA states. The DFA uses its state to keep track of all possible states the NFA can be in after reading each input symbol.
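The construction can be sketched as follows; the state numbering, the '\0'-as-epsilon convention, and the helper names are illustrative assumptions.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <utility>

// Each DFA state is the set of NFA states the NFA could be in.
// Epsilon moves are recorded under the symbol '\0'.
using StateSet = std::set<int>;

// Add every NFA state reachable from s via epsilon moves alone.
StateSet epsilonClosure(const std::map<std::pair<int,char>, StateSet>& delta,
                        StateSet s) {
    std::queue<int> work;
    for (int q : s) work.push(q);
    while (!work.empty()) {
        int q = work.front(); work.pop();
        auto it = delta.find({q, '\0'});
        if (it == delta.end()) continue;
        for (int r : it->second)
            if (s.insert(r).second) work.push(r);   // newly reached state
    }
    return s;
}

// Subset construction: returns all DFA states (each a set of NFA
// states) reachable from the start state over the given alphabet.
std::set<StateSet> subsetConstruction(
        const std::map<std::pair<int,char>, StateSet>& delta,
        int start, const std::string& alphabet) {
    StateSet d0 = epsilonClosure(delta, {start});
    std::set<StateSet> dfaStates = {d0};
    std::queue<StateSet> work;
    work.push(d0);
    while (!work.empty()) {
        StateSet cur = work.front(); work.pop();
        for (char a : alphabet) {
            StateSet next;
            for (int q : cur) {                     // move on symbol a
                auto it = delta.find({q, a});
                if (it != delta.end())
                    next.insert(it->second.begin(), it->second.end());
            }
            if (next.empty()) continue;
            next = epsilonClosure(delta, next);
            if (dfaStates.insert(next).second) work.push(next);
        }
    }
    return dfaStates;
}
```

For a small NFA for a|b (start state 0 with epsilon moves to 1 and 3, then 1 —a→ 2 and 3 —b→ 4), the construction yields three DFA states: {0,1,3}, {2}, and {4}.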

CS606 - Compiler Construction - Lecture Handout 09

DFA Minimization

The generated DFA may have a large number of states. Hopcroft's algorithm can be used to minimize the number of DFA states. The idea behind the algorithm is to find groups of equivalent states: all transitions from states in one group G1 go to states in the same group G2. Construct the minimized DFA such that there is one state for each group of states from the initial DFA. Here is the minimized version of the DFA created earlier; states A and C have been merged.
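A compact partition-refinement sketch of the idea; this is Moore's simpler variant, while Hopcroft's algorithm refines the same partition with a worklist to reach O(n log n) time.

```cpp
#include <vector>

// DFA minimization by partition refinement: start with two groups
// (accepting vs. non-accepting) and split a group whenever two of its
// states transition, on some symbol, into different groups.
// trans[s][a] = next state on symbol a; accepting[s] marks finals.
// Returns a group number per state; equal group => states mergeable.
std::vector<int> minimizeGroups(const std::vector<std::vector<int>>& trans,
                                const std::vector<bool>& accepting) {
    int n = (int)trans.size();
    std::vector<int> group(n);
    for (int s = 0; s < n; ++s)          // initial split
        group[s] = accepting[s] ? 1 : 0;
    bool changed = true;
    while (changed) {
        changed = false;
        std::vector<int> next(n);
        int fresh = 0;
        for (int s = 0; s < n; ++s) {
            next[s] = -1;
            for (int t = 0; t < s; ++t) {        // find an equivalent earlier state
                if (group[t] != group[s]) continue;
                bool same = true;                 // same target group on every symbol?
                for (std::size_t a = 0; a < trans[s].size(); ++a)
                    if (group[trans[s][a]] != group[trans[t][a]]) { same = false; break; }
                if (same) { next[s] = next[t]; break; }
            }
            if (next[s] == -1) next[s] = fresh++; // opens a new (possibly split) group
        }
        if (next != group) { group = next; changed = true; }
    }
    return group;
}
```

For a three-state DFA whose two accepting states behave identically, the two accepting states land in one group and can be merged, exactly as states A and C are merged above.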

CS606 - Compiler Construction - Lecture Handout 10

Running the Scanner

Here is the output of the scanner when it is executed with the file main.cpp as input, i.e., when the scanner is asked to produce the tokens found in main.cpp:

lex <main.cpp

CS606 - Compiler Construction - Lecture Handout 11

Syntactic Analysis

Consider the following C++ function. There are a number of syntax errors present.

  1. int* foo(int i, int j))
  2. {
  3. for(k=0; i j; )
  4. fi( i > j )
  5. return j;
  6. }

CS606 - Compiler Construction - Lecture Handout 12

Parse Trees

The derivations can be represented in a tree-like fashion. The interior nodes contain the non-terminals used during the derivation.

CS606 - Compiler Construction - Lecture Handout 13

Top-Down Parser

A top-down parser starts with the root of the parse tree. The root node is labeled with the goal (start) symbol of the grammar. The top-down parsing algorithm proceeds as follows:

  1. Construct the root node of the parse tree
  2. Repeat until the fringe of the parse tree matches the input string
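The two steps above can be sketched as a recursive-descent parser, where expanding a node of the fringe becomes a function call and matching a terminal advances the input cursor. The toy grammar below, E → 'x' | '(' E '+' E ')', is an assumption for illustration.

```cpp
#include <cstddef>
#include <string>

// Top-down (recursive-descent) sketch: one function per nonterminal;
// each call builds the corresponding subtree of the parse tree.
struct Parser {
    std::string in;
    std::size_t pos = 0;
    bool match(char c) {                       // consume terminal c if present
        if (pos < in.size() && in[pos] == c) { ++pos; return true; }
        return false;
    }
    bool parseE() {
        if (match('x')) return true;           // E -> x
        return match('(') && parseE()          // E -> ( E + E )
            && match('+') && parseE() && match(')');
    }
    bool accepts() {                           // fringe must match ALL input
        pos = 0;
        return parseE() && pos == in.size();
    }
};
```

Note that this grammar needs no backtracking: the first input character alone decides which production to expand, which previews the LL(1) property discussed in a later handout.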

CS606 - Compiler Construction - Lecture Handout 14

The LL(1) Property

 

If A → α and A → β both appear in the grammar, we would like

FIRST(α) ∩ FIRST(β) = ∅

Predictive parsers accept LL(k) grammars. The two L's stand for Left-to-right scan of the input and leftmost derivation. The k stands for the number of look-ahead tokens of input. The LL(1) property allows the parser to make a correct choice with a look-ahead of exactly one symbol! What about ε-productions? They complicate the definition of LL(1).
If A → α and A → β and ε ∈ FIRST(α), then we need to ensure that FIRST(β) is disjoint from FOLLOW(A), too.

CS606 - Compiler Construction - Lecture Handout 15

Let’s consider the implementation of the C++ classes for the non-terminals. We start with Expr.

bool Expr::isPresent()
{
    Term* op1 = new Term(s);
    if (!op1->isPresent())
        return false;

CS606 - Compiler Construction - Lecture Handout 16

Here is the output trace for the expression 2+4*6:

 

>> Expr::isPresent()
>> Term::isPresent()
>> Factor::isPresent()
token: 2 (257)
<< Factor::isPresent() return true
>> Tprime::isPresent()
token: + (267)

CS606 - Compiler Construction - Lecture Handout 17

Note that the productions output trace out a leftmost derivation. The grammar symbols on the stack make up the left-sentential forms.

LL(1) Table Construction

Top-down parsing expands a parse tree from the start symbol to the leaves. It always expands the leftmost non-terminal. Consider the state

CS606 - Compiler Construction - Lecture Handout 18

Computing FIRST Sets

Here is the algorithm for computing the FIRST sets.

  1. For all terminal symbols b,
    FIRST(b) = {b}
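The FIRST rules iterate to a fixed point. Here is a sketch for grammars whose symbols are single characters; the encoding — uppercase letters as nonterminals, lowercase letters as terminals, an empty right-hand side as an ε-production, and '\0' standing for ε inside the sets — is an illustrative assumption.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Fixed-point computation of FIRST sets. Keeps adding symbols until
// no production changes any set.
std::map<char, std::set<char>> computeFirst(
        const std::vector<std::pair<char, std::string>>& prods) {
    std::map<char, std::set<char>> first;
    bool changed = true;
    while (changed) {
        changed = false;
        for (auto& p : prods) {
            std::set<char>& f = first[p.first];
            std::size_t before = f.size();
            bool allNullable = true;           // can the whole rhs derive epsilon?
            for (char sym : p.second) {
                if (!isupper((unsigned char)sym)) {
                    f.insert(sym);             // terminal: FIRST is itself
                    allNullable = false;
                    break;
                }
                for (char t : first[sym])      // add FIRST(sym) minus epsilon
                    if (t != '\0') f.insert(t);
                if (!first[sym].count('\0')) { allNullable = false; break; }
            }
            if (allNullable) f.insert('\0');   // epsilon survives the whole rhs
            if (f.size() != before) changed = true;
        }
    }
    return first;
}
```

For the grammar S → A b, A → a | ε, this computes FIRST(A) = {a, ε} and FIRST(S) = {a, b}: the nullable A lets b through into FIRST(S).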

CS606 - Compiler Construction - Lecture Handout 19

LL(1) Table Construction

Here now is the algorithm to construct a predictive parsing table.

  1. For each production A → α:
    1. For each terminal a in FIRST(α), add A → α to M[A,a].
    2. If ε is in FIRST(α), add A → α to M[A,b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A,$].

CS606 - Compiler Construction - Lecture Handout 20

Bottom-up Parsing

Bottom-up parsing is more general than top-down parsing. Bottom-up parsers handle a larger class of grammars, and bottom-up parsing is the preferred method in practice. It is also called LR parsing: L means that tokens are read left to right, and R means that the parser constructs a rightmost derivation (in reverse). LR parsers do not need left-factored grammars and can handle left-recursive grammars.

CS606 - Compiler Construction - Lecture Handout 21

Shift-Reduce: The Stack

A stack can be used to hold the contents of the left string. The top of the stack is marked by the ► symbol. The shift action pushes a terminal on the stack. Reduce pops zero or more symbols from the stack (the production's right-hand side) and pushes a non-terminal on the stack (the production's left-hand side).
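A minimal sketch of the shift/reduce loop, for an assumed toy grammar E → E + E | t with a greedy reduce-first policy; real LR parsers consult a table to resolve shift/reduce choices, which is ignored here for brevity.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Shift-reduce sketch: 't' is a terminal token, 'E' the nonterminal
// pushed by a reduce. Shift pushes the next token; reduce pops a
// production's right-hand side and pushes its left-hand side.
// Returns true if the whole input reduces to exactly "E".
bool shiftReduce(const std::string& tokens) {
    std::vector<char> st;                      // stack top = st.back()
    std::size_t i = 0;
    while (true) {
        std::size_t n = st.size();
        if (n >= 1 && st[n-1] == 't') {
            st[n-1] = 'E';                     // reduce E -> t
        } else if (n >= 3 && st[n-1] == 'E' && st[n-2] == '+' && st[n-3] == 'E') {
            st.resize(n-3);                    // reduce E -> E + E
            st.push_back('E');
        } else if (i < tokens.size()) {
            st.push_back(tokens[i++]);         // shift the next token
        } else {
            break;                             // no action applies
        }
    }
    return st.size() == 1 && st[0] == 'E';
}
```

Tracing "t+t": shift t, reduce to E, shift +, shift t, reduce to E, then reduce E + E to E, leaving a single E on the stack.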

CS606 - Compiler Construction - Lecture Handout 22

Example: here are the bottom-up parser's actions when it parses the expression grammar sentence

x – 2 × y

(tokenized as id – num * id)

CS606 - Compiler Construction - Lecture Handout 23

Handles

The number of potential handles for the grammar is simply the sum of the lengths of the right-hand sides of all the productions. The number of complete handles is simply the number of productions. These two facts lead to the critical insight behind LR parsers:

A given grammar generates a finite set of handles (and potential handles) that the parser must recognize

However, it is not simply a matter of putting the placeholder somewhere in a right-hand side to generate handles. The parser needs to recognize the correct handle by distinguishing different right contexts. Consider the parser's action at step 9.

CS606 - Compiler Construction - Lecture Handout 24

An LR(1) item is a pair [X → α•β, a], where X → αβ is a production and a ∈ T (the terminals) is a look-ahead symbol. The model uses a set of LR(1) items to represent each parser state. The model is called the canonical collection (CC) of sets of LR(1) items.

CS606 - Compiler Construction - Lecture Handout 25

The set s changed, so the repeat loop is executed again. This time, however, the item [E → •int, $/+] does not yield any more items because the dot is followed by the terminal int. The first set of items is

I0 = {
[S → •E, $],
[E → •E+(E), $/+],
[E → •int, $/+]
}

CS606 - Compiler Construction - Lecture Handout 26

The initial DFA state I0 we computed is the ε-closure of the set consisting of item

S → •E

Recall the stage in the closure

s = { [S → •E, $] , [E → •E + (E), $] , [E → •int, $] }

The NFA states and transitions required are

CS606 - Compiler Construction - Lecture Handout 28

LR(1) Skeleton Parser

stack.push(dummy); stack.push(0);
done = false; token = scanner.next();
while (!done) {
    s = stack.top();
    if (Action[s,token] == "reduce A→β") {
        stack.pop(2×|β|);
        s = stack.top();
        stack.push(A);
        stack.push(Goto[s,A]);
    }

CS606 - Compiler Construction - Lecture Handout 29

Shift/Reduce Conflicts

Consider the ambiguous grammar

E → E + E | E × E | int

We will have a DFA state containing

[E → E × E•, +]
[E → E • + E, +]

Again we have a shift/reduce conflict. We need to reduce because × has precedence over +.

CS606 - Compiler Construction - Lecture Handout 30

Parser Generators

Parser generators exist for LL(1) and LALR(1) grammars. For example, YACC and its derivatives generate LALR(1) parsers.

CS606 - Compiler Construction - Lecture Handout 31

Beyond Syntax

These questions are part of context-sensitive analysis. Answers depend on values, not parts of speech. Answers may involve computation.

These questions can be answered by using formal methods such as context-sensitive grammars and attribute grammars or by using ad-hoc techniques.

One of the most popular approaches is the use of attribute grammars.

CS606 - Compiler Construction - Lecture Handout 32

Evaluation Methods

A number of methods can be used to evaluate the attributes. In the dynamic method, the compiler builds the parse tree and then builds the dependence graph; a topological sort of the graph is carried out, and the attributes are evaluated in topological order. In the rule-based (or treewalk) method, the attribute rules are analyzed at compiler-generation time; a fixed (static) ordering is determined, and the nodes in the dependency graph are evaluated in this order. In the oblivious (passes, dataflow) method, the attribute rules and parse tree are ignored; a convenient order is picked at compiler design time and used.
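The dynamic method's topological sort can be sketched with Kahn's algorithm over the dependence graph; the function name and the edge encoding (edge u → v meaning "attribute v depends on attribute u") are assumptions for illustration.

```cpp
#include <queue>
#include <utility>
#include <vector>

// Topological order of n attribute instances so that each attribute
// is evaluated only after everything it reads. Returns an empty
// vector if the graph has a cycle (circular attribute definitions).
std::vector<int> evaluationOrder(int n,
                                 const std::vector<std::pair<int,int>>& deps) {
    std::vector<std::vector<int>> out(n);
    std::vector<int> indeg(n, 0);
    for (auto& d : deps) { out[d.first].push_back(d.second); ++indeg[d.second]; }
    std::queue<int> ready;                    // attributes with all inputs known
    for (int v = 0; v < n; ++v)
        if (indeg[v] == 0) ready.push(v);
    std::vector<int> order;
    while (!ready.empty()) {
        int v = ready.front(); ready.pop();
        order.push_back(v);                   // "evaluate" attribute v here
        for (int w : out[v])
            if (--indeg[w] == 0) ready.push(w);
    }
    if ((int)order.size() != n) order.clear(); // cycle: no valid order exists
    return order;
}
```

The cycle case is exactly why circular attribute grammars cannot be evaluated by this method.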

CS606 - Compiler Construction - Lecture Handout 33

Implementing Ad-Hoc Scheme

The parser needs a mechanism to pass values of attributes from definitions in one snippet to uses in another. We will adopt the notation used by YACC for snippets and for passing values. Recall that the skeleton LR(1) parser stored two values on the stack: 〈symbol, state〉.

CS606 - Compiler Construction - Lecture Handout 34

Let’s go through an example of using YACC to implement the ad-hoc scheme for an arithmetic calculator.

The YACC file for a calculator grammar is as follows:

%token NUMBER LPAREN RPAREN
%token PLUS MINUS TIMES DIVIDE
%%

CS606 - Compiler Construction - Lecture Handout 35

IR Taxonomy

IRs fall into three organizational categories:

  1. Graphical IRs encode the compiler’s knowledge in a graph.
  2. Linear IRs resemble pseudo-code for some abstract machine.
  3. Hybrid IRs combine elements of both graphical (structural) and linear IRs.

CS606 - Compiler Construction - Lecture Handout 36

Three-address code is attractive for several reasons. The absence of destructive operators gives the compiler freedom to reuse names and values. Three-address code is reasonably compact: operations are 1 to 2 bytes, and addresses are 4 bytes. Many modern processors implement three-address operations, so three-address code models their properties well.
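A minimal quadruple representation makes the point concrete; the Quad and Emitter names are illustrative assumptions, not the course's actual IR classes.

```cpp
#include <string>
#include <vector>

// One three-address instruction as a quadruple: result = arg1 op arg2.
struct Quad {
    std::string op, arg1, arg2, result;
};

// Emits quads, handing out a fresh temporary name per instruction --
// never reusing a name is what "no destructive operators" buys us.
struct Emitter {
    int next = 1;                              // counter for fresh temporaries
    std::vector<Quad> code;
    std::string gen(const std::string& op, const std::string& a1,
                    const std::string& a2) {
        std::string t = "t" + std::to_string(next++);
        code.push_back({op, a1, a2, t});
        return t;                              // caller threads t into later quads
    }
};

// Hand-translate b*c + b*c as a small demonstration.
std::vector<Quad> demo() {
    Emitter e;
    std::string t1 = e.gen("*", "b", "c");
    std::string t2 = e.gen("*", "b", "c");
    e.gen("+", t1, t2);
    return e.code;
}
```

Because t1 and t2 are never overwritten, a later optimization pass is free to notice they hold the same value and reuse one of them.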

CS606 - Compiler Construction - Lecture Handout 37

Here is the bottom-up parse of the assignment statement a = b*-c + b*-c and its syntax-directed translation into three-address code.

Parser action                 Attribute          Three-address code
id = id ∗ –id + id ∗ –id
id = E1 ∗ –id + id ∗ –id      E1.place = b
id = E1 ∗ –E2 + id ∗ –id      E2.place = c
id = E1 ∗ E2 + id ∗ –id       E2.place = t1      t1 = –c
id = E1 + id ∗ –id            E1.place = t2      t2 = b ∗ t1
id = E1 + E2 ∗ –id            E2.place = b
id = E1 + E2 ∗ –E3            E3.place = c
id = E1 + E2 ∗ E3             E3.place = t3      t3 = –c
id = E1 + E2                  E2.place = t4      t4 = b ∗ t3
id = E1                       E1.place = t5      t5 = t2 + t4
S                                                a = t5

CS606 - Compiler Construction - Lecture Handout 38

Three-Address Statement Types

Before proceeding with flow-of-control constructs, here are the types of three-address statements that we will use:

CS606 - Compiler Construction - Lecture Handout 39

Boolean Expressions

In programming languages, boolean expressions have two primary purposes:

  • computing logical values, e.g., x = a < b && d > e
  • serving as conditional expressions in flow-of-control statements

Consider the grammar for Boolean expressions

E → E or E
| E and E
| not E
| ( E )
| id relop id
| true
| false

CS606 - Compiler Construction - Lecture Handout 40

Example: consider the boolean expression

a < b or c < d and e < f

Recall the syntax directed translation for the production

E → id1 relop id2
{
    E.truelist = makelist(nextquad());
    E.falselist = makelist(nextquad() + 1);
    emit('if' id1 relop id2 'goto _');
    emit('goto _');

CS606 - Compiler Construction - Lecture Handout 41

Flow-of-Control Statements

We now use backpatching to translate flow-of-control statements in one pass. We will use the same list-handling procedures as before.

S → if E then S
| if E then S else S
| while E do S
| begin L end
| A
L → L ; S

CS606 - Compiler Construction - Lecture Handout 42

Code Generation

The code generation problem is the task of mapping intermediate code to machine code. The generated code must be correct for obvious reasons, and it should be efficient both in terms of memory space and execution time.

The input to the code generation module of the compiler is intermediate code (optimized or not), and its task is typically to produce either machine code or assembly language code for a target machine.

CS606 - Compiler Construction - Lecture Handout 43

Control Flow Graph - CFG

A control flow graph is a triple CFG = < V, E, Entry >, where V is a set of vertices (nodes), each representing an instruction or a basic block (a group of statements); E ⊆ V × V is a set of edges, representing potential flow of control; and Entry ∈ V is the unique program entry.
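The triple maps directly onto a small data structure. A sketch, with an added reachability helper (an illustrative assumption, handy for spotting dead blocks):

```cpp
#include <set>
#include <utility>
#include <vector>

// Direct encoding of CFG = <V, E, Entry>: nodes are basic-block ids,
// succ[v] lists the control-flow successors of block v.
struct ControlFlowGraph {
    int entry = 0;                             // the unique Entry node
    std::vector<std::set<int>> succ;
    explicit ControlFlowGraph(int numBlocks) : succ(numBlocks) {}
    void addEdge(int from, int to) { succ[from].insert(to); }
    // Blocks reachable from Entry (simple DFS); anything outside this
    // set is unreachable, i.e., dead code.
    std::set<int> reachable() const {
        std::set<int> seen;
        std::vector<int> work{entry};
        while (!work.empty()) {
            int v = work.back(); work.pop_back();
            if (!seen.insert(v).second) continue;   // already visited
            for (int w : succ[v]) work.push_back(w);
        }
        return seen;
    }
};

// Convenience wrapper so the sketch is easy to exercise.
std::set<int> reachableFromEntry(int numBlocks,
                                 const std::vector<std::pair<int,int>>& edges) {
    ControlFlowGraph g(numBlocks);
    for (auto& e : edges) g.addEdge(e.first, e.second);
    return g.reachable();
}
```

With four blocks and edges 0→1→2, block 3 has no path from Entry and would be eliminated.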

CS606 - Compiler Construction - Lecture Handout 44

Example: let us apply the algorithm to the following segment of three-address code:

a = b + c
t1 = a * a
b = t1 + a
c = t1 * b
t2 = c + b
a = t2 + t2

CS606 - Compiler Construction - Lecture Handout 45

Liveness information allows us to keep values in registers if they will be used later. An obvious concern: why do we assume all variables are live at the end of a block, and why do we need to save live variables at the end? It seems reasonable that we might have to reload them in the next block. To do better, we need to determine live/next-use information across blocks and not just within a block. This requires global data-flow analysis.
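Within a single block, live-on-entry sets fall out of one backward scan, starting from a (possibly conservative) live-out set assumed at the block boundary; the quad encoding below is an illustrative assumption.

```cpp
#include <set>
#include <string>
#include <vector>

// One three-address instruction: result = arg1 op arg2 (the operator
// itself does not matter for liveness, so it is omitted).
struct Instr { std::string result, arg1, arg2; };

// Backward scan of a basic block: a variable is live at a point if
// some later instruction reads it before it is redefined. `live` is
// the set assumed live at the end of the block. Returns, for each
// instruction, the set of variables live just before it executes.
std::vector<std::set<std::string>> livenessIn(
        const std::vector<Instr>& block, std::set<std::string> live) {
    std::vector<std::set<std::string>> in(block.size());
    for (int i = (int)block.size() - 1; i >= 0; --i) {
        live.erase(block[i].result);                 // killed by the definition
        if (!block[i].arg1.empty()) live.insert(block[i].arg1);  // used here
        if (!block[i].arg2.empty()) live.insert(block[i].arg2);
        in[i] = live;
    }
    return in;
}
```

For the two instructions a = b + c; t1 = a * a with nothing live at the end, the scan reports {a} live before the second instruction and {b, c} live before the first: a is dead on entry because the first instruction redefines it.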
