Related Content: CS606 - VU Lectures, Handouts, PPT Slides, Assignments, Quizzes, Papers & Books of Compiler Construction
The course is organized around theory and a significant amount of practice. The practice will be in the form of homework and a project. The project is the highlight of the course: you will build a full compiler for a subset of a Java-like language. The implementation will be in C++ and you will generate Intel x86 assembly language code. The project will be done in six parts; each will be a programming assignment.
Read more: CS606 - Compiler Construction - Lecture Handout 01
A parse can be represented by a tree, called a parse tree or syntax tree. For example, here is the parse tree for the expression x+2-y.
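Such a tree can be represented with a small node structure. The following C++ sketch (the `Node` type and `toInfix` helper are illustrative names, not taken from the handout) places operands at the leaves and operators at the interior nodes:

```cpp
#include <cassert>
#include <string>

// A minimal syntax-tree node: operands at the leaves, operators at the
// interior nodes. Node and toInfix are illustrative names.
struct Node {
    std::string label;
    Node* left;
    Node* right;
    Node(const std::string& l, Node* lt = nullptr, Node* rt = nullptr)
        : label(l), left(lt), right(rt) {}
};

// An in-order walk recovers the expression text from the tree.
std::string toInfix(const Node* n) {
    if (!n) return "";
    if (!n->left && !n->right) return n->label;  // leaf: an operand
    return toInfix(n->left) + n->label + toInfix(n->right);
}
```

For x+2-y, the root holds -, its left child is the subtree for x+2, and its right child is the leaf y; an in-order walk then yields the original expression.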
Read more: CS606 - Compiler Construction - Lecture Handout 03
The CISC architecture provided a rich set of instructions and addressing modes, but it made the compiler's job harder when it came to generating efficient machine code. The RISC architecture simplified this problem.
Registers in a CPU play an important role in providing high-speed access to operands. Memory access is an order of magnitude slower than register access. The back end attempts to have each operand value in a register when it is used. However, the back end has to manage a limited set of resources when it comes to the register file. The number of registers is small, and some registers are pre-allocated for specialized uses, e.g., the program counter, and thus are not available to the back end. Optimal register allocation is NP-complete.
Read more: CS606 - Compiler Construction - Lecture Handout 04
The scanner is the first component of the front end of a compiler; the parser is the second.
Read more: CS606 - Compiler Construction - Lecture Handout 05
Regular languages are the most popular for specifying tokens because
Read more: CS606 - Compiler Construction - Lecture Handout 06
An FA can be encoded as a table; this is called a transition table. The following example shows an FA encoded as a table.
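As a concrete sketch, the table-driven idea can be coded in C++ as a two-dimensional array indexed by state and input symbol. The toy DFA below is an assumed example (not the handout's table) that recognizes the keyword "if" over the alphabet {i, f}:

```cpp
#include <cassert>

// A table-driven DFA: rows are states, columns index the alphabet.
// This toy machine recognizes the keyword "if"; -1 is a dead state.
const int DEAD = -1;
const int delta[3][2] = {
    /* state 0 */ {1,    DEAD},
    /* state 1 */ {DEAD, 2},
    /* state 2 */ {DEAD, DEAD},
};

// Map an input character to its column, or -1 if outside the alphabet.
int column(char c) { return c == 'i' ? 0 : c == 'f' ? 1 : -1; }

// Drive the DFA over the string; accept iff it halts in state 2.
bool accepts(const char* s) {
    int state = 0;
    for (; *s; ++s) {
        int col = column(*s);
        if (col < 0 || state == DEAD) return false;
        state = delta[state][col];
    }
    return state == 2;
}
```

The driver loop never changes; only the table differs from one FA to another, which is exactly what makes the encoding attractive for generated scanners.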
Read more: CS606 - Compiler Construction - Lecture Handout 07
The algorithm is called subset construction. In the transition table of an NFA, each entry is a set of states; in a DFA, each entry is a single state. The general idea behind the NFA-to-DFA construction is that each DFA state corresponds to a set of NFA states. The DFA uses its state to keep track of all possible states the NFA can be in after reading each input symbol.
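The ε-closure and move steps at the heart of subset construction can be sketched in C++ as follows (the NFA encoding, with `'\0'` reserved for ε-moves, and the function names are assumptions for illustration):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// A toy NFA encoding: nfa[state][symbol] -> set of successor states,
// with the symbol '\0' reserved for ε-moves.
using StateSet = std::set<int>;
std::map<int, std::map<char, StateSet>> nfa;

// All states reachable from s via ε-moves alone.
StateSet epsilonClosure(StateSet s) {
    std::vector<int> work(s.begin(), s.end());
    while (!work.empty()) {
        int q = work.back(); work.pop_back();
        for (int t : nfa[q]['\0'])
            if (s.insert(t).second) work.push_back(t);
    }
    return s;
}

// One step of subset construction: the DFA state reached from the
// set s of NFA states on input symbol a.
StateSet dfaMove(const StateSet& s, char a) {
    StateSet moved;
    for (int q : s)
        for (int t : nfa[q][a]) moved.insert(t);
    return epsilonClosure(moved);
}
```

For an NFA with ε-moves 0 → 1 and 0 → 2, the DFA start state is epsilonClosure({0}) = {0, 1, 2}, matching the idea that a single DFA state tracks every state the NFA could be in.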
Read more: CS606 - Compiler Construction - Lecture Handout 08
The generated DFA may have a large number of states. Hopcroft's algorithm can be used to minimize the DFA. The idea behind the algorithm is to find groups of equivalent states: all transitions from states in one group G1 go to states in the same group G2. Construct the minimized DFA such that there is one state for each group of states from the initial DFA. Here is the minimized version of the DFA created earlier; states A and C have been merged.
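The splitting criterion used by the algorithm can be sketched in C++: a group must be split if, on some symbol, its members transition into different groups. The encoding below (`delta`, `groupOf`) is an assumption for illustration, and this shows a single refinement step, not the full algorithm:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// One refinement step of DFA minimization. delta[state][symbol] gives the
// successor state; groupOf gives each state's current group id.
std::vector<std::set<int>> splitGroup(const std::set<int>& group, char a,
                                      std::map<int, std::map<char, int>>& delta,
                                      std::map<int, int>& groupOf) {
    std::map<int, std::set<int>> byTarget;  // target group id -> member states
    for (int s : group)
        byTarget[groupOf[delta[s][a]]].insert(s);
    std::vector<std::set<int>> parts;
    for (const auto& kv : byTarget)
        parts.push_back(kv.second);
    return parts;  // more than one part means the group must be split
}
```

Repeating this step over all groups and symbols until nothing splits yields the equivalence classes; the minimized DFA then has one state per class.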
Read more: CS606 - Compiler Construction - Lecture Handout 09
Here is the output of the scanner when executed and given the file main.cpp as input, i.e., the scanner is being asked to provide tokens found in the file main.cpp:
Read more: CS606 - Compiler Construction - Lecture Handout 10
Consider the following C++ function. There are a number of syntax errors present.
Read more: CS606 - Compiler Construction - Lecture Handout 11
The derivations can be represented in a tree-like fashion. The interior nodes contain the non-terminals used during the derivation.
Read more: CS606 - Compiler Construction - Lecture Handout 12
A top-down parser starts with the root of the parse tree. The root node is labeled with the goal (start) symbol of the grammar. The top-down parsing algorithm proceeds as follows:
Read more: CS606 - Compiler Construction - Lecture Handout 13
If A → α and A → β both appear in the grammar, we would like FIRST(α) ∩ FIRST(β) = ∅.
Predictive parsers accept LL(k) grammars. The two L's stand for Left-to-right scan of the input and Leftmost derivation. The k stands for the number of look-ahead tokens of input. The LL(1) property allows the parser to make a correct choice with a look-ahead of exactly one symbol! What about ε-productions? They complicate the definition of LL(1). If A → α and A → β and ε ∈ FIRST(α), then we need to ensure that FIRST(β) is disjoint from FOLLOW(A), too.
Read more: CS606 - Compiler Construction - Lecture Handout 14
Let’s consider the implementation of the C++ classes for the non-terminals. We start with Expr.
bool Expr::isPresent()
{
    Term* op1 = new Term(s);
    if (!op1->isPresent())
        return false;
Read more: CS606 - Compiler Construction - Lecture Handout 15
Here is the output trace for the expression 2+4*6:
>> Expr::isPresent()
>> Term::isPresent()
>> Factor::isPresent()
token: 2 (257)
<< Factor::isPresent() return true
>> Tprime::isPresent()
token: + (267)
Read more: CS606 - Compiler Construction - Lecture Handout 16
Note that the productions output trace out a leftmost derivation. The grammar symbols on the stack make up the left-sentential forms.
Top-down parsing expands a parse tree from the start symbol to the leaves. It always expands the leftmost non-terminal. Consider the state
Read more: CS606 - Compiler Construction - Lecture Handout 17
Here is the algorithm for computing the FIRST sets.
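The handout's algorithm itself is not reproduced in this excerpt. As a sketch, the standard fixed-point computation of FIRST sets can be written in C++ as follows; the grammar encoding (uppercase non-terminals, lowercase terminals, `""` as an ε right-hand side, `EPS` marking ε inside a set) is an assumption for illustration:

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// Fixed-point computation of FIRST sets for a toy grammar encoding.
const char EPS = '#';                              // stands for ε
std::map<char, std::vector<std::string>> prods;    // A -> its right-hand sides
std::map<char, std::set<char>> FIRST;

void computeFirst() {
    bool changed = true;
    while (changed) {                              // iterate until no set grows
        changed = false;
        for (auto& [A, rhss] : prods) {
            for (const std::string& rhs : rhss) {
                size_t i = 0;
                bool allNullable = true;
                for (; i < rhs.size(); ++i) {
                    char X = rhs[i];
                    if (std::islower(X)) {         // terminal: add it and stop
                        changed |= FIRST[A].insert(X).second;
                        allNullable = false;
                        break;
                    }
                    std::set<char> fx = FIRST[X];  // copy: FIRST[A] may alias FIRST[X]
                    bool xNullable = false;
                    for (char t : fx) {            // add FIRST(X) \ {ε}
                        if (t == EPS) { xNullable = true; continue; }
                        changed |= FIRST[A].insert(t).second;
                    }
                    if (!xNullable) { allNullable = false; break; }
                }
                if (allNullable && i == rhs.size())  // whole rhs can derive ε
                    changed |= FIRST[A].insert(EPS).second;
            }
        }
    }
}
```

For the grammar S → AB, A → a | ε, B → b, the loop converges to FIRST(A) = {a, ε}, FIRST(B) = {b}, FIRST(S) = {a, b}.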
Read more: CS606 - Compiler Construction - Lecture Handout 18
Here now is the algorithm to construct a predictive parsing table.
Read more: CS606 - Compiler Construction - Lecture Handout 19
Bottom-up parsing is more general than top-down parsing. Bottom-up parsers handle a larger class of grammars, and bottom-up parsing is the preferred method in practice. It is also called LR parsing: L means that tokens are read left to right and R means that the parser constructs a rightmost derivation. LR parsers do not need left-factored grammars and can handle left-recursive grammars.
Read more: CS606 - Compiler Construction - Lecture Handout 20
A stack can be used to hold the contents of the left string. The top of the stack is marked by the ► symbol. The shift action pushes a terminal on the stack. The reduce action pops zero or more symbols from the stack (the production's right-hand side) and pushes a non-terminal on the stack (the production's left-hand side).
Read more: CS606 - Compiler Construction - Lecture Handout 21
Example: here is the bottom-up parser’s action when it parses the expression grammar sentence
x – 2 × y
(tokenized as id – num * id)
Read more: CS606 - Compiler Construction - Lecture Handout 22
The number of potential handles for the grammar is simply the sum of the lengths of the right-hand sides of all the productions. The number of complete handles is simply the number of productions. These two facts lead to the critical insight behind LR parsers:
A given grammar generates a finite set of handles (and potential handles) that the parser must recognize
However, it is not a simple matter of putting the placeholder in the right-hand side to generate handles. The parser needs to recognize the correct handle in different right contexts. Consider the parser's action at step 9.
Read more: CS606 - Compiler Construction - Lecture Handout 23
An LR(1) item is a pair [X → α•β, a] where X → αβ is a production and a ∈ T (the terminals) is a look-ahead symbol. The model uses a set of LR(1) items to represent each parser state. The model is called the canonical collection (CC) of sets of LR(1) items.
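A direct C++ rendering of an LR(1) item might look like the following sketch (the type and field names are illustrative, not from the handout):

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <tuple>

// An LR(1) item [X → α•β, a]: a production, a dot position, and one
// look-ahead terminal.
struct LR1Item {
    char lhs;          // X
    std::string rhs;   // αβ, with the dot's position stored separately
    size_t dot;        // index of • within rhs
    char lookahead;    // a
    bool operator<(const LR1Item& o) const {   // needed to store items in a set
        return std::tie(lhs, rhs, dot, lookahead)
             < std::tie(o.lhs, o.rhs, o.dot, o.lookahead);
    }
};

// The symbol immediately after the dot, or '\0' if the item is complete.
char nextSymbol(const LR1Item& it) {
    return it.dot < it.rhs.size() ? it.rhs[it.dot] : '\0';
}

// A parser state is a set of LR(1) items — the canonical-collection view.
using State = std::set<LR1Item>;
```

The `nextSymbol` helper is what the closure and goto computations repeatedly ask: closure expands items whose dot precedes a non-terminal, and a complete item (dot at the end) signals a possible reduction.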
Read more: CS606 - Compiler Construction - Lecture Handout 24
The set s changed, so the repeat loop is executed again. This time, however, the item [E → •int, $/+] does not yield any more items because the dot is followed by the terminal int. The first set of items is
I0 = {
[S → •E, $],
[E → •E+(E), $/+],
[E → •int, $/+]
}
Read more: CS606 - Compiler Construction - Lecture Handout 25
The initial DFA state I0 we computed is the ε-closure of the set consisting of item
S → •E
Recall the stage in the closure
s = { [S → •E, $] , [E → •E + (E), $] , [E → •int, $] }
The NFA states and transitions required are
Read more: CS606 - Compiler Construction - Lecture Handout 26
stack.push(dummy); stack.push(0);
done = false; token = scanner.next();
while (!done) {
    s = stack.top();
    if (Action[s, token] == "reduce A→β") {
        stack.pop(2 × |β|);    // pop |β| 〈symbol, state〉 pairs
        s = stack.top();
        stack.push(A);
        stack.push(Goto[s, A]);
    }
Read more: CS606 - Compiler Construction - Lecture Handout 28
Consider the ambiguous grammar
E → E + E | E × E | int
We will have a DFA state containing
[E → E × E•, +]
[E → E • + E, +]
Again we have a shift/reduce conflict. We need to reduce because × has precedence over +.
Read more: CS606 - Compiler Construction - Lecture Handout 29
Parser generators exist for LL(1) and LALR(1) grammars. For example,
Read more: CS606 - Compiler Construction - Lecture Handout 30
These questions are part of context-sensitive analysis. Answers depend on values, not parts of speech. Answers may involve computation.
These questions can be answered by using formal methods such as context-sensitive grammars and attribute grammars or by using ad-hoc techniques.
One of the most popular is the use of attribute grammars.
Read more: CS606 - Compiler Construction - Lecture Handout 31
A number of ways can be used to evaluate the attributes. In the dynamic method, the compiler builds the parse tree and then builds the dependence graph. A topological sort of the graph is carried out and attributes are evaluated or defined in topological order. In the rule-based (or treewalk) method, the attribute rules are analyzed at compiler-generation time. A fixed (static) ordering is determined, and the nodes in the dependency graph are evaluated in this order. In the oblivious (passes, dataflow) method, the attribute rules and parse tree are ignored; a convenient order is picked at compiler-design time and used.
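The topological sort underlying the dynamic method can be sketched with Kahn's algorithm in C++ (the graph encoding, with attributes numbered 0..n-1, is an assumption for illustration):

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// Topological ordering of an attribute dependence graph via Kahn's algorithm.
// An edge u -> v means attribute v depends on attribute u, so u must be
// evaluated first.
std::vector<int> topoOrder(int n, const std::vector<std::pair<int, int>>& edges) {
    std::vector<std::vector<int>> adj(n);
    std::vector<int> indeg(n, 0);
    for (const auto& [u, v] : edges) { adj[u].push_back(v); ++indeg[v]; }
    std::queue<int> ready;                 // attributes with no pending inputs
    for (int i = 0; i < n; ++i)
        if (indeg[i] == 0) ready.push(i);
    std::vector<int> order;
    while (!ready.empty()) {
        int u = ready.front(); ready.pop();
        order.push_back(u);
        for (int v : adj[u])
            if (--indeg[v] == 0) ready.push(v);
    }
    return order;  // shorter than n iff the dependences are circular
}
```

Evaluating attributes in the returned order guarantees every attribute's inputs are available before its rule fires; a result shorter than n signals a circular attribute grammar.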
Read more: CS606 - Compiler Construction - Lecture Handout 32
The parser needs a mechanism to pass values of attributes from definitions in one snippet to uses in another. We will adopt the notation used by YACC for snippets and for passing values. Recall that the skeleton LR(1) parser stored two values on the stack: 〈symbol, state〉.
Read more: CS606 - Compiler Construction - Lecture Handout 33
Let’s go through an example of using YACC to implement the ad-hoc scheme for an arithmetic calculator.
The YACC file for a calculator grammar is as follows:
%token NUMBER LPAREN RPAREN
%token PLUS MINUS TIMES DIVIDE
%%
Read more: CS606 - Compiler Construction - Lecture Handout 34
IRs fall into three organizational categories:
Read more: CS606 - Compiler Construction - Lecture Handout 35
Three-address code is attractive for several reasons. The absence of destructive operators gives the compiler freedom to reuse names and values. Three-address code is reasonably compact: operations are 1 to 2 bytes; addresses are 4 bytes. Many modern processors implement three-address operations; a three-address code models their properties well.
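One common concrete representation of three-address code is the quadruple (op, arg1, arg2, result). The C++ sketch below uses illustrative names, not a layout prescribed by the handout:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Quadruples: one common in-memory form of three-address code.
struct Quad {
    std::string op, arg1, arg2, result;
};

// Render a quadruple back as a readable three-address statement.
std::string render(const Quad& q) {
    return q.result + " = " + q.arg1 + " " + q.op + " " + q.arg2;
}

// The sequence  t1 = b * c;  a = t1 + d  as a quadruple list:
std::vector<Quad> code = {
    {"*", "b",  "c", "t1"},
    {"+", "t1", "d", "a"},
};
```

Because every result has its own explicit name (often a fresh temporary), later passes can rearrange or eliminate quadruples without worrying about a destructive operand being overwritten.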
Read more: CS606 - Compiler Construction - Lecture Handout 36
Here is the bottom-up parse of the assignment statement a = b*-c + b*-c and the syntax-directed translation into three-address code.
Parser action            | Attribute     | Three-address code
id = id ∗ –id + id ∗ –id |               |
id = E1 ∗ –id + id ∗ –id | E1.place = b  |
id = E1 ∗ –E2 + id ∗ –id | E2.place = c  |
id = E1 ∗ E2 + id ∗ –id  | E2.place = t1 | t1 = –c
id = E1 + id ∗ –id       | E1.place = t2 | t2 = b ∗ t1
id = E1 + E2 ∗ –id       | E2.place = b  |
id = E1 + E2 ∗ –E3       | E3.place = c  |
id = E1 + E2 ∗ E3        | E3.place = t3 | t3 = –c
id = E1 + E2             | E2.place = t4 | t4 = b ∗ t3
id = E1                  | E1.place = t5 | t5 = t2 + t4
S                        |               | a = t5
Read more: CS606 - Compiler Construction - Lecture Handout 37
Before proceeding with flow-of-control constructs, here are the types of three-address statements that we will use:
Read more: CS606 - Compiler Construction - Lecture Handout 38
In programming languages, boolean expressions have two primary purposes:
Consider the grammar for Boolean expressions
E → E or E
| E and E
| not E
| ( E )
| id relop id
| true
| false
Read more: CS606 - Compiler Construction - Lecture Handout 39
Example: consider the boolean expression
a < b or c < d and e < f
Recall the syntax-directed translation for the production
E → id1 relop id2
{
    E.truelist = makelist(nextquad());
    E.falselist = makelist(nextquad() + 1);
    emit('if' id1 relop id2 'goto _');
    emit('goto _');
}
Read more: CS606 - Compiler Construction - Lecture Handout 40
We now use backpatching to translate flow-of-control statements in one pass. We will use the same list-handling procedures as before.
S → if E then S
  | if E then S else S
  | while E do S
  | begin L end
  | A
L → L ; S
Read more: CS606 - Compiler Construction - Lecture Handout 41
The code generation problem is the task of mapping intermediate code to machine code.
The generated code must be correct for obvious reasons. It should be efficient both in
terms of memory space and execution time.
The input to the code generation module of compiler is intermediate code (optimized or not) and its task is typically to produce either machine code or assembly language code for a target machine.
Read more: CS606 - Compiler Construction - Lecture Handout 42
A control flow graph is the triplet CFG = <V, E, Entry>, where V is a set of vertices (nodes), each representing an instruction or basic block (group of statements); E ⊆ V × V is a set of edges representing potential flow of control; and Entry is an element of V, the unique program entry.
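A direct C++ rendering of this triplet might look like the following sketch (the encoding, with basic blocks numbered 0..numBlocks-1, is an assumption for illustration):

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// A control flow graph as the triplet <V, E, Entry>: basic blocks are
// numbered 0..numBlocks-1, edges is E ⊆ V × V, and entry is the unique
// program entry node.
struct CFG {
    int numBlocks;
    std::set<std::pair<int, int>> edges;
    int entry;

    // Blocks that control may flow to directly from block v.
    std::vector<int> successors(int v) const {
        std::vector<int> out;
        for (const auto& [a, b] : edges)
            if (a == v) out.push_back(b);
        return out;
    }
};
```

A `successors` (and the symmetric predecessors) query is the basic operation that global data-flow analyses repeat as they propagate facts along the edges.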
Read more: CS606 - Compiler Construction - Lecture Handout 43
Example: let us apply the algorithm to the following segment of 3-address code:
a = b + c
t1 = a * a
b = t1 + a
c = t1 * b
t2 = c + b
a = t2 + t2
Read more: CS606 - Compiler Construction - Lecture Handout 44
Liveness information allows us to keep values in registers if they will be used later. An obvious concern is why we assume all variables are live at the end of blocks. Why do we need to save live variables at the end? It seems reasonable to assume that we might have to reload them in the next block. To do this, we need to determine live/next-use information across blocks, not just within a block. This requires global data-flow analysis.
Read more: CS606 - Compiler Construction - Lecture Handout 45