**Related Content:** CS606 - VU Lectures, Handouts, PPT Slides, Assignments, Quizzes, Papers & Books of Compiler Construction

A top-down parser starts with the root of the parse tree. The root node is labeled with the goal (start) symbol of the grammar. The top-down parsing algorithm proceeds as follows:

- Construct the root node of the parse tree
- Repeat until the fringe of the parse tree matches input string
- At a node labeled A, select a production with A on its lhs
- for each symbol on its rhs, construct the appropriate child
- When a terminal symbol is added to the fringe and it does not match the fringe, backtrack

The key is picking right production in step a. That choice should be guided by the input string. Let’s try parsing using this algorithm using the expression grammar.

x – 2 * y

This worked well except that “–” does not match “+”. The parser made the wrong choice of production to use at step 2. The parser must backtrack and use a different production.

This time the “–” and “–” matched. We can advance past “–” to look at “2”. Now, we need to expand “term”

The 2’s match but the expansion terminated too soon because there is still unconsumed input and there are no non-terminals to expand in the sentential form Þ Need to backtrack.

This time the parser met with success. All of the input matched.

Consider another possible parse:

Parser is using productions but no input is being consumed.

Top-down parsers cannot handle left-recursive grammars. Formally, a grammar is left recursive if $ A Î NT such that $ a derivation A Þ* A a, for some string a Î (NT È T)*.

Our expression grammar is left recursive. This can lead to non-termination in a top-down parser. Non-termination is bad in any part of a compiler! For a top-down parser, any recursion must be a right recursion. We would like to convert left recur sion to right To remove left recursion, we can transform the grammar. Consider a grammar fragment:

| b

where neither a nor b starts with A. We can rewrite this as:

A' ? a A'

| e

where A' is a new non-terminal. This grammar accepts the same language but uses only
right recursion. The expressio n grammar we have been using contains two cases of leftrecursion.

Applying the transformation yields

expr ? term expr'

expr' ? + term expr'

| – term expr'

term ? factor term'

term' ? * factor term'

| / factor term'

| e

These fragments use only right recursion. They retain the original left associativity. A top-down parser will terminate using them.

If a top down parser picks the wrong production, it may need to backtrack.

Alternative is to look-ahead in input and use context to pick the production to use
correctly. How much look-ahead is needed? In general, an arbitrarily large amount of
look-ahead symbols are required.. Fortunately, large classes of CFGs can be parsed with
limited lookahead. Most programming languages constructs fall in those subclasses

The basic idea in predictive parsing is: given A? a | b, the parser should be able to choose between a and b. To accomplish this, the parser needs FIRST and FOLLOW sets.

**Definition: FIRST sets: **for some rhs a Î G, define FIRST(a) as the set of tokens that
appear as the first symbol in some string that derives from a. That is, x Î FIRST(a) iff
a Þ* x g, for some g.