YACC – Automatic Parser
Generator
- YACC is a automatic tool that generates the parser program
- YACC stands for Yet Another Compiler Compiler. This program is available in UNIX OS
- The construction of LR parser requires lot of work for parsing the input string. Hence, the process must involve automation to achieve efficiency in parsing an input
- Basically YACC is a LALR parser generator that reports conflicts or uncertainties (if at all present) in the form of error messages
- The typical YACC translator can be represented as shown in the image
YACC Specification
Ambiguous Grammars in YACC
Error Recovery in YACC
YACC Specification:
The YACC specification file consists of three parts
Declaration section: In this section, ordinary C declarations
are inserted and grammar tokens are declared. The tokens should be declared
between %{ and %}
Translation rule section
It includes
the production rules of context free grammar with corresponding actions
Example:
Rule-1 action-1
Rule-2 action-2
:
:
Rule n action n
If there is more than one alternative to a single rule then
those alternatives are separated by ‘|’ (pipe) character. The actions are
typical C statements. If CFG is
LHS: alternative 1
| alternative 2 | …… alternative n
Then
LHS: alternative
1 {action 1}
|
alternative 2 {action 1}
:
:
alternative
n {action n}
C functions Section: this consists of one main function in
which the routine yyparse() is called. And it also contains required C
functions
The specification file comprising these sections can be
written as:
%{
/ * declaration
section */
%}
/* Translation rule section */
%%
/* Required C functions */
Example:
YACC Specification of a simple desk calculator:
%{
#include <ctype.h>
%}
%token DIGIT
%%
line: expr ‘\n’ {
printf(“%d\n”, $1); }
;
expr : expr ‘+’ term {
$$ = $1 + $3; }
| term
;
term : term ‘*’ factor {
$$ = $1 * $3; }
|
factor
;
factor : ‘(‘ expr ‘)’ {
$$ = $2; }
| DIGIT
;
%%
yylex() {
int
c;
c
= getchar();
if(isdigit(c)
{
yylval
= c-‘0’;
return
DIGIT;
}
return c;
}
Ambiguous Grammars in
YACC
YACC declarations resolve shift/reduce and reduce/reduce
conflicts using operator precedence and operator associativity information
YACC has default methods for resolving conflicts. However,
it is better to find out what conflicts arose and how they were resolved using
the ‘-v’ option
The declarations provide a way to override YACCs defaults
Productions have the precedence of their rightmost terminal,
unless otherwise specified by “%prec” element
The declaration keywords %left, %right and %nonassoc inform
YACC that the following tokens are to be treated as left-associative (as binary
+ - * & / commonly are), right-associative (as exp often is), or
non-associative (as binay < & > often are)
The order of declarations informs YACC that the tokens
should be accorded increasing precedence
%left ‘+’ ‘-‘
|
Effect is that * has higher precedence
|
%left ‘*’ ‘I’
|
Than +, so x+y*z is grouped like x+(y*z)
|
YACC Specification for a more advanced desk calculator:
%{
#include <ctype.h>
#include <stdiio.h>
#define YYSTYPE double /*
double type for YACC stack */
%}
%token NUMBER
%left ‘+’ ‘-‘
%left ‘*’ ‘/’
%right UMINUS
%%
lines : lines expr ‘\n’ { printf(“%g\n”, $2); }
| lines
‘\n’
| /*
empty */
;
expr : expr ‘+’ expr { $$ = $1 + $3; }
| expr ‘-‘
expr { $$ = $1 - $3; }
| expr ‘*’
expr {$$ = $1 * $3; }
| expr ‘/’
expr {$$ = $1 / $3; }
| ‘(‘
expr ‘)’ {$$ = $2;}
| ‘-‘
expr %prec UMINUS { $$ = -$2; }
|
NUMBER
;
%
yylex() {
int c;
while
(( c = getchar() ) == ‘ ‘ );
if ((c
== ‘.’) || (isdigit(c) ) {
unget(c,
stdin);
scanf(“%1f”,
&uulval);
return
Number;
}
return c;
}
Error Recovery in
YACC
In YACC, error recovery is performed with the help of error
productions
To process errors at the level of a nonterminal A, add an
error production A →
error α. Here, error is
a YACC reserved word, and α
is a string of grammar symbols (may be empty)
If YACC encounters an error during parsing, it continues to
pop symbols from its stack until it finds the topmost state on its stack, whose
underlying set of items includes an item of the following form
A
→ .error α
The parser then shifts the false token error onto the stack
as though it saw the token error of the input
When α
is ε, reduction to A
occurs immediately and the semantic action associated with the production A → error (which might be a
user-specified error recovery routine) is invoked. The parser then discards the
input symbols until it finds an input symbol on which normal parsing can
proceed
If α
is not empty, YACC skips the input and looks for a substring that can be “reduced”
to α on the stack. Now,
the parser has an error α
on top of its stack, reducing it to A. The parser then resumes normal parsing
YACC specification contains a translation rule, for example
lines : error ‘\n’ { yyerror(“reenter previous line:”);
yyerrok; }
Desk Calculator with Error Recovery:
%{
#include <ctype.h>
#include <stdiio.h>
#define YYSTYPE double /*
double type for YACC stack */
%}
%token NUMBER
%left ‘+’ ‘-‘
%left ‘*’ ‘/’
%right UMINUS
%%
lines : lines expr ‘\n’ { printf(“%g\n”, $2); }
| lines
‘\n’
| /*
empty */
| error
‘\n’ { yyerror(“reenter previous line”);
Yyerrok;
;
expr : expr ‘+’ expr { $$ = $1 + $3; }
| expr ‘-‘
expr { $$ = $1 - $3; }
| expr ‘*’
expr {$$ = $1 * $3; }
| expr ‘/’
expr {$$ = $1 / $3; }
| ‘(‘
expr ‘)’ {$$ = $2;}
| ‘-‘
expr %prec UMINUS { $$ = -$2; }
|
NUMBER
;
%
#include “lex.yy.c”