Phases of a Compiler

Estudies4you

Phases of a Compiler

  • Phase is sequence of statements or steps, that takes source program in one representation and generates code in another representation
  • Compiler operates in phases, as it is difficult to generate the target code directly
  • A compiler is basically divided into two parts. They are
Analysis part
Synthesis part
  • The analysis part is often called front-end of a compiler and synthesis part is often called back-enc of a compiler
  • Analysis part breaks up the source program into constituent pieces and create an intermediate representation of the source code known as intermediate code
  • Analysis part is also collects information about the source program and stores it in a data structure called a symbol table
  • Analysis part includes 3 phases of a compiler, that is lexical analyzer, syntax analyzer and semantic analyzer.
  • Synthesis part constructs desired target code from an intermediate and the information in the symbol table
  • Synthesis part includes 3 phases of a compiler, that is intermediate code generation, code optimization and code generation
  • Intermediate code generation will act as interface between analysis and Synthesis parts.

Lexical Analyzer
  • It is a first phase of a compiler
  • Lexical Analyzer is also known as scanner
  • Lexical Analyzer reads the stream of characters from left to right and groups the characters into meaningful sequences called lexeme
  • Roles of a Lexical Analyzer
Produce the tokens
Ignore all white spaces in a source code
Ignore comments in a source code, if any
  • The general form of a token is <token-name, attribute-value> where token-name is an abstract symbol that is used during next phase(syntax analyzer) of a compiler and attribute-value points to an entry in the symbol table

Example:
result = number1 + number2 * 50
After lexical analyzer, the above statement would be
<id,1> <=> <id,2> <+> <id,3> <*> <number>
In this representation the token names=,+,* are abstract symbols for the assignment, addition, multiplication respectively
System.out.println("compiler design");
This statement would be
<system> <.> <out> <.> <println> <(> <string> <)> <;>
here .,(,),; are abstract symbols for dot, left parenthesis, right parenthesis, semicolon respectively

Syntax Analyzer
  • It is a second phase of compiler
  • It is also known as parser or hierarchical analysis
  • Parser collects the sufficient tokens which are generated by lexical analyzer as its input and generates parse tree as its output
  • Parser checks the source code i.e., whether it is syntactically correct or not. Here the meaning of a source code is not checked
  • The role of a syntax analyzer is to check the syntax errors in a source program

Example:

id1: = id2+id3*50
where id1, id2, id3, 50 are external nodes
and :=,+,* are internal nodes





Semantic Analyzer
  • Semantic Analyzer takes parse tree as its input and construct meaningful parse tree as its output
  • The semantic analyzer uses the parse tree and the information in the symbol table to check the source code for meaning
Roles of a Semantic Analyzer
  • Checks the meaning of a source program
  • Construct syntax tree
  • Construct DAG (Directed Acyclic Graph)
  • Saves the information in the symbol table
  • Type Checking
  • Type Conversion
  • Evaluation of an expression
  • Check variable is declared or not

Intermediate Code Generation
  • Intermediate is the interface between front-end and back-end of a compiler
  • Intermediate code generation takes a parse tree as its input and construct one or more intermediate representations, which can have a variety of forms
Roles of Intermediate Code Generation
  • Construct intermediate representation of the source program called intermediate code
  • Saves information in he symbol table and collects the required information from symbol table
  • Intermediate representation should have two important properties
It should be easy to produce
It should be easy to translate in to the target machine
Example:
t1:=int to float(50)
t2:=id3*t1
t3:=id2+t2
t3:=id2+t2
where t1,t2,t3 are temporary variables generated by a compiler automatically

There are certain properties, which should be possessed by the three address code and those are
  • Each three address instruction has at the most one operator in addition to the assignment.
  • The compiler must generate a temporary name to hold the value computed by each instruction.
  • Some Three address instructions may have fewer than three operands for example first and last instruction of above given three address code. i.e.,
t1:=int to float(10)
total:=t3

Code Optimization
  • It is optional phase of a compiler
  • The process of reducing the number of instructions without changing the meaning of source program is known as code optimization
  • Code Optimization takes intermediate code as its input and generates optimized intermediate codes as its output if possible
  • The role of a code optimization is to improve the intermediate code so that the running time of the target program can be significantly improved
Example:
t1:=id3*50.0
id1:=id2+t1
The optimization can deduce the conversion of 50 from integer to floating point can be done once and for all at compile time, so the int to float operation can be eliminated by the integer by the floating point number 50.0

Code Generation:
  • It is a final phase of a compiler
  • The code generator takes intermediate or optimized intermediate code and required information in symbol table as its input and maps it into the targe code
  • If the output of a code generator is machine code, registers or memory location are selected for each of the variables used by the program
  • Then, the intermediate instructions are translated into sequences of machine instructions that perform the same task
Example:


Symbol Table:
  • An essential function of compiler is to record the variable names used in the source program and collect information about various attributes of each name
  • Attributes may be name, its type, its size, its memory location, its scope and in the case of procedure names, such as the number and types of its arguments, the method of passing each argument
  • The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name
  • The data structure allows the compiler to find the record quickly and to store or receive data from that record
Example
Translation of result = number1 + number2 * 50
The phases of this statement are:
Each phase can encounter errors
The lexical analyzer can encounter lexical errors, syntax analyzer can encounter syntax errors and semantic analyzer can encounter semantic errors

Error Detection and Handling
  • In compilation, each phase detects errors, these errors must be reported to error handler whose task is to handle the errors so that the compilation can proceed
  • The errors are reported in the form of a message, when the input characters from the input do not form the token, the lexical analyzer detects it as error
  • Syntax analysis phase large number of errors can be detected. Such errors are popularly called syntax errors
  • Semantic analysis phase, type mismatch kind of error is usually detected

To Top