Lexical Analyzer in Compiler Design

Estudies4you

Lexical Analyzer


  • The task of the first phase of a compiler is to read the input characters of the source code and group them into sequences of characters with a collective meaning, known as tokens.
  • The lexical analyzer reads the source program and performs the following tasks:
      • Produces a stream of tokens
      • Ignores white space (blanks, newlines, tabs)
      • Ignores comments, if any
Definition of a token:
  • A sequence of characters with a logical meaning is known as a token
(or)
  • The smallest individual unit of a program is known as a token
Definition of pattern rule:
  • A pattern rule is a description of the form that the lexeme of a token may take
Definition of Lexeme:
  • A lexeme is a sequence of characters in the source program that matches the pattern for a token
(or)
  • A lexeme is the actual representation of a token in the source program
  • Each lexeme is categorized under a name, called its token
  • The general form of a token is <token-name, attribute-value>
  • Here token-name is an abstract symbol that is used during the next phase of the compiler (the syntax analyzer), and attribute-value points to an entry in the symbol table
Example:
DO 5 I = 1.12;
  • The output would be <DO> <number> <id, I> <assign_op> <number> <semicolon>
  • When the lexical analyzer recognizes a token as an identifier (id), it enters the lexeme into the symbol table along with its attributes
  • The lexical analyzer is also known as a scanner
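The example above can be traced with a small hand-written analyzer. This is only a minimal sketch: the function name tokenize, the keyword set, the list used as a symbol table, and the choice to store the number's lexeme as its attribute are all assumptions made for illustration, not part of the notes.

```python
# Hypothetical keyword set and token names, chosen to mirror the example above.
KEYWORDS = {"DO"}
SINGLE_CHAR_TOKENS = {"=": "assign_op", ";": "semicolon"}


def tokenize(source, symbol_table):
    """Group input characters into (token_name, attribute_value) pairs.

    Identifiers are entered into symbol_table (a plain list here); their
    attribute value is the index of the symbol-table entry.
    """
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch in " \t\n":                      # ignore white space
            i += 1
        elif ch.isalpha():                     # keyword or identifier
            start = i
            while i < len(source) and source[i].isalnum():
                i += 1
            lexeme = source[start:i]
            if lexeme in KEYWORDS:
                tokens.append((lexeme, None))
            else:
                if lexeme not in symbol_table:
                    symbol_table.append(lexeme)
                tokens.append(("id", symbol_table.index(lexeme)))
        elif ch.isdigit():                     # integer or decimal number
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            # Consume a fractional part only if a digit follows the dot.
            if i + 1 < len(source) and source[i] == "." and source[i + 1].isdigit():
                i += 1
                while i < len(source) and source[i].isdigit():
                    i += 1
            tokens.append(("number", source[start:i]))
        elif ch in SINGLE_CHAR_TOKENS:         # one-character tokens
            tokens.append((SINGLE_CHAR_TOKENS[ch], None))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens


symbols = []
result = tokenize("DO 5 I = 1.12;", symbols)
```

After this runs, result holds the six tokens in source order and symbols holds the single entry I, so the id token's attribute value 0 points at that entry.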
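Why the lexical analyzer is also called a scanner
  • Lexical analysis is sometimes divided into two stages: scanning and lexical analysis proper
  • Scanning covers the simple tasks that do not require tokenization of the input, such as deletion of comments and white space
  • Lexical analysis proper then produces tokens from the output of the scanner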
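The scanning stage described above can be sketched as a pre-pass over the raw text. The function name scan and the C-style /* ... */ comment syntax are assumptions for illustration; the notes do not fix a comment syntax, and this sketch does not protect comment-like text inside string literals.

```python
import re


def scan(source):
    """Pre-pass that needs no tokenization: drop comments, compact white space."""
    # Assumption: comments are written as /* ... */ and do not nest.
    without_comments = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    # Collapse every run of blanks, tabs, and newlines into one blank.
    return re.sub(r"\s+", " ", without_comments).strip()


cleaned = scan("a  =\t1; /* set a */\n b = 2;")
```

Here cleaned is the string "a = 1; b = 2;", which is what the later stage would turn into tokens.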
Why separate lexical analysis from parsing?
  • Simplicity of design
  • Compiler efficiency is improved
  • Compiler portability is enhanced
Specification of a token
  • Tokens can be specified using regular expressions
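As a rough illustration of specifying tokens by regular expressions, the sketch below pairs each token name with a pattern and combines them into one expression with named groups, using Python's standard re module. The token names, the patterns, and the helper names TOKEN_SPEC, MASTER, and tokens are assumptions for this example only.

```python
import re

# Hypothetical specification: one (token_name, regular_expression) pair per token.
TOKEN_SPEC = [
    ("number",    r"\d+(?:\.\d+)?"),         # integer or decimal literal
    ("id",        r"[A-Za-z][A-Za-z0-9]*"),  # a letter, then letters or digits
    ("assign_op", r"="),
    ("semicolon", r";"),
    ("skip",      r"\s+"),                   # white space, ignored
    ("mismatch",  r"."),                     # any other character is an error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokens(source):
    """Yield (token_name, lexeme) pairs for every match of the specification."""
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind == "skip":
            continue
        if kind == "mismatch":
            raise ValueError(f"unexpected character {match.group()!r}")
        yield kind, match.group()


pairs = list(tokens("count1 = 42;"))
```

The order of the entries matters because alternatives are tried left to right, which is why the catch-all mismatch pattern comes last.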

Identifier
An identifier is a sequence of alphanumeric characters whose first character must be a letter
Rules for valid identifiers
  • The name of an identifier must not begin with a digit or contain any special character. For example, 1index, $currency, and amount_count are invalid identifiers (under this definition the underscore counts as a special character), but index1 is a valid one
  • There must not be any space in the identifier name. For example, in the declaration int total amount, total amount is an invalid identifier
  • The name of an identifier must not be a keyword. For example, in the declaration int switch, switch is an invalid identifier
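The rules above can be checked mechanically. The sketch below follows the definition given here literally (letters and digits only, starting with a letter), so it rejects underscores even though many real languages allow them. The keyword set is a small illustrative subset of C keywords, and the names KEYWORDS, IDENTIFIER, and is_valid_identifier are assumptions for this example.

```python
import re

# Assumption: a small illustrative subset of C keywords, not a complete list.
KEYWORDS = {"int", "switch", "if", "else", "while", "return"}

# A letter followed by any number of letters or digits, per the definition above.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_valid_identifier(name):
    """Return True if name satisfies all three identifier rules listed above."""
    return IDENTIFIER.fullmatch(name) is not None and name not in KEYWORDS


checks = {name: is_valid_identifier(name)
          for name in ["index1", "1index", "$currency", "total amount", "switch"]}
```

Only index1 passes: 1index begins with a digit, $currency contains a special character, total amount contains a space, and switch is a keyword.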
