Generic Code Generation Algorithm
Assume that for
each operator in the statement, there is a corresponding target language
operator
The computed
results can be left in registers as long as possible, storing them only if the
register is needed for another computation or just before a procedure call,
jump or labeled statement
Register and Address Descriptors
These are the
primary data structures used by the code generator. They keep track of what
values are in each registers, as well as where a given value resides
- Each register has a register descriptor containing the list of
variables currently stored in this register. At the start of the basic
block, all register descriptors are empty. It keeps track of
recent/current variable in each register. It is constructed whenever a new
register is needed
- Each variable has an address descriptor containing the list of
locations where this variable is currently stored. Possibilities are its
memory location and one or more registers
- The memory location might be in the static area, the stack, r
presumably the heap. The register descriptors can be computed from the
address descriptors. For each name of the block, an address descriptors is
maintained that keep track of location where current value of name is
found at runtime
There are
basically three aspects to be considered in code generation:
Choosing
registers
Generating
instructions
Managing
descriptors
Register
allocation is done in a function getReg(Instruction). The instruction
generation algorithms uses getReg() and the descriptors
The generation
of machine instruction is done as follows:
Given a TAC, OP x, Y(i.e., x=x
OP y), generation of machine instructions proceeds as follows:
Step 1: Call
getReg(OP, x, y) to get Rx and Ry, the registers to be
used for x and y respectively. GetReg merely selects the registers, it does not
gurantee that the desired values are present in these registers.
Step 2: Check
the register descriptor for Ry. If y is not present in Ry, check the
address descriptor for y and issue LD Ry, y
Step 3: Similar
treatment is done for Rx
Step 4: Generate
the instruction OP Rx, Ry
When the TAC is
x = y, step 1 and step 2 are same (getReg() will set Rx =Ry). Step 3 is empty
and step 4 is omitted. If ‘y’ was already in a register before the copy
instruction, no code is generated at this point. Since the value of ‘y’ is not
in its memory location, we may need to store this value back into ‘y’ at block
exit.
All variables
needed by (dynamically) subsequent blocks (i.e., that live-on-exit) have their
current values in their memory locations. Such live variables are identified as
follows:
- Temporaries never live beyond a basic block. Hence, they are
ignored
- Variables dead on exit are also ignored
- All live on exit variables need to be stored in their memory
location on exit from the block. So, check the address descriptor for each
live on exit variable. If its own memory location is not listed, generate
ST X, R, where R is a register listed in the address descriptor
The management
of register and address descriptor is performed as follows:
- For a register R, let Desc(R) be is register descriptor. For a
program variable x, let Desc(x) be its address descriptor. The management
of descriptor for load, store, operation and copy are given below
Load: LD R, x
Desc(R) =
x(removing everything else from Desc(R))
Add R to Desc(x)
(leaving alone everything else in Desc(x))
Remove R from
Desc(w) for all w ≠ x
Store: ST x, R
Add the memory
location of x to Desc(x)
Operation: OP Rx, Ry implementing the quas OP x, y
Desc(Rx)
= x
Desc(x) = Rx
After operation
Rx has Rx OP Ry
Copy: for x = y after processing the load
Add x to Desc(Ry)
(note y not x)
Desc(x) = Ry
Minimize the
number of registers used:
- When a register holds a temporary value and there are no
subsequent uses for this value, we reuse that register
- When a register holds the value of a program variable and there
are no subsequent uses of this value, we reuse that register providing
this value also in the memory location for the variable
- When a register holds the values of a program variable and all
subsequent uses of this value are preceded by a redefinition, we could
reuse this register. But to know about all subsequent uses, one may
require live/dead-on-exit knowledge
Assume a, b, c
and d are program variables and t, u, v are compiler generated temporaries.
These are represented as t$1, t$2 and t$3. The code generated for different
TACs is given below:
t = a –b
LD R1, a
LD R2, b
SUB R1, R2
U = a – c
LD R3, a
LD R2, c
SUB R3, R2
v = t + u
ADD R1, R3
a = d
LD R2, d
ST a, R2
d = v + u
ADD R1, R3
ST d, R1
Exit