Learning Compiler -- Semantic Analysis, CodeGen and Optimization


This is a summary of the Stanford Compilers course on Coursera


Semantic Analysis

Semantic Analysis is used to understand the meaning of code; this is the hardest part for me.

  1. Analyse Scope

    There are two kinds of scope, static scope and dynamic scope; most languages have static scope

    In the example language Cool, there are six kinds of declarations that introduce names:

    • Class declarations (introduce class names)
    • Method definitions (introduce method names)
    • Let expressions (introduce object id’s)
    • Formal parameters (introduce object id’s)
    • Attribute definitions (introduce object id’s)
    • Case expressions (introduce object id’s)

    Also, method and class names have more complex scope rules.

  2. Symbol Table

    A data structure that tracks the current bindings of identifiers; a stack of scopes is a natural representation. We first build an abstract syntax tree, then use the symbol table to check the tree and bind identifiers to their declarations; if a binding is missing or inconsistent, we report an error.
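As a concrete sketch, a scoped symbol table can be modeled as a stack of dictionaries; the class and method names below are illustrative choices, not names from the course:

```python
# A minimal scoped symbol table: a stack of dictionaries, with the
# innermost scope at the top of the stack.

class SymbolTable:
    def __init__(self):
        self.scopes = [{}]           # innermost scope is the last element

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def add(self, name, info):
        self.scopes[-1][name] = info

    def lookup(self, name):
        # search from the innermost scope outward
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None                  # undeclared identifier: report an error

table = SymbolTable()
table.add("x", "Int")
table.enter_scope()                  # e.g. entering a let expression
table.add("x", "String")             # shadows the outer x
assert table.lookup("x") == "String"
table.exit_scope()
assert table.lookup("x") == "Int"    # the outer binding is visible again
```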

  3. Types

    A language’s type system specifies which operations are valid for which types. The goal of type checking is to ensure that operations are used with the correct types.

    • The types in Cool are:
      • Class Names
      • SELF_TYPE
    • How types are assigned:
      • The user declares types for identifiers
      • The compiler infers types for expressions
  4. Type Checking

    Type checking uses logical rules to infer the type of each expression.

    Building blocks of the notation:

    • Symbol ^ is “and”
    • Symbol => is “if-then”
    • x:T is “x has type T”
    • |- means “it is provable that…”

    Notice: Types are computed in a bottom-up pass over the AST

  5. Type Environment

    Since variables don’t carry a fixed type on their own, a type environment gives types for free variables; more precisely, a type environment is a function from variables to types

    Let O be a function from ObjectIdentifiers to Types

    The sentence O |- e: T

    is read: Under the assumption that variables have the types given by O, it is provable that the expression e has the type T

    The type environment is passed down the AST from the root towards the leaves, while types are computed up the AST from the leaves towards the root; notice the difference in direction.
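The two directions show up naturally in a toy type checker: the environment is an argument passed down the recursion, and the type is the return value passed back up. The node shapes and the two types below are made-up illustrations, not Cool:

```python
# A toy bottom-up type checker for a tiny let language with just
# "Int" and "Bool". env flows down the AST; the type flows back up.

def type_of(e, env):
    kind = e[0]
    if kind == "int":                      # ("int", 3)
        return "Int"
    if kind == "bool":                     # ("bool", True)
        return "Bool"
    if kind == "var":                      # ("var", "x")
        return env[e[1]]                   # free variable: ask the environment
    if kind == "plus":                     # ("plus", e1, e2)
        assert type_of(e[1], env) == "Int"
        assert type_of(e[2], env) == "Int"
        return "Int"
    if kind == "let":                      # ("let", x, e1, e2)
        t1 = type_of(e[2], env)
        # extended environment is passed down into the body
        return type_of(e[3], {**env, e[1]: t1})
    raise ValueError(kind)

# let x = 1 in x + 2  has type Int under the empty environment
expr = ("let", "x", ("int", 1), ("plus", ("var", "x"), ("int", 2)))
assert type_of(expr, {}) == "Int"
```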

  6. Subtyping

    Define a relation called subtyping ≤ on classes

    • X ≤ X
    • X ≤ Y if X inherits from Y
    • X ≤ Z if X ≤ Y and Y ≤ Z

    lub(X,Y), the least upper bound of X and Y, is Z if

    • X ≤ Z ^ Y ≤ Z, then Z is an upper bound
    • X ≤ Z’ ^ Y ≤ Z’ => Z ≤ Z’, then Z is least among upper bounds

    the least upper bound of two types is their least common ancestor in the inheritance tree
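A minimal sketch of lub as a least-common-ancestor search, assuming single inheritance and a made-up class hierarchy:

```python
# lub(X, Y) as the least common ancestor in the inheritance tree,
# assuming every class except the root has exactly one parent.
# The hierarchy below is invented for illustration.

parent = {"Int": "Object", "Bool": "Object", "A": "Object", "B": "A", "C": "A"}

def ancestors(x):
    chain = [x]
    while x in parent:
        x = parent[x]
        chain.append(x)
    return chain                     # x, its parent, ..., up to the root

def lub(x, y):
    xs = ancestors(x)
    for a in ancestors(y):           # the first ancestor of y that is also
        if a in xs:                  # an ancestor of x is the least one
            return a

assert lub("B", "C") == "A"
assert lub("B", "Int") == "Object"
assert lub("B", "B") == "B"          # X <= X, so lub(X, X) = X
```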

  7. Typing Methods

    M stands for method signatures: M(C,f) = (T1,…,Tn,Tn+1) means that class C has a method f with signature f(x1:T1,…,xn:Tn): Tn+1. M is mostly used in dispatch and barely anywhere else, but it must be added to all rules.

    For some cases involving SELF_TYPE, we need to know the class in which an expression appears, so we also define the current class C

    This sums up to the judgment O,M,C |- e: T, read: an expression e occurring in the body of C has static type T, given a variable type environment O and method signatures M

  8. Relation of dynamic type and static type.

    Soundness theorem for the Cool type system: for every expression E, dynamic_type(E) ≤ static_type(E), which means all operations that can be used on an object of type C can also be used on an object of type C’ ≤ C

  9. SELF_TYPE

    When invoking a method, SELF_TYPE allows the method to return the type of the class on which it is invoked, instead of the class that contains the method. But remember that the declaration of the method should change to

    method() : SELF_TYPE { … }

    • It is a static type
    • It helps the type checker to keep better track of types
    • It enables the type checker to accept more correct programs

    Let T and T’ be any types but SELF_TYPE; SELF_TYPEc denotes SELF_TYPE occurring in class C, and SELF_TYPEc ≤ C

    1. SELF_TYPEc ≤ SELF_TYPEc
    2. SELF_TYPEc ≤ T if C ≤ T
    3. T ≤ SELF_TYPEc is always false
    4. T ≤ T’ holds according to the rules from before
    5. lub(SELF_TYPEc, SELF_TYPEc) = SELF_TYPEc
    6. lub(SELF_TYPEc, T) = lub(C, T)
    7. lub(T, SELF_TYPEc) = lub(C, T)
    8. lub(T, T’) is computed as before

    SELF_TYPE is not always allowed in Cool

    1. class T inherits T’ {…} where T, T’ cannot be SELF_TYPE
    2. m@T(E1,…,En) where T cannot be SELF_TYPE
    3. m(x : T) : T’ { … } where only T’ can be SELF_TYPE!
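The extended ≤ rules can be sketched as follows; the class hierarchy and the helper name leq are illustrative assumptions, not course code:

```python
# Subtyping extended with SELF_TYPE, per the rules above:
# SELF_TYPEc <= SELF_TYPEc; SELF_TYPEc <= T if C <= T;
# T <= SELF_TYPEc is always false; otherwise walk the class tree.

parent = {"B": "A", "A": "Object", "C": "Object"}   # invented hierarchy

def leq(x, y, C=None):
    if x == "SELF_TYPE" and y == "SELF_TYPE":
        return True
    if x == "SELF_TYPE":
        return leq(C, y)          # SELF_TYPEc <= T reduces to C <= T
    if y == "SELF_TYPE":
        return False              # T <= SELF_TYPEc is always false
    while x != y and x in parent:
        x = parent[x]             # climb toward the root
    return x == y

assert leq("B", "Object")
assert leq("SELF_TYPE", "A", C="B")       # because B <= A
assert not leq("A", "SELF_TYPE", C="B")   # rule 3
```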
  10. Error Recovery

    Introduce a new type No_type for use with ill-typed expressions

    • Define No_type ≤ C for all types C
    • Every operation is defined for No_type (With a No_type result)

Code Generation

  1. Runtime Organization

    Assume memory is organized like a big rectangular bookshelf.

    • Layers (from head (low addresses) to toe (high addresses))

      • First Layer contains code, for most languages, fixed size and read only
      • Second Layer contains static data with fixed addresses (e.g., global data)
      • Third Layer (Stack) contains activation records for each currently active procedure
      • Bottom Layer (Heap) contains dynamic data, which grows toward the head
  2. Activation

    An invocation of procedure P is an activation of P

    • The lifetime of an activation of P is
      • All the steps to execute P
      • Including all the steps in procedures P calls
    • Notice:
      • Lifetimes of procedure activations are properly nested
      • Activation lifetimes can be depicted as a tree, which is called an activation tree
      • Since activations are properly nested, a stack can track currently active procedures
  3. Activation Records

    • The information needed to manage one procedure activation is called an activation record (AR) or frame. The compiler must determine, at compile time, the layout of activation records and generate code that correctly accesses locations in the activation record. Thus, the AR layout and the code generator must be designed together!

    • Activation Records usually contain (e.g., if procedure F calls procedure G):

      • Space for G’s return value
      • F’s actual parameters
      • Pointer to the previous activation record (The control link; points to AR of caller of G)
      • Machine status prior to calling G (contents of registers, the program counter, and local variables)
      • Other temporary values
  4. Code Generation

    We simulate stack machine instructions using MIPS instructions and registers

    • Stack Machine: a simple model for code generation; for an instruction r = F(a1…an) it
      • Pops n operands from the stack
      • Computes the operation F using the operands
      • Pushes the result r on the stack
    • MIPS (originally an acronym for Microprocessor without Interlocked Pipeline Stages) is a reduced instruction set computer (RISC) instruction set (ISA) developed by MIPS Technologies (formerly MIPS Computer Systems, Inc.).
      • Actually we just need to get familiar with its instructions and registers; knowing the definition is neither necessary nor helpful here ;)
    • MIPS registers used here are:
      • $a0 accumulator (Q: why call it a0? A: just convention. Notice it conceptually holds the top of the stack)
      • $sp stack pointer, it contains address of the next empty location on the stack
      • $t1 temporary register
      • $fp frame pointer, a pointer to the current activation
      • $ra return address
    • MIPS instructions used here are:
      • lw reg1 offset(reg2) –> load the 32-bit word at address reg2 + offset into reg1; offset is always an integer constant
      • add reg1 reg2 reg3 –> (reg1 <- reg2 + reg3)
      • sw reg1 offset(reg2) –> store 32-bit word in reg1 at address reg2 + offset
      • addiu reg1 reg2 imm –> reg1 <- reg2 + imm, here imm means immediate value
      • li reg1 imm –> reg1 <- imm
      • sub reg1 reg2 reg3 –> reg1 <- reg2 - reg3
      • beq reg1 reg2 label –> go to branch label if reg1 == reg2
      • b label –> unconditional jump to label
      • jal label –> jump and link: jump to label, saving the address of the next instruction in $ra
      • jr reg –> jump register: jump to the address in register reg
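The stack-machine model and these MIPS instructions combine into the course's code-generation scheme for expressions: every expression leaves its result in $a0, and $sp points at the next empty stack slot. A sketch for integer constants and addition (the tuple node shapes are illustrative):

```python
# Emit MIPS for a tiny expression language using the stack-machine
# discipline: cgen(e1); push $a0; cgen(e2); pop into $t1; add.

def cgen(e, out):
    if e[0] == "int":                  # ("int", n)
        out.append(f"li $a0 {e[1]}")
    elif e[0] == "plus":               # ("plus", e1, e2)
        cgen(e[1], out)                # result of e1 in $a0
        out.append("sw $a0 0($sp)")    # push it on the stack
        out.append("addiu $sp $sp -4")
        cgen(e[2], out)                # result of e2 in $a0
        out.append("lw $t1 4($sp)")    # load e1's value into $t1
        out.append("add $a0 $t1 $a0")  # result back in $a0
        out.append("addiu $sp $sp 4")  # pop: the stack is unchanged overall

code = []
cgen(("plus", ("int", 1), ("int", 2)), code)
print("\n".join(code))
```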
  5. Temporaries

    A temporary is a fixed slot in the AR for an intermediate value. Assigning a location in the AR to each temporary, instead of pushing and popping at run time, makes the code much more efficient.

    • Let NT(e) = # of temps needed to evaluate e:
      • NT(e1 + e2) = max(NT(e1), 1 + NT(e2))
      • NT(e1 - e2) = max(NT(e1), 1 + NT(e2))
      • NT(if e1 = e2 then e3 else e4) = max(NT(e1),1 + NT(e2), NT(e3), NT(e4))
      • NT(id(e1,…,en)) = max(NT(e1),…,NT(en))
      • NT(int) = 0
      • NT(id) = 0
    • For an AR that has temps, a function definition f(x1,…,xn) = e needs 2 + n + NT(e) elements
      • Return address (1)
      • (old) Frame pointer (1)
      • n arguments (n)
      • NT(e) locations for intermediate results (NT(e))
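The NT(e) equations above translate directly into a recursive function; the tuple-based AST tags are made up for illustration:

```python
# NT(e) = number of temporaries needed to evaluate e, transcribed
# from the equations above.

def NT(e):
    tag = e[0]
    if tag in ("int", "id"):                     # NT(int) = NT(id) = 0
        return 0
    if tag in ("plus", "minus"):                 # e1 + e2, e1 - e2
        return max(NT(e[1]), 1 + NT(e[2]))
    if tag == "if":                              # if e1 = e2 then e3 else e4
        return max(NT(e[1]), 1 + NT(e[2]), NT(e[3]), NT(e[4]))
    if tag == "call":                            # ("call", f, [e1, ..., en])
        return max((NT(a) for a in e[2]), default=0)
    raise ValueError(tag)

# x + (1 + 2): the inner + needs 1 temp, the outer + one more
e = ("plus", ("id", "x"), ("plus", ("int", 1), ("int", 2)))
assert NT(e) == 2
# so f(x) = e would need an AR of 2 + 1 + 2 = 5 elements
```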
  6. Object Layout

    A Cool Object contains:

    • Class Tag: identifies the class of the object
    • Object Size: the size of the object in words
    • Dispatch ptr: a pointer to the dispatch table, i.e., the table of pointers to the methods of the object’s class

    The offset of an attribute is the same in a class and in all of its subclasses

    A method occupies a fixed offset in the dispatch table of a class and all of its subclasses, even when it is overridden; at execution time the code invoked through a slot may differ, but the name and offset stay the same
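A rough model of this layout in Python, with a made-up class A and a subclass B that overrides one method; note the overridden method keeps its dispatch-table offset:

```python
# Objects as dictionaries: class tag, size in words (3 header words
# plus the attributes), and a pointer to the class's dispatch table.

dispatch = {
    "A": ["f_A", "g_A"],        # offsets: f -> 0, g -> 1
    "B": ["f_A", "g_B"],        # B overrides g; f still points to A's code
}

def make(cls, n_attrs):
    return {"tag": cls, "size": 3 + n_attrs, "dispatch": dispatch[cls]}

G_OFFSET = 1                    # same offset in A and in its subclass B
a, b = make("A", 0), make("B", 0)
assert a["dispatch"][G_OFFSET] == "g_A"
assert b["dispatch"][G_OFFSET] == "g_B"   # same slot, different code
```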


Optimization

Optimizing compilers repeat optimizations until no improvement is possible. The optimizer can also be stopped at any point to limit compilation time. Goal: maximum benefit for minimum cost

  1. Intermediate Language (IL)

    • Intermediate code is written in an intermediate language (IL), a language between the source and the target. The biggest difference between the IL and assembly code is that the IL can use any number of registers; in other respects it is almost the same as assembly code, so it is also called high-level assembly code. The form of an IL statement is t1 := y * z. The IL should be used in the optimization phase because:
      • it is machine independent, meaning it doesn’t need to consider the structure of the computer the code is running on.
      • Exposes optimization opportunities clearly
    • Basic block: A maximal sequence of instructions written in intermediate language with no labels (except at the first instruction), and no jumps (except in the last instruction)

    • Control Flow Graph: basic blocks are the nodes, and jumps are directed edges
  2. Optimization types:

    There are three granularities of optimizations:

    • Local optimizations (apply to a basic block in isolation; the most widely used)
    • Global optimizations (apply to a control-flow graph (method body) in isolation; widely used)
    • Inter-procedural optimizations (apply across method boundaries; rarely used)
  3. Local Optimization

    On the intermediate language: this is the simplest form of optimization; it only needs to optimize one basic block instead of the whole procedure body. It includes these methods:

    • Algebraic Simplification:

      // delete useless code:
      x := x + 0
      x := x * 1
      // simplify statements:
      x := x * 0  => x := 0
      y := y ** 2 => y := y * y
      x := x * 8  => x := x << 3
      
    • Constant Folding: (can be dangerous in cross-compilation, when the compile-time and run-time machines differ)

      x := 2 + 2 => x := 4
      if 2 < 0 jump L      // can be deleted
      
    • Constant propagation:

      x := 3          x := 3
      y := Z * w  =>  y := Z * w
      q := x + y      q := 3 + y
      
    • Dead Code Elimination: eliminate unreachable basic blocks

      E.g., basic blocks that are not the target of any jump or “fall through” from a conditional

    • Single Assignment: each register occurs only once on the left-hand side of an assignment

      // b is a fresh register
      x := z + y      b := z + y
      a := x      =>  a := b
      x := 2 * x      x := 2 * b
      
    • Common Subexpression Elimination:

      // the values of x, y, and z do not change
      // in the … code
      x := y + z      x := y + z
      …           =>  …
      w := y + z      w := x
      
    • Copy Propagation:

      // a should not appear anywhere else in the program
      x := z + y      b := z + y      b := z + y
      a := x      =>  a := b      =>  x := 2 * b
      x := 2 * a      x := 2 * b
      

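Several of the local optimizations above can be sketched as a single pass over a basic block; the statement encoding (dest, op, arg1, arg2) is an illustrative assumption, not the course's IL syntax:

```python
# A toy local-optimization pass combining constant propagation,
# constant folding, and one algebraic simplification (x + 0 -> x)
# within a single basic block.

def optimize(block):
    consts, out = {}, []                        # known constants per register
    for dest, op, a1, a2 in block:
        a1 = consts.get(a1, a1)                 # constant propagation
        a2 = consts.get(a2, a2)
        if op == "+" and isinstance(a1, int) and isinstance(a2, int):
            op, a1, a2 = ":=", a1 + a2, None    # constant folding
        if op == "+" and a2 == 0:               # algebraic simplification
            op, a2 = ":=", None
        if op == ":=" and isinstance(a1, int):
            consts[dest] = a1                   # dest now holds a known constant
        else:
            consts.pop(dest, None)              # dest is no longer known
        out.append((dest, op, a1, a2))
    return out

before = [("x", ":=", 3, None),
          ("y", "+", "x", 4),
          ("q", "+", "y", "z")]
after = optimize(before)
assert after == [("x", ":=", 3, None),
                 ("y", ":=", 7, None),      # folded 3 + 4
                 ("q", "+", 7, "z")]        # y's constant propagated
```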
    On Assembly Code: use Peephole Optimization, i.e., replace short sequences of instructions with improved equivalent sequences; it also needs to be applied repeatedly for maximum effect.
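A minimal sketch of a peephole pass over assembly-like strings, using one made-up pattern (a load that immediately re-reads a just-stored location) and repeating until nothing changes:

```python
# Slide a two-instruction window over the code and replace known
# patterns, looping until a full pass makes no change.

def peephole(code):
    changed = True
    while changed:                   # apply repeatedly for maximum effect
        changed = False
        out, i = [], 0
        while i < len(code):
            pair = code[i:i + 2]
            # "sw r X; lw r X" stores then reloads the same value:
            # the load is redundant, keep only the store
            if len(pair) == 2 and pair[0].startswith("sw ") and \
               pair[1] == pair[0].replace("sw", "lw", 1):
                out.append(pair[0])
                i += 2
                changed = True
            else:
                out.append(code[i])
                i += 1
        code = out
    return code

code = ["sw $a0 0($sp)", "lw $a0 0($sp)", "add $a0 $a0 $t1"]
assert peephole(code) == ["sw $a0 0($sp)", "add $a0 $a0 $t1"]
```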