Lab 9: Compiling Expressions

Project
November 27, 2020

In this lab you make a start with compiling ChocoPy to RISC-V code using Stratego. You translate expression statements with expressions containing constants and operators, but without variables and function calls. While it would be possible to completely compute the values of such expressions using constant folding, this is not the aim of this lab. Rather, we want the basis for a translation of expressions using variables and function calls as well.

See the slides of Lecture 12 on code generation for an approach to get started.

Objectives

  1. Develop tests for your compiler that explore the edge cases.

  2. Define a transformation strategy that translates ChocoPy constant expressions to RISC-V code. I.e. including integer and boolean constants and operators.

  3. Develop a simplification transformation that transforms expressions such that more concise code can be generated.

RISC-V Resources

Consult the following resources as documentation on the RISC-V architecture and instruction set:

Basic Compiler Pipeline

Follow the slides of Lecture 12 to set up a basic compiler pipeline for translating statement expressions and print the resulting value.

  program-to-rv32im:
    ast@Program(definitions, [stat]) -> Program(instrs2)
    where
      a := <stx-get-ast-analysis>
      ; <stat-to-instrs(|"a1", <registers>)> stat => instrs1
      ; instrs2 := <concat> [
          [PSText()]
        , instrs1
        , [Li("a0", "1"),
           Ecall()]
        ]

Note that you will eventually have to extend this rule to cover all of ChocoPy.

The stat-to-instrs strategy transforms statements. For now, we are only defining it for expression statements:

  stat-to-instrs(|r, regs) :
    Exp(e) -> instrs
    where <exp-to-instrs(|r, regs)> e => instrs

The strategy is parameterized with a register in which it expects the value of the expression to be stored and a list of registers that can be used as temporaries. For now this is defined as follows:

  registers = !["t0", "t1", "t2", "t3", "t4", "t5", "t6"]

That is, the t registers can all be used as temporaries. You will probably have to adjust this in the future.

Compiler Testing

Develop tests in SPT to test your compiler.

module test

language chocopy

test 1+1 [[
1+1
]] transform "Execute file" to "2"

Translating Expressions

Define a strategy that transforms expressions to lists of instructions. The example rules below show how to do this for integer constants and integer additions. These rules use registers to store the intermediate values of sub-expressions, by drawing from a list of registers. Note that this a naive approach that does not scale to arbitrary expressions. You will have to discover the limitations and figure out how to overcome those.

Debugging

The following strategy definition defines exp-to-instrs in terms of an auxiliary rule exp-to-instrs-:

  exp-to-instrs(|r, regs) =
    exp-to-instrs-(|r, regs)
    <+ (debug(!"exp-to-instr fails: "); fail)

This is a useful pattern in Stratego, as it catches a possible failure of the exp-to-instrs- rule and reports it in the Eclipse Console. For example, this strategy will report when you apply the compiler to an operator that is not (yet) supported.

You can also debug your generated code, by creating a .cpy file, transforming it to an .rv32im file using the Spoofax > Generation menu, and copying the generated output to the online Venus editor, where you can use the Simulator tab to run your code or step through each line, and inspect the memory and registers.

Translating Constants

Integer constants are translated to instructions that load the constant value into a register:

  exp-to-instrs-(|r, regs) :
    Int(i) -> [Li(r, i)]

Can other types of constants be treated similarly?

Translating Operators

Operators combine expressions into new expressions. In order to translate an operator, its sub-expressions should be translated recursively the lists of instructions. Furthermore, while the value of the second sub-expression is computed, the value of the first sub-expression should be stored. This typically requires a new register. Thus, integer addition can be translated as follows:

  exp-to-instrs-(|r, regs@[r2 | regs']) :
    add@Add(e1, e2) -> <concat> [
        instrs1
      , instrs2
      , [Add(r, r, r2)]
    ]
    where <stx-get-ast-analysis> add => a
        ; <get-type(|a)> add => INT()
        ; <exp-to-instrs(|r, regs)> e1 => instrs1
        ; <exp-to-instrs(|r2, regs')> e2 => instrs2

Note that the list of registers is used to obtain a fresh register.

Define transformation rules for all operators of ChocoPy expressions.

Special Cases

RISC-V provides specialized instructions for some operations. For example, the addi instruction allows directly adding an integer constant (between -2048 and 2047) to a register. A compiler can make use of such instructions, by detecting special patterns in the source language. For example, the following rule (when listed before the general rule for addition above), detects additions with an integer constant, and translates those to applications of addi, avoiding the use of an extra register.

  exp-to-instrs-(|r, regs) :
    add@Add(e, Int(i)) -> <concat> [
        instrs
      , [Addi(r, r, i)]
    ]
    where <gtS>(i, "-2049"); <ltS>(i, "2048")
        ; <stx-get-ast-analysis> add => a
        ; <get-type(|a)> add => INT()
        ; <exp-to-instrs(|r, regs)> e => instrs

Can you detect other specialized instructions and corresponding source language patterns that provide a more concise and/or faster target code?

Simplification

To improve the result of code generation, it can be useful to transform the source language expression. For example, left-associative additions produce a better result with the rules for addition above. Another useful transformation is to type specialize the constructors of the AST, such that the analysis results are no longer needed. That is useful when applying transformations, since preserving those annotations cannot be done for all transformations. The following simplify-all transformation should be invoked on the AST before invoking the compiler transformation.

signature
  constructors
    AddInt : Exp * Exp -> Exp

rules

  simplify-all =
    innermost(type-specialize <+ simplify)

  type-specialize :
    add@Add(e1, e2) -> AddInt(e1, e2)
    where <stx-get-ast-analysis> add => a
        ; <get-type(|a)> add => INT()

  simplify :
    AddInt(e1, AddInt(e2, e3)) -> AddInt(AddInt(e1, e2), e3)

Note that the rules for translating integer additions should be adapted to reflect the change in constructors.

This simplification can include constant folding, eventually. However, do not yet include constant folding rules, since you want to test the translation of operators. That is, your code generator should be able to translate arbitrary combinations of operators, so be prepared for the general case.

Think about (and try out in the online compiler!) the differences between adding integers and concatenating strings in RISC-V. In ChocoPy, they initially both use the Add(...) constructor, so make sure to disambiguate them into separate constructors you will later use for code generation. The same holds for other operations, but you have to think about those yourself.

Booleans

In ChocoPy we compile the Booleans True and False to integers in RISC-V (1 and 0, respectively). Implement the Boolean operators and, or and not, and integer comparison operators ==, !=, <, >, <=, >= (you can ignore is for now since it operates on objects).

You can make use of the online compiler, or the RISC-V instruction set to find the proper instructions in RISC-V.

Also implement the ternary operator ... if ... else .... (See Short-circuit Boolean operations to get an idea on how to implement it.)

Challenges

Running out of registers

The approach sketched above does not generalize to arbitrary expressions. Create test cases that require more temporary registers than are available.

Start thinking about solutions for this limitation. Two standard solutions are using the stack for temporaries (which is slow since it involves writing to and reading from memory), and register allocation (which requires a program analysis on the generated code).

Short-circuit Boolean operations.

When encountering a Boolean operation of the form False and ..., do we really care about the right side of the operator? Similarly, when we encounter True or ...?

Think about how we can use conditional jumps in RISC-V to short-circuit (any) Boolean operators. Have a look at the instruction set and online compiler.

A conditional jump in RISC-V takes a label, which represents an address in memory. In Stratego, labels can be generated using <newname> "L" => label. (Note that "L" can be anything you want, as long as it is a string.)