# **Cross-ISA Testing of the Pharo VM**

## Lessons learned while porting to ARMv8 64 bits

**Tool Paper, MPLR'21**









### Guille Polito, Stéphane Ducasse, Pablo Tesone, Théo Rogliano, Pierre Misse-Chanabier, Carolina Hernandez, Luc Fabresse

**RMoD Team, Inria Lille Nord Europe, UMR9189 CRIStAL, CNRS**











[Figure: VM architecture overview; legible labels include the JIT compiler and native code.]



# **Some Numbers**

- 255 bytecodes (77 different) + ~340 primitives/native methods
- 146 different IR instructions
- polymorphic inline caches
- threaded code interpreter
- generational scavenger GC



Lots of combinations!

# **Objective: Implementing an ARM64 Backend**

- ARM64 is now pervasive:
  - New Apple M1
  - Raspberry Pi 4
  - Microsoft Surface Pro X
  - PineBook Pro

JIT compiler IR example:

- `move r1 #1`
- `move r2 #17`
- `add r3 r1 r2`
- `move r1 r3`
- `ret`

Checks in the figure: `checkSmallInt r1`, `checkSmallInt r2`, `checkSmallInt r3`.



## **Targeting Real Hardware: Challenges**

- How to do a partial implementation, in an iterative way?
- Hardware availability: did not have access to an Apple M1 yet
- Slow Change-Compile-Test cycle
- **Bug reproduction** is a demanding task



# **Execution Mode Comparison**

|                      | Real Hardware Execution |
|----------------------|-------------------------|
| Feedback-cycle speed | Very low                |
| Availability         | Low                     |
| Reproducibility      | Low                     |
| Precision            | High                    |
| Debuggability        | Low                     |



# **Simulation Environment**

Simulation Environment (Pharo):

- VM
  - Interpreter
  - Heap
  - Native Code Cache
- Unicorn
- LLVM Disassembler





*Two Decades of Smalltalk VM Development: Live VM Development through Simulation Tools.* Miranda et al., VMIL'18



# **Simulation Environment Comparison**

|                      | Real Hardware Execution | Full-System Simulation |
|----------------------|-------------------------|------------------------|
| Feedback-cycle speed | Very low                | Low                    |
| Availability         | Low                     | High                   |
| Reproducibility      | Low                     | Low                    |
| Precision            | High                    | Low                    |
| Debuggability        | Low                     | High                   |



## **Unit Testing Infrastructure**

**Extending the simulation environment**







    testPushConstantZeroBytecodePushesASmallIntegerZero

        self compile: [ compiler genPushConstantZeroBytecode ].
        self runGeneratedCode.

        self assert: self popAddress equals: (memory integerObjectOf: 0)





Reusable test fixtures covering e.g.,

- trampoline and stub compilation
- heap initialization




**Compiler internal DSL** 




JIT execution helpers, e.g.:

- run all code between two addresses
- run until the PC hits an address






## VM Unit Testing Lessons

**Insights: Black box testing**

Using `testPushConstantZeroBytecodePushesASmallIntegerZero` as the running example, tests should be:

- **Dependent only on** observable behaviour
- **Reusable** on different backends / architectures
- **Resistant to changes** in the implementation
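The black-box idea can be illustrated outside Pharo with a small Python model. This is only a sketch under stated assumptions: `ToyCompiler`, `ToyMachine` and `integer_object_of` are hypothetical stand-ins rather than the VM's API, and the tagging scheme (3 tag bits, tag value 1) is assumed for illustration. The point is that the test inspects only the value left on the stack, never the instructions that produced it.

```python
# Toy model of a black-box JIT test; every name here is hypothetical.

TAG_BITS = 3        # assumption: 3 low tag bits
SMALL_INT_TAG = 1   # assumption: tag value 1 marks a small integer


def integer_object_of(value):
    """Encode a Python int as a tagged small-integer word."""
    return (value << TAG_BITS) | SMALL_INT_TAG


class ToyCompiler:
    """Stands in for the JIT compiler: emits (opcode, operand) pairs."""

    def __init__(self):
        self.code = []

    def gen_push_constant_zero(self):
        self.code.append(("push", integer_object_of(0)))


class ToyMachine:
    """Stands in for the simulated CPU: executes emitted code on a stack."""

    def __init__(self):
        self.stack = []

    def run(self, code):
        for opcode, operand in code:
            if opcode == "push":
                self.stack.append(operand)
            else:
                raise ValueError(f"unknown opcode: {opcode}")

    def pop_address(self):
        return self.stack.pop()


def test_push_constant_zero_pushes_small_integer_zero():
    compiler, machine = ToyCompiler(), ToyMachine()
    compiler.gen_push_constant_zero()   # compile
    machine.run(compiler.code)          # run the generated code
    # Only the observable result is checked, never the emitted instructions.
    assert machine.pop_address() == integer_object_of(0)


test_push_constant_zero_pushes_small_integer_zero()
```

Because the assertion mentions neither opcodes nor registers, the toy compiler could change how it emits the push without the test needing an update.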



## VM Unit Testing Lessons

**Insights: Cross-compile / Cross-execute**

- Hardware independent
- **Parametrizable tests**
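A parametrizable test of this kind might look as follows in Python. It is a sketch, not the Pharo test suite: `lower_push_constant`, `execute` and the instruction names are hypothetical, and the four backend names are used only as labels. Each backend lowers the same operation differently, yet one test body checks the same observable result on all of them.

```python
import unittest

BACKENDS = ["x86", "x86-64", "ARM32", "ARM64"]  # labels only
TAGGED_ZERO = 1  # assumption: tagged encoding of the small integer 0


def lower_push_constant(backend, word):
    """Hypothetical per-ISA lowering of 'push constant'."""
    if backend.startswith("x86"):
        return [("push_imm", word)]                       # one instruction
    return [("mov_imm", "r0", word), ("push_reg", "r0")]  # load, then push


def execute(code):
    """Hypothetical simulator: runs lowered code and returns the stack."""
    regs, stack = {}, []
    for op, *args in code:
        if op == "push_imm":
            stack.append(args[0])
        elif op == "mov_imm":
            regs[args[0]] = args[1]
        elif op == "push_reg":
            stack.append(regs[args[0]])
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack


class PushConstantZeroTest(unittest.TestCase):
    def test_on_every_backend(self):
        for backend in BACKENDS:
            with self.subTest(backend=backend):
                stack = execute(lower_push_constant(backend, TAGGED_ZERO))
                self.assertEqual(stack.pop(), TAGGED_ZERO)


def run_all():
    """Run the parametrized test and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(PushConstantZeroTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Calling `run_all().wasSuccessful()` reports whether every backend passed; `subTest` keeps a failure on one backend from hiding the results of the others.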



## VM Unit Testing Lessons

**Insights: Start Small**

- First: the simplest test, the simplest feature
- Second: the next simplest test
- Focus on enhancing the testing infrastructure



# **Unit Testing Infrastructure Comparison**

|                      | Real Hardware Execution | Full-System Simulation | **Unit-Testing** |
|----------------------|-------------------------|------------------------|------------------|
| Feedback-cycle speed | Very low                | Low                    | High             |
| Availability         | Low                     | High                   | High             |
| Reproducibility      | Low                     | Low                    | High             |
| Precision            | High                    | Low                    | Low              |
| Debuggability        | Low                     | High                   | High             |



# There is no silver bullet

- Simulators are cheap, but not 100% trustworthy
- Full execution (simulated or on real HW)
  - more expensive to run
  - cannot unit-test it (less controllable)
- Unit tests only exercise specific scenarios
- Full executions exercise not yet covered scenarios



# **Our testing Workflow**

- Simulate the execution, less than you run tests
- Run the real app, less than you simulate

- Go back and forth:
  - Turn full execution failures into tests
  - Fix with the aid of the test. Unit tests:
    - are faster to run
    - are easier to debug
    - detect regressions





## Case Study 1: **Porting the Cogit JIT Compiler to ARM64**

- Started with no tests and no hardware (main target: Apple M1)
- Incremental test development: bytecode, native methods, PICs, code patching
- All tests ran from the beginning on our four targets: x86, x86-64, ARM32 and ARM64
- Tests allowed safe modifications to the IR to support, e.g., ARM64 multiplication overflow
- ARM64-specific tests covered stack alignment, W+X ...



## Case Study 2: **Ongoing Port to RISCV64**

- Currently under development
- Is our harness test suite enough to develop a new backend?
- Are our tests general enough?

- Future work on: Hardware-based security enforcement

Collaboration with Q. Ducasse, P. Cotret, L. Lagadec from ENSTA Bretagne



## Case Study 3: **Debugging and Testing Memory Corruptions**

- Bug report using Ephemerons: https://github.com/pharo-project/pharo/issues/8153
- Starting the other way around:
  - First, reproducing the bug on real hardware
    - long to execute (even longer in simulation)
    - required manual developer intervention
  - Then, building a unit test from observations
  - The test becomes part of the regression test suite

# **Future Perspectives**

## Automatic VM Validation

- Automatic (Unit?) Test Case Generation
- Interpreter vs Compiler Differential Testing
- VM Tailored Multi-level Debugging
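To make the differential testing perspective concrete, here is a minimal Python sketch. It assumes a toy stack bytecode (`push`, `add`, `sub`) invented for this example; it is not Pharo bytecode and not the authors' tooling. Random programs are run through a direct interpreter and through a "compiler" that generates straight-line Python source, and any disagreement between the two is reported.

```python
import random


def interpret(program):
    """Reference semantics: execute the toy bytecode directly."""
    stack = []
    for op, arg in program:
        if op == "push":
            stack.append(arg)
        elif op in ("add", "sub"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "add" else a - b)
        else:
            raise ValueError(f"unknown bytecode: {op}")
    return stack


def compile_program(program):
    """'Compile' the program into a straight-line Python function."""
    lines = ["def compiled():", "    s = []"]
    for op, arg in program:
        if op == "push":
            lines.append(f"    s.append({arg})")
        elif op == "add":
            lines.append("    b = s.pop(); a = s.pop(); s.append(a + b)")
        elif op == "sub":
            lines.append("    b = s.pop(); a = s.pop(); s.append(a - b)")
        else:
            raise ValueError(f"unknown bytecode: {op}")
    lines.append("    return s")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["compiled"]


def random_program(rng, length):
    """Generate a program that never pops from an empty stack."""
    program, depth = [], 0
    for _ in range(length):
        if depth >= 2 and rng.random() < 0.5:
            program.append((rng.choice(["add", "sub"]), None))
            depth -= 1
        else:
            program.append(("push", rng.randint(-100, 100)))
            depth += 1
    return program


def differential_test(trials=200, seed=0):
    """Return every program on which interpreter and compiler disagree."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        program = random_program(rng, rng.randint(1, 20))
        if interpret(program) != compile_program(program)():
            mismatches.append(program)
    return mismatches
```

An empty list from `differential_test()` means the two implementations agreed on every generated program. In the spirit of the workflow described earlier, each mismatching program would be a candidate to turn into a regression unit test.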





## **Debugging a compiler**

Insights: build your own tools, based on needs, not desires

### Examples:

- Machine code debugger
- Bytecode-IR visualization
- Disassembler DSL




| IR Instructions                              | Address    |
|----------------------------------------------|------------|
| ' (PopR 10 13503 810113)'                    | 16r1000000 |
| ' (Label 1)'                                 | 16r1000004 |
| ' (TstCqR 7 10 757D93)'                      | 16r1000008 |
| ' (JumpNonZero (Label 2) 20D9063)'           | 16r100000C |
| ' (MoveMwrR 0 10 22/16 53B03)'               | 16r1000010 |
| ' (AndCqR 4194295/3FFFF7 22/16 no mcode)'    | 16r1000014 |
| ' (JumpNonZero (Label 2) D9663)'             | 16r1000018 |
| ' (MoveMwrR 8 10 10 853503)'                 | 16r100001C |
| ' (Jump (Label 1) FE1FF06F)'                 | 16r1000020 |
| ' (Label 2)'                                 | 16r1000024 |
| ' (MoveMwrR 0 2 23/17 13B83)'                | 16r1000028 |
| ' (Label 3)'                                 | 16r100002C |
| ' (TstCqR 7 23/17 7BFD93)'                   | 16r1000030 |
| ' (JumpNonZero (Label 4) 20D9063)'           | 16r1000034 |
| ' (MoveMwrR 0 23/17 22/16 BBB03)'            | 16r1000038 |
| ' (AndCqR 4194295/3FFFF7 22/16 no mcode)'    | 16r100003C |
| ' (JumpNonZero (Label 4) D9663)'             | 16r1000040 |
| ' (MoveMwrR 8 23/17 23/17 8BBB83)'           | 16r1000044 |
| ' (Jump (Label 3) FE1FF06F)'                 | 16r1000048 |
| ' (Label 4)'                                 | 16r100004C |
| ' (CmpRR 10 23/17 41750DB3)'                 | 16r1000050 |
| ' (JumpNonZero (MoveCqR 16856080/1013410 10  | 16r1000054 |
| ' (MoveCqR 16856096/1013420 10 1013537 42050 | 16r1000058 |
| ' (Jump (MoveRMwr 10 0 2 A13023) C0006F)'    | 16r100005C |


[Screenshot: VM Debugger on RISC-V machine code, with a disassembly pane (ASM and Bytes columns, e.g. `ld a0, 0(sp)` as `#[3 53 1 0]`), a register pane (lr, pc, sp, fp and x0 onwards with their ABI names and values), and a stack pane with SP at 16r1002FE8 and FP at 16r1003000.]

