10

GitHub - mykolav/coollang-2020-fs: Cool 2020 compiler implemented in F#

 3 years ago
source link: https://github.com/mykolav/coollang-2020-fs
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Compiler of Cool 2020 (a small Scala subset) into x86-64 assembly, in F#

The project is purely for fun and it's, honestly, just another toy compiler. Here's a couple things that might set it apart.

  • It compiles down to x86-64 assembly. Then invokes GNU as and ld to produce a native executable. To me personally, this is much more rewarding than emitting MIPS assembly and using an emulator to run it, like many compiler courses do. I also prefer emitting assembly to, for example, C — at the very least, it forces the developer to figure out converting expressions into assembly, and managing the stack. That really drives home the point how much work high level languages do for us.

  • The Cool 2020 language is simple but not too simple. A lot of mini-compilers have languages with functions, primitive values and not much else. Whereas this project's language has classes, inheritance, virtual dispatch, and even a very simple form of pattern matching.

  • The test suite contains more than 250 automated tests. In particular there are a number of end-to-end tests, which invoke the compiler on a source file, run the produced executable and check its output against the expected values.

  • The compiler runs on Windows and Linux.

This page tries to give a bit of background on the project and describe it in more detail.


Contents


Cool 2020

Cool 2020 is a subset of Scala with minor incompatibilities.
Let's first take a look at a sample programm and then discuss the language in more details.

class Fib() extends IO() {
  def fib(x: Int): Int =
    if (x == 0) 0
    else if (x == 1) 1
    else fib(x - 2) + fib(x - 1);

  {
    var i: Int = 0;
    while (i <= 10) {
      out_string("fib("); out_int(i); out_string(") = ");
      out_int(fib(i));
      out_nl();
      
      i = i + 1
    }
  };
}

class Main() {
  { new Fib() };
}

Too see more of the language's features in action take a look at Life.cool, QuickSort.cool, and InsertionSort.cool.

CoolAid: The Cool 2020 Reference Manual

... the Classroom Object-Oriented Language ... retains many of the features of modern programming languages including objects, static typing, and automatic memory management...

Cool programs are sets of classes. A class encapsulates the variables and procedures of a data type. Instances of a class are objects. In Cool, classes and types are identified; i.e., every class defines a type. Classes permit programmers to define new types and associated procedures (or methods) specific to those types. Inheritance allows new types to extend the behavior of existing types.

Cool is an expression language. Most Cool constructs are expressions, and every expression has a value and a type. Cool is type safe: procedures are guaranteed to be applied to data of the correct type. While static typing imposes a strong discipline on programming in Cool, it guarantees that no runtime type errors can arise in the execution of Cool programs.

An antlr4 grammar for Cool 2020

Click to expand/collapse

Precendence

Click to expand/collapse

Work in progress

The compiler can successfully build a Windows x64 or Linux x64 excecutable out of every sample program.

Generational garbage collection is planned for some time in the future, but doesn't exist at the moment. As a result we never free allocated memory.

Build

Install .NET Core 3.1 SDK

The compiler is written in F#. F# is a .NET language. Rather predictably, a .NET SDK is a dependency.

Get it from the download page.

Windows

Download and run an SDK installer.
Keep in mind:

If you're using Visual Studio, look for the SDK that supports the version you're using.
If you're not using Visual Studio, install the first SDK listed.

Linux

Follow these instructions.

Just to give an example. On Ubuntu installation looks something like this:

# Add the Microsoft package signing key to your list of trusted keys.
# And add the package repository.
wget https://packages.microsoft.com/config/ubuntu/20.10/packages-microsoft-prod.deb -O packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb

# Install the SDK.
sudo apt-get update; \
sudo apt-get install -y apt-transport-https && \
sudo apt-get update && \
sudo apt-get install -y dotnet-sdk-3.1

Check installation

On both Windows and Linux, to check your installation, do the following command. If everything is OK, the output will contain one or many 3.1.xyz versions.

dotnet --list-sdks

Install GNU Binutils

The compiler emits x86-64 assembly. It uses as to assemble it. And ld to link with the runtime.

Windows

One way of getting binutils is installing MinGW. MinGW is a project providing Windows versions of GCC, GDB, binutils, and some other tools.
This page explains how to install MinGW (see item #3).

Click to expand/collapse the installation steps

... relocation truncated to fit: R_X86_64_32S ...

In case you use MSYS2 to install their MinGW packages, please keep in mind the following.
The compiler links an executable using a command similar to

ld -o a.exe -e main a.o ../../src/Runtime/rt_common.o ../../src/Runtime/rt_windows.o -L"C:/msys64/mingw64/x86_64-w64-mingw32/lib" -lkernel32

Starting at least from GNU Binutils 2.36.1 (and maybe an earlier version), the command above produces error along these lines:

a.o:fake:(.text+0x3f): relocation truncated to fit: R_X86_64_32S against symbol `IO_proto_obj' defined in .data section in ../../src/Runtime/rt_common.o
a.o:fake:(.text+0x7c): relocation truncated to fit: R_X86_64_32S against `.data'
a.o:fake:(.text+0x9e): relocation truncated to fit: R_X86_64_32S against `.data'
a.o:fake:(.text+0xd2): relocation truncated to fit: R_X86_64_32S against `.data'
../../src/Runtime/rt_common.o:fake:(.text+0x8): relocation truncated to fit: R_X86_64_32S against `.data'
../../src/Runtime/rt_common.o:fake:(.text+0x36): relocation truncated to fit: R_X86_64_32S against `.data'
../../src/Runtime/rt_common.o:fake:(.text+0x6d): relocation truncated to fit: R_X86_64_32S against `.data'
../../src/Runtime/rt_common.o:fake:(.text+0x89): relocation truncated to fit: R_X86_64_32S against `.data'
../../src/Runtime/rt_common.o:fake:(.text+0xbb): relocation truncated to fit: R_X86_64_32S against `.data'
../../src/Runtime/rt_common.o:fake:(.text+0xdb): relocation truncated to fit: R_X86_64_32S against `.data'
../../src/Runtime/rt_common.o:fake:(.text+0xf8): additional relocation overflows omitted from the output

A news report on the MSYS2 page and an ld bug report seem to discuss a very similar issue though not exactly the same.

I don't really understand what causes the issue with linking object files produced from compiler-generated assembly. But a workaround suggested on the MSYS2 page solves it. The workaround is to pass --default-image-base-low to ld.

The compiler's code includes this flag into the linker command line to work around the problem. If you still encounter similar error messages, you might try downgrading GNU Binutils to 2.34 to make the problem go away.

pacman -U http://repo.msys2.org/mingw/x86_64/mingw-w64-x86_64-binutils-2.34-3-any.pkg.tar.zst

Linux

Install your distribution's binutils package. For example, on Ubuntu it looks something like this:

sudo apt install binutils

Build the compiler

On Windows, use Bash to perform these commands. As a git user, you likely have Git Bash installed and know how to use it. If not, take a look at this nice tutorial.

# Clone the repo.
git clone https://github.com/mykolav/coollang-2020-fs.git
cd coollang-2020-fs/

# Build F# code.
# And assemble the runtime's code.
./build.sh

A successful build should look similar to the following.

Compiler usage

On Windows, use Bash to follow the examples.
One option is using Git Bash. To get more details, take a look at this nice tutorial.

Change directory to sandbox/clc.
The folder contains a bash script clc. It's a wrapper that invokes the compiler and passes through command line arguments.

Synopsis

clc file1.cool [file2.cool, ..., fileN.cool] [-o file.exe | -S [file.asm]]

Examples

To compile and run QuickSort.cool, follow these steps.

# Compile
$ ./clc ../../src/Tests/CoolPrograms/Runtime/QuickSort.cool -o qs
Build succeeded: Errors: 0. Warnings: 0

# Run
$ ./qs
30 20 50 40 10
10 20 30 40 50

To see the assembbly the compiler emits for QuickSort.cool, perform the command below. Then open qs.s in your favourite editor.

$ ./clc ../../src/Tests/CoolPrograms/Runtime/QuickSort.cool -S qs.s
Build succeeded: Errors: 0. Warnings: 0

Test programs

Take a look inside src/Tests/CoolPrograms/Runtime for more test programs. In particular, relatively bigger programs it contains are Life.cool, InsertionSort.cool, QuickSort.cool, and Fibonacci.cool.

Implementation remarks

The implementation language is F#, but the code is imperative/OO. That said, the code is, hopefully, consistent with the recommendations from "F# Code I Love" by Don Syme.

Initially it was my ambition to practice writing functional code while learning about compilers. But trying to do both at the same time was biting off more than I could chew.

Used as an imperative/OO language, F# has many and many nice features. More mainstream languages started to catch up only recently. E.g., no null values by default, records, discriminated unions, pattern matching, primary constructors, etc.

Plus, F# is a low ceremony, low syntactic noise language. In this respect, you can think of F# as Python but statically typed.

The code contains a notable deviation from F# formatting guidelines. The guidelines recommend camelCase naming for let bindings. But I use snake-case naming in let bindingis. It's just a personal preference for a hobby project.

Useful links

Credits

The original Cool language was designed by Alex Aiken.

The Cool 2020 version was designed by John Boyland.

QuickSort.cool and InsertionSort.cool came from a papaer from LUME - the Digital Repository of the Universidade Federal do Rio Grande do Sul.

The web page uses GitHub Corners by Tim Holman.

License

This project is licensed under the MIT license. See LICENSE for details.

GitHub Corners is licensed under the MIT license. See LICENSE.github-corners for details.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK