Thursday, 25 December 2008

Building a better future: the High-Level Virtual Machine

Microsoft's Common Language Runtime (CLR) was a fantastic idea. The ability to interoperate safely and at a high level between different languages, from managed C++ to F#, has greatly accelerated development on the Microsoft platform. The resulting libraries, like Windows Presentation Foundation, are already a generation ahead of anything available on any other platform.

Linux and Mac OS X do not currently have the luxury of a solid foundation like the CLR. Consequently, they are composed entirely of uninteroperable components written in independent languages, from unmanaged custom C++ dialects to Objective-C and Python. Some developers restrict themselves to the lowest common denominator (e.g. writing GTK in C), which aids interoperability but only at a grave cost in productivity. Other developers gravitate to huge libraries written in custom dialects of particularly uninteroperable languages (e.g. Qt). Both approaches have a bleak future.

The situation is compounded by the fact that Linux has a far richer variety of programming languages than Windows. Linux is the platform of choice for academics, including the programming language researchers who develop and maintain a variety of state-of-the-art languages, libraries and tools on it. However, whatever the benefits of languages like OCaml, Erlang, Haskell, Lisp, Scheme, ATS, Pure and others, they are almost entirely uninteroperable because they do not have a shared run-time and many do not even have an easy foreign function interface (FFI) for accessing existing unmanaged libraries.

If there were a high-level virtual machine (HLVM) available for Linux and Mac OS X that could act as a common language run-time for these kinds of languages, then it might be possible to build a better future for software development on these platforms. The impedance mismatch between different languages (including C) would be a lot smaller, and the ability to write and consume libraries from other languages would greatly improve productivity.

We believe this approach has a bright future and, consequently, we have begun developing a new HLVM that is designed to act as a common language run-time, initially for the ML family of languages, in the hope that others will build upon it and that language communities can combine their efforts. We are using the excellent LLVM library, which provides high-performance native code generation across a variety of architectures and platforms, including x86/x64 and Linux/Mac OS X.

Although the project is still at a very early stage of development, we already have some promising results. We can compile a subset of ML including bool, int, float and array types. We have full tail calls between internal functions and the C calling convention for external functions, which can be invoked directly. Our implementation is 2-4× faster than OCaml on x86 on several simple benchmarks, including the Sieve of Eratosthenes and a Mandelbrot renderer.
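
To give a flavour of the kind of program that fits within such a subset, here is a minimal sketch of a Sieve of Eratosthenes written in ordinary OCaml using only bools, ints, an array and tail-recursive functions. It is not the benchmark code behind the timings above, and the function name count_primes and the limit of 100 are assumptions chosen purely for illustration.

```ocaml
(* Illustrative sketch only: count the primes up to n using nothing
   but bools, ints, a mutable array and tail-recursive functions.
   The name count_primes is hypothetical, not part of any HLVM API. *)
let count_primes n =
  (* sieve.(i) stays true while i is still a candidate prime. *)
  let sieve = Array.make (n + 1) true in
  (* Mark j, j + step, j + 2*step, ... as composite (tail call). *)
  let rec mark j step =
    if j <= n then begin
      sieve.(j) <- false;
      mark (j + step) step
    end
  in
  (* Walk the candidates from 2 upwards, counting survivors (tail call). *)
  let rec loop i count =
    if i > n then count
    else if sieve.(i) then begin
      mark (i * i) i;
      loop (i + 1) (count + 1)
    end
    else loop (i + 1) count
  in
  loop 2 0

let () = Printf.printf "%d\n" (count_primes 100)
```

Both inner functions end in a call to themselves, so an implementation with full tail calls can run them as loops in constant stack space.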

The main features that we have yet to implement are algebraic datatypes, pattern matching and garbage collection. Once those features have been completed we shall release a first version of our HLVM as an open source project and ask for contributors and developers to start improving and building upon this foundation. This will take time but hopefully we can work together to build a better future for high-level programming on the Linux and Mac OS X platforms.