Typed data for performance boost

After reading John McCutchan’s recent post on numeric computation in Dart, and a conversation with Srdjan Mitrovic from the Dart VM team, I made some changes to the Dart port of Box2D in the hope of improving performance. The specific commits that make up this change are here and here, and this change has been released to pub as version 0.1.6.

Originally, I had attempted to port the internal math library to John’s vector_math library. For one thing, this would mean that I no longer need to maintain a chunk of code; for another, any work he does to introduce SIMD into the library would automatically be available to me. However, this integration introduced some major instabilities into one of the collision solvers, so I abandoned it.

Instead, I made two key changes to the code: I converted all num types to double, and I converted the vector and matrix types to use typed data instead of doubles stored in fields. First, the results. All benchmark code can be found here, and the benchmarks were run under SDK version 0.5.11.1_r23200. In all of the charts below, the y-axis is the number of steps simulated per second, so bigger is better.
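To make the "steps simulated per second" metric concrete, here is a minimal sketch of how such a number could be measured. It is not the actual benchmark harness: FakeWorld, stepsPerSecond, the 1/60 s time step and the step counts are all hypothetical stand-ins, and the code is written in present-day Dart syntax.

```dart
/// Hypothetical stand-in for a Box2D world; step() only does some double math.
class FakeWorld {
  double _state = 0.0;

  double get state => _state;

  void step(double dt) {
    _state += dt * 0.5;
  }
}

/// Runs [steps] simulation steps and returns the steps simulated per second.
double stepsPerSecond(FakeWorld world, int steps) {
  final watch = Stopwatch()..start();
  for (var i = 0; i < steps; i++) {
    world.step(1.0 / 60.0);
  }
  watch.stop();
  // Guard against a zero reading on very fast runs.
  final micros =
      watch.elapsedMicroseconds == 0 ? 1 : watch.elapsedMicroseconds;
  return steps * 1e6 / micros;
}

void main() {
  final world = FakeWorld();
  // Assumed step counts, chosen only for illustration.
  for (final steps in [100, 1000, 10000]) {
    final rate = stepsPerSecond(world, steps).toStringAsFixed(0);
    print('$steps steps: $rate steps/s');
  }
}
```

Running each step count several times and discarding the first run would reduce the influence of VM warm-up on the reported figure.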

Results

ball_drop

This simple benchmark shows little difference in performance across the board.

ball_cage

This simple benchmark shows a performance regression for higher step counts. This may be a sign that some parts of the code haven’t been updated to use doubles so the VM is having to do extra work.

circle_stress

The first of the complex benchmarks with many collisions per frame shows a huge improvement across the board. The performance was always fairly consistent, but now it is consistently better.

domino_platform

The most complex benchmark (the one that exhibits the collision instabilities under vector_math) also shows huge improvement in performance.

Rationale

It is clear from the above results that these changes have made a significant positive difference to performance, but why? We’ll tackle each part of the change in turn.

doubles

The VM can work with doubles in an unboxed form, which makes computations very efficient. This does, however, break down when passing doubles to functions or returning them from methods, as this causes a box/unbox operation pair. Similarly, storing a non-smi number in a field, as the old math library was doing, causes a box operation, as fields can only hold smi (small integer) values or object pointers. That’s where typed lists come in.

Typed lists

Typed lists in Dart can only hold numbers, not regular objects. Further, unlike when working with integers, the double value does not need to be tagged. This means that storing doubles in and retrieving them from a typed list is very fast, and also very memory efficient.
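The difference between the two storage strategies can be sketched as follows. This is an illustration only, not the code from the Box2D port: the class names FieldVec2 and TypedVec2 and their members are hypothetical, and the sketch uses present-day Dart syntax with Float64List from dart:typed_data.

```dart
import 'dart:typed_data';

/// Field-backed vector: each component is stored in an object field,
/// which is the layout the old math library used.
class FieldVec2 {
  double x;
  double y;
  FieldVec2(this.x, this.y);
}

/// Typed-data-backed vector: both components live in one Float64List.
class TypedVec2 {
  final Float64List _storage = Float64List(2);

  TypedVec2(double x, double y) {
    _storage[0] = x;
    _storage[1] = y;
  }

  double get x => _storage[0];
  double get y => _storage[1];
  set x(double value) => _storage[0] = value;
  set y(double value) => _storage[1] = value;

  /// Adds [other] to this vector in place, avoiding a new allocation.
  void add(TypedVec2 other) {
    _storage[0] += other._storage[0];
    _storage[1] += other._storage[1];
  }

  /// Returns the dot product of this vector and [other].
  double dot(TypedVec2 other) =>
      _storage[0] * other._storage[0] + _storage[1] * other._storage[1];
}

void main() {
  final a = TypedVec2(1.0, 2.0);
  final b = TypedVec2(3.0, 4.0);
  a.add(b); // a is now (4, 6)
  print('${a.x}, ${a.y}');
  print(a.dot(b)); // 4 * 3 + 6 * 4 = 36
}
```

Keeping the getters and setters means calling code that reads or writes x and y does not need to change when the backing storage does.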

JavaScript

This is all well and good for the VM, but what happens when this code is converted to JavaScript, which is going to be the majority use-case at this point? JavaScript stores all numbers as double precision floating point numbers anyway, so there’s no conversion necessary there. Also, the typed lists map trivially to the native typed arrays in JavaScript which also means no further overhead.

Conclusion

If you’re doing any kind of numeric computation in Dart, read John’s article again and follow his advice. The tips he gives are not theoretical; they have a significant practical performance impact.

Embedding the Dart VM: Part One

The Dart VM can be run as a standalone tool from the command line, and is embedded in a branch of Chromium named Dartium, but all of the public instructions I could find only discuss how to extend the VM with native methods. That is definitely useful, but there are some applications where it would be useful to embed the VM directly in a native executable, for instance if a game wanted to use Dart as an alternative to Lua.

The instructions for adding native extensions recommend building the extension as a shared library that the VM can load. We need to flip that model around and build the VM as a static library that our native executable can link against. This, part one of a series of who-knows-how-many, will focus on getting the source and building a native executable with the VM embedded. We won’t expect to actually run a script just yet, but by the end of this post we will be loading and compiling a script.

Getting and Building the VM

The instructions for getting all of the Dart source (including the editor, VM, and runtime libraries) are here.

Once we have the source, we need to build the VM as a static library instead of the default. gyp is used to generate the Makefile, and it already includes a static library target for the VM. Sadly, make doesn’t seem to be fully supported on OS X, given the pile of compiler errors that I hit, but opening the Xcode project and building the dart-runtime project directly does work. I needed to tweak a few of the settings in the Xcode project, as it was set up to use GCC 4.2, which I don’t have, and I wanted the Debug build to be unoptimized for easy debugging, but eventually I had a few shiny static libs to play with.

Quite a bit is hidden in that ‘eventually’. There are a number of subtleties around linking in generated source files to get a snapshot buffer loaded that contains much of the core libraries. There’s also the issue of initializing the built-in libraries; the VM source contains some files that handle this, but it’s unclear which should be included and which shouldn’t. Similarly, some libraries are named with a _withcore suffix, and it’s not clear whether they are the ones that should be linked in or not.

Loading a script

To load a script we need an Isolate and some core libraries loaded to do things like resolve the path to the script, read the source, and compile it. Eventually, when we come to run, we’ll have to register some native methods, but for now just loading the core, io and uri libraries is a good start.

It took a while, but the end result is both exciting, and suggests the next step that’s required:

Dart Initialized
LoadScript: helloworld.dart
CreateIsolate: helloworld.dart, main, 1
Created isolate
Loaded builtin libraries
About to load helloworld.dart
LoadScript: helloworld.dart, 1
Script loaded into 0x100702590
Invoking 'main'
Invoke: main
Failed to invoke main: Unhandled exception:
UnsupportedOperationException: 'print' is not supported
#0 _unsupportedPrint._unsupportedPrint (dart:core-patch:2525:3)
#1 print (dart:core-patch:2521:16)
#2 main (file:////~/git/embed-dart-vm/helloworld.dart:4:8)
Done

There was one rather nice step where I was getting an error compiling the test Dart script. After spending a couple of hours debugging the code it turned out that I actually had a syntax error in the Dart script. I should have trusted my code after all!

Next time we’ll figure out those native extensions, built-in libraries, and get the script running.

Full source for this project can be found here and the version that this post describes is here.

Benchmarking DartBox2d

Joel Webber wrote this excellent blog post in which he tests native versions of Box2D against JavaScript implementations. Perhaps unsurprisingly, he discovered that native code is around 20 times faster than JavaScript.

Having just released DartBox2d, I was curious to see how Dart stacks up against these results. It should be noted that the Dart version has diverged a little from the original port to make it more Dart-like. My measurements didn’t show any significant performance change between the current version and the initial port.

I’m using the same test as Joel, taken from his GitHub repo, and have committed the Dart source used back into the tree, so you can check it out here. The JavaScript there, and used below, was generated using frog rather than dartc, as frog generates smaller, more readable output. The Dart VM does not currently support any references to dart:dom or dart:html, so running the tests in the VM required some massaging of the code: specifically, commenting out all of dartbox2d/callbacks/CanvasDraw.dart and removing all references to Canvas from Bench2d.dart.

All of the data and the graphs can be seen here.

JavaScript

First, JavaScript generated from Dart using frog vs hand-written Box2D-web JavaScript:

This is on a linear scale, unlike Joel’s graphs, as the difference between the traces is much smaller. However, the raw frame times are higher, which is probably due to the different machines we’re running on. The results, though, are still clear: Box2D-web runs at an average 104 ms/frame while the JS generated by frog from Dart is running at 135 ms/frame. There’s significant variation in both implementations (standard deviation is ~18 – 19 ms in both cases) which is either inherent in the simulation or indicates garbage collection running.
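For reference, the average and standard deviation quoted here are the usual summary statistics over the per-frame times. A minimal sketch of the calculation follows; the frame times in it are hypothetical values chosen to make the arithmetic easy to check, not the measured data, and the function names mean and stdDev are my own.

```dart
import 'dart:math' as math;

/// Arithmetic mean of a non-empty list of samples.
double mean(List<double> xs) => xs.reduce((a, b) => a + b) / xs.length;

/// Population standard deviation of a non-empty list of samples.
double stdDev(List<double> xs) {
  final m = mean(xs);
  final sumSq = xs.fold<double>(0.0, (acc, x) => acc + (x - m) * (x - m));
  return math.sqrt(sumSq / xs.length);
}

void main() {
  // Hypothetical frame times in milliseconds.
  final frames = <double>[100.0, 110.0, 90.0, 100.0];
  print(mean(frames)); // mean is 100
  print(stdDev(frames)); // sqrt(50), roughly 7.07

  // A single very slow frame inflates both statistics.
  final withSpike = <double>[...frames, 1000.0];
  print(mean(withSpike)); // mean is 280
  print(stdDev(withSpike)); // sqrt(129640), roughly 360.06
}
```

The second half of the sketch shows why a handful of slow frames, whatever their cause, can dominate the standard deviation even when most frames are consistent.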

Native

Given the difference Joel saw between the Java VM and JavaScript, with the Java VM running 10 times faster than JavaScript, it is tempting to compare Dart compiled to JavaScript with Dart running natively in a VM in Dartium.

There’s a massive 4800 ms frame that I had to cut off to see detail across all the samples. I think this is some part of the VM being initialized and blocking the process, but it’s hard to tell.

There are some other really interesting things to note here. Firstly, the VM performance improves over time, which is not something that I’ve seen in other tests. At its fastest, it’s also faster than both the generated JavaScript and the hand-written Box2D-web JavaScript. However, there is massive variance due to a periodic slowdown: it’s running at an average of 119 ms per frame, but the standard deviation is a massive 300 ms. I haven’t looked into the Dart VM, but I’m going to throw out a guess that this is some garbage collection kicking in every few frames.

Summary

Here are all three results together for comparison:

With a little optimization work in DartBox2d, and maybe a little work on the code generation in Dart, I think it’s possible for Dart-generated JavaScript to get close to the performance of hand-written JavaScript. However, it’s also clear that the Dart VM, even in its current state, has the potential to outperform both.