URL shortener in go

I recently combined all my various blogs into this one and then realised that, yet again, the length of my name is a pain. Especially when it comes to URLs. I also realised that I’d never tried to build a URL shortener, so I did. Full source available here.

There are a few tutorials around about writing one in PHP, but I decided to use Go. The basic idea is the same.

Database setup

The URL shortener uses a single table with the following structure:

Field Type Null Default Extra
id int(11) No None AUTO_INCREMENT
short varchar(8) No None
long varchar(64) No None
created timestamp No CURRENT_TIMESTAMP
accessed timestamp No 0000-00-00 00:00:00
access_count int(11) No 0

I think the fields are self-explanatory.

Code

The code itself is really simple. The idea is to handle every HTTP request and capture the path of the URI. Look the path up in the database, and return either a redirect or not found. I decided to use go-sql-driver for the mysql integration and gorilla mux for more advanced request routing.

func main() {
        flag.Parse()

        c, err := sql.Open("mysql", *user+":"+*pwd+"@("+*host+")/"+*db)
        if err != nil {
                log.Fatal("Failed to open sql dbConnection")
        }
        dbConn = c
        defer dbConn.Close()

        r := mux.NewRouter()
        r.HandleFunc("/{shorturl}", Handler).Methods("GET")
        http.Handle("/", r)

        log.Fatal(http.ListenAndServe(":"+*port, nil))
}

As you can see, all we do in main is open the connection to the database given the passed-in connection parameters, then set up a simple request routing table.

func Handler(w http.ResponseWriter, r *http.Request) {
        shorturl := mux.Vars(r)["shorturl"]                                                                                                               

        // lookup shorturl
        var longurl string
        var access_count int
        err := dbConn.QueryRow("select `long`,`access_count` from `url` where short=?", shorturl).Scan(&longurl, &access_count)

        switch {
        case err == sql.ErrNoRows:
                log.Printf("%q not found", shorturl)
                http.NotFound(w, r)

        case err != nil:
                log.Fatal(err)

        default:
                access_count++
                log.Printf("%q -> %q", shorturl, longurl)
                http.Redirect(w, r, longurl, http.StatusFound)
        }
}

The single handler is almost as straightforward. It queries the database to see if the short URL is registered and then either returns a redirect or not found, or fatals in the case of a SQL error. An extra feature that I might use later in a dashboard view is that it tracks the access count and last accessed time for each registered short URL.

We also don’t need to worry about little Bobby Tables as Go escapes the parameters for us.

Running

One final wrinkle is that I use Dreamhost for hosting and so can’t run a native Go server. However, with a little mod_rewrite trickery I can persuade Apache to forward requests to my server. Here’s the relevant lines of the .htaccess file:

RewriteEngine on 
RewriteCond %{SERVER_PORT} !^4242$ 
RewriteRule ^(.*) http://%{SERVER_NAME}:4242%{REQUEST_URI} [R=301,L]

This ensures that the given port isn’t already the one our server is listening on, and then redirects to the host with the write port number.

Next steps

Currently, short URLs have to be added manually through a SQL interface. Ideally, there’d be some UI for adding short URLs, and maybe even generating them through some hash function. Though checking for and handling hash collisions in a database is a pain.

All the small things

Someone, on checking out my github page, recently commented that I’m more of a breadth than a depth developer. I’m not sure that this is entirely accurate, though I didn’t take offence, but it is true that many of my personal projects are small one-off toys. This post is a tour of some of them.

dmanix

I didn’t study Computer Science at university1 and I sometimes feel like I missed out on some key programming projects like writing your own operating systems, compilers, and path tracers. I’ve been steadily remedying this and dmanix is my attempt at an operating system.

It only runs on 386 and has a scheduler, ramdisk, memory allocation, etc. The next step will be writing an ELF parser and compiling programs for it.

The only thing that separates it from other *nix OSes is that it’s written in more C++ than C.

lru-cache

One of the interview questions I ask on occasion2 is to implement an LRU cache in C or C++. The idea behind an LRU cache is that you have a fixed capacity cache in which you want to store data such that only the most recently used things are in it. In this case, I’m storing key/value pairs.

The specific reason why this is an interesting implementation is that it uses only operations that are average-case O(1). Ie, it should scale without significant performance reduction. This is managed by using both a list and an unordered_map; the unordered_map to store the key-value pairs for fast lookup, and the list to track how recently items are accessed.

buckets

There’s a well-known class of puzzles in which one is asked to fill a container using two other odd-sized containers. For example, fill a 5l container using a 3l and 4l container. I don’t enjoy these puzzles so I decided to write code to solve them for me.

It uses a depth-first graph search algorithm  to explore the solution space and then prints out the steps needed to solve the problem. There’s one issue which I should fix at some point: It might be better to use a breadth-first search as the solution space is relatively small and this would return the shortest solution.

queueserve

One of the things I’ve been doing recently is learning Go3. I felt that I needed to get to grips with the scalability and use for a web service, so I built a highly concurrent queue server and client.

It uses JSON over HTTP as a wire protocol and supports many, many concurrent clients thanks to Go’s built-in support for concurrent HTTP serving. The server is using lock-free queues to maximize concurrency.

evernote

Evernote have a series of coding challenges on their careers website. I enjoy a challenge so I tackled the problems. No-one ever contacted me so I don’t know if I did well, but my code passes all of the test cases and is relatively fast. All the other solutions posted around the web are in Python (that I found, anyway) so I thought having C++ solutions would be a neat change.

 

  1. Theoretical physics, if you were wondering
  2. and presumably can’t ask any more after posting this
  3. the language, not the game

gomud

I needed an excuse to learn go, and none of the usual candidate projects that I usually use to pick up a new language seemed to fit well. I was thinking back to how I started programming, and the transition from hobby programming to professional was largely thanks to working as a creator on Discworld MUD. It struck me that a MUD could be the perfect platform, given the support for concurrency and networking. And then I realised it would be even more interesting to write a MUD engine that others could use to develop MUDs. So I did: gomud.

Right now you can define rooms that have exits, players are persistent (though there are no passwords yet) and supported commands include ‘say’, ‘tell’, ‘me’, ‘shout’, ‘who’, ‘look’, and ‘go’.

Future plans include defining commands through data for easy expansion, add passwords for players, items, creatures, and then things like combat, skills, etc.

It turns out that it really is the perfect vehicle for learning go. The network support and concurrency in the language lend themselves really well to the core loop for a MUD, and the message passing works wonders for sending text to different connected players. Built in testing means that unit test coverage is strong, and it’s really easy to hack something up that works before iterating it into something more modular and well-structured.

If you can’t tell, I’m somewhat enamoured with both the project, and the language. I’m open to pull requests, so if you also want to get started with the language, or if you want to use it to build a MUD, please get in touch and help me build something neat.

Typed data for performance boost

After reading John McCutchan’s recent post on numeric computation in Dart, and a conversation with Srdjan Mitrovic from the Dart VM team, I made some changes to the Dart port of Box2D in the hope of improving performance. The specific commits that make up this change are here and here, and this change has been released to pub as version 0.1.6.

Originally, I had attempted to port the internal math library to John’s vector_math library. For one, this would mean that I no longer need to maintain a chunk of code, and for another it means any work he does to introduce SIMD into the library will automatically be available to me. However, this integration introduced some major instabilities into one of the collision solvers so I abandoned it.

Instead, I made two key changes to the code: Converted all num types to double, and converted the vector and matrix types to use typed data instead of double stored in fields. First, the results. All benchmark code can be found here and were run under SDK version 0.5.11.1_r23200. In all the below charts, the y-axis is the number of steps simulated per second, hence bigger is better.

Results

ball_drop

This simple benchmark shows little difference in performance across the board.

ball_cage

This simple benchmark shows a performance regression for higher step counts. This may be a sign that some parts of the code haven’t been updated to use doubles so the VM is having to do extra work.

circle_stress

The first of the complex benchmarks with many collisions per frame shows a huge improvement across the board. The performance was always fairly consistent, but now it is consistently better.

domino_platform

The most complex benchmark (the one that exhibits the collision instabilities under vector_math) also shows huge improvement in performance.

Rationale

It is clear from the above results that these changes have made a significant positive difference to performance, but why? We’ll tackle each part of the change in turn.

doubles

The VM can work with doubles in an unboxed form which makes computations very efficient. This does, however, break down when passing doubles to functions or returning them from methods as this causes a box/unbox operation pair. Similarly, storing a non-smi number in a field, as the old math library was doing, causes a box operation as fields can only hold smi values or object pointers. And that’s where typed lists come in.

Typed lists

Typed lists in Dart can only hold numbers, not regular objects. Further, unlike when working with integers, the double value does not need to be tagged. This means that the storing and retrieving doubles from a typed list is very fast, and also very memory efficient.

JavaScript

This is all well and good for the VM, but what happens when this code is converted to JavaScript, which is going to be the majority use-case at this point? JavaScript stores all numbers as double precision floating point numbers anyway, so there’s no conversion necessary there. Also, the typed lists map trivially to the native typed arrays in JavaScript which also means no further overhead.

Conclusion

If you’re doing any kind of numeric computation in Dart, read John’s article again and follow his advice. The tips he gives are not theoretical; they have a significant practical performance impact.

Getting started with DartBox2d part 2

This is an update to this post as dartbox2d has undergone some drastic changes since that was written over a year ago.

Getting the code

The first of the main differences is that dartbox2d is now a package hosted on pub. Getting the code is a matter of having a pubspec.yaml file in the root of your project that looks something like:

name: dartbox2d-tutorial
version: 0.0.1
author: Dominic Hamon <dominic@google.com>
homepage: http://dmadev.com
dependencies:
    box2d: ">=0.1.1"
    browser: any"

The dependencies section is the important bit. This lets pub know that this application depends on any version of box2d with a version number greater than 0.1.1. That version uses the vector_math package instead of a hand-rolled math library and runs against the latest (as of this writing) dart sdk r19447. It also depends on browser which we’ll need to pull in the script to allow the code to run both as dart and javascript.

Then, just call $DART_SDK/bin/pub install and the box2d package will be installed. If any errors about conflicting dependencies are printed, please let me know.

The HTML page

<html>
  <body>
    <script type="application/dart" src="tutorial.dart"></script>
    <script src="packages/browser/dart.js"></script>
  </body>
</html>

The Dart code

At the top of the dart file, you would now import the libraries you need:

library dartbox2d_tutorial;
import 'dart:html';
import 'package:box2d/box2d.dart';

Note that the box2d package actually contains two libraries; box2d.dart for use with the VM and box2d_browser.dart for use in the browser. The only difference is that the latter enables debug drawing functions using dart:html. If you’re planning on running through DartEditor, you probably want box2d_browser.

The rest of the tutorial works almost as is, though the new math library means that initializeWorld should look like:

void initializeWorld() {
    // Create a world with gravity and allow it to sleep.
    world = new World(new vec2(0, -10), true, new DefaultWorldPool());

    // Create the ground.
    PolygonShape sd = new PolygonShape();
    sd.setAsBox(50.0, 0.4);

    BodyDef bd = new BodyDef();
    bd.position.setCoords(0.0, 0.0);
    Body ground = world.createBody(bd);
    ground.createFixtureFromShape(sd);

    // Create a bouncing ball.
    final bouncingBall = new CircleShape();
    bouncingBall.radius = BALL_RADIUS;

    final ballFixtureDef = new FixtureDef();
    ballFixtureDef.restitution = 0.7;
    ballFixtureDef.density = 0.05;
    ballFixtureDef.shape = bouncingBall;

    final ballBodyDef = new BodyDef();
    ballBodyDef.linearVelocity = new vec2(-2, -20);
    ballBodyDef.position = new vec2(15, 15);
    ballBodyDef.type = BodyType.DYNAMIC;
    ballBodyDef.bullet = true;

    final ballBody = world.createBody(ballBodyDef);
    ballBody.createFixture(ballFixtureDef);
}

And the switch to dart:html leaves initializeCanvas looking like:

CanvasElement canvas;
CanvasRenderingContext2D ctx;
ViewportTransform viewport;
DebugDraw debugDraw

void initializeCanvas() {
    // Create a canvas and get the 2d context.
    canvas = new CanvasElement(width:CANVAS_WIDTH, height:CANVAS_HEIGHT);
    document.body.append(canvas);
    ctx = canvas.getContext("2d");

    // Create the viewport transform with the center at extents.
    final extents = new Vector(CANVAS_WIDTH / 2, CANVAS_HEIGHT / 2);
    viewport = new CanvasViewportTransform(extents, extents);
    viewport.scale = VIEWPORT_SCALE;

    // Create our canvas drawing tool to give to the world.
    debugDraw = new CanvasDraw(viewport, ctx);

    // Have the world draw itself for debugging purposes.
    world.debugDraw = debugDraw;
}

and run as:

void run() {
    window.animationFrame.then((time) => step());
}

void step() {
    world.step(1/60, 10, 10);
    ctx.clearRect(0, 0, CANVAS_WIDTH, CANVAS_HEIGHT);
    world.drawDebugData();
    run();
}

Also, though the frog and dartc compilers have been replaced by dart2js so this is compiled to javascript using:

$ $DART_SDK/bin/dart2js tutorial.dart -otutorial.dart.js

There are a host of examples in the demos folder of the DartBox2d source here.

Visualizing M-Lab data with BigQuery: Part Two

When I wrote Part One, I wasn’t aware that it was going to need a Part Two. However, I was unsatisfied with the JavaScript version and the limitations incurred by trying to build a live app. I also wanted to make movies to show the time series of the data. As such, I rewrote the code in Python using the authorization technique for installed apps, and you can find that here.

I won’t go through the code in depth, because it’s fairly explanatory. It authorizes with, and gets, a BigQuery service then loops through the tables in the m_lab datastore running a query and plotting points on a map. However, the results of running it are much more satisfactory.


minrtt_map

If you have any questions, feel free to comment here or contact me some other way.

Visualizing M-Lab data with BigQuery

I recently moved to a new group at Google: M-Lab. The Measurement Lab is a cross-company supported platform for researchers to run network performance experiments against. Every experiment running on M-Lab is open source, and all of the data is also open; stored in Google Cloud Storage and BigQuery.

One of the great things about having all that data open and available in BigQuery is that anyone can come along and find ways to visualize it. There’s a few examples in the Public Data Explorer, but I was feeling inspired 1 and wanted to know what it was like for someone coming to M-Lab data fresh.

I decided to build something using HTML5 canvas and JavaScript so those are the versions of the API that I’m using. However, there’s broad language support for the APIs.

OAuth2 authentication

This step actually took the longest of the whole process. There are a number of options when it comes to which type of key you need, and how you go about getting one, and the documentation is thorough but not particularly clear. Essentially you need to have view rights for the measurement-lab BigQuery project, and a Client ID created for you by one of the owners. You can ignore any documentation that talks about billing, unless you import the data into your own BigQuery project before running queries against it. However, there’s no need to do that, as a simple email to someone at M-Lab 2 you can get a key for your app and view access. This will allow you to run queries at no cost to you.

Once you have a key, it’s time to authorize:

<html>
  <head>
    <script src="https://apis.google.com/js/client.js"></script>
    <script
        src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js">
    </script>
    <script>
      var project_id = 'measurement-lab';
      var client_id = 'XXXXXXXXXXX';
      var config = {
        'client_id': client_id,
        'scope': 'https://www.googleapis.com/auth/bigquery.readonly'
      };
      function auth() {
        gapi.auth.authorize(config, function() {
            $('#status').html('Authorized..');
            gapi.client.load('bigquery', 'v2', function() {
                $('#status').html('BigQuery client initiated.');
                $('#auth_button').fadeOut();
            });
        });
      }

      function initialize() {
        if (typeof gapi === 'undefined')
          $('#status').html('API not available. Are you connected to the internet?');
        $('#status').html('Client API loaded.');
        $('#auth_button').fadeIn();
      }

  </head>
  <body onload="initialize();">
    <div id="status">Waiting for client API to load</div></br>
    <button id="auth_button" style="display:none;" onclick="auth();">
      Authorize
    </button>
  </body>
</html>

With this, you should be able to hit the Authorize button, enter a Google account (required for tracking by BigQuery, I believe) and load the BigQuery API library. The Google account you use will have to have accepted the BigQuery terms of service, which involves logging in to the BigQuery site and clicking through the login.

Note, this is a bit of a pain for a multiple-user web application. However, there is the option to set up a server-to-server authorization flow which removes this difficulty. Similarly, I believe native installed applications have a different route for authorization but I haven’t looked into it yet.

Your first query

For the purposes of this post, I wanted to get every distinct test that had been run in a month and plot a point at the latitude and longitude of the client’s location. By making the plotted pixel semi-transparent I could use additive blending to make things glow a bit, and easily see areas where multiple tests had run.

I’ll assume you know how to add a canvas to the page and draw to pixels. I’ll focus instead on actually running the query.

      function runQuery() {
        var query =
            'SELECT project,log_time,' +
                   'connection_spec.client_geolocation.latitude,' +
                   'connection_spec.client_geolocation.longitude,' +
                   'web100_log_entry.is_last_entry ' +
                'FROM m_lab.2012_10 ' +
                'WHERE log_time > 0 AND ' +
                      'connection_spec.client_geolocation.latitude != 0.0 AND ' +
                      'connection_spec.client_geolocation.longitude != 0.0 AND ' +
                      'web100_log_entry.is_last_entry == true ' +
                'ORDER BY log_time';
        var request = gapi.client.bigquery.jobs.query({
            'projectId': project_id,
            'timeoutMs': '45000',
            'query': query
        });
        $('#status').html('Running query...');
        var startTime = new Date().getTime();
        request.execute(function(response) {
            var duration = (new Date().getTime()) - startTime;
            $('#status').html('Query complete in ' + duration / 1000 + ' s.');

            ... <clear canvas> ...

            $.each(response.result.rows, function(i, item) {
                var latitude = parseFloat(item.f[3].v);
                var longitude = parseFloat(item.f[4].v);

                // Normalize to canvas
                var canvas_x = w * (longitude / 360.0 + 1 / 2);
                var canvas_y = h * (latitude / 180.0 + 1 / 2);

                var canvas_x_int = Math.floor(canvas_x);
                var canvas_y_int = Math.floor(canvas_y);
                // Flip y for canvas
                var i = (canvas_x_int + (h - canvas_y_int) * w) * 4;

                ... <draw pixel i> ...
            });
        });
      }

There’s a few things to note here, but the main one is that this is a synchronous call and will timeout and return no results if it takes longer than 45 seconds. It will also return a maximum of 16000 rows. There are ways to remove both of these restrictions which we’ll get to later.

Also, the key to getting values from each row is in the item.f[3].v lines. The index there comes from the order of fields requested in the SELECT statement.

Another thing to note is that longitude and latitude could be projected using Mercator or Equirectangular projections.

Removing the restrictions

Both the synchronicity of the call and the timeout can be solved by polling for the query to complete using the jobs.getQueryresults method.

With this change, the code above looks something like:

      function runQuery() {
        startIndex = 0;
        var query =
            'SELECT log_time,' +
                    getColorField() + ',' +
                   'connection_spec.client_geolocation.latitude,' +
                   'connection_spec.client_geolocation.longitude,' +
                   'web100_log_entry.is_last_entry ' +
                'FROM m_lab.2012_10 ' +
                'WHERE log_time > 0 AND ' +
                      'connection_spec.client_geolocation.latitude != 0.0 AND ' +
                      'connection_spec.client_geolocation.longitude != 0.0 AND ' +
                      'web100_log_entry.is_last_entry == true ' +
                'ORDER BY log_time LIMIT ' + maxPoints; 
        var request = gapi.client.bigquery.jobs.query({
            'projectId': project_id,
            'timeoutMs': pollTime,
            'query': query
        });

        $('#status').html('Running query');

        ... <clear canvas> ...

        startTime = new Date().getTime();
        request.execute(pollForResults);
      }

      function pollForResults(response) {
        if (response.jobComplete === true) {
          if (startIndex == 0)
            $('#status').html('Receiving data');
          startIndex += response.rows.length;
          plotResponse(response);
        }
        if (startIndex == parseInt(response.totalRows)) {
          var duration = (new Date().getTime()) - startTime;
          $('#status').html('Query complete in ' + duration / 1000 + ' s. ' +
                            response.totalRows + ' points plotted.');
          return;
        }
        if (typeof response.error === 'undefined') {
          request = gapi.client.bigquery.jobs.getQueryResults({
              'jobId': response.jobReference.jobId,
              'projectId': response.jobReference.projectId,
              'timeoutMs': pollTime,
              'startIndex': startIndex
          });
          request.execute(pollForResults);
        } else {
          $('#status').html('ERROR: ' + response.error.message);
        }
      }

As you can see, instead of having a long timeout, we have a short one. When the callback fires, if the job is not complete, we poll again with the same short timeout. This continues until the job succeeds (or an error is returned). Note, this means that even jobs that apparently timeout can remain running and accessible from the API. If you want to cancel a job you need to delete it using the API.

So what do you get when you run this for 1 million points?

The points are a little fuzzy as I was using CSS to scale up the canvas for a cheap blur effect.

What else?

This is pretty enough, but the M-Lab data includes some terrific data regarding the innards of TCP states on client and server machines throughout the tests that are run. This is thanks to the Web100 kernel patches that run on M-Lab server slices. With those, it would be possible to map out areas where congestion signals are more common, or the distribution of receiver window settings. Or try to find correlations between RTT and the many available fields in the schema.

As another simple example, by plotting short RTT in blue, medium RTT in green, and long RTT in red (and removing the blur), you get something like:

If you look at the full-res version, you can see the clusters of red pixels across India and South East Asia.

This is immediately useful data: Given the number of tests that are run in the area (the density of points), and the long RTT we’re seeing from there, it would make sense to add a few servers in those countries to ensure the data we have on throughput and congestion for that area is not being skewed by long RTT. Similarly, we can feel good about our coverage across Europe and North America, though the less impressive RTT in Canada should be investigated.

More?

Almost certainly, but I’m out of time on this little weekend project hacked together between jaunts around Boston. Someone smarter than me can probably combine the fields in the m_lab table schema in ways I haven’t considered and draw out interesting information. Similarly, the live version 3 could support zooming and panning of the map, and more flexibility in setting the query from the UI.

Lastly, if you want to start playing around with >630TB of network performance data, let me know and I’ll see what I can do.

  1. Mostly by this post from Facebook
  2. Hint: Try dominic at measurementlab dot net
  3. I did mention that there’s a live version right?

Benchmarking DartBox2d

Joel Webber wrote this excellent blog post in which he tests native versions of Box2D against Javascript implementations. Perhaps unsurprisingly, he discovered that native code is around 20 times faster than JavaScript.

Having just released DartBox2d, I was curious to see how Dart stacks up against these results. It should be noted that the Dart version has diverged a little from the original port to make it more Dart-like. My measurements didn’t show any significant performance change between the current version and the initial port.

I’m using the same test as Joel, taken from his github repo, and have committed the Dart source used back into the tree, so you can check it out here. The JavaScript there, and used below, was generated using frog rather than dartc as it generates smaller, more readable output. The Dart VM does not currently support any references to dart:dom or dart:html so running those required some massaging of the code. Specifically, commenting out all of dartbox2d/callbacks/CanvasDraw.dart and removing all references to Canvas from Bench2d.dart.

All of the data and the graphs can be seen here.

JavaScript

First, JavaScript generated from Dart using frog vs hand-written Box2D-web JavaScript:

 

 

This is on a linear scale, unlike Joel’s graphs, as the difference between the traces is much smaller. However, the raw frame times are higher, which is probably due to the different machines we’re running on. The results, though, are still clear: Box2D-web runs at an average 104 ms/frame while the JS generated by frog from Dart is running at 135 ms/frame. There’s significant variation in both implementations (standard deviation is ~18 – 19 ms in both cases) which is either inherent in the simulation or indicates garbage collection running.

Native

Given the difference Joel saw between the Java VM and Javascript, with the Java VM running 10 times faster than JavaScript, it is tempting to compare Dart compiled to JavaScript with Dart running natively in a VM in Dartium.

There’s a massive 4800 ms frame that I had to cut off to see detail across all the samples. I think this is some part of the VM being initialized and blocking the process, but it’s hard to tell.

 

 

There’s some other really interesting things to note here. Firstly, the VM performance improves over time, which is not something that I’ve seen in other tests. It’s also faster than the generated JavaScript and the hand-written Box2D-Web JavaScript at it’s fastest, however there is massive variance due to a periodic slowdown. It’s running at an average 119 ms per frame but the standard deviation is a massive 300 ms. I haven’t looked into the Dart VM but I’m going to throw out a guess that this is some garbage collection kicking in every few frames.

Summary

Here are all three results together for comparison:

 

With a little optimization work in DartBox2d, and maybe a little work on the code generation in Dart, I think it’s possible to get Dart-generated-JavaScript to get close to the performance of hand-written JavaScript. However, it’s also clear that the Dart VM, even in its current state, has the potential to outperform both.

 

Getting started with DartBox2d

This post is deprecated. Please see this post for the latest version.

The latest Dart library to be released is one that might see a fair bit of use, if the Java and JavaScript versions are anything to go by. DartBox2d is the latest port of the immensely popular 2d physics engine seen in games across the web. It has a very similar interface to the Java version, so getting started shouldn’t be too hard for anyone familiar with that version.

That being said, here’s how to put together a simple application.

Getting the library

Go here and follow the instructions to get a local copy of the code.

The HTML page

Next, you need an HTML page to host the application. A simple boilerplate would look something like:

<html>
  <body>
    <script type="text/javascript" src="tutorial.dart.js"></script>
  </body>
</html>

The Dart code

At the top of the dart file, you need to name your application and import any libraries you want to use:

#library('tutorial');
#import('dart:dom');
#import('[path_to_dartbox2d]/lib/box2d.dart');

Now create a class for your application and a simple main method:

class Tutorial {
  ...
}

void main() {
  Tutorial.main();
}

This simple main method is what will be called when your application starts. It is calling a static method on your class that should now look something like:

class Tutorial {
  static void main() {
    final app = new Tutorial();
    app.run();
  }

  Tutorial() {
    initializeWorld();
    initializeCanvas();
  }
}

All we have left to do is to define the two initialization methods and the run method. First, let’s initialize the world:

class Tutorial {
  ...

  static final int BALL_RADIUS = 1;

  World world;

  void initializeWorld() {
    // Create a world with gravity and allow it to sleep.
    world = new World(new Vector(0, -10), true, new DefaultWorldPool());

    // Create the ground.
    PolygonShape sd = new PolygonShape();
    sd.setAsBox(50.0, 0.4);
      
    BodyDef bd = new BodyDef();
    bd.position.setCoords(0.0, 0.0);
    Body ground = world.createBody(bd);
    ground.createFixtureFromShape(sd);

    // Create a bouncing ball.
    final bouncingBall = new CircleShape();
    bouncingBall.radius = BALL_RADIUS;

    final ballFixtureDef = new FixtureDef();
    ballFixtureDef.restitution = 0.7;
    ballFixtureDef.density = 0.05;
    ballFixtureDef.shape = bouncingBall;

    final ballBodyDef = new BodyDef();
    ballBodyDef.linearVelocity = new Vector(-2, -20);
    ballBodyDef.position = new Vector(15, 15);
    ballBodyDef.type = BodyType.DYNAMIC;
    ballBodyDef.bullet = true;

    final ballBody = world.createBody(ballBodyDef);
    ballBody.createFixture(ballFixtureDef);
  }
}

And now we’re ready to initialize the canvas:

class Tutorial {
  ...
  static final int CANVAS_WIDTH = 900;
  static final int CANVAS_HEIGHT = 600;
  static final int VIEWPORT_SCALE = 10;

  HTMLCanvasElement canvas;
  CanvasRenderingContext2D ctx;
  IViewportTransform viewport;
  DebugDraw debugDraw;

  void initializeCanvas() {
    // Create a canvas and get the 2d context.
    canvas = document.createElement('canvas');
    canvas.width = CANVAS_WIDTH;
    canvas.height = CANVAS_HEIGHT;
    document.body.appendChild(canvas);
    ctx = canvas.getContext("2d");

    // Create the viewport transform with the center at extents.
    final extents = new Vector(CANVAS_WIDTH / 2, CANVAS_HEIGHT / 2);
    viewport = new CanvasViewportTransform(extents, extents);
    viewport.scale = VIEWPORT_SCALE;

    // Create our canvas drawing tool to give to the world.
    debugDraw = new CanvasDraw(viewport, ctx);

    // Have the world draw itself for debugging purposes.
    world.debugDraw = debugDraw;
  }
}

Now, all that’s left is to start the world running:

class Tutorial {
  ...
  static final num TIME_STEP = 1/60;
  static final int VELOCITY_ITERATIONS = 10;
  static final int POSITION_ITERATIONS = 10;

  void run() {
    window.webkitRequestAnimationFrame((num time) {
      step(time);
    }, canvas);
  }

  void step(num time) {
    // Advance the world.
    world.step(TIME_STEP, VELOCITY_ITERATIONS, POSITION_ITERATIONS);

    // Clear the canvas and draw the frame.
    ctx.clearRect(0, 0, CANVAS_WIDTH, CANVAS_HEIGHT);
    world.drawDebugData();

    // Request another animation frame.
    window.webkitRequestAnimationFrame((num t) {
      step(t);
    }, canvas);
  }
}

Once you’ve put all this together, you’re ready to compile it to JavaScript.

I prefer the command-line frog compiler, but DartBox2d also compiles cleanly with dartc with warnings as errors and fatal type errors enabled, so feel free to use either. You can also use the Dart Editor to build your application, and that’s almost certainly the best route to take. Refer to the Dart language site for more details.

Once you’ve compiled to JavaScript, make sure that your html file is referencing the generated file correctly and load it up in a browser. If all went well, you should see a green box with a peach ball bouncing on it.

Porting Colossal Cave Adventure to Native Client

Native Client (NaCl) is a new technology built in to Chrome that allows native code (read C++) to be compiled into a form that is then executed within the browser. Several impressive ports have already been completed, including ScummVM, OGRE, and Unity 3D. A host of other open source libraries have also been ported and are available in the naclports repository.

I am really excited about what this could mean for the future of the web as a platform for technology; even old crusty low-level coders like me can get in on this internet thing all the young people are talking about. I had the opportunity recently to take part in a hackathon centred around Native Client and took the opportunity to port a game that should need no introduction: Colossal Cave Adventure.

For those who just want to try it out, you’ll need Chrome 14 as a minimum and you can find it in the Chrome web store right here. If you just want to see the source, which is open of course, you’ll find that here. Please understand that this was written over the course of a few hours so it is not necessarily as elegant a solution as it might be given more time. Perhaps I will return to it and clean it up later.

There were a number of hurdles I had to overcome to complete this port and it might be interesting to others so this post will cover the issues and my approaches to them.

There were three key components to getting the code to work under Native Client:

  • Static data
  • stdin/stdout
  • Synchronous calls

Static data

The problem

The original source contains a utility advent0 that takes the text messages from a set of .txt files and creates look-up tables that match a message with a byte offset. At run-time, these look-up tables are used to seek through the .txt files and read the message out using fseek and fread. This is not feasible under Native Client as this interface to file I/O is not (yet?) supported. Other ports have successfully used forms to allow users to upload files that are required, but that also is not applicable to this app.

The solution

A new utility was created (advtxt_to_c) that converts the .txt files into arrays of c-style strings. These are then statically linked into the executable so that instead of using byte offsets, the run-time can access the message array directly. This does increase the final executable size, but only by about 64kb. For reference, the total executable size is on the order of 4Mb, most of which is library code.

stdin/stdout

The problem

The original source obviously relies heavily on printffputc, etc to write to stdout. Under Native Client these will end up in the console logs rather than appearing on screen. Using fgets and scanf to get input from the user are equally incorrect.

The solution

The solution to this was to write a thin wrapper around any console output. At this point, I also split the output into three:

  • screen printing
  • console printing
  • error printing

All of these functions call through to the Native Client module which then calls through to JavaScript methods.

Screen printing is used whenever the output should go to the, well, screen and in this case is appended to a textarea on the host HTML page. Console output is redirected to the JavaScript console where they can be read using the Developer Tools built in to Chrome, and error messages are added to a special span on the host HTML page that shows up red and fades over time. The game is also restarted whenever an error is produced, which is a far better approach than the original exit(-1).

Input from stdin was fixed by adding a method to the Native Client module that would call a callback when an ‘input’ message was passed from the JavaScript with a string parameter that comes from a text box on the HTML host page. This required some further changes as an synchronous call becomes an asynchronous call. See the next section for more on that.

Synchronous calls

The problem

As well as console output, the original source depended on fgets for reading user input. This is a synchronous call which has no analogue on the Native Client side. In fact, the whole game was originally written with an assumption that the game loop would block waiting for user input which, well, doesn’t work out all that well.

The solution

As mentioned above, fgets has been replaced by a call to the Native Client module that registers a callback to be called when the user submits text through the input control on the HTML page. The places that called fgets and assumed a synchronous return therefore had to be split into two functions. In some cases, they had to take in function pointers to call after the input had been received.

The main gameloop, that was essentially while (true) { turn(); }, has been replaced by a single call to the turn function that contains a single tick of the gameloop. When input is expected and received, a callback is called that once again calls the turn function.