Profiling and Optimizing the Javascript B+ Tree

In an earlier post, I built a B+ tree in Javascript. This post is about improving its performance through profiling and targeted optimization. You can grab the baseline code that is going to be optimized if you want to make the changes and run the demo in your own browser.

First of all, launch the developer tools in Chromium (or Chrome if you have that installed). I’m using version 15, but earlier versions probably have the same basic functionality. Press Ctrl-Shift-I to launch the tools, then select the Profiles tab. To start with, click the record button at the bottom, and then load the demo page. Here’s what the baseline output looks like for me:

The demo run inserts 500,000 entities, then requests removal of 500,000 random keys, and then inserts 500,000 more random entities. Note that the removal pass will not remove all the entities inserted in the initial pass, since the random keys won’t all be present in the tree. The baseline numbers are 9820ms/4463ms/9409ms.
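
For reference, the demo driver is essentially a series of timed loops along these lines (a minimal sketch: the BPlusTree class name, the insert/remove signatures, and the key range are my assumptions based on the earlier post, not code lifted from it):

var tree = new BPlusTree();
var N = 500000;

// Pass 1: insert N entities under random keys.
var t0 = Date.now();
for (var i = 0; i < N; i++) {
	tree.insert(Math.floor(Math.random() * N * 10), {id: i});
}
console.log('insert pass: ' + (Date.now() - t0) + 'ms');

// Pass 2: request removal of N random keys; since the keys are
// random, some removals will find nothing to remove.
t0 = Date.now();
for (var j = 0; j < N; j++) {
	tree.remove(Math.floor(Math.random() * N * 10));
}
console.log('removal pass: ' + (Date.now() - t0) + 'ms');

// Pass 3 repeats pass 1 with fresh random entities.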

My first guess at a possible performance bottleneck was that the leaf nodes maintain their keys in a simple sorted array, so insertions and removals have to shift elements and are potentially expensive. The pattern at issue looks something like this (a sketch of the general idea, not the post’s actual code):
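
// Insertion into a sorted array is O(n) in the leaf size:
// everything after insertionIndex shifts right by one slot.
data.splice(insertionIndex, 0, newEntry);

// Removal has the same linear cost in the other direction.
data.splice(removalIndex, 1);

Is this true? Here are the baseline profile numbers: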

What this is saying is that jQuery’s extend function accounts for over 14% of the CPU utilization, counting only the time spent in the function itself and excluding time spent in the functions it calls. That makes it the single most expensive function in terms of direct CPU use. It’s called most often in the constructors of internal and leaf nodes, and as the code stands as of this writing, two of those calls are simple pass-throughs:

...
init:function(options) {
	// $.extend here deep-copies options into an empty object:
	// a pure pass-through that does no real work.
	options = $.extend(true, {}, options);
	this._super(options);
	var thiz = this;
},
...

So eliminate those and rerun the demo. The insertion/removal/insertion numbers are now 7228ms/4442ms/7635ms.
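
After the change, the two pass-through constructors reduce to something like this (a sketch; the exact bodies depend on what else each constructor does):

init:function(options) {
	// Nothing to merge in, so just forward the options untouched.
	this._super(options);
},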

Now the extend function is no longer the top item. Instead it’s something called (program). This is essentially the CPU contribution of Chromium itself, and there isn’t much you can do to speed that up. But close behind it are the findIndex methods of the BPlusTreeInternalNode and BPlusTreeLeafNode classes. These now account for almost 30% of the CPU use, and the profiler indicates this is almost entirely a direct contribution of code in the functions themselves.

These functions are essentially binary searches, so the obvious place to look first is the code inside their loops. The first thing that comes to mind is to replace repeated deep property references with cached local variables.

// Don't do it like that...
for (...) {
	this._private.data[mid].key
}
// ...do it like this:
var data = this._private.data;
for (...) {
	data[mid].key
}

Doing just this in the two findIndex methods shaves another 5% off the total execution time. As a point of comparison, making the same change in the insert or split methods has no effect, because there the deep access doesn’t occur inside a loop: the cost of dereferencing once and assigning to a local variable is about the same as dereferencing the value directly a few times.
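
Put together, the hot path now looks roughly like this, with the comparisons still in their original order (a sketch of the binary search with the cached local; the loop bookkeeping and the -1 not-found return are my assumptions, not the post’s exact code):

findIndex:function(key) {
	// Cache the deep reference once, outside the loop.
	var data = this._private.data;
	var left = 0, right = data.length, mid, found = false;
	while (left < right && !found) {
		mid = (left + right) >> 1;
		if (data[mid].key===key) {
			found = true;
		}
		else if (data[mid].key<key) {
			left = mid+1;
		}
		else {
			right = mid;
		}
	}
	return found ? mid : -1;
},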

Now it’s worthwhile to take a second look at the original offender: jQuery’s extend function. It’s not as trivial to remove its use from the root node class. However, since no external code will call into the node constructors, it’s safe to make an assumption about what constructor options may be passed in and optimize based on that. So the BPlusTreeNode class constructor can be rewritten like this:

init:function(options) {
	// Fixed defaults, set directly; no generic $.extend merge needed
	// since no external code constructs nodes.
	this._private = {};
	this._private.order = 100;
	this._private.mergeThreshold = 40;
	this._private.data = [];
},

And the constructors for the node subclasses can be rewritten likewise, as sketched below. Doing this has a big impact, shaving another 30% off the execution times. jQuery’s extend is now not an issue at all, and the aggregate performance gain relative to the baseline is 50%. At this point, the performance numbers are 3647ms/3737ms/4740ms.
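
The subclass constructors follow the same pattern (a hypothetical sketch; the leaf-specific field shown is my own illustration, not the post’s code):

// e.g. in BPlusTreeLeafNode
init:function(options) {
	this._super(options);
	// Set any subclass-specific state directly rather than via $.extend.
	this._private.isLeaf = true; // hypothetical field
},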

The optimization possibilities for the findIndex methods are narrowing. A minor gain can be achieved by changing the order of the conditional inside the loop to the following:

if (data[mid].key<key) {
	left = mid+1;
}
else if (data[mid].key>key) {
	right = mid;
}
else {
	found = true;
}

This reduces the run time by another couple of percent, because the first two branches of the conditional are taken far more often than the third: a search hits at most one exact match, so nearly every comparison along the way is an inequality. Times are 3383ms/3574ms/4517ms.

So far all the optimizations have been rather shallow and local in nature. It’s a good sampling of basic Javascript performance considerations, and the changes made have had a significant impact. In a future post, it may be interesting to explore optimization of the algorithm itself, potentially taking into account domain specific knowledge of how it may be used.
