Structured Data Management for AJAX Applications, Part 2

Part 1 of this series introduced Stache, a Javascript library for structured data management in AJAX applications. Part 2 describes how to construct data schemas and work with associations, indexing, and constraints. To run the code samples below, you’ll need to include two Javascript files in addition to stache.js: common.js and bplus.js.

As an example scenario, consider a social networking app, which has users, profiles, posts, and ratings. These can be modeled as a Stache schema like this:

var db = new com.anvesaka.stache.Stache({
	id:"db"
});
db.addTableSpec({
	name:"com.anvesaka.stache.user",
	schema:{
		username:{
			type:"string",
			index:true
		},
		posts:{
			type:"set",
			association:"com.anvesaka.stache.post",
			inverse:"author"
		},
		ratings:{
			type:"set",
			association:"com.anvesaka.stache.rating",
			inverse:"rater"
		},
		follows:{
			type:"set",
			association:"com.anvesaka.stache.user"
		},
		friends:{
			type:"set",
			association:"com.anvesaka.stache.user",
			inverse:"friends"
		},
		profile:{
			type:"object",
			association:"com.anvesaka.stache.profile",
			inverse:"user"
		}
	}
});
db.addTableSpec({
	name:"com.anvesaka.stache.profile",
	schema:{
		user:{
			type:"object",
			association:"com.anvesaka.stache.user",
			inverse:"profile"
		},
		name:{
			type:"string"
		},
		email:{
			type:"string"
		}
	}
});
db.addTableSpec({
	name:"com.anvesaka.stache.post",
	schema:{
		title:{
			type:"string",
			index:true,
			unique:false
		},
		content:{
			type:"string",
			unique:false
		},
		author:{
			type:"object",
			association:"com.anvesaka.stache.user",
			index:true,
			unique:false,
			inverse:"posts"
		},
		ratings:{
			type:"set",
			association:"com.anvesaka.stache.rating",
			inverse:"post"
		}
	}
});
db.addTableSpec({
	name:"com.anvesaka.stache.rating",
	schema:{
		value:{
			type:"number",
			index:true,
			unique:false
		},
		post:{
			type:"object",
			association:"com.anvesaka.stache.post",
			index:true,
			unique:false,
			inverse:"ratings"
		},
		rater:{
			type:"object",
			association:"com.anvesaka.stache.user",
			index:true,
			unique:false,
			inverse:"ratings"
		}
	}
});
// Activate the database.
db.activate();

There’s a lot to digest there. The first few lines set up a Stache instance to use as a local “database”. Table specifications are then added to the database, and finally, it’s activated, which causes the table specifications to be transformed into actual tables, with associations, indexes, and constraints as specified in the schema.

Every table specification consists of a name and a schema section. The name should be unique within the scope of the Stache instance to which this table spec belongs.

The schema section is an object whose property names correspond to the property names of the objects that will be stored in the table. The property values of the schema object describe the characteristics of the corresponding properties in the stored objects. So for the schema above, a rating object sent back from the server would contain at least the following:

var rating = {
	value:...,
	post:{
		...
	},
	rater:{
		...
	}
};

For each property, a number of configuration options can be specified:

  • type [object|set|list|number|string|boolean] – string value describing the type of data expected as a value for this property
  • index [true|false] – boolean value describing whether or not the property should be indexed, making it queryable
  • unique [true|false] – boolean value describing whether or not the property must contain a unique value
  • association – a table name, indicating that this property forms an association to entities in the corresponding table
  • inverse – a property name on an associated table, indicating that the association reference should be synchronized during calls to mutator methods

Additionally, each table is automatically augmented with an “id” property, which is a unique, indexed, non-associated, numeric property, and serves as a primary key for the table.
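
As a small illustration, here’s a hypothetical table spec that isn’t part of the schema above; it only declares the properties it cares about, sets index and unique explicitly so no defaults are assumed, and never mentions id:

db.addTableSpec({
	name:"com.anvesaka.stache.tag",
	schema:{
		// Indexed and unique: lookups by label are fast, and duplicate labels are rejected.
		label:{
			type:"string",
			index:true,
			unique:true
		},
		// A plain string property with no index and no uniqueness constraint.
		description:{
			type:"string",
			index:false,
			unique:false
		}
	}
});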

Now, with this schema defined and the Stache instance activated, we can work with actual data. As with the prior post, there is no actual server; data is created inline so the discussion can focus on how to use Stache. Each example code block below is cumulative. First, define some variables we’ll use throughout the examples:

var userTable = db.getTable("com.anvesaka.stache.user");
var profileTable = db.getTable("com.anvesaka.stache.profile");
var postTable = db.getTable("com.anvesaka.stache.post");
var ratingTable = db.getTable("com.anvesaka.stache.rating");

var user;
var profile;
var post;
var rating;

Simple object creation:

var user1 = {
	id:1,
	username:"sbasu"
};
userTable.merge(user1);
user = userTable.get(["ID", 1]);
com.anvesaka.common.assert(user.getUsername()=="sbasu");			
com.anvesaka.common.assert(user.getId()==1);			

Idempotence of object creation:

userTable.merge(user1);
user = userTable.get(["ID", 1]);
com.anvesaka.common.assert(user.getUsername()=="sbasu");			
com.anvesaka.common.assert(user.getId()==1);			

Demonstration of uniqueness constraints:

var user2 = {
	id:2,
	username:"sbasu"
};
try {
	userTable.merge(user2);
}
catch (e) {
	com.anvesaka.common.log(com.anvesaka.common.LOG_WARN, e.message);			
}
user2 = {
	id:2,
	username:"bledbetter"
};
userTable.merge(user2);
user = userTable.get(["ID", 2]);
com.anvesaka.common.assert(com.anvesaka.common.isDefined(user));			
com.anvesaka.common.assert(user.getId()==2);			
com.anvesaka.common.assert(user.getUsername()=="bledbetter");			

Implicit associations initialized as part of object creation:

var profile1 = {
	id:1,
	name:"Santanu Basu",
	email:"santanu.basu@email.com",
	user:user1
};
var profile2 = {
	id:2,
	name:"Brian Ledbetter",
	email:"brian.ledbetter@email.com",
	user:user2
};
profileTable.merge([profile1, profile2]);
com.anvesaka.common.assert(profile1.getUser().getId()==user1.getId());			
com.anvesaka.common.assert(user1.getProfile().getId()==profile1.getId());			
com.anvesaka.common.assert(profile2.getUser().getId()==user2.getId());			
com.anvesaka.common.assert(user2.getProfile().getId()==profile2.getId());			

Referential equivalence:

profile = profileTable.get(["ID", 2]);
profile.setEmail("brian.ledbetter@beardhat.com");
com.anvesaka.common.assert(profile2.getEmail()=="brian.ledbetter@beardhat.com");			

Merging of object graphs, with implied associations:

var user3 = {
	id:3,
	username:"sabrams",
	profile:{
		id:3,
		name:"Steve Abrams",
		email:"steve.abrams@email.com"
	}
};
var user4 = {
	id:4,
	username:"mberlan",
	profile:{
		id:4,
		name:"Mike Berlan",
		email:"mike.berlan@email.com"
	}
};
userTable.merge([user3, user4]);
profile = profileTable.get(["ID", 3]);			
com.anvesaka.common.assert(com.anvesaka.common.isDefined(profile));			
com.anvesaka.common.assert(profile.getUser().getId()==user3.getId());			
com.anvesaka.common.assert(user3.getProfile().getId()==profile.getId());			

Nested object modification:

userTable.merge({
	id:4,
	profile:{
		id:4,
		email:"mike.berlan@beardhat.com"
	}
});
profile = profileTable.get(["ID", 4]);
com.anvesaka.common.assert(profile.getEmail()=="mike.berlan@beardhat.com");			

Unidirectional self association:

user1.addFollows([user2, user3]);
com.anvesaka.common.assert(user1.getFollows().length==2);
com.anvesaka.common.assert(user2.getFollows().length==0);

Bidirectional self association:

user2.addFriends([user3, user4]);
com.anvesaka.common.assert(user2.getFriends().length==2);
com.anvesaka.common.assert(user3.getFriends().length==1);
com.anvesaka.common.assert(user4.getFriends().length==1);

Idempotence of set associations:

user2.addFriends([user3, user4]);
com.anvesaka.common.assert(user2.getFriends().length==2);
com.anvesaka.common.assert(user3.getFriends().length==1);
com.anvesaka.common.assert(user4.getFriends().length==1);

Implicit set associations:

user4.addPosts([{
	id:1,
	title:"Post 1",
	content:"Content 1"
},
{
	id:2,
	title:"Post 2",
	content:"Content 2"
},
{
	id:3,
	title:"Post 3",
	content:"Content 3"
}]);
com.anvesaka.common.assert(user4.getPosts().length==3);
post = postTable.get(["ID", 3]);
com.anvesaka.common.assert(post.getAuthor().getId()==user4.getId());
com.anvesaka.common.assert(post.getTitle()=="Post 3");

Object deletion, with implicit set dissociation:

postTable.remove(post);
post = postTable.get(["ID", 3]);
com.anvesaka.common.assert(com.anvesaka.common.isNotDefined(post));
com.anvesaka.common.assert(user4.getPosts().length==2);

Create some numeric ratings:

var post1 = postTable.get(["ID", 1]);
var post2 = postTable.get(["ID", 2]);
ratingTable.merge({
	id:1,
	value:1.0,
	rater:user1,
	post:post1
});
ratingTable.merge({
	id:2,
	value:2.0,
	rater:user2,
	post:post1
});
ratingTable.merge({
	id:3,
	value:3.0,
	rater:user3,
	post:post1
});
ratingTable.merge({
	id:4,
	value:4.0,
	rater:user4,
	post:post1
});
ratingTable.merge({
	id:5,
	value:3.0,
	rater:user4,
	post:post2
});

Criteria queries:

var ratings = ratingTable.list(["RANGE", "value", 0.0, 2.5]);
com.anvesaka.common.assert(ratings.length==2);
com.anvesaka.common.assert(ratings[0].getId()==1);
com.anvesaka.common.assert(ratings[1].getId()==2);
ratings = ratingTable.list(["AND", ["RANGE", "value", 0.0, 3.5], ["EQ", "rater", user4.getId()]]);
com.anvesaka.common.assert(ratings.length==1);
com.anvesaka.common.assert(ratings[0].getId()==5);

These examples illustrate Stache syntax and usage patterns. In a future post, topics like performance, best practices, and common pitfalls will be discussed.

Structured Data Management for AJAX Applications, Part 1

The AJAX application model lends itself to dynamic, responsive user interfaces.  However, as the application and data model increase in complexity, it becomes harder to keep track of data in a consistent and manageable manner.  This is the first of a series of posts introducing a Javascript library I created called Stache that aims to bring lightweight structured data management to the client side, analogous to the many systems that are common on the server side.

I say analogous, because it’s important to keep in mind that this is not “SQL for the browser”, or a full ORM solution a la Hibernate, or even a means of achieving long term data persistence.  It’s just a simple, relatively transparent way to work with objects that avoids the pitfalls of an ad hoc approach.

The best way to show what it does and how it works is a simple example. Let’s say you have a geospatial AJAX application that needs to work with location fixes on the client side. These have properties such as latitude, longitude, fix time, etc. As you view different parts of the map in this app, the server sends back lists of fixes. To simplify this example, there is no actual server; the data is simply created by instantiating plain old Javascript objects.

So here are some initial fixes:

var fixes = [
	{
		id:1,
		lat:48.0,
		lon:-74.0,
		fixTime:1340054017000
	},
	{
		id:2,
		lat:50.0,
		lon:-76.0,
		fixTime:1340055017000
	},
	{
		id:3,
		lat:48.0,
		lon:-72.0,
		fixTime:1340056017000
	}
];

Using Stache, this list of fixes would be merged into a common data store, like this:

fixTable.merge(fixes);	

The fixTable is an instance of a data structure used by Stache to track entities on the client side. A follow up post describes how to create these structures. For now, just assume that it exists.
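
If you want to run the snippets below right now, a minimal setup using the addTableSpec API described in the follow up post might look something like this (the table name and schema details are just assumptions made for this example):

var db = new com.anvesaka.stache.Stache({
	id:"db"
});
db.addTableSpec({
	name:"com.anvesaka.stache.fix",
	schema:{
		// Indexed so fixes can be queried by coordinate and time ranges.
		lat:{
			type:"number",
			index:true,
			unique:false
		},
		lon:{
			type:"number",
			index:true,
			unique:false
		},
		fixTime:{
			type:"number",
			index:true,
			unique:false
		}
	}
});
db.activate();
var fixTable = db.getTable("com.anvesaka.stache.fix");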

Once POJOs are merged, they become available from anywhere in the app, without the need to make a server request again. Moreover, all references to merged entities point to the same object. Merged entities can be queried, modified, and deleted. They are enriched with mutator and accessor methods, which are used to get and set their field values. Here are some examples of how to interact with merged entities:

var fix, fixes, lat, lon;
fix = fixTable.get(["ID", 1]);
fix.setLat(49.0);
lon = fix.getLon();
fixes = fixTable.list(["ID", 1, 2]);
fixes = fixTable.list(["RANGE", "lat", 40.0, 50.0]);
fixes = fixTable.list(["AND", 
			["RANGE", "lat", 40.0, 50.0], 
			["RANGE", "lon", -60.0, -80.0]
]);
fixTable.remove(fix);

At some later point, the server might send back more fixes, and those may partially overlap with the existing ones:

fixes = [
	{
		id:3,
		lat:48.0,
		lon:-72.0,
		fixTime:1340056017000
	},
	{
		id:4,
		lat:49.0,
		lon:-72.0,
		fixTime:1340057017000
	},
	{
		id:5,
		lat:50.0,
		lon:-72.0,
		fixTime:1340058017000
	}
];

It’s important to be able to work with the new data and the old data transparently. In particular, fix 3 should not be duplicated, while fixes 4 and 5 should be available for querying, modification, etc. Fortunately, the same merge operation used to initialize the fix set earlier can be used to add new data, as well as reconcile duplicates, all in one call:

fixTable.merge(fixes);	

After merging the new data, the same sort of query and modification code above can be used again to work with the full integrated data set. Thus, the usage pattern for Stache is roughly as follows:

  • Obtain data from server
  • Merge data into local tables
  • Query, modify, delete data
  • Repeat as necessary
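
Put together, one cycle of that pattern might look something like this sketch (the jQuery-style request, the endpoint URL, and renderFixes are placeholders for whatever transport and rendering the app actually uses):

function loadFixesForViewport(bounds) {
	// Obtain data from the server for the current map viewport.
	$.getJSON("/fixes", bounds, function (serverFixes) {
		// Merge into the local table; new fixes are added, overlapping ones are reconciled.
		fixTable.merge(serverFixes);
		// Query the full integrated data set, old and new alike.
		var visible = fixTable.list(["AND",
			["RANGE", "lat", bounds.south, bounds.north],
			["RANGE", "lon", bounds.west, bounds.east]
		]);
		renderFixes(visible);
	});
}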

The example described here was intentionally kept simple, and ignores much of the functionality provided by Stache, including:

  • Data schemas
  • Object, list, and set associations
  • Automatic synchronization of inverse associations
  • Uni-directional and bi-directional self association
  • Property indexing and constraints
  • Deep merging of object graphs

Follow up posts will illustrate these features using more advanced examples, and provide runnable sample code. In the meantime, you can get Stache from GitHub if you want to take it for a spin.

Profiling and Optimizing the Javascript B+ Tree, Update 1

In the previous post, I neglected to demonstrate one important performance consideration, which is the type of key used to insert elements into the B+ tree. The baseline code was using string keys, as were all improved versions discussed. If instead numeric keys are used, the performance improves significantly. Insertion/removal/insertion times are now 2037ms/2161ms/2522ms, which is a further 40% improvement over the fastest previous version, and 70% faster than the baseline.

Additionally, there was a design flaw in the range retrieval method (as well as a bug introduced while optimizing the code, which has been fixed). The design flaw had significant impact on performance, resulting in range retrievals taking several seconds per invocation for ranges that returned 10k or 100k rows or more. Essentially, the use of Array.concat was the culprit; you can compare the current code to the baseline to see the changes. After fixing this, range retrieval is again very fast. Retrieval of over 500k elements in a range of 1M takes ~10ms.
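
To illustrate the kind of change involved, here’s a simplified sketch (not the actual tree code; node.isLeaf, node.data, and node.children are assumptions, and a real B+ tree would only descend into subtrees that can contain the range). The problem pattern rebuilds the result array with Array.concat at every level, copying everything accumulated so far; passing a single accumulator down and pushing into it avoids the repeated copying:

// Problem pattern: every level copies the entire accumulated result via concat.
function rangeSlow(node, min, max) {
	var results = [];
	if (node.isLeaf) {
		for (var i = 0; i<node.data.length; i++) {
			if (node.data[i].key>=min&&node.data[i].key<=max) {
				results.push(node.data[i]);
			}
		}
	}
	else {
		for (var j = 0; j<node.children.length; j++) {
			results = results.concat(rangeSlow(node.children[j], min, max));
		}
	}
	return results;
}

// Fix: share one accumulator array and push into it, so nothing is recopied.
function rangeFast(node, min, max, results) {
	results = results||[];
	if (node.isLeaf) {
		for (var i = 0; i<node.data.length; i++) {
			if (node.data[i].key>=min&&node.data[i].key<=max) {
				results.push(node.data[i]);
			}
		}
	}
	else {
		for (var j = 0; j<node.children.length; j++) {
			rangeFast(node.children[j], min, max, results);
		}
	}
	return results;
}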

Profiling and Optimizing the Javascript B+ Tree

In an earlier post, I built a B+ tree in Javascript.  This post is about improving its performance by profiling and optimizing it. You can grab the baseline code that is going to be optimized if you want to make the changes and run the demo in your own browser.

First of all, launch the developer tools in Chromium (or Chrome if you have that installed).  I’m using version 15, but earlier versions probably have the same basic functionality.  Ctrl-Shift-I launches the tools; then select the Profiles tab.  To start with, click the record button at the bottom, and then load the demo page.  Here’s what the baseline output looks like for me:

The demo run is inserting 500,000 entities, then requesting removal of 500,000 random keys, and then inserting 500,000 more random entities. Note that the removal pass will not remove all the entities inserted in the initial insertion pass. The baseline numbers are 9820ms/4463ms/9409ms.

My first guess as to a possible performance bottleneck was the fact that the leaf nodes maintain their collection of keys in a simple array, so insertion and removal would be potentially expensive.  Is this true?  Here are the baseline profile numbers:

What this is saying is that jQuery’s extend function accounts for over 14% of the CPU utilization, taking into account only the code executed in itself, and ignoring any code that it called via another function.  Therefore this is the single most expensive function in terms of direct CPU use.  It’s called most often in the constructors of internal and leaf nodes, and as the code stands as of this writing, two of those calls are simple pass-throughs:

...
init:function(options) {
	options = $.extend(true, {
	}, options);
	this._super(options);
	var thiz = this;
},
...

So eliminate those and rerun the demo. The insertion/removal/insertion numbers are now 7228ms/4442ms/7635ms.
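
With the $.extend calls removed, those two pass-through constructors reduce to something like this (a sketch):

...
init:function(options) {
	// No deep copy of the options here; just hand them straight to the superclass.
	this._super(options);
},
...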

Now the extend function is no longer the top item. Instead it’s something called (program). This is essentially the CPU contribution of Chromium itself, and there isn’t too much you can do to speed that up. But close behind that are the findIndex methods of the BPlusTreeInternalNode and BPlusTreeLeafNode classes. These are accounting for almost 30% of the CPU use now, and the profiler indicates this is almost entirely a direct contribution of code in the functions themselves.

These functions are essentially binary search operations. So the obvious thing to check first is the code inside the loop in these functions. The first thing that comes to mind is to remove deep references and replace them with cached local variables.

// Don't do it like that...
for (...) {
	this._private.data[mid].key
}
// ... do it like this
var data = this._private.data;
for (...) {
	data[mid].key
}

Doing just this in the two findIndex methods shaves another 5% off the total execution time. As a point of comparison, making the same change in the insert or split methods has no effect. This is because the deep access doesn’t occur inside a loop, so the cost of referencing once and assigning to a local var is about the same as directly referencing the value thrice.

Now it’s worthwhile to take a second look at the original offender: jQuery’s extend function. It’s not as trivial to remove its use from the root node class. However, it’s ok to make an assumption about what constructor options may be passed in and optimize based on that, since no external code will call into the node constructors. So the BPlusTreeNode class constructor can be rewritten like this:

init:function(options) {
	this._private = {};
	this._private.order = 100;
	this._private.mergeThreshold = 40;
	this._private.data = [];
},

And the constructors for the node subclasses can be rewritten likewise. Doing this has a big impact, shaving off another 30% from the execution times. jQuery extend is now not an issue at all, and the aggregate performance gains relative to the baseline are 50%. At this point, the performance numbers are 3647ms/3737ms/4740ms.

The optimization possibilities for the findIndex methods are narrowing. A minor gain can be achieved by changing the order of the conditional inside the loop to the following:

if (data[mid].key<key) {
	left = mid+1;
}
else if (data[mid].key>key) {
	right = mid;
}
else {
	found = true;
}

This reduces the run time by another couple of percent, because the first two branches of the conditional are more likely to be taken than the third. Times are 3383ms/3574ms/4517ms.
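
Putting the two findIndex changes together (the cached data reference and the reordered comparisons), the core of the method ends up looking roughly like this. This is a sketch, not the actual class code; in particular, the real methods also need to report where a missing key would be inserted, which this version glosses over by returning -1:

findIndex:function(key) {
	// Cache the deep reference once, outside the loop.
	var data = this._private.data;
	var left = 0;
	var right = data.length;
	var mid = -1;
	var found = false;
	while (left<right&&!found) {
		mid = (left+right)>>1;
		// Most likely branches first.
		if (data[mid].key<key) {
			left = mid+1;
		}
		else if (data[mid].key>key) {
			right = mid;
		}
		else {
			found = true;
		}
	}
	return found?mid:-1;
},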

So far all the optimizations have been rather shallow and local in nature. It’s a good sampling of basic Javascript performance considerations, and the changes made have had a significant impact. In a future post, it may be interesting to explore optimization of the algorithm itself, potentially taking into account domain specific knowledge of how it may be used.

Laziness, DTOs and the Open Session in View (Anti)Pattern

If you’ve built web applications using an ORM library like Hibernate, you’ve probably run into the “open session in view” (anti)pattern.  The online discussion of this swings back and forth, and I’m not here to comment on whether this is or is not a good idea.  But generally, all the posts on this topic take a few common considerations into account:

  • Is the database session kept open outside the service layer (and potentially into the view layer)?
  • Are domain objects re-used as data transfer objects (DTOs) or otherwise exposed beyond the controller?
  • Are dedicated DTO classes created, and if so, how is the transfer of domain data into the DTO managed/automated?
  • Are associations fetched eagerly or lazily, and if lazily, when/where/how are child entities manually fetched?

There may be no single right way to approach this, so it’s really just a matter of what your preferences are, and how you weight each of the trade offs in your design.  I’ve taken a certain approach in most of my past projects, which rests on the following answers to the preceding questions:

  • I prefer to terminate my database sessions at the boundary between the service layer and the MVC controllers.
  • I don’t have an objection to re-using domain entities as DTOs, provided they are outside the scope of a database session.
  • I generally prefer my associations to be lazy.
  • I assume controllers can have semantic awareness of the domain models they pass on to the view.

Given these statements, the approach I’ve taken has been to create service methods that accept one or more Fetcher objects.  A Fetcher is an implementation of an interface like this:

public interface Fetcher<A> {
	public void fetch(A entity);
}

Service methods look like this:

public A findByUsername(String username, Fetcher<A>... fetchers) {
	A user;
	user = domainDao.findByUsername(username);
	fetch(user, fetchers);
	return user;
}

Typically, service classes derive from a common base class providing the fetch method:

protected <A extends DomainEntity> void fetch(A entity, Fetcher<A>... fetchers) {
	for (Fetcher<A> fetcher : fetchers) {
		fetcher.fetch(entity);
	}
}

Here, DomainEntity is just the common base class of all entity classes in the domain model. Given this approach, a controller method can invoke service methods like this:

User user = userService.findByUsername(username, new Fetcher<User>() {
	public void fetch(User user) {
		user.getFriends().size();
	}
});

So a couple of things stand out here:

  • The controller understands that users have friends, and that the web request that is invoking this code wants the friends to be returned with the user.
  • The actual retrieval of the friends associated with this user is done by code defined in the controller, but executed in the service method.
  • The fetching of the friend entities is handled by the underlying ORM’s understanding of the relationships between entities that is already present in the ORM configuration.
  • The same service method can be used to retrieve different associated entities, by passing in a different Fetcher.
  • The domain entities passed outside the service layer remain mutable; however, changes will not be persisted to the database.

This approach has worked out well for me in most situations where the topics of DTOs, session scope, and lazy associations have been relevant.  It’s a good compromise between a layered architecture, a DRY data model, and a performant fetch strategy.

The next thing to take a look at is Hibernate fetch profiles, to see whether they serve the same purpose and are equally (or more) flexible for what I need.

A Simple Near Duplicate Detection Algorithm

Near duplicate detection (NDD) is the operation whereby two documents are tested for partial similarity.  A typical application of this would be to ignore similar web pages for a search engine result set.  These sorts of partly similar pages are common nowadays due to syndication, re-posting, media content pools, etc.

The problem with NDD (vs. exact duplicate detection) is that you can’t just hash the document.  You have to generate some sort of signature that, when compared with other such signatures, generates a numerical measure of similarity, not a boolean answer.  There are various types of algorithms out there to do this, for example, Cosine Similarity, N-Gram based algorithms, and various algorithms that make assumptions about document content, such as this algorithm developed at Stanford a few years ago, or this one developed at Microsoft.

I implemented a variation of the N-Grams approach that has certain interesting properties.  The signatures are of fixed size, no matter what the size of the documents.  No assumptions are made about the document content.  Similarity computation for two documents is O(K) complexity.

The rough outline of the algorithm for generating the signatures is as follows:

  • Break document into N-Grams (N=4 or 5 tends to be a good value)
  • Hash the string value of each N-Gram
  • Toss out duplicate values in the list of hashes
  • Sort the resulting set of hashes
  • Take the top K values from the sorted set

At this point, similarity is measured by counting the common values in any two fixed length sorted signature sets, which is a simple operation of O(K).
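
As a rough Javascript illustration of the idea (not the code in the repository linked below; the word-level tokenization and the simple string hash are stand-ins, and N and K are parameters you’d tune):

function signature(text, n, k) {
	// Break the document into word-level n-grams.
	var words = text.toLowerCase().split(/\s+/);
	var distinct = {};
	for (var i = 0; i+n<=words.length; i++) {
		var gram = words.slice(i, i+n).join(" ");
		// Hash the n-gram; keying on the hash tosses out duplicate values.
		var h = 0;
		for (var j = 0; j<gram.length; j++) {
			h = ((h<<5)-h+gram.charCodeAt(j))|0;
		}
		distinct[h] = true;
	}
	// Sort the distinct hashes and keep K of them as the fixed size signature.
	var sorted = Object.keys(distinct).map(Number).sort(function (a, b) { return a-b; });
	return sorted.slice(0, k);
}

function similarity(sigA, sigB) {
	// Count the common values in two sorted signatures with a single merge walk: O(K).
	var i = 0, j = 0, common = 0;
	while (i<sigA.length&&j<sigB.length) {
		if (sigA[i]==sigB[j]) { common++; i++; j++; }
		else if (sigA[i]<sigB[j]) { i++; }
		else { j++; }
	}
	// Normalizing by the shorter signature gives a percentage-style measure.
	var shorter = Math.min(sigA.length, sigB.length);
	return shorter===0?0:common/shorter;
}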

I assume this algorithm has already been published somewhere, as I’d find it hard to believe something so straightforward is actually original.  However, I’m not too mathematically inclined, at least in a formal manner, so I haven’t really waded into the mess of Greek letters that is the centerpiece of many an academic paper.  If you know where this algorithm was originally proposed, feel free to provide a link in the comments.  Here’s a link to the code: https://santanubasu@github.com/santanubasu/NDD.git

The NDDSig class shows how to construct and use the signatures.  As an interesting example, I used a tricky test case posed at the end of this post on the LingPipe blog (LingPipe is a product of Alias-I, a leading natural language processing software company).  The problem described in the blog was that the two pairs of tweets were returning the same similarity measures (50%) even though one was an actual near duplicate and the other was not.

Here are the actual near duplicate tweets:

Vet, 77, Busted For Obama Death Threat | The Smoking Gun http://t.co/MrTUwxv via @

Vet, 77, Busted For Obama Death Threat http://tinyurl.com/25zyxgp #tcot #tlot #sgp

And here are the somewhat similar but not actual duplicate tweets.

Playing a show in Chicago, IL at 9:00 PM today at LE PASSAGE http://artistdata.com/a/32on

Playing a show in Cape Girardeau, MO at 9:00 PM today at The Venue http://artistdata.com/a/32ow

As described in their blog:

Despite sharing half the words they are clearly not retweets and contain different information. It might be quite difficult to automatically reject ‘obama’ near duplicates at 50% token overlap but retain the ‘today’ near duplicates with 50% overlap

The NDDSig class will calculate similarity measures for the first pair of tweets at 70%, and the second set at 50%.  Granted, this is not an exhaustive test case.  Perhaps in a future post I’ll run some more in depth tests to see where things stand.

A Javascript B+ Tree

This post describes a Javascript implementation of the B+ tree data structure.  If you’re just here for the code, and don’t want to read any of this jibber jabber, here’s the github link: https://github.com/santanubasu/BPlusJS.

Ok, so now a bit more about this.  First off, here’s a link to a page on wikipedia describing a B+ tree.  Speaking with a complete lack of rigor, the following are key features of this data structure:

  • Lookup by key is very fast
  • Retrieval of key ranges is almost as fast
  • Insertion and deletion aren’t too slow

This data structure is pretty common in computer science.  The filesystem on your OS probably uses some variation of this.  If you’re willing to trade off some space for some time, and you’ll be doing retrievals more often than insertions, a B+ tree can be useful.

So why do something like this in Javascript?  The idea came to mind because I was thinking about what it would take to build a simple client side in memory queryable database in Javascript.  The practicality or usefulness of this can be left to another blog post, but one of the items that underlies such a database is a B+ tree.  It can form the basis of database indices, and is one way some actual filesystems and databases organize their data.

I’ve written a rough cut of the B+ tree structure itself.  It does insertion, deletion, key lookup, and range retrieval.  For the purpose of visualizing the tree structure, there is a toString function in the top level BTree class that will pretty print all current data, illustrating the left/right child fan out.  The demo.html page shows some examples of how to use the tree class.  Random node insertion and deletion (showing a return to an empty structure), as well as some rough performance metrics, are printed to the console.

On my laptop (quad core i7, 2.2 GHz, 8 GB of memory), using Chromium, the performance is decent (for a JS app anyways).  Insertion of 500,000 random objects takes about 10 seconds, and deletion of the same takes about 5 seconds.  Retrieval of an object by its key or a range of 1000 contiguous objects is almost instantaneous.  I’ll post some performance data in a more structured format in a future post.

Time to Start a Blog

I’ve worked in tech long enough now that I have more than a few ideas bouncing around in my head.  They might even be good ideas, but let’s not get ahead of ourselves.

This blog is a place for me to jot stuff down.  Writing about things seems to be one way to gain a better understanding of them, so at the very least, this should be helpful for me.  If you find it useful or interesting, all the better.