Thursday, May 04, 2006

Hierarchical Data (DB2 vs. MySQL)

I attended the MySQL User's Conference in Santa Clara last week -- big thanks to the organizers, it was an excellent event. While there, I attended a number of session talks, including one on hierarchical data handling in MySQL, presented by Mike Hillyer. I came away somewhat puzzled by the methodology of the handling of hierarchical data.

The method proposed to handle this kind of data was to used nested sets to encapsulate the data so that it can be processed through SQL query methods. In nested set theory, essentially you surround a root node with numerical values starting at 1, and going up to a given maximum number that allows you to give a left (1) and right (max) number to surround all your child nodes.

For example, if we had a parent with two child nodes, and each of those child nodes had two leaf nodes (leaf nodes are child nodes, but with the smallest possible data element), then the numbers would be:

parent: left => 1, right => 14
child: left => 2, right => 7
leaf: left => 3, right => 4
leaf: left => 5, right => 6
child: left => 8, right => 13
leaf: left => 9, right => 10
leaf: left => 11, right => 12

Then you can programmatically determine leaf nodes by a simple alegbraic formula that states that the leaf nodes are the only children where the left value + 1 = right value.

As interesting an implementation as this is (and you can read a more detailed exampled about it here), it still leaves me confused. Why not use a database that handles XML data trees (which is essentially what a hierarchical data set is) natively? The nested set theory is a much more inefficient means of data storage and retreival, and means that you have to continually rebase your node numbers if you add elements, and therefore you also need to keep re-indexing your database.

What I would recommend is that developers look into the native XML storage facilities of DB2's Viper release and see how much faster both data queries and database administration can be when you can handle hierarchical data in a more straightforward manner.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Google Analytics Alternative