Hierarchical Agglomerative Clustering in n dimensions
I am really happy about the blogging thing. Anyhow somebody named Kokila Dhamodharan from India got in touch with me last month about hierarchical agglomerative clustering of images. It happens I know a thing or two about that. Basically you have an n-dimensional, metric space where the dimensions are properties of the image that you are interested in (color, variation, etc). When I did this I was comparing the partitions of a segmented image so I had color, size, shape, major axis etc.
I realize that the asymptotic complexity analysis of my paper was a bit slapdash. But who cares when you have run times of a second. I know that everything is less than O(n^3) anyway.
So here’s the naive approach: You calculate pairwise distances O(n^2). You have n^2 edges, and run an MST algorithm, which takes, with m=n^2, O(m log m) = O(n^2 log n).
Of course a really asymptotically efficient MST algorithm is essentially O(n). And you can approximate the set of pairwise distances — somehow or other…
Anyway, my code relied heavily on Pedro F. Felzenszwalb and Daniel P. Huttenlocher’s code, available here. I even reused their MST implementation. My project also used some code for an image format called ppm.
I sent my code over to the guy in India, but I am too lazy to really figure out what the license is, so I’m not posting it. It looks like it is MIT license compatible. Pedro’s new repackaging is GPL.
It would be fun to put either Pedro’s app and/or my app online.
Here are the results from my code:
