Colloquium Speaker

Speaker:	Jeremy Kubica, Carnegie Mellon Robotics Institute
Topic:	Variable Tree Algorithms and the Efficient Discovery of Spatial Associations and Structure
Date:	Tuesday, September 20, 2005
Time:	11:00AM
Place:	Gould-Simpson, Room 906

Refreshments will be served in the 9th floor atrium of Gould-Simpson at 10:45 AM

Abstract

In this talk I examine the problem of finding spatial associations and structure within large, dense, noisy data sets. This problem arises in a wide range of domains, including: data mining, pattern matching, computational statistics, and target tracking. The goal is simply to find all sets of points that conform to an underlying model or class of models. Unfortunately, this problem can be computationally expensive, suffering a combinatorial explosion in the number of potential point sets that must be tested. Traditional search algorithms based on spatial data structures such as kd-trees, R-trees or metric trees can help reduce this problem, but in this talk I will describe a new class of search algorithms on these structures that can often give substantially greater savings.

This work is motivated by the task of efficiently linking faint asteroid detections. As we begin to push down into the noise and search for increasingly faint asteroids, we see a significant increase in the number of both true detections and spurious noise detections. In order to survive this increase in the size, density, and noise, we need to use a new type of search algorithm that exploits structure from all aspects of the search.

I will present a new type of spatial search algorithm that uses a variable number of KD-trees. The use of a dynamic, variable number of tree nodes, allows the search to tightly adapt to structure in the problem. I will describe how this new approach addresses potential drawbacks in both the constructive and traditional multiple tree approaches. Further, I will demonstrate the algorithm's advantages on astronomical data and discuss the approach's applicability to other domains.