In this talk I examine the problem of finding spatial associations and structure within large, dense, noisy data sets. This problem arises in a wide range of domains, including: data mining, pattern matching, computational statistics, and target tracking. The goal is simply to find all sets of points that conform to an underlying model or class of models. Unfortunately, this problem can be computationally expensive, suffering a combinatorial explosion in the number of potential point sets that must be tested. Traditional search algorithms based on spatial data structures such as kd-trees, R-trees or metric trees can help reduce this problem, but in this talk I will describe a new class of search algorithms on these structures that can often give substantially greater savings.
This work is motivated by the task of efficiently linking faint asteroid detections. As we begin to push down into the noise and search for increasingly faint asteroids, we see a significant increase in the number of both true detections and spurious noise detections. In order to survive this increase in the size, density, and noise, we need to use a new type of search algorithm that exploits structure from all aspects of the search.
I will present a new type of spatial search algorithm that uses a variable number of KD-trees. The use of a dynamic, variable number of tree nodes, allows the search to tightly adapt to structure in the problem. I will describe how this new approach addresses potential drawbacks in both the constructive and traditional multiple tree approaches. Further, I will demonstrate the algorithm's advantages on astronomical data and discuss the approach's applicability to other domains.