Spatial and Spatio-Temporal Aggregation

Funded by

National Science Foundation through grants IIS-0100436 and EIA-0080123.

Summary

Aggregate functions compute a scalar value, such as the average salary, when applied to a set of tuples. Aggregate computation is expensive, especially when time-varying attributes are involved in it. The task of computing aggregates becomes more challenging for spatial and spatio-temporal databases, as the spatial and temporal extent over which the aggregate value holds must be computed. The goal of the proposed research is to develop a suite of techniques for computing spatial and spatio-temporal aggregates. We will generalize existing temporal aggregation and spatial join algorithms to create efficient algorithms for computing spatial aggregates, then further generalize these algorithms to accommodate spatio-temporal aggregates. We will then tune the algorithms for limited buffer space, so that the database size that the new approaches can deal with is not constrained by the size of available memory. Then we will work on developing scalable techniques by parallelizing the proposed aggregation algorithms on a shared-nothing architecture. We also propose to design and implement new sequential and parallel algorithms for computing aggregates that exploit spatial index structures such as R-trees and spatio-temporal index structures.

Background

A spatio-temporal database associates facts with points or regions in a multidimensional space, the dimensions being space and time. Examples of spatio-temporal databases include the following.

Aggregate computation is expensive when evaluated over a single dimension. For example, finding the (time-varying) maximum salary of professors in the Computer Science Department involves computing the temporal extent of each maximum value, which requires determining the tuples that overlap each temporal instant. As a matter of fact, four of the seventeen TPC-D queries are temporal in nature; another ten queries involve temporal selection in the where clause.

The task of computing aggregates becomes even more challenging over multiple dimensions, where the problem is complicated by having to compute the spatial and temporal extent over which the aggregate value holds. Consider the following cases.

This aggregate computation can be evaluated in a sequential fashion, or can be parallelized. The parallel processing technology becomes even more attractive as the size of data-intensive applications grows, as evidenced in OLAP and data warehousing environments.

It should be noted that the problem of computing spatio-temporal aggregates is different from the relational aggregation that can often be seen in the data warehousing environment. While data items in the data warehousing environment are envisioned as points in their data domain, we deal with spatio-temporal data associated with multidimensional regions. Exploding the regions into sets of multidimensional point data for the purpose of aggregate computation is likely to be extremely expensive.

People

Investigators:

Doctoral students:

Publications


[Top of Page | Department Home Page]
http://www.cs.arizona.edu/stagg/
Bongki Moon (bkmoon@cs.arizona.edu)
Last updated May 28, 2004