Spatio and Spatio-Temporal Aggregation

Spatial and Spatio-Temporal Aggregation

Funded by

National Science Foundation through grants IIS-0100436 and EIA-0080123.

Summary

Aggregate functions compute a scalar value, such as the average salary, when applied to a set of tuples. Aggregate computation is expensive, especially when time-varying attributes are involved in it. The task of computing aggregates becomes more challenging for spatial and spatio-temporal databases, as the spatial and temporal extent over which the aggregate value holds must be computed. The goal of the proposed research is to develop a suite of techniques for computing spatial and spatio-temporal aggregates. We will generalize existing temporal aggregation and spatial join algorithms to create efficient algorithms for computing spatial aggregates, then further generalize these algorithms to accommodate spatio-temporal aggregates. We will then tune the algorithms for limited buffer space, so that the database size that the new approaches can deal with is not constrained by the size of available memory. Then we will work on developing scalable techniques by parallelizing the proposed aggregation algorithms on a shared-nothing architecture. We also propose to design and implement new sequential and parallel algorithms for computing aggregates that exploit spatial index structures such as R-trees and spatio-temporal index structures.

Background

A spatio-temporal database associates facts with points or regions in a multidimensional space, the dimensions being space and time. Examples of spatio-temporal databases include the following.

a two-dimensional cadastral database, over latitude and longitude, recording ownership of property a two-dimensional bitemporal database, over valid time and transaction time, recording when facts were valid in reality and when they were recorded in the database
a three-dimensional atmospheric database, over latitude, longitude, and altitude, recording concentrations of chemicals such as water vapor and ozone
a three-dimensional hydrologic database, over latitude, longitude, and validtime, recording amount of rainfall and accumulation of surface water reservoirs over time
a four-dimensional Earth Observing System (EOS) database, over latitude, longitude, elevation, and valid time, providing a historical record of measurements of the Earth and its atmosphere, calibrated into radiometric units yielding a multi-band raster file

Aggregate computation is expensive when evaluated over a single dimension. For example, finding the (time-varying) maximum salary of professors in the Computer Science Department involves computing the temporal extent of each maximum value, which requires determining the tuples that overlap each temporal instant. As a matter of fact, four of the seventeen TPC-D queries are temporal in nature; another ten queries involve temporal selection in the where clause.

The task of computing aggregates becomes even more challenging over multiple dimensions, where the problem is complicated by having to compute the spatial and temporal extent over which the aggregate value holds. Consider the following cases.

The value of an attribute associated with a region in space may vary over time. An example is the percentage of cloud cover over each 10-kilometer-square grid element. Here, the object (\ie, cloud) is identified spatially, and each object is associated with a time-varying sequence of values. Both the temporal and the spatial coordinates are required to uniquely identify a value.
The value of an attribute, which varies over time, may be associated with a cartographic feature that also varies over time. An example is the appraised value of a land packet. The appraised value may change yearly, and the boundary of the land packet changes as it is reapportioned and parts sold to others. As with the previous two cases, both temporal and spatial coordinates are required to identify a value, but the temporal coordinate(s) are utilized twice, first to identify a cartographic feature and then to select a particular attribute value associated with that feature.
Multi-attribute images are often the result of a query in the EOS environments, which retrieves all of the measurements called IFOVs (Instantaneous Field Of View) incident on the pixels that satisfy the spatio-temporal bounds of the query. Each pixel of multi-attribute images has several values associated with it. For example, Advanced Very High Resolution Radiometer (AVHRR) data consists of five separate channels. A multi-valued image for this sensor type might include a value for each of the five channels for each pixel. Attribute values are determined from the set of IFOVs contained in a pixel via an aggregate function such as MIN and MAX.

This aggregate computation can be evaluated in a sequential fashion, or can be parallelized. The parallel processing technology becomes even more attractive as the size of data-intensive applications grows, as evidenced in OLAP and data warehousing environments.

It should be noted that the problem of computing spatio-temporal aggregates is different from the relational aggregation that can often be seen in the data warehousing environment. While data items in the data warehousing environment are envisioned as points in their data domain, we deal with spatio-temporal data associated with multidimensional regions. Exploding the regions into sets of multidimensional point data for the purpose of aggregate computation is likely to be extremely expensive.

People

Investigators:

Bongki Moon (PI)
Richard Snodgrass (co-PI)

Doctoral students:

Publications

Wonik Choi, Bongki Moon, and Sukho Lee, "Adaptive Cell-Based Index for Moving Objects," Data and Knowledge Engineering 48(1):75-101, January 2004.
Faiz Currim, Sabah Currim, Curtis Dyreson and Richard T. Snodgrass, "Effecting Data Independence for Temporal XML Schemas," in Proceedings of the International Conference on Extending Data Base Technology, Crete, 12 pages, 2004.
Joseph Dunn, Sean Davey, Anne Descour and Richard T. Snodgrass, "Sequenced Subset Operators: Definition and Implementation," Proceedings of the IEEE International Conference on Data Engineering (ICDE2002), February, 2002, pp. 131-140.
Dengfeng Gao, Jose A. G. Gendrano, Bongki Moon, Richard T. Snodgrass, Minseok Park, Bruce C. Huang, and James M. Rodrigue, "Exploiting Main Memory for Efficient Parallel Aggregation for Temporal Databases." to appear in Distributed and Parallel Databases Journal, 2004, 45 pages..
Dengfeng Gao, Christian S. Jensen, Richard T. Snodgrass and Michael D. Soo, "Join Operations in Temporal Databases," to appear in the Very Large Databases Journal, 2004, 31 pages. Also TimeCenter TR-71. [.pdf]
Dengfeng Gao and Richard T. Snodgrass, "Temporal Slicing in the Evaluation of XML Queries," in the Proceedings of the International Conference on Very Large Databases, Berlin, September, 2003, Also TimeCenter TR-72. [.pdf]
Philip J. Harding, Quanzhong Li and Bongki Moon, XISS/R: XML Indexing and Storage System Using RDBMS" in Proceedings of VLDB03, pages 1073-1076, (Demo paper), September 2003, Berlin, Germany.
Seokjin Hong, Bongki Moon and Sukho Lee, "Processing Range Top-k Queries in a Sparse Data Cube," in Proceedings of the 2004 International Conference on Information and Knowledge Engineering (IKE'04), June 2004, Las Vegas, 2004.
Vijay Khatri, Sudha Ram and Richard T. Snodgrass, ST USM: Bridging the Semantic Gap with a Spatio-Temporal Conceptual Model, TimeCenter TR-64, November 2001, 75 pages. [.pdf]
Vijay Khatri, Sudha Ram, Richard T. Snodgrass and Grady O'Brien, "Supporting User-defined Granularities and Indeterminacy in a Spatiotemporal Conceptual Data Model," Annals of Mathematics and Artificial Intelligence on Spatial and Temporal Granularity, 36(1-2): 195-232, 2002. Also TimeCenter TR-55 [.pdf]
Vijay Khatri, Sudha Ram, and Richard T. Snodgrass, "Augmenting a Conceptual Model with Spatio-Temporal Annotations." to appear in IEEE Transactions on Knowledge and Data Engineering, 2004, 35 pages.
Vijay Khatri, Sudha Ram and Richard T. Snodgrass, "Augmenting Database Design-Support Environments to Capture the Spatio-Temporal Data Semantics," submitted to Information Systems, 37 pages, also TimeCenter Technical Report TR-73, 2003. [.pdf]
Taewon Lee, Bongki Moon and Sukho Lee, "Bulk Insertion for R-tree by Seeded Clustering," in Proceedings of 14th International Conference on Database and Expert Systems Applications (DEXA'2003), pages 129-138, September 2003, Prague, Czech Republic.
Wei Li, Dengfeng Gao, and Richard T. Snodgrass, "Skew Handling Techniques in Sort-Merge Join," to appear in the Proceedings of the ACM SIGMOD International Conference on Management of Data, June 2002, 24 pages. [.pdf] Also TimeCenter TR-62 [.pdf]
Quanzhong Li, Ines Fernando Vega Lopez and Bongki Moon, "Skyline Index for Time Series Data," to appear in IEEE Transactions on Knowledge and Data Engineering, 2004.
Quanzhong Li and Bongki Moon, "Partition Based Path Join Algorithms for XML Data," in Proceedings of 14th International Conference on Database and Expert Systems Applications (DEXA'2003), pages 160-170, September 2003, Prague, Czech Republic.
Haitao Liu, "tDOM: A Time-Aware API for Managing Temporal XML Documents," TimeCenter Technical Report TR-74, 2003. [.pdf]
Sudha Ram, Richard T. Snodgrass, Vijay Khatri and Y. Hwang, "DISTIL: A Design Support Environment for Conceptual Modeling of Spatiotemporal Requirements," in Proceedings of the International Conference on Conceptual Modeling (ER2001), Yokohama, Japan, November, 2001, pp. 70-83.
Praveen R. Rao and Bongki Moon, "PRIX: Indexing and Querying XML Using Prufer Sequences," in Proceedings of ICDE04, pages 288-299, March 2004, Boston, MA.
Hyoseop Shin, Bongki Moon and Sukho Lee, "Tie-Breaking Strategies for Fast Distance Join Processing," Data and Knowledge Engineering, 41(1):67-83, April 2002.
Hyoseop Shin, Bongki Moon and Sukho Lee, "Partition-Based Similarity Join Processing in High Dimensional Data Spaces," in Proceedings of DEXA02, September 2002, Aix-en-Provence, France.
Hyoseop Shin, Bongki Moon and Sukho Lee, "Adaptive and Incremental Processing for Distance Join Queries," IEEE Transaction on Knowledge and Data Engineering 15(6):1561-1578, Nov/Dec, 2003.
Richard T. Snodgrass, Stanley Shiyong Yao and Christian Collberg, "Tamper Detection in Audit Logs," to appear in Proceedings of the International Conference on Very Large Databases, 12 pages, 2004.
Paolo Terenziani and Richard T. Snodgrass, "Reconciling Point-Based and Interval-Based Semantics in Temporal Databases: A Proper Treatment of the Telic/Atelic Distinction," IEEE Transactions on Knowledge and Data Engineering, 16(5):540-551, May 2004. Also TimeCenter TR-60 [.pdf].
Kristian Torp, Christian S. Jensen and Richard T. Snodgrass, "Modification Semantics in Now-Relative Databases," to appear in Information Systems, 2004, 34 pages.
Ines F. Vega Lopez, Richard T. Snodgrass, and Bongki Moon, "Spatiotemporal Aggregate Computation: A Survey," submitted to IEEE Transactions on Knowledge and Data Engineering, 40 pages, also TimeCenter TR-77, 2004. [.pdf]

[Top of Page | Department Home Page]
http://www.cs.arizona.edu/stagg/

Bongki Moon (bkmoon@cs.arizona.edu)

Last updated May 28, 2004