Spatial and Spatio-Temporal Aggregation
Funded by
National Science Foundation through grants IIS-0100436 and EIA-0080123.
Summary
Aggregate functions compute a scalar value, such as the average
salary, when applied to a set of tuples. Aggregate computation is expensive,
especially when time-varying attributes are involved in it. The task of
computing aggregates becomes more challenging for spatial and spatio-temporal
databases, as the spatial and temporal extent over which the aggregate
value holds must be computed. The goal of the proposed research is to develop
a suite of techniques for computing spatial and spatio-temporal aggregates.
We will generalize existing temporal aggregation and spatial join algorithms
to create efficient algorithms for computing spatial aggregates, then further
generalize these algorithms to accommodate spatio-temporal aggregates.
We will then tune the algorithms for limited buffer space, so that the
database size that the new approaches can deal with is not constrained
by the size of available memory. Then we will work on developing scalable
techniques by parallelizing the proposed aggregation algorithms on a shared-nothing
architecture. We also propose to design and implement new sequential and
parallel algorithms for computing aggregates that exploit spatial index
structures such as R-trees and spatio-temporal index structures.
Background
A spatio-temporal database associates facts with points or regions in a
multidimensional space, the dimensions being
space and time. Examples of spatio-temporal databases include the
following.
- a two-dimensional cadastral database, over latitude and
longitude, recording ownership of property a two-dimensional bitemporal
database, over valid time and transaction time, recording when facts were
valid in reality and when they were recorded in the database
- a three-dimensional atmospheric database, over latitude, longitude,
and altitude, recording concentrations of chemicals such as water vapor
and ozone
- a three-dimensional hydrologic database, over latitude, longitude,
and validtime, recording amount of rainfall and accumulation of surface
water reservoirs over time
- a four-dimensional Earth Observing System (EOS) database, over
latitude, longitude, elevation, and valid time, providing a historical
record of measurements of the Earth and its atmosphere, calibrated into
radiometric units yielding a multi-band raster file
Aggregate computation is expensive when evaluated over a single
dimension. For example, finding the (time-varying) maximum salary of
professors in the Computer Science Department involves computing the
temporal extent of each maximum value, which requires determining the
tuples that overlap each temporal instant. As a matter of fact, four of
the seventeen TPC-D
queries are temporal in nature; another ten queries
involve temporal selection in the where clause.
The task of computing aggregates becomes even more challenging over
multiple dimensions, where the problem is complicated by having to compute
the spatial and temporal extent over which the aggregate value holds.
Consider the following cases.
- The value of an attribute associated with a region in space may
vary over time. An example is the percentage of cloud cover over each
10-kilometer-square grid element. Here, the object (\ie, cloud) is
identified spatially, and each object is associated with a time-varying
sequence of values. Both the temporal and the spatial coordinates are
required to uniquely identify a value.
- The value of an attribute, which varies over time, may be associated
with
a cartographic feature that also varies over time. An example is the
appraised value of a land packet. The appraised value may change yearly,
and the boundary of the land packet changes as it is reapportioned and
parts sold to others. As with the previous two cases, both temporal and
spatial coordinates are required to identify a value, but the temporal
coordinate(s) are utilized twice, first to identify a cartographic feature
and then to select a particular attribute value associated with that
feature.
- Multi-attribute images are often the result of a query in the EOS
environments, which retrieves all of the measurements called IFOVs
(Instantaneous Field Of View) incident on the pixels that satisfy the
spatio-temporal bounds of the query. Each pixel of multi-attribute images
has several values associated with it. For example, Advanced Very High
Resolution Radiometer (AVHRR) data consists of five separate channels. A
multi-valued image for this sensor type might include a value for each of
the five channels for each pixel. Attribute values are determined from the
set of IFOVs contained in a pixel via an aggregate function such as MIN
and MAX.
This aggregate computation can be evaluated in a sequential fashion, or
can be
parallelized. The parallel processing technology becomes even more
attractive as
the size of data-intensive applications grows, as evidenced in OLAP and
data
warehousing environments.
It should be noted that the problem of computing spatio-temporal
aggregates is different from the relational aggregation
that can often be seen in the data warehousing environment. While data
items in the data warehousing environment are
envisioned as points in their data domain, we deal with spatio-temporal
data associated with multidimensional regions.
Exploding the regions into sets of multidimensional point data for the
purpose of aggregate computation is likely to be
extremely expensive.
People
Investigators:
Doctoral students:
Publications
- Wonik Choi, Bongki Moon, and Sukho Lee, "Adaptive Cell-Based Index for
Moving Objects," Data and Knowledge Engineering 48(1):75-101,
January 2004.
- Faiz Currim, Sabah Currim, Curtis Dyreson and Richard T. Snodgrass, "Effecting Data
Independence for Temporal XML Schemas," in
Proceedings of the International Conference on Extending Data Base
Technology, Crete, 12 pages, 2004.
- Joseph Dunn, Sean Davey, Anne Descour and Richard T. Snodgrass, "Sequenced
Subset Operators: Definition and Implementation," Proceedings of
the IEEE International Conference on Data Engineering (ICDE2002),
February, 2002, pp. 131-140.
- Dengfeng Gao, Jose A. G. Gendrano, Bongki Moon, Richard T. Snodgrass, Minseok Park, Bruce C. Huang,
and James M. Rodrigue, "Exploiting Main Memory for Efficient Parallel Aggregation
for Temporal Databases." to appear in Distributed and Parallel Databases
Journal, 2004, 45 pages..
- Dengfeng Gao, Christian S. Jensen, Richard T. Snodgrass and Michael D. Soo, "Join Operations
in Temporal Databases," to appear in the Very Large Databases
Journal, 2004, 31 pages. Also TimeCenter TR-71. [.pdf]
- Dengfeng Gao and Richard T. Snodgrass, "Temporal Slicing in the Evaluation of XML
Queries," in the Proceedings of the International Conference
on Very Large Databases, Berlin, September, 2003, Also
TimeCenter TR-72. [.pdf]
- Philip J. Harding, Quanzhong Li and Bongki Moon, XISS/R: XML Indexing and
Storage System Using RDBMS" in Proceedings of VLDB03, pages
1073-1076, (Demo paper), September 2003, Berlin, Germany.
- Seokjin Hong, Bongki Moon and Sukho Lee, "Processing Range Top-k Queries in
a Sparse Data Cube," in Proceedings of the 2004 International
Conference on Information and Knowledge Engineering (IKE'04), June 2004,
Las Vegas, 2004.
- Vijay Khatri, Sudha Ram and Richard T. Snodgrass, ST USM: Bridging the Semantic
Gap with a Spatio-Temporal Conceptual Model,
TimeCenter TR-64,
November 2001, 75 pages. [.pdf]
- Vijay Khatri, Sudha Ram, Richard T. Snodgrass and Grady O'Brien, "Supporting User-defined
Granularities and Indeterminacy in a Spatiotemporal Conceptual Data Model,"
Annals of Mathematics and Artificial Intelligence on Spatial and
Temporal Granularity, 36(1-2): 195-232, 2002. Also TimeCenter TR-55 [.pdf]
- Vijay Khatri, Sudha Ram, and Richard T. Snodgrass, "Augmenting a Conceptual
Model with Spatio-Temporal Annotations." to appear in IEEE
Transactions on Knowledge and Data Engineering, 2004, 35 pages.
- Vijay Khatri, Sudha Ram and Richard T. Snodgrass,
"Augmenting Database Design-Support Environments to Capture the
Spatio-Temporal Data Semantics," submitted to Information Systems,
37 pages, also TimeCenter Technical
Report TR-73, 2003. [.pdf]
- Taewon Lee, Bongki Moon and Sukho Lee, "Bulk Insertion for R-tree by
Seeded Clustering," in Proceedings of 14th International
Conference on Database and Expert Systems Applications (DEXA'2003), pages 129-138, September
2003, Prague, Czech Republic.
- Wei Li, Dengfeng Gao, and Richard T. Snodgrass, "Skew Handling Techniques in
Sort-Merge Join," to appear in the Proceedings of the ACM SIGMOD
International Conference on Management of Data, June 2002, 24 pages. [.pdf]
Also TimeCenter
TR-62 [.pdf]
- Quanzhong Li, Ines Fernando Vega Lopez and Bongki Moon, "Skyline Index for
Time Series Data," to appear in IEEE Transactions on Knowledge and
Data Engineering, 2004.
- Quanzhong Li and Bongki Moon, "Partition Based Path Join Algorithms for
XML Data," in Proceedings of 14th International
Conference on Database and Expert Systems Applications (DEXA'2003), pages
160-170, September 2003, Prague, Czech Republic.
- Haitao Liu, "tDOM: A Time-Aware API for Managing Temporal XML Documents,"
TimeCenter Technical Report TR-74, 2003. [.pdf]
- Sudha Ram, Richard T. Snodgrass, Vijay Khatri and Y. Hwang, "DISTIL: A Design Support
Environment for Conceptual Modeling of Spatiotemporal Requirements," in
Proceedings of the International Conference on Conceptual Modeling (ER2001),
Yokohama, Japan, November, 2001, pp. 70-83.
- Praveen R. Rao and Bongki Moon, "PRIX: Indexing and Querying XML Using
Prufer Sequences," in Proceedings of ICDE04, pages 288-299, March
2004, Boston, MA.
- Hyoseop Shin, Bongki Moon and Sukho Lee, "Tie-Breaking Strategies for Fast
Distance Join Processing," Data and Knowledge Engineering,
41(1):67-83, April 2002.
- Hyoseop Shin, Bongki Moon and Sukho Lee, "Partition-Based Similarity Join
Processing in High Dimensional Data Spaces," in Proceedings of DEXA02,
September 2002, Aix-en-Provence, France.
- Hyoseop Shin, Bongki Moon and Sukho Lee, "Adaptive and Incremental
Processing for Distance Join Queries," IEEE Transaction on Knowledge
and Data Engineering 15(6):1561-1578, Nov/Dec, 2003.
- Richard T. Snodgrass, Stanley Shiyong Yao and Christian Collberg,
"Tamper Detection in Audit Logs,"
to appear in Proceedings of the International Conference on Very Large
Databases, 12 pages, 2004.
- Paolo Terenziani and Richard T. Snodgrass, "Reconciling Point-Based and
Interval-Based Semantics in Temporal Databases: A Proper Treatment of the
Telic/Atelic Distinction," IEEE Transactions on
Knowledge and Data Engineering, 16(5):540-551, May 2004. Also
TimeCenter
TR-60 [.pdf].
- Kristian Torp, Christian S. Jensen and Richard T. Snodgrass,
"Modification Semantics in Now-Relative Databases," to appear in
Information Systems, 2004, 34 pages.
- Ines F. Vega Lopez, Richard T. Snodgrass, and Bongki Moon, "Spatiotemporal
Aggregate Computation: A Survey," submitted to IEEE Transactions on
Knowledge and Data Engineering, 40 pages, also TimeCenter TR-77, 2004. [.pdf]
[Top of Page | Department
Home Page]
http://www.cs.arizona.edu/stagg/
Bongki Moon (bkmoon@cs.arizona.edu)
Last updated May 28, 2004