The University of Arizona


Colloquium

Category: Lecture
Date: Tuesday, November 12, 2013
Time: 11:00 am
Concludes: 12:15 pm
Location: Gould-Simpson 906
Faculty Host: Richard Snodgrass
Speaker: Michael J. Carey
Title: Bren Professor
Affiliation: Information & Computer Sciences, UC Irvine

AsterixDB: Introducing Big Data Management 2.0

AsterixDB is a new, full-function BDMS (Big Data Management System) whose rich feature set distinguishes it from the other platforms in today's open source Big Data software ecosystem. This feature set makes it ideally suited to current needs such as web data warehousing, social data storage and analysis, and a variety of other use cases related to "Big Data problems".

AsterixDB has:
* A flexible, semistructured, NoSQL-style data model (ADM, the Asterix Data Model) based on JSON.
* A declarative query language (AQL, the Asterix Query Language) for expressing a wide range of queries.
* A parallel runtime engine, Hyracks, that has been scale-tested to thousands of cores.
* Partitioned, LSM-based (log-structured merge) data storage and indexing to support efficient data intake.
* Support for externally stored data (e.g., in HDFS) as well as natively stored data.
* A rich set of primitive types, including spatial, temporal, and textual data types.
* B+-tree, R-tree, and inverted keyword (exact and fuzzy) secondary indexes.
* Support for fuzzy, spatial, and temporal queries as well as traditional queries.
* A notion of data feeds to support continuous ingestion from external data sources.
* Basic transactional capabilities akin to those of a modern NoSQL store.

This talk will provide a technical overview of AsterixDB, touching on the user-level and architectural features most relevant to modern Big Data use cases. It will also briefly highlight some current efforts to explore the potential benefits of AsterixDB in areas including behavioral science, social data analytics, cell phone event analytics, education, and health care.

Biography

Michael J. Carey is a Bren Professor of Information and Computer Sciences at UC Irvine. Before joining UCI in 2008, Carey worked at BEA Systems for seven years, where he led the development of BEA's AquaLogic Data Services Platform product for virtual data integration. Carey also spent a dozen years teaching at the University of Wisconsin-Madison, five years at the IBM Almaden Research Center working on object-relational databases, and a year and a half at the e-commerce platform startup Propel Software during the 2000–2001 Internet bubble. Carey is an ACM Fellow, a member of the National Academy of Engineering, and a recipient of the ACM SIGMOD E.F. Codd Innovations Award. His current interests all center on data-intensive computing and scalable data management (a.k.a. Big Data).