From rts Wed Jul 14 14:26:07 1993
Date: Wed, 14 Jul 1993 14:26:02 MST
From: "Rick Snodgrass" <rts>
To: ahn@cbnmva.att.com, ariavg@cgsvax.claremont.edu, cleung@vnet.ibm.com,
        csj@iesd.auc.dk, elmasri@cse.uta.edu, fabio@deis64.cineca.it,
        jcliffor@is-4.stern.nyu.edu, kaefer@informatik.uni-kl.de,
        leomark@cc.gatech.edu, rts@cs.arizona.edu
Subject: step 1

In the following, I list my personal views on the aspects residing in
the various classes, as my contribution to the first task of the TSQL2
design effort.

Required
	* Temporal support be optional
		It is important to be an extension of SQL2's data
		model when possible, not a replacement. Hence, the
		schema definition language should allow the definition
		of snapshot relations, when temporal support is not
		desired. Similarly, it should be possible to derive a
		snapshot relation from a temporal relation.

	* Valid time support, including past and future
		While some models only include valid time support up
		to now, it is important to provide support for future
		valid time so that planning activities can be
		accommodated.

	* Language data model allows multiple representations
		This was agreed upon at the workshop. In particular,
		it would be best if the data model accommodated the
		major temporal data models proposed to date, including
		attribute timestamped models.

	* Implementable with event- or interval-timestamped tuples
		This is the most straightforward representational
		model, in terms of extending current relational
		technology. In my opinion, no TSQL2 proposal will be
		accepted if it requires attribute timestamping. (Per
		the previous aspect, the model should nevertheless
		*permit* implementation using an attribute timestamped
		representational model.)

	* Extended range and precision of timestamp values
		SQL2 is limited to A.D., to 9999 years, and to an
		excessively coarse precision of seconds for a
		representation of 20 bytes. It is also not
		sufficiently defined (addition is implementation
		defined! Nor is it stated which of seven possible
		definitions of the second is used.) For temporal databases
		to be used in scientific applications, as well as by
		historians and others requiring an extended range, the
		representation and semantics must be extended and
		better defined. I advocate the timestamp
		representation and semantics used in MultiCal, which
		can represent all of time (+/- 18 billion years) to
		the granularity of a second, and all of recorded time
		to the granularity of a microsecond, in 8 bytes.
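
		A back-of-the-envelope check of the 8-byte figure,
		as an illustrative Python sketch. The 365.25-day
		year and all names are assumptions made here. This
		is not MultiCal's actual encoding, only a count of
		the bits the stated range needs at a granularity of
		one second.

```python
# Rough check: how many bits does a range of +/- 18 billion years need
# at a granularity of one second? (Assumes a 365.25-day year.)
SECONDS_PER_YEAR = 36525 * 86400 // 100   # 31,557,600
YEARS_SPANNED = 2 * 18_000_000_000        # 18 billion years each way

seconds_spanned = YEARS_SPANNED * SECONDS_PER_YEAR
bits_needed = (seconds_spanned - 1).bit_length()

print(seconds_spanned)      # distinct one-second values in the range
print(bits_needed)          # bits needed to number them
print(bits_needed <= 64)    # whether they fit in 8 bytes
```

		Under these assumptions the range needs 60 bits,
		so 8 bytes suffice at one-second granularity.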

	* Shashi Gadia's restructuring
		As a simple example, consider a relation with
		Employee Name and Dept. Shashi, in TempSQL, allows one
		to restructure this relation so that it is grouped on
		Employee (gives the department history for each
		employee, i.e., which department the employee was
		with, and when), or grouped on Dept (gives the
		employee history for each dept, i.e., which
		employee(s) were associated with the department, and
		when). I view this as a very powerful capability to
		support in the query language.
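
		The two groupings can be sketched as follows. This
		is illustrative Python, not TempSQL syntax; the
		sample data, the half-open year intervals, and the
		function name restructure are all hypothetical.

```python
from collections import defaultdict

# Hypothetical valid-time relation: (employee, dept, start_year, end_year),
# each tuple valid over the half-open interval [start_year, end_year).
EMP_DEPT = [
    ("Ann", "Toys", 1988, 1991),
    ("Ann", "Shoes", 1991, 1993),
    ("Bob", "Toys", 1990, 1993),
]

def restructure(relation, group_index, value_index):
    """Group on one attribute, giving each group the history of the other."""
    histories = defaultdict(list)
    for row in relation:
        start, end = row[2], row[3]
        histories[row[group_index]].append((row[value_index], start, end))
    # Order each history by the start of its valid time.
    return {key: sorted(h, key=lambda entry: entry[1])
            for key, h in histories.items()}

# Department history for each employee.
by_employee = restructure(EMP_DEPT, group_index=0, value_index=1)
# Employee history for each department.
by_dept = restructure(EMP_DEPT, group_index=1, value_index=0)

print(by_employee["Ann"])
print(by_dept["Toys"])
```

		The same stored tuples serve both views; only the
		grouping attribute changes.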

	* Extension, not replacement, of snapshot algebra
		Current DBMS implementations are based on the snapshot
		algebra. The temporal algebra used with the TSQL2
		temporal data model should contain temporal operators
		that are extensions of the operations in the snapshot
		algebra. Snapshot reducibility is also highly desired,
		so that, for example, optimization strategies will
		continue to work in the new data model. 
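
		Snapshot reducibility can be illustrated with a
		join: taking a timeslice of the temporal join at
		time t should give the same result as the ordinary
		join of the two timeslices at t. The Python below
		is a minimal sketch under assumed conventions
		(interval-timestamped tuples, half-open intervals,
		made-up data and function names); it is not a
		definition of the TSQL2 algebra.

```python
# Temporal tuples are (key, value, start, end), valid over [start, end).

def timeslice(relation, t):
    """Snapshot of a temporal relation at time t, timestamps dropped."""
    return {(k, v) for (k, v, s, e) in relation if s <= t < e}

def snapshot_join(r, s):
    """Ordinary equijoin on key of two snapshot relations."""
    return {(k1, v1, v2) for (k1, v1) in r for (k2, v2) in s if k1 == k2}

def temporal_join(r, s):
    """Equijoin on key; a result tuple is valid where both inputs are."""
    out = []
    for (k1, v1, s1, e1) in r:
        for (k2, v2, s2, e2) in s:
            start, end = max(s1, s2), min(e1, e2)
            if k1 == k2 and start < end:
                out.append((k1, v1, v2, start, end))
    return out

def timeslice_joined(relation, t):
    """Timeslice for the five-column result of temporal_join."""
    return {(k, v1, v2) for (k, v1, v2, s, e) in relation if s <= t < e}

EMP = [("Toys", "Ann", 1988, 1991), ("Toys", "Bob", 1990, 1993)]
MGR = [("Toys", "Cal", 1987, 1990), ("Toys", "Dee", 1990, 1994)]

def reducible_at(t):
    """Check that slicing after the join equals joining after slicing."""
    sliced_after = timeslice_joined(temporal_join(EMP, MGR), t)
    sliced_before = snapshot_join(timeslice(EMP, t), timeslice(MGR, t))
    return sliced_after == sliced_before

print(all(reducible_at(t) for t in range(1985, 1996)))
```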

	* User-defined time support, including events, intervals, and
			fixed and variable spans
		User-defined time support in SQL2 is greatly flawed.
		C.J. Date has listed many of the problems with it. The
		MultiCal proposal cleans up these problems. I advocate
		adopting this proposal for user-defined time. Note
		that the current SQL2 support for user-defined time
		can be simulated in a DBMS, using a preprocessor and
		some small, additional code in the runtime system,
		providing a migration path for legacy applications. I
		know of no other proposals for user-defined time.

	* Use of chronons in the representation
		The current SQL2 timestamp representation is a fixed
		decimal representation. This is equivalent to using
		chronons of a fixed fraction of a second. For
		instance, TIME(3) in SQL2 is equivalent to a chronon
		size of 1 millisecond.  Using chronons simply tightens
		up the semantics of timestamps, while not restricting
		the representation.
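
		The equivalence can be made concrete with a small
		Python sketch. The function names are hypothetical,
		and times are taken as seconds since midnight purely
		for illustration.

```python
from fractions import Fraction

def chronon_size(fractional_digits):
    """Chronon size in seconds for p fractional digits, as in TIME(p)."""
    return Fraction(1, 10 ** fractional_digits)

def to_chronons(seconds, fractional_digits):
    """Express a time (in seconds) as a whole number of chronons."""
    count = Fraction(seconds) / chronon_size(fractional_digits)
    if count.denominator != 1:
        raise ValueError("value is finer than the chronon size")
    return int(count)

print(chronon_size(3))                      # 1/1000 of a second
print(to_chronons(Fraction("12.345"), 3))   # 12345
```

		A value with more fractional digits than the
		precision allows is rejected here rather than
		rounded; that choice is an assumption of the sketch.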

	* Fixed system granularity
		There should be an assumed system granularity (minimum
		representable interval), but this granularity should
		be implementation-dependent. This is consistent with
		SQL2.

	* Existing aggregates have temporal analogues
		It is important that existing language features such
		as aggregates still apply in the temporal data model.

	* Multiple calendar support, in input, output, timestamp
			operations
		Internationalization is a significant concern.
		However, SQL2 requires the Gregorian calendar (even in
		its representation). If the MultiCal approach is
		adopted, the number of keywords is reduced
		significantly, while increasing the functionality.

	* Multiple language support in timestamp input and output
		MultiCal also supports this. I know of no other
		proposal for multiple language support.

Optional
	* Transaction time support, perhaps with simple language
			support
		One complicating issue is that transaction time
		support requires quite dramatic changes to the
		architecture (see Stonebraker's discussion of this in
		his TKDE article on Postgres). Hence, this might
		conflict with the constraint of being implementable
		with current DB technology (for example, Postgres
		required significant changes to concurrency control
		and logging). Note that the support of an insertion
		time attribute is quite minimal when compared with
		other proposals for transaction time.	

	* Ungrouped Complete
		While this is certainly desirable, it is not clear
		whether it is consistent with other criteria. An
		argument needs to be made that this feature can be
		supported without major surgery to the DBMS
		implementation.

	* Fabio's history variables
		Fabio has given several examples where history
		variables (which range over histories, associated with
		tuple variables, if I have the terminology correct...)
		are quite nice syntactically. If these can be
		supported without major surgery (my intuition is that
		they can), then it would be nice to include them.

	* Calendar-dependent timestamp functions
		An example of this is addition of an event and a span.
		Say the span was "one month." One calendar might
		consider a month to be exactly 30 days, while another
		calendar might consider a month to be a variable
		number of days, depending on what event it was being
		added to. I advocate using MultiCal's approach to
		calendar-dependent timestamp functions, including
		variable spans.
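
		The "one month" example can be sketched in Python as
		two interpretations of the same span. Both functions
		are hypothetical and use Python's standard Gregorian
		dates; this is not MultiCal's interface. Clamping to
		the last day of the target month is an assumption of
		the sketch.

```python
import calendar
from datetime import date, timedelta

def add_month_fixed(event):
    """A calendar in which a month is always exactly 30 days."""
    return event + timedelta(days=30)

def add_month_variable(event):
    """A calendar in which a month's length depends on the event.

    If the target month is shorter, the day is clamped to its last
    day (an assumption of this sketch).
    """
    year = event.year + event.month // 12
    month = event.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(event.day, last_day))

event = date(1993, 1, 15)
print(add_month_fixed(event))     # 1993-02-14
print(add_month_variable(event))  # 1993-02-15
```

		The same event and the same span yield different
		results, which is why the addition has to be defined
		by the calendar rather than by the timestamp type.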
	
	* Temporal aggregates
		New aggregates, such as first, that are intrinsically
		temporal would be highly useful.
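
		One possible reading of a "first" aggregate, as an
		illustrative Python sketch: for each employee, return
		the value whose valid time begins earliest. This
		semantics, the sample data, and the function name are
		assumptions; no definition of the aggregate is given
		here.

```python
# Hypothetical valid-time tuples: (employee, dept, start_year, end_year).
EMP_DEPT = [
    ("Ann", "Shoes", 1991, 1993),
    ("Ann", "Toys", 1988, 1991),
    ("Bob", "Toys", 1990, 1993),
]

def first(relation, value_index=1):
    """For each employee, the value whose valid time begins earliest."""
    earliest = {}
    for row in relation:
        name, start = row[0], row[2]
        if name not in earliest or start < earliest[name][0]:
            earliest[name] = (start, row[value_index])
    return {name: value for name, (start, value) in earliest.items()}

print(first(EMP_DEPT))
```

		Unlike MIN or COUNT, the result depends on the
		timestamps rather than on the attribute values, which
		is what makes the aggregate intrinsically temporal.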

Omitted
	* Grouped Complete
		Grouped complete has some highly desirable properties.
		However, its treatment of "histories" as first class
		objects can be difficult to implement. Also, it can
		limit functionality. Given an employee relation, with
		name and dept as attributes, we have a name history and a
		dept history, per employee. If we join with a mgr relation,
		with manager and dept histories, the result gives the
		history of individual employees associated with
		individual departments (most such histories would
		include only one interval). It doesn't seem possible
		to ask about the history of managers for a particular
		employee. Given the newness of the concept, and the
		fact that there is little implementation experience, I
		recommend that this feature be considered for
		inclusion in TSQL3's data model, but not TSQL2. 

	* Support for valid-time indeterminacy
		This is also clearly desirable, but given the newness
		of the area, I believe that further research is
		required before valid-time indeterminacy can be added
		to SQL. So, for now, I advocate leaving this to the
		TSQL3 design.

	* Requires object IDs in the implementation
		Requiring object IDs is tantamount to adding
		object-orientedness to the data model. While many
		consider object IDs to be advantageous (which is why
		they are being supported in SQL3), they distract the
		TSQL2 design from *temporal* support. Ideally, temporal and
		OO support should be orthogonal. So I advocate leaving
		this to the TSQL3 design.

Undesirable
	* Requires nested relations in the implementation
		This is counter to the charter of the TSQL2 design
		effort. If nested relations are *required* (rather
		than *permitted* as a representational data model),
		then in my view the database community will reject the
		TSQL2 design as an impractical academic exercise.

	* Requires support of recursion
		Ditto.

	* Discussion of indexing, query optimization
		While it should be possible to exploit much of the
		experience and results of temporal indexing and query
		optimization, the focus of the working group should be
		on language design. We should not pick a particular
		indexing strategy or query optimization approach.

	* Separate "when" clause
		While this clause was introduced in TQuel and other
		proposals, I now feel that it is awkward to separate
		the nontemporal and temporal predicates, especially in
		the presence of predicates on user-defined time.