From rts Wed Jul 14 14:26:07 1993
Date: Wed, 14 Jul 1993 14:26:02 MST
From: "Rick Snodgrass"
To: ahn@cbnmva.att.com, ariavg@cgsvax.claremont.edu, cleung@vnet.ibm.com, csj@iesd.auc.dk, elmasri@cse.uta.edu, fabio@deis64.cineca.it, jcliffor@is-4.stern.nyu.edu, kaefer@informatik.uni-kl.de, leomark@cc.gatech.edu, rts@cs.arizona.edu
Subject: step 1
Content-Length: 9175
X-Lines: 215
Status: RO

In the following, I list my personal views on the aspects residing in the various classes, as my contribution to the first task of the TSQL2 design effort.

Required

* Temporal support be optional
It is important for TSQL2 to be an extension of SQL2's data model where possible, not a replacement. Hence, the schema definition language should allow the definition of snapshot relations when temporal support is not desired. Similarly, it should be possible to derive a snapshot relation from a temporal relation.

* Valid time support, including past and future
While some models only include valid time support up to now, it is important to provide support for future valid time so that planning activities can be accommodated.

* Language data model allows multiple representations
This was agreed upon at the workshop. In particular, it would be best if the data model accommodated the major temporal data models proposed to date, including attribute timestamped models.

* Implementable with event- or interval-timestamped tuples
This is the most straightforward representational model, in terms of extending current relational technology. In my opinion, no TSQL2 proposal will be accepted if it requires attribute timestamping. (From the last aspect, the model should nevertheless *accept* implementation using an attribute timestamped representational model.)

* Extended range and precision of timestamp values
SQL2 is limited to A.D., to 9999 years, and to an excessively coarse precision of seconds for a representation of 20 bytes.
It is also not sufficiently defined: addition is implementation-defined (!), and it is not stated which of seven possible definitions of the second is used. For temporal databases to be used in scientific applications, as well as by historians and others requiring an extended range, the representation and semantics must be extended and be better defined. I advocate the timestamp representation and semantics used in MultiCal, which can represent all of time (+/- 18 billion years) to the granularity of a second, and all of recorded time to the granularity of a microsecond, in 8 bytes.

* Shashi Gadia's restructuring
As a simple example, consider a relation with Employee Name and Dept. Shashi, in TempSQL, allows one to restructure this relation so that it is grouped on Employee (gives the department history for each employee, i.e., which department the employee was with, and when), or grouped on Dept (gives the employee history for each dept, i.e., which employee(s) were associated with the department, and when). I view this as a very powerful capability to support in the query language.

Required, cont.

* Extension, not replacement, of snapshot algebra
Current DBMS implementations are based on the snapshot algebra. The temporal algebra used with the TSQL2 temporal data model should contain temporal operators that are extensions of the operations in the snapshot algebra. Snapshot reducibility is also highly desired, so that, for example, optimization strategies will continue to work in the new data model.

* User-defined time support, including events, intervals, and fixed and variable spans
User-defined time support in SQL2 is greatly flawed. C.J. Date has listed many of the problems with it. The MultiCal proposal cleans up these problems. I advocate adopting this proposal for user-defined time.
Note that the current SQL2 support for user-defined time can be simulated in a DBMS, using a preprocessor and some small, additional code in the runtime system, providing a migration path for legacy applications. I know of no other proposals for user-defined time.

* Use of chronons in the representation
The current SQL2 timestamp representation is a fixed decimal representation. This is equivalent to using chronons of a fixed fraction of a second. For instance, TIME(3) in SQL2 is equivalent to a chronon size of 1 millisecond. Using chronons simply tightens up the semantics of timestamps, while not restricting the representation.

* Fixed system granularity
There should be an assumed system granularity (minimum representable interval), but this granularity should be implementation-dependent. This is consistent with SQL2.

* Existing aggregates have temporal analogues
It is important that existing language features such as aggregates still apply in the temporal data model.

* Multiple calendar support, in input, output, timestamp operations
Internationalization is a significant concern. However, SQL2 requires the Gregorian calendar (even in its representation). If the MultiCal approach is adopted, the number of keywords is reduced significantly while the functionality increases.

* Multiple language support in timestamp input and output
MultiCal also supports this. I know of no other proposal for multiple language support.

Optional

* Transaction time support, perhaps with simple language support
One complicating issue is that transaction time support requires quite dramatic changes to the architecture (see Stonebraker's discussion of this in his TKDE article on Postgres). Hence, this might conflict with the constraint of being implementable with current DB technology (for example, Postgres required significant changes to concurrency control and logging).
Note that the support of an insertion time attribute is quite minimal when compared with other proposals for transaction time.

* Ungrouped Complete
While this is certainly desirable, it is not clear whether it is consistent with other criteria. An argument needs to be made that this feature can be supported without major surgery to the DBMS implementation.

* Fabio's history variables
Fabio has given several examples where history variables (which range over histories, associated with tuple variables, if I have the terminology correct...) are quite nice syntactically. If these can be supported without major surgery (my intuition is that they can), then it would be nice to include them.

* Calendar-dependent timestamp functions
An example of this is addition of an event and a span. Say the span was "one month." One calendar might consider a month to be exactly 30 days, while another calendar might consider a month to be a variable number of days, depending on what event it was being added to. I advocate using MultiCal's approach to calendar-dependent timestamp functions, including variable spans.

* Temporal aggregates
New aggregates, such as first, that are intrinsically temporal would be highly useful.

Omitted

* Grouped Complete
Grouped complete has some highly desirable properties. However, its treatment of "histories" as first class objects can be difficult to implement. Also, it can limit functionality. Given an employee relation, with name and dept as attributes, we have a name history and a dept history, per employee. If we join with a mgr relation, with manager and dept histories, the result gives the history of individual employees associated with individual departments (most such histories would include only one interval). It doesn't seem possible to ask about the history of managers for a particular employee.
Given the newness of the concept, and the fact that there is little implementation experience, I recommend that this feature be considered for inclusion in TSQL3's data model, but not TSQL2.

* Support for valid-time indeterminacy
This is also clearly desirable, but given the newness of the area, I believe that further research is required before valid-time indeterminacy can be added to SQL. So, for now, I advocate leaving this to the TSQL3 design.

* Requires object IDs in the implementation
Requiring object IDs is tantamount to adding object-orientedness to the data model. While many consider object IDs to be advantageous (which is why they are being supported in SQL3), they distract the TSQL2 design from *temporal* support. Ideally, temporal and OO support should be orthogonal. So I advocate leaving this to the TSQL3 design.

Undesirable

* Requires nested relations at the implementation
This is counter to the charter of the TSQL2 design effort. If nested relations are *required* (rather than *permitted* as a representational data model), then in my view the database community will reject the TSQL2 design as an impractical academic exercise.

* Requires support of recursion
Ditto.

* Discussion of indexing, query optimization
While it should be possible to exploit much of the experience and results of temporal indexing and query optimization, the focus of the working group should be on language design. We should not pick a particular indexing strategy or query optimization approach.

* Separate "when" clause
While this clause was introduced in TQuel and other proposals, I now feel that it is awkward to separate the nontemporal and temporal predicates, especially in the presence of predicates on user-defined time.
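
The calendar dependence noted under "Calendar-dependent timestamp functions" can be made concrete with a small sketch. It assumes two made-up calendars: one in which a month is always exactly 30 days, and one in which a month runs to the same day of the next Gregorian month (clamped to that month's last day). The function names are hypothetical and are not MultiCal's interface; the sketch only shows that the same span, "one month," gives different results depending on the calendar and on the event it is added to.

```python
import calendar
from datetime import date, timedelta


def add_month_fixed(event: date) -> date:
    """Hypothetical calendar A: "one month" is a fixed span of 30 days."""
    return event + timedelta(days=30)


def add_month_variable(event: date) -> date:
    """Hypothetical calendar B: "one month" is a variable span.

    The result is the same day of the following Gregorian month,
    clamped to the last day of that month when the day does not exist.
    """
    if event.month == 12:
        year, month = event.year + 1, 1
    else:
        year, month = event.year, event.month + 1
    last_day = calendar.monthrange(year, month)[1]  # number of days in target month
    return date(year, month, min(event.day, last_day))


def variable_month_length(event: date) -> int:
    """Number of days that "one month" amounts to in calendar B for this event."""
    return (add_month_variable(event) - event).days


if __name__ == "__main__":
    for event in (date(1993, 1, 31), date(1993, 7, 14), date(1993, 12, 15)):
        print(event, add_month_fixed(event), add_month_variable(event),
              variable_month_length(event))
```

For the event 1993-01-31, the fixed-span calendar yields 1993-03-02, while the variable-span calendar yields 1993-02-28, so there "one month" amounts to 28 days; added to 1993-07-14 it amounts to 31 days.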