ACM Computing Surveys 28(4es), December 1996, http://www.acm.org/pubs/citations/journals/surveys/1996-28-4es/a89-snodgrass/. Copyright © 1996 by the Association for Computing Machinery, Inc. See the permissions statement below. This article derives from a position statement prepared for the Workshop on Strategic Directions in Computing Research.


The Inefficiency of Misalignment


Richard T. Snodgrass

University of Arizona, Computer Science Department
P.O. Box 210077, Tucson, AZ 85721-0077 USA
rts@cs.arizona.edu, http://www.cs.arizona.edu/people/rts

Abstract: This paper examines the influence of users, academia, and vendors on the SQL standard, focusing on triggers, object-oriented features, Open Data Base Connectivity (ODBC) and time support. In all cases, users and academia have had little impact on the standard. This misalignment has generated constructs that meet needs (technical and otherwise) of the participating vendors; the needs and insights of users and the academic community have generally not informed the resulting standard. Three basic structural impediments are identified that prevent these two communities from participating in standards design. Radical changes to the process are required to balance input to the standards from vendors, users, and academia. Everyone will benefit from a more balanced process.

Categories and Subject Descriptors: H.2.3 [Database Management]: Languages - query languages

General Terms: Languages

Additional Key Words and Phrases: Object-oriented database, ODBC, SQL, standards, temporal database, triggers.



1 Introduction

In this era of downsizing companies and shrinking research budgets, the relevance of the research enterprise has become a frequent topic. Conference panels regularly ask: is research polishing a round ball? Should research lead, follow, or get out of the way? To the extent that computer science is an engineering discipline, the ultimate determinant of success is the increase in effectiveness of the users of computer science technology. To ground this discussion, we consider the database field, though similar analyses and conclusions apply to other computer science areas, including programming languages, software engineering, communication protocols, hardware design, and computer graphics.

A quick look over the past 25 years suggests that research has had a significant impact. After all, relational database technology, purely a research topic in 1971, is now the solution of choice for administrative data processing and is indeed a primary basis for the information age in which we happily find ourselves.

A closer look reveals a frustrating lack of involvement by academic researchers in this revolution. To determine why research hasn't had more impact, and indeed whether it should or could influence deployment of database technology, we consider the SQL standard, which in many ways is a fairly accurate microcosm of the interaction of the major players: users of database technology, DBMS vendors, and academia. We argue that structural problems prevent academia and users from influencing standards, thereby alienating those two communities and reducing the usability of commercial DBMS products.

We should emphasize that it is the structure that is at fault here, not the participants. All parties have been optimizing their own objectives. Vendors have generally attempted to increase market share, academics have published papers, and users have been focused on getting their applications working. The unfortunate result is that vendors have to implement a standard query language that is forced on them by other vendors, that doesn't adequately meet user needs, and that doesn't exploit insights that can be found in the research literature.

2 SQL

SQL has achieved virtual dominance in the database world; it is the lingua franca of interaction with a DBMS. Many applications and rapid application development (RAD) environments use embedded SQL, fourth-generation languages are translated into SQL, and sophisticated graphical user interfaces (GUIs) interface with the DBMS via SQL. There have been three major releases of the SQL standard since it emerged from IBM: SQL-86, SQL-89, and SQL-92. The next iteration, SQL3, is rapidly moving towards adoption by the American (ANSI) and international (ISO) standardization bodies.

Since SQL has such an impact on the information technology user, the central question becomes: who decides what is in the standard? We use a simple iconic figure to illustrate both that and the related question of who influences whom (see Figure 1). Arcs in the figure track the influence of the players on one another, as well as on the standard. The arc labels give prominent examples of each influence; they are not exhaustive. The thickness of each arc indicates the degree of influence, with the incoming influence normalized. We call this representation an Influence Flow Diagram, or IFD.

Figure 1. The Influence Flow Diagram

To probe further, we examine four areas of the SQL standard that have been initiated over the last decade: triggers, object-oriented features, Open Data Base Connectivity (ODBC), and time support. We provide an IFD for each, then compute a composite IFD to identify general trends.

Concerning triggers, industry has always been out front, with academia only recently attempting to catch up. Triggers were initially introduced by Interbase and Sybase in an attempt to differentiate those products in the marketplace. This strategy was quite effective for Sybase, helping it to become one of the major DBMS vendors. Within a few years, users included triggers on their must-have checklists, forcing all vendors to add this capability, with syntax similar to that supported by Sybase. (This phenomenon has occurred repeatedly, most recently with DBMS-WWW interfaces.) Triggers are now part of the SQL3 draft standard, as a poorly motivated combination of constructs from the various products. Academia has had virtually no input at any stage, and is left bemoaning the lack of a clean underlying semantics. The user community has also had little influence, and harbors its own concerns about the scalability of the proposed constructs (see Figure 2a).

Figure 2. Influence Flow Diagrams for Triggers (a) and Object-Oriented Features (b)

Objects originated and blossomed in academia, then transitioned to perhaps a dozen startup object-oriented database (OODB) vendors when established relational vendors ignored this technology. While OODBs have leveled off at under 10% market share, users continue to ask for OO features in SQL. The OODBMS vendors, realizing that the disparate query-language variants were contributing to their demise, banded together and produced a de facto standard based on SQL called ODMG. The relational vendors, which control the SQL standards committees, developed an incompatible OO extension as part of SQL3 (see Figure 2b).

Open Data Base Connectivity (ODBC) is a call-level interface (CLI) developed by Microsoft and based on an earlier specification developed by two industry groups, X/Open and the SQL Access Group. SQL-92 already had CLIs for several programming languages; ODBC is an incompatible CLI for the C programming language. Microsoft has never shown much interest in SQL standardization, but joined the committee for the sole purpose of handing over the ODBC specification for incorporation into SQL3; Microsoft's membership on the committee has since lapsed. The miraculous speed at which this inconsistent portion was added to the standard can be attributed to Microsoft's dominant position in the market. Simply put, Microsoft did to the other relational vendors what those vendors did to the OO vendors, and what IBM did to the other relational vendors in the SQL-86 standard: use market share to dictate the standard. Users and academia were largely ignored in all three of these cases (see Figure 3a).

Figure 3. Influence Flow Diagrams for ODBC (a) and Temporal Features (b)

As a final case study, consider time support. SQL-89 had none, though most vendors adopted IBM's DB2 language constructs for DATE, TIME, TIMESTAMP, and INTERVAL column types. As early as 1988, well before SQL-92 was adopted, the user and academic communities were aware of significant design flaws in the definition of those types. Nevertheless, a slightly cleaned-up version of DB2's constructs was incorporated into SQL-92, retaining most of the known flaws.

The academic community vowed not to let this fate befall valid-time support, and so, in an unprecedented initiative, 18 of the most active temporal database researchers in 1994 developed a comprehensive second-generation temporal extension of SQL-92, called TSQL2, which quickly achieved acceptance in that community. In 1996, however, IBM representatives on the ISO committee instead pushed through a temporal extension that had previously been rejected by this consensus effort. While efforts are ongoing to also incorporate the constructs favored by users and researchers, the story remains that of one or two vendors dictating the standard (see Figure 3b).

3 Imbalances: The Problem

As a rough approximation of the general flow of influence between the players, we average the four case studies to get Figure 4.

Figure 4. Composite Flow of Influence Based on the Case Studies

The interactions among users, vendors, and academia are about where they should be, given the peculiarities of each of these players. The problem lies in the interaction with the standard: users and academia have little influence on it, and no significant portion of the standard has been written by representatives from either of these two communities. Of the nine interactions illustrated, two account for the inefficiency: the negligible influence of users, and of academia, on the standard. The language constructs in the standard are dictated largely by what vendors want to implement, and non-technical considerations often come into play in design decisions. Academia, cut off from the discussion, turns instead to more theoretical, `clean' problems. Users, who have little choice but to employ the resulting standard, spend huge sums on consulting and training, to the delight of vendors. And the vendor representatives on the committee cannot form coalitions with users and academic representatives; rather, they are limited to negotiating solely with their competitors, who listen skeptically to technical arguments. No one benefits from this misalignment.

4 The Causes

There are three basic hurdles that prevent users and the academic community from having any influence on the SQL standard. The first is an obvious one: lack of resources. Membership in a national standards committee is expensive. Attendance is required at the six national meetings each year and, if one wishes to influence the standard, at the two international meetings per year, the latter held all over the world and lasting up to two weeks each. All travel costs are borne by the member. Longevity of membership is essential for influence. Attaining the expertise to write the arcane and highly technical change proposals requires several years; after that, major initiatives take years to play out. In addition, literally hundreds of change proposals must be digested each year. (The April 1996 ANSI meeting alone considered 119 documents!) From one-third to all of the member's professional time must be devoted to standards activities.

A second problem is that the reward structures for academics and users alike are not conducive to standards work. Users who are off at standards meetings are not doing the ``real work'' from which their employers derive value. Similarly, few professors would choose to participate in any activity for which there are no institutional or professional rewards. Also, their teaching schedules preclude regular attendance at standards meetings. It is to the vendors' credit that they have largely financed the many years of highly skilled work that has produced the SQL standards (this investment is conservatively estimated at US $30M). Anyone who attends a standards meeting is immediately impressed with the talent focused on this activity and the immense dedication displayed by the participants.

Finally, there is the hurdle of representation. Only a few (large) DBMS vendors can afford to dedicate one or more full-time staff positions to standards activities, and so only a few vendors are represented on standards committees. Each such member carries out the wishes of an entire vendor, and thus of a sizable portion of the vendor community. Achieving comparable representation of the user and academic communities is much more difficult. A conscientious user representative must invest considerable effort polling users on what they want as well as educating them on what is going on in the standards efforts. Academic representatives have an even more difficult time, because the research milieu encourages, even requires, each individual researcher to adopt a distinct approach, making consensus building in this community a rare event.

5 A Solution

The hurdles are maintained and reinforced by the vendors, and committee procedures preserve the status quo. Radical changes to the process are required to balance the input to the standard from vendors, users, and academia. While vendors should continue to have significant input, users and academia must also be allowed to influence evolving standards.

The Association for Computing Machinery is the obvious change agent, due to its size, its international presence, and its ability to represent users, vendors, and academia alike. The IEEE and standards bodies within the government could aid in this transition. Finally, academia must also take some responsibility for changing the status quo.

If standards are to benefit from the contributions of users and academia, ACM must undertake major initiatives to wrest control of the standardization process from vendors and eliminate the hurdles now preventing participation by expert, affected parties. The current laissez-faire attitude has resulted in an inefficient, distorted approach to designing standards.

Everyone will benefit from a more balanced arrangement. Users will have a friendlier database language, whose design places greater emphasis on comprehensibility than on ease of implementation. Academia will contribute its expertise and, in sharing ownership of the standard, will orient more research towards relevant problems. Vendors will implement more effective languages, and can address the difficult technical questions more directly, with less concern for untoward pressure from competitors.

Similar gains are possible across computer science, with the potential for realizing significant, qualitative increases in the efficient development of computer technology, and ultimately in the usability of that technology. By addressing the underlying problems, ACM can better leverage the abilities of all of its members, thus making contributions possible from a much wider participant base.


Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org.