Joint Colloquium

The Department of Linguistics and the Department of Computer Science at The University of Arizona invite you to a colloquium presentation by
Speaker: Eric Fosler-Lussier
Columbia University
Topic: "What did you say?": Ambiguity and Errors in Spoken Language Systems
Date: Friday, March 7
Time: 2:00 - 3:15 PM
Place: Douglass 101
Reception follows in the Linguistics Department Lounge, Douglass 216


The science of automatic speech recognition (ASR) has made significant advances in the last decade; for example, we have seen the deployment of automated telephony-based systems to make train reservations, find flight schedules, and transfer calls via an automated operator. However, full interactivity has not yet been achieved: the language that can be used with these systems is often restricted to simple phrases, the systems work better for some users than others (due to accents, voice quality, gender, etc.), and the systems have difficulty with varying noise conditions (e.g., in the car on a cell phone). As we seek to develop systems that work without restrictions of this kind, the variability of the input increases, and we must cope with a larger number of errors made by the system.

Some of these errors can be traced to linguistic phenomena: for example, fast talkers, speaking spontaneously, will often pronounce the phrases "can't elope," "can elope," and "cantaloupe" the same. Models of this type of phonetic behavior (pronunciation models) have shown great potential in diagnostic experiments but have led to only modest improvements in ASR performance. My colleagues and I have hypothesized that this is because the more entries a pronunciation model contains, the likelier it is that two words can be confused by the system. Thus, we are searching for a lexical confusability metric that will tell us when it is appropriate to add a new pronunciation to the model. An important first step in this process is to predict when the ASR system is likely to make an error due to pronunciation ambiguity. We have developed a framework that models the speech recognition process as a set of cascading weighted finite state transducers; by composing the set of models, we can find words that are likely to be confusable according to the recognizer's internal models.
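The intuition behind pronunciation-driven confusability can be illustrated with a toy example. The sketch below is only a first approximation of the idea, not the WFST framework described above: it uses an invented miniature pronunciation lexicon (the phone strings are illustrative, not taken from a real dictionary) and flags entries whose phone sequences collide exactly, rather than composing weighted transducers.

```python
from collections import defaultdict

# Toy pronunciation lexicon (hypothetical phone strings, for illustration only):
# each entry maps a word or phrase to one possible pronunciation.
LEXICON = {
    "cantaloupe":  "k ae n t ax l ow p",
    "can't elope": "k ae n t ax l ow p",   # fast, spontaneous speech
    "can elope":   "k ae n ax l ow p",
    "recognize":   "r eh k ax g n ay z",
}

def confusable_sets(lexicon):
    """Group lexicon entries that share an identical phone sequence.

    In the full framework, confusability would be computed by composing
    the cascade of recognizer models; here we only detect exact
    phone-string collisions as a crude stand-in for that computation.
    """
    by_phones = defaultdict(list)
    for word, phones in lexicon.items():
        by_phones[phones].append(word)
    return [sorted(words) for words in by_phones.values() if len(words) > 1]

print(confusable_sets(LEXICON))
```

Adding more pronunciation variants to such a lexicon increases coverage but also increases the number of collisions, which is exactly the trade-off a lexical confusability metric would have to quantify.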

Even if the system recognizes all of the words correctly, the semantics of the user input may still be ambiguous (e.g., "Leave from New York" in a flight application might mean one of three airports). The second half of the talk describes efforts to compensate for errors and language ambiguity within the natural language understanding component of a spoken dialogue system. We have developed a method of tracking information given by a user throughout a dialogue; by adding in contextual knowledge we can compute a set of system beliefs and detect inconsistencies caused by ambiguities and errors. When such a conflict is detected, it can usually be clarified through further dialogue with the user. Examples from a deployed travel reservation system will illustrate the utility of this approach.
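One way to picture the belief-tracking idea is as slot filling with set intersection. The sketch below is a hypothetical miniature, not the deployed system's implementation: the city-to-airport table and slot names are invented for illustration, and "contextual knowledge" is reduced to intersecting each new piece of user input with the current candidate set.

```python
# Hypothetical contextual knowledge: which airports a city name could mean.
CITY_AIRPORTS = {
    "new york": {"JFK", "LGA", "EWR"},
    "boston":   {"BOS"},
}

class BeliefState:
    """Track the still-possible values for each slot across dialogue turns."""

    def __init__(self):
        self.beliefs = {}  # slot name -> set of candidate values

    def update(self, slot, values):
        # Intersect new evidence with prior beliefs; an empty intersection
        # signals a conflict (the user contradicted earlier input, or the
        # recognizer made an error).
        prior = self.beliefs.get(slot)
        self.beliefs[slot] = set(values) if prior is None else prior & set(values)

    def status(self, slot):
        candidates = self.beliefs.get(slot, set())
        if not candidates:
            return "conflict"    # clarify: the inputs are inconsistent
        if len(candidates) > 1:
            return "ambiguous"   # clarify: ask the user which one was meant
        return "resolved"

state = BeliefState()
state.update("origin", CITY_AIRPORTS["new york"])  # "Leave from New York"
print(state.status("origin"))                      # ambiguous: JFK, LGA, or EWR
state.update("origin", {"JFK"})                    # user clarifies: "From Kennedy"
print(state.status("origin"))                      # resolved
```

When `status` returns "ambiguous" or "conflict", the dialogue manager can generate a clarification question rather than proceeding on a wrong interpretation, which is the behavior the talk's travel-reservation examples illustrate.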

This talk describes joint work with Egbert Ammicht, Alex Potamianos, Jeff Kuo, and Ingunn Amdal.

ABOUT THE SPEAKER

Eric Fosler-Lussier received his B.A. in Linguistics and B.A.S. in Cognitive Science in 1993 from the University of Pennsylvania, and his Ph.D. in Computer Science from the University of California, Berkeley in 1999 under the supervision of Nelson Morgan. He was a graduate student researcher at the International Computer Science Institute in Berkeley, and a postdoctoral researcher there in 1999-2000. He was a member of the Dialogue Systems Research Department at Bell Labs, Lucent Technologies in 2000-2002, and is currently a Visiting Scientist in the Department of Electrical Engineering, Columbia University. He is currently developing new algorithms for automatic speech recognition that incorporate multiple sources of information within the recognition process.