The Department of Linguistics and the Department of Computer Science at The University of Arizona invite you to a colloquium presentation by
William Schuler
University of Pennsylvania
Using Model-Theoretic Semantic Interpretation to Guide
Parsing and Word Recognition in a Spoken Language Interface
Date: Friday, February 28
Time: 2:00 - 3:15 PM
The development of speaker-independent mixed-initiative speech interfaces, in which users not only answer questions but also ask questions and give instructions, is currently limited by the performance of language models based largely on word co-occurrences. Even under ideal circumstances, with large application-specific corpora on which to train, conventional language models are not sufficiently predictive to correctly analyze a wide variety of inputs from a wide variety of speakers, such as might be encountered in a general-purpose interface for directing robots, office assistants, or other agents with complex capabilities.
This talk therefore explores the use of statistical models of language conditioned on the meanings or denotations of input utterances in the context of an interface's underlying application environment or world model. This use of model-theoretic interpretation represents an important extension to the `semantic grammars' used in existing spoken language interfaces (which rely on co-occurrences among lexically-determined semantic frames and slot fillers) in that the probability of an analysis is now also conditioned on the presence or absence of denoted entities and relations in the world model.
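The idea of conditioning an analysis's probability on whether its denoted entities and relations exist in the world model can be sketched as a re-ranking step. The names below (`WORLD`, `rescore`, the boost factor) are hypothetical illustrations, not the talk's actual system:

```python
# A toy world model: a set of entities and a set of relation tuples.
WORLD = {
    "entities": {"box1", "table1"},
    "relations": {("on", "box1", "table1")},
}

def denotation_supported(semantics, world):
    """Check whether every relation a candidate analysis denotes
    actually holds in the world model."""
    return all(rel in world["relations"] for rel in semantics)

def rescore(candidates, world, boost=10.0):
    """Weight each candidate's language-model probability upward
    when its denotation is present in the world model, then pick
    the best candidate."""
    rescored = []
    for prob, semantics in candidates:
        weight = boost if denotation_supported(semantics, world) else 1.0
        rescored.append((prob * weight, semantics))
    return max(rescored)

# Two competing analyses of an ambiguous utterance: the language model
# slightly prefers the second, but only the first denotes a relation
# that actually holds in the world.
candidates = [
    (0.4, {("on", "box1", "table1")}),   # supported by the world model
    (0.6, {("on", "table1", "box1")}),   # not supported
]
best = rescore(candidates, WORLD)        # the supported analysis wins
```

In this sketch the world-model evidence overturns the language model's preference, which is the intended effect: denotational support acts as an extra conditioning signal alongside co-occurrence statistics.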
Since a hyper-exponential number of possible parse-tree analyses can be attributed to any string of words, and many possible word strings to any utterance, this use of model-theoretic interpretation must involve some kind of sharing of partial results between competing analyses if interpretation is to be performed on large numbers of possible analyses in a practical interactive application. This talk presents a formal result that model-theoretic semantic interpretation can be factored (cut into well-behaved partial results) and memoized (shared between possible analyses) in polynomial time, in much the same way that simple syntactic structure is factored into context-free rules and shared in standard dynamic programming parsing algorithms. This polynomial bound holds even for analyses containing generalized quantifiers, which are traditionally analyzed as having second-order (exponential) denotations. The talk will also present the practical result that this approach does indeed yield a statistically significant improvement in accuracy in analyzing a corpus of spoken directions to 3-D animated agents.
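The factoring-and-memoizing idea can be illustrated with a CKY-style chart in which each cell stores partial denotations (here, sets of world entities), so an equivalent semantic partial result is computed once and shared by every larger analysis that contains it. The grammar, lexicon, and world below are toy assumptions, not the talk's actual formalism:

```python
# Toy world model: each predicate denotes a set of entities.
WORLD = {
    "dog": {"d1", "d2"},
    "cat": {"c1"},
    "big": {"d1", "c1"},
}

# Lexicon maps each word to a (category, denotation) pair.
LEXICON = {
    "dog": ("N", WORLD["dog"]),
    "cat": ("N", WORLD["cat"]),
    "big": ("Adj", WORLD["big"]),
}

def parse(words):
    """CKY-style chart parser that memoizes partial denotations.
    chart[i][j] maps a category to the denotation of span i..j,
    computed once and reused by every analysis covering that span."""
    n = len(words)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        cat, den = LEXICON[w]
        chart[i][i + 1][cat] = frozenset(den)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                left, right = chart[i][k], chart[k][j]
                # One binary rule, Adj N -> N, with set intersection
                # as its semantics; the intersection is stored in the
                # chart cell and shared, never recomputed per parse.
                if "Adj" in left and "N" in right:
                    chart[i][j]["N"] = left["Adj"] & right["N"]
    return chart[0][n]

result = parse(["big", "dog"])   # denotation of "big dog"
```

The chart cell plays the same role for denotations that it plays for nonterminals in ordinary dynamic-programming parsing, which is the sense in which semantic interpretation is "factored" into shareable pieces.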
ABOUT THE SPEAKER: William Schuler received his BA in English Literature in 1992 and BS in Computer Science in 1995, both from the University of Michigan. He received his MS in Computer and Information Science from the University of Pennsylvania in 1997, and is completing his PhD there this semester, under the supervision of Aravind Joshi and Martha Palmer. His research so far is directed toward the goal of developing an interface through which untrained users can express any behavior desired of an artificial agent using ordinary spoken directions.