The University of Arizona

Colloquium

Category: Lecture
Date: Tuesday, April 2, 2013
Time: 11:00 am
Concludes: 12:00 pm
Location: Gould-Simpson 906
Speaker: Dr. Mihai Surdeanu
Title: Associate Professor
Affiliation: SISTA

Learning from the World

Natural language processing (NLP) applications have benefited immensely from the advent of “big data” and machine learning. For example, IBM’s Watson learned to compete successfully in Jeopardy! by using a question answering model trained on millions of Wikipedia pages and other documents. However, this abundance of textual data does not always come free: much of it is of low quality (e.g., the text is often ungrammatical), or it does not exactly illustrate the problem of interest. In this talk I will show that such data is still valuable and can be used to train models for end-to-end NLP applications. I will focus on two specific applications: question answering trained from Yahoo! Answers question-answer pairs, and information extraction trained from Wikipedia infoboxes aligned with web texts. I will show that (a) low-quality text can be made useful by converting it to semantic representations, and (b) training data that incompletely models the problem of interest can be successfully incorporated through anomaly-aware machine learning models.
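
To make point (b) concrete, below is a minimal Python sketch of distant-supervision alignment, the general technique behind training extraction models from infoboxes paired with text. The data and names (infobox_facts, sentences, align) are hypothetical illustrations, not the speaker's actual system: every sentence that mentions both arguments of a fact is labeled with that fact's relation, and the spurious second match shows why such labels are noisy by construction.

    # Toy distant supervision: align infobox-style facts with raw sentences.
    # All facts, sentences, and names here are hypothetical illustrations.
    infobox_facts = [
        ("Barack Obama", "birthPlace", "Honolulu"),
        ("Barack Obama", "spouse", "Michelle Obama"),
    ]

    sentences = [
        "Barack Obama was born in Honolulu, Hawaii.",
        "Barack Obama visited Honolulu during his vacation.",  # spurious match
        "Michelle Obama married Barack Obama in 1992.",
    ]

    def align(facts, sents):
        """Label every sentence mentioning both arguments of a fact with
        that fact's relation. Co-occurrence does not imply the relation
        holds (see the 'vacation' sentence), so the resulting training
        data only incompletely models the problem of interest."""
        examples = []
        for entity, relation, value in facts:
            for sent in sents:
                if entity in sent and value in sent:
                    examples.append((sent, relation))
        return examples

    for sent, relation in align(infobox_facts, sentences):
        print(f"{relation}: {sent}")

Run as-is, this labels the "vacation" sentence as a birthPlace example even though no birth is described; handling such mislabeled instances gracefully is precisely the job of the anomaly-aware models discussed in the talk.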

Biography

Dr. Mihai Surdeanu is a new Associate Professor in the School of Information: Science, Technology, and Arts (SISTA). Previously, he was a Senior Research Associate in the Computer Science Department at Stanford University and the lead researcher/CTO of Lex Machina, a company focused on information extraction and risk analysis in the legal domain. He earned a PhD in Computer Science from Southern Methodist University in Dallas, TX, in 2001. Before joining Stanford in 2008, he worked as a research scientist at Language Computer Corp. (where he was later VP of Engineering), the Technical University of Catalonia, and Yahoo! Research Barcelona. His research focuses on extracting semantic meaning from natural language texts and using it to construct end-to-end NLP applications such as question answering and information extraction.