The University of Arizona


CS Colloquium

Date: Thursday, September 29, 2016
Time: 11:00 am
Concludes: 12:15 pm
Location: Gould-Simpson 906
Details: Please join us for coffee and light refreshments at 10:45 am, Gould-Simpson, 9th Floor Atrium.

Faculty Host: Dr. Joshua Levine
Speaker: Matt Berger
Affiliation: Air Force Research Laboratory

Exploring Document Collections through Document Usage

As document collections grow in size and diversity, visual exploration becomes essential for understanding general themes, comparing documents, and discovering documents of interest. Document exploration is traditionally driven by a data representation characterized by a document’s contents, i.e., the collection of words in the document. In this talk I will introduce a different perspective on document exploration: rather than visualizing documents by what they are, we propose a technique to explore documents by how they are commonly used. Document usage can serve as a rich data representation. In scientific literature, for instance, documents tend to be used for their datasets, evaluation methodologies, surveys, code, or experimental comparisons. We capture the usage of a document through how it is cited by other documents in the collection, and use this to obtain a joint representation of words and documents in a high-dimensional space where a word’s proximity to a document reflects the document’s citation context. We show how to use this embedding to interactively steer a 2D document projection through user-defined concepts, specified as arbitrary phrases, existing documents, or document:phrase analogies. We use a large corpus of documents from the computer vision domain to showcase the effectiveness of our technique for exploring document usage across a wide variety of concepts.
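The abstract does not say how the joint embedding is learned or how the projection is steered. As a rough illustration only, the sketch below assumes words and documents already share one vector space and builds a simple 2D "concept axis" layout: each document is placed by its cosine similarity to two user-chosen concepts, where a phrase concept is taken to be the mean of its word vectors. The function names `phrase_vector` and `concept_projection` and the toy three-dimensional vectors are hypothetical, and this is not presented as the speaker's technique.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def phrase_vector(phrase, word_vecs):
    """Hypothetical concept vector: the mean of the vectors of known words in the phrase."""
    vecs = [word_vecs[w] for w in phrase.lower().split() if w in word_vecs]
    if not vecs:
        raise ValueError("no known words in phrase")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def concept_projection(doc_vecs, concept_x, concept_y):
    """Place each document at (similarity to concept x, similarity to concept y)."""
    return {
        name: (cosine(vec, concept_x), cosine(vec, concept_y))
        for name, vec in doc_vecs.items()
    }

# Toy embedding (made-up values): words and documents live in the same 3-D space.
word_vecs = {
    "dataset": [1.0, 0.0, 0.0],
    "benchmark": [1.0, 0.2, 0.0],
    "survey": [0.0, 1.0, 0.0],
    "code": [0.0, 0.0, 1.0],
}
doc_vecs = {
    "paperA": [0.9, 0.1, 0.0],  # mostly cited for its data
    "paperB": [0.1, 0.9, 0.1],  # mostly cited as a survey
}

x_axis = phrase_vector("dataset benchmark", word_vecs)
y_axis = phrase_vector("survey", word_vecs)
coords = concept_projection(doc_vecs, x_axis, y_axis)

if __name__ == "__main__":
    for name, (x, y) in coords.items():
        print(f"{name}: x={x:.3f}, y={y:.3f}")
```

Because an existing document's vector lives in the same space, it could be passed directly as `concept_x` or `concept_y` in place of a phrase vector; how the talk handles document:phrase analogies is not described here and is left out of the sketch.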


Matthew Berger is a Computer Scientist with the Air Force Research Laboratory. He obtained his PhD in 2013 from the University of Utah, focusing on various problems in geometry processing. His current research interests are in machine learning and computer vision, including subspace learning and tracking from missing data, perceptual similarity learning from relative feedback, and image and text representation learning, with applications to background estimation in video, identification of swarm behaviors, zero-shot learning, and RF estimation from images via deeply learned features.