# Events & News

# CS Colloquium

Category | Lecture |

Date | Tuesday, February 3, 2015 |

Time | 11:00 am |

Concludes | 12:15 pm |

Location | Gould-Simpson 906 |

Details | Please join us for coffee and light refreshments at 11am in Gould-Simpson 906. Faulty Host: Carlos Scheidegger |

Speaker | Joseph J. Pfeiffer, III |

Title | M.S. |

Affiliation | Purdue University |

## Learning and Sampling from Scalable Generative Graph Models

Networks provide an effective representation to model many real-world domains, with edges (e.g., friendships, citations, hyperlinks) representing relationships between items (e.g., individuals, papers, webpages). By understanding common network features, we can develop models of the distribution from which the network was likely sampled. These models can be incorporated into real world tasks, such as modeling partially observed networks for improving relational machine learning, performing hypothesis tests for anomaly detection, or simulating algorithms on large scale (or future) datasets. However, naively sampling networks does not scale to real-world domains; for example, drawing a single random network sample consisting of a billion users would take approximately a decade with modern hardware. In this talk, we discuss scalable generative network models that utilize the sparsity found in real world domains to scalably sample networks. In particular, real world social networks have a subquadratic number of observed edges in the number of vertices, rather than quadratic, meaning their properties can vary significantly from dense networks. We begin by defining a key approximating framework to facilitate modeling of complex network properties utilizing the sparsity of real world networks. Expanding on this insight, we extend the framework to incorporate two key network properties that were previously difficult to capture. The first, transitivity, models the tendency of triangles to form in social networks despite the sparsity of the network. The second models the dependencies of the edge formations on the corresponding vertex attributes, capturing the correlation of features between linked vertices. Further, we demonstrate how this model can extend to other higher order graph statistics, such as modeling the joint degree distribution. Each of these methods is provably scalable, and we empirically demonstrate their scale and accuracy on networks with millions of items and edges.

## Biography

Joseph Pfeiffer is a doctoral candidate advised by Professor Jennifer Neville in the Department of Computer Science at Purdue University. His work encompasses both network science and relational machine learning, with his recently defended thesis focusing on scalable generative graph models, relational semi-supervised learning, and relational learning in partially observed domains. Joseph is the recipient of both the Frederick N. Andrews and Bilsland Dissertation Fellowships from Purdue, and his recent work has appeared at venues including AAAI, WWW, ICDM and CIKM.