For an updated list of publications, see my publications page, or check out my Google Scholar or DBLP profiles. For a broader overview of my research interests and goals, read on!
If you are interested in doing research with me as a graduate student, I’d encourage you to take a look at my information for prospective students page.
Research overview: algorithms for data science
My research focuses on algorithms and computational methods for data analysis, especially data that can be modeled by a graph or network. My overarching goal is to develop methods that are fast, satisfy strong mathematical guarantees, and can be used to extract new insights from datasets coming from biology, sociology, medicine, e-commerce, and many other application domains.
Bridging the theory-practice gap in data analysis
There’s often a large gap between the best theoretical results and the practical tools that people actually use to analyze and extract information from large, complex datasets. In my research I aim to help bridge that gap whenever possible. This often involves one of two high-level approaches:
- Take a theoretical tool or idea that comes with strong guarantees, and make it more practical for real-world use, without sacrificing those theoretical guarantees. Some examples from my research:
- My work on localized graph clustering (ICML 2016, SIAM Data Mining 2019) built on earlier algorithms that came with great theoretical guarantees but were challenging to implement and use in practice. I developed methods that came with similar theoretical guarantees and were also easy to implement, and then used my implementations to tackle several large-scale MRI segmentation problems.
- Correlation clustering is a problem that originated in the theoretical computer science literature, but it has also been used for applications such as image segmentation, cross-lingual link detection, and cancer mutation analysis. The best theoretical algorithms for correlation clustering rely on solving an extremely large optimization problem, an approach that almost no one uses in practice because of its high memory requirements. In my work, I developed new practical optimization methods (SIMODS 2019, SIAM CSC 2020) that scaled up to problems with over a trillion constraints.
- Develop a deeper theoretical understanding of a technique that is widely used in practice, but previously came with no formal theoretical guarantees. A key example from my research:
- My PhD thesis focused on a framework for graph clustering called LambdaCC (WWW 2018), which unifies several clustering algorithms and objective functions that were not previously known to be related. While many other clustering methods come with no formal approximation guarantees, LambdaCC comes with approximation algorithms that are guaranteed to output a solution within a bounded factor of the optimal solution. In follow-up work, I developed techniques for learning how to set hyperparameters to best fit an application of interest (WWW 2019), as well as theoretically rigorous techniques for approximating the generalized objective across all parameter regimes (MFCS 2020).
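To give a rough sense of why the optimization problem behind correlation clustering gets so large, here is a minimal sketch. It assumes the problem is the standard linear programming relaxation with one distance variable per pair of nodes and triangle inequality constraints of the form x_ij <= x_ik + x_jk; this page does not spell out the formulation, so treat that as an assumption. The function name `triangle_constraint_count` is hypothetical and only for illustration.

```python
from math import comb


def triangle_constraint_count(n: int) -> int:
    """Count triangle inequality constraints for n nodes.

    Assumed formulation (not taken from this page): one constraint
    x_ij <= x_ik + x_jk for each choice of "long side" in each triple.
    """
    # Each unordered triple {i, j, k} contributes three inequalities,
    # one for each pair that can play the role of the long side.
    return 3 * comb(n, 3)


if __name__ == "__main__":
    # The count grows cubically, so modest graphs already produce huge LPs.
    for n in (100, 1_000, 13_000):
        print(f"n = {n:>6}: {triangle_constraint_count(n):,} constraints")
```

Under this assumption, a graph with roughly 13,000 nodes already yields more than a trillion constraints, which suggests why memory becomes the bottleneck long before the graph itself looks large.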
Some examples of applications
I aim to develop tools that can be broadly applied to different types of datasets and application domains. Here is a non-exhaustive list of application domains and problems that have arisen in my work:
- Sociology. E.g., quantifying homophily (the tendency for people to interact and connect with people who are similar to them) in group settings (preprint); understanding how factors such as dorm, major, and graduation year affect social network connections among students on university campuses (WWW 2019).
- Biology. E.g., finding sets of related genes in a gene-interaction network (WWW 2018).
- E-commerce. E.g., detecting sets of related products in an Amazon product hypergraph (KDD 2020, WWW 2021).
- Medical imaging. E.g., quickly identifying the left atrial cavity in a full-body MRI scan (SIAM Data Mining 2019).
Research communities
My research is highly interdisciplinary, and I enjoy collaborating with other researchers across computer science, mathematics, statistics, and other fields. My work has appeared in conferences such as ICML, WWW (Graph algorithms and social networks track), and KDD, and in SIAM venues such as SIAM Data Mining, SIAM CSC, and the new SIAM Journal on Mathematics of Data Science.
I’m especially excited to be a member of the recently formed SIAM Activity Group on Applied and Computational Discrete Algorithms (ACDA), which “brings together researchers who design and study combinatorial and graph algorithms motivated by applications.” The goals and focus of SIAM ACDA closely align with my research and my interests. You can check out their webpage for more details on the activity group and the ACDA conference.
Research videos
I occasionally post videos about my research on my YouTube Research Channel, ranging from short promotional videos to research talks to behind-the-scenes videos. For more information, you can check out my Videos page.
Here’s a short video that gives an overview of a KDD 2020 paper and also touches on some early work on graph clustering with resolution parameters: