Discrete Structures in Machine Learning 2017

Traditionally, machine learning has been focused on methods where objects reside in continuous domains. The goal of this workshop is to advance state-of-the-art methods in machine learning that involve discrete structures.

Models with ultimately discrete solutions play an important role in machine learning. At its core, statistical machine learning is concerned with making inferences from data, and when the underlying variables of the data are discrete, both the tasks of model inference as well as predictions using the inferred model are inherently discrete algorithmic problems. Many of these problems are notoriously hard, and even those that are theoretically tractable become intractable in practice with abundant and steadily increasing amounts of data. As a result, standard theoretical models and off-the-shelf algorithms become either impractical or intractable (and in some cases both).

While many problems are hard in the worst case, the problems of practical interest are often much more well-behaved, and have the potential to be modeled in ways that make them tractable. Indeed, many discrete problems in machine learning can possess beneficial structure; such structure has been an important ingredient in many successful (approximate) solution strategies. Examples include submodularity, marginal polytopes, symmetries and exchangeability.

Machine learning, algorithms, discrete mathematics and combinatorics as well as applications in computer vision, speech, NLP, biology and network analysis are all active areas of research, each with an increasingly large body of foundational knowledge. The workshop aims to ask questions that enable communication across these fields. In particular, this year we aim to address the investigation of combinatorial structures allows to capture complex, high-order dependencies in discrete learning problems prevalent in deep learning, social networks, etc. An emphasis will be given on uncertainty and structure that results from problem instances being estimated from data.

Schedule

09.00 - 09.15	Welcome
09.15 - 10.00	Andrea Montanari An Instability in Variational Inference for Topic Models
10.00 - 10.30	Spotlights I Aidan Gomez, Sicong Huang, Ivan Zhang, Bryan Li, Muhammad Osama and Lukasz Kaiser. Unsupervised Cipher Cracking Using Discrete GANs Aryan Mokhtari, Hamed Hassani and Amin Karbasi. Conditional Gradient Method for Stochastic Submodular Maximization: Closing the Gap Sima Behpour, Wei Xing and Brian Ziebart. Learning Robust Cuts for Semi-Supervised and Multi-label Classification Mohammadhossein Bateni, Hossein Esfandiari and Vahab Mirrokni. Optimal algorithms for coverage problems on massive data: Extended abstract Stephan Clémençon and Anna Korba. Ranking Median Regression: Learning to Order through Local Consensus Will Grathwohl, Dami Choi, Yuhuai Wu, Geoff Roeder, David Duvenaud. Backpropagation through the void: optimizing control variates for black-box gradient estimation.
10.30 - 11.00	Coffee Break and Posters
11.00 - 11.45	David Tse Medoids in almost linear time via multi-armed bandits
11.45 - 12.00	Spotlights II Tongyi Cao and Akshay Krishnamurthy. Disagreement-based Combinatorial Pure Exploration Alessandro Epasto, Vahab Mirrokni and Morteza Zadimoghaddam. Scalable diversity maximization via small-size composable core-sets Zeyuan Allen-Zhu, Yuanzhi Li, Aarti Singh and Yining Wang. Near-Optimal Discrete Optimization for Experimental Design: A Regret Minimization Approach Alexander Lenail, Ludwig Schmidt, Johnathan Li, Tobias Ehrenberger, Karen Sachs, Stefanie Jegelka and Ernest Fraenkel. Graph-Sparse Logistic Regression Lisa Hellerstein and Devorah Kletenik. Revisiting the Approximation Bound for Stochastic Submodular Cover
12.00 - 13.30	Lunch
13.30 - 14.15	Robert D. Kleinberg Recharging Bandits: Learning to Schedule Recurring Interventions
14.15 - 14.30	contributed talk: Erik Lindgren, Murat Kocaoglu, Alexandros G. Dimakis and Sriram Vishwanath. Submodularity and Minimum Cost Intervention Design for Learning Causal Graphs
14.30 - 14.45	contributed talk: Zelda Mariet and Suvrit Sra. Elementary Symmetric Polynomials for Optimal Experimental Design
14.45 - 15.00	Spotlights III Ethan R. Elenberg, Alex Dimakis, Moran Feldman and Amin Karbasi Streaming Weak Submodularity: Interpreting Neural Networks on the Fly Holakou Rahmanian, David Helmbold and S V N Vishwanathan. Online Learning of Permutations Using Extended Formulation Ting Chen, Martin Renqiang Min and Yizhou Sun. Learning K-way D-dimensional Discrete Code For Compact Embedding Representations Sepehr Assadi, Eric Balkanski and Renato Paes Leme. Online Ranking with Minimal Inversions Pratiksha Thaker, Marc Brockschmidt, Alexander Gaunt and Daniel Tarlow. Semantics-aware program sampling
15.00 - 16.00	Coffee Break and Posters
16.00 - 16.45	Nina Balcan Foundations of Data Driven Combinatorial Algorithm Selection
16.45 - 17.00	contributed talk: So Nakashima and Takanori Maehara. Optimization from Non-Uniform Samples
17.00 - 17.15	contributed talk: Nicholas Monath, Ari Kobren, Akshay Krishnamurthy and Andrew McCallum. Gradient-based Hierarchical Clustering
17.15 - 18.30	Posters

Invited Talks

Andrea Montanari (Stanford)

An Instability in Variational Inference for Topic Models

Topic models are Bayesian models that are frequently used to capture the latent structure of certain corpora of documents or images. Each data element in such a corpus (for instance each item in a collection of scientific articles) is regarded as a convex combination of a small number of vectors corresponding to ‘topics’ or ‘components’. The weights are assumed to have a Dirichlet prior distribution. The standard approach towards approximating the posterior is to use variational inference algorithms, and in particular naive mean field approximation. We show that popular variational algorithms suffer of an instability that can produce misleading conclusions. Namely, for significant regimes of the model parameters, variational algorithms appear to output a non-trivial decomposition into topics. However –for the same parameter values– the data contain no actual information about the true decomposition, and hence the output of the algorithm is uncorrelated with the true topic decomposition.

[Joint work with Behrooz Ghorbani and Hamid Javadi]

David Tse (Stanford)

Medoids in almost linear time via multi-armed bandits

Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit which uses O(n log n) distance evaluations to compute the medoid with high probability. Med-dit is based on a connection with the multi-armed bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-prize and the single-cell RNA-Seq datasets, containing hundreds of thousands of points living in tens of thousands of dimensions, and observe a 5-10x improvement in performance over the current state of the art. We further draw connections with separable submodular maximization and use Med-dit as a building block to enable large-scale exemplar clustering.

This is joint work with Vivek Bagaria, Govinda M. Kamath, Vasilis Ntranos and Martin J. Zhang.

Nina Balcan (CMU)

Foundations of Data Driven Combinatorial Algorithm Selection

Algorithm configuration is an important aspect of modern data science and algorithm design. Rather than using off the shelf algorithms that only have worst case performance guarantees, practitioners typically optimize over large families of parametrized algorithms and tune the parameters of these algorithms using a training set of problem instances from their domain to determine a configuration with high expected performance over future instances. However, most of this work comes with no performance guarantees. The challenge is that for many combinatorial problems of significant importance to machine learning, including partitioning and subset selection problems, a small tweak to the parameters can cause a cascade of changes in the algorithm’s behavior, so the algorithm’s performance is a discontinuous function of its parameters.

In this talk, I will present new work that helps put data driven combinatorial algorithm selection on firm foundations. We provide strong computational and statistical performance guarantees for several combinatorial partitioning problems (including various forms of clustering), both for the batch and online scenarios where a collection of typical problem instances from the given application are presented either all at once or in an online fashion, respectively.

Robert D. Kleinberg (Cornell)

Recharging Bandits: Learning to Schedule Recurring Interventions

Traditional multi-armed bandit models posit that the payoff distribution of each action (or "arm") is stationary over time, and hence that the goal of learning is to identify the arm with the highest expected payoff and choose that one forever after. However, in many applications the efficacy of an action depends on the amount of time that has elapsed since it was last performed. Examples arise in precision agriculture, online education, and music recommendations. In this talk we introduce a generalization of the multi-armed bandit problem that models such applications. In the course of analyzing algorithms for this problem, we will encounter some interesting combinatorial questions about coloring the integers subject to bounds on the sizes of subintervals that exclude a given color.

This talk is based on joint work with Nicole Immorlica.

Accepted Papers

Aidan Gomez, Sicong Huang, Ivan Zhang, Bryan Li, Muhammad Osama and Lukasz Kaiser. Unsupervised Cipher Cracking Using Discrete GANs
Aryan Mokhtari, Hamed Hassani and Amin Karbasi. Conditional Gradient Method for Stochastic Submodular Maximization: Closing the Gap
Sima Behpour, Wei Xing and Brian Ziebart. Learning Robust Cuts for Semi-Supervised and Multi-label Classification
Mohammadhossein Bateni, Hossein Esfandiari and Vahab Mirrokni. Optimal algorithms for coverage problems on massive data: Extended abstract
Zelda Mariet and Suvrit Sra. Elementary Symmetric Polynomials for Optimal Experimental Design
Stephan Clémençon and Anna Korba. Ranking Median Regression: Learning to Order through Local Consensus
So Nakashima and Takanori Maehara. Optimization from Non-Uniform Samples
Tongyi Cao and Akshay Krishnamurthy. Disagreement-based Combinatorial Pure Exploration
Alessandro Epasto, Vahab Mirrokni and Morteza Zadimoghaddam. Scalable diversity maximization via small-size composable core-sets
Zeyuan Allen-Zhu, Yuanzhi Li, Aarti Singh and Yining Wang. Near-Optimal Discrete Optimization for Experimental Design: A Regret Minimization Approach
Alexander Lenail, Ludwig Schmidt, Johnathan Li, Tobias Ehrenberger, Karen Sachs, Stefanie Jegelka and Ernest Fraenkel. Graph-Sparse Logistic Regression
Lisa Hellerstein and Devorah Kletenik. Revisiting the Approximation Bound for Stochastic Submodular Cover
Ethan R. Elenberg, Alex Dimakis, Moran Feldman and Amin Karbasi Streaming Weak Submodularity: Interpreting Neural Networks on the Fly
Holakou Rahmanian, David Helmbold and S V N Vishwanathan. Online Learning of Permutations Using Extended Formulation
Ting Chen, Martin Renqiang Min and Yizhou Sun. Learning K-way D-dimensional Discrete Code For Compact Embedding Representations
Nicholas Monath, Ari Kobren, Akshay Krishnamurthy and Andrew McCallum. Gradient-based Hierarchical Clustering
Erik Lindgren, Murat Kocaoglu, Alexandros G. Dimakis and Sriram Vishwanath. Submodularity and Minimum Cost Intervention Design for Learning Causal Graphs
Sepehr Assadi, Eric Balkanski and Renato Paes Leme. Online Ranking with Minimal Inversions
Pratiksha Thaker, Marc Brockschmidt, Alexander Gaunt and Daniel Tarlow. Semantics-aware program sampling

Call for Papers

Discrete optimization problems and combinatorial structures are ubiquitous in machine learning. They arise for discrete labels with complex dependencies, structured estimators, learning with graphs, partitions, permutations, or when selecting informative subsets of data or features.

What are efficient algorithms for handling such problems? Can we robustly solve them in the presence of noise? What about streaming or distributed settings? Which models are computationally tractable and rich enough for applications? What theoretical worst-case bounds can we show? What explains good performance in practice?

Such questions are the theme of the DISCML workshop. It aims to bring together theorists and practitioners to explore new applications, models and algorithms, and mathematical properties and concepts that can help learning with complex interactions and discrete structures.

We invite high-quality submissions that present recent results related to discrete and combinatorial problems in machine learning, and submissions that discuss open problems or controversial questions and observations, e.g., missing theory to explain why algorithms work well in certain instances but not in general, or illuminating worst case examples. We also welcome the description of well-tested software and benchmarks.

Areas of interest include, but are not restricted to:

discrete optimization in context of deep learning
graph algorithms
continuous relaxations
learning and inference in discrete probabilistic models
algorithms for large data (streaming, sketching, distributed)
online learning
new applications

Submissions

Please submit contributions in NIPS 2017 format (length max. 6 pages, non-anonymous) on easychair.

Submission deadline: November 1, 2017.

Organizers

Jeff A. Bilmes (University of Washington, Seattle)
Stefanie Jegelka (MIT)
Amin Karbasi (Yale University)
Andreas Krause (ETH Zurich, Switzerland)
Yaron Singer (Harvard University)