Traditionally, machine learning has been focused on methods where objects reside in continuous domains. The goal of this workshop is to advance state-of-the-art methods in machine learning that involve discrete structures.
Models with ultimately discrete solutions play an important role in machine learning. At its core, statistical machine learning is concerned with making inferences from data, and when the underlying variables of the data are discrete, both the tasks of model inference as well as predictions using the inferred model are inherently discrete algorithmic problems. Many of these problems are notoriously hard, and even those that are theoretically tractable become intractable in practice with abundant and steadily increasing amounts of data. As a result, standard theoretical models and off-the-shelf algorithms become either impractical or intractable (and in some cases both).
While many problems are hard in the worst case, the problems of practical interest are often much more well-behaved, and have the potential to be modeled in ways that make them tractable. Indeed, many discrete problems in machine learning can possess beneficial structure; such structure has been an important ingredient in many successful (approximate) solution strategies. Examples include submodularity, marginal polytopes, symmetries and exchangeability.
Machine learning, algorithms, discrete mathematics and combinatorics as well as applications in computer vision, speech, NLP, biology and network analysis are all active areas of research, each with an increasingly large body of foundational knowledge. The workshop aims to ask questions that enable communication across these fields. In particular, this year we aim to address the investigation of combinatorial structures allows to capture complex, high-order dependencies in discrete learning problems prevalent in deep learning, social networks, etc. An emphasis will be given on uncertainty and structure that results from problem instances being estimated from data.
09.00 - 09.15 | Welcome |
09.15 - 10.00 |
An Instability in Variational Inference for Topic Models |
10.00 - 10.30 |
Spotlights I
|
10.30 - 11.00 | Coffee Break and Posters |
11.00 - 11.45 |
Medoids in almost linear time via multi-armed bandits |
11.45 - 12.00 |
Spotlights II
|
12.00 - 13.30 | Lunch |
13.30 - 14.15 |
Recharging Bandits: Learning to Schedule Recurring Interventions |
14.15 - 14.30 | contributed talk: Erik Lindgren, Murat Kocaoglu, Alexandros G. Dimakis and Sriram Vishwanath. Submodularity and Minimum Cost Intervention Design for Learning Causal Graphs |
14.30 - 14.45 | contributed talk: Zelda Mariet and Suvrit Sra. Elementary Symmetric Polynomials for Optimal Experimental Design |
14.45 - 15.00 |
Spotlights III
|
15.00 - 16.00 | Coffee Break and Posters |
16.00 - 16.45 |
Foundations of Data Driven Combinatorial Algorithm Selection |
16.45 - 17.00 | contributed talk: So Nakashima and Takanori Maehara. Optimization from Non-Uniform Samples |
17.00 - 17.15 | contributed talk: Nicholas Monath, Ari Kobren, Akshay Krishnamurthy and Andrew McCallum. Gradient-based Hierarchical Clustering |
17.15 - 18.30 | Posters |
An Instability in Variational Inference for Topic Models
Topic models are Bayesian models that are frequently used to capture the latent structure of certain corpora of documents or images. Each data element in such a corpus (for instance each item in a collection of scientific articles) is regarded as a convex combination of a small number of vectors corresponding to ‘topics’ or ‘components’. The weights are assumed to have a Dirichlet prior distribution. The standard approach towards approximating the posterior is to use variational inference algorithms, and in particular naive mean field approximation. We show that popular variational algorithms suffer of an instability that can produce misleading conclusions. Namely, for significant regimes of the model parameters, variational algorithms appear to output a non-trivial decomposition into topics. However –for the same parameter values– the data contain no actual information about the true decomposition, and hence the output of the algorithm is uncorrelated with the true topic decomposition.
[Joint work with Behrooz Ghorbani and Hamid Javadi]
Medoids in almost linear time via multi-armed bandits
Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit which uses O(n log n) distance evaluations to compute the medoid with high probability. Med-dit is based on a connection with the multi-armed bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-prize and the single-cell RNA-Seq datasets, containing hundreds of thousands of points living in tens of thousands of dimensions, and observe a 5-10x improvement in performance over the current state of the art. We further draw connections with separable submodular maximization and use Med-dit as a building block to enable large-scale exemplar clustering.
This is joint work with Vivek Bagaria, Govinda M. Kamath, Vasilis Ntranos and Martin J. Zhang.
Foundations of Data Driven Combinatorial Algorithm Selection
Algorithm configuration is an important aspect of modern data science and algorithm design. Rather than using off the shelf algorithms that only have worst case performance guarantees, practitioners typically optimize over large families of parametrized algorithms and tune the parameters of these algorithms using a training set of problem instances from their domain to determine a configuration with high expected performance over future instances. However, most of this work comes with no performance guarantees. The challenge is that for many combinatorial problems of significant importance to machine learning, including partitioning and subset selection problems, a small tweak to the parameters can cause a cascade of changes in the algorithm’s behavior, so the algorithm’s performance is a discontinuous function of its parameters.
In this talk, I will present new work that helps put data driven combinatorial algorithm selection on firm foundations. We provide strong computational and statistical performance guarantees for several combinatorial partitioning problems (including various forms of clustering), both for the batch and online scenarios where a collection of typical problem instances from the given application are presented either all at once or in an online fashion, respectively.
Recharging Bandits: Learning to Schedule Recurring Interventions
Traditional multi-armed bandit models posit that the payoff distribution of each action (or "arm") is stationary over time, and hence that the goal of learning is to identify the arm with the highest expected payoff and choose that one forever after. However, in many applications the efficacy of an action depends on the amount of time that has elapsed since it was last performed. Examples arise in precision agriculture, online education, and music recommendations. In this talk we introduce a generalization of the multi-armed bandit problem that models such applications. In the course of analyzing algorithms for this problem, we will encounter some interesting combinatorial questions about coloring the integers subject to bounds on the sizes of subintervals that exclude a given color.
This talk is based on joint work with Nicole Immorlica.
Discrete optimization problems and combinatorial structures are ubiquitous in machine learning. They arise for discrete labels with complex dependencies, structured estimators, learning with graphs, partitions, permutations, or when selecting informative subsets of data or features.
What are efficient algorithms for handling such problems? Can we robustly solve them in the presence of noise? What about streaming or distributed settings? Which models are computationally tractable and rich enough for applications? What theoretical worst-case bounds can we show? What explains good performance in practice?
Such questions are the theme of the DISCML workshop. It aims to bring together theorists and practitioners to explore new applications, models and algorithms, and mathematical properties and concepts that can help learning with complex interactions and discrete structures.
We invite high-quality submissions that present recent results related to discrete and combinatorial problems in machine learning, and submissions that discuss open problems or controversial questions and observations, e.g., missing theory to explain why algorithms work well in certain instances but not in general, or illuminating worst case examples. We also welcome the description of well-tested software and benchmarks.
Areas of interest include, but are not restricted to:
Please submit contributions in NIPS 2017 format (length max. 6 pages, non-anonymous) on easychair.
Submission deadline: November 1, 2017.