# Research

# Research Program

My research interests lie broadly at the intersection of optimization and machine learning. Specifically, I'm very interested in going beyond **accuracy** (where, thanks to deep learning, we have achieved near-human performance on many tasks) to also achieve other desiderata such as compute and memory efficiency, human interaction, label efficiency, robustness, and fairness. I'm interested in efficiency on multiple fronts: label efficiency (how can we learn with less labeled data?), model efficiency (reducing model complexity for resource-constrained environments), and time and resource efficiency (how do we reduce the end-to-end running time of training, and train models in resource-constrained environments?). I am also interested in building intelligent systems that organize, analyze, and summarize massive amounts of data, and that automatically learn from it. Below are some of the concrete applications I'm currently very interested in working on.

- **Data Subset Selection/Coresets for Efficient Learning:** How do we select the right subsets of data to make training, inference, hyperparameter tuning, etc. more efficient, particularly for deep learning? I'm interested in speeding up deep learning by up to an order of magnitude (e.g., 5x to 10x speedups) by training on much smaller subsets without significant loss in accuracy or other evaluation metrics.
- **Active Learning for Deep Models:** How do we trade off uncertainty and diversity in a principled manner for active learning (i.e., iteratively selecting data points to be labeled) in deep learning? This is particularly important since labeled data is very time-consuming and expensive to obtain for real-world problems. We study techniques that can achieve 2x to 5x labeling cost reductions for a wide range of applications.
- **Data Programming and Weak Supervision:** Using weak supervision to automatically create *noisy* labeled data for reducing labeling costs.
- **Robust Learning:** How do we learn machine learning models in a robust manner in the presence of noisy labels, outliers, distribution shift, and imbalance?
- **Fair Learning:** Learning deep models and other ML models while ensuring fairness to under-represented and minority classes and attributes.
- **Feature Selection:** What are principled ways of selecting the right sets of features, and how do we do this in model-dependent or model-independent ways? How do we do this when eliciting features has an associated cost (e.g., in medical domains, each additional medical test might have a cost), and in an online manner?
- **Neural Network Compression and Architecture Search under Resource Constraints:** How do we efficiently compress neural networks (top-down) or search for resource-constrained architectures (bottom-up)?
- **Data Summarization:** What makes a good summary of data, and how do we consume these summaries?
- **Data Partitioning:** Efficient partitioning of data for clustering and distributed training.

I'm also interested in achieving multiple desiderata simultaneously, i.e., approaches that can be efficient (either label or compute efficient), while being robust, fair, etc.

## Motivating Applications

Below are more details of the applications listed above. We study each of the applications below in a broad range of domains including computer vision, video analytics, speech recognition, and natural language processing/text classification.

### Data Subset Selection/Coresets for Efficient Learning

Selecting the right dataset for training is a critical problem today given massive datasets, both for training efficiency and for labeling cost. The setting can be unsupervised, where we don't have labels (select a subset of unlabeled data points for labeling), or supervised, where we have labels (select a subset for faster training or hyperparameter tuning). In either case, **we are interested in obtaining a representative subset of instances for training machine learning models**. We show that, for several classifiers, the problem of selecting a subset of data with maximum likelihood on the training set is a submodular optimization problem. We also show that by selecting the right data subsets, we can achieve significant speedups in training time (between 5x and 10x) with minimal loss in accuracy.
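The submodular view of subset selection can be illustrated with the classic greedy algorithm on a facility-location objective, a common measure of how well a subset represents the full dataset. This is a minimal sketch under stated assumptions: the function name, the cosine-similarity kernel, and the `budget` parameter are illustrative, and it is not the gradient-matching or validation-based objective used in GRAD-MATCH or GLISTER. For a monotone submodular objective such as this one, greedy selection under a cardinality budget is guaranteed to reach at least a (1 - 1/e) fraction of the optimal value.

```python
import numpy as np


def facility_location_greedy(features: np.ndarray, budget: int) -> list[int]:
    """Greedily select `budget` rows that (approximately) maximize facility location.

    f(S) = sum_i max_{j in S} sim(i, j), where sim is cosine similarity
    rescaled into [0, 1] so that f is monotone submodular.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = (unit @ unit.T + 1.0) / 2.0  # pairwise similarities, rescaled to [0, 1]

    n = features.shape[0]
    selected: list[int] = []
    taken = np.zeros(n, dtype=bool)
    best = np.zeros(n)  # best[i]: similarity of point i to its closest selected point
    for _ in range(min(budget, n)):
        # Marginal gain of each candidate j: sum_i max(0, sim[i, j] - best[i])
        gains = np.maximum(sim - best[:, None], 0.0).sum(axis=0)
        gains[taken] = -np.inf  # never pick the same point twice
        j = int(np.argmax(gains))
        selected.append(j)
        taken[j] = True
        best = np.maximum(best, sim[:, j])
    return selected


# Usage: two tight clusters; with a budget of 2, one representative per cluster is chosen.
X = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]])
print(facility_location_greedy(X, budget=2))
```

Once one cluster is represented, a near-duplicate from the same cluster adds almost no marginal gain, so the second pick comes from the other cluster.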

Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, Abir De, Rishabh Iyer,

**GRAD-MATCH: A Gradient Matching Based Data Subset Selection for Efficient Deep Model Training**, To Appear in ICML 2021

Durga Sivasubramanian, Rishabh Iyer, Ganesh Ramakrishnan, and Abir De,

**Training Data Subset Selection for Regression with Controlled Validation Error**, To Appear in ICML 2021

Krishnateja Killamsetty, S Durga, Ganesh Ramakrishnan, and Rishabh Iyer,

**GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning**, 35th AAAI Conference on Artificial Intelligence, AAAI 2021

Krishnateja Killamsetty, Xujiang Zhao, Feng Chen, and Rishabh Iyer,

**RETRIEVE: Coreset Selection for Efficient and Robust Semi-Supervised Learning**, arXiv:2106.07760

Kai Wei, Rishabh Iyer, Jeff Bilmes,

**Submodularity in data subset selection and active learning**, International Conference on Machine Learning (ICML) 2015

Yuzong Liu, Rishabh Iyer, Katrin Kirchhoff, Jeff Bilmes,

**SVitchboard-II and FiSVer-I: Crafting high quality and low complexity conversational english speech corpora using submodular function optimization**, Computer Speech & Language 42, 122-142, 2017 (shorter version also appeared in INTERSPEECH 2015)

Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, Rohan Mahadev, Khoshrav Doctor, and Ganesh Ramakrishnan,

**Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision**, 7th IEEE Winter Conference on Applications of Computer Vision (WACV), 2019

### Active and Semi-Supervised Learning

We can similarly reduce labeling costs by selecting (in an active learning manner) the right subset or batch of examples to label. Active learning and semi-supervised learning (SSL) approaches can significantly reduce the amount of labeled data required (by almost 5x to 20x) without significantly reducing accuracy. I am also interested in active learning and SSL algorithms in realistic settings, i.e., with out-of-distribution (OOD) data, rare classes, imbalance, etc. We have applied active learning to a number of application domains, including computer vision, text classification, and speech recognition.
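One simple way to trade off uncertainty against diversity when choosing a batch to label is to greedily add the most uncertain point (by predictive entropy) after subtracting a penalty for its similarity to points already in the batch. This is a hypothetical heuristic for illustration, not the submodular-information-measure approach of SIMILAR; `select_batch`, `lam`, and the cosine-similarity penalty are all assumptions.

```python
import numpy as np


def entropy(probs: np.ndarray) -> np.ndarray:
    """Predictive entropy (in nats) of each row of class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def select_batch(probs: np.ndarray, embeddings: np.ndarray,
                 batch_size: int, lam: float = 1.0) -> list[int]:
    """Pick uncertain points while penalizing similarity to points already picked.

    score(j) = entropy(j) - lam * max_{s in batch} max(0, cos(j, s))
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T  # cosine similarity between unlabeled points
    uncertainty = entropy(probs)

    n = probs.shape[0]
    batch: list[int] = []
    taken = np.zeros(n, dtype=bool)
    redundancy = np.zeros(n)  # similarity of each point to its nearest batch member
    for _ in range(min(batch_size, n)):
        score = uncertainty - lam * redundancy
        score[taken] = -np.inf  # already in the batch
        j = int(np.argmax(score))
        batch.append(j)
        taken[j] = True
        redundancy = np.maximum(redundancy, np.maximum(sim[:, j], 0.0))
    return batch


# Usage: points 0 and 1 are equally uncertain but identical; point 2 is less
# uncertain but different; point 3 is confidently predicted.
probs = np.array([[0.5, 0.5], [0.5, 0.5], [0.8, 0.2], [0.99, 0.01]])
emb = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(select_batch(probs, emb, batch_size=2))           # [0, 2]
print(select_batch(probs, emb, batch_size=2, lam=0.0))  # [0, 1]
```

Setting `lam=0` recovers pure uncertainty sampling, which here spends both labels on two identical points; a positive `lam` swaps the duplicate for a different, still fairly uncertain point.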

Kai Wei, Rishabh Iyer, Jeff Bilmes,

**Submodularity in data subset selection and active learning**, International Conference on Machine Learning (ICML) 2015

Vishal Kaushal, Rishabh Iyer, Suraj Kothawade, Rohan Mahadev, Khoshrav Doctor, and Ganesh Ramakrishnan,

**Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision**, 7th IEEE Winter Conference on Applications of Computer Vision (WACV), 2019

Suraj Kothawade, Nathan Beck, Krishnateja Killamsetty, Rishabh Iyer,

**SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios**, arXiv:2107.00717

Xujiang Zhao, Krishnateja Killamsetty, Rishabh Iyer, Feng Chen,

**Robust Semi-Supervised Learning with Out of Distribution Data**, arXiv:2010.03658

Nathan Beck, Durga Sivasubramanian, Apurva Dani, Ganesh Ramakrishnan, and Rishabh Iyer,

**Effective Evaluation of Deep Active Learning on Image Classification Tasks**, arXiv:2106.15324

### Fair Learning

The goal of fair learning is to train models that perform well on under-represented classes, attributes, and slices. There are different notions of fairness, and the learning algorithm changes somewhat depending on which notion is targeted. I'm interested in studying approaches that work for a broad range of fairness metrics and that can also be easily combined with other desiderata like robustness, compute efficiency, or label efficiency.

MS Ozdayi, M Kantarcioglu, R Iyer,

**BiFair: Training Fair Models with Bilevel Optimization**, arXiv:2106.04757

### Data Programming

Getting high-quality labeled data is very expensive, and machine learning models require massive amounts of it. I am studying weak supervision approaches for effectively learning machine learning models from very few labeled instances and a large number of unlabeled instances, using noisy labels from multiple sources (semi-supervised data programming). I'm also interested in subset selection problems in this space (e.g., how do we select a subset of labeling functions for robustness, and how do we select a subset of labeled instances to complement them?).
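In data programming, heuristic *labeling functions* (LFs) each vote for a label or abstain on every unlabeled example, and the noisy votes are then aggregated into training labels. The sketch below uses plain majority vote as the aggregator, which is a deliberately simple baseline rather than the semi-supervised label model studied in the papers below; the spam task and all three LFs are hypothetical.

```python
from collections import Counter
from typing import Callable, Optional

# A labeling function returns a label, or None to abstain.
LabelingFunction = Callable[[str], Optional[str]]


def lf_mentions_free(text: str) -> Optional[str]:
    return "SPAM" if "free" in text.lower() else None


def lf_has_link(text: str) -> Optional[str]:
    return "SPAM" if "http" in text.lower() else None


def lf_friendly_greeting(text: str) -> Optional[str]:
    return "HAM" if text.lower().startswith(("hi", "hello")) else None


def majority_vote(text: str, lfs: list[LabelingFunction]) -> Optional[str]:
    """Aggregate LF votes; abstain when no LF fires or when the top labels tie."""
    votes = [label for label in (lf(text) for lf in lfs) if label is not None]
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # conflicting sources with equal support
    return ranked[0][0]


LFS = [lf_mentions_free, lf_has_link, lf_friendly_greeting]
for msg in ["Claim your FREE prize at http://example.com",
            "hello, lunch tomorrow?",
            "hi, are you free for lunch?",
            "see you at the meeting"]:
    print(repr(msg), "->", majority_vote(msg, LFS))
```

Examples on which every LF abstains, or on which the LFs disagree evenly, are left unlabeled by this aggregator.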

Ayush Maheshwari, Oishik Chatterjee, Krishnateja Killamsetty, Ganesh Ramakrishnan, and Rishabh Iyer,

**Data Programming using Semi-Supervision and Subset Selection**, To Appear in Findings of ACL, 2021 (Long Paper)

Atul Sahay, Anshul Nasery, Ayush Maheshwari, Ganesh Ramakrishnan, and Rishabh Iyer,

**Rule Augmented Unsupervised Constituency Parsing**, To Appear in Findings of ACL, 2021 (Short Paper)

### Robust Learning

Can we make machine learning algorithms robust to noisy labels, out-of-distribution samples, distribution shift, and imbalance? We study this problem in various settings (supervised, semi-supervised, and few-shot learning) and also study the impact of robustness in these settings. We pose this problem as a bilevel optimization problem and study algorithms for solving it.
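To make the bilevel view concrete, one common formulation (a generic sketch in assumed notation, not necessarily the exact objective of the papers below) learns a non-negative weight `w_i` for each of the `n` possibly noisy training examples, so that a model trained on the reweighted training loss `L_T` performs well under a loss `L_V` on a small, clean validation set:

```latex
\min_{w \in \mathbb{R}^{n}_{\ge 0}} \; L_{V}\big(\theta^{*}(w)\big)
\quad \text{s.t.} \quad
\theta^{*}(w) = \operatorname*{arg\,min}_{\theta} \; \sum_{i=1}^{n} w_{i}\, L_{T}(x_{i}, y_{i}; \theta)
```

Noisy or out-of-distribution examples that hurt validation performance are pushed toward `w_i = 0`; in practice the inner `arg min` is typically approximated with one or a few gradient steps rather than solved exactly.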

Krishnateja Killamsetty, Changbin Li, Chen Zhou, Rishabh Iyer, and Feng Chen,

**A Reweighted Meta Learning Framework for Robust Few Shot Learning**, arXiv:2011.06782

Xujiang Zhao, Krishnateja Killamsetty, Rishabh Iyer, Feng Chen,

**Robust Semi-Supervised Learning with Out of Distribution Data**, arXiv:2010.03658

### Data Summarization

I am interested in several applications of data summarization, including video summarization, image collection summarization, document/text summarization, and summarization of topic hierarchies. We study questions such as: what are natural models for summarization, how do we choose the right models for different problems and domains, and how do we learn the right combinations of functions for various tasks? A lot of effort also goes into interpretability of models, evaluation and loss functions for summarization, and, at the core of it all, understanding what makes a good summary for the problem at hand. We have also created new datasets for domain-specific video summarization and image collection summarization. We recently released a dataset called VISIOCITY with long videos for video summarization and video understanding.

Sebastian Tschiatschek, Rishabh K Iyer, Haochen Wei, Jeff A Bilmes,

**Learning mixtures of submodular functions for image collection summarization**, In Advances in Neural Information Processing Systems (NIPS) 2014

Ramkrishna Bairi, Rishabh Iyer, Ganesh Ramakrishnan, Jeff Bilmes,

**Summarization of Multi-Document Topic Hierarchies using Submodular Mixtures**, In Association for Computational Linguistics (ACL) 2015

Vishal Kaushal, Sandeep Subramanian, Suraj Kothawade, Rishabh Iyer, and Ganesh Ramakrishnan,

**A Framework Towards Domain Specific Video Summarization**, 7th IEEE Winter Conference on Applications of Computer Vision (WACV) 2019

Vishal Kaushal, Rishabh Iyer, Khoshrav Doctor, Anurag Sahoo, Pratik Dubal, Suraj Kothawade, Rohan Mahadev, Kunal Dargan, Ganesh Ramakrishnan,

**Demystifying Multi-Faceted Video Summarization: Tradeoff Between Diversity, Representation, Coverage and Importance**, 7th IEEE Winter Conference on Applications of Computer Vision (WACV) 2019

Vishal Kaushal, Suraj Kothawade, Ganesh Ramakrishnan, Jeff Bilmes, Himanshu Asnani, and Rishabh Iyer,

**A Unified Framework for Generic, Query-Focused, Privacy Preserving and Update Summarization using Submodular Information Measures**, arXiv:2010.05631

V. Kaushal, S. Kothawade, R. Iyer and G. Ramakrishnan,

**Realistic Video Summarization through VISIOCITY: A New Benchmark and Evaluation Framework**, ACM MM Workshops 2020

### Data Partitioning

We seek to intelligently partition data for large scale distributed training, so that we can achieve superior results compared to simple random partitioning and other baselines. We demonstrate that diversified partitioning via submodular functions can achieve significant improvements on several distributed deep learning and general machine learning tasks.
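A small sketch in the spirit of robust (max-min) submodular partitioning: at each step, the block with the lowest objective value greedily takes the remaining item with the largest marginal gain for it, so that every block ends up diverse. The coverage objective, the function name `greedy_maxmin_partition`, and the tie-breaking are illustrative assumptions; the papers below study more general mixed robust/average objectives with their own algorithms and guarantees.

```python
def greedy_maxmin_partition(items: list[set[str]], num_blocks: int) -> list[list[int]]:
    """Greedy heuristic for max-min partitioning under a coverage objective.

    f(block) = number of distinct concepts covered by the block's items.
    At each step, the currently worst-off block takes the remaining item
    with the largest marginal gain for that block.
    """
    blocks: list[list[int]] = [[] for _ in range(num_blocks)]
    covered: list[set[str]] = [set() for _ in range(num_blocks)]
    remaining = set(range(len(items)))
    while remaining:
        # Worst-off block (lowest coverage) picks next; ties go to the lowest block index
        b = min(range(num_blocks), key=lambda k: len(covered[k]))
        # Item adding the most new concepts to block b; ties go to the lowest item index
        best = max(sorted(remaining), key=lambda i: len(items[i] - covered[b]))
        blocks[b].append(best)
        covered[b] |= items[best]
        remaining.remove(best)
    return blocks


# Usage: two "cat" items and two "dog" items split across two compute nodes.
print(greedy_maxmin_partition([{"cat"}, {"cat"}, {"dog"}, {"dog"}], num_blocks=2))  # [[0, 2], [1, 3]]
```

A contiguous split of the same items would give `[[0, 1], [2, 3]]`, leaving each block with a single concept, whereas the greedy split gives every block both.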

Kai Wei, Rishabh Iyer, Shengjie Wang, Wenruo Bai, Jeff Bilmes,

**Mixed robust/average submodular partitioning: Fast algorithms, guarantees, and applications**, In Advances in Neural Information Processing Systems (NIPS) 2015

Kai Wei, Rishabh Iyer, Shengjie Wang, Wenruo Bai, Jeff Bilmes,

**How to intelligently distribute training data to multiple compute nodes: Distributed machine learning via submodular partitioning**, Neural Information Processing Systems (NIPS) Workshop, Montreal, Canada 2015

### Feature Selection

Feature selection is a very important pre-processing step for machine learning and data science applications. It is used mostly to reduce prediction time, memory, and feature acquisition cost, and to remove noisy and irrelevant features. We study a parameterized feature selection framework using submodular functions, in particular a family of mutual-information-based models. We show how this framework can be extended to cost-aware feature elicitation.
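As an illustration of mutual-information-based feature selection, the sketch below implements the well-known greedy mRMR (max-relevance, min-redundancy) heuristic for discrete features: each step picks the feature with the highest mutual information with the label, minus its average mutual information with the features already chosen. It is a swapped-in classic baseline, not the parameterized submodular framework described above, and it ignores feature acquisition costs; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum over observed (a, b) of p(a, b) * log(p(a, b) / (p(a) * p(b)))
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def mrmr_select(features: list[Sequence[Hashable]], labels: Sequence[Hashable], k: int) -> list[int]:
    """Greedy max-relevance, min-redundancy (mRMR) selection of k discrete features."""
    relevance = [mutual_information(f, labels) for f in features]
    selected: list[int] = []
    candidates = list(range(len(features)))

    def score(j: int) -> float:
        if not selected:
            return relevance[j]
        redundancy = sum(mutual_information(features[j], features[s]) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Usage: the label depends on f0 and f2; f1 is an exact copy of f0.
f0 = [0, 0, 1, 1, 0, 0, 1, 1]
f1 = list(f0)
f2 = [0, 1, 0, 1, 0, 1, 0, 1]
y = [a | b for a, b in zip(f0, f2)]
print(mrmr_select([f0, f1, f2], y, k=2))
```

On this toy data the selected pair is `f0` and `f2`: the duplicate `f1` is just as relevant as `f0` but carries no information beyond it, so the redundancy term rules it out.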

Rishabh Iyer, Jeff Bilmes,

**Algorithms for approximate minimization of the difference between submodular functions, with applications**, Uncertainty in Artificial Intelligence (UAI) 2012

Srijita Das, Rishabh Iyer, Sriraam Natarajan,

**A Clustering based Selection Framework for Cost Aware and Test-time Feature Elicitation**, In CODS-COMAD 2021 Research Track

Srijita Das, Rishabh Iyer, Sriraam Natarajan,

**A Parameterized Information-theoretic Feature Selection Framework for Test-time Feature Elicitation**, In Review 2021

## Theoretical Advances

To address the motivating applications listed above, below are some of the theoretical directions I'm pursuing.

### Unified Algorithms and Theory of Submodular Optimization

Submodular optimization is a rich and expressive class of non-linear discrete optimization problems whose objectives generalize important combinatorial functions such as set cover, facility location, and log-determinants. A number of applications, such as data subset selection, data summarization, data partitioning, and active learning, naturally involve flavors of submodular optimization. In this thread, we develop fast and scalable algorithms for a number of problems that occur in practice, including submodular minimization, submodular maximization, difference of submodular optimization, submodular optimization subject to submodular constraints, and ratio of submodular optimization. This framework of algorithms achieves (near-)optimal approximation guarantees while being easy to implement and scaling to massive datasets. Empirically, **we demonstrated orders-of-magnitude speedups**, and our algorithms have been used for several real-world problems, including cooperative cuts for image segmentation, cooperative matching, diffusion-aware optimization, path planning, mobile crowd-sensing, trajectory optimization for aerial 3D scanning, sensor placement under cooperative costs, and limited-vocabulary speech data selection. Some relevant publications are:
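A standard ingredient of fast submodular maximization is lazy evaluation (Minoux's accelerated greedy): because marginal gains can only shrink as the selected set grows, a previously computed gain is an upper bound on the current one, so most candidates never need to be re-evaluated. The sketch below is a generic textbook version with a hypothetical `lazy_greedy` function and a toy coverage objective; it is not the multi-stage or memoization framework from the papers below, and it returns the same set as plain greedy up to tie-breaking.

```python
import heapq
from typing import Callable, Hashable, Iterable


def lazy_greedy(ground: Iterable[Hashable],
                f: Callable[[frozenset], float],
                budget: int) -> list:
    """Lazy (accelerated) greedy maximization of a monotone submodular f with |S| <= budget."""
    current: frozenset = frozenset()
    f_current = f(current)
    # Max-heap (via negation) of possibly stale upper bounds on each element's gain;
    # the index i breaks ties so the elements themselves are never compared.
    heap = [(-(f(frozenset([e])) - f_current), i, e) for i, e in enumerate(ground)]
    heapq.heapify(heap)
    selected: list = []
    while heap and len(selected) < budget:
        _, i, e = heapq.heappop(heap)
        gain = f(current | {e}) - f_current  # refresh this element's bound
        if not heap or gain >= -heap[0][0]:
            # Every other element's true gain is at most its stale bound, which is
            # at most `gain`, so e is a true greedy choice.
            selected.append(e)
            current = current | {e}
            f_current = f(current)
        else:
            heapq.heappush(heap, (-gain, i, e))  # reinsert with the tightened bound
    return selected


# Usage: a set-cover objective over hypothetical "concept" sets.
concepts = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}


def coverage(S: frozenset) -> float:
    """Number of distinct concepts covered by the elements of S."""
    return float(len(set().union(*(concepts[e] for e in S))))


print(lazy_greedy(concepts, coverage, budget=2))  # ['c', 'a']
```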

Rishabh Iyer, Jeff Bilmes,

**Algorithms for approximate minimization of the difference between submodular functions, with applications**, Uncertainty in Artificial Intelligence (UAI) 2012

Rishabh Iyer, Stefanie Jegelka, Jeff Bilmes,

**Fast semidifferential-based submodular function optimization**, International Conference on Machine Learning (ICML) 2013 (**Winner of the Best Paper Award**)

Rishabh Iyer and Jeff Bilmes,

**Submodular optimization with submodular cover and submodular knapsack constraints**, In Advances in Neural Information Processing Systems (NIPS) 2013 (**Winner of the Outstanding Paper Award**)

Rishabh K Iyer, Stefanie Jegelka, Jeff A Bilmes,

**Curvature and optimal algorithms for learning and minimizing submodular functions**, In Advances in Neural Information Processing Systems (NIPS) 2013

Rishabh Iyer, Stefanie Jegelka, Jeff Bilmes,

**Monotone Closure of Relaxed Constraints in Submodular Optimization: Connections Between Minimization and Maximization**, Uncertainty in Artificial Intelligence (UAI) 2014

Kai Wei, Rishabh K. Iyer, Jeff A. Bilmes,

**Fast multi-stage submodular maximization**, International Conference on Machine Learning (ICML) 2014

Wenruo Bai, Rishabh Iyer, Kai Wei, Jeff Bilmes,

**Algorithms for optimizing the ratio of submodular functions**, In Proc. International Conference on Machine Learning (ICML) 2016

Rishabh Iyer and Jeff Bilmes,

**Near Optimal Algorithms for Hard Submodular Programs with Discounted Cooperative Costs**, To Appear in Artificial Intelligence and Statistics (AISTATS) 2019, Naha, Okinawa, Japan

Rishabh Iyer and Jeff Bilmes,

**A Memoization Framework for Scaling Submodular Optimization to Large Scale Problems**, To Appear in Artificial Intelligence and Statistics (AISTATS) 2019

### Learning with Submodular Functions

While submodular optimization arises at inference time, a critical component of applying submodular functions to machine learning problems is **learning** the right submodular functions. In this thread, we study a rich class of models associated with submodular functions and the associated learning problems.
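One simple model class, shown here as an illustrative sketch in assumed notation, is a non-negative mixture of fixed submodular components `f_1, ..., f_K` (for example, coverage, diversity, and representation terms), where the weights `w` are what gets learned:

```latex
F_{w}(S) = \sum_{k=1}^{K} w_{k}\, f_{k}(S), \qquad w_{k} \ge 0 .
% Each component has diminishing returns: for A \subseteq B \subseteq V and v \in V \setminus B,
f_{k}(A \cup \{v\}) - f_{k}(A) \;\ge\; f_{k}(B \cup \{v\}) - f_{k}(B).
% Multiplying by w_k \ge 0 and summing over k gives the same inequality for F_w.
```

Because the mixture stays submodular for any non-negative `w`, greedy maximization and its approximation guarantees still apply to the learned function.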

Vishal Kaushal, Sandeep Subramanian, Suraj Kothawade, Rishabh Iyer, and Ganesh Ramakrishnan,

**A Framework Towards Domain Specific Video Summarization**, 7th IEEE Winter Conference on Applications of Computer Vision (WACV) 2019

Vishal Kaushal, Rishabh Iyer, Khoshrav Doctor, Anurag Sahoo, Pratik Dubal, Suraj Kothawade, Rohan Mahadev, Kunal Dargan, Ganesh Ramakrishnan,

**Demystifying Multi-Faceted Video Summarization: Tradeoff Between Diversity, Representation, Coverage and Importance**, 7th IEEE Winter Conference on Applications of Computer Vision (WACV) 2019

Rishabh Iyer and Jeff Bilmes,

**Submodular point processes with applications to machine learning**, Proc. Artificial Intelligence and Statistics (AISTATS) 2015

Sebastian Tschiatschek, Rishabh K Iyer, Haochen Wei, Jeff A Bilmes,

**Learning mixtures of submodular functions for image collection summarization**, In Advances in Neural Information Processing Systems (NIPS) 2014

Suraj Kothawade, Jiten Girdhar, Chandrashekar Lavania, and Rishabh Iyer,

**Deep Submodular Networks for Extractive Data Summarization**, arXiv:2010.08593

### Submodular Information Functions

This thread studies the intersection between submodular/combinatorial optimization and information theory via the study of submodular information measures. We study properties, modeling and representational power, instantiations, and applications of such measures. Examples of this include submodular mutual information, submodular distance metrics, divergences, and multi-set submodular information measures.
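For intuition, the submodular mutual information of a submodular function `f` is `I_f(A; B) = f(A) + f(B) - f(A ∪ B)`; when `f` is the joint entropy of a set of random variables, this is exactly Shannon mutual information. The sketch below (hypothetical function names and toy data) evaluates it for a set-cover function, where it counts the concepts covered by both sets, which is what makes such measures useful for query-focused selection.

```python
from typing import Callable, FrozenSet

SetFunction = Callable[[FrozenSet[str]], float]


def submodular_mutual_information(f: SetFunction, A: FrozenSet[str], B: FrozenSet[str]) -> float:
    """I_f(A; B) = f(A) + f(B) - f(A | B)."""
    return f(A) + f(B) - f(A | B)


# Hypothetical example: each image is tagged with the visual concepts it contains.
tags = {"img1": {"beach", "sky"}, "img2": {"sky", "city"}, "img3": {"dog"}}


def set_cover(S: FrozenSet[str]) -> float:
    """Number of distinct concepts covered by the images in S."""
    covered: set[str] = set()
    for item in S:
        covered |= tags[item]
    return float(len(covered))


print(submodular_mutual_information(set_cover, frozenset({"img1"}), frozenset({"img2"})))  # 1.0 (shared "sky")
print(submodular_mutual_information(set_cover, frozenset({"img1"}), frozenset({"img3"})))  # 0.0
```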

Rishabh Iyer, Ninad Khargonkar, Jeff Bilmes, and Himanshu Asnani,

**Submodular Combinatorial Information Measures with Applications in Machine Learning**, The 32nd International Conference on Algorithmic Learning Theory, ALT 2021

Jennifer A Gillenwater, Rishabh K Iyer, Bethany Lusch, Rahul Kidambi, Jeff A Bilmes,

**Submodular hamming metrics**, In Advances in Neural Information Processing Systems (NIPS) 2015

Vishal Kaushal, Suraj Kothawade, Ganesh Ramakrishnan, Jeff Bilmes, Himanshu Asnani, and Rishabh Iyer,

**A Unified Framework for Generic, Query-Focused, Privacy Preserving and Update Summarization using Submodular Information Measures**, arXiv:2010.05631

Himanshu Asnani, Jeff Bilmes, and Rishabh Iyer,

**Independence Properties of Generalized Submodular Information Measures**, 2021 IEEE International Symposium on Information Theory, ISIT 2021

### Discrete and Continuous Bilevel Optimization

A growing number of applications for efficient and robust learning involve bilevel optimization. In this thread, we study approaches for solving such bilevel optimization problems efficiently, particularly for deep learning models. I'm particularly interested in bilevel optimization problems that have a discrete component (i.e., mixed discrete/continuous bilevel optimization problems).
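A representative mixed discrete/continuous instance, written here in assumed notation as a sketch rather than the exact formulation of any one paper below: the outer (discrete) problem picks a subset `S` of at most `k` examples from the training set `U`, and the inner (continuous) problem fits model parameters `θ` on that subset using a training loss `L_T`, with the subset judged by a validation loss `L_V`:

```latex
\min_{S \subseteq \mathcal{U},\; |S| \le k} \; L_{V}\big(\theta^{*}(S)\big)
\quad \text{s.t.} \quad
\theta^{*}(S) = \operatorname*{arg\,min}_{\theta} \; \sum_{i \in S} L_{T}(x_{i}, y_{i}; \theta)
```

The outer problem is a combinatorial search over subsets, which is why discrete techniques (such as greedy submodular optimization) become relevant alongside gradient-based methods for the inner problem.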

Krishnateja Killamsetty, Changbin Li, Chen Zhou, Rishabh Iyer, and Feng Chen,

**A Reweighted Meta Learning Framework for Robust Few Shot Learning**, arXiv:2011.06782

Xujiang Zhao, Krishnateja Killamsetty, Rishabh Iyer, Feng Chen,

**Robust Semi-Supervised Learning with Out of Distribution Data**, arXiv:2010.03658

Krishnateja Killamsetty, S Durga, Ganesh Ramakrishnan, and Rishabh Iyer,

**GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning**, 35th AAAI Conference on Artificial Intelligence (AAAI) 2021