Killian T, Daulton S, Konidaris G, Doshi-Velez F. Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes. Neural Information Processing Systems (NIPS). 2017.
Wang T, Rudin C, Doshi-Velez F, Liu Y, Klampfl E, MacNeille P. A Bayesian Framework for Learning Rule Sets for Interpretable Classification. Journal of Machine Learning Research. 2017;18(70):1-37.
Ross AS, Hughes MC, Doshi-Velez F. Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations. International Joint Conference on Artificial Intelligence (IJCAI). 2017.
Doshi-Velez F, Williamson S. Restricted Indian Buffet Processes. Statistics and Computing. 2017;27(5):1205-1223.
Wu M, Ghassemi M, Feng M, Celi LA, Szolovits P, Doshi-Velez F. Understanding Vasopressor Intervention and Weaning: Risk Prediction in a Public Heterogeneous Clinical Time Series Database. Journal of the American Medical Informatics Association. 2017;24(3):488-495. Abstract


Background The widespread adoption of electronic health records allows us to ask evidence-based questions about the need for and benefits of specific clinical interventions in critical-care settings across large populations.

Objective We investigated the prediction of vasopressor administration and weaning in the intensive care unit. Vasopressors are commonly used to control hypotension, and changes in timing and dosage can have a large impact on patient outcomes.

Materials and Methods We considered a cohort of 15 695 intensive care unit patients without orders for reduced care who were alive 30 days post-discharge. A switching-state autoregressive model (SSAM) was trained to predict the multidimensional physiological time series of patients before, during, and after vasopressor administration. The latent states from the SSAM were used as predictors of vasopressor administration and weaning.

Results The unsupervised SSAM features were able to predict patient vasopressor administration and successful patient weaning. Features derived from the SSAM achieved areas under the receiver operating curve of 0.92, 0.88, and 0.71 for predicting ungapped vasopressor administration, gapped vasopressor administration, and vasopressor weaning, respectively. We also demonstrated many cases where our model predicted weaning well in advance of a successful wean.

Conclusion Models that used SSAM features increased performance on both predictive tasks. These improvements may reflect an underlying, and ultimately predictive, latent state detectable from the physiological time series.


Depeweg S, Hernández-Lobato JM, Doshi-Velez F, Udluft S. Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks. ICLR. 2017. Abstract

We present an algorithm for model-based reinforcement learning that combines Bayesian neural networks (BNNs) with random roll-outs and stochastic optimization for policy learning. The BNNs are trained by minimizing α-divergences, allowing us to capture complicated statistical patterns in the transition dynamics, e.g. multi-modality and heteroskedasticity, which are usually missed by other common modeling approaches. We illustrate the performance of our method by solving a challenging benchmark where model-based approaches usually fail and by obtaining promising results in a real-world scenario for controlling a gas turbine.

Parbhoo S, Bogojeska J, Zazzi M, Roth V, Doshi-Velez F. Combining Kernel and Model Based Learning for HIV Therapy Selection. Neural Information Processing Systems (NIPS) Workshop for Machine Learning and Healthcare. 2016.
Killian TW, Konidaris G, Doshi-Velez F. Transfer Learning Across Patient Variations with Hidden Parameter Markov Decision Processes. Neural Information Processing Systems (NIPS) Workshop for Machine Learning and Healthcare. 2016.
Hughes MC, Elibol HM, McCoy T, Perlis R, Doshi-Velez F. Supervised topic models for clinical interpretability. Neural Information Processing Systems (NIPS) Workshop for Machine Learning and Healthcare. 2016.
Masood MA, Doshi-Velez F. Robust Posterior Exploration in NMF. International Conference on Machine Learning (ICML) Workshop on Geometry in Machine Learning. 2016.
Shain C, Bryce W, Jin L, Krakovna V, Doshi-Velez F, Miller T, Schuler W, Schwartz L. Memory-Bounded Left-Corner Unsupervised Grammar Induction on Child-Directed Input. Computational Linguistics: Technical Papers (COLING). 2016:964-975.
Doshi-Velez F, Konidaris G. Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations. IJCAI. 2016.
Elibol M, Nguyen V, Linderman S, Johnson M, Hashmi A, Doshi-Velez F. Cross-Corpora Unsupervised Learning of Trajectories in Autism Spectrum Disorders. Journal of Machine Learning Research. 2016;17(1):4597-4634.
Tran D, Kim M, Doshi-Velez F. Spectral M-estimation with Application to Hidden Markov Models. AISTATS. 2016.
Pan W, Doshi-Velez F. A Characterization of the Non-Uniqueness of Nonnegative Matrix Factorizations. arXiv:1604.00653. 2016. Abstract

Nonnegative matrix factorization (NMF) is a popular dimension reduction technique that produces an interpretable decomposition of the data into parts. However, this decomposition is not generally identifiable (even up to permutation and scaling). While other studies have provided criteria under which NMF is identifiable, we present the first (to our knowledge) characterization of the non-identifiability of NMF. We describe exactly when and how non-uniqueness can occur, which has important implications for algorithms to efficiently discover alternate solutions, if they exist.
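The non-uniqueness described in this abstract is easy to exhibit concretely. The sketch below uses a made-up 2x2 example (not taken from the paper) in which one nonnegative matrix admits two exact nonnegative factorizations that are not related by permutation or scaling of the factors:

```python
import numpy as np

# Hypothetical example, purely for illustration: the same nonnegative matrix X
# admits two distinct exact NMFs, X = W1 @ H1 = W2 @ H2.
X = np.array([[2.0, 1.0],
              [1.0, 1.0]])

# Factorization 1: trivial, W1 is the identity.
W1 = np.array([[1.0, 0.0],
               [0.0, 1.0]])
H1 = X.copy()

# Factorization 2: obtained by mixing with Q = [[1, 1], [0, 1]];
# W2 = W1 @ Q and H2 = inv(Q) @ H1 both happen to stay nonnegative here,
# and W2 is not a permuted/rescaled copy of W1.
W2 = np.array([[1.0, 1.0],
               [0.0, 1.0]])
H2 = np.array([[1.0, 0.0],
               [1.0, 1.0]])

assert np.allclose(W1 @ H1, X) and np.allclose(W2 @ H2, X)
assert (W2 >= 0).all() and (H2 >= 0).all()
print("Both nonnegative factorizations reconstruct X exactly")
```

Any algorithm that returns a single point estimate silently picks one of these solutions, which is why characterizing when such alternatives exist matters for interpretation.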

Xia X, Protopapas P, Doshi-Velez F. Cost-Sensitive Batch Mode Active Learning: Designing Astronomical Observation by Optimizing Telescope Time and Telescope Choice. 2016.

Masood A, Pan W, Doshi-Velez F. An Empirical Comparison of Sampling Quality Metrics: A Case Study for Bayesian Nonnegative Matrix Factorization. arXiv preprint arXiv:1606.06250. 2016. Abstract

In this work, we empirically explore the question: how can we assess the quality of samples from some target distribution? We assume that the samples are provided by some valid Monte Carlo procedure, so we are guaranteed that the collection of samples will asymptotically approximate the true distribution. Most current evaluation approaches focus on two questions: (1) Has the chain mixed, that is, is it sampling from the distribution? and (2) How independent are the samples (as MCMC procedures produce correlated samples)? Focusing on the case of Bayesian nonnegative matrix factorization, we empirically evaluate standard metrics of sampler quality as well as propose new metrics to capture aspects that these measures fail to expose. The aspect of sampling that is of particular interest to us is the ability (or inability) of sampling methods to move between multiple optima in NMF problems. As a proxy, we propose and study a number of metrics that might quantify the diversity of a set of NMF factorizations obtained by a sampler through quantifying the coverage of the posterior distribution. We compare the performance of a number of standard sampling methods for NMF in terms of these new metrics.
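One of the standard diagnostics this abstract alludes to is the autocorrelation-based effective sample size (ESS), which discounts a chain's nominal length by its autocorrelation. The sketch below is a generic implementation on synthetic chains, not the authors' code and not the new coverage metrics the paper proposes:

```python
import numpy as np

def effective_sample_size(chain):
    """ESS = n / (1 + 2 * sum of autocorrelations), with Geyer-style
    truncation at the first negative autocorrelation estimate."""
    x = np.asarray(chain, dtype=float)
    n = len(x)
    x = x - x.mean()
    # Autocovariance via FFT (zero-padded to avoid circular wrap-around).
    f = np.fft.rfft(x, 2 * n)
    acf = np.fft.irfft(f * np.conj(f))[:n]
    acf /= acf[0]  # normalize so acf[0] == 1
    tau = 1.0
    for k in range(1, n):
        if acf[k] < 0:
            break
        tau += 2.0 * acf[k]
    return n / tau

rng = np.random.default_rng(0)
iid = rng.normal(size=5000)      # independent draws: ESS near n
ar = np.empty(5000)              # sticky AR(1) chain: ESS far below n
ar[0] = 0.0
for t in range(1, 5000):
    ar[t] = 0.95 * ar[t - 1] + rng.normal()
print(effective_sample_size(iid), effective_sample_size(ar))
```

As the abstract notes, a chain can score well on diagnostics like this while still failing to move between posterior modes, which is exactly the gap the paper's proposed diversity metrics target.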

Gafford J, Doshi-Velez F, Wood R, Walsh C. Machine Learning Approaches to Environmental Disturbance Rejection in Multi-Axis Optoelectronic Force Sensors. Sensors and Actuators A: Physical. 2016;248:78-87. Abstract

Light-intensity modulated (LIM) force sensors are seeing increasing interest in the field of surgical robotics, and flexible systems in particular. However, such sensing modalities are notoriously susceptible to ambient effects such as temperature and environmental irradiance, which can register as false force readings. We explore machine learning techniques to dynamically compensate for environmental biases that plague multi-axis optoelectronic force sensors. In this work, we fabricate a multisensor: a three-axis LIM force sensor with integrated temperature and ambient irradiance sensing, manufactured via a monolithic, origami-inspired fabrication process called printed-circuit MEMS. We explore machine learning regression techniques to compensate for temperature and ambient light sensitivity using on-board environmental sensor data. We compare batch-based ridge regression, kernelized regression, and support vector techniques against baseline ordinary least-squares estimates to show that on-board environmental monitoring can substantially improve sensor force tracking performance and output stability under variable lighting and large (>100 °C) thermal gradients. By augmenting the least-squares estimate with nonlinear functions describing both environmental disturbances and cross-axis coupling effects, we can reduce the error in Fx, Fy, and Fz by 10%, 33%, and 73%, respectively. We assess the viability of each algorithm tested in terms of both prediction accuracy and computational overhead, and analyze kernel-based regression for prediction in the context of online force feedback and haptics applications in surgical robotics. Finally, we suggest future work for fast approximation and prediction using stochastic, sparse kernel techniques.
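The core idea of augmenting a least-squares calibration with environmental regressors can be illustrated on synthetic data. Everything in this sketch is invented for illustration (the sensor model, coefficients, and disturbance shapes are not the paper's); it only shows why adding on-board temperature and irradiance channels to a ridge regression can reduce force-tracking error:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
true_force = rng.uniform(0.0, 5.0, n)  # hypothetical applied force (N)
temp = rng.uniform(20.0, 120.0, n)     # on-board temperature reading (deg C)
light = rng.uniform(0.0, 1.0, n)       # ambient irradiance (normalized)

# Invented sensor model: the raw optical reading is corrupted by a thermal
# drift, a nonlinear irradiance offset, and measurement noise.
raw = (true_force
       + 0.02 * (temp - 20.0)
       + 0.5 * np.sqrt(light)
       + 0.05 * rng.normal(size=n))

def ridge_fit(X, y, lam=1e-3):
    # Closed-form ridge regression: w = (X'X + lam*I)^-1 X'y
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

ones = np.ones(n)
# Baseline: calibrate against the raw optical reading alone.
Xb = np.column_stack([ones, raw])
# Augmented: add environmental channels plus a nonlinear irradiance term.
Xa = np.column_stack([ones, raw, temp, np.sqrt(light)])

wb = ridge_fit(Xb, true_force)
wa = ridge_fit(Xa, true_force)
err_b = np.sqrt(np.mean((Xb @ wb - true_force) ** 2))
err_a = np.sqrt(np.mean((Xa @ wa - true_force) ** 2))
print(err_b, err_a)  # the augmented model should track force far more closely
```

The baseline model cannot separate drift from signal, while the augmented model can explain the disturbances away, mirroring (in toy form) the compensation strategy the abstract describes.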

Lingren T, Chen P, Bochenek J, Doshi-Velez F, Manning-Courtney P, Bickel J, Welchons LW, Reinhold J, Bing N, Ni Y, et al. Electronic Health Record Based Algorithm to Identify Patients with Autism Spectrum Disorder. PLoS ONE. 2016;11(7):e0159621.
Krakovna V, Doshi-Velez F. Increasing the Interpretability of Recurrent Neural Networks Using Hidden Markov Models. arXiv:1606.05320. 2016. Abstract

As deep neural networks continue to revolutionize various application domains, there is increasing interest in making these powerful models more understandable and interpretable, and in narrowing down the causes of good and bad predictions. We focus on recurrent neural networks (RNNs), state-of-the-art models in speech recognition and translation. Our approach to increasing interpretability is to combine an RNN with a hidden Markov model (HMM), a simpler and more transparent model. We explore various combinations of RNNs and HMMs: an HMM trained on LSTM states; a hybrid model where an HMM is trained first, then a small LSTM is given HMM state distributions and trained to fill in gaps in the HMM's performance; and a jointly trained hybrid model. We find that the LSTM and HMM learn complementary information about the features in the text.