Elibol M, Nguyen V, Linderman S, Johnson M, Hashmi A, Doshi-Velez F. Cross-Corpora Unsupervised Learning of Trajectories in Autism Spectrum Disorders. Journal of Machine Learning Research. 2016;17 (1) :4597-4634. Paper
Tran D, Kim M, Doshi-Velez F. Spectral M-estimation with Application to Hidden Markov Models: Supplementary Material. AISTATS. 2016. Paper
Pan W, Doshi-Velez F. A Characterization of the Non-Uniqueness of Nonnegative Matrix Factorizations. arXiv:1604.00653. 2016.

Nonnegative matrix factorization (NMF) is a popular dimension reduction technique that produces an interpretable decomposition of the data into parts. However, this decomposition is not generally identifiable (even up to permutation and scaling). While other studies have provided criteria under which NMF is identifiable, we present the first (to our knowledge) characterization of the non-identifiability of NMF. We describe exactly when and how non-uniqueness can occur, which has important implications for algorithms to efficiently discover alternate solutions, if they exist.
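To make the non-identifiability claim concrete, the following is a minimal sketch, not taken from the paper. It builds two nonnegative factorizations of the same matrix that differ by more than a permutation or rescaling of the factors. The matrices `W`, `H`, the mixing matrix `Q`, and the value of `eps` are all hypothetical and chosen only so that every entry stays nonnegative.

```python
# Toy illustration (all values hypothetical): two different nonnegative
# factorizations X = W H = W2 H2 of the same matrix X.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

W = [[1.0, 2.0],
     [3.0, 1.0],
     [0.0, 1.0]]
H = [[2.0, 1.0, 3.0],
     [1.0, 1.0, 1.0]]

# Q mixes the first column of W into the second; Q_inv undoes this on H.
# Q is neither a permutation nor a diagonal scaling.
eps = 0.5
Q = [[1.0, eps],
     [0.0, 1.0]]
Q_inv = [[1.0, -eps],
         [0.0, 1.0]]

W2 = matmul(W, Q)      # still nonnegative because W and Q are nonnegative
H2 = matmul(Q_inv, H)  # nonnegative here because row 0 of H >= eps * row 1

X = matmul(W, H)
X2 = matmul(W2, H2)

def is_nonnegative(M):
    return all(v >= 0.0 for row in M for v in row)

print("same product:", X == X2)
print("both factorizations nonnegative:",
      is_nonnegative(W2) and is_nonnegative(H2))
print("second column of W2:", [row[1] for row in W2])
```

The second column of `W2` is not proportional to either column of `W`, so the two factorizations are genuinely different. Whether such a `Q` exists for a given factorization depends on the zero patterns of the factors, which is the kind of condition the abstract refers to.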

Xia X, Protopapas P, Doshi-Velez F. Cost-Sensitive Batch Mode Active Learning: Designing Astronomical Observation by Optimizing Telescope Time and Telescope Choice. 2016.

Masood A, Pan W, Doshi-Velez F. An Empirical Comparison of Sampling Quality Metrics: A Case Study for Bayesian Nonnegative Matrix Factorization. arXiv preprint arXiv:1606.06250. 2016.

In this work, we empirically explore the question: how can we assess the quality of samples from some target distribution? We assume that the samples are provided by some valid Monte Carlo procedure, so we are guaranteed that the collection of samples will asymptotically approximate the true distribution. Most current evaluation approaches focus on two questions: (1) Has the chain mixed, that is, is it sampling from the distribution? and (2) How independent are the samples (as MCMC procedures produce correlated samples)? Focusing on the case of Bayesian nonnegative matrix factorization, we empirically evaluate standard metrics of sampler quality as well as propose new metrics to capture aspects that these measures fail to expose. The aspect of sampling that is of particular interest to us is the ability (or inability) of sampling methods to move between multiple optima in NMF problems. As a proxy, we propose and study a number of metrics that might quantify the diversity of a set of NMF factorizations obtained by a sampler through quantifying the coverage of the posterior distribution. We compare the performance of a number of standard sampling methods for NMF in terms of these new metrics.
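As an illustration of question (2) above, the sketch below estimates an effective sample size from a chain's autocorrelations. The abstract does not name the specific standard metrics it evaluates, so this is an assumed example of one common choice, not the paper's procedure. The function names, the truncation rule (stop at the first non-positive autocorrelation), and the two synthetic chains are all hypothetical.

```python
import random

def autocorr(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

def effective_sample_size(x, max_lag=200):
    """n / (1 + 2 * sum of autocorrelations), truncated at the first
    non-positive autocorrelation (a simple, assumed truncation rule)."""
    n = len(x)
    total = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = autocorr(x, lag)
        if rho <= 0.0:
            break
        total += rho
    return n / (1.0 + 2.0 * total)

rng = random.Random(0)
n = 2000

# Independent draws: the effective sample size should be close to n.
iid_chain = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Strongly correlated AR(1) chain, standing in for a slowly mixing sampler.
ar_chain = [0.0]
for _ in range(n - 1):
    ar_chain.append(0.9 * ar_chain[-1] + rng.gauss(0.0, 1.0))

ess_iid = effective_sample_size(iid_chain)
ess_ar = effective_sample_size(ar_chain)
print("ESS, independent chain:", round(ess_iid))
print("ESS, correlated chain:", round(ess_ar))
```

A metric like this measures correlation along a single chain. It says nothing about whether the sampler has visited several distinct optima, which is the gap the proposed diversity metrics aim to address.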

Gafford J, Doshi-Velez F, Wood R, Walsh C. Machine Learning Approaches to Environmental Disturbance Rejection in Multi-Axis Optoelectronic Force Sensors. Sensors and Actuators A: Physical. 2016;248 :78-87.

Light-intensity modulated (LIM) force sensors are seeing increasing interest in the field of surgical robotics and flexible systems in particular. However, such sensing modalities are notoriously susceptible to ambient effects such as temperature and environmental irradiance which can register as false force readings. We explore machine learning techniques to dynamically compensate for environmental biases that plague multi-axis optoelectronic force sensors. In this work, we fabricate a multisensor: three-axis LIM force sensor with integrated temperature and ambient irradiance sensing manufactured via a monolithic, origami-inspired fabrication process called printed-circuit MEMS. We explore machine learning regression techniques to compensate for temperature and ambient light sensitivity using on-board environmental sensor data. We compare batch-based ridge regression, kernelized regression and support vector techniques to baseline ordinary least-squares estimates to show that on-board environmental monitoring can substantially improve sensor force tracking performance and output stability under variable lighting and large (>100 °C) thermal gradients. By augmenting the least-squares estimate with nonlinear functions describing both environmental disturbances and cross-axis coupling effects, we can reduce the error in Fx, Fy and Fz by 10%, 33%, and 73%, respectively. We assess viability of each algorithm tested in terms of both prediction accuracy and computational overhead, and analyze kernel-based regression for prediction in the context of online force feedback and haptics applications in surgical robotics. Finally, we suggest future work for fast approximation and prediction using stochastic, sparse kernel techniques.

Lingren T, Chen P, Bochenek J, Doshi-Velez F, Manning-Courtney P, Bickel J, Welchons LW, Reinhold J, Bing N, Ni Y, et al. Electronic Health Record Based Algorithm to Identify Patients with Autism Spectrum Disorder. PLoS ONE 11(7): e0159621. 2016. Paper
Krakovna V, Doshi-Velez F. Increasing the Interpretability of Recurrent Neural Networks Using Hidden Markov Models. arXiv:1606.05320. 2016.

Abstract: As deep neural networks continue to revolutionize various application domains, there is increasing interest in making these powerful models more understandable and interpretable, and narrowing down the causes of good and bad predictions. We focus on recurrent neural networks (RNNs), state of the art models in speech recognition and translation. Our approach to increasing interpretability is by combining an RNN with a hidden Markov model (HMM), a simpler and more transparent model. We explore various combinations of RNNs and HMMs: an HMM trained on LSTM states; a hybrid model where an HMM is trained first, then a small LSTM is given HMM state distributions and trained to fill in gaps in the HMM's performance; and a jointly trained hybrid model. We find that the LSTM and HMM learn complementary information about the features in the text.

Wang T, Rudin C, Doshi-Velez F, Liu Y, Klampfl E, MacNeille P. Bayesian Or's of And's for Interpretable Classification with Application to Context Aware Recommender Systems. arXiv:1504.07614. 2015. Paper
Kim B, Shah JA, Doshi-Velez F. Mind the Gap: A Generative Approach to Interpretable Feature Selection and Extraction. In: Advances in Neural Information Processing Systems. 2015 :2251–2259. Paper
Doshi-Velez F, Avillach P, Palmer N, Bousvaros A, Ge Y, Fox K, Steinberg G, Spettell C, Juster I, Kohane I. Prevalence of Inflammatory Bowel Disease Among Patients with Autism Spectrum Disorders. Inflammatory bowel diseases. 2015;21 :2281–2288. Paper
Doshi-Velez F, Pfau D, Wood F, Roy N. Bayesian Nonparametric Methods for Partially-Observable Reinforcement Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2015;37 (2) :394-407.

Making intelligent decisions from incomplete information is critical in many applications: for example, robots must choose actions based on imperfect sensors, and speech-based interfaces must infer a user’s needs from noisy microphone inputs. What makes these tasks hard is that often we do not have a natural representation with which to model the domain and use for choosing actions; we must learn about the domain’s properties while simultaneously performing the task. Learning a representation also involves trade-offs between modeling the data that we have seen previously and being able to make predictions about new data. This article explores learning representations of stochastic systems using Bayesian nonparametric statistics. Bayesian nonparametric methods allow the sophistication of a representation to scale gracefully with the complexity in the data. Our main contribution is a careful empirical evaluation of how representations learned using Bayesian nonparametric methods compare to other standard learning approaches, especially in support of planning and control. We show that the Bayesian aspects of the methods result in achieving state-of-the-art performance in decision making with relatively few samples, while the nonparametric aspects often result in fewer computations. These results hold across a variety of different techniques for choosing actions given a representation.

Doshi-Velez F, Marshall YE. HackEbola with Data: On the hackathon format for timely data analysis. 2015.
For more information, see the event page:
Summary Paper
Doshi-Velez F, Wallace BC, Adams RP. Graph-Sparse LDA: A Topic Model with Structured Sparsity. AAAI. 2015.

Originally designed to model text, topic modeling has become a powerful tool for uncovering latent structure in domains including medicine, finance, and vision. The goals for the model vary depending on the application: in some cases, the discovered topics may be used for prediction or some other downstream task. In other cases, the content of the topic itself may be of intrinsic scientific interest. Unfortunately, even using modern sparse techniques, the discovered topics are often difficult to interpret due to the high dimensionality of the underlying space. To improve topic interpretability, we introduce Graph-Sparse LDA, a hierarchical topic model that leverages knowledge of relationships between words (e.g., as encoded by an ontology). In our model, topics are summarized by a few latent concept-words from the underlying graph that explain the observed words. Graph-Sparse LDA recovers sparse, interpretable summaries on two real-world biomedical datasets while matching state-of-the-art prediction performance.

Doshi-Velez F, Ge Y, Kohane I. Comorbidity clusters in autism spectrum disorders: an electronic health record time-series analysis. Pediatrics. 2014;133 :e54–e63. Paper
Doshi-Velez F, Wallace B, Adams R. Graph-Sparse LDA: A Topic Model with Structured Sparsity. arXiv:1410.4510. 2014. Paper
Konidaris G, Doshi-Velez F. Hidden Parameter Markov Decision Processes: An Emerging Paradigm for Modeling Families of Related Tasks. AAAI 2014 Fall Symposium on Knowledge, Skill, and Behavior Transfer in Autonomous Robots. 2014.

The goal of transfer is to use knowledge obtained by solving one task to improve a robot’s (or software agent’s) performance in future tasks. In general, we do not expect this to work; for transfer to be feasible, there must be something in common between the source task(s) and goal task(s). The question at the core of the transfer learning enterprise is therefore: what makes two tasks related? Or, more generally, how do you define a family of related tasks? Given a precise definition of how a particular family of tasks is related, we can formulate clear optimization methods for selecting source tasks and determining what knowledge should be imported from the source task(s), and how it should be used in the target task(s). This paper describes one model that has appeared in several different research scenarios where an agent is faced with a family of tasks that have similar, but not identical, dynamics (or reward functions). For example, a human learning to play baseball may, over the course of their career, be exposed to several different bats, each with slightly different weights and lengths. A human who has learned to play baseball well with one bat would be expected to be able to pick up any similar bat and use it. Similarly, when learning to drive a car, one may learn in more than one car, and then be expected to be able to drive any make and model of car (within reasonable variations) with little or no relearning. These examples are instances of exactly the kind of flexible, reliable, and sample-efficient behavior that we should be aiming to achieve in robotics applications. One way to model such a family of tasks is to posit that they are generated by a small set of latent parameters (e.g., the length and weight of the bat, or parameters describing the various physical properties of the car’s steering system and clutch) that are fixed for each problem instance (e.g., for each bat, or car), but are not directly observable by the agent.
Defining a distribution over these latent parameters results in a family of related tasks, and transfer is feasible to the extent that the number of latent variables is small, the task dynamics (or reward function) vary smoothly with them, and to the extent to which they can either be ignored or identified using transition data from the task. This model has appeared under several different names in the literature; we refer to it as a hidden-parameter Markov decision process (or HIPMDP).

Ghassemi M, Naumann T, Doshi-Velez F, Brimmer N, Joshi R, Rumshisky A, Szolovits P. Unfolding Physiological State: Mortality Modelling in Intensive Care Units. ACM SIGKDD international conference on Knowledge discovery and data mining. 2014 :75-84.

Accurate knowledge of a patient’s disease state and trajectory is critical in a clinical setting. Modern electronic healthcare records contain an increasingly large amount of data, and the ability to automatically identify the factors that influence patient outcomes stands to greatly improve the efficiency and quality of care. We examined the use of latent variable models (viz. Latent Dirichlet Allocation) to decompose free-text hospital notes into meaningful features, and the predictive power of these features for patient mortality. We considered three prediction regimes: (1) baseline prediction, (2) dynamic (time-varying) outcome prediction, and (3) retrospective outcome prediction. In each, our prediction task differs from the familiar time-varying situation whereby data accumulates; since fewer patients have long ICU stays, as we move forward in time fewer patients are available and the prediction task becomes increasingly difficult. We found that latent topic-derived features were effective in determining patient mortality under three timelines: in-hospital, 30 day post-discharge, and 1 year post-discharge mortality. Our results demonstrated that the latent topic features important in predicting hospital mortality are very different from those that are important in post-discharge mortality. In general, latent topic features were more predictive than structured features, and a combination of the two performed best. The time-varying models that combined latent topic features and baseline features had AUCs that reached 0.85, 0.80, and 0.77 for in-hospital, 30 day post-discharge and 1 year post-discharge mortality respectively. Our results agreed with other work suggesting that the first 24 hours of patient information are often the most predictive of hospital mortality. Retrospective models that used a combination of latent topic features and structured features achieved AUCs of 0.96, 0.82, and 0.81 for in-hospital, 30 day, and 1 year mortality prediction.
Our work focuses on the dynamic (time-varying) setting, because models from this regime could facilitate an on-going severity stratification system that helps d

Doshi-Velez F, Ge Y, Kohane I. Comorbidity Clusters in Autism Spectrum Disorders: An Electronic Health Record Time-Series Analysis. Pediatrics. 2013. doi:10.1542/peds.2013-0819.

OBJECTIVE: The distinct trajectories of patients with autism spectrum disorders (ASDs) have not been extensively studied, particularly regarding clinical manifestations beyond the neurobehavioral criteria from the Diagnostic and Statistical Manual of Mental Disorders. The objective of this study was to investigate the patterns of co-occurrence of medical comorbidities in ASDs.

METHODS: International Classification of Diseases, Ninth Revision codes from patients aged at least 15 years with a diagnosis of ASD were obtained from electronic medical records. These codes were aggregated by using phenotype-wide association studies categories and processed into 1350-dimensional vectors describing the counts of the most common categories in 6-month blocks between the ages of 0 and 15. Hierarchical clustering was used to identify subgroups with distinct courses.

RESULTS: Four subgroups were identified. The first was characterized by seizures (n = 120, subgroup prevalence 77.5%). The second (n = 197) was characterized by multisystem disorders including gastrointestinal disorders (prevalence 24.3%) and auditory disorders and infections (prevalence 87.8%), and the third was characterized by psychiatric disorders (n = 212, prevalence 33.0%). The last group (n = 4316) could not be further resolved. The prevalence of psychiatric disorders was uncorrelated with seizure activity (P = .17), but a significant correlation existed between gastrointestinal disorders and seizures (P < .001). The correlation results were replicated by using a second sample of 496 individuals from a different geographic region.

CONCLUSIONS: Three distinct patterns of medical trajectories were identified by unsupervised clustering of electronic health record diagnoses. These may point to distinct etiologies with different genetic and environmental contributions. Additional clinical and molecular characterizations will be required to further delineate these subgroups.

Doshi-Velez F, Konidaris G. Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations. CoRR. 2013;abs/1308.3513.

Control applications often feature tasks with similar, but not identical, dynamics. We introduce the Hidden Parameter Markov Decision Process (HiP-MDP), a framework that parametrizes a family of related dynamical systems with a low-dimensional set of latent factors, and introduce a semiparametric regression approach for learning its structure from data. In the control setting, we show that a learned HiP-MDP rapidly identifies the dynamics of a new task instance, allowing an agent to flexibly adapt to task variations.
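The idea of a hidden parameter shared across related dynamics can be sketched on a toy problem. The example below is an assumption-heavy illustration and not the paper's method: it swaps in ordinary least squares for the semiparametric regression approach, and uses a hypothetical scalar system in which each task instance has dynamics `next_state = state + theta * action + noise` with an unobserved `theta`. All names (`step`, `estimate_theta`, `true_theta`) are invented for illustration.

```python
import random

def step(state, action, theta, rng, noise_sd=0.01):
    """Hypothetical task dynamics: theta is fixed per task instance and
    is not observed by the agent."""
    return state + theta * action + rng.gauss(0.0, noise_sd)

def estimate_theta(transitions):
    """Least-squares estimate of theta from (state, action, next_state)
    triples, using next_state - state = theta * action + noise."""
    num = sum(a * (s_next - s) for s, a, s_next in transitions)
    den = sum(a * a for _, a, _ in transitions)
    return num / den

rng = random.Random(0)
true_theta = 1.7  # hidden parameter of this particular task instance

# Collect a handful of transitions with random exploratory actions.
state = 0.0
transitions = []
for _ in range(20):
    action = rng.uniform(-1.0, 1.0)
    next_state = step(state, action, true_theta, rng)
    transitions.append((state, action, next_state))
    state = next_state

theta_hat = estimate_theta(transitions)
print("estimated hidden parameter:", round(theta_hat, 3))
```

In this toy setting a few transitions are enough to pin down the latent parameter, which mirrors the abstract's point that a low-dimensional latent description lets an agent adapt quickly to a new task instance.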