The goal of transfer is to use knowledge obtained by solving one task to improve a robot’s (or software agent’s) performance in future tasks. In general, we do not expect this to work; for transfer to be feasible, there must be something in common between the source task(s) and goal task(s). The question at the core of the transfer learning enterprise is therefore: what makes two tasks related, or, more generally, how do we define a family of related tasks? Given a precise definition of how a particular family of tasks is related, we can formulate clear optimization methods for selecting source tasks and determining what knowledge should be imported from the source task(s), and how it should be used in the target task(s). This paper describes one model that has appeared in several different research scenarios where an agent is faced with a family of tasks that have similar, but not identical, dynamics (or reward functions). For example, a human learning to play baseball may, over the course of their career, be exposed to several different bats, each with slightly different weights and lengths. A human who has learned to play baseball well with one bat would be expected to be able to pick up any similar bat and use it. Similarly, when learning to drive a car, one may learn in more than one car, and then be expected to be able to drive any make and model of car (within reasonable variations) with little or no relearning. These examples are instances of exactly the kind of flexible, reliable, and sample-efficient behavior that we should be aiming to achieve in robotics applications. One way to model such a family of tasks is to posit that they are generated by a small set of latent parameters (e.g., the length and weight of the bat, or parameters describing the various physical properties of the car’s steering system and clutch) that are fixed for each problem instance (e.g., for each bat, or car), but are not directly observable by the agent.
Defining a distribution over these latent parameters results in a family of related tasks, and transfer is feasible to the extent that the number of latent variables is small, that the task dynamics (or reward function) vary smoothly with them, and that they can either be ignored or identified using transition data from the task. This model has appeared under several different names in the literature; we refer to it as a hidden-parameter Markov decision process (HiP-MDP).
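To make the hidden-parameter idea concrete, here is a minimal sketch under stated assumptions: a hypothetical one-dimensional system whose next velocity depends on an applied force and a latent mass `theta` that is fixed for each task instance (loosely, the weight of one particular bat) but hidden from the agent. The functions `step`, `sample_task`, and `identify`, the uniform prior over `theta`, and the noise level are all illustrative inventions, not the model used in the paper. Because the dynamics vary smoothly with `theta`, a handful of transitions is enough to identify it by maximum likelihood over a grid of candidate values.

```python
import random

# Hypothetical 1-D dynamics (illustration only): the next velocity depends on
# the applied force `a` and a latent mass `theta` that is fixed per task
# instance but never observed directly by the agent.
def step(v, a, theta, noise, rng):
    """Return the next velocity; the dynamics vary smoothly with theta."""
    return v + 0.1 * a / theta + rng.gauss(0.0, noise)

def sample_task(rng):
    """Draw one task instance from an assumed uniform prior over theta."""
    return rng.uniform(0.5, 2.0)

def identify(transitions, candidates, noise):
    """Return the candidate theta with the highest Gaussian log-likelihood
    of the observed (v, a, v_next) transitions."""
    def log_lik(theta):
        return sum(-((v2 - (v + 0.1 * a / theta)) ** 2) / (2 * noise ** 2)
                   for v, a, v2 in transitions)
    return max(candidates, key=log_lik)

if __name__ == "__main__":
    rng = random.Random(0)
    theta = sample_task(rng)  # hidden from the agent
    v, data = 0.0, []
    for _ in range(20):  # collect a few transitions with random actions
        a = rng.uniform(-1.0, 1.0)
        v2 = step(v, a, theta, 0.001, rng)
        data.append((v, a, v2))
        v = v2
    grid = [0.5 + 0.05 * i for i in range(31)]  # candidate values of theta
    print("true theta:", theta, "estimate:", identify(data, grid, 0.001))
```

A grid search keeps the sketch short; with several latent parameters one would instead optimize the likelihood or maintain a posterior over them.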
Accurate knowledge of a patient’s disease state and trajectory is critical in a clinical setting. Modern electronic healthcare records contain an increasingly large amount of data, and the ability to automatically identify the factors that influence patient outcomes stands to greatly improve the efficiency and quality of care. We examined the use of latent variable models (viz. Latent Dirichlet Allocation) to decompose free-text hospital notes into meaningful features, and the predictive power of these features for patient mortality. We considered three prediction regimes: (1) baseline prediction, (2) dynamic (time-varying) outcome prediction, and (3) retrospective outcome prediction. In each regime, our prediction task differs from the familiar time-varying setting in which data accumulate: since fewer patients have long ICU stays, fewer patients remain available as we move forward in time, and the prediction task becomes increasingly difficult. We found that latent topic-derived features were effective in determining patient mortality under three timelines: in-hospital, 30-day post-discharge, and 1-year post-discharge mortality. Our results demonstrated that the latent topic features important in predicting in-hospital mortality are very different from those that are important in predicting post-discharge mortality. In general, latent topic features were more predictive than structured features, and a combination of the two performed best. The time-varying models that combined latent topic features and baseline features had AUCs that reached 0.85, 0.80, and 0.77 for in-hospital, 30-day post-discharge, and 1-year post-discharge mortality, respectively. Our results agreed with other work suggesting that the first 24 hours of patient information are often the most predictive of in-hospital mortality. Retrospective models that used a combination of latent topic features and structured features achieved AUCs of 0.96, 0.82, and 0.81 for in-hospital, 30-day, and 1-year mortality prediction, respectively.
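The results above are reported as AUCs. As a reminder of what that number measures, the sketch below computes the AUC with the standard rank-based (Mann-Whitney) definition: the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor, with ties counted as one half. The helper `dynamic_auc` and its record format `(length_of_stay_hours, {hour: risk_score}, died)` are hypothetical; they only illustrate how, in the dynamic regime, the evaluated cohort shrinks as the prediction hour moves later. This is not the authors' evaluation code.

```python
def auc(scores, labels):
    """Rank-based AUC: P(score of a positive > score of a negative), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def dynamic_auc(records, hour):
    """Hypothetical dynamic evaluation: only patients still in the ICU at `hour`
    (and scored at that hour) are included, so the cohort shrinks over time.
    Returns (AUC, number of patients evaluated)."""
    at_risk = [(scores[hour], died) for los, scores, died in records
               if los >= hour and hour in scores]
    return auc([s for s, _ in at_risk], [d for _, d in at_risk]), len(at_risk)

if __name__ == "__main__":
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 3 of 4 pairs ranked correctly
```

The quadratic pairwise loop is chosen for clarity; for large cohorts, library routines such as scikit-learn's `roc_auc_score` compute the same quantity efficiently.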
Our work focuses on the dynamic (time-varying) setting, because models from this regime could facilitate an ongoing severity stratification system that helps d
OBJECTIVE: The distinct trajectories of patients with autism spectrum disorders (ASDs) have not been extensively studied, particularly regarding clinical manifestations beyond the neurobehavioral criteria from the Diagnostic and Statistical Manual of Mental Disorders. The objective of this study was to investigate the patterns of co-occurrence of medical comorbidities in ASDs.
METHODS: International Classification of Diseases, Ninth Revision codes from patients aged at least 15 years with a diagnosis of ASD were obtained from electronic medical records. These codes were aggregated by using phenotype-wide association study (PheWAS) categories and processed into 1350-dimensional vectors describing the counts of the most common categories in 6-month blocks between the ages of 0 and 15 years. Hierarchical clustering was used to identify subgroups with distinct courses.
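The featurization step can be sketched as follows. Fifteen years in 6-month blocks gives 30 blocks, so 1350 dimensions would correspond to 45 retained categories; that count is our inference from the arithmetic, not something stated in the abstract. The abstract also does not specify the distance or linkage, so the sketch assumes Euclidean distance with average linkage. The names `patient_vector` and `average_linkage` and the toy event lists are hypothetical, and this is not the study's code.

```python
import math

BLOCK_MONTHS = 6
N_BLOCKS = 30  # ages 0 to 15 years in 6-month blocks

def patient_vector(events, categories):
    """events: (age_in_months, category) pairs; categories: ordered kept categories.
    Returns len(categories) * N_BLOCKS counts, laid out block by block."""
    index = {c: i for i, c in enumerate(categories)}
    vec = [0] * (len(categories) * N_BLOCKS)
    for age_months, cat in events:
        block = age_months // BLOCK_MONTHS
        if cat in index and 0 <= block < N_BLOCKS:  # drop rare categories, ages >= 15
            vec[block * len(categories) + index[cat]] += 1
    return vec

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_linkage(vectors, n_clusters):
    """Naive agglomerative clustering (assumed: Euclidean, average linkage).
    Repeatedly merges the two clusters with the smallest mean pairwise distance."""
    n = len(vectors)
    dist = {(i, j): euclidean(vectors[i], vectors[j])
            for i in range(n) for j in range(i + 1, n)}
    def d(i, j):
        return dist[(min(i, j), max(i, j))]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = (sum(d(i, j) for i in clusters[a] for j in clusters[b])
                        / (len(clusters[a]) * len(clusters[b])))
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

if __name__ == "__main__":
    cats = ["seizure", "gi", "psych"]  # toy categories
    patients = [
        [(10, "seizure"), (20, "seizure")],
        [(12, "seizure"), (22, "seizure")],
        [(30, "gi"), (40, "gi")],
        [(31, "gi"), (41, "gi")],
    ]
    vecs = [patient_vector(p, cats) for p in patients]
    print(average_linkage(vecs, 2))
```

This naive version is cubic in the number of patients and only suits toy data; for a cohort of several thousand, a library routine such as `scipy.cluster.hierarchy.linkage(..., method="average")` would be the practical choice.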
RESULTS: Four subgroups were identified. The first (n = 120) was characterized by seizures (subgroup prevalence 77.5%). The second (n = 197) was characterized by multisystem disorders, including gastrointestinal disorders (prevalence 24.3%) and auditory disorders and infections (prevalence 87.8%), and the third (n = 212) was characterized by psychiatric disorders (prevalence 33.0%). The last group (n = 4316) could not be further resolved. The prevalence of psychiatric disorders was not significantly correlated with seizure activity (P = .17), but a significant correlation existed between gastrointestinal disorders and seizures (P < .001). The correlation results were replicated in a second sample of 496 individuals from a different geographic region.
CONCLUSIONS: Three distinct patterns of medical trajectories were identified by unsupervised clustering of electronic health record diagnoses. These may point to distinct etiologies with different genetic and environmental contributions. Additional clinical and molecular characterizations will be required to further delineate these subgroups.
Control applications often feature tasks with similar, but not identical, dynamics. We introduce the Hidden Parameter Markov Decision Process (HiP-MDP), a framework that parametrizes a family of related dynamical systems with a low-dimensional set of latent factors, and introduce a semiparametric regression approach for learning its structure from data. In the control setting, we show that a learned HiP-MDP rapidly identifies the dynamics of a new task instance, allowing an agent to flexibly adapt to task variations.
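The identification step can be illustrated with a deliberately simplified sketch. Here we assume the change in state is a weighted sum of a few basis dynamics shared across the task family, with instance-specific weights playing the role of the hidden parameters; a new instance is then identified by least squares on a few of its transitions. The fixed, hand-picked functions in `BASIS` replace the nonparametric basis that the HiP-MDP learns from data, so this shows only the linear-in-weights part of a semiparametric model. All names (`BASIS`, `predict`, `solve`, `identify_weights`) and the true weights in the demo are hypothetical.

```python
import random

# Hypothetical basis dynamics shared by every task instance. In a HiP-MDP
# these would be learned from earlier instances; here they are fixed by hand.
BASIS = [
    lambda s, a: a,           # effect of the action
    lambda s, a: -s,          # restoring force
    lambda s, a: s * abs(s),  # nonlinear drag-like term
]

def predict(weights, s, a):
    """Predicted change in state for one instance with the given weights."""
    return sum(w * f(s, a) for w, f in zip(weights, BASIS))

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def identify_weights(transitions, ridge=1e-6):
    """Least-squares estimate of instance weights from (s, a, delta) tuples,
    via the normal equations (Phi^T Phi + ridge * I) w = Phi^T delta."""
    k = len(BASIS)
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for s, a, delta in transitions:
        phi = [f(s, a) for f in BASIS]
        for i in range(k):
            b[i] += phi[i] * delta
            for j in range(k):
                A[i][j] += phi[i] * phi[j]
    for i in range(k):
        A[i][i] += ridge  # small ridge term keeps the system well conditioned
    return solve(A, b)

if __name__ == "__main__":
    rng = random.Random(1)
    true_w = [0.5, 0.2, 0.05]  # hidden parameters of a new instance (made up)
    data = []
    for _ in range(30):
        s, a = rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)
        data.append((s, a, predict(true_w, s, a) + rng.gauss(0.0, 0.01)))
    print(identify_weights(data))
```

Because only the low-dimensional weights must be estimated for a new instance, a few dozen transitions suffice here, which mirrors the rapid adaptation the abstract describes.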
Abstract—Making intelligent decisions from incomplete information is critical in many applications: for example, robots must choose actions based on imperfect sensors, and speech-based interfaces must infer a user’s needs from noisy microphone inputs. What makes these tasks hard is that often we do not have a natural representation with which to model the domain and use for choosing actions; we must learn about the domain’s properties while simultaneously performing the task. Learning a representation also involves trade-offs between modeling the data that we have seen previously and being able to make predictions about new data. This article explores learning representations of stochastic systems using Bayesian nonparametric statistics. Bayesian nonparametric methods allow the sophistication of a representation to scale gracefully with the complexity in the data. Our main contribution is a careful empirical evaluation of how representations learned using Bayesian nonparametric methods compare to other standard learning approaches, especially in support of planning and control. We show that the Bayesian aspects of the methods result in state-of-the-art decision-making performance with relatively few samples, while the nonparametric aspects often result in fewer computations. These results hold across a variety of different techniques for choosing actions given a representation.
Index Terms—Artificial intelligence, machine learning, reinforcement learning, partially-observable Markov decision process, hierarchical Dirichlet process hidden Markov model.
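One way to see how a Bayesian nonparametric representation "scales with the complexity in the data" is the Chinese restaurant process, the sequential predictive view of the Dirichlet process on which the hierarchical Dirichlet process builds. Each new observation joins an existing component with probability proportional to that component's size, or opens a new one with probability proportional to the concentration `alpha`; the expected number of components after n observations is the sum over i from 0 to n - 1 of alpha / (alpha + i), which grows roughly like alpha * ln(1 + n / alpha). This sketch only illustrates that growth property; it is not the HDP-HMM inference used in the article, and `crp_table_counts` is a hypothetical helper.

```python
import random

def crp_table_counts(n_customers, alpha, rng):
    """Seat customers one at a time by the Chinese restaurant process.
    Customer i (0-based) opens a new table with probability alpha / (i + alpha)
    and joins existing table k with probability counts[k] / (i + alpha)."""
    counts = []
    for i in range(n_customers):
        r = rng.uniform(0.0, i + alpha)
        if not counts or r < alpha:
            counts.append(1)  # open a new table (new component)
            continue
        r -= alpha
        for k, c in enumerate(counts):
            if r < c:
                counts[k] += 1  # join an existing table
                break
            r -= c
        else:
            counts[-1] += 1  # guard against floating-point round-off
    return counts

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100, 1000, 10000):
        # The number of occupied tables grows slowly (about logarithmically) with n.
        print(n, len(crp_table_counts(n, 1.0, rng)))
```

The number of components is never fixed in advance: it is inferred from how much the data support, which is the property the abstract credits for keeping representations compact.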
It is commonly stated that reinforcement learning (RL) algorithms require more samples to learn than humans. In this work, we investigate this claim using two standard problems from the RL literature. We compare the performance of human subjects to RL techniques. We find that context (the meaningfulness of the observations) plays a significant role in the rate of human RL. Moreover, without contextual information, humans often fare much worse than classic algorithms. Comparing the detailed responses of humans and RL algorithms, we also find that humans appear to employ rather different strategies from standard algorithms, even in cases where their performance was indistinguishable from that of the algorithms.