We present an algorithm for model-based reinforcement learning that combines Bayesian neural networks (BNNs) with random roll-outs and stochastic optimization for policy learning. The BNNs are trained by minimizing α -divergences, allowing us to capture complicated statistical patterns in the transition dynamics, e.g. multi-modality and heteroskedasticity, which are usually missed by other common modeling approaches. We illustrate the performance of our method by solving a challenging benchmark where model-based approaches usually fail and by obtaining promising results in a real-world scenario for controlling a gas turbine.
Nonnegative matrix factorization (NMF) is a popular dimension reduction technique that produces interpretable decomposition of the data into parts. However, this decompostion is not generally identifiable (even up to permutation and scaling). While other studies have provide criteria under which NMF is identifiable, we present the first (to our knowledge) characterization of the non-identifiability of NMF. We describe exactly when and how non-uniqueness can occur, which has important implications for algorithms to efficiently discover alternate solutions, if they exist.
In this work, we empirically explore the question: how can we assess the quality of samples from some target distribution? We assume that the samples are provided by some valid Monte Carlo procedure, so we are guaranteed that the collection of samples will asymptotically approximate the true distribution. Most current evaluation approaches focus on two questions: (1) Has the chain mixed, that is, is it sampling from the distribution? and (2) How independent are the samples (as MCMC procedures produce correlated samples)? Focusing on the case of Bayesian nonnegative matrix factorization, we empirically evaluate standard metrics of sampler quality as well as propose new metrics to capture aspects that these measures fail to expose. The aspect of sampling that is of particular interest to us is the ability (or inability) of sampling methods to move between multiple optima in NMF problems. As a proxy, we propose and study a number of metrics that might quantify the diversity of a set of NMF factorizations obtained by a sampler through quantifying the coverage of the posterior distribution. We compare the performance of a number of standard sampling methods for NMF in terms of these new metrics.
Light-intensity modulated (LIM) force sensors are seeing increasing interest in the field of surgical robotics and flexible systems in particular. However, such sensing modalities are notoriously susceptible to ambient effects such as temperature and environmental irradiance which can register as false force readings. We explore machine learning techniques to dynamically compensate for environmental biases that plague multi-axis optoelectronic force sensors. In this work, we fabricate a multisensor: three-axis LIM force sensor with integrated temperature and ambient irradiance sensing manufactured via a monolithic, origami-inspired fabrication process called printed-circuit MEMS. We explore machine learning regression techniques to compensate for temperature and ambient light sensitivity using on-board environmental sensor data. We compare batch-based ridge regression, kernelized regression and support vector techniques to baseline ordinary least-squares estimates to show that on-board environmental monitoring can substantially improve sensor force tracking performance and output stability under variable lighting and large (>100 °C) thermal gradients. By augmenting the least-squares estimate with nonlinear functions describing both environmental disturbances and cross-axis coupling effects, we can reduce the error in Fx, Fy and Fz by 10%, 33%, and 73%, respectively. We assess viability of each algorithm tested in terms of both prediction accuracy and computational overhead, and analyze kernel-based regression for prediction in the context of online force feedback and haptics applications in surgical robotics. Finally, we suggest future work for fast approximation and prediction using stochastic, sparse kernel techniques.
Abstract: As deep neural networks continue to revolutionize various application domains, there is increasing interest in making these powerful models more understandable and interpretable, and narrowing down the causes of good and bad predictions. We focus on recurrent neural networks (RNNs), state of the art models in speech recognition and translation. Our approach to increasing interpretability is by combining an RNN with a hidden Markov model (HMM), a simpler and more transparent model. We explore various combinations of RNNs and HMMs: an HMM trained on LSTM states; a hybrid model where an HMM is trained first, then a small LSTM is given HMM state distributions and trained to fill in gaps in the HMM's performance; and a jointly trained hybrid model. We find that the LSTM and HMM learn complementary information about the features in the text.
Making intelligent decisions from incomplete information is critical in many applications: for example, robots must choose actions based on imperfect sensors, and speech-based interfaces must infer a user’s needs from noisy microphone inputs. What makes these tasks hard is that often we do not have a natural representation with which to model the domain and use for choosing actions; we must learn about the domain’s properties while simultaneously performing the task. Learning a representation also involves trade-offs between modeling the data that we have seen previously and being able to make predictions about new data. This article explores learning representations of stochastic systems using Bayesian nonparametric statistics. Bayesian nonparametric methods allow the sophistication of a representation to scale gracefully with the complexity in the data. Our main contribution is a careful empirical evaluation of how representations learned using Bayesian nonparametric methods compare to other standard learning approaches, especially in support of planning and control. We show that the Bayesian aspects of the methods result in achieving state-of-the-art performance in decision making with relatively few samples, while the nonparametric aspects often result in fewer computations. These results hold across a variety of different techniques for choosing actions given a representation.