A Bayesian Approach to Learning Bandit Structure in Markov Decision Processes