Bayesian nonparametric approaches for reinforcement learning in partially observable domains