Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

September 6, 2019

Our paper was accepted at NeurIPS 2019. Yay! :) We use ensembles of neural networks to get per-state uncertainty estimates, and use those estimates to switch dynamically, state by state, between Temporal-Difference and Monte Carlo estimates. Now you can do more accurate on-policy evaluation from logged data. #reinforcementlearning #neurips
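To make the switching idea concrete, here is a minimal sketch of one way it could look. It is an illustration under stated assumptions, not the code from the paper: the function name `adaptive_targets`, the `num_std` parameter, and the specific rule (use the TD target when it lies within a band around the ensemble mean, otherwise fall back to the Monte Carlo return) are all hypothetical choices made for this example. The ensemble predictions are assumed to come from networks trained elsewhere.

```python
from statistics import mean, pstdev


def adaptive_targets(ensemble_preds, td_targets, mc_returns, num_std=1.0):
    """Pick a TD or Monte Carlo training target for each state.

    ensemble_preds: list of ensemble members, each a list of value
        predictions (one per state).
    td_targets: bootstrapped targets, e.g. r + gamma * V(s'), one per state.
    mc_returns: observed discounted returns, one per state.
    num_std: assumed width of the confidence band, in ensemble std devs.
    """
    targets, used_td = [], []
    # zip(*ensemble_preds) regroups the predictions state by state.
    for state_preds, td, mc in zip(zip(*ensemble_preds), td_targets, mc_returns):
        center = mean(state_preds)
        spread = pstdev(state_preds)  # ensemble disagreement at this state
        # Assumed rule: trust the TD target only if it falls inside the band.
        inside = abs(td - center) <= num_std * spread
        targets.append(td if inside else mc)
        used_td.append(inside)
    return targets, used_td


# Three ensemble members, two states.
preds = [[1.0, 2.0], [1.2, 2.2], [0.8, 1.8]]
targets, used_td = adaptive_targets(preds, td_targets=[1.1, 3.0], mc_returns=[0.5, 2.1])
print(targets, used_td)
```

In this toy run the TD target for the first state sits inside the ensemble band and is kept, while the TD target for the second state is far outside it, so the Monte Carlo return is used instead.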

Work done in collaboration with Carlos Riquelme, Damien Vincent, Hartmut Maennel, Timothy Mann, Andre Barreto, Sylvain Gelly and Gergely Neu.