You searched for +publisher:"University of Texas – Austin" +contributor:("Sutton, Richard"). One record found.

University of Texas – Austin

1. -7411-0398. Data efficient reinforcement learning with off-policy and simulated data.

Degree: PhD, Computer Science, 2019, University of Texas – Austin

Learning from interaction with the environment – trying untested actions, observing successes and failures, and tying effects back to causes – is one of the first capabilities we think of when considering autonomous agents. Reinforcement learning (RL) is the area of artificial intelligence research that has the goal of allowing autonomous agents to learn in this way. Despite much recent success, many modern reinforcement learning algorithms are still limited by the requirement of large amounts of experience before useful skills are learned. Two possible approaches to improving data efficiency are to allow algorithms to make better use of past experience collected with past behaviors (known as off-policy data) and to allow algorithms to make better use of simulated data sources. This dissertation investigates the use of such auxiliary data by answering the question, "How can a reinforcement learning agent leverage off-policy and simulated data to evaluate and improve upon the expected performance of a policy?"

This dissertation first considers how to directly use off-policy data in reinforcement learning through importance sampling. When used in reinforcement learning, importance sampling is limited by high variance that leads to inaccurate estimates. This dissertation addresses this limitation in two ways. First, this dissertation introduces the behavior policy gradient algorithm that adapts the data collection policy towards a policy that generates data that leads to low variance importance sampling evaluation of a fixed policy. Second, this dissertation introduces the family of regression importance sampling estimators, which improve the weighting of already collected off-policy data so as to lower the variance of importance sampling evaluation of a fixed policy. In addition to evaluation of a fixed policy, we apply the behavior policy gradient algorithm and regression importance sampling to batch policy gradient policy improvement. In the case of regression importance sampling, this application leads to the introduction of the sampling error corrected policy gradient estimator that improves the data efficiency of batch policy gradient algorithms.

Towards the goal of learning from simulated experience, this dissertation introduces an algorithm – the grounded action transformation algorithm – that takes small amounts of real world data and modifies the simulator such that skills learned in simulation are more likely to carry over to the real world. Key to this approach is the idea of local simulator modification – the simulator is automatically altered to better model the real world for actions the data collection policy would take in states the data collection policy would visit. Local modification necessitates an iterative approach: the simulator is modified, the policy improved, and then more data is collected for further modification. Finally, in addition to examining them each independently, this dissertation also considers the possibility of combining the use of simulated data with importance sampled…

Advisors/Committee Members: Stone, Peter, 1971- (advisor), Niekum, Scott (committee member), Krähenbühl, Philipp (committee member), Sutton, Richard (committee member).
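To ground the off-policy discussion, here is a minimal sketch of ordinary (per-trajectory) importance sampling for off-policy policy evaluation. It is written in Python under assumed conventions (trajectories as lists of (state, action, reward) tuples gathered under a behavior policy, and pi_e/pi_b as callables returning action probabilities); it shows the baseline estimator whose high variance the behavior policy gradient and regression importance sampling contributions aim to reduce, not those methods themselves.

    import numpy as np

    def importance_sampling_estimate(trajectories, pi_e, pi_b):
        # Ordinary per-trajectory importance sampling: reweight each observed
        # return by the likelihood ratio of the evaluation policy pi_e to the
        # behavior policy pi_b that actually generated the data.
        estimates = []
        for traj in trajectories:
            ratio, ret = 1.0, 0.0
            for (s, a, r) in traj:
                ratio *= pi_e(s, a) / pi_b(s, a)  # ratios multiply along the trajectory
                ret += r
            estimates.append(ratio * ret)
        # Unbiased, but the product of ratios can make the variance grow quickly
        # with trajectory length, which is the limitation discussed above.
        return float(np.mean(estimates))

Roughly speaking, the behavior policy gradient algorithm adapts the data collection policy so that these ratios yield lower variance estimates, while regression importance sampling recomputes the weights from the data that was already collected.

The grounded action transformation idea can likewise be illustrated with a deliberately small, assumed toy problem (1-D dynamics with an invented actuator-scaling mismatch, not the dissertation's robotics experiments): fit a forward model of the real world from a handful of real transitions, then use the simulator's inverse dynamics to pick the action that makes the simulated transition match the predicted real one.

    import numpy as np

    # Assumed toy dynamics: the simulator believes s' = s + a, but the real
    # system scales commands, s' = s + 0.8 * a.
    def real_step(s, a):
        return s + 0.8 * a              # unknown to the agent

    def sim_step(s, a):
        return s + a                    # idealized simulator

    def sim_inverse(s, s_next):
        return s_next - s               # action that produces s_next in the simulator

    # Small real-world dataset collected with the current policy.
    rng = np.random.default_rng(0)
    states = rng.uniform(-1.0, 1.0, size=20)
    actions = rng.uniform(-1.0, 1.0, size=20)
    next_states = real_step(states, actions)

    # Fit a forward model of the real world, s' = s + k * a, by least squares.
    k = np.sum((next_states - states) * actions) / np.sum(actions ** 2)

    def grounded_action(s, a):
        predicted_real_next = s + k * a             # where the real world would go
        return sim_inverse(s, predicted_real_next)  # action the simulator should execute

    # After grounding, simulated transitions track real ones for the actions tried.
    s, a = 0.3, 0.5
    print(real_step(s, a))                       # about 0.7
    print(sim_step(s, grounded_action(s, a)))    # about 0.7

Because the forward model only needs to be accurate for the states and actions the current policy actually visits, this kind of correction is local in the sense described above, which is why the procedure is iterated: modify the simulator, improve the policy, collect more data, and repeat.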

Subjects/Keywords: Artificial intelligence; Reinforcement learning; Robotics; Off-policy; Sim-to-real

APA (6th Edition):

-7411-0398. (2019). Data efficient reinforcement learning with off-policy and simulated data. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://dx.doi.org/10.26153/tsw/7716

Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

Chicago Manual of Style (16th Edition):

-7411-0398. “Data efficient reinforcement learning with off-policy and simulated data.” 2019. Doctoral Dissertation, University of Texas – Austin. Accessed August 07, 2020. http://dx.doi.org/10.26153/tsw/7716.

Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

MLA Handbook (7th Edition):

-7411-0398. “Data efficient reinforcement learning with off-policy and simulated data.” 2019. Web. 07 Aug 2020.

Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

Vancouver:

-7411-0398. Data efficient reinforcement learning with off-policy and simulated data. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2019. [cited 2020 Aug 07]. Available from: http://dx.doi.org/10.26153/tsw/7716.

Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

Council of Science Editors:

-7411-0398. Data efficient reinforcement learning with off-policy and simulated data. [Doctoral Dissertation]. University of Texas – Austin; 2019. Available from: http://dx.doi.org/10.26153/tsw/7716

Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
