
You searched for +publisher:"University of Texas – Austin" +contributor:("Levine, Sergey"). One record found.


No search limiters apply to these results.


1. Mahjourian, Reza. Hierarchical policy design for sample-efficient learning of robot table tennis through self-play.

Degree: PhD, Computer Science, 2019, University of Texas – Austin

Training robots with physical bodies requires developing new methods and action representations that allow the learning agents to explore the space of policies efficiently. This work studies sample-efficient learning of complex policies in the context of robot table tennis. It incorporates learning into a hierarchical control framework using a model-free strategy layer (which requires complex reasoning about opponents that is difficult to do in a model-based way), model-based prediction of external objects (which are difficult to control directly with analytic control methods, but governed by learnable and relatively simple laws of physics), and analytic controllers for the robot itself. Human demonstrations are used to train dynamics models, which together with the analytic controller allow any physically capable robot to play table tennis without training episodes. Using only about 7000 demonstrated trajectories, a striking policy can hit ball targets with about 20 cm error. Self-play is used to train cooperative and adversarial strategies on top of model-based striking skills trained from human demonstrations. After only about 24000 strikes in self-play, the agent learns to best exploit the human dynamics models for longer cooperative games. Further experiments demonstrate that more flexible variants of the policy can discover new strikes not demonstrated by humans and achieve higher performance at the expense of lower sample efficiency. Experiments are carried out in a virtual reality environment using sensory observations that are obtainable in the real world. The high sample efficiency demonstrated in the evaluations shows that the proposed method is suitable for learning directly on physical robots without transfer of models or policies from simulation.

Advisors/Committee Members: Miikkulainen, Risto (advisor), Levine, Sergey (committee member), Sentis, Luis (committee member), Niekum, Scott (committee member), Mok, Aloysius (committee member).
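The hierarchy the abstract describes (a model-free strategy layer on top of a learned ball-dynamics model and an analytic robot controller) can be sketched in code. The Python below is a minimal illustration; the class names (LearnedBallDynamics, StrategyLayer, AnalyticController) and the placeholder physics and control rules are assumptions for exposition, not the dissertation's implementation.

import numpy as np
from dataclasses import dataclass


@dataclass
class BallState:
    position: np.ndarray  # (x, y, z) in meters
    velocity: np.ndarray  # (vx, vy, vz) in m/s


class LearnedBallDynamics:
    """Stand-in for a dynamics model trained on demonstrated trajectories."""

    def predict_trajectory(self, ball: BallState, horizon: int) -> list[BallState]:
        # Placeholder ballistic rollout; the real model would be fit to
        # the ~7000 human-demonstrated trajectories mentioned above.
        states = [ball]
        dt, g = 0.01, np.array([0.0, 0.0, -9.81])
        for _ in range(horizon):
            prev = states[-1]
            states.append(BallState(prev.position + dt * prev.velocity,
                                    prev.velocity + dt * g))
        return states


class StrategyLayer:
    """Model-free layer: decides where to aim; this is the part one would
    train with self-play, since reasoning about opponents is hard to do
    in a model-based way."""

    def choose_target(self, predicted: list[BallState]) -> np.ndarray:
        # Placeholder policy: always aim at a fixed spot on the far side.
        return np.array([1.2, 0.0, 0.76])


class AnalyticController:
    """Analytic robot controller: realizes a strike toward the target."""

    def strike_command(self, predicted: list[BallState], target: np.ndarray) -> np.ndarray:
        # Placeholder: derive a paddle command from a predicted contact
        # point and the desired landing target (a real controller would
        # use inverse kinematics on the robot arm).
        contact = predicted[len(predicted) // 2]
        return target - contact.position


def hierarchical_policy(ball: BallState) -> np.ndarray:
    # Top-down flow: predict the ball, pick a target, command a strike.
    dynamics = LearnedBallDynamics()
    strategy = StrategyLayer()
    controller = AnalyticController()

    predicted = dynamics.predict_trajectory(ball, horizon=50)
    target = strategy.choose_target(predicted)
    return controller.strike_command(predicted, target)


if __name__ == "__main__":
    incoming = BallState(np.array([0.0, 0.0, 1.0]), np.array([3.0, 0.0, 0.5]))
    print(hierarchical_policy(incoming))

The design point this sketch tries to convey: only the small, high-level strategy layer needs model-free (sample-hungry) learning, while the ball is handled by a learnable dynamics model and the robot by analytic control, which is what makes the reported sample efficiency plausible.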

Subjects/Keywords: Robotics; Table tennis; Self-play; Reinforcement learning; Hierarchical policy



APA (6th Edition):

Mahjourian, R. (2019). Hierarchical policy design for sample-efficient learning of robot table tennis through self-play. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://hdl.handle.net/2152/72812

Chicago Manual of Style (16th Edition):

Mahjourian, Reza. “Hierarchical policy design for sample-efficient learning of robot table tennis through self-play.” 2019. Doctoral Dissertation, University of Texas – Austin. Accessed August 14, 2020. http://hdl.handle.net/2152/72812.

MLA Handbook (7th Edition):

Mahjourian, Reza. “Hierarchical policy design for sample-efficient learning of robot table tennis through self-play.” 2019. Web. 14 Aug 2020.

Vancouver:

Mahjourian R. Hierarchical policy design for sample-efficient learning of robot table tennis through self-play. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2019. [cited 2020 Aug 14]. Available from: http://hdl.handle.net/2152/72812.

Council of Science Editors:

Mahjourian R. Hierarchical policy design for sample-efficient learning of robot table tennis through self-play. [Doctoral Dissertation]. University of Texas – Austin; 2019. Available from: http://hdl.handle.net/2152/72812
