
You searched for +publisher:"Delft University of Technology" +contributor:("Celemin Paez, Carlos"). Showing records 1 – 3 of 3 total matches.

No search limiters apply to these results.


Delft University of Technology

1. Scholten, Jan (author). Deep Reinforcement Learning with Feedback-based Exploration.

Degree: 2019, Delft University of Technology

Deep Reinforcement Learning enables us to control increasingly complex and high-dimensional problems. Modelling and control design is no longer required, which paves the way to numerous innovations, such as optimal control of ever more sophisticated robotic systems, fast and efficient scheduling and logistics, effective personal drug dosing schemes that minimise complications, as well as applications not yet conceived. Yet this potential is obstructed by the need for vast amounts of data. Without it, deep Reinforcement Learning (RL) cannot work. If we want to advance RL research and its applications, a primary concern is to improve this sample efficiency. Otherwise, all potential is restricted to settings where interaction is abundant, whilst this is seldom the case in real-world scenarios. In this thesis we study binary corrective feedback as a general and intuitive manner to incorporate human intuition and domain knowledge in model-free machine learning. In accordance with our conclusions drawn from the literature, we present two algorithms, namely Probabilistic Merging of Policies (PMP) and its extension Predictive PMP (PPMP). Both methods estimate the abilities of their inbuilt Reinforcement Learning (RL) entity by computing the covariance over multiple output heads of the actor network. Subsequently, the corrections are quantified by comparing the uncertainty in what is learned with the inaccuracy of the given feedback. The resulting new action estimates are immediately applied as probabilistic conditional exploration. The first algorithm is a surprisingly clean and straightforward way to accelerate an off-policy RL baseline, and it also improves on existing work that learns from corrections only. Its extension, Predictive Probabilistic Merging of Policies (PPMP), additionally predicts the corrected samples. This gives the most substantial improvements, whilst the required feedback is further reduced. We demonstrate our algorithms in combination with Deep Deterministic Policy Gradient (DDPG) on continuous control problems from the OpenAI Gym. We show that the greatest part of the otherwise ignorant learning process is indeed evaded. Moreover, we achieve drastic improvements in final performance, robustness to erroneous feedback, and feedback efficiency, both for simulated and real human feedback, and show that our method is able to outperform the demonstrator.
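The merging step the abstract describes can be pictured as ordinary Gaussian fusion of two action estimates: the ensemble of actor heads supplies a mean and a variance, and the human correction is treated as a second, noisy observation. The sketch below is not code from the thesis; the ±1 per-dimension feedback encoding, the array shapes, and the fixed feedback_std are illustrative assumptions.

```python
import numpy as np

def merge_policy_and_feedback(head_actions, feedback_direction, feedback_std=0.1):
    """Gaussian fusion of an ensemble policy estimate with a binary human
    correction (illustrative sketch; shapes and noise model are assumptions).

    head_actions:       (K, A) actions proposed by K actor output heads.
    feedback_direction: (A,) corrective signals in {-1, 0, +1}.
    feedback_std:       assumed inaccuracy of the human correction.
    """
    mu = head_actions.mean(axis=0)           # ensemble mean action, shape (A,)
    var = head_actions.var(axis=0) + 1e-8    # per-dimension policy uncertainty

    # Interpret the binary correction as a noisy observation shifted one
    # policy standard deviation away from the current mean.
    observed = mu + feedback_direction * np.sqrt(var)

    # Inverse-variance weighting: trust the human more where the policy is
    # uncertain, and the policy more where it is confident.
    w = var / (var + feedback_std ** 2)
    merged_mu = (1.0 - w) * mu + w * observed
    merged_var = (var * feedback_std ** 2) / (var + feedback_std ** 2)

    # Sample the executed action: probabilistic, feedback-conditioned exploration.
    return np.random.normal(merged_mu, np.sqrt(merged_var))
```

Where no feedback is given (direction 0), the observed action equals the mean, so the fused mean is unchanged; a full implementation would also leave the variance of those dimensions untouched rather than shrinking it as this sketch does.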

Systems and Control

Advisors/Committee Members: Kober, Jens (mentor), Celemin Paez, Carlos (graduation committee), Delft University of Technology (degree granting institution).

Subjects/Keywords: Reinforcement Learning; Artificial Intelligence; Machine Learning; Interactive Learning



APA (6th Edition):

Scholten, J. (2019). Deep Reinforcement Learning with Feedback-based Exploration. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:acb6ea20-5e74-457e-84cd-56e33dd72979

Chicago Manual of Style (16th Edition):

Scholten, Jan. “Deep Reinforcement Learning with Feedback-based Exploration.” 2019. Masters Thesis, Delft University of Technology. Accessed January 26, 2021. http://resolver.tudelft.nl/uuid:acb6ea20-5e74-457e-84cd-56e33dd72979.

MLA Handbook (7th Edition):

Scholten, Jan. “Deep Reinforcement Learning with Feedback-based Exploration.” 2019. Web. 26 Jan 2021.

Vancouver:

Scholten J. Deep Reinforcement Learning with Feedback-based Exploration. [Internet] [Masters thesis]. Delft University of Technology; 2019. [cited 2021 Jan 26]. Available from: http://resolver.tudelft.nl/uuid:acb6ea20-5e74-457e-84cd-56e33dd72979.

Council of Science Editors:

Scholten J. Deep Reinforcement Learning with Feedback-based Exploration. [Masters Thesis]. Delft University of Technology; 2019. Available from: http://resolver.tudelft.nl/uuid:acb6ea20-5e74-457e-84cd-56e33dd72979


Delft University of Technology

2. Wout, Daan (author). Policy Learning with Human Teachers: Using directive feedback in a Gaussian framework.

Degree: 2019, Delft University of Technology

A prevalent approach for learning a control policy in the model-free domain is Reinforcement Learning (RL). A well-known disadvantage of RL is the necessity for extensive amounts of data to obtain a suitable control policy. For systems that concern physical application, acquiring this vast amount of data might take an extraordinary amount of time. In contrast, humans have shown to be very efficient at finding a suitable control policy for reference tracking problems. Employing this intuitive knowledge has proven to render model-free learning strategies suitable for physical applications. Recent studies have shown that learning a policy from directive action corrections is a very efficient approach to employing this domain knowledge. Moreover, feedback-based methods do not necessarily require expert knowledge on modelling and control and are therefore more generally applicable. The current state of the art regarding directive feedback was introduced by Celemin and Ruiz-del-Solar (2015) and coined COrrective Advice Communicated by Humans (COACH). In this framework the trainer is able to correct the observed actions by providing directive advice for iterative policy updates. However, COACH employs Radial Basis Function (RBF) networks, which limit its applicability to higher-dimensional problems due to an infeasible tuning process. This study introduces Gaussian Process Coach (GPC), an algorithm preserving COACH's structure but introducing Gaussian Processes (GPs) as an alternative to RBF networks. Moreover, the employment of GPs allows for uncertainty estimation of the policy, which is used to 1) inquire high-informative feedback samples in an Active Learning (AL) framework, 2) introduce an Adaptive Learning Rate (ALR) that adapts the learning rate to the coarse- or refinement-focused learning phase of the trainer, and 3) establish a novel sparsification technique that is specifically designed for iterative GP policy updates. We show, by employing synthesized and human teachers, that the novel algorithm outperforms COACH on every domain tested, with the most outspoken difference on higher-dimensional problems. Furthermore, we show the independent contributions of AL and ALR.
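The role the abstract assigns to the GP's predictive uncertainty can be sketched in a few lines: it decides when to query the trainer (active learning) and how far to move the policy per correction (adaptive learning rate). This is a minimal illustration using scikit-learn, not the thesis implementation; the class name, the query threshold, and the "one standard deviation per correction" update rule are all assumptions, and the thesis' sparsification step is omitted.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

class GPCoachSketch:
    """Minimal sketch of a GP policy whose predictive uncertainty gates
    feedback queries and scales the update size (hypothetical names/values)."""

    def __init__(self, query_threshold=0.2):
        self.gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
        self.X, self.y = [], []
        self.query_threshold = query_threshold

    def act(self, state):
        """Return the policy's action and its predictive uncertainty."""
        if not self.X:                       # no data yet: maximal uncertainty
            return 0.0, 1.0
        mu, std = self.gp.predict(np.atleast_2d(state), return_std=True)
        return float(mu[0]), float(std[0])

    def step(self, state, human):
        action, std = self.act(state)
        # Active learning: only ask for advice where the policy is uncertain.
        if std > self.query_threshold:
            direction = human(state, action)  # directive advice in {-1, +1}
            lr = std                          # adaptive learning rate: large
                                              # steps while learning is coarse
            self.X.append(np.atleast_1d(state))
            self.y.append(action + lr * direction)
            self.gp.fit(np.array(self.X), np.array(self.y))
        return action
```

As the policy improves, std shrinks below the threshold, so queries (and the trainer's workload) taper off on their own, which matches the feedback-efficiency argument made above.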

Systems and Control

Advisors/Committee Members: Kober, Jens (mentor), Celemin Paez, Carlos (mentor), Gavrila, Dariu (graduation committee), Delft University of Technology (degree granting institution).

Subjects/Keywords: Machine Learning; Interactive Learning; Gaussian Process; Regression; Feedback



APA (6th Edition):

Wout, D. (2019). Policy Learning with Human Teachers: Using directive feedback in a Gaussian framework. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:d6cff61f-8e74-4714-b713-f127c1392b7a

Chicago Manual of Style (16th Edition):

Wout, Daan. “Policy Learning with Human Teachers: Using directive feedback in a Gaussian framework.” 2019. Masters Thesis, Delft University of Technology. Accessed January 26, 2021. http://resolver.tudelft.nl/uuid:d6cff61f-8e74-4714-b713-f127c1392b7a.

MLA Handbook (7th Edition):

Wout, Daan. “Policy Learning with Human Teachers: Using directive feedback in a Gaussian framework.” 2019. Web. 26 Jan 2021.

Vancouver:

Wout D. Policy Learning with Human Teachers: Using directive feedback in a Gaussian framework. [Internet] [Masters thesis]. Delft University of Technology; 2019. [cited 2021 Jan 26]. Available from: http://resolver.tudelft.nl/uuid:d6cff61f-8e74-4714-b713-f127c1392b7a.

Council of Science Editors:

Wout D. Policy Learning with Human Teachers: Using directive feedback in a Gaussian framework. [Masters Thesis]. Delft University of Technology; 2019. Available from: http://resolver.tudelft.nl/uuid:d6cff61f-8e74-4714-b713-f127c1392b7a


Delft University of Technology

3. Khattar, Varun (author). Adaptation of a non-linear controller based on Reinforcement Learning.

Degree: 2018, Delft University of Technology

Closed-loop control systems, which use output signals as feedback to generate control inputs, can achieve high performance. However, the robustness of feedback control loops can be lost if system changes and uncertainties are too large. Adaptive control combines the traditional feedback structure with adaptation mechanisms that adjust the controller for a system with parameter uncertainties by using performance-error information online. Reinforcement Learning (RL) is one of the many methods that can be used for adaptive control. The aim of this thesis is to adapt a non-linear Anti-lock Braking System (ABS) controller of a passenger car, obtained as a simplified symbolic approximation of the solution to the Bellman equation, to model-plant mismatches and process variations. Results for adaptation to dry and wet asphalt have been obtained successfully and compared with hand-tuned and adaptive proportional-integral (PI) controllers.
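The abstract names the nominal controller (a symbolic approximation of the Bellman solution) but not the adaptation rule itself, so the sketch below shows one generic possibility only: episodic, derivative-free gradient descent on a closed-loop cost to retune the controller's parameters after a plant change such as a new road surface. run_episode, the step sizes, and the cost are hypothetical.

```python
import numpy as np

def adapt_gains(run_episode, theta, sigma=0.05, alpha=0.02, iterations=100):
    """Derivative-free RL adaptation of fixed-structure controller gains.

    run_episode(theta) is a hypothetical closed-loop simulation of the
    braking manoeuvre returning an episodic cost (e.g. integrated
    wheel-slip tracking error); theta holds the tunable parameters.
    """
    for _ in range(iterations):
        eps = sigma * np.random.randn(*theta.shape)
        # Symmetric perturbations give a Gaussian-smoothed finite-difference
        # estimate of the gradient of the episodic cost.
        grad = (run_episode(theta + eps) - run_episode(theta - eps)) \
               / (2.0 * sigma ** 2) * eps
        theta = theta - alpha * grad  # descend the cost: adapt to the new surface
    return theta
```

Because the update needs only episode costs, not plant derivatives, the same loop can retune the controller for dry or wet asphalt without re-deriving the underlying model.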

Mechanical Engineering | Vehicle Engineering | Dynamics and Controls

Advisors/Committee Members: Babuska, Robert (mentor), Shyrokau, Barys (graduation committee), Celemin Paez, Carlos (graduation committee), Delft University of Technology (degree granting institution).

Subjects/Keywords: Reinforcement Learning; Adaptive control; Anti-lock braking system



APA (6th Edition):

Khattar, V. (2018). Adaptation of a non-linear controller based on Reinforcement Learning. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:e7ee0f2c-91c4-40d7-bd90-4b5ae61dd54f

Chicago Manual of Style (16th Edition):

Khattar, Varun. “Adaptation of a non-linear controller based on Reinforcement Learning.” 2018. Masters Thesis, Delft University of Technology. Accessed January 26, 2021. http://resolver.tudelft.nl/uuid:e7ee0f2c-91c4-40d7-bd90-4b5ae61dd54f.

MLA Handbook (7th Edition):

Khattar, Varun. “Adaptation of a non-linear controller based on Reinforcement Learning.” 2018. Web. 26 Jan 2021.

Vancouver:

Khattar V. Adaptation of a non-linear controller based on Reinforcement Learning. [Internet] [Masters thesis]. Delft University of Technology; 2018. [cited 2021 Jan 26]. Available from: http://resolver.tudelft.nl/uuid:e7ee0f2c-91c4-40d7-bd90-4b5ae61dd54f.

Council of Science Editors:

Khattar V. Adaptation of a non-linear controller based on Reinforcement Learning. [Masters Thesis]. Delft University of Technology; 2018. Available from: http://resolver.tudelft.nl/uuid:e7ee0f2c-91c4-40d7-bd90-4b5ae61dd54f
