
You searched for subject:(Policy gradient). Showing records 1 – 25 of 25 total matches.


No search limiters apply to these results.



University of Alberta

1. Dick, Travis B. Policy Gradient Reinforcement Learning Without Regret.

Degree: MS, Department of Computing Science, 2015, University of Alberta

 This thesis consists of two independent projects, each contributing to a central goal of artificial intelligence research: to build computer systems that are capable of… (more)

Subjects/Keywords: Policy Gradient; Baseline; Reinforcement Learning


APA (6th Edition):

Dick, T. B. (2015). Policy Gradient Reinforcement Learning Without Regret. (Masters Thesis). University of Alberta. Retrieved from https://era.library.ualberta.ca/files/df65vb663

Chicago Manual of Style (16th Edition):

Dick, Travis B. “Policy Gradient Reinforcement Learning Without Regret.” 2015. Masters Thesis, University of Alberta. Accessed January 24, 2021. https://era.library.ualberta.ca/files/df65vb663.

MLA Handbook (7th Edition):

Dick, Travis B. “Policy Gradient Reinforcement Learning Without Regret.” 2015. Web. 24 Jan 2021.

Vancouver:

Dick TB. Policy Gradient Reinforcement Learning Without Regret. [Internet] [Masters thesis]. University of Alberta; 2015. [cited 2021 Jan 24]. Available from: https://era.library.ualberta.ca/files/df65vb663.

Council of Science Editors:

Dick TB. Policy Gradient Reinforcement Learning Without Regret. [Masters Thesis]. University of Alberta; 2015. Available from: https://era.library.ualberta.ca/files/df65vb663
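The keywords attached to this record ("Policy Gradient; Baseline") refer to the standard REINFORCE-style update with a baseline. The following minimal sketch illustrates that update on a toy two-armed bandit; it is not code from the thesis, and the reward means, step sizes, and environment are placeholder assumptions.

```python
# Illustrative sketch only: REINFORCE with a running-average baseline on a
# two-armed bandit (not code from the thesis above).
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])    # assumed expected reward of each arm
theta = np.zeros(2)                  # softmax policy parameters (one logit per arm)
baseline, alpha, beta = 0.0, 0.1, 0.05

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)                     # sample an action from the policy
    r = rng.normal(true_means[a], 0.1)             # sample a noisy reward
    grad_log_pi = -probs                           # gradient of log pi(a | theta)
    grad_log_pi[a] += 1.0
    theta += alpha * (r - baseline) * grad_log_pi  # REINFORCE update with baseline
    baseline += beta * (r - baseline)              # running-average baseline

print("learned action probabilities:", softmax(theta))
```

Subtracting the running-average baseline leaves the gradient estimate unbiased while typically reducing its variance, which is the role the "Baseline" keyword points to.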


Arizona State University

2. Barron, Trevor Paul. Adaptive Curvature for Stochastic Optimization.

Degree: Computer Science, 2019, Arizona State University

 This thesis presents a family of adaptive curvature methods for gradient-based stochastic optimization. In particular, a general algorithmic framework is introduced along with a practical… (more)

Subjects/Keywords: Statistics; Robotics; Natural gradient descent; Policy gradient methods; Truncated Newton methods


APA (6th Edition):

Barron, T. P. (2019). Adaptive Curvature for Stochastic Optimization. (Masters Thesis). Arizona State University. Retrieved from http://repository.asu.edu/items/53675

Chicago Manual of Style (16th Edition):

Barron, Trevor Paul. “Adaptive Curvature for Stochastic Optimization.” 2019. Masters Thesis, Arizona State University. Accessed January 24, 2021. http://repository.asu.edu/items/53675.

MLA Handbook (7th Edition):

Barron, Trevor Paul. “Adaptive Curvature for Stochastic Optimization.” 2019. Web. 24 Jan 2021.

Vancouver:

Barron TP. Adaptive Curvature for Stochastic Optimization. [Internet] [Masters thesis]. Arizona State University; 2019. [cited 2021 Jan 24]. Available from: http://repository.asu.edu/items/53675.

Council of Science Editors:

Barron TP. Adaptive Curvature for Stochastic Optimization. [Masters Thesis]. Arizona State University; 2019. Available from: http://repository.asu.edu/items/53675


University of Alberta

3. Hackman, Leah M. Faster Gradient-TD Algorithms.

Degree: MS, Department of Computing Science, 2012, University of Alberta

Gradient-TD methods are a new family of learning algorithms that are stable and convergent under a wider range of conditions than previous reinforcement learning algorithms.… (more)

Subjects/Keywords: Artificial; Off-Policy; Gradient-TD; Intelligence; Reinforcement; Hybrid-GQ; Learning


APA (6th Edition):

Hackman, L. M. (2012). Faster Gradient-TD Algorithms. (Masters Thesis). University of Alberta. Retrieved from https://era.library.ualberta.ca/files/6w924d09v

Chicago Manual of Style (16th Edition):

Hackman, Leah M. “Faster Gradient-TD Algorithms.” 2012. Masters Thesis, University of Alberta. Accessed January 24, 2021. https://era.library.ualberta.ca/files/6w924d09v.

MLA Handbook (7th Edition):

Hackman, Leah M. “Faster Gradient-TD Algorithms.” 2012. Web. 24 Jan 2021.

Vancouver:

Hackman LM. Faster Gradient-TD Algorithms. [Internet] [Masters thesis]. University of Alberta; 2012. [cited 2021 Jan 24]. Available from: https://era.library.ualberta.ca/files/6w924d09v.

Council of Science Editors:

Hackman LM. Faster Gradient-TD Algorithms. [Masters Thesis]. University of Alberta; 2012. Available from: https://era.library.ualberta.ca/files/6w924d09v


University of Alberta

4. Das Gupta, Ujjwal. Adaptive Representation for Policy Gradient.

Degree: MS, Department of Computing Science, 2015, University of Alberta

 Much of the focus on finding good representations in reinforcement learning has been on learning complex non-linear predictors of value. Methods like policy gradient, that… (more)

Subjects/Keywords: Representation Learning; Decision Trees; Policy Gradient; Reinforcement Learning


APA (6th Edition):

Das Gupta, U. (2015). Adaptive Representation for Policy Gradient. (Masters Thesis). University of Alberta. Retrieved from https://era.library.ualberta.ca/files/zk51vk289

Chicago Manual of Style (16th Edition):

Das Gupta, Ujjwal. “Adaptive Representation for Policy Gradient.” 2015. Masters Thesis, University of Alberta. Accessed January 24, 2021. https://era.library.ualberta.ca/files/zk51vk289.

MLA Handbook (7th Edition):

Das Gupta, Ujjwal. “Adaptive Representation for Policy Gradient.” 2015. Web. 24 Jan 2021.

Vancouver:

Das Gupta U. Adaptive Representation for Policy Gradient. [Internet] [Masters thesis]. University of Alberta; 2015. [cited 2021 Jan 24]. Available from: https://era.library.ualberta.ca/files/zk51vk289.

Council of Science Editors:

Das Gupta U. Adaptive Representation for Policy Gradient. [Masters Thesis]. University of Alberta; 2015. Available from: https://era.library.ualberta.ca/files/zk51vk289


Delft University of Technology

5. Goedhart, Menno (author). Intelligent Flapping Wing Control: Reinforcement Learning for the DelFly.

Degree: 2017, Delft University of Technology

 Flight control of the DelFly is challenging, because of its complex dynamics and variability due to manufacturing inconsistencies. Machine Learning algorithms can be used to… (more)

Subjects/Keywords: Reinforcement Learning; DelFly; Flapping Wing; Classification; Machine Learning; Policy Gradient


APA (6th Edition):

Goedhart, M. (2017). Intelligent Flapping Wing Control: Reinforcement Learning for the DelFly. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:0f103559-d985-47b7-b145-fc814527f307

Chicago Manual of Style (16th Edition):

Goedhart, Menno (author). “Intelligent Flapping Wing Control: Reinforcement Learning for the DelFly.” 2017. Masters Thesis, Delft University of Technology. Accessed January 24, 2021. http://resolver.tudelft.nl/uuid:0f103559-d985-47b7-b145-fc814527f307.

MLA Handbook (7th Edition):

Goedhart, Menno (author). “Intelligent Flapping Wing Control: Reinforcement Learning for the DelFly.” 2017. Web. 24 Jan 2021.

Vancouver:

Goedhart M. Intelligent Flapping Wing Control: Reinforcement Learning for the DelFly. [Internet] [Masters thesis]. Delft University of Technology; 2017. [cited 2021 Jan 24]. Available from: http://resolver.tudelft.nl/uuid:0f103559-d985-47b7-b145-fc814527f307.

Council of Science Editors:

Goedhart M. Intelligent Flapping Wing Control: Reinforcement Learning for the DelFly. [Masters Thesis]. Delft University of Technology; 2017. Available from: http://resolver.tudelft.nl/uuid:0f103559-d985-47b7-b145-fc814527f307

6. Poulin, Nolan. Proactive Planning through Active Policy Inference in Stochastic Environments.

Degree: MS, 2018, Worcester Polytechnic Institute

  In multi-agent Markov Decision Processes, a controllable agent must perform optimal planning in a dynamic and uncertain environment that includes another unknown and uncontrollable… (more)

Subjects/Keywords: active learning; Markov decision process; softmax; Boltzmann; policy gradient


APA (6th Edition):

Poulin, N. (2018). Proactive Planning through Active Policy Inference in Stochastic Environments. (Thesis). Worcester Polytechnic Institute. Retrieved from etd-052818-100711 ; https://digitalcommons.wpi.edu/etd-theses/1267

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Chicago Manual of Style (16th Edition):

Poulin, Nolan. “Proactive Planning through Active Policy Inference in Stochastic Environments.” 2018. Thesis, Worcester Polytechnic Institute. Accessed January 24, 2021. etd-052818-100711 ; https://digitalcommons.wpi.edu/etd-theses/1267.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

MLA Handbook (7th Edition):

Poulin, Nolan. “Proactive Planning through Active Policy Inference in Stochastic Environments.” 2018. Web. 24 Jan 2021.

Vancouver:

Poulin N. Proactive Planning through Active Policy Inference in Stochastic Environments. [Internet] [Thesis]. Worcester Polytechnic Institute; 2018. [cited 2021 Jan 24]. Available from: etd-052818-100711 ; https://digitalcommons.wpi.edu/etd-theses/1267.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Council of Science Editors:

Poulin N. Proactive Planning through Active Policy Inference in Stochastic Environments. [Thesis]. Worcester Polytechnic Institute; 2018. Available from: etd-052818-100711 ; https://digitalcommons.wpi.edu/etd-theses/1267

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

7. Könönen, Ville. Multiagent Reinforcement Learning in Markov Games: Asymmetric and Symmetric Approaches.

Degree: 2004, Helsinki University of Technology

Modern computing systems are distributed, large, and heterogeneous. Computers, other information processing devices and humans are very tightly connected with each other and therefore it… (more)

Subjects/Keywords: Markov games; reinforcement learning; Nash equilibrium; Stackelberg equilibrium; value function approximation; policy gradient


APA (6th Edition):

Könönen, V. (2004). Multiagent Reinforcement Learning in Markov Games: Asymmetric and Symmetric Approaches. (Thesis). Helsinki University of Technology. Retrieved from http://lib.tkk.fi/Diss/2004/isbn9512273594/

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Chicago Manual of Style (16th Edition):

Könönen, Ville. “Multiagent Reinforcement Learning in Markov Games: Asymmetric and Symmetric Approaches.” 2004. Thesis, Helsinki University of Technology. Accessed January 24, 2021. http://lib.tkk.fi/Diss/2004/isbn9512273594/.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

MLA Handbook (7th Edition):

Könönen, Ville. “Multiagent Reinforcement Learning in Markov Games: Asymmetric and Symmetric Approaches.” 2004. Web. 24 Jan 2021.

Vancouver:

Könönen V. Multiagent Reinforcement Learning in Markov Games: Asymmetric and Symmetric Approaches. [Internet] [Thesis]. Helsinki University of Technology; 2004. [cited 2021 Jan 24]. Available from: http://lib.tkk.fi/Diss/2004/isbn9512273594/.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Council of Science Editors:

Könönen V. Multiagent Reinforcement Learning in Markov Games: Asymmetric and Symmetric Approaches. [Thesis]. Helsinki University of Technology; 2004. Available from: http://lib.tkk.fi/Diss/2004/isbn9512273594/

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation


University of Alberta

8. Maei, Hamid Reza. Gradient Temporal-Difference Learning Algorithms.

Degree: PhD, Department of Computing Science, 2011, University of Alberta

 We present a new family of gradient temporal-difference (TD) learning methods with function approximation whose complexity, both in terms of memory and per-time-step computation, scales… (more)

Subjects/Keywords: Policy Evaluation; Temporal-Difference learning; Reinforcement Learning; Stochastic Gradient-Descent; Value Function Approximation


APA (6th Edition):

Maei, H. R. (2011). Gradient Temporal-Difference Learning Algorithms. (Doctoral Dissertation). University of Alberta. Retrieved from https://era.library.ualberta.ca/files/8s45q967t

Chicago Manual of Style (16th Edition):

Maei, Hamid Reza. “Gradient Temporal-Difference Learning Algorithms.” 2011. Doctoral Dissertation, University of Alberta. Accessed January 24, 2021. https://era.library.ualberta.ca/files/8s45q967t.

MLA Handbook (7th Edition):

Maei, Hamid Reza. “Gradient Temporal-Difference Learning Algorithms.” 2011. Web. 24 Jan 2021.

Vancouver:

Maei HR. Gradient Temporal-Difference Learning Algorithms. [Internet] [Doctoral dissertation]. University of Alberta; 2011. [cited 2021 Jan 24]. Available from: https://era.library.ualberta.ca/files/8s45q967t.

Council of Science Editors:

Maei HR. Gradient Temporal-Difference Learning Algorithms. [Doctoral Dissertation]. University of Alberta; 2011. Available from: https://era.library.ualberta.ca/files/8s45q967t
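As illustrative context for the gradient-TD family this dissertation concerns, the sketch below shows one TDC ("TD with gradient correction") update for linear off-policy policy evaluation. It is a toy rendering under assumed step sizes and random features, not the author's algorithms or code.

```python
# Illustrative sketch: one linear TDC update (a member of the gradient-TD family).
import numpy as np

def tdc_step(theta, w, phi, phi_next, reward, rho, gamma=0.99, alpha=0.01, beta=0.1):
    """theta: value weights, w: secondary weights, rho: importance-sampling ratio."""
    delta = reward + gamma * theta @ phi_next - theta @ phi       # TD error
    theta = theta + alpha * rho * (delta * phi - gamma * (w @ phi) * phi_next)
    w = w + beta * rho * (delta - w @ phi) * phi                  # secondary weight update
    return theta, w

# toy usage with random features and rewards (purely hypothetical values)
rng = np.random.default_rng(1)
theta, w = np.zeros(4), np.zeros(4)
for _ in range(1000):
    phi, phi_next = rng.random(4), rng.random(4)
    theta, w = tdc_step(theta, w, phi, phi_next, reward=rng.random(), rho=1.0)
```

Each update costs memory and computation linear in the number of features, which is the complexity property highlighted in the abstract above.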


University of Washington

9. Deole, Aditya. Model Free Optimal Control Approach for UAVs.

Degree: 2020, University of Washington

This thesis discusses the use of model-free control algorithms for application on a UAV. The work contrasts the use of LQR-based methods to conventional model… (more)

Subjects/Keywords: LQR; Model-free; Policy gradient; Q-learning; UAV; Mechanical engineering; Mechanical engineering


APA (6th Edition):

Deole, A. (2020). Model Free Optimal Control Approach for UAVs. (Thesis). University of Washington. Retrieved from http://hdl.handle.net/1773/46114

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Chicago Manual of Style (16th Edition):

Deole, Aditya. “Model Free Optimal Control Approach for UAVs.” 2020. Thesis, University of Washington. Accessed January 24, 2021. http://hdl.handle.net/1773/46114.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

MLA Handbook (7th Edition):

Deole, Aditya. “Model Free Optimal Control Approach for UAVs.” 2020. Web. 24 Jan 2021.

Vancouver:

Deole A. Model Free Optimal Control Approach for UAVs. [Internet] [Thesis]. University of Washington; 2020. [cited 2021 Jan 24]. Available from: http://hdl.handle.net/1773/46114.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Council of Science Editors:

Deole A. Model Free Optimal Control Approach for UAVs. [Thesis]. University of Washington; 2020. Available from: http://hdl.handle.net/1773/46114

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation


University of North Texas

10. Cox, Carissa. Spatial Patterns in Development Regulation: Tree Preservation Ordinances of the DFW Metropolitan Area.

Degree: 2011, University of North Texas

 Land use regulations are typically established as a response to development activity. For effective growth management and habitat preservation, the opposite should occur. This study… (more)

Subjects/Keywords: Tree preservation; urban rural gradient; hot spot rendering; preservation policy; DFW; ordinances; internal evaluation


APA (6th Edition):

Cox, C. (2011). Spatial Patterns in Development Regulation: Tree Preservation Ordinances of the DFW Metropolitan Area. (Thesis). University of North Texas. Retrieved from https://digital.library.unt.edu/ark:/67531/metadc84194/

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Chicago Manual of Style (16th Edition):

Cox, Carissa. “Spatial Patterns in Development Regulation: Tree Preservation Ordinances of the DFW Metropolitan Area.” 2011. Thesis, University of North Texas. Accessed January 24, 2021. https://digital.library.unt.edu/ark:/67531/metadc84194/.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

MLA Handbook (7th Edition):

Cox, Carissa. “Spatial Patterns in Development Regulation: Tree Preservation Ordinances of the DFW Metropolitan Area.” 2011. Web. 24 Jan 2021.

Vancouver:

Cox C. Spatial Patterns in Development Regulation: Tree Preservation Ordinances of the DFW Metropolitan Area. [Internet] [Thesis]. University of North Texas; 2011. [cited 2021 Jan 24]. Available from: https://digital.library.unt.edu/ark:/67531/metadc84194/.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Council of Science Editors:

Cox C. Spatial Patterns in Development Regulation: Tree Preservation Ordinances of the DFW Metropolitan Area. [Thesis]. University of North Texas; 2011. Available from: https://digital.library.unt.edu/ark:/67531/metadc84194/

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation


KTH

11. Olafsson, Björgvin. Partially Observable Markov Decision Processes for Faster Object Recognition.

Degree: Computer Science and Communication (CSC), 2016, KTH

  Object recognition in the real world is a big challenge in the field of computer vision. Given the potentially enormous size of the search… (more)

Subjects/Keywords: pomdp; policy gradient; optimal control; object detection; computer vision; information rewards; fixation policy; observation model; Computer Sciences; Datavetenskap (datalogi)


APA (6th Edition):

Olafsson, B. (2016). Partially Observable Markov Decision Processes for Faster Object Recognition. (Thesis). KTH. Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-198632

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Chicago Manual of Style (16th Edition):

Olafsson, Björgvin. “Partially Observable Markov Decision Processes for Faster Object Recognition.” 2016. Thesis, KTH. Accessed January 24, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-198632.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

MLA Handbook (7th Edition):

Olafsson, Björgvin. “Partially Observable Markov Decision Processes for Faster Object Recognition.” 2016. Web. 24 Jan 2021.

Vancouver:

Olafsson B. Partially Observable Markov Decision Processes for Faster Object Recognition. [Internet] [Thesis]. KTH; 2016. [cited 2021 Jan 24]. Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-198632.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Council of Science Editors:

Olafsson B. Partially Observable Markov Decision Processes for Faster Object Recognition. [Thesis]. KTH; 2016. Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-198632

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation


University of Alberta

12. Bastani, Meysam. Model-Free Intelligent Diabetes Management Using Machine Learning.

Degree: MS, Department of Computing Science, 2013, University of Alberta

 Each patient with Type-1 diabetes must decide how much insulin to inject before each meal to maintain an acceptable level of blood glucose. The actual… (more)

Subjects/Keywords: reinforcement learning; policy gradient; diabetes; insulin dosage adjustment; supervised learning; machine learning; type-1 diabetes; actor-critic


APA (6th Edition):

Bastani, M. (2013). Model-Free Intelligent Diabetes Management Using Machine Learning. (Masters Thesis). University of Alberta. Retrieved from https://era.library.ualberta.ca/files/c12579s341

Chicago Manual of Style (16th Edition):

Bastani, Meysam. “Model-Free Intelligent Diabetes Management Using Machine Learning.” 2013. Masters Thesis, University of Alberta. Accessed January 24, 2021. https://era.library.ualberta.ca/files/c12579s341.

MLA Handbook (7th Edition):

Bastani, Meysam. “Model-Free Intelligent Diabetes Management Using Machine Learning.” 2013. Web. 24 Jan 2021.

Vancouver:

Bastani M. Model-Free Intelligent Diabetes Management Using Machine Learning. [Internet] [Masters thesis]. University of Alberta; 2013. [cited 2021 Jan 24]. Available from: https://era.library.ualberta.ca/files/c12579s341.

Council of Science Editors:

Bastani M. Model-Free Intelligent Diabetes Management Using Machine Learning. [Masters Thesis]. University of Alberta; 2013. Available from: https://era.library.ualberta.ca/files/c12579s341


Halmstad University

13. Olsson, Anton. Domain Transfer for End-to-end Reinforcement Learning.

Degree: Information Technology, 2020, Halmstad University

In this master's thesis project, a LiDAR-based, depth image-based, and semantic segmentation image-based reinforcement learning agent is investigated and compared for learning in simulation and… (more)

Subjects/Keywords: Reinforcement Learning; Domain Transfer; Deep Deterministic Policy Gradient; Reinforcement Learning in Real-time; Computer Sciences; Datavetenskap (datalogi); Computer Engineering; Datorteknik


APA (6th Edition):

Olsson, A. (2020). Domain Transfer for End-to-end Reinforcement Learning. (Thesis). Halmstad University. Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-43042

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Chicago Manual of Style (16th Edition):

Olsson, Anton. “Domain Transfer for End-to-end Reinforcement Learning.” 2020. Thesis, Halmstad University. Accessed January 24, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-43042.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

MLA Handbook (7th Edition):

Olsson, Anton. “Domain Transfer for End-to-end Reinforcement Learning.” 2020. Web. 24 Jan 2021.

Vancouver:

Olsson A. Domain Transfer for End-to-end Reinforcement Learning. [Internet] [Thesis]. Halmstad University; 2020. [cited 2021 Jan 24]. Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-43042.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Council of Science Editors:

Olsson A. Domain Transfer for End-to-end Reinforcement Learning. [Thesis]. Halmstad University; 2020. Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-43042

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation


University of Waterloo

14. Pereira, Sahil. Stackelberg Multi-Agent Reinforcement Learning for Hierarchical Environments.

Degree: 2020, University of Waterloo

 This thesis explores the application of multi-agent reinforcement learning in domains containing asymmetries between agents, caused by differences in information and position, resulting in a… (more)

Subjects/Keywords: reinforcement learning; multi-agent; stackelberg model; hierarchical environments; game theory; machine learning; continuous space; policy gradient; markov games; actor critic


APA (6th Edition):

Pereira, S. (2020). Stackelberg Multi-Agent Reinforcement Learning for Hierarchical Environments. (Thesis). University of Waterloo. Retrieved from http://hdl.handle.net/10012/15851

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Chicago Manual of Style (16th Edition):

Pereira, Sahil. “Stackelberg Multi-Agent Reinforcement Learning for Hierarchical Environments.” 2020. Thesis, University of Waterloo. Accessed January 24, 2021. http://hdl.handle.net/10012/15851.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

MLA Handbook (7th Edition):

Pereira, Sahil. “Stackelberg Multi-Agent Reinforcement Learning for Hierarchical Environments.” 2020. Web. 24 Jan 2021.

Vancouver:

Pereira S. Stackelberg Multi-Agent Reinforcement Learning for Hierarchical Environments. [Internet] [Thesis]. University of Waterloo; 2020. [cited 2021 Jan 24]. Available from: http://hdl.handle.net/10012/15851.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Council of Science Editors:

Pereira S. Stackelberg Multi-Agent Reinforcement Learning for Hierarchical Environments. [Thesis]. University of Waterloo; 2020. Available from: http://hdl.handle.net/10012/15851

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation


Australian National University

15. Aberdeen, Douglas. Policy-Gradient Algorithms for Partially Observable Markov Decision Processes .

Degree: 2003, Australian National University

 Partially observable Markov decision processes are interesting because of their ability to model most conceivable real-world learning problems, for example, robot navigation, driving a car,… (more)

Subjects/Keywords: POMDP; Reinforcement Learning; Policy gradient; cluster; high performance computing


APA (6th Edition):

Aberdeen, D. (2003). Policy-Gradient Algorithms for Partially Observable Markov Decision Processes . (Thesis). Australian National University. Retrieved from http://hdl.handle.net/1885/48180

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Chicago Manual of Style (16th Edition):

Aberdeen, Douglas. “Policy-Gradient Algorithms for Partially Observable Markov Decision Processes .” 2003. Thesis, Australian National University. Accessed January 24, 2021. http://hdl.handle.net/1885/48180.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

MLA Handbook (7th Edition):

Aberdeen, Douglas. “Policy-Gradient Algorithms for Partially Observable Markov Decision Processes .” 2003. Web. 24 Jan 2021.

Vancouver:

Aberdeen D. Policy-Gradient Algorithms for Partially Observable Markov Decision Processes . [Internet] [Thesis]. Australian National University; 2003. [cited 2021 Jan 24]. Available from: http://hdl.handle.net/1885/48180.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Council of Science Editors:

Aberdeen D. Policy-Gradient Algorithms for Partially Observable Markov Decision Processes . [Thesis]. Australian National University; 2003. Available from: http://hdl.handle.net/1885/48180

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation


Australian National University

16. Greensmith, Evan. Policy Gradient Methods: Variance Reduction and Stochastic Convergence .

Degree: 2005, Australian National University

 In a reinforcement learning task an agent must learn a policy for performing actions so as to perform well in a given environment. Policy gradient(more)

Subjects/Keywords: reinforcement learning; policy gradient; stochastic convergence; variance reduction
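For context on the "variance reduction" keyword (a standard identity, not a result quoted from this thesis): a state-dependent baseline can be subtracted inside the policy-gradient estimator without changing its expectation, which is the usual route to lower variance.

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\pi_\theta}\!\left[\sum_{t\ge 0} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\bigl(G_t - b(s_t)\bigr)\right],
\qquad
\mathbb{E}_{\pi_\theta}\!\bigl[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\, b(s_t)\bigr] = 0 .
```

Choosing b well (for example, an estimate of the state value) lowers the variance of the Monte Carlo gradient estimate without biasing it.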


APA (6th Edition):

Greensmith, E. (2005). Policy Gradient Methods: Variance Reduction and Stochastic Convergence . (Thesis). Australian National University. Retrieved from http://hdl.handle.net/1885/47105

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Chicago Manual of Style (16th Edition):

Greensmith, Evan. “Policy Gradient Methods: Variance Reduction and Stochastic Convergence .” 2005. Thesis, Australian National University. Accessed January 24, 2021. http://hdl.handle.net/1885/47105.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

MLA Handbook (7th Edition):

Greensmith, Evan. “Policy Gradient Methods: Variance Reduction and Stochastic Convergence .” 2005. Web. 24 Jan 2021.

Vancouver:

Greensmith E. Policy Gradient Methods: Variance Reduction and Stochastic Convergence . [Internet] [Thesis]. Australian National University; 2005. [cited 2021 Jan 24]. Available from: http://hdl.handle.net/1885/47105.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Council of Science Editors:

Greensmith E. Policy Gradient Methods: Variance Reduction and Stochastic Convergence . [Thesis]. Australian National University; 2005. Available from: http://hdl.handle.net/1885/47105

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation


University of South Florida

17. Michaud, Brianna. A Habitat Analysis of Estuarine Fishes and Invertebrates, with Observations on the Effects of Habitat-Factor Resolution.

Degree: 2016, University of South Florida

 Between 1988 and 2014, otter trawls, seine nets, and plankton nets were deployed along the salinity gradients of 18 estuaries by the University of South… (more)

Subjects/Keywords: multivariate analysis; community; estuarine management; estuarine gradient; Natural Resources Management and Policy; Other Oceanography and Atmospheric Sciences and Meteorology


APA (6th Edition):

Michaud, B. (2016). A Habitat Analysis of Estuarine Fishes and Invertebrates, with Observations on the Effects of Habitat-Factor Resolution. (Thesis). University of South Florida. Retrieved from https://scholarcommons.usf.edu/etd/6543

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Chicago Manual of Style (16th Edition):

Michaud, Brianna. “A Habitat Analysis of Estuarine Fishes and Invertebrates, with Observations on the Effects of Habitat-Factor Resolution.” 2016. Thesis, University of South Florida. Accessed January 24, 2021. https://scholarcommons.usf.edu/etd/6543.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

MLA Handbook (7th Edition):

Michaud, Brianna. “A Habitat Analysis of Estuarine Fishes and Invertebrates, with Observations on the Effects of Habitat-Factor Resolution.” 2016. Web. 24 Jan 2021.

Vancouver:

Michaud B. A Habitat Analysis of Estuarine Fishes and Invertebrates, with Observations on the Effects of Habitat-Factor Resolution. [Internet] [Thesis]. University of South Florida; 2016. [cited 2021 Jan 24]. Available from: https://scholarcommons.usf.edu/etd/6543.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Council of Science Editors:

Michaud B. A Habitat Analysis of Estuarine Fishes and Invertebrates, with Observations on the Effects of Habitat-Factor Resolution. [Thesis]. University of South Florida; 2016. Available from: https://scholarcommons.usf.edu/etd/6543

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

18. Aklil, Nassim. Apprentissage actif sous contrainte de budget en robotique et en neurosciences computationnelles. Localisation robotique et modélisation comportementale en environnement non stationnaire : Active learning under budget constraint in robotics and computational neuroscience. Robotic localization and behavioral modeling in non-stationary environment.

Degree: Docteur es, Intelligence Artificielle et Robotique, 2017, Université Pierre et Marie Curie – Paris VI

Decision-making is a widely studied field in the sciences, whether in neuroscience to understand the processes underlying decision… (more)

Subjects/Keywords: Reinforcement learning; Budgeted learning; Deep learning; Computational neuroscience; Exploration/exploitation trade-off; Policy gradient; 629.89


APA (6th Edition):

Aklil, N. (2017). Apprentissage actif sous contrainte de budget en robotique et en neurosciences computationnelles. Localisation robotique et modélisation comportementale en environnement non stationnaire : Active learning under budget constraint in robotics and computational neuroscience. Robotic localization and behavioral modeling in non-stationary environment. (Doctoral Dissertation). Université Pierre et Marie Curie – Paris VI. Retrieved from http://www.theses.fr/2017PA066225

Chicago Manual of Style (16th Edition):

Aklil, Nassim. “Apprentissage actif sous contrainte de budget en robotique et en neurosciences computationnelles. Localisation robotique et modélisation comportementale en environnement non stationnaire : Active learning under budget constraint in robotics and computational neuroscience. Robotic localization and behavioral modeling in non-stationary environment.” 2017. Doctoral Dissertation, Université Pierre et Marie Curie – Paris VI. Accessed January 24, 2021. http://www.theses.fr/2017PA066225.

MLA Handbook (7th Edition):

Aklil, Nassim. “Apprentissage actif sous contrainte de budget en robotique et en neurosciences computationnelles. Localisation robotique et modélisation comportementale en environnement non stationnaire : Active learning under budget constraint in robotics and computational neuroscience. Robotic localization and behavioral modeling in non-stationary environment.” 2017. Web. 24 Jan 2021.

Vancouver:

Aklil N. Apprentissage actif sous contrainte de budget en robotique et en neurosciences computationnelles. Localisation robotique et modélisation comportementale en environnement non stationnaire : Active learning under budget constraint in robotics and computational neuroscience. Robotic localization and behavioral modeling in non-stationary environment. [Internet] [Doctoral dissertation]. Université Pierre et Marie Curie – Paris VI; 2017. [cited 2021 Jan 24]. Available from: http://www.theses.fr/2017PA066225.

Council of Science Editors:

Aklil N. Apprentissage actif sous contrainte de budget en robotique et en neurosciences computationnelles. Localisation robotique et modélisation comportementale en environnement non stationnaire : Active learning under budget constraint in robotics and computational neuroscience. Robotic localization and behavioral modeling in non-stationary environment. [Doctoral Dissertation]. Université Pierre et Marie Curie – Paris VI; 2017. Available from: http://www.theses.fr/2017PA066225


Cal Poly

19. McDowell, Journey. Comparison of Modern Controls and Reinforcement Learning for Robust Control of Autonomously Backing Up Tractor-Trailers to Loading Docks.

Degree: MS, Mechanical Engineering, 2019, Cal Poly

  Two controller performances are assessed for generalization in the path following task of autonomously backing up a tractor-trailer. Starting from random locations and orientations,… (more)

Subjects/Keywords: Linear Quadratic Regulator; Deep Deterministic Policy Gradient; LQR; DDPG; Autonomous Vehicle; Machine Learning; Navigation, Guidance, Control, and Dynamics
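One of the two controllers compared in this thesis is an LQR. As a hedged illustration only, the sketch below computes an LQR gain with SciPy for a toy double-integrator model; the A, B, Q, R matrices are placeholder values, not the tractor-trailer model from the thesis.

```python
# Illustrative sketch: continuous-time LQR gain for a toy double integrator.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double-integrator dynamics (assumed)
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.1])                   # state cost weights (assumed)
R = np.array([[0.01]])                    # control cost weight (assumed)

P = solve_continuous_are(A, B, Q, R)      # solve the algebraic Riccati equation
K = np.linalg.solve(R, B.T @ P)           # optimal state feedback: u = -K x
print("LQR gain K =", K)
```

The learning-based controller in the comparison (DDPG) instead learns its feedback law from interaction data rather than from a known model.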


APA (6th Edition):

McDowell, J. (2019). Comparison of Modern Controls and Reinforcement Learning for Robust Control of Autonomously Backing Up Tractor-Trailers to Loading Docks. (Masters Thesis). Cal Poly. Retrieved from https://digitalcommons.calpoly.edu/theses/2100 ; 10.15368/theses.2019.117

Chicago Manual of Style (16th Edition):

McDowell, Journey. “Comparison of Modern Controls and Reinforcement Learning for Robust Control of Autonomously Backing Up Tractor-Trailers to Loading Docks.” 2019. Masters Thesis, Cal Poly. Accessed January 24, 2021. https://digitalcommons.calpoly.edu/theses/2100 ; 10.15368/theses.2019.117.

MLA Handbook (7th Edition):

McDowell, Journey. “Comparison of Modern Controls and Reinforcement Learning for Robust Control of Autonomously Backing Up Tractor-Trailers to Loading Docks.” 2019. Web. 24 Jan 2021.

Vancouver:

McDowell J. Comparison of Modern Controls and Reinforcement Learning for Robust Control of Autonomously Backing Up Tractor-Trailers to Loading Docks. [Internet] [Masters thesis]. Cal Poly; 2019. [cited 2021 Jan 24]. Available from: https://digitalcommons.calpoly.edu/theses/2100 ; 10.15368/theses.2019.117.

Council of Science Editors:

McDowell J. Comparison of Modern Controls and Reinforcement Learning for Robust Control of Autonomously Backing Up Tractor-Trailers to Loading Docks. [Masters Thesis]. Cal Poly; 2019. Available from: https://digitalcommons.calpoly.edu/theses/2100 ; 10.15368/theses.2019.117

20. Bhojraj, Gokul Kaisaravalli. Policy-based Reinforcement learning control for window opening and closing in an office building.

Degree: Microdata Analysis, 2020, Dalarna University

The level of indoor comfort can be highly influenced by the window opening and closing behavior of the occupant in an office building. It will… (more)

Subjects/Keywords: Markov decision processes; Policy-based Reinforcement learning; Value-based Reinforcement learning; Q-learning; REINFORCE; policy gradient; window control; indoor comfort level; Social Sciences; Samhällsvetenskap


APA (6th Edition):

Bhojraj, G. K. (2020). Policy-based Reinforcement learning control for window opening and closing in an office building. (Thesis). Dalarna University. Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:du-34420

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Chicago Manual of Style (16th Edition):

Bhojraj, Gokul Kaisaravalli. “Policy-based Reinforcement learning control for window opening and closing in an office building.” 2020. Thesis, Dalarna University. Accessed January 24, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:du-34420.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

MLA Handbook (7th Edition):

Bhojraj, Gokul Kaisaravalli. “Policy-based Reinforcement learning control for window opening and closing in an office building.” 2020. Web. 24 Jan 2021.

Vancouver:

Bhojraj GK. Policy-based Reinforcement learning control for window opening and closing in an office building. [Internet] [Thesis]. Dalarna University; 2020. [cited 2021 Jan 24]. Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:du-34420.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Council of Science Editors:

Bhojraj GK. Policy-based Reinforcement learning control for window opening and closing in an office building. [Thesis]. Dalarna University; 2020. Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:du-34420

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

21. -2178-1988. Program analysis techniques for algorithmic complexity and relational properties.

Degree: PhD, Computer Science, 2019, University of Texas – Austin

 Analyzing standard safety properties of a given program has traditionally been the primary focus of the program analysis community. Unfortunately, there are still many interesting… (more)

Subjects/Keywords: Complexity testing; Optimal program synthesis; Fuzzing; Genetic programming; Performance bug; Vulnerability detection; Side channel; Static analysis; Relational verification; Reinforcement learning; Policy gradient


APA (6th Edition):

-2178-1988. (2019). Program analysis techniques for algorithmic complexity and relational properties. (Doctoral Dissertation). University of Texas – Austin. Retrieved from http://dx.doi.org/10.26153/tsw/2181

Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

Chicago Manual of Style (16th Edition):

-2178-1988. “Program analysis techniques for algorithmic complexity and relational properties.” 2019. Doctoral Dissertation, University of Texas – Austin. Accessed January 24, 2021. http://dx.doi.org/10.26153/tsw/2181.

Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

MLA Handbook (7th Edition):

-2178-1988. “Program analysis techniques for algorithmic complexity and relational properties.” 2019. Web. 24 Jan 2021.

Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

Vancouver:

-2178-1988. Program analysis techniques for algorithmic complexity and relational properties. [Internet] [Doctoral dissertation]. University of Texas – Austin; 2019. [cited 2021 Jan 24]. Available from: http://dx.doi.org/10.26153/tsw/2181.

Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

Council of Science Editors:

-2178-1988. Program analysis techniques for algorithmic complexity and relational properties. [Doctoral Dissertation]. University of Texas – Austin; 2019. Available from: http://dx.doi.org/10.26153/tsw/2181

Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

22. 小倉, 裕平. 運動者に対するランニング経路推薦のための方策勾配法に基づくランニング経路生成方法の研究 [A study of a policy-gradient-based running-route generation method for recommending running routes to exercisers].

Degree: Japan Advanced Institute of Science and Technology / 北陸先端科学技術大学院大学

Supervisor: Ho Bao Tu

Graduate School of Advanced Science and Technology (先端科学技術研究科)

Master's degree (Knowledge Science)

Subjects/Keywords: 強化学習; Reinforcement Learning; ランニング経路推薦; Running Route Recommendation; 方策勾配法; Policy Gradient


APA (6th Edition):

小倉, 裕平. (n.d.). 運動者に対するランニング経路推薦のための方策勾配法に基づくランニング経路生成方法の研究. (Thesis). Japan Advanced Institute of Science and Technology / 北陸先端科学技術大学院大学. Retrieved from http://hdl.handle.net/10119/15138

Note: this citation may be lacking information needed for this citation format:
No year of publication.
Not specified: Masters Thesis or Doctoral Dissertation

Chicago Manual of Style (16th Edition):

小倉, 裕平. “運動者に対するランニング経路推薦のための方策勾配法に基づくランニング経路生成方法の研究.” Thesis, Japan Advanced Institute of Science and Technology / 北陸先端科学技術大学院大学. Accessed January 24, 2021. http://hdl.handle.net/10119/15138.

Note: this citation may be lacking information needed for this citation format:
No year of publication.
Not specified: Masters Thesis or Doctoral Dissertation

MLA Handbook (7th Edition):

小倉, 裕平. “運動者に対するランニング経路推薦のための方策勾配法に基づくランニング経路生成方法の研究.” Web. 24 Jan 2021.

Note: this citation may be lacking information needed for this citation format:
No year of publication.

Vancouver:

小倉 裕平. 運動者に対するランニング経路推薦のための方策勾配法に基づくランニング経路生成方法の研究. [Internet] [Thesis]. Japan Advanced Institute of Science and Technology / 北陸先端科学技術大学院大学; [cited 2021 Jan 24]. Available from: http://hdl.handle.net/10119/15138.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
No year of publication.

Council of Science Editors:

小倉 裕平. 運動者に対するランニング経路推薦のための方策勾配法に基づくランニング経路生成方法の研究. [Thesis]. Japan Advanced Institute of Science and Technology / 北陸先端科学技術大学院大学; Available from: http://hdl.handle.net/10119/15138

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
No year of publication.

23. Van der Spek, I.T. (author). Imitation learning for a robotic precision placement task.

Degree: 2014, Delft University of Technology

In industrial environments robots are used for various tasks. At this moment it is not feasible for companies to deploy robots for productions with a… (more)

Subjects/Keywords: reinforcement learning; imitation learning; policy gradient; pgpe; dynamic movement primitives; precision placement; dmp; optimization


APA (6th Edition):

Van der Spek, I. T. (2014). Imitation learning for a robotic precision placement task. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:86815e55-bbba-45b4-915b-6f321b485940

Chicago Manual of Style (16th Edition):

Van der Spek, I T (author). “Imitation learning for a robotic precision placement task.” 2014. Masters Thesis, Delft University of Technology. Accessed January 24, 2021. http://resolver.tudelft.nl/uuid:86815e55-bbba-45b4-915b-6f321b485940.

MLA Handbook (7th Edition):

Van der Spek, I T (author). “Imitation learning for a robotic precision placement task.” 2014. Web. 24 Jan 2021.

Vancouver:

Van der Spek IT. Imitation learning for a robotic precision placement task. [Internet] [Masters thesis]. Delft University of Technology; 2014. [cited 2021 Jan 24]. Available from: http://resolver.tudelft.nl/uuid:86815e55-bbba-45b4-915b-6f321b485940.

Council of Science Editors:

Van der Spek IT. Imitation learning for a robotic precision placement task. [Masters Thesis]. Delft University of Technology; 2014. Available from: http://resolver.tudelft.nl/uuid:86815e55-bbba-45b4-915b-6f321b485940

24. Masood, Muhammad Arjumand. Algorithms for Discovering Collections of High-Quality and Diverse Solutions, With Applications to Bayesian Non-Negative Matrix Factorization and Reinforcement Learning.

Degree: PhD, 2019, Harvard University

Machine Learning problems often admit a solution space that is not unique. When multiple feasible solutions exist, picking from a diverse, representative set may lead… (more)

Subjects/Keywords: machine learning; NMF; non-negative matrix factorization; reinforcement learning; policy gradient; Stein discrepancy


APA (6th Edition):

Masood, M. A. (2019). Algorithms for Discovering Collections of High-Quality and Diverse Solutions, With Applications to Bayesian Non-Negative Matrix Factorization and Reinforcement Learning. (Doctoral Dissertation). Harvard University. Retrieved from http://nrs.harvard.edu/urn-3:HUL.InstRepos:42029756

Chicago Manual of Style (16th Edition):

Masood, Muhammad Arjumand. “Algorithms for Discovering Collections of High-Quality and Diverse Solutions, With Applications to Bayesian Non-Negative Matrix Factorization and Reinforcement Learning.” 2019. Doctoral Dissertation, Harvard University. Accessed January 24, 2021. http://nrs.harvard.edu/urn-3:HUL.InstRepos:42029756.

MLA Handbook (7th Edition):

Masood, Muhammad Arjumand. “Algorithms for Discovering Collections of High-Quality and Diverse Solutions, With Applications to Bayesian Non-Negative Matrix Factorization and Reinforcement Learning.” 2019. Web. 24 Jan 2021.

Vancouver:

Masood MA. Algorithms for Discovering Collections of High-Quality and Diverse Solutions, With Applications to Bayesian Non-Negative Matrix Factorization and Reinforcement Learning. [Internet] [Doctoral dissertation]. Harvard University; 2019. [cited 2021 Jan 24]. Available from: http://nrs.harvard.edu/urn-3:HUL.InstRepos:42029756.

Council of Science Editors:

Masood MA. Algorithms for Discovering Collections of High-Quality and Diverse Solutions, With Applications to Bayesian Non-Negative Matrix Factorization and Reinforcement Learning. [Doctoral Dissertation]. Harvard University; 2019. Available from: http://nrs.harvard.edu/urn-3:HUL.InstRepos:42029756

25. Liu, Stewart. Combining Retrospective Optimization and Gradient Search for Supply Chain Optimization.

Degree: Industrial Engineering & Operations Research, 2017, University of California – Berkeley

In initial work, we found that a version of Retrospective Optimization, in which we optimize over a single randomly generated long sample path, is often effective… (more)

Subjects/Keywords: Operations research; Data-Driven Optimization; Gradient Search; Multi-Echelon Supply Chain; Retrospective Optimization; State-Dependent Inventory Policy; Supply Chain Optimization


APA (6th Edition):

Liu, S. (2017). Combining Retrospective Optimization and Gradient Search for Supply Chain Optimization. (Thesis). University of California – Berkeley. Retrieved from http://www.escholarship.org/uc/item/3n52p4zj

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Chicago Manual of Style (16th Edition):

Liu, Stewart. “Combining Retrospective Optimization and Gradient Search for Supply Chain Optimization.” 2017. Thesis, University of California – Berkeley. Accessed January 24, 2021. http://www.escholarship.org/uc/item/3n52p4zj.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

MLA Handbook (7th Edition):

Liu, Stewart. “Combining Retrospective Optimization and Gradient Search for Supply Chain Optimization.” 2017. Web. 24 Jan 2021.

Vancouver:

Liu S. Combining Retrospective Optimization and Gradient Search for Supply Chain Optimization. [Internet] [Thesis]. University of California – Berkeley; 2017. [cited 2021 Jan 24]. Available from: http://www.escholarship.org/uc/item/3n52p4zj.

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Council of Science Editors:

Liu S. Combining Retrospective Optimization and Gradient Search for Supply Chain Optimization. [Thesis]. University of California – Berkeley; 2017. Available from: http://www.escholarship.org/uc/item/3n52p4zj

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
