You searched for subject:(Multi Armed Bandits). Showing records 1 – 30 of 46 total matches.

University of Illinois – Urbana-Champaign
1.
Jiang, Chong.
Parametrized Stochastic Multi-armed Bandits with Binary Rewards.
Degree: MS, 1200, 2011, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/18352
In this thesis, we consider the problem of multi-armed bandits with a large number of correlated arms. We assume that the arms have Bernoulli distributed rewards, independent across arms and across time, where the probabilities of success are parametrized by known attribute vectors for each arm, as well as an unknown preference vector. For this model, we seek an algorithm with a total regret that is sub-linear in time and independent of the number of arms. We present such an algorithm, which we call the Three-phase Algorithm, and analyze its performance. We show an upper bound on the total regret which applies uniformly in time. The asymptotics of this bound show that for any f ∈ ω(log(T)), the total regret can be made to be O(f(T)), independent of the number of arms.
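Below is a minimal sketch of the reward model this abstract describes, assuming a logistic link between attributes and success probabilities (the abstract does not specify the parametrization), with a crude explore-then-commit learner standing in for the Three-phase Algorithm; all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance: arm i succeeds with probability sigmoid(x_i . theta*),
# where the attribute vectors x_i are known but theta* is not.  The logistic
# link is an assumption; the thesis only states that success probabilities
# are parametrized by attributes and an unknown preference vector.
d, K, T = 5, 500, 20_000
X = rng.normal(size=(K, d))                  # known attribute vectors
theta_star = rng.normal(size=d)              # unknown preference vector
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
p = sigmoid(X @ theta_star)                  # Bernoulli success probabilities

# Crude explore-then-commit: explore for roughly f(T) = d * log(T)^1.5 rounds
# (some f in omega(log T), as in the regret bound above), fit theta by
# gradient ascent on the Bernoulli log-likelihood, then commit.
n_explore = int(d * np.log(T) ** 1.5)
arms = rng.integers(K, size=n_explore)
rewards = (rng.random(n_explore) < p[arms]).astype(float)

theta = np.zeros(d)
for _ in range(500):
    grad = X[arms].T @ (rewards - sigmoid(X[arms] @ theta)) / n_explore
    theta += 0.5 * grad

best_hat = int(np.argmax(X @ theta))
regret = (p.max() - p[arms]).sum() + (T - n_explore) * (p.max() - p[best_hat])
print(f"committed to arm {best_hat}; expected regret over T={T}: {regret:.1f}")
```

Because the exploration phase estimates theta* rather than each arm's mean, its length does not need to grow with the number of arms, which is the point of the parametrized model.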
Advisors/Committee Members: Srikant, Rayadurgam (advisor).
Subjects/Keywords: multi-armed bandits; machine learning

University of Illinois – Urbana-Champaign
2.
Jiang, Chong.
Online advertisements and multi-armed bandits.
Degree: PhD, Electrical & Computer Engr, 2015, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/78369
We investigate a number of multi-armed bandit problems that model different aspects of online advertising, beginning with a survey of the key techniques that are commonly used to demonstrate the theoretical limitations and achievable results for the performance of multi-armed bandit algorithms. We then formulate variations of the basic stochastic multi-armed bandit problem, aimed at modeling how budget-limited advertisers should bid and how ad exchanges should choose whose ad to display, and study them using these techniques.
We first consider online ad auctions from the point of view of a single advertiser who has an average budget constraint. By modeling the rest of the bidders through a probability distribution (often referred to as the mean-field approximation), we develop a simple bidding strategy which can be implemented without any statistical knowledge of bids, valuations, and query arrival processes. The key idea is to use stochastic approximation techniques to automatically track long-term averages.
Next, we consider multi-armed bandits with budgets, modeling how ad exchanges select which ad to display. We provide asymptotic regret lower bounds satisfied by any algorithm, and propose algorithms which match those lower bounds. We consider different types of budgets: scenarios where the advertiser has a fixed budget over a time horizon, and scenarios where the amount of money that is available to spend is incremented in each time slot. Further, we consider two different pricing models, one in which an advertiser is charged each time their ad is shown, and one in which the advertiser is charged only if a user clicks on the ad. For all of these cases, we show that it is possible to achieve O(log(T)) regret. For both the cost-per-impression and cost-per-click models, with a fixed budget, we provide regret lower bounds that apply to any uniformly good algorithm. Further, we show that B-KL-UCB, a natural variant of KL-UCB, is asymptotically optimal for these cases. Numerical experiments (based on a real-world data set) further suggest that B-KL-UCB also has the same or better finite-time performance when compared to various previously proposed (UCB-like) algorithms.
Finally, we consider the problem of multi-armed bandits with a large, possibly infinite number of correlated arms, modeling a retailer advertising a large number of related items. We assume that the arms have Bernoulli distributed rewards, where the probabilities of success are parametrized by known attribute vectors for each arm and an unknown vector which describes the preferences of the target audience. For this model, we seek an algorithm with a total regret that is sub-linear in time and independent of the number of arms. We present such an algorithm and analyze its performance, showing upper bounds on the total regret which apply uniformly in time, for both the finite and infinite arm cases.
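The dissertation builds on KL-UCB; here is a minimal sketch of the standard Bernoulli KL-UCB index (an upper confidence bound found by binary search) that B-KL-UCB modifies for budgets. The budgeted variant itself is not reproduced; the exploration level log(t) is the basic choice and the helper names are ours:

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(mean, pulls, t, precision=1e-6):
    """Largest q >= mean with pulls * KL(mean, q) <= log(t), via binary search.
    At each round, KL-UCB plays the arm with the largest such index."""
    level = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    while hi - lo > precision:
        mid = (lo + hi) / 2.0
        if kl_bernoulli(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo

# Example: an arm with empirical mean 0.3 after 50 pulls, at round t = 1000.
print(round(klucb_index(0.3, 50, 1000), 4))
```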
Advisors/Committee Members: Srikant, R. (advisor), Srikant, R. (Committee Chair), Beck, Carolyn (committee member), Nedich, Angelia (committee member), Veeravalli, Venugopal V. (committee member).
Subjects/Keywords: Multi-armed bandits; online advertisements; reinforcement learning
3.
Gajane, Pratik.
Multi-armed bandits with unconventional feedback : Bandits multi-armés avec rétroaction partielle.
Degree: Docteur es, Informatique, 2017, Lille 3
URL: http://www.theses.fr/2017LIL30045
In this thesis, we study sequential decision-making problems in which, for each of its decisions, the learner receives a piece of information that it uses to guide its future decisions. To go beyond the conventional feedback that has been well studied for sequential decision-making problems such as multi-armed bandits, we consider forms of partial feedback motivated by practical applications. First, we consider the dueling bandit problem, in which the learner selects two actions at each time step and receives in return relative (i.e., preference) information between the instantaneous values of these two actions. In particular, we propose an optimal algorithm that allows the learner to achieve near-optimal cumulative regret (the regret being the difference between the optimal cumulative reward and the learner's observed cumulative reward). Second, we consider the corrupt bandit problem, in which a stochastic corruption process perturbs the feedback. For this problem too, we design algorithms that achieve asymptotically optimal cumulative regret. In addition, we examine the relationship between these two problems within the framework of partial monitoring, a generic paradigm for sequential decision making with partial feedback.
The multi-armed bandit (MAB) problem is a mathematical formulation of the exploration-exploitation trade-off inherent to reinforcement learning, in which the learner chooses an action (symbolized by an arm) from a set of available actions in a sequence of trials in order to maximize their reward. In the classical MAB problem, the learner receives absolute bandit feedback, i.e., it receives as feedback the reward of the arm it selects. In many practical situations, however, different kinds of feedback are more readily available. In this thesis, we study two such kinds of feedback, namely relative feedback and corrupt feedback. The main practical motivation behind relative feedback arises from the task of online ranker evaluation. This task involves choosing the optimal ranker from a finite set of rankers using only pairwise comparisons, while minimizing the comparisons between sub-optimal rankers. This is formalized as the MAB problem with relative feedback, in which the learner selects two arms instead of one and receives preference feedback. We consider the adversarial formulation of this problem, which circumvents the stationarity assumption on the mean rewards of the arms. We provide a lower bound on the performance measure for any algorithm for this problem. We also provide an algorithm called "Relative Exponential-weight algorithm for Exploration and Exploitation" with performance guarantees. We present a thorough empirical study on several information retrieval datasets that confirms the validity of these theoretical results. The motivating theme behind corrupt feedback…
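As a concrete illustration of relative (dueling) feedback, here is a toy exponential-weights loop in the spirit of the Exp3-style approach the thesis builds on; the environment, the gain estimates, and all constants are our own illustrative choices, not the thesis's exact REX3 update:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dueling-bandit environment: pref[i, j] = P(arm i beats arm j) under a
# Bradley-Terry model (illustrative).  The learner only ever sees which of
# the two selected arms won the duel, never a numeric reward.
K, T, gamma = 5, 20_000, 0.05
strength = np.linspace(0.1, 0.9, K)
pref = strength[:, None] / (strength[:, None] + strength[None, :])

w = np.ones(K)
for t in range(T):
    probs = (1 - gamma) * w / w.sum() + gamma / K   # exploration mixture
    probs /= probs.sum()                            # guard against rounding
    a = rng.choice(K, p=probs)                      # two independent draws
    b = rng.choice(K, p=probs)
    a_wins = rng.random() < pref[a, b]
    g = 0.5 if a_wins else -0.5                     # relative gain estimate
    w[a] *= np.exp(gamma / K * g / probs[a])        # importance-weighted update
    w[b] *= np.exp(gamma / K * -g / probs[b])
    w /= w.max()                                    # keep weights bounded

print("play distribution concentrates on the strongest arm:",
      np.round(w / w.sum(), 3))
```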
Advisors/Committee Members: Preux, Philippe (thesis director).
Subjects/Keywords: Bandits Multi-Bras; Retour D’information Partielle; Dueling Bandits; Corrupt Bandits; Évaluation du Ranker; Vie Privée Différentielle; Multi-Armed Bandit; Partial Feedback; Dueling Bandits; Corrupt Bandits; Ranker Evaluation; Differential Privacy
4.
-4677-643X.
Online experiment design with causal structures.
Degree: PhD, Electrical and Computer Engineering, 2019, University of Texas – Austin
URL: http://dx.doi.org/10.26153/tsw/2950
Modern learning systems like recommendation engines, computational advertising systems, and online parameter tuning services are inherently online; i.e., these systems need to continually collect data, make decisions to optimize a certain objective, and then collect more data with the objective of improving their predictive abilities. This leads to the well-known dilemma between exploration (searching the space of possible decisions) and exploitation (choosing the optimal decision according to the learned model). A principled way to capture this trade-off is the study of multi-armed bandit problems. On the other hand, these online learning systems are made up of several interacting components, so it is beneficial to study the pattern of interaction among these components in order to explore in a sample-efficient manner, which in turn leads to better exploitation. In this thesis, we will see that it is sometimes beneficial to view these online learning systems through the lens of causality, formalizing the pattern of interaction among the various components of the system through causal graphical models. In our first problem, we study the contextual bandit problem with L observed contexts and K arms, with a latent low-dimensional causal structure. We show that leveraging this latent low-dimensional structure can lead to superior regret guarantees that are practical even for smaller time horizons. This also leads to the first regret guarantees for low-rank matrix completion where the rank is greater than one. Our second problem deals with leveraging information leakage in an online fashion in the presence of causal structures. We identify that in the presence of general causal structures there is information leakage between different interventions viewed as arms of a bandit; i.e., collecting data under one intervention can inform us about the statistics under other interventions. We demonstrate how to leverage this information leakage through adaptive importance sampling, and apply our algorithm to biological networks and to the interpretability of deep networks. This directly leads us to our third problem, where we use the idea of information leakage (explored in our second problem) in the context of stochastic contextual bandits. We propose the contextual bandits with stochastic experts problem and provide the first problem-dependent regret bound in contextual bandits, where the scaling of the regret bound can potentially be as low as logarithmic in the number of experts. We show that our algorithm outperforms several state-of-the-art algorithms on progressive validation tasks on multi-class classification datasets. In our fourth problem, we look at conditional independence testing, which is one of the fundamental tools in causal structure learning. We reduce this problem to binary classification through a nearest-neighbor-based bootstrap procedure. This enables us to use powerful supervised learning tools like gradient boosted trees or deep neural networks, which have desirable properties in higher dimensions. Finally, in our…
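A minimal numerical illustration of the information-leakage idea: samples collected under one intervention can estimate the mean reward under another via importance weighting. The two Gaussian sampling distributions and the reward function below are toy stand-ins, not the dissertation's algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two "interventions" inducing known distributions over a variable x:
# q0 = N(0, 1) and q1 = N(0.5, 1).  Pulling intervention 0 yields samples
# that also inform us about intervention 1 through importance weights.
reward = lambda x: 1.0 / (1.0 + np.exp(-x))       # illustrative mean reward

xs = rng.normal(0.0, 1.0, size=20_000)            # data collected under q0 only
w = np.exp(0.5 * xs - 0.125)                      # density ratio q1(x)/q0(x), closed form
est_q1 = np.mean(w * reward(xs))                  # off-intervention estimate

direct = reward(rng.normal(0.5, 1.0, size=20_000)).mean()
print(f"leaked estimate under q1: {est_q1:.3f}  direct estimate: {direct:.3f}")
```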
Advisors/Committee Members: Shakkottai, Sanjay (advisor), Caramanis, Constantine (committee member), Dimakis, Georgios-Alexandros (committee member), Johari, Ramesh (committee member), Sanghavi, Sujay (committee member).
Subjects/Keywords: Online learning; Multi-armed bandits; Contextual bandits; Hyper-parameter tuning; Tree-search; CI testing
5.
L. Cella.
EFFICIENCY AND REALISM IN STOCHASTIC BANDITS.
Degree: 2021, Università degli Studi di Milano
URL: http://hdl.handle.net/2434/807862
This manuscript is dedicated to the analysis of the application of stochastic bandits to the recommender systems domain. Here a learning agent sequentially recommends one item from a catalog of available alternatives. Consequently, the environment returns a reward that is a noisy observation of the rating associated with the suggested item. The peculiarity of the bandit setting is that no information is given about non-recommended products, and the collected rewards are the only information available to the learning agent. By relying on them, the learner adapts its strategy toward reaching its learning objective, that is, maximizing the cumulative reward collected over all the interactions.
In this dissertation we cover two main research directions: the development of efficient learning algorithms and the introduction of a more realistic learning setting. In addressing the former objective, we propose two approaches to speed up the learning process. The first solution aims to reduce the computational costs associated with the learning procedure, while the second's goal is to boost the learning phase by relying on data corresponding to terminated recommendation sessions. Regarding the latter research line, we propose a novel setting representing use cases that do not fit the standard bandit model.
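For concreteness, here is the standard interaction loop described above, with UCB1 as a representative learning agent (the thesis develops more efficient and more realistic variants; the item ratings and noise model below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# One item is recommended per round; only its noisy rating is observed,
# and the goal is to maximize the cumulative reward over all interactions.
ratings = np.array([0.2, 0.35, 0.5, 0.45])        # hidden mean ratings (illustrative)
K, T = len(ratings), 10_000
counts, sums = np.zeros(K), np.zeros(K)

for t in range(1, T + 1):
    if t <= K:
        item = t - 1                              # initialization: try each item once
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)   # UCB1 index
        item = int(np.argmax(ucb))
    reward = ratings[item] + 0.1 * rng.normal()   # noisy observation of the rating
    counts[item] += 1
    sums[item] += reward

print("recommendations per item:", counts.astype(int))   # concentrates on item 2
```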
Advisors/Committee Members: tutor: N. Cesa-Bianchi; PhD program coordinator: P. Boldi.
Subjects/Keywords: machine learning; multi-armed bandits; stochastic bandits; online learning; Settore INF/01 - Informatica
6.
Besson, Lilian.
Multi-Players Bandit Algorithms for Internet of Things Networks : Algorithmes de Bandits Multi-Joueurs pour les Réseaux de l'Internet des Objets.
Degree: Docteur es, Télécommunications (STIC), 2019, CentraleSupélec
URL: http://www.theses.fr/2019CSUP0005
In this PhD thesis, we study wireless networks and reconfigurable end-devices that can access Cognitive Radio networks, in unlicensed bands and without central control. We focus on current and future Internet of Things (IoT) networks, with the objective of extending the devices' battery life by equipping them with low-cost but efficient machine learning algorithms that let them automatically improve the efficiency of their wireless communications. We propose two models of IoT networks, and we show empirically, through both numerical simulations and a realistic experimental validation, the possible gain of our methods, which use Reinforcement Learning. The different network access problems are modeled as Multi-Armed Bandits (MAB), but analyzing the convergence of many IoT devices playing a collaborative game without communication or coordination is hard when the devices all follow random activation patterns. The rest of this manuscript therefore studies two restricted models: first multi-player bandits in stationary problems, then non-stationary single-player bandits. We also detail another contribution, SMPyBandits, our open-source Python library for numerical MAB simulations, which covers all the studied models and more.
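A toy version of the multi-player model studied here: several devices independently pick channels, and a collision destroys the reward. Each device runs plain UCB1 with no communication; channel qualities are illustrative, and the actual multi-player algorithms (including those in SMPyBandits, whose API is not shown here) add collision-avoidance mechanisms that plain UCB1 lacks:

```python
import numpy as np

rng = np.random.default_rng(4)

K, M, T = 6, 3, 20_000                  # channels, devices, rounds
mu = np.linspace(0.3, 0.9, K)           # channel availability (illustrative)
counts = np.ones((M, K))                # pretend one initial pull per channel
sums = np.full((M, K), 0.5)

for t in range(2, T + 2):
    idx = sums / counts + np.sqrt(2 * np.log(t) / counts)   # per-device UCB1
    picks = idx.argmax(axis=1)
    for j in range(M):
        collided = (picks == picks[j]).sum() > 1
        r = 0.0 if collided else float(rng.random() < mu[picks[j]])
        counts[j, picks[j]] += 1        # devices observe only their own outcome
        sums[j, picks[j]] += r

# Plain UCB1 makes all devices chase the same best channels and collide;
# coordination-free collision avoidance (e.g., randomized rank or hopping
# schemes) is exactly what multi-player bandit algorithms add.
print("channels chosen in the final round:", picks)
```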
Advisors/Committee Members: Moy, Christophe (thesis director).
Subjects/Keywords: Internet des Objets (IdO); Radio Intelligente; Apprentissage séquentiel; Apprentissage par renforcement; Bandits multi-bras (BMB); Bandits multi-bras multi-joueurs; Bandits multi-bras non stationnaires; Internet of Things (IoT); Cognitive Radio; Sequential Learning; Reinforcement Learning; Multi-Armed Bandits (MAB); Multi-Player Multi-Armed Bandits; Non-Stationary Multi-Armed Bandits; 621.38

University of Alberta
7.
Neufeld, James P.
Adaptive Monte Carlo Integration.
Degree: PhD, Department of Computing Science, 2016, University of Alberta
URL: https://era.library.ualberta.ca/files/chx11xf288
Monte Carlo methods are a simple, effective, and widely deployed way of approximating integrals that prove too challenging for deterministic approaches. This thesis presents a number of contributions to the field of adaptive Monte Carlo methods, that is, approaches that automatically adjust the behaviour of the sampling algorithm to better suit the targeted integrand. The first of these contributions is the introduction of a new method, antithetic Markov chain sampling, which improves sampling efficiency through the use of simulated Markov chains. These chains effectively guide the sampling toward more influential regions of the integrand (modes). We demonstrate that this approach leads to unbiased estimators and offers significant improvements over standard approaches on challenging multi-modal integrands. We next consider the complementary task of efficiently allocating computation between a set of unbiased samplers through observations of their past performance. Here, we show that this problem is equivalent to the well-known stochastic multi-armed bandit problem and, as a result, existing algorithms and theoretical guarantees transfer immediately, which gives rise to new results for the adaptive Monte Carlo setting. We then extend this framework to cover an important practical condition, where each individual sampler (bandit arm) may take a random amount of computation time to produce a sample. Here, we again show that existing bandit algorithms can be applied through the use of a simple sampling trick, and prove new results bounding the regret of any such algorithm from above. Lastly, we consider the task of combining a set of unbiased Monte Carlo estimators, with unique variances and sample sizes, into a single estimator. We show that upper-confidence approaches similar to those used in the multi-armed bandit literature lead to estimators that improve on existing solutions both theoretically and in practice. Interestingly, each of these contributions may be applied in parallel and in complement to one another to produce any number of highly adaptable, robust, and practical Monte Carlo integration algorithms.
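The "allocate computation between unbiased samplers" reduction admits a compact sketch: treat each sampler as an arm whose loss is its variance, and allocate the next sample optimistically. Both samplers below estimate the same quantity (mean 1) with different variances; the construction and constants are illustrative, not the thesis's algorithm:

```python
import numpy as np

rng = np.random.default_rng(5)

# Two unbiased estimators of the same quantity (both have mean 1) with
# different variances: plain sampling vs. an averaged sampler.
samplers = [
    lambda: rng.exponential(),                               # variance 1
    lambda: 0.5 * (rng.exponential() + rng.exponential()),   # variance 0.5
]
K, T = len(samplers), 5_000
n = np.ones(K)
mean = np.array([s() for s in samplers])
m2 = np.zeros(K)                              # running sum of squared deviations

for t in range(K, T):
    var = m2 / np.maximum(n - 1, 1)
    lcb = var - np.sqrt(2 * np.log(t) / n)    # optimistic (low) variance estimate
    k = int(np.argmin(lcb))                   # sample the seemingly best sampler
    x = samplers[k]()
    n[k] += 1
    delta = x - mean[k]
    mean[k] += delta / n[k]                   # Welford's online mean/variance update
    m2[k] += delta * (x - mean[k])

# Naively combining the per-sampler means by sample count; doing this well
# is itself nontrivial and is the subject of the thesis's final contribution.
print("samples per sampler:", n.astype(int))
print("combined estimate:", round(float(np.average(mean, weights=n)), 4))
```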
Subjects/Keywords: Online Learning; Machine Learning; Monte Carlo; Multi-armed Bandits

Cornell University
8.
Xu, Xiao.
Multi-Armed Bandits in Large-Scale Complex Systems.
Degree: PhD, Electrical and Computer Engineering, 2020, Cornell University
URL: http://hdl.handle.net/1813/70395
This dissertation focuses on the multi-armed bandit problem (MAB), where the objective is to design a sequential arm selection policy that maximizes the total reward over time. Canonical formulations of the MAB adopt the following assumptions: the size of the action space is much smaller than the length of the time horizon, computational resources such as memory are unlimited in the learning process, and the generative models of arm rewards are time-invariant. This dissertation aims to relax these assumptions, which are unrealistic in emerging applications involving large-scale complex systems, and to develop corresponding techniques to address the resulting new issues.
The first part of the dissertation addresses the issue of a massive number of actions. A stochastic bandit problem with side information on arm similarity and dissimilarity is studied. The main results include a unit interval graph (UIG) representation of the action space that succinctly models the side information and a two-step learning structure that fully exploits the topological structure of the UIG to achieve an optimal scaling of the learning cost with the size of the action space. Specifically, in the UIG representation, each node represents an arm and the presence (absence) of an edge between two nodes indicates similarity (dissimilarity) between their mean rewards. Based on whether the UIG is fully revealed by the side information, two settings with complete and partial side information are considered. For each setting, a two-step learning policy consisting of an offline reduction of the action space and online aggregation of reward observations from similar arms is developed. The computational efficiency and the order optimality of the proposed strategies in terms of the size of the action space and the time length are established. Numerical experiments on both synthetic and real-world datasets are conducted to verify the performance of the proposed policies in practice.
In the second part of the dissertation, the issue of limited memory during the learning process is studied in the adversarial bandit setting. Specifically, a learning policy can only store the statistics of a subset of arms summarizing their reward history. A general hierarchical learning structure that trades off the regret order with memory complexity is developed, based on multi-level partitions of the arm set into groups and of the time horizon into epochs. The proposed learning policy requires only a sublinear order of memory space in terms of the number of arms. Its sublinear regret orders with respect to the time horizon are established for both weak regret and shifting regret, in expectation and/or with high probability, when appropriate learning strategies are adopted as subroutines at all levels. By properly choosing the number of levels in the adopted hierarchy, the policy adapts to different sizes of the available memory space. A memory-dependent regret bound is established to characterize the tradeoff between memory complexity and the regret performance of the policy.…
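To make the offline-reduction step concrete, here is a small union-find sketch: with complete side information, arms joined by a "similar" edge share a mean, so their observations can be pooled and any bandit learner can run over the resulting groups. The graph and grouping rule are illustrative, not the dissertation's exact two-step policy:

```python
# Arms 0..8 with "similar" edges from side information (illustrative).
K = 9
edges = [(0, 1), (1, 2), (3, 4), (5, 6), (6, 7), (7, 8)]

parent = list(range(K))
def find(i):                       # union-find with path compression
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

for a, b in edges:                 # offline step: merge similar arms
    parent[find(a)] = find(b)

groups = {}
for i in range(K):
    groups.setdefault(find(i), []).append(i)
print("reduced action space:", list(groups.values()))
# -> [[0, 1, 2], [3, 4], [5, 6, 7, 8]]; online, reward observations from
# arms in the same group are aggregated into a single statistic.
```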
Advisors/Committee Members: Zhao, Qing (chair), Acharya, Jayadev (committee member), Chen, Yudong (committee member).
Subjects/Keywords: Large-Scale Complex Systems; Multi-Armed Bandits; No-regret Learning

University of Victoria
9.
Huang, Zhiming.
Thompson sampling-based online decision making in network routing.
Degree: Masters, Department of Computer Science, 2020, University of Victoria
URL: http://hdl.handle.net/1828/12095
Online decision making is a class of machine learning problems in which decisions are made sequentially so as to accumulate as much reward as possible. Typical examples include multi-armed bandit (MAB) problems, where an agent needs to decide which arm to pull in each round, and network routing problems, where each router needs to decide the next hop for each packet. Thompson sampling (TS) is an efficient and effective algorithm for online decision making problems. Although TS was proposed long ago, it was not until recent years that theoretical guarantees for TS in the standard MAB were given. In this thesis, we first analyze the performance of TS both theoretically and practically in a special MAB called combinatorial MAB with sleeping arms and long-term fairness constraints (CSMAB-F). Then, we apply TS to a novel reactive network routing problem, called opportunistic routing without link metrics known a priori, and use the proof techniques we developed for CSMAB-F to analyze its performance.
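The standard Bernoulli Thompson sampling loop that this thesis builds on fits in a few lines: maintain a Beta posterior per arm, sample once from each posterior, and play the argmax (the arm means below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)

mu = np.array([0.3, 0.5, 0.7])                 # hidden arm means (illustrative)
K, T = len(mu), 10_000
alpha, beta = np.ones(K), np.ones(K)           # Beta(1,1) priors

for _ in range(T):
    theta = rng.beta(alpha, beta)              # one posterior sample per arm
    arm = int(np.argmax(theta))                # play the arm that looks best
    reward = rng.random() < mu[arm]
    alpha[arm] += reward                       # Beta posterior update
    beta[arm] += 1 - reward

print("posterior means:", np.round(alpha / (alpha + beta), 3))
```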
Advisors/Committee Members: Pan, Jianping (supervisor).
Subjects/Keywords: Online Decision Making; Multi-armed Bandits; Thompson Sampling; Network Routing

Princeton University
10.
Schneider, Jonathan.
Learning Algorithms in Strategic Environments.
Degree: PhD, 2018, Princeton University
URL: http://arks.princeton.edu/ark:/88435/dsp01w0892d703
Learning algorithms are often analyzed under the assumption that their inputs are drawn from stochastic or adversarial sources. Increasingly, these algorithms are being applied in strategic settings, where we can hope for stronger guarantees. This thesis aims to understand the performance of existing learning algorithms in these settings, and to design new algorithms that perform well in them.
This thesis is divided into three parts. In Part I, we address the question of how agents should learn to bid in repeated non-truthful auctions, and conversely, how we should design auctions whose participants are learning agents.
In Part II, we study the dynamic pricing problem: how a large retailer should learn to set prices for a sequence of disparate goods over time, based on observing demands for goods at various prices. Previous work has demonstrated how to obtain O(log T) regret for this problem. We show how to achieve regret O(log log T), which is tight. Our algorithm uses ideas from integral geometry (most notably the concept of intrinsic volumes).
Finally, in Part III, we study how to learn the ranking of a set of N items from pairwise comparisons that may be strategic or noisy. In particular, we design mechanisms for a variety of settings (choosing the winner of a round-robin tournament, aggregating the top-K items under the strong stochastic transitivity noise model) which outperform the naive rule of ranking items according to the total number of pairwise comparisons won.
Advisors/Committee Members: Braverman, Mark (advisor).
Subjects/Keywords: algorithmic mechanism design; multi-armed bandits; online learning

University of Southern California
11.
Kalathil, Dileep Manisseri.
Empirical methods in control and optimization.
Degree: PhD, Electrical Engineering, 2014, University of Southern California
URL: http://digitallibrary.usc.edu/cdm/compoundobject/collection/p15799coll3/id/489169/rec/2323
This dissertation addresses some problems in the area of learning, optimization, and decision making in stochastic systems using empirical methods.
The first part of the dissertation addresses sequential learning and decision making with partial and noisy information, which is an important problem in many areas like manufacturing systems, communication networks, and clinical trials. Centralized sequential learning and decision making problems have been studied extensively in the literature, where such problems are modeled as (single-agent) Multi-Armed Bandit (MAB) problems (Lai and Robbins, 1985). Surprisingly, a similar framework for multi-agent systems, a decentralized MAB framework, has not received much attention. In our work, we developed a theory for decentralized learning in multi-player multi-armed bandits. We showed that a simple index-based algorithm is sufficient to maximize the team objective, and explicitly designed such an algorithm. We also showed that the convergence rate of this algorithm is O((log²T)/T).
The second part of the thesis addresses learning and optimization in dynamical systems, with a focus on Markov Decision Processes (MDPs). Design and analysis of large and complex systems often involve large-scale and time-consuming computer simulations, because such systems are often inherently stochastic, with dynamics driven by a stochastic process whose characteristics may be unknown to the designer. A large number of such problems, like autonomous navigation of robots, are modeled as MDPs. Since the underlying transition kernel that governs the stochastic transitions of the system state is often unknown, an optimal control policy for such systems is often found by simulation. In our work, we addressed this problem of simulation-based learning and optimization in MDPs, and we developed a theory of empirical dynamic programming (EDP) for MDPs. We proposed simple and natural empirical variants of the classical dynamic programming algorithms, empirical value iteration (EVI) and empirical policy iteration (EPI), and gave convergence and sample complexity bounds for both. We also showed that these techniques can be used in many other situations, including minimax equilibrium computation for zero-sum stochastic games. We also introduced another algorithm, Empirical Q-Value Iteration (EQVI), which gives a stronger (almost sure) convergence guarantee. Simulation results show a better convergence rate for these algorithms than stochastic approximation/reinforcement learning schemes such as Q-learning and actor-critic learning.
We also address the problem of learning for multi-criterion optimization in MDPs. The problem of competing agents with a multi-criterion performance objective was first considered by Blackwell (Blackwell, 1956) in the context of static games with vector-valued payoffs. Blackwell introduced the notion of approachability: a target set is approachable for a given agent if its average payoff vector approaches this set for any strategy of the…
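A minimal sketch of the empirical dynamic programming idea on a toy random MDP: the expectation in the Bellman update is replaced by an average over simulator draws, which is all empirical value iteration needs when the transition kernel is unknown (the MDP, sample size, and iteration count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

S, A, gamma, n_samples = 4, 2, 0.9, 64
P = rng.dirichlet(np.ones(S), size=(S, A))     # true (unknown) transition kernel
R = rng.random((S, A))                          # rewards

def simulate(s, a):
    """The only access to the MDP: draw one next state."""
    return rng.choice(S, p=P[s, a])

V = np.zeros(S)
for _ in range(100):                            # empirical Bellman iterations
    Q = np.empty((S, A))
    for s in range(S):
        for a in range(A):
            nxt = np.array([simulate(s, a) for _ in range(n_samples)])
            Q[s, a] = R[s, a] + gamma * V[nxt].mean()   # sampled expectation
    V = Q.max(axis=1)

print("empirical optimal values:", np.round(V, 3))
```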
Advisors/Committee Members: Jain, Rahul (Committee Chair), Krishnamachari, Bhaskar (Committee Member), Liu, Yan (Committee Member).
Subjects/Keywords: online optimization; multi-armed bandits; MDP; approachability; spectrum sharing

University of Texas – Austin
12.
-4028-5309.
Finding good enough coins under symmetric and asymmetric information.
Degree: MS in Engineering, Electrical and Computer Engineering, 2017, University of Texas – Austin
URL: http://hdl.handle.net/2152/68220
We study the problem of returning m coins with biases above 0.5. These good enough coins that are returned by the agent should be acceptable to the authority by meeting the authority's Family Wise Error Rate (FWER) constraint. We design adaptive algorithms that invoke the Sequential Probability Ratio Test (SPRT) to find these good enough coins. We consider scenarios that differ in terms of the information available about the underlying Bayesian setting. The symmetry or asymmetry of the underlying setup, i.e., the difference between what the agent and the authority know about the underlying prior and the support, presents different challenges. We also make notes on the algorithms' sample complexity.
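Wald's SPRT, which the algorithms above invoke per coin, is easy to sketch for a single coin: test H0: p = 0.5 against H1: p = p1 > 0.5 and stop at whichever log-likelihood-ratio boundary is hit first. The specific p1 and error targets are illustrative, and the full algorithms wrap this test in an FWER-controlling procedure:

```python
import math
import numpy as np

rng = np.random.default_rng(8)

# H0: p = 0.5 vs H1: p = p1; alpha = P(declare good | H0), beta = P(reject | H1).
p1, alpha, beta = 0.65, 0.01, 0.01
UPPER = math.log((1 - beta) / alpha)     # accept H1 at or above this LLR
LOWER = math.log(beta / (1 - alpha))     # accept H0 at or below this LLR

def sprt(true_p, max_flips=100_000):
    llr = 0.0
    for n in range(1, max_flips + 1):
        heads = rng.random() < true_p
        llr += math.log(p1 / 0.5) if heads else math.log((1 - p1) / 0.5)
        if llr >= UPPER:
            return "good enough", n
        if llr <= LOWER:
            return "rejected", n
    return "undecided", max_flips

print(sprt(0.70))   # typically ("good enough", a few dozen flips)
print(sprt(0.50))   # typically ("rejected", a few dozen flips)
```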
Advisors/Committee Members: Caramanis, Constantine (advisor).
Subjects/Keywords: Multi-armed bandits; Sequential hypothesis testing; FWER control; Information asymmetry

The Ohio State University
13.
Liu, Fang.
Efficient Online Learning with Bandit Feedback.
Degree: PhD, Electrical and Computer Engineering, 2020, The Ohio State University
URL: http://rave.ohiolink.edu/etdc/view?acc_num=osu1587680990430268
Online learning has been widely used in modern machine learning systems. In spite of recent progress in online learning, many challenges remain unsolved when online learning is applied to complex practical scenarios. The key challenges can be summarized from the following perspectives. First, the curse of dimensionality of the action space: to learn the optimal action, the learning algorithm needs to explore all the actions to some extent, so a large-scale action space makes the learning algorithm converge slowly and incur a large cost in cold-start cases. In online recommendation systems, for instance, the number of users or items is large compared to the finite exploration budget. Second, non-stationary environments: real-world learning systems are highly dynamic, which is reflected in the fact that users' preferences change over time due to various internal or external factors, and item popularity varies due to fast-emerging events. Failing to adapt to the changing environment may lead to sub-optimal decisions. Third, computational resource constraints: online learning algorithms need to be updated with the arriving data stream, which may require heavy computation for each new data point. The frequent updates consume a large amount of computational resources and prevent online learning tools from being used in real-time applications.
In this dissertation, we aim to develop efficient online learning solutions, and more specifically multi-armed bandit solutions, to overcome the aforementioned challenges. First, we study the correlations between actions in order to reduce the action-space complexity of the online learning problem. We start with a special case of correlation, in which playing one action also generates side observations of other actions, and then generalize the result to a general form of correlations over actions. Second, we consider multi-armed bandits in a non-stationary environment, such that the learning algorithm can automatically detect potential changes and adapt its decision-making strategy accordingly. Finally, we shift our focus to the trade-off between the computational complexity and the optimality of bandit algorithms. Our goal is to design a class of bandit algorithms that can be computed efficiently and also behave closely to the optimal algorithms. By combining the proposed research, an online learning system can be more efficient in large-scale applications, more adaptive to the changing world, and consume fewer computational resources. The proposed solutions can be applied to a wide spectrum of applications including recommendation systems, viral marketing, communication networks, clinical trials in healthcare, sequential experiment design, and many more.
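The "side observations" correlation structure mentioned above has a compact form: playing an arm also reveals the rewards of its neighbors in a feedback graph, so a UCB-style learner can update every observed arm. The graph, means, and learner below are an illustrative sketch, not the dissertation's algorithms:

```python
import numpy as np

rng = np.random.default_rng(9)

mu = np.array([0.2, 0.6, 0.4, 0.5])                 # arm means (illustrative)
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # feedback graph (a path)
K, T = len(mu), 5_000
counts, sums = np.zeros(K), np.zeros(K)

for t in range(1, T + 1):
    safe = np.maximum(counts, 1)
    ucb = np.where(counts > 0,
                   sums / safe + np.sqrt(2 * np.log(t) / safe),
                   np.inf)                          # unobserved arms go first
    arm = int(np.argmax(ucb))
    for j in [arm] + neighbors[arm]:                # reward revealed for arm + neighbors
        counts[j] += 1
        sums[j] += float(rng.random() < mu[j])

print("observations per arm:", counts.astype(int))
```

Because each pull yields several observations, the effective exploration cost is governed by the graph's structure rather than the raw number of arms.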
Advisors/Committee Members: Shroff, Ness (Advisor).
Subjects/Keywords: Computer Science; Electrical Engineering; machine learning; online learning; multi-armed bandits
14.
Lagrée, Paul.
Méthodes adaptatives pour les applications d'accès à l'information centrées sur l'utilisateur : Adaptive Methods for User-Centric Information Access Applications.
Degree: Docteur es, Informatique, 2017, Université Paris-Saclay (ComUE)
URL: http://www.theses.fr/2017SACLS341
When users browse the Web, they leave numerous traces that we propose to exploit in order to improve information access applications. We study user-centric techniques that take advantage of the many types of feedback (clicks, "likes", reposts, etc.) to refine the services offered to users, focusing on applications such as recommendation and influencer marketing, in which these signals are incorporated into our algorithms to provide highly contextualized services. The first part of this thesis is devoted to an interactive approach to information retrieval on social media. The problem is to retrieve a set of k results in a social network under the constraint that the query may be incomplete (for example, if the last term is a prefix). Each time the user updates their query, the system updates the set of search results accordingly. We adopt a network-aware interpretation of information relevance, according to which information produced by users close to the user issuing the query is considered more relevant. Next, we study a generic version of influence maximization, in which we want to maximize the influence of information or marketing campaigns by adaptively selecting the users initiating the propagation of information from a small subset of the population. Our approach makes no assumptions about the underlying diffusion model or even the structure of the diffusion network. Our method has important applications in influencer marketing, which aims to rely on social network influencers to promote products or ideas. Finally, we address the well-known cold-start problem faced by recommender systems with an adaptive approach. If no information is available about how much an item is liked, the recommender system must gather signals (clicks, etc.) in order to estimate the item's value. However, to minimize poor recommendations made to users, the system must not collect these signals carelessly. We introduce a dynamic algorithm that aims to intelligently alternate between recommendations intended to accumulate information and those exploiting the data already gathered.
Advisors/Committee Members: Cautis, Bogdan (thesis director).
Subjects/Keywords: Méthodes adaptatives; Réseaux sociaux; Bandits multi-bras; Recommandation; Marketing d'influence; Adaptive methods; Online social networks; Multi-armed bandits; Recommendation; Influencer marketing
APA (6th Edition):
Lagrée, P. (2017). Méthodes adaptatives pour les applications d'accès à l'information centrées sur l'utilisateur : Adaptive Methods for User-Centric Information Access Applications. (Doctoral Dissertation). Université Paris-Saclay (ComUE). Retrieved from http://www.theses.fr/2017SACLS341
Chicago Manual of Style (16th Edition):
Lagrée, Paul. “Méthodes adaptatives pour les applications d'accès à l'information centrées sur l'utilisateur : Adaptive Methods for User-Centric Information Access Applications.” 2017. Doctoral Dissertation, Université Paris-Saclay (ComUE). Accessed April 14, 2021.
http://www.theses.fr/2017SACLS341.
MLA Handbook (7th Edition):
Lagrée, Paul. “Méthodes adaptatives pour les applications d'accès à l'information centrées sur l'utilisateur : Adaptive Methods for User-Centric Information Access Applications.” 2017. Web. 14 Apr 2021.
Vancouver:
Lagrée P. Méthodes adaptatives pour les applications d'accès à l'information centrées sur l'utilisateur : Adaptive Methods for User-Centric Information Access Applications. [Internet] [Doctoral dissertation]. Université Paris-Saclay (ComUE); 2017. [cited 2021 Apr 14].
Available from: http://www.theses.fr/2017SACLS341.
Council of Science Editors:
Lagrée P. Méthodes adaptatives pour les applications d'accès à l'information centrées sur l'utilisateur : Adaptive Methods for User-Centric Information Access Applications. [Doctoral Dissertation]. Université Paris-Saclay (ComUE); 2017. Available from: http://www.theses.fr/2017SACLS341

Cornell University
15.
Chen, Bangrui.
Adaptive Preference Learning With Bandit Feedback: Information Filtering, Dueling Bandits and Incentivizing Exploration.
Degree: PhD, Operations Research, 2017, Cornell University
URL: http://hdl.handle.net/1813/59050
In this thesis, we study adaptive preference learning, in which a machine learning system learns users' preferences from feedback while simultaneously using these learned preferences to help them find preferred items. We study three different types of user feedback in three application settings: cardinal feedback with application in information filtering systems, ordinal feedback with application in personalized content recommender systems, and attribute feedback with application in review aggregators. We connect these settings respectively to existing work on classical multi-armed bandits, dueling bandits, and incentivizing exploration. For each type of feedback and application setting, we provide an algorithm and a theoretical analysis bounding its regret. We demonstrate through numerical experiments that our algorithms outperform existing benchmarks.
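For the ordinal-feedback setting, the following sketch shows a dueling-bandit loop in which the learner observes only which of two arms wins a comparison. It is an illustrative Thompson-sampling heuristic over pairwise win probabilities, not the algorithm of the thesis; the preference matrix and horizon are invented.

    import numpy as np

    rng = np.random.default_rng(0)
    K = 5
    u = rng.uniform(0.2, 0.8, size=(K, K))
    true_pref = np.triu(u, 1) + np.tril(1 - u.T, -1)   # consistent P(i beats j)
    np.fill_diagonal(true_pref, 0.5)
    wins = np.zeros((K, K))                            # wins[i, j]: times i beat j

    for t in range(2000):
        # sample a plausible preference matrix from Beta posteriors, then duel
        # the two arms with the highest sampled Copeland scores
        theta = rng.beta(wins + 1, wins.T + 1)
        np.fill_diagonal(theta, 0.5)
        copeland = (theta > 0.5).sum(axis=1)
        i, j = np.argsort(copeland)[-2:]
        winner, loser = (i, j) if rng.random() < true_pref[i, j] else (j, i)
        wins[winner, loser] += 1

Cardinal feedback would replace the duel with a numeric reward; attribute feedback would replace it with statements about item features.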
Advisors/Committee Members: Frazier, Peter (chair), Topaloglu, Huseyin (committee member), Joachims, Thorsten (committee member).
Subjects/Keywords: Statistics; Operations research; Computer science; adaptive preference learning; bandit feedback; dueling bandits; incentivizing exploration; information filtering; multi-armed bandits
APA (6th Edition):
Chen, B. (2017). Adaptive Preference Learning With Bandit Feedback: Information Filtering, Dueling Bandits and Incentivizing Exploration. (Doctoral Dissertation). Cornell University. Retrieved from http://hdl.handle.net/1813/59050
Chicago Manual of Style (16th Edition):
Chen, Bangrui. “Adaptive Preference Learning With Bandit Feedback: Information Filtering, Dueling Bandits and Incentivizing Exploration.” 2017. Doctoral Dissertation, Cornell University. Accessed April 14, 2021.
http://hdl.handle.net/1813/59050.
MLA Handbook (7th Edition):
Chen, Bangrui. “Adaptive Preference Learning With Bandit Feedback: Information Filtering, Dueling Bandits and Incentivizing Exploration.” 2017. Web. 14 Apr 2021.
Vancouver:
Chen B. Adaptive Preference Learning With Bandit Feedback: Information Filtering, Dueling Bandits and Incentivizing Exploration. [Internet] [Doctoral dissertation]. Cornell University; 2017. [cited 2021 Apr 14].
Available from: http://hdl.handle.net/1813/59050.
Council of Science Editors:
Chen B. Adaptive Preference Learning With Bandit Feedback: Information Filtering, Dueling Bandits and Incentivizing Exploration. [Doctoral Dissertation]. Cornell University; 2017. Available from: http://hdl.handle.net/1813/59050
16.
Allesiardo, Robin.
Bandits Manchots sur Flux de Données Non Stationnaires : Multi-armed Bandits on non Stationary Data Streams.
Degree: Docteur es, Informatique, 2016, Université Paris-Saclay (ComUE)
URL: http://www.theses.fr/2016SACLS334
The multi-armed bandit problem is a theoretical framework for studying the trade-off between exploration and exploitation under partial feedback. A player faces a set of K arms (or actions), each associated with a reward distribution D(µk) with mean µk ∈ [0, 1] and support [0, 1]. At each turn t ∈ [1, T], the player chooses an arm kt and observes the reward ykt drawn from D(µkt). The difficulty of the problem comes from the fact that the player only observes the reward of the played arm; she does not know what the reward would have been had she played another arm. At each choice she therefore faces the dilemma between exploration and exploitation: exploring refines her knowledge of the distributions of the explored arms, while exploiting accumulates more reward by playing the empirical best arm (provided the empirical best arm is indeed the actual best arm). The first part of the thesis addresses the multi-armed bandit problem when the reward distributions are non-stationary. We first study the case where, even though the distributions vary over time, the best arm does not change, and then the case where the best arm may also change over time. The second part is devoted to contextual bandit algorithms, where rewards depend on the current state of the environment. We study the use of neural networks and forests of decision trees for contextual bandits, and then the various meta-bandit approaches for selecting online the best-performing expert during its training.
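For orientation, here is a minimal sketch of this K-armed game with a UCB-style player, an illustrative stationary baseline rather than one of the thesis's algorithms for non-stationary streams; the means and horizon are placeholders.

    import math, random

    random.seed(0)
    K, T = 5, 10000
    mu = [0.2, 0.4, 0.5, 0.6, 0.7]           # unknown means, support [0, 1]
    counts, sums = [0] * K, [0.0] * K

    for t in range(1, T + 1):
        if t <= K:
            k = t - 1                        # play each arm once to initialise
        else:
            # optimism: empirical mean plus an exploration bonus
            k = max(range(K), key=lambda a: sums[a] / counts[a]
                    + math.sqrt(2 * math.log(t) / counts[a]))
        y = 1.0 if random.random() < mu[k] else 0.0   # reward drawn from D(µk)
        counts[k] += 1
        sums[k] += y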
Advisors/Committee Members: Sebag, Michèle (thesis director).
Subjects/Keywords: Apprentissage en ligne; Bandits Manchots; Non Stationnarité; Apprentissage automatique; Online Learning; Multi-Armed Bandits; Non stationarity; Machine learning
APA (6th Edition):
Allesiardo, R. (2016). Bandits Manchots sur Flux de Données Non Stationnaires : Multi-armed Bandits on non Stationary Data Streams. (Doctoral Dissertation). Université Paris-Saclay (ComUE). Retrieved from http://www.theses.fr/2016SACLS334
Chicago Manual of Style (16th Edition):
Allesiardo, Robin. “Bandits Manchots sur Flux de Données Non Stationnaires : Multi-armed Bandits on non Stationary Data Streams.” 2016. Doctoral Dissertation, Université Paris-Saclay (ComUE). Accessed April 14, 2021.
http://www.theses.fr/2016SACLS334.
MLA Handbook (7th Edition):
Allesiardo, Robin. “Bandits Manchots sur Flux de Données Non Stationnaires : Multi-armed Bandits on non Stationary Data Streams.” 2016. Web. 14 Apr 2021.
Vancouver:
Allesiardo R. Bandits Manchots sur Flux de Données Non Stationnaires : Multi-armed Bandits on non Stationary Data Streams. [Internet] [Doctoral dissertation]. Université Paris-Saclay (ComUE); 2016. [cited 2021 Apr 14].
Available from: http://www.theses.fr/2016SACLS334.
Council of Science Editors:
Allesiardo R. Bandits Manchots sur Flux de Données Non Stationnaires : Multi-armed Bandits on non Stationary Data Streams. [Doctoral Dissertation]. Université Paris-Saclay (ComUE); 2016. Available from: http://www.theses.fr/2016SACLS334

Boston College
17.
Baisi Hadad, Vitor.
Essays in Econometrics and Dynamic Kidney Exchange.
Degree: PhD, Economics, 2018, Boston College
URL: http://dlib.bc.edu/islandora/object/bc-ir:107962
This dissertation is divided into two parts. Part I - Dynamic Kidney Exchange. In recent years, kidney paired donation (KPD) has emerged as an attractive alternative for end-stage renal disease patients with incompatible living donors. However, we argue that the matching algorithm currently used by organ clearinghouses is inefficient, in the sense that a larger number of patients may be reached if kidney transplant centers take into consideration how their pool of patients and donors will evolve over time. In our work Two Novel Algorithms for Dynamic Kidney Exchange, we explore this claim and propose new computational algorithms to increase the cardinality of matchings in a discrete-time dynamic kidney exchange model with Poisson entries and geometric deaths. Our algorithms are classified into direct prediction methods and multi-armed bandit methods. In the direct prediction method, we use a machine learning estimator to produce a probability that each patient-donor pair should be matched today, as opposed to being left for a future matching. The estimators are trained on offline optimal solutions. In contrast, in multi-armed bandit methods, we use simulations to evaluate the desirability of different matchings. Since the number of different matchings is enormous, multi-armed bandits (MAB) are employed to decrease the computational burden. Our methods are evaluated in a variety of simulation configurations. We find that at least one of our methods, based on multi-armed bandit algorithms, is able to uniformly dominate the myopic method that is used by kidney transplant centers in practice. We restrict our experiments to pairwise kidney exchange, but the methods described here are easily extensible, computational constraints permitting.

Part II - Econometrics. In our econometric paper Heterogenous Production Functions, Panel Data, and Productivity, we present methods for identification of moments and nonparametric marginal distributions of endogenous random coefficient models in fixed-T linear panel data models. Our identification strategy is constructive, immediately leading to relatively simple estimators that can be shown to be consistent and asymptotically normal. Because our strategy makes use of special properties of "small" (measure-zero) subpopulations, our estimators are irregularly identified: they can be shown to be consistent and asymptotically normal, but converge at rates slower than root-n. We provide an illustration of our methods by estimating first and second moments of random Cobb-Douglas coefficients in production functions, using Indian plant-level microdata.
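The bandit idea in Part I, in generic form: treat each candidate matching as an arm, spend simulation budget estimating its long-run value, and prune aggressively. A successive-halving sketch under stated assumptions; simulate(), the candidate set, and the budget are placeholders, and the paper's actual method differs.

    import random

    random.seed(3)

    def simulate(matching):
        # placeholder for one noisy run of the dynamic exchange model
        return random.gauss(matching["quality"], 1.0)

    candidates = [{"id": i, "quality": random.random()} for i in range(16)]
    budget_per_round = 8

    while len(candidates) > 1:
        scores = []
        for c in candidates:
            est = sum(simulate(c) for _ in range(budget_per_round)) / budget_per_round
            scores.append((est, c))
        scores.sort(key=lambda s: s[0], reverse=True)
        candidates = [c for _, c in scores[: len(scores) // 2]]  # keep best half

    best_matching = candidates[0]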
Advisors/Committee Members: Stefan Hoderlein (Thesis advisor).
Subjects/Keywords: Correlated random coefficients; Dynamic kidney exchange; Kidney Paired Donation; Multi-armed bandits
APA (6th Edition):
Baisi Hadad, V. (2018). Essays in Econometrics and Dynamic Kidney Exchange. (Doctoral Dissertation). Boston College. Retrieved from http://dlib.bc.edu/islandora/object/bc-ir:107962
Chicago Manual of Style (16th Edition):
Baisi Hadad, Vitor. “Essays in Econometrics and Dynamic Kidney Exchange.” 2018. Doctoral Dissertation, Boston College. Accessed April 14, 2021.
http://dlib.bc.edu/islandora/object/bc-ir:107962.
MLA Handbook (7th Edition):
Baisi Hadad, Vitor. “Essays in Econometrics and Dynamic Kidney Exchange.” 2018. Web. 14 Apr 2021.
Vancouver:
Baisi Hadad V. Essays in Econometrics and Dynamic Kidney Exchange. [Internet] [Doctoral dissertation]. Boston College; 2018. [cited 2021 Apr 14].
Available from: http://dlib.bc.edu/islandora/object/bc-ir:107962.
Council of Science Editors:
Baisi Hadad V. Essays in Econometrics and Dynamic Kidney Exchange. [Doctoral Dissertation]. Boston College; 2018. Available from: http://dlib.bc.edu/islandora/object/bc-ir:107962

Delft University of Technology
18.
Dingjan, Mitchell (author).
Exploring Exploration in Recommender Systems: Where? How Much? For Whom?.
Degree: 2020, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:ec22c976-7c3f-4101-80b4-ab0f3acea0eb
Recommender systems focus on automatically surfacing suitable items for users from digital collections that are too large for the user to oversee themselves. A considerable body of work exists on surfacing items that match what a user liked in the past; this way, the recommender system will exploit its knowledge of a user's comfort zone. However, application scenarios exist in which it is explicitly important to offer the user opportunities to explore items beyond their existing comfort zones. In such cases, the recommender should include items for which there is less existing evidence that the user will like them. This calls for the recommender to explore to what extent a user would be tolerant to exploration. In this thesis, we consider that different users will likely have different preferences with respect to item exploration. We propose personalized item filtering techniques for this, modeled under a multi-armed bandit framework, that consider (1) how and where exploration vs. exploitation items should be distributed in a result overview and (2) how adventurous exploration items can be, considering a user's general willingness to explore and existing familiarity with item categories in the collection. We present the results of a survey and an online quantitative experiment with 43 users of Muziekweb, a public music library that encourages taste broadening in the Netherlands, demonstrating the effectiveness of both our proposed filtering techniques that aim for personalized exploration.
Computer Science
Advisors/Committee Members: Liem, Cynthia (mentor), Hanjalic, Alan (graduation committee), Tintarev, Nava (graduation committee), Delft University of Technology (degree granting institution).
Subjects/Keywords: Recommender Systems; Exploration; Personalization; Filtering; User Preferences; Taste Broadening; Multi-Armed Bandits
APA (6th Edition):
Dingjan, M. (2020). Exploring Exploration in Recommender Systems: Where? How Much? For Whom?. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:ec22c976-7c3f-4101-80b4-ab0f3acea0eb
Chicago Manual of Style (16th Edition):
Dingjan, Mitchell. “Exploring Exploration in Recommender Systems: Where? How Much? For Whom?.” 2020. Masters Thesis, Delft University of Technology. Accessed April 14, 2021.
http://resolver.tudelft.nl/uuid:ec22c976-7c3f-4101-80b4-ab0f3acea0eb.
MLA Handbook (7th Edition):
Dingjan, Mitchell. “Exploring Exploration in Recommender Systems: Where? How Much? For Whom?.” 2020. Web. 14 Apr 2021.
Vancouver:
Dingjan M. Exploring Exploration in Recommender Systems: Where? How Much? For Whom?. [Internet] [Masters thesis]. Delft University of Technology; 2020. [cited 2021 Apr 14].
Available from: http://resolver.tudelft.nl/uuid:ec22c976-7c3f-4101-80b4-ab0f3acea0eb.
Council of Science Editors:
Dingjan M. Exploring Exploration in Recommender Systems: Where? How Much? For Whom?. [Masters Thesis]. Delft University of Technology; 2020. Available from: http://resolver.tudelft.nl/uuid:ec22c976-7c3f-4101-80b4-ab0f3acea0eb

Princeton University
19.
Han, Weidong.
Lookahead Approximations for Online Learning with Nonlinear Parametric Belief Models.
Degree: PhD, 2019, Princeton University
URL: http://arks.princeton.edu/ark:/88435/dsp019p290d227
We consider sequential online learning problems where the response surface is described by a nonlinear parametric model. We adopt a sampled belief model, which we refer to as a discrete prior. We propose multi-period lookahead policies to overcome the non-concavity in the value of information. For an infinite-horizon problem with discounted cumulative rewards, we prove asymptotic convergence properties under the proposed policies. For a finite-horizon problem with undiscounted rewards, we analyze the proposed policies through empirical studies in three different settings: a health setting where we make medical decisions to maximize health care response over time, a dynamic pricing setting where we make pricing decisions to maximize cumulative revenue, and a clinical pharmacology setting where we control dosages to minimize the deviation between actual and target effects. We also apply the modelling framework to a real-world bidding problem in online advertisement auctions, formulating it as a finite-horizon state-dependent learning problem in which we must maximize ad clicks while learning from noisy responses within a budget constraint. We demonstrate that the multi-period lookahead policies perform competitively against other state-of-the-art policies.
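A sampled belief model ("discrete prior") keeps Bayesian updating tractable by restricting the belief to a finite set of candidate parameter vectors. The sketch below shows one such update under an assumed Gaussian noise model; the response function, candidate set, and noise level are placeholders rather than anything from the thesis, whose lookahead policies operate on top of updates of this kind.

    import numpy as np

    rng = np.random.default_rng(1)

    def response(theta, x):
        # hypothetical nonlinear parametric response surface
        return theta[0] * np.exp(-theta[1] * x)

    candidates = rng.uniform(0.0, 2.0, size=(50, 2))  # sampled parameter vectors
    weights = np.full(50, 1 / 50)                     # uniform discrete prior
    noise_sd = 0.1

    def update(weights, x, y):
        # posterior weight ∝ prior weight × Gaussian likelihood of observing y at x
        lik = np.exp(-0.5 * ((y - response(candidates.T, x)) / noise_sd) ** 2)
        w = weights * lik
        return w / w.sum()

    weights = update(weights, x=1.0, y=0.6)           # one observation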
Advisors/Committee Members: Powell, Warren B (advisor).
Subjects/Keywords: Advertisement auctions;
Dynamic programming;
Multi-armed bandits;
Online learning;
Optimal learning;
Value of information
APA (6th Edition):
Han, W. (2019). Lookahead Approximations for Online Learning with Nonlinear Parametric Belief Models. (Doctoral Dissertation). Princeton University. Retrieved from http://arks.princeton.edu/ark:/88435/dsp019p290d227
Chicago Manual of Style (16th Edition):
Han, Weidong. “Lookahead Approximations for Online Learning with Nonlinear Parametric Belief Models.” 2019. Doctoral Dissertation, Princeton University. Accessed April 14, 2021.
http://arks.princeton.edu/ark:/88435/dsp019p290d227.
MLA Handbook (7th Edition):
Han, Weidong. “Lookahead Approximations for Online Learning with Nonlinear Parametric Belief Models.” 2019. Web. 14 Apr 2021.
Vancouver:
Han W. Lookahead Approximations for Online Learning with Nonlinear Parametric Belief Models. [Internet] [Doctoral dissertation]. Princeton University; 2019. [cited 2021 Apr 14].
Available from: http://arks.princeton.edu/ark:/88435/dsp019p290d227.
Council of Science Editors:
Han W. Lookahead Approximations for Online Learning with Nonlinear Parametric Belief Models. [Doctoral Dissertation]. Princeton University; 2019. Available from: http://arks.princeton.edu/ark:/88435/dsp019p290d227
20.
Hadiji, Hédi.
On some adaptivity questions in stochastic multi-armed bandits : Sur quelques questions d'adaptation dans des problèmes de bandits stochastiques.
Degree: Docteur es, Mathématiques appliquées, 2020, université Paris-Saclay
URL: http://www.theses.fr/2020UPASM021
The main topics addressed in this thesis lie in the general domain of sequential learning, and in particular stochastic multi-armed bandits, an idealized framework that models the exploration-exploitation dilemma faced under repeated choices. The thesis is divided into four chapters and an introduction. In the first part of the main body, we design a new algorithm achieving, simultaneously, distribution-dependent and distribution-free optimal guarantees. The next two chapters are devoted to adaptivity questions. First, in the context of continuum-armed bandits, we present a new algorithm which, for the first time, does not require knowledge of the regularity of the bandit problem it is facing, and we highlight the polynomial cost of adaptation in continuum-armed bandits. Then, we study the issue of adapting to the unknown support of the payoffs in bounded K-armed bandits, providing a procedure that (almost) obtains the same guarantees as if it had been given the support in advance. In the final chapter, we study a slightly different bandit setting, designed to enforce diversity-preserving conditions on the strategies. We show that the optimal regret in this setting grows at rates quite different from those of the traditional bandit setting; in particular, bounded regret is possible under some specific hypotheses.
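For reference, the two notions of optimality contrasted here are usually stated as follows for K-armed bandits with bounded rewards and suboptimality gaps Δ_k (standard textbook forms, not quoted from the thesis):

    R_T = T\mu^\star - \mathbb{E}\Big[\sum_{t=1}^{T} y_{k_t}\Big], \qquad
    R_T \le \sum_{k:\Delta_k>0} O\!\Big(\frac{\log T}{\Delta_k}\Big)\ \text{(distribution-dependent)}, \qquad
    R_T \le O\big(\sqrt{KT}\big)\ \text{(distribution-free)}.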
Advisors/Committee Members: Stoltz, Gilles (thesis director), Massart, Pascal (thesis director).
Subjects/Keywords: Bandits stochastiques à plusieurs bras; Statistiques adaptatives; Algorithme Upper Confidence Bound (UCB); Optimalité minimax; Optimalité asymptotique; Bandits à continuum de bras; Stochastic multi-armed bandits; Adaptive statistics; Upper confidence bound (UCB); Minimax optimality; Asymptotic optimality; Continuum-armed bandits
APA (6th Edition):
Hadiji, H. (2020). On some adaptivity questions in stochastic multi-armed bandits : Sur quelques questions d'adaptation dans des problèmes de bandits stochastiques. (Doctoral Dissertation). université Paris-Saclay. Retrieved from http://www.theses.fr/2020UPASM021
Chicago Manual of Style (16th Edition):
Hadiji, Hédi. “On some adaptivity questions in stochastic multi-armed bandits : Sur quelques questions d'adaptation dans des problèmes de bandits stochastiques.” 2020. Doctoral Dissertation, université Paris-Saclay. Accessed April 14, 2021.
http://www.theses.fr/2020UPASM021.
MLA Handbook (7th Edition):
Hadiji, Hédi. “On some adaptivity questions in stochastic multi-armed bandits : Sur quelques questions d'adaptation dans des problèmes de bandits stochastiques.” 2020. Web. 14 Apr 2021.
Vancouver:
Hadiji H. On some adaptivity questions in stochastic multi-armed bandits : Sur quelques questions d'adaptation dans des problèmes de bandits stochastiques. [Internet] [Doctoral dissertation]. université Paris-Saclay; 2020. [cited 2021 Apr 14].
Available from: http://www.theses.fr/2020UPASM021.
Council of Science Editors:
Hadiji H. On some adaptivity questions in stochastic multi-armed bandits : Sur quelques questions d'adaptation dans des problèmes de bandits stochastiques. [Doctoral Dissertation]. université Paris-Saclay; 2020. Available from: http://www.theses.fr/2020UPASM021
21.
Ménard, Pierre.
Sur la notion d'optimalité dans les problèmes de bandit stochastique : On the notion of optimality in the stochastic multi-armed bandit problems.
Degree: Docteur es, Mathématiques appliquées, 2018, Université Toulouse III – Paul Sabatier
URL: http://www.theses.fr/2018TOU30087
The topics addressed in this thesis lie in statistical machine learning and sequential statistics. Our main framework is the stochastic multi-armed bandit problem. In this work we revisit lower bounds on the regret, obtaining non-asymptotic, distribution-dependent bounds with simple proofs based only on well-known properties of the Kullback-Leibler divergence. These bounds show in particular that in an initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. We then propose algorithms for regret minimization in stochastic bandit models with exponential families of distributions, or with distributions only assumed to be supported on the unit interval, that are simultaneously asymptotically optimal (in the sense of the Lai and Robbins lower bound) and minimax optimal. We also analyze the sample complexity of sequentially identifying the distribution whose expectation is closest to a given threshold, with and without the assumption that the mean values of the distributions are increasing. This work is motivated by phase I clinical trials, a practically important setting where the arm means are increasing by nature. Finally, we extend Fano's inequality, which controls the average probability of (disjoint) events in terms of the average of some Kullback-Leibler divergences, to arbitrary random variables bounded on the unit interval. Several new applications follow, the most important being a lower bound on the concentration rate of Bayesian posteriors and a regret lower bound for a non-stochastic bandit problem.
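Asymptotic optimality here is with respect to the Lai and Robbins lower bound, whose usual statement is that any uniformly good policy satisfies (with Δ_k the gap of arm k, and ν_k, ν* the reward distributions of arm k and of an optimal arm):

    \liminf_{T\to\infty}\ \frac{R_T}{\log T}\ \ge\ \sum_{k:\,\Delta_k>0} \frac{\Delta_k}{\mathrm{KL}(\nu_k,\nu^\star)}.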
Advisors/Committee Members: Garivier, Aurélien (thesis director), Stoltz, Gilles (thesis director).
Subjects/Keywords: Bandits stochastiques multi-bras; Théorie de l'information; Bornes inférieures non-asymptotiques; Analyse du regret; Optimalité asymptotique; Optimalité minimax; Borne supérieure de confiance; Stochastic multi-armed bandits; Information theory
APA (6th Edition):
Ménard, P. (2018). Sur la notion d'optimalité dans les problèmes de bandit stochastique : On the notion of optimality in the stochastic multi-armed bandit problems. (Doctoral Dissertation). Université Toulouse III – Paul Sabatier. Retrieved from http://www.theses.fr/2018TOU30087
Chicago Manual of Style (16th Edition):
Ménard, Pierre. “Sur la notion d'optimalité dans les problèmes de bandit stochastique : On the notion of optimality in the stochastic multi-armed bandit problems.” 2018. Doctoral Dissertation, Université Toulouse III – Paul Sabatier. Accessed April 14, 2021.
http://www.theses.fr/2018TOU30087.
MLA Handbook (7th Edition):
Ménard, Pierre. “Sur la notion d'optimalité dans les problèmes de bandit stochastique : On the notion of optimality in the stochastic multi-armed bandit problems.” 2018. Web. 14 Apr 2021.
Vancouver:
Ménard P. Sur la notion d'optimalité dans les problèmes de bandit stochastique : On the notion of optimality in the stochastic multi-armed bandit problems. [Internet] [Doctoral dissertation]. Université Toulouse III – Paul Sabatier; 2018. [cited 2021 Apr 14].
Available from: http://www.theses.fr/2018TOU30087.
Council of Science Editors:
Ménard P. Sur la notion d'optimalité dans les problèmes de bandit stochastique : On the notion of optimality in the stochastic multi-armed bandit problems. [Doctoral Dissertation]. Université Toulouse III – Paul Sabatier; 2018. Available from: http://www.theses.fr/2018TOU30087
22.
Jedor, Matthieu.
Bandit algorithms for recommender system optimization : Algorithmes de bandit pour l'optimisation des systèmes de recommandation.
Degree: Docteur es, Mathématiques appliquées, 2020, université Paris-Saclay
URL: http://www.theses.fr/2020UPASM027
In this PhD thesis, we study the optimization of recommender systems with the objective of providing more refined suggestions of items for a user. The task is modeled using the multi-armed bandit framework. In a first part, we address two problems that commonly occur in recommender systems: the large number of items to handle and the management of sponsored content. In a second part, we investigate the empirical performance of bandit algorithms, and in particular how to tune conventional algorithms to improve results in the stationary and non-stationary environments that arise in practice. This leads us to analyze, both theoretically and empirically, the greedy algorithm, which in some cases outperforms the state of the art.
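The greedy algorithm analyzed here is, in its textbook form, the simplest possible policy: initialize each arm once, then always play the current empirical best with no exploration bonus. A minimal sketch, with placeholder means and horizon:

    import random

    random.seed(0)
    K, T = 10, 100000
    mu = [random.random() for _ in range(K)]                  # hypothetical means
    counts = [1] * K
    sums = [1.0 if random.random() < m else 0.0 for m in mu]  # one pull per arm

    for t in range(K, T):
        k = max(range(K), key=lambda a: sums[a] / counts[a])  # empirical best
        sums[k] += 1.0 if random.random() < mu[k] else 0.0
        counts[k] += 1

Greedy can lock onto a suboptimal arm; one intuition in the bandit literature for its strong empirical showing is that with many arms, several are near-optimal, so the lock-in costs little.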
Advisors/Committee Members: Perchet, Vianney (thesis director).
Subjects/Keywords: Apprentissage par renforcement; Bandit multi-Bras; Système de recommandation; Commerce en ligne; Reinforcement learning; Multi-Armed bandits; Recommender system; E-Commerce
APA (6th Edition):
Jedor, M. (2020). Bandit algorithms for recommender system optimization : Algorithmes de bandit pour l'optimisation des systèmes de recommandation. (Doctoral Dissertation). université Paris-Saclay. Retrieved from http://www.theses.fr/2020UPASM027
Chicago Manual of Style (16th Edition):
Jedor, Matthieu. “Bandit algorithms for recommender system optimization : Algorithmes de bandit pour l'optimisation des systèmes de recommandation.” 2020. Doctoral Dissertation, université Paris-Saclay. Accessed April 14, 2021.
http://www.theses.fr/2020UPASM027.
MLA Handbook (7th Edition):
Jedor, Matthieu. “Bandit algorithms for recommender system optimization : Algorithmes de bandit pour l'optimisation des systèmes de recommandation.” 2020. Web. 14 Apr 2021.
Vancouver:
Jedor M. Bandit algorithms for recommender system optimization : Algorithmes de bandit pour l'optimisation des systèmes de recommandation. [Internet] [Doctoral dissertation]. université Paris-Saclay; 2020. [cited 2021 Apr 14].
Available from: http://www.theses.fr/2020UPASM027.
Council of Science Editors:
Jedor M. Bandit algorithms for recommender system optimization : Algorithmes de bandit pour l'optimisation des systèmes de recommandation. [Doctoral Dissertation]. université Paris-Saclay; 2020. Available from: http://www.theses.fr/2020UPASM027

Université Paris-Sud – Paris XI
23.
Galichet, Nicolas.
Contributions to Multi-Armed Bandits : Risk-Awareness and Sub-Sampling for Linear Contextual Bandits : Contributions aux bandits manchots : gestion du risque et sous-échantillonnage pour les bandits contextuels linéaires.
Degree: Docteur es, Informatique, 2015, Université Paris-Sud – Paris XI
URL: http://www.theses.fr/2015PA112242
This thesis concerns sequential decision making in unknown environments, and more specifically the multi-armed bandit (MAB) setting, defined by Robbins and Lai in the 1950s. Since the 2000s, this setting has been the subject of extensive theoretical and algorithmic research centered on the trade-off between exploration and exploitation: exploitation consists of repeating as often as possible the choices that have proven best so far, while exploration consists of trying choices that have rarely been tried, to check that the best choices have indeed been identified. Applications of MAB approaches range from the choice of medical treatments to recommendation in e-commerce and the search for optimal energy policies. The contributions presented in this manuscript address the exploration-exploitation trade-off from two specific angles. The first concerns risk: any exploration in an unknown context can lead to undesirable consequences; for example, exploring a robot's behaviors can result in damage to the robot or its environment. In this context, the objective is a trade-off between exploration, exploitation, and risk (EER). Several original algorithms are proposed within the EER framework. Under strong assumptions, the MIN algorithm offers logarithmic regret guarantees, matching the state of the art; it is also very robust, in contrast with the strong sensitivity to hyper-parameter values of, e.g., (Auer et al. 2002). The MARAB algorithm targets a criterion inspired by the economics literature (Conditional Value at Risk, CVaR) and shows excellent empirical performance compared to (Sani et al. 2012), but without theoretical guarantees. Finally, the MARABOUT algorithm modifies the estimation of the CVaR criterion so as to obtain theoretical guarantees while retaining good empirical behavior. The second line of research concerns contextual bandits, where additional information about the decision context is available, for example the patient's state variables in a medical setting or the user's in a recommendation setting. The study focuses on the choice between arms that have so far been pulled different numbers of times. This choice generally relies on the notion of optimism, comparing the upper confidence bounds of the intervals associated with the arms considered. Another approach, called BESA and proposed by (Baransi et al. 2014), relies on sub-sampling the values drawn for the most-visited arms, thereby reducing the problem to the case where all arms have been pulled the same number of times.
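The Conditional Value at Risk criterion mentioned above scores an arm by the mean of its worst outcomes rather than its overall mean. An illustrative empirical version follows; the reward histories and the level alpha are invented, and the thesis's MARAB/MARABOUT algorithms add confidence terms on top of estimates like this one.

    import numpy as np

    def empirical_cvar(rewards, alpha=0.2):
        # mean of the worst alpha-fraction of observed rewards (lower tail)
        r = np.sort(np.asarray(rewards))
        tail = max(1, int(np.ceil(alpha * len(r))))
        return r[:tail].mean()

    histories = [[0.9, 0.1, 0.8, 0.7],      # higher mean, heavy downside
                 [0.6, 0.55, 0.65, 0.6]]    # lower mean, safe
    best = max(range(2), key=lambda a: empirical_cvar(histories[a]))

Here the second arm has the lower mean but the higher CVaR, so a risk-averse player prefers it.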
Advisors/Committee Members: Sebag, Michèle (thesis director).
Subjects/Keywords: Prise de décision séquentielle; Apprentissage automatique; Bandits manchots; Sous-échantillonnage; Aversion au risque; CVaR; Exploration vs Exploitation vs Risque; Bandits linéaires; Bandits contextuels; Analyse de regret; Sequential decision making; Machine learning; Multi-armed bandits; Sub-Sampling; Risk-aversion; CvaR; Exploration vs Exploitation vs Safety; Linear bandits; Contextual bandits; Regret analysis
APA (6th Edition):
Galichet, N. (2015). Contributions to Multi-Armed Bandits : Risk-Awareness and Sub-Sampling for Linear Contextual Bandits : Contributions aux bandits manchots : gestion du risque et sous-échantillonnage pour les bandits contextuels linéaires. (Doctoral Dissertation). Université Paris-Sud – Paris XI. Retrieved from http://www.theses.fr/2015PA112242
Chicago Manual of Style (16th Edition):
Galichet, Nicolas. “Contributions to Multi-Armed Bandits : Risk-Awareness and Sub-Sampling for Linear Contextual Bandits : Contributions aux bandits manchots : gestion du risque et sous-échantillonnage pour les bandits contextuels linéaires.” 2015. Doctoral Dissertation, Université Paris-Sud – Paris XI. Accessed April 14, 2021.
http://www.theses.fr/2015PA112242.
MLA Handbook (7th Edition):
Galichet, Nicolas. “Contributions to Multi-Armed Bandits : Risk-Awareness and Sub-Sampling for Linear Contextual Bandits : Contributions aux bandits manchots : gestion du risque et sous-échantillonnage pour les bandits contextuels linéaires.” 2015. Web. 14 Apr 2021.
Vancouver:
Galichet N. Contributions to Multi-Armed Bandits : Risk-Awareness and Sub-Sampling for Linear Contextual Bandits : Contributions aux bandits manchots : gestion du risque et sous-échantillonnage pour les bandits contextuels linéaires. [Internet] [Doctoral dissertation]. Université Paris-Sud – Paris XI; 2015. [cited 2021 Apr 14].
Available from: http://www.theses.fr/2015PA112242.
Council of Science Editors:
Galichet N. Contributions to Multi-Armed Bandits : Risk-Awareness and Sub-Sampling for Linear Contextual Bandits : Contributions aux bandits manchots : gestion du risque et sous-échantillonnage pour les bandits contextuels linéaires. [Doctoral Dissertation]. Université Paris-Sud – Paris XI; 2015. Available from: http://www.theses.fr/2015PA112242

University of Illinois – Urbana-Champaign
24.
Yekkehkhany, Ali.
Risk-averse multi-armed bandits and game theory.
Degree: PhD, Electrical & Computer Engr, 2020, University of Illinois – Urbana-Champaign
URL: http://hdl.handle.net/2142/108439
The multi-armed bandit (MAB) and game theory literature is mainly focused on the expected cumulative reward and the expected payoffs in a game, respectively. In contrast, the rewards and the payoffs are often random variables whose expected values only capture a vague idea of the overall distribution. The focus of this dissertation is to study the fundamental limits of the existing bandit and game theory problems in a risk-averse framework and propose new ideas that address the shortcomings. The author believes that human beings are mostly risk-averse, so studying multi-armed bandits and game theory from the point of view of risk aversion, rather than expected reward/payoff, better captures reality. In this manner, a specific class of multi-armed bandits, called explore-then-commit bandits, and stochastic games are studied in this dissertation, based on the notion of Risk-Averse Best Action Decision with Incomplete Information (R-ABADI; Abadi is the maiden name of the author's mother). The goal of classical multi-armed bandits is to exploit the arm with the maximum score, defined as the expected value of the arm reward. Instead, we propose a new definition of score that is derived from the joint distribution of all arm rewards and captures the reward of an arm relative to those of all other arms. We use a similar idea for games and propose a risk-averse R-ABADI equilibrium in game theory that is possibly different from the Nash equilibrium. The payoff distributions are taken into account to derive the risk-averse equilibrium, while the expected payoffs are used to find the Nash equilibrium. The fundamental properties of games, e.g. pure and mixed risk-averse R-ABADI equilibria and strict dominance, are studied in the new framework and the results are extended to finite-time games. Furthermore, stochastic congestion games are studied from a risk-averse perspective and three classes of equilibria are proposed for such games. It is shown by examples that the risk-averse behavior of travelers in a stochastic congestion game can improve the price of anarchy in Pigou and Braess networks. Furthermore, the Braess paradox does not occur to the extent proposed originally when travelers are risk-averse.
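Explore-then-commit bandits, the class studied here, split the horizon into a uniform exploration phase and a commit phase. A minimal sketch with the classical empirical-mean commit rule follows (means, m, and T are placeholders); the dissertation's R-ABADI score would replace that commit rule with a risk-averse one derived from the joint reward distribution.

    import random

    random.seed(1)
    K, m, T = 4, 50, 10000
    mu = [0.3, 0.5, 0.6, 0.65]                      # placeholder Bernoulli means

    # exploration phase: m pulls of every arm
    history = [[1.0 if random.random() < mu[k] else 0.0 for _ in range(m)]
               for k in range(K)]

    # commit phase: play the empirical best arm for the rest of the horizon
    best = max(range(K), key=lambda k: sum(history[k]) / m)
    reward = sum(map(sum, history))
    reward += sum(1.0 if random.random() < mu[best] else 0.0
                  for _ in range(T - K * m))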
Advisors/Committee Members: Nagi, Rakesh (advisor), Nagi, Rakesh (Committee Chair), Hajek, Bruce (committee member), Shomorony, Ilan (committee member), Srikant, Rayadurgam (committee member).
Subjects/Keywords: Online Learning; Multi-Armed Bandits; Exploration-Exploitation; Explore-Then-Commit Bandits; Risk-Aversion; Game Theory; Stochastic Game Theory; Congestion Games; Affinity Scheduling; MapReduce; Data Center
APA (6th Edition):
Yekkehkhany, A. (2020). Risk-averse multi-armed bandits and game theory. (Doctoral Dissertation). University of Illinois – Urbana-Champaign. Retrieved from http://hdl.handle.net/2142/108439
Chicago Manual of Style (16th Edition):
Yekkehkhany, Ali. “Risk-averse multi-armed bandits and game theory.” 2020. Doctoral Dissertation, University of Illinois – Urbana-Champaign. Accessed April 14, 2021.
http://hdl.handle.net/2142/108439.
MLA Handbook (7th Edition):
Yekkehkhany, Ali. “Risk-averse multi-armed bandits and game theory.” 2020. Web. 14 Apr 2021.
Vancouver:
Yekkehkhany A. Risk-averse multi-armed bandits and game theory. [Internet] [Doctoral dissertation]. University of Illinois – Urbana-Champaign; 2020. [cited 2021 Apr 14].
Available from: http://hdl.handle.net/2142/108439.
Council of Science Editors:
Yekkehkhany A. Risk-averse multi-armed bandits and game theory. [Doctoral Dissertation]. University of Illinois – Urbana-Champaign; 2020. Available from: http://hdl.handle.net/2142/108439

Rice University
25.
Lan, Shiting.
Machine Learning Techniques for Personalized Learning.
Degree: PhD, Engineering, 2016, Rice University
URL: http://hdl.handle.net/1911/96250
Recent developments in personalized learning, powered by recent advances in machine learning and big data, have the potential to revamp the “one-size-fits-all” approach in today’s education by delivering a fully personalized learning experience for each student. The key behind these developments is to create a personalized learning system (PLS), which can automatically deliver analytics and feedback on the students’ progress and recommend learning actions for the students to take. A PLS presents a scalable approach to personalized learning by analyzing the data students generate while interacting with learning resources (i.e., textbook sections, lecture videos, assessment questions, etc.). Such an approach relies on only minimal human effort and has the ability to scale to applications with millions of students, thousands of learning resources, and hundreds of domains. In this thesis, we develop a series of machine learning techniques for personalized learning, building on our previous work on sparse factor analysis (SPARFA) for learning and content analytics.

To begin with, we develop a new, nonlinear latent variable model that we call the dealbreaker model, in which a student’s success probability is determined by their weakest concept mastery. We develop efficient parameter inference algorithms for this model using novel methods for nonconvex optimization. We demonstrate that the dealbreaker model excels at prediction and that the parameters learned are interpretable: they provide key insights into which concepts are critical (i.e., the “dealbreakers”) to answering a question correctly. We also apply the dealbreaker model to a movie rating dataset, illustrating its broad applicability to applications other than education.

Then, we propose SPARFA-Trace, a new framework for time-varying learning and content analytics. We develop a novel message passing-based, blind, approximate Kalman filtering and smoothing algorithm for SPARFA that jointly traces student concept knowledge evolution over time, analyzes student concept knowledge state transitions (induced by studying learning resources, such as textbook sections, lecture videos, etc., or by the forgetting effect), and estimates the content organization and difficulty of the questions in assessments. These quantities are estimated solely from binary-valued (correct/incorrect) graded student response data and the specific actions each student performs (e.g., answering a question or studying a learning resource) at each time instant.

Additionally, we study the problem of automatic grading and feedback generation for the kinds of open-response mathematical questions that figure prominently in STEM (science, technology, engineering, and mathematics) courses. Our data-driven framework for mathematical language processing (MLP) leverages solution data from a large number of students to evaluate the correctness of their solutions, assign partial-credit scores, and provide feedback to each student on the likely locations of any errors. MLP takes inspiration from the success of natural…
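The dealbreaker model’s weakest-link structure can be written schematically as below, where c_{ik} is student i’s mastery of concept k, K_j the set of concepts involved in question j, d_j a difficulty term, and σ a link function; this is only an illustrative form, not the thesis’s exact parameterization:

    \Pr\big[\text{student } i \text{ answers question } j \text{ correctly}\big]
      = \sigma\Big(\min_{k \in \mathcal{K}_j} c_{ik} - d_j\Big).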
Advisors/Committee Members: Baraniuk, Richard (advisor).
Subjects/Keywords: Personalized learning; machine learning; convex optimization; Bayesian nonparametrics; contextual multi-armed bandits; Kalman filtering; educational data mining
APA (6th Edition):
Lan, S. (2016). Machine Learning Techniques for Personalized Learning. (Doctoral Dissertation). Rice University. Retrieved from http://hdl.handle.net/1911/96250
Chicago Manual of Style (16th Edition):
Lan, Shiting. “Machine Learning Techniques for Personalized Learning.” 2016. Doctoral Dissertation, Rice University. Accessed April 14, 2021.
http://hdl.handle.net/1911/96250.
MLA Handbook (7th Edition):
Lan, Shiting. “Machine Learning Techniques for Personalized Learning.” 2016. Web. 14 Apr 2021.
Vancouver:
Lan S. Machine Learning Techniques for Personalized Learning. [Internet] [Doctoral dissertation]. Rice University; 2016. [cited 2021 Apr 14].
Available from: http://hdl.handle.net/1911/96250.
Council of Science Editors:
Lan S. Machine Learning Techniques for Personalized Learning. [Doctoral Dissertation]. Rice University; 2016. Available from: http://hdl.handle.net/1911/96250

Universitat Pompeu Fabra
26.
Wilhelmi Roca, Francesc.
Towards spatial reuse in future wireless local area networks: a sequential learning approach.
Degree: Departament de Tecnologies de la Informació i les Comunicacions, 2020, Universitat Pompeu Fabra
URL: http://hdl.handle.net/10803/669970
The spatial reuse (SR) operation is gaining momentum in the latest family of IEEE 802.11 standards because of the overwhelming requirements posed by next-generation wireless networks. In particular, the growing traffic demand and the number of concurrent devices compromise the efficiency of increasingly crowded wireless local area networks (WLANs) and call their decentralized nature into question. The SR operation, initially introduced by the IEEE 802.11ax-2021 standard and studied further in IEEE 802.11be-2024, aims to increase the number of concurrent transmissions in an overlapping basic service set (OBSS) by adjusting sensitivity and transmit power control, thereby improving spectral efficiency. Our study of the SR operation shows outstanding potential for increasing the number of simultaneous transmissions in crowded deployments, thus contributing to the development of next-generation low-latency applications. However, the potential benefits of SR are currently limited by the rigidity of the mechanism introduced for 11ax and by the lack of coordination among the BSSs that implement it. The SR operation is evolving toward coordinated schemes in which different BSSs cooperate, but coordination entails communication and synchronization overhead, which impacts WLAN performance. Moreover, the coordinated scheme is incompatible with devices using older IEEE 802.11 versions, which could degrade the performance of existing networks. For these reasons, this thesis evaluates the feasibility of decentralized mechanisms for SR and thoroughly analyzes the main impediments and shortcomings that may result. Our goal is to shed light on the future shape of WLANs with respect to SR optimization, and on whether their decentralized character should be maintained or it is preferable to evolve toward coordinated and centralized deployments. To address SR in a decentralized way, we focus on Artificial Intelligence (AI) and propose using a class of sequential learning-based methods called Multi-Armed Bandits (MABs). The MAB framework fits the decentralized SR problem because it addresses the uncertainty caused by the simultaneous operation of multiple devices (i.e., a multi-player setting) and the resulting lack of information. MABs can cope with the complexity behind the spatial interactions among devices that result from modifying their sensitivity and transmit power. In this regard, our results indicate significant performance gains (up to 100%) in highly dense deployments. Nevertheless, applying multi-agent machine learning raises several issues that may compromise the performance of the devices in a network (the definition of joint objectives, the convergence horizon, scalability aspects, or non-stationarity). Moreover, our study…
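Decentralized SR with MABs can be pictured as each BSS independently running a bandit over its own sensitivity/transmit-power configurations, learning only from its own observed throughput. A sketch with epsilon-greedy agents; the configuration grid, reward function, and all constants are placeholders, not the thesis’s simulator.

    import random

    random.seed(2)
    configs = [(cca, txp) for cca in (-82, -72, -62) for txp in (5, 10, 15, 20)]
    n_bss, eps, T = 4, 0.1, 5000

    def throughput(choices):
        # placeholder environment: joint configuration choices -> per-BSS reward
        return [random.random() for _ in choices]

    stats = [[(0, 0.0) for _ in configs] for _ in range(n_bss)]  # (count, sum)

    for t in range(T):
        picks = []
        for b in range(n_bss):
            if t == 0 or random.random() < eps:
                picks.append(random.randrange(len(configs)))     # explore
            else:
                picks.append(max(range(len(configs)),            # exploit
                                 key=lambda a: stats[b][a][1] / max(1, stats[b][a][0])))
        rewards = throughput(picks)                  # each BSS sees only its own reward
        for b in range(n_bss):
            c, s = stats[b][picks[b]]
            stats[b][picks[b]] = (c + 1, s + rewards[b])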
Advisors/Committee Members: Bellalta, Boris (director), Cano, Cristina (director), Jonsson, Anders, 1973- (director).
Subjects/Keywords: Artificial Intelligence; IEEE 802.11ax; Multi-armed bandits; Sequential learning; Spatial reuse; WLAN; Intel·ligència artifical; Aprenentatge seqüencial; Reutilizació espacial; 62
APA (6th Edition):
Wilhelmi Roca, F. (2020). Towards spatial reuse in future wireless local area networks: a sequential learning approach. (Thesis). Universitat Pompeu Fabra. Retrieved from http://hdl.handle.net/10803/669970
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Wilhelmi Roca, Francesc. “Towards spatial reuse in future wireless local area networks: a sequential learning approach.” 2020. Thesis, Universitat Pompeu Fabra. Accessed April 14, 2021.
http://hdl.handle.net/10803/669970.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Wilhelmi Roca, Francesc. “Towards spatial reuse in future wireless local area networks: a sequential learning approach.” 2020. Web. 14 Apr 2021.
Vancouver:
Wilhelmi Roca F. Towards spatial reuse in future wireless local area networks: a sequential learning approach. [Internet] [Thesis]. Universitat Pompeu Fabra; 2020. [cited 2021 Apr 14].
Available from: http://hdl.handle.net/10803/669970.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Wilhelmi Roca F. Towards spatial reuse in future wireless local area networks: a sequential learning approach. [Thesis]. Universitat Pompeu Fabra; 2020. Available from: http://hdl.handle.net/10803/669970
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
27.
Dorff, Rebecca.
Modelling Infertility with Markov Chains.
Degree: MS, 2013, Brigham Young University
URL: https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=5069&context=etd
Infertility affects approximately 15% of couples. Testing and interventions are costly in time, money, and emotional energy. This paper discusses using Markov decision and multi-armed bandit processes to identify a systematic approach to interventions that leads to the desired baby while minimizing costs.
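As a rough illustration of the kind of sequential model the abstract mentions (not the thesis's actual formulation), each intervention can be treated as a bandit arm with an unknown success probability and a known relative cost, with arms chosen by Thompson sampling under a cost penalty. The intervention names, costs, and penalty weight below are hypothetical.

    import random

    # Hypothetical interventions with made-up relative costs.
    interventions = {"lifestyle": 0.1, "medication": 0.3, "IUI": 1.0, "IVF": 3.0}
    alpha = {k: 1 for k in interventions}  # Beta-posterior success counts (+1 prior)
    beta = {k: 1 for k in interventions}   # Beta-posterior failure counts (+1 prior)
    cost_weight = 0.05                     # trade-off between success odds and cost

    def choose():
        # Sample a plausible success probability per arm, then penalize by cost.
        scores = {k: random.betavariate(alpha[k], beta[k]) - cost_weight * c
                  for k, c in interventions.items()}
        return max(scores, key=scores.get)

    def observe(k, success):
        if success:
            alpha[k] += 1
        else:
            beta[k] += 1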
Subjects/Keywords: Infertility; Medical Diagnosis; Markov Decision Processes; Multi-armed Bandits; Mathematics
…multi-armed bandit approach. Multi-armed bandits have gained popularity in simulating clinical… …failures [49]. The difficulty of using a straight multi-armed bandit problem with… …to the end of the horizon for any optimal policy. 3.4 Multi-Armed Bandit Problem. One… …class of Markov decision processes is the multi-armed bandit problem, or N-armed bandit… …u_m = s_m for some m ≠ i. Multi-armed bandit problems have been used to simulate clinical…
APA (6th Edition):
Dorff, R. (2013). Modelling Infertility with Markov Chains. (Masters Thesis). Brigham Young University. Retrieved from https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=5069&context=etd
Chicago Manual of Style (16th Edition):
Dorff, Rebecca. “Modelling Infertility with Markov Chains.” 2013. Masters Thesis, Brigham Young University. Accessed April 14, 2021.
https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=5069&context=etd.
MLA Handbook (7th Edition):
Dorff, Rebecca. “Modelling Infertility with Markov Chains.” 2013. Web. 14 Apr 2021.
Vancouver:
Dorff R. Modelling Infertility with Markov Chains. [Internet] [Masters thesis]. Brigham Young University; 2013. [cited 2021 Apr 14].
Available from: https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=5069&context=etd.
Council of Science Editors:
Dorff R. Modelling Infertility with Markov Chains. [Masters Thesis]. Brigham Young University; 2013. Available from: https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=5069&context=etd

Université de Lorraine
28.
Collet, Timothé.
Méthodes optimistes d’apprentissage actif pour la classification : Optimistic Methods in Active Learning for Classification.
Degree: Docteur es, Informatique, 2016, Université de Lorraine
URL: http://www.theses.fr/2016LORR0084
► Classification relies on a dataset labeled by an expert. The larger the dataset, the better the performance of…
(more)
▼ A classification problem makes use of a training set consisting of data labeled by an oracle. The larger the training set, the better the performance. However, requesting the oracle may be costly. The goal of Active Learning is thus to minimize the number of requests to the oracle while achieving the best performance. To do so, the data presented to the oracle must be carefully selected from a large number of unlabeled instances acquired at no cost. However, the true profitability of labeling a particular instance may not be known perfectly; it can only be estimated along with a measure of uncertainty. To increase the precision of the estimate, we need to label more data. There is thus a dilemma between labeling data in order to increase the performance of the classifier and labeling data to better learn how to select data. This dilemma is well studied in the context of finite-budget optimization under the name of the exploration versus exploitation dilemma. The best-known solutions make use of the principle of Optimism in the Face of Uncertainty. In this thesis, we show that it is possible to adapt this principle to the active learning problem for classification. Several algorithms have been developed for classifiers of increasing complexity, each of them using the principle of Optimism in the Face of Uncertainty, and their performances have been evaluated empirically.
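A minimal sketch of how Optimism in the Face of Uncertainty can drive the selection, assuming some estimator of each unlabeled candidate's informativeness (the estimator itself, and the names below, are illustrative rather than taken from the thesis): query the candidate whose estimate plus a confidence bonus is largest.

    import math

    def ucb_query(estimates, counts, t, c=1.0):
        """estimates[i]: current informativeness estimate of candidate i;
        counts[i]: how many observations back that estimate; t: current round."""
        def ucb(i):
            if counts[i] == 0:
                return float("inf")  # optimistic: try never-evaluated candidates first
            return estimates[i] + c * math.sqrt(2 * math.log(t) / counts[i])
        return max(range(len(estimates)), key=ucb)

    # Example: the untried candidate (count 0) is queried first.
    assert ucb_query([0.3, 0.5, 0.4], [2, 5, 0], t=8) == 2

The bonus shrinks as a candidate's estimate is backed by more observations, so the budget naturally shifts from exploring the informativeness criterion to exploiting it.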
Advisors/Committee Members: Pietquin, Olivier (thesis director).
Subjects/Keywords: Optimism in the Face of Uncertainty; Classification; Active Learning; Multi-armed Bandit; 006.33
APA (6th Edition):
Collet, T. (2016). Méthodes optimistes d’apprentissage actif pour la classification : Optimistic Methods in Active Learning for Classification. (Doctoral Dissertation). Université de Lorraine. Retrieved from http://www.theses.fr/2016LORR0084
Chicago Manual of Style (16th Edition):
Collet, Timothé. “Méthodes optimistes d’apprentissage actif pour la classification : Optimistic Methods in Active Learning for Classification.” 2016. Doctoral Dissertation, Université de Lorraine. Accessed April 14, 2021.
http://www.theses.fr/2016LORR0084.
MLA Handbook (7th Edition):
Collet, Timothé. “Méthodes optimistes d’apprentissage actif pour la classification : Optimistic Methods in Active Learning for Classification.” 2016. Web. 14 Apr 2021.
Vancouver:
Collet T. Méthodes optimistes d’apprentissage actif pour la classification : Optimistic Methods in Active Learning for Classification. [Internet] [Doctoral dissertation]. Université de Lorraine; 2016. [cited 2021 Apr 14].
Available from: http://www.theses.fr/2016LORR0084.
Council of Science Editors:
Collet T. Méthodes optimistes d’apprentissage actif pour la classification : Optimistic Methods in Active Learning for Classification. [Doctoral Dissertation]. Université de Lorraine; 2016. Available from: http://www.theses.fr/2016LORR0084

National University of Ireland – Galway
29.
Hassan, Umair ul.
Adaptive task assignment in spatial crowdsourcing.
Degree: 2016, National University of Ireland – Galway
URL: http://hdl.handle.net/10379/6035
► Spatial crowdsourcing has emerged as a new paradigm for solving difficult problems in the physical world. It engages a large number of human workers in…
(more)
▼ Spatial crowdsourcing has emerged as a new paradigm for solving difficult problems in the physical world. It engages a large number of human workers in scenarios such as crisis management and smart cities. The utility of spatial crowdsourcing generally depends on who performs the task. On one hand, workers may not perform assigned tasks on time. On the other hand, the performed tasks may not meet the predefined quality criteria. In the case of spatial crowdsourcing, the success of an assignment depends on factors such as task location, travel distance, or expected reward. This necessitates an appropriate task assignment process to optimize the utility of spatial crowdsourcing.
The design of the task assignment process faces three primary research challenges: dynamism, heterogeneity, and uncertainty. The goal of this thesis is to address these challenges; therefore, it proposes a conceptual framework to analyze the dimensions of dynamic task assignment in spatial crowdsourcing. To formalize the research problem, this thesis defines dynamic task assignment as a repeated decision-making problem under uncertainty and heterogeneity.
Uncertainty refers to the limited knowledge about the reliability of workers for assigned tasks, and heterogeneity characterizes the differences in reliability and expertise of workers. The proposed formalization is referred to as the adaptive assignment problem in spatial crowdsourcing. The adaptive assignment problem combines online optimization with heuristic-based learning to balance the exploration-exploitation trade-off during repeated assignments. The problem is instantiated in four different scenarios of spatial crowdsourcing to highlight specific requirements of assignment algorithms. An agent-based simulation methodology is employed to evaluate the algorithms proposed for each scenario. Empirical evaluation provides evidence of the performance of the algorithms on synthetic and real-world data.
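To illustrate the repeated decision-making view (a sketch under simplified assumptions, not the thesis's algorithms), one can treat each worker as a bandit arm, estimate reliability online, and restrict each assignment to workers within travel range; the distance cutoff, coordinates, and feedback signal below are assumptions for the example.

    import math
    import random

    class AssignmentBandit:
        def __init__(self, n_workers, epsilon=0.1):
            self.succ = [0] * n_workers    # tasks completed acceptably
            self.tries = [0] * n_workers   # tasks assigned
            self.epsilon = epsilon

        def assign(self, task_loc, worker_locs, max_dist=5.0):
            feasible = [w for w, loc in enumerate(worker_locs)
                        if math.dist(task_loc, loc) <= max_dist]
            if not feasible:
                return None
            if random.random() < self.epsilon:
                return random.choice(feasible)  # explore
            # Exploit: highest Laplace-smoothed reliability estimate.
            return max(feasible, key=lambda w: (self.succ[w] + 1) / (self.tries[w] + 2))

        def feedback(self, worker, completed_on_time):
            self.tries[worker] += 1
            self.succ[worker] += int(completed_on_time)

    bandit = AssignmentBandit(n_workers=3)
    w = bandit.assign(task_loc=(0.0, 0.0),
                      worker_locs=[(1.0, 1.0), (4.0, 0.0), (9.0, 9.0)])
    if w is not None:
        bandit.feedback(w, completed_on_time=True)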
Advisors/Committee Members: Curry, Edward (advisor).
Subjects/Keywords: Spatial crowdsourcing; Crowdsourcing; Task assignment; Online algorithms; Multi-armed bandit; Combinatorial bandits; Fractional optimization; Location diversity; Agent-based simulation; Data analytics
APA (6th Edition):
Hassan, U. u. (2016). Adaptive task assignment in spatial crowdsourcing. (Thesis). National University of Ireland – Galway. Retrieved from http://hdl.handle.net/10379/6035
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Hassan, Umair ul. “Adaptive task assignment in spatial crowdsourcing.” 2016. Thesis, National University of Ireland – Galway. Accessed April 14, 2021.
http://hdl.handle.net/10379/6035.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Hassan, Umair ul. “Adaptive task assignment in spatial crowdsourcing.” 2016. Web. 14 Apr 2021.
Vancouver:
Hassan Uu. Adaptive task assignment in spatial crowdsourcing. [Internet] [Thesis]. National University of Ireland – Galway; 2016. [cited 2021 Apr 14].
Available from: http://hdl.handle.net/10379/6035.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Hassan Uu. Adaptive task assignment in spatial crowdsourcing. [Thesis]. National University of Ireland – Galway; 2016. Available from: http://hdl.handle.net/10379/6035
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
30.
Lattimore, Finnian Rachel.
Learning how to act: making good decisions with machine learning.
Degree: 2017, Australian National University
URL: http://hdl.handle.net/1885/144602
► This thesis is about machine learning and statistical approaches to decision making. How can we learn from data to anticipate the consequence of, and optimally…
(more)
▼ This thesis is about machine learning and statistical approaches to decision making. How can we learn from data to anticipate the consequence of, and optimally select, interventions or actions? Problems such as deciding which medication to prescribe to patients, who should be released on bail, and how much to charge for insurance are ubiquitous, and have far-reaching impacts on our lives. There are two fundamental approaches to learning how to act: reinforcement learning, in which an agent directly intervenes in a system and learns from the outcome, and observational causal inference, whereby we seek to infer the outcome of an intervention from observing the system.
The goal of this thesis is to connect and unify these key approaches. I introduce causal bandit problems: a synthesis that combines causal graphical models, which were developed for observational causal inference, with multi-armed bandit problems, which are a subset of reinforcement learning problems that are simple enough to admit formal analysis. I show that knowledge of the causal structure allows us to transfer information learned about the outcome of one action to predict the outcome of an alternate action, yielding a novel form of structure between bandit arms that cannot be exploited by existing algorithms. I propose an algorithm for causal bandit problems and prove bounds on the simple regret demonstrating it is close to minimax optimal and better than algorithms that do not use the additional causal information.
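The arm-to-arm transfer can be sketched on the simplest causal structure, a "parallel" graph in which independent binary causes X0, ..., X(k-1) each feed into Y. Because intervening on one variable leaves the natural distribution of the others intact, a single round also yields valid samples for the arms matching the values the other variables happened to take. The toy environment below is an assumption for illustration, not the thesis's model.

    import random

    k = 3
    # One arm per (variable, forced value); simple success/trial counters.
    succ = {(i, v): 0 for i in range(k) for v in (0, 1)}
    tries = {(i, v): 0 for i in range(k) for v in (0, 1)}

    def environment(do_i, do_v):
        x = [random.random() < 0.5 for _ in range(k)]  # unknown natural mechanism
        x[do_i] = bool(do_v)                           # the intervention overrides it
        y = random.random() < (0.2 + 0.6 * x[0])       # hidden truth: only X0 matters
        return x, y

    def play(do_i, do_v):
        x, y = environment(do_i, do_v)
        # Causal structure => this round is a valid sample for every arm (j, x[j]):
        # the non-intervened variables took their values naturally.
        for j in range(k):
            arm = (j, int(x[j]))  # includes the pulled arm, since x[do_i] == do_v
            tries[arm] += 1
            succ[arm] += int(y)

    for _ in range(1000):
        play(random.randrange(k), random.randint(0, 1))
    estimate = succ[(0, 1)] / max(tries[(0, 1)], 1)  # approx. P(Y=1 | do(X0=1)) ~ 0.8

A standard bandit algorithm would treat these 2k arms as unrelated and learn each only from its own pulls; here every round informs up to k arms at once, which is the structural advantage the abstract describes.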
Subjects/Keywords: machine learning; causal inference; causality; reinforcement learning; multi-armed bandits
…observational causal inference problem over non-i.i.d. data. Both multi-armed bandits and observational… …encapsulated by multi-armed bandits. This synthesis allows us to represent knowledge of how variables… …within reinforcement learning are multi-armed bandit problems. They describe settings in which… …viewpoint, including traditional randomised experiments (§3.1) and multi-armed bandit… …bandit problems that unify causal graphical models and multi-armed bandit problems. Bandit arms…
APA (6th Edition):
Lattimore, F. R. (2017). Learning how to act: making good decisions with machine learning. (Thesis). Australian National University. Retrieved from http://hdl.handle.net/1885/144602
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Lattimore, Finnian Rachel. “Learning how to act: making good decisions with machine learning.” 2017. Thesis, Australian National University. Accessed April 14, 2021.
http://hdl.handle.net/1885/144602.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Lattimore, Finnian Rachel. “Learning how to act: making good decisions with machine learning.” 2017. Web. 14 Apr 2021.
Vancouver:
Lattimore FR. Learning how to act: making good decisions with machine learning. [Internet] [Thesis]. Australian National University; 2017. [cited 2021 Apr 14].
Available from: http://hdl.handle.net/1885/144602.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Lattimore FR. Learning how to act: making good decisions with machine learning. [Thesis]. Australian National University; 2017. Available from: http://hdl.handle.net/1885/144602
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
◁ [1] [2] ▶