Exploration-Exploitation Dilemma
The exploration-exploitation dilemma, also known as the explore-exploit tradeoff, is a fundamental concept in decision-making that arises in many domains. It describes the balancing act between two opposing strategies: exploitation involves choosing the best-known option based on past experience, while exploration involves trying out new options that may lead to better outcomes in the future. Finding the right balance between these two strategies is a central challenge in decision-making situations where the goal is to maximize long-term benefit.

Application in machine learning
In the context of machine learning, the exploration-exploitation tradeoff is most often encountered in reinforcement learning, a type of machine learning in which agents are trained to make decisions based on feedback from the environment (Richard S. Sutton and Andrew G. Barto (2020). Reinforcement Learning: An Introduction, 2nd ed. http://incompleteideas.net/book/the-book-2nd.html). The agent must repeatedly choose between exploiting the actions that currently appear most rewarding and exploring alternative actions in order to improve its estimates of their value.
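A common way to make this tradeoff concrete is the epsilon-greedy rule used in bandit and reinforcement-learning settings: with probability epsilon the agent explores a random action, and otherwise it exploits the action with the highest estimated value. The sketch below is a minimal illustration rather than anything taken from the text above; the reward probabilities, the value of EPSILON, and the horizon are assumptions chosen for the example.

```python
import random

# Minimal epsilon-greedy agent for a 3-armed Bernoulli bandit.
# TRUE_PROBS and EPSILON are illustrative assumptions.
TRUE_PROBS = [0.2, 0.5, 0.7]
EPSILON = 0.1          # exploration rate
N_ROUNDS = 10_000

counts = [0] * len(TRUE_PROBS)    # pulls per arm
values = [0.0] * len(TRUE_PROBS)  # running mean reward per arm

for _ in range(N_ROUNDS):
    if random.random() < EPSILON:
        arm = random.randrange(len(TRUE_PROBS))                      # explore
    else:
        arm = max(range(len(TRUE_PROBS)), key=lambda a: values[a])   # exploit
    reward = 1.0 if random.random() < TRUE_PROBS[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update

print("estimated values:", [round(v, 3) for v in values])
print("pull counts:", counts)
```

With a small epsilon the agent spends most rounds on the arm it currently believes is best, while still sampling the other arms often enough to correct poor early estimates.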
Reinforcement Learning
Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize a notion of cumulative reward. Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning. Reinforcement learning differs from supervised learning in not needing labelled input/output pairs to be presented, and in not needing sub-optimal actions to be explicitly corrected. Instead, the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The environment is typically stated in the form of a Markov decision process (MDP), because many reinforcement learning algorithms for this setting use dynamic programming techniques. The main difference between classical dynamic programming methods and reinforcement learning algorithms is that the latter do not assume knowledge of an exact mathematical model of the MDP and target large MDPs where exact methods become infeasible.
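As one concrete instance of learning from environment feedback without an exact model, the sketch below runs tabular Q-learning on a toy chain MDP. It is a minimal illustration, not an algorithm described in the excerpt above; the environment, learning rate, discount factor, exploration rate, and episode count are all assumed values.

```python
import random

# Toy chain environment: states 0..4, actions 0 (left) and 1 (right).
# Reaching state 4 yields reward 1 and ends the episode.
# ALPHA, GAMMA, EPSILON and the episode count are illustrative choices.
N_STATES, N_ACTIONS = 5, 2
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def step(state, action):
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

for _ in range(2000):                      # episodes
    state, done = 0, False
    while not done:
        if random.random() < EPSILON:      # explore
            action = random.randrange(N_ACTIONS)
        else:                              # exploit current estimates
            action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        target = reward + GAMMA * max(Q[next_state])
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = next_state

print("greedy policy:", [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)])
```

The agent never uses the transition function analytically; it only observes sampled transitions and rewards, which is the sense in which RL does not require an exact mathematical model of the MDP.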
Thompson Sampling
Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.

Description
Consider a set of contexts \mathcal{X}, a set of actions \mathcal{A}, and rewards in \mathbb{R}. In each round, the player obtains a context x \in \mathcal{X}, plays an action a \in \mathcal{A} and receives a reward r \in \mathbb{R} following a distribution that depends on the context and the issued action. The aim of the player is to play actions so as to maximize the cumulative reward. The elements of Thompson sampling are as follows:
# a likelihood function P(r \mid \theta, a, x);
# a set \Theta of parameters \theta of the distribution of r;
# a prior distribution P(\theta) on these parameters;
# past observation triplets \mathcal{D} = \{(x; a; r)\};
# a posterior distribution P(\theta \mid \mathcal{D}) \propto P(\mathcal{D} \mid \theta) P(\theta), where P(\mathcal{D} \mid \theta) is the likelihood function.
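For Bernoulli rewards with Beta priors the posterior stays in the Beta family, so Thompson sampling can be implemented by drawing one sample per arm from its current Beta posterior and playing the arm with the largest sample. The sketch below is a minimal, non-contextual illustration of this special case; the arm probabilities and horizon are assumed values, not something given in the text above.

```python
import random

# Beta-Bernoulli Thompson sampling for a 3-armed bandit.
# TRUE_PROBS is an illustrative assumption; the agent never observes it directly.
TRUE_PROBS = [0.2, 0.5, 0.7]
N_ROUNDS = 10_000

# Beta(1, 1), i.e. uniform, prior for each arm.
alpha = [1] * len(TRUE_PROBS)  # 1 + successes
beta = [1] * len(TRUE_PROBS)   # 1 + failures

for _ in range(N_ROUNDS):
    # Draw one plausible success probability per arm from its posterior.
    samples = [random.betavariate(alpha[a], beta[a]) for a in range(len(TRUE_PROBS))]
    arm = max(range(len(TRUE_PROBS)), key=lambda a: samples[a])
    reward = 1 if random.random() < TRUE_PROBS[arm] else 0
    # Conjugate posterior update for the played arm.
    alpha[arm] += reward
    beta[arm] += 1 - reward

print("posterior means:", [round(alpha[a] / (alpha[a] + beta[a]), 3) for a in range(len(TRUE_PROBS))])
print("plays per arm:", [alpha[a] + beta[a] - 2 for a in range(len(TRUE_PROBS))])
```

Because the action is chosen by sampling from the posterior rather than by taking its mean, arms with uncertain estimates are still played occasionally, which is how the method trades exploration against exploitation.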
Upper Confidence Bound
In the multi-armed bandit problem, upper confidence bound (UCB) algorithms address the exploration-exploitation dilemma through the principle of optimism in the face of uncertainty: each action is scored by an optimistic estimate of its value, namely its empirical mean reward plus a confidence term that shrinks as the action is played more often, and the action with the largest score is selected. In the classical UCB1 algorithm, after t rounds an arm a that has been played n_a times with empirical mean reward \bar{x}_a is scored as \bar{x}_a + \sqrt{2 \ln t / n_a}, so rarely tried arms receive a large exploration bonus while well-sampled arms are judged mainly by their observed performance.
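The sketch below is a minimal UCB1 illustration on the same assumed 3-armed Bernoulli bandit used in the earlier sketches; the arm probabilities and horizon are not taken from the text above.

```python
import math
import random

# UCB1 on a 3-armed Bernoulli bandit. TRUE_PROBS is an illustrative assumption.
TRUE_PROBS = [0.2, 0.5, 0.7]
N_ROUNDS = 10_000

counts = [0] * len(TRUE_PROBS)
means = [0.0] * len(TRUE_PROBS)

for t in range(1, N_ROUNDS + 1):
    if t <= len(TRUE_PROBS):
        arm = t - 1    # play each arm once to initialize its estimate
    else:
        # Optimism in the face of uncertainty: empirical mean + confidence width.
        arm = max(range(len(TRUE_PROBS)),
                  key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
    reward = 1.0 if random.random() < TRUE_PROBS[arm] else 0.0
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]

print("empirical means:", [round(m, 3) for m in means])
print("pull counts:", counts)
```

Unlike epsilon-greedy, UCB1 is deterministic given the observed history: exploration happens only because under-sampled arms carry a large confidence bonus, which decays as their counts grow.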
Machine Learning
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that "learn", that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as medicine, email filtering, speech recognition, agriculture, and computer vision, where it is difficult or unfeasible to develop conventional algorithms to perform the needed tasks (Hu, J.; Niu, H.; Carrasco, J.; Lennox, B.; Arvin, F. (2020). "Voronoi-Based Multi-Robot Autonomous Exploration in Unknown Environments via Deep Reinforcement Learning". IEEE Transactions on Vehicular Technology). A subset of machine learning is closely related to computational statistics, which focuses on making predictions using computers.
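As a concrete illustration of building a model from training data rather than hand-coding a decision rule, the sketch below fits a classifier to a tiny made-up dataset with scikit-learn. The dataset, the feature meanings, and the choice of logistic regression are assumptions made for the example only.

```python
# Minimal supervised-learning sketch: fit a model to labelled training data,
# then predict on unseen inputs. Toy data and model choice are illustrative.
from sklearn.linear_model import LogisticRegression

# Training data: [hours studied, hours slept] -> passed the exam (1) or not (0).
X_train = [[1, 4], [2, 8], [3, 5], [5, 7], [6, 6], [8, 8]]
y_train = [0, 0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(X_train, y_train)   # parameters are learned from the sample data

# Predictions for new examples come from the fitted model, not explicit rules.
print(model.predict([[2, 6], [7, 7]]))
```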
Strategy
Strategy (from Greek στρατηγία stratēgia, "art of troop leader; office of general, command, generalship") is a general plan to achieve one or more long-term or overall goals under conditions of uncertainty. In the sense of the "art of the general", which included several subsets of skills such as military tactics, siegecraft, and logistics, the term came into use in the 6th century C.E. in Eastern Roman terminology, and was translated into Western vernacular languages only in the 18th century. From then until the 20th century, the word "strategy" came to denote "a comprehensive way to try to pursue political ends, including the threat or actual use of force, in a dialectic of wills" in a military conflict, in which both adversaries interact. Strategy is important because the resources available to achieve goals are usually limited. Strategy generally involves setting goals and priorities, determining actions to achieve the goals, and mobilizing resources to execute the actions.