End-to-end Reinforcement Learning
Deep reinforcement learning (deep RL) is a subfield of machine learning that combines reinforcement learning (RL) and deep learning. RL considers the problem of a computational agent learning to make decisions by trial and error. Deep RL incorporates deep learning into the solution, allowing agents to make decisions from unstructured input data without manual engineering of the state space. Deep RL algorithms are able to take in very large inputs (e.g. every pixel rendered to the screen in a video game) and decide what actions to perform to optimize an objective (e.g. maximizing the game score). Deep reinforcement learning has been used for a diverse set of applications including but not limited to robotics, video games, natural language processing, computer vision, education, transportation, finance, and healthcare. Deep learning is a form of machine learning that utilizes a neural network to transform a set of inputs into a set of outputs ...
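The mapping the excerpt describes, from very large raw inputs such as screen pixels to a choice of action, can be sketched in a few lines. The Python fragment below is an illustration only, not any particular published agent: the screen size, the action count, and the untrained two-layer network are assumptions made for the sake of the example.

import numpy as np

rng = np.random.default_rng(0)

obs_height, obs_width = 84, 84   # assumed screen size
n_actions = 4                    # assumed action set

# Two-layer network with random (untrained) weights: pixels -> action scores.
W1 = rng.normal(scale=0.01, size=(obs_height * obs_width, 128))
W2 = rng.normal(scale=0.01, size=(128, n_actions))

def act(pixels):
    """Map a raw pixel observation to an action index."""
    x = pixels.reshape(-1) / 255.0          # flatten and scale the raw pixels
    h = np.maximum(x @ W1, 0.0)             # hidden layer with ReLU
    scores = h @ W2                         # one score per action
    return int(np.argmax(scores))           # greedy action choice

frame = rng.integers(0, 256, size=(obs_height, obs_width))  # stand-in for a rendered frame
print(act(frame))

In a real deep RL agent the weights W1 and W2 would be adjusted from experience so that the chosen actions increase the objective (for example, the game score); here they are left random to keep the sketch self-contained.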



Machine Learning
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as in medicine, email filtering, speech recognition, agriculture, and computer vision, where it is difficult or unfeasible to develop conventional algorithms to perform the needed tasks (Hu, J.; Niu, H.; Carrasco, J.; Lennox, B.; Arvin, F., "Voronoi-Based Multi-Robot Autonomous Exploration in Unknown Environments via Deep Reinforcement Learning", IEEE Transactions on Vehicular Technology, 2020). A subset of machine learning is closely related to computational statistics, which focuses on making predicti ...
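A minimal way to see "build a model from sample data, then predict without being explicitly programmed" is an ordinary least-squares fit. The synthetic data and the linear model below are illustrative assumptions, not part of the excerpt.

import numpy as np

rng = np.random.default_rng(0)

# Training data: inputs x and noisy targets y that follow a rule the program is never told.
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=50)

# "Learning" here is choosing the line that best fits the samples.
X = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(X, y, rcond=None)[0]

# The fitted model now makes predictions for inputs it never saw during training.
print(slope * 7.5 + intercept)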



Optimal Control
Optimal control theory is a branch of mathematical optimization that deals with finding a control for a dynamical system over a period of time such that an objective function is optimized. It has numerous applications in science, engineering and operations research. For example, the dynamical system might be a spacecraft with controls corresponding to rocket thrusters, and the objective might be to reach the moon with minimum fuel expenditure. Or the dynamical system could be a nation's economy, with the objective to minimize unemployment; the controls in this case could be fiscal and monetary policy. A dynamical system may also be introduced to embed operations research problems within the framework of optimal control theory. Optimal control is an extension of the calculus of variations, and is a mathematical optimization method for deriving control policies. The method is largely due to the work of Lev Pontryagin and Richard Bellman in the 1950s, after contributions to calcu ...
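In its standard continuous-time form (notation chosen here for illustration, with x the state, u the control, f the system dynamics, L a running cost, and \Phi a terminal cost), the problem the excerpt describes can be written as

\min_{u(\cdot)} \; J = \Phi\big(x(T)\big) + \int_{0}^{T} L\big(x(t), u(t), t\big)\, dt
\quad \text{subject to} \quad \dot{x}(t) = f\big(x(t), u(t), t\big), \qquad x(0) = x_0 .

In the spacecraft example, x(t) would collect position, velocity, and remaining fuel, u(t) the thruster commands, and J the fuel expended.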



AlphaGo
AlphaGo is a computer program that plays the board game Go. It was developed by DeepMind Technologies, a subsidiary of Google (now Alphabet Inc.). Subsequent versions of AlphaGo became increasingly powerful, including a version that competed under the name Master. After retiring from competitive play, AlphaGo Master was succeeded by an even more powerful version known as AlphaGo Zero, which was completely self-taught without learning from human games. AlphaGo Zero was then generalized into a program known as AlphaZero, which played additional games, including chess and shogi. AlphaZero has in turn been succeeded by a program known as MuZero, which learns without being taught the rules. AlphaGo and its successors use a Monte Carlo tree search algorithm to find their moves, based on knowledge previously acquired by machine learning, specifically by an artificial neural network (a deep learning method) trained extensively on both human and computer play. A neural network is ...
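The excerpt's combination of tree search with a learned network is often summarised by a PUCT-style selection rule; the formula below uses conventional notation and is given only as a sketch of the idea, not a verbatim quote of any AlphaGo paper. At a tree node for position s, the move chosen for further exploration is

a = \arg\max_{a} \left( Q(s,a) + c \, P(s,a)\, \frac{\sqrt{\sum_{b} N(s,b)}}{1 + N(s,a)} \right),

where Q(s,a) is the average value found so far for move a, P(s,a) is the neural network's prior probability for that move, N(s,a) is its visit count, and c trades off exploration against exploitation.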



Convolutional Neural Network
In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of artificial neural network (ANN), most commonly applied to analyze visual imagery. CNNs are also known as Shift Invariant or Space Invariant Artificial Neural Networks (SIANN), based on the shared-weight architecture of the convolution kernels or filters that slide along input features and provide translation-equivariant responses known as feature maps. Counter-intuitively, most convolutional neural networks are not invariant to translation, due to the downsampling operation they apply to the input. They have applications in image and video recognition, recommender systems, image classification, image segmentation, medical image analysis, natural language processing, brain–computer interfaces, and financial time series. CNNs are regularized versions of multilayer perceptrons. Multilayer perceptrons usually mean fully connected networks, that is, each neuron in one layer is connected to a ...
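The shared-weight, sliding-kernel behaviour described above can be sketched with a plain 2D cross-correlation. The input image, kernel values, and sizes below are arbitrary choices for illustration.

import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation with a single shared kernel."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # The same kernel weights are reused at every spatial position.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((8, 8))
image[:, 4] = 1.0                      # a vertical edge
kernel = np.array([[1.0, 0.0, -1.0],   # a simple vertical-edge detector
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])
feature_map = conv2d(image, kernel)
print(feature_map.shape)               # (6, 6)

Because the same kernel is applied everywhere, shifting the edge in the input shifts the response in the feature map by the same amount; this is the translation equivariance mentioned above, while the later downsampling (pooling, strides) is what breaks exact invariance.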



Q-learning
''Q''-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations. For any finite Markov decision process (FMDP), ''Q''-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over any and all successive steps, starting from the current state. ''Q''-learning can identify an optimal action-selection policy for any given FMDP, given infinite exploration time and a partly-random policy. "Q" refers to the function that the algorithm computes – the expected rewards for an action taken in a given state. Reinforcement learning involves an agent, a set of ''states'' S, and a set of ''actions'' A per state. By performing an action a \in A, the agent transitions from state to state. Executing an action i ...
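The update at the heart of the algorithm can be shown in a few lines of tabular Python. The tiny chain environment, learning rate, discount factor, and exploration rate below are illustrative choices, not part of Q-learning's definition.

import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

Q = np.zeros((n_states, n_actions))

def step(state, action):
    """Toy chain: moving right from the last state pays reward 1 and resets."""
    if action == 1 and state == n_states - 1:
        return 0, 1.0
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    return next_state, 0.0

state = 0
for _ in range(5000):
    # Epsilon-greedy behaviour: the "partly random" exploration the excerpt requires.
    action = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[state]))
    next_state, reward = step(state, action)
    # Core update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
    Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
    state = next_state

print(np.round(Q, 2))

Because the update uses the maximum over next actions rather than the action actually taken, the learned Q approaches the optimal action values even though the behaviour policy is partly random.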



Atari
Atari is a brand name that has been owned by several entities since its inception in 1972. It is currently owned by French publisher Atari SA through a subsidiary named Atari Interactive. The original Atari, Inc., founded in Sunnyvale, California, in 1972 by Nolan Bushnell and Ted Dabney, was a pioneer in arcade games, home video game consoles and home computers. The company's products, such as ''Pong'' and the Atari 2600, helped define the electronic entertainment industry from the 1970s to the mid-1980s. In 1984, as a result of the video game crash of 1983, the home console and computer divisions of the original Atari Inc. were sold off, and the company was renamed Atari Games Inc. Atari Games received the rights to use the logo and brand name with appended text "Games" on arcade games, as well as the derivative coin-operated arcade rights to the original 1972–1984 arcade hardware properties. The Atari Consumer Electronics Division properties were in turn sold ...




DeepMind
DeepMind Technologies is a British artificial intelligence subsidiary of Alphabet Inc. and research laboratory founded in 2010. DeepMind was acquired by Google in 2014 and became a wholly owned subsidiary of Alphabet Inc. after Google's restructuring in 2015. The company is based in London, with research centres in Canada, France, and the United States. DeepMind has created a neural network that learns how to play video games in a fashion similar to that of humans, as well as a Neural Turing machine, a neural network that may be able to access an external memory like a conventional Turing machine, resulting in a computer that mimics the short-term memory of the human brain. DeepMind made headlines in 2016 after its AlphaGo program beat the human professional Go player Lee Sedol, a world champion, in a five-game match, which was the subject of a documentary film. A more general program, AlphaZero, beat the most powerful programs playing go, chess and shogi (Japanese c ...


John Tsitsiklis
John N. Tsitsiklis (Greek: Γιάννης Ν. Τσιτσικλής; born 1958) is the Clarence J. Lebel Professor of Electrical Engineering with the Department of Electrical Engineering and Computer Science (EECS) at the Massachusetts Institute of Technology. He serves as the director of the Laboratory for Information and Decision Systems and is affiliated with the Institute for Data, Systems, and Society (IDSS), the Statistics and Data Science Center and the MIT Operations Research Center. Tsitsiklis received a B.S. degree in Mathematics (1980), and his B.S. (1980), M.S. (1981), and Ph.D. (1984) degrees in Electrical Engineering, all from the Massachusetts Institute of Technology in Cambridge, Massachusetts. Tsitsiklis was elected to the 2007 class of Fellows of the Institute for Operations Research and the Management Sciences. He won the 2016 ACM SIGMETRICS Achievement Award in recognition of his fundamental contributions to decentralized control and c ...


Dimitri Bertsekas
Dimitri Panteli Bertsekas (Greek: Δημήτρης Παντελής Μπερτσεκάς; born 1942, Athens) is an applied mathematician, electrical engineer, and computer scientist, a McAfee Professor at the Department of Electrical Engineering and Computer Science in the School of Engineering at the Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, and also a Fulton Professor of Computational Decision Making at Arizona State University, Tempe. Bertsekas was born in Greece and lived his childhood there. He studied for five years at the National Technical University of Athens, Greece, then for about a year and a half at The George Washington University, Washington, D.C., where he obtained his M.S. in electrical engineering in 1969, and for about two years at MIT, where he obtained his doctorate in system science in 1971. Prior to joining the MIT faculty in 1979, he taught for three years at the Engineering-Economic Systems Dept. of Stanford ...


Andrew Barto
Andrew G. Barto (born 1948) is an American computer scientist, currently Professor Emeritus of computer science at the University of Massachusetts Amherst. Barto is best known for his foundational contributions to the field of modern computational reinforcement learning. Barto received his B.S. with distinction in mathematics from the University of Michigan in 1970, after having initially majored in naval architecture and engineering. After reading work by Michael Arbib and by McCulloch and Pitts, he became interested in using computers and mathematics to model the brain, and five years later was awarded a Ph.D. in computer science for a thesis on cellular automata. In 1977, Barto joined the College of Information and Computer Sciences at the University of Massachusetts Amherst as a postdoctoral research associate, was promoted to associate professor in 1982, and to full professor in 1991. He was department chair from 2007 to 2011 and a core faculty m ...




Richard S
Richard is a male given name. It originates, via Old French, from Old Frankish and is a compound of the words descending from Proto-Germanic ''*rīk-'' 'ruler, leader, king' and ''*hardu-'' 'strong, brave, hardy', and it therefore means 'strong in rule'. Nicknames include "Richie", "Dick", "Dickon", "Dickie", "Rich", "Rick", "Rico", "Ricky", and more. Richard is a common English, German and French male name. It is also used in many more languages, particularly Germanic, such as Norwegian, Danish, Swedish, Icelandic, and Dutch, as well as other languages including Irish, Scottish, Welsh and Finnish. Richard is cognate with variants of the name in other European languages, such as the Swedish "Rickard", the Catalan "Ricard" and the Italian "Riccardo", among others (see comprehensive variant list below). People named Richard include: Richard Andersen, Richard Anderson, Richard Cartwright ...


Temporal Difference Learning
Temporal difference (TD) learning refers to a class of model-free reinforcement learning methods which learn by bootstrapping from the current estimate of the value function. These methods sample from the environment, like Monte Carlo methods, and perform updates based on current estimates, like dynamic programming methods. While Monte Carlo methods only adjust their estimates once the final outcome is known, TD methods adjust predictions to match later, more accurate, predictions about the future before the final outcome is known. (A revised version is available on Richard Sutton's publication page.) This is a form of bootstrapping, as illustrated with the following example: "Suppose you wish to predict the weather for Saturday, and you have some model that predicts Saturday's weather, given the weather of each day in the week. In the standard case, you would wait until Saturday and then adjust all your models. However, when it is, for example, Friday, you should have a pretty go ...
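The "adjust earlier predictions toward later ones" idea is captured by the TD(0) update for a state-value estimate V; the notation below (step size \alpha, discount \gamma) is the conventional one rather than a quote from the truncated excerpt:

V(S_t) \leftarrow V(S_t) + \alpha \big[ R_{t+1} + \gamma\, V(S_{t+1}) - V(S_t) \big].

In the weather example, Friday's prediction for Saturday plays the role of V(S_{t+1}): the prediction made on Thursday is nudged toward it immediately, without waiting for Saturday's actual weather.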