Partially Observable Markov Decision Process

	Partially Observable Markov Decision Process A partially observable Markov decision process (POMDP) is a generalization of a Markov decision process (MDP). A POMDP models an agent decision process in which it is assumed that the system dynamics are determined by an MDP, but the agent cannot directly observe the underlying state. Instead, it must maintain a sensor model (the probability distribution of different observations given the underlying state) and the underlying MDP. Unlike the policy function in MDP which maps the underlying states to the actions, POMDP's policy is a mapping from the history of observations (or belief states) to the actions. The POMDP framework is general enough to model a variety of real-world sequential decision processes. Applications include robot navigation problems, machine maintenance, and planning under uncertainty in general. The general framework of Markov decision processes with imperfect information was described by Karl Johan Åström in 1965 in the case of a discrete state space, and i ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Principle Component Analysis Principal component analysis (PCA) is a linear dimensionality reduction technique with applications in exploratory data analysis, visualization and data preprocessing. The data is linearly transformed onto a new coordinate system such that the directions (principal components) capturing the largest variation in the data can be easily identified. The principal components of a collection of points in a real coordinate space are a sequence of p unit vectors, where the i-th vector is the direction of a line that best fits the data while being orthogonal to the first i-1 vectors. Here, a best-fitting line is defined as one that minimizes the average squared perpendicular distance from the points to the line. These directions (i.e., principal components) constitute an orthonormal basis in which different individual dimensions of the data are linearly uncorrelated. Many studies use the first two principal components in order to plot the data in two dimensions and to visually ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Julia (programming Language) Julia is a high-level programming language, high-level, general-purpose programming language, general-purpose dynamic programming language, dynamic programming language, designed to be fast and productive, for e.g. data science, artificial intelligence, machine learning, modeling and simulation, most commonly used for numerical analysis and computational science. Distinctive aspects of Julia's design include a type system with parametric polymorphism and the use of multiple dispatch as a core programming paradigm, a default just-in-time compilation, just-in-time (JIT) compiler (with support for ahead-of-time compilation) and an tracing garbage collection, efficient (multi-threaded) garbage collection implementation. Notably Julia does not support classes with encapsulated methods and instead it relies on structs with generic methods/functions not tied to them. By default, Julia is run similarly to scripting languages, using its runtime, and allows for read–eval–print loop, i ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Parity Game A parity game is played on a colored directed graph, where each node has been colored by a priority – one of (usually) finitely many natural numbers. Two players, 0 and 1, move a (single, shared) token along the edges of the graph. The owner of the node that the token falls on selects the successor node (does the next move). The players keep moving the token, resulting in a (possibly infinite) path, called a play. The winner of a finite play is the player whose opponent is unable to move. The winner of an infinite play is determined by the priorities appearing in the play. Typically, player 0 wins an infinite play if the largest priority that occurs infinitely often in the play is even. Player 1 wins otherwise. This explains the word "parity" in the title. Parity games lie in the third level of the Borel hierarchy, and are consequently determined. Games related to parity games were implicitly used in Rabin's proof of decidability of the monadic second-order theory of ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	EXPTIME In computational complexity theory, the complexity class EXPTIME (sometimes called EXP or DEXPTIME) is the set of all decision problems that are solvable by a deterministic Turing machine in exponential time, i.e., in O(2''p''(''n'')) time, where ''p''(''n'') is a polynomial function of ''n''. EXPTIME is one intuitive class in an exponential hierarchy of complexity classes with increasingly more complex oracles or quantifier alternations. For example, the class 2-EXPTIME is defined similarly to EXPTIME but with a doubly exponential time bound. This can be generalized to higher and higher time bounds. EXPTIME can also be reformulated as the space class APSPACE, the set of all problems that can be solved by an alternating Turing machine in polynomial space. EXPTIME relates to the other basic time and space complexity classes in the following way: P ⊆ NP ⊆ PSPACE ⊆ EXPTIME ⊆ NEXPTIME ⊆ EXPSPACE. Furthermore, by the time hierarchy theorem and the space hiera ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Finite-state Machine A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number of ''State (computer science), states'' at any given time. The FSM can change from one state to another in response to some Input (computer science), inputs; the change from one state to another is called a ''transition''. An FSM is defined by a list of its states, its initial state, and the inputs that trigger each transition. Finite-state machines are of two types—Deterministic finite automaton, deterministic finite-state machines and Nondeterministic finite automaton, non-deterministic finite-state machines. For any non-deterministic finite-state machine, an equivalent deterministic one can be constructed. The behavior of state machines can be observed in many devices in modern society that perform a predetermined sequence of actions d ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Parity Game A parity game is played on a colored directed graph, where each node has been colored by a priority – one of (usually) finitely many natural numbers. Two players, 0 and 1, move a (single, shared) token along the edges of the graph. The owner of the node that the token falls on selects the successor node (does the next move). The players keep moving the token, resulting in a (possibly infinite) path, called a play. The winner of a finite play is the player whose opponent is unable to move. The winner of an infinite play is determined by the priorities appearing in the play. Typically, player 0 wins an infinite play if the largest priority that occurs infinitely often in the play is even. Player 1 wins otherwise. This explains the word "parity" in the title. Parity games lie in the third level of the Borel hierarchy, and are consequently determined. Games related to parity games were implicitly used in Rabin's proof of decidability of the monadic second-order theory of ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Büchi Automaton In computer science and automata theory, a deterministic Büchi automaton is a theoretical machine which either accepts or rejects infinite inputs. Such a machine has a set of states and a transition function, which determines which state the machine should move to from its current state when it reads the next input character. Some states are accepting states and one state is the start state. The machine accepts an input if and only if it will pass through an accepting state infinitely many times as it reads the input. A non-deterministic Büchi automaton, later referred to just as a Büchi automaton, has a transition function which may have multiple outputs, leading to many possible paths for the same input; it accepts an infinite input if and only if some possible path is accepting. Deterministic and non-deterministic Büchi automata generalize deterministic finite automata and nondeterministic finite automata to infinite inputs. Each are types of ω-automata. Büchi automata ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Undecidable Problem In computability theory and computational complexity theory, an undecidable problem is a decision problem for which it is proved to be impossible to construct an algorithm that always leads to a correct yes-or-no answer. The halting problem is an example: it can be proven that there is no algorithm that correctly determines whether an arbitrary program eventually halts when run. Background A decision problem is a question which, for every input in some infinite set of inputs, requires a "yes" or "no" answer. Those inputs can be numbers (for example, the decision problem "is the input a prime number?") or values of some other kind, such as strings of a formal language. The formal representation of a decision problem is a subset of the natural numbers. For decision problems on natural numbers, the set consists of those numbers that the decision problem answers "yes" to. For example, the decision problem "is the input even?" is formalized as the set of even numbers. A decision pr ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Q-learning ''Q''-learning is a reinforcement learning algorithm that trains an agent to assign values to its possible actions based on its current state, without requiring a model of the environment ( model-free). It can handle problems with stochastic transitions and rewards without requiring adaptations. For example, in a grid maze, an agent learns to reach an exit worth 10 points. At a junction, Q-learning might assign a higher value to moving right than left if right gets to the exit faster, improving this choice by trying both directions over time. For any finite Markov decision process, ''Q''-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over any and all successive steps, starting from the current state. ''Q''-learning can identify an optimal action-selection policy for any given finite Markov decision process, given infinite exploration time and a partly random policy. "Q" refers to the function that the algorithm computes: th ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Monte Carlo Tree Search In computer science, Monte Carlo tree search (MCTS) is a heuristic search algorithm for some kinds of decision processes, most notably those employed in software that plays board games. In that context MCTS is used to solve the game tree. MCTS was combined with neural networks in 2016 and has been used in multiple board games like Chess, Shogi, Checkers, Backgammon, Contract Bridge, Go, Scrabble, and Clobber as well as in turn-based-strategy video games (such as Total War: Rome II's implementation in the high level campaign AI) and applications outside of games. History Monte Carlo method The Monte Carlo method, which uses random sampling for deterministic problems which are difficult or impossible to solve using other approaches, dates back to the 1940s. In his 1987 PhD thesis, Bruce Abramson combined minimax search with an ''expected-outcome model'' based on random game playouts to the end, instead of the usual static evaluation function. Abramson said the expected-out ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]