Rectifier (Neural Networks)
In the context of artificial neural networks, the rectifier or ReLU (rectified linear unit) activation function is an activation function defined as the non-negative part of its argument, i.e., the ramp function: :\operatorname{ReLU}(x) = x^+ = \max(0, x) = \frac{x + |x|}{2} = \begin{cases} x & \text{if } x > 0, \\ 0 & \text{if } x \le 0, \end{cases} where x is the input to a neuron. This is analogous to half-wave rectification in electrical engineering. ReLU is one of the most popular activation functions for artificial neural networks, and finds application in computer vision and speech recognition (Andrew L. Maas, Awni Y. Hannun, Andrew Y. Ng (2014), "Rectifier Nonlinearities Improve Neural Network Acoustic Models") using deep neural nets, and in computational neuroscience. History: The ReLU was first used by Alston Householder in 1941 as a mathematical abstraction of biological neural networks. Kunihiko Fukushima in 1969 used ReLU ...
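As an illustration of the definition above, a minimal sketch of the rectifier in plain Python/NumPy; the function name and sample inputs are illustrative choices of ours, not taken from the excerpt:

```python
import numpy as np

def relu(x):
    # Non-negative part of the argument: max(0, x), applied elementwise.
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))               # zeros for the negative inputs, identity otherwise
print((x + np.abs(x)) / 2)   # same values via the (x + |x|) / 2 form
```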
Ramp Function
The ramp function is a unary real function, whose graph is shaped like a ramp. It can be expressed by numerous definitions, for example "0 for negative inputs, output equals input for non-negative inputs". The term "ramp" can also be used for other functions obtained by scaling and shifting, and the function in this article is the ''unit'' ramp function (slope 1, starting at 0). In mathematics, the ramp function is also known as the positive part. In machine learning, it is commonly known as a ReLU activation function or a rectifier, in analogy to half-wave rectification in electrical engineering. In statistics (when used as a likelihood function) it is known as a tobit model. This function has numerous applications in mathematics and engineering, and goes by various names, depending on the context. There are differentiable variants of the ramp function. Definitions: The ramp function may be defined analytically in several ways. Possible definitions are: * A ...
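A short sketch checking that several of the common analytic definitions agree; this is our own illustration, and np.heaviside's second argument fixes the value taken at 0:

```python
import numpy as np

x = np.linspace(-3, 3, 13)

r_max  = np.maximum(x, 0)           # max(0, x)
r_mean = (x + np.abs(x)) / 2        # (x + |x|) / 2
r_step = x * np.heaviside(x, 0.0)   # x * H(x), with H(0) taken as 0

# All three definitions produce the same unit ramp.
assert np.allclose(r_max, r_mean) and np.allclose(r_max, r_step)
```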
ReLU And GELU
In the context of artificial neural networks, the rectifier or ReLU (rectified linear unit) activation function is an activation function defined as the non-negative part of its argument, i.e., the ramp function: :\operatorname{ReLU}(x) = x^+ = \max(0, x) = \frac{x + |x|}{2} = \begin{cases} x & \text{if } x > 0, \\ 0 & \text{if } x \le 0, \end{cases} where x is the input to a neuron. This is analogous to half-wave rectification in electrical engineering. ReLU is one of the most popular activation functions for artificial neural networks, and finds application in computer vision and speech recognition (Andrew L. Maas, Awni Y. Hannun, Andrew Y. Ng (2014), "Rectifier Nonlinearities Improve Neural Network Acoustic Models") using deep neural nets, and in computational neuroscience. History: The ReLU was first used by Alston Householder in 1941 as a mathematical abstraction of biological neural networks. Kunihiko Fukushima in 1969 used ReLU in the context of visual feature extraction in hierarchical neural networks. Thirty years ...
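Since this heading also mentions GELU, here is a hedged side-by-side sketch. The GELU formula x·Φ(x) (with Φ the standard normal CDF) is its standard definition, but the code below is our own illustration rather than something quoted from the excerpt:

```python
import math

def relu(x):
    return max(0.0, x)

def gelu(x):
    # x * Phi(x), with Phi the standard normal CDF, written via the error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:5.1f}  relu={relu(x):6.3f}  gelu={gelu(x):6.3f}")
```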
Probability Theory
Probability theory or probability calculus is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set of axioms. Typically these axioms formalise probability in terms of a probability space, which assigns a measure taking values between 0 and 1, termed the probability measure, to a set of outcomes called the sample space. Any specified subset of the sample space is called an event. Central subjects in probability theory include discrete and continuous random variables, probability distributions, and stochastic processes (which provide mathematical abstractions of non-deterministic or uncertain processes or measured quantities that may either be single occurrences or evolve over time in a random fashion). Although it is no ...
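As a concrete illustration of the notions of sample space, event, and probability measure described above, a toy example of ours using a fair six-sided die:

```python
from fractions import Fraction

# Sample space of a fair die and a uniform probability measure on it.
sample_space = {1, 2, 3, 4, 5, 6}
P = {outcome: Fraction(1, 6) for outcome in sample_space}

def prob(event):
    # An event is any subset of the sample space; its probability is the
    # measure assigned to that subset.
    return sum(P[o] for o in event)

even = {2, 4, 6}
print(prob(even))          # 1/2
print(prob(sample_space))  # 1, the whole space has measure 1
```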
Weight Initialization
In deep learning, weight initialization or parameter initialization describes the initial step in creating a neural network. A neural network contains trainable parameters that are modified during training: weight initialization is the pre-training step of assigning initial values to these parameters. The choice of weight initialization method affects the speed of convergence, the scale of neural activation within the network, the scale of gradient signals during backpropagation, and the quality of the final model. Proper initialization is necessary for avoiding issues such as vanishing and exploding gradients and activation function saturation. Note that even though this article is titled "weight initialization", both weights and biases are used in a neural network as trainable parameters, so this article describes how both of these are initialized. Similarly, trainable parameters in convolutional neural networks (CNNs) are called kernels and biases, and this article also de ...
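As one concrete example of such a scheme, here is a minimal sketch of He (Kaiming) initialization, which is commonly paired with ReLU activations; the function name and layer sizes are illustrative assumptions of ours, not details from the excerpt:

```python
import numpy as np

def he_init(fan_in, fan_out, rng=np.random.default_rng(0)):
    # Weights drawn from N(0, 2 / fan_in) keep the variance of ReLU
    # activations roughly constant from layer to layer; biases start at zero.
    std = np.sqrt(2.0 / fan_in)
    weights = rng.normal(0.0, std, size=(fan_in, fan_out))
    biases = np.zeros(fan_out)
    return weights, biases

W, b = he_init(fan_in=784, fan_out=128)
print(W.std())  # close to sqrt(2 / 784) ≈ 0.05
```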
Sparse Matrix
In numerical analysis and scientific computing, a sparse matrix or sparse array is a matrix in which most of the elements are zero. There is no strict definition regarding the proportion of zero-value elements for a matrix to qualify as sparse but a common criterion is that the number of non-zero elements is roughly equal to the number of rows or columns. By contrast, if most of the elements are non-zero, the matrix is considered dense. The number of zero-valued elements divided by the total number of elements (e.g., ''m'' × ''n'' for an ''m'' × ''n'' matrix) is sometimes referred to as the sparsity of the matrix. Conceptually, sparsity corresponds to systems with few pairwise interactions. For example, consider a line of balls connected by springs from one to the next: this is a sparse system, as only adjacent balls are coupled. By contrast, if the same line of balls were to have springs connecting each ball to all other balls, the system would correspond to a dense matrix. ...
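A small sketch computing the sparsity of a matrix as defined above (zero-valued elements divided by the total m × n); the example matrix is our own:

```python
import numpy as np

A = np.array([[0, 0, 3, 0],
              [0, 5, 0, 0],
              [0, 0, 0, 0]])

# Sparsity: fraction of entries that are zero.
sparsity = np.count_nonzero(A == 0) / A.size
print(sparsity)  # 10 of the 12 entries are zero, so about 0.83
```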
Restricted Boltzmann Machine
A restricted Boltzmann machine (RBM) (also called a restricted Sherrington–Kirkpatrick model with external field or restricted stochastic Ising–Lenz–Little model) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. RBMs were initially proposed under the name Harmonium by Paul Smolensky in 1986, and rose to prominence after Geoffrey Hinton and collaborators used fast learning algorithms for them in the mid-2000s. RBMs have found applications in dimensionality reduction, classification, collaborative filtering, feature learning, topic modelling (Ruslan Salakhutdinov and Geoffrey Hinton (2010), "Replicated softmax: an undirected topic model", ''Neural Information Processing Systems'' 23), immunology, and even many-body quantum mechanics. They can be trained in either supervised or unsupervised ways, depending on the task. As their name implies, RBMs are a variant of Boltzmann machines, with the restrictio ...
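To make "learn a probability distribution over its set of inputs" concrete, here is a hedged sketch of the standard RBM conditional p(h_j = 1 | v) = σ(c_j + Σ_i v_i W_ij) used when sampling the hidden units; the weight values below are random placeholders of ours, not trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = rng.normal(0, 0.1, size=(n_visible, n_hidden))  # visible-hidden weights
c = np.zeros(n_hidden)                               # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

v = np.array([1, 0, 1, 1, 0, 0])               # one binary visible configuration
p_h = sigmoid(c + v @ W)                        # p(h_j = 1 | v) for each hidden unit
h = (rng.random(n_hidden) < p_h).astype(int)    # stochastic hidden sample
print(p_h, h)
```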
Normal Distribution
In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,. The parameter \mu is the mean or expectation of the distribution (and also its median and mode), while the parameter \sigma^2 is the variance. The standard deviation of the distribution is \sigma (sigma). A random variable with a Gaussian distribution is said to be normally distributed, and is called a normal deviate. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known. Their importance is partly due to the central limit theorem. It states that, under some conditions, the average of many samples (observations) of a random variable with finite mean and variance is itself a random variable—whose distribution c ...
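A short numeric check of the density formula above; this is our own illustration, assuming only NumPy:

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    # f(x) = 1 / sqrt(2*pi*sigma^2) * exp(-(x - mu)^2 / (2*sigma^2))
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

x = np.linspace(-4, 4, 2001)
density = normal_pdf(x)
print(density.max())                    # peak 1/sqrt(2*pi) ≈ 0.3989 at x = mu
print(density.sum() * (x[1] - x[0]))    # ≈ 1: the density integrates to one
```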
Boltzmann Machine
A Boltzmann machine (also called Sherrington–Kirkpatrick model with external field or stochastic Ising model), named after Ludwig Boltzmann, is a spin-glass model with an external field, i.e., a Sherrington–Kirkpatrick model, that is a stochastic Ising model. It is a statistical physics technique applied in the context of cognitive science. It is also classified as a Markov random field. Boltzmann machines are theoretically intriguing because of the locality and Hebbian nature of their training algorithm (being trained by Hebb's rule), and because of their parallelism and the resemblance of their dynamics to simple physical processes. Boltzmann machines with unconstrained connectivity have not been proven useful for practical problems in machine learning or inference, but if the connectivity is properly constrained, the learning can be made efficient enough to be useful for practical problems. Th ...
Softplus
In mathematics and machine learning, the softplus function is : f(x) = \log(1 + e^x). It is a smooth approximation (in fact, an analytic function) to the ramp function, which is known as the ''rectifier'' or ''ReLU (rectified linear unit)'' in machine learning. For large negative x it is \log(1 + e^x) = \log(1 + \epsilon) \gtrapprox \log 1 = 0 (with \epsilon = e^x small), so it is just above 0, while for large positive x it is \log(1 + e^x) \gtrapprox \log(e^x) = x, so it is just above x. The names ''softplus'' and ''SmoothReLU'' are used in machine learning. The name "softplus" (2000), by analogy with the earlier softmax (1989), is presumably because it is a smooth (''soft'') approximation of the positive part of x, which is sometimes denoted with a superscript ''plus'', x^+ := \max(0, x). Related functions: The derivative of softplus is the logistic function: :f'(x) = \frac{e^x}{1 + e^x} = \frac{1}{1 + e^{-x}}. The logistic (sigmoid) function is in turn a smooth approximation of the derivative of the rectifier, the Heaviside step function. LogSumExp: The mu ...
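A numerically careful sketch of softplus and its derivative (the logistic sigmoid), written by us as an illustration; np.logaddexp(0, x) computes log(1 + e^x) without overflow for large x:

```python
import numpy as np

def softplus(x):
    # log(1 + exp(x)), computed stably as logaddexp(0, x).
    return np.logaddexp(0.0, x)

def sigmoid(x):
    # Derivative of softplus: 1 / (1 + exp(-x)).
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-30.0, -1.0, 0.0, 1.0, 30.0])
print(softplus(x))        # ~0 for large negative x, ~x for large positive x
print(np.maximum(x, 0))   # the ReLU it smoothly approximates
```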
Pooling Layer
In neural networks, a pooling layer is a kind of network layer that downsamples and aggregates information that is dispersed among many vectors into fewer vectors. It has several uses. It removes redundant information, reducing the amount of computation and memory required, makes the model more robust to small variations in the input, and increases the receptive field of neurons in later layers in the network. Convolutional neural network pooling: Pooling is most commonly used in convolutional neural networks (CNN). Below is a description of pooling in 2-dimensional CNNs. The generalization to n-dimensions is immediate. As notation, we consider a tensor x \in \R^{H \times W \times C}, where H is height, W is width, and C is the number of channels. A pooling layer outputs a tensor y \in \R^{H' \times W' \times C}. We define two variables f, s called "filter size" (aka "kernel size") and "stride". Sometimes, it is necessary to use a different filter size and stride for horizontal and vertical directions. In such cases, we ...
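A minimal 2-D max-pooling sketch using the filter-size/stride notation above; it assumes the simple non-padded case with the same f and s in both directions, and the function name is ours:

```python
import numpy as np

def max_pool2d(x, f=2, s=2):
    # x: (H, W, C) tensor; output: (H', W', C) with H' = (H - f) // s + 1, etc.
    H, W, C = x.shape
    Hp, Wp = (H - f) // s + 1, (W - f) // s + 1
    y = np.empty((Hp, Wp, C))
    for i in range(Hp):
        for j in range(Wp):
            window = x[i * s:i * s + f, j * s:j * s + f, :]
            y[i, j, :] = window.max(axis=(0, 1))  # maximum over each f x f window
    return y

x = np.arange(16, dtype=float).reshape(4, 4, 1)
print(max_pool2d(x)[..., 0])  # [[ 5.  7.] [13. 15.]]
```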
Absolute Value
In mathematics, the absolute value or modulus of a real number x, denoted |x|, is the non-negative value of x without regard to its sign. Namely, |x| = x if x is a positive number, |x| = -x if x is negative (in which case negating x makes -x positive), and |0| = 0. For example, the absolute value of 3 is 3, and the absolute value of −3 is also 3. The absolute value of a number may be thought of as its distance from zero. Generalisations of the absolute value for real numbers occur in a wide variety of mathematical settings. For example, an absolute value is also defined for the complex numbers, the quaternions, ordered rings, fields and vector spaces. The absolute value is closely related to the notions of magnitude, distance, and norm in various mathematical and physical contexts. Terminology and notation: In 1806, Jean-Robert Argand introduced the term ''module'', meaning ''unit of measure'' in French, specifically for the ''complex'' absolute value (Oxford English Dictionary, Draft Revision, Ju ...
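A tiny numeric check of the piecewise definition above; this is our own illustration and simply mirrors Python's built-in abs:

```python
def abs_value(x):
    # |x| = x for x >= 0 and -x for x < 0.
    return x if x >= 0 else -x

for x in (3, -3, 0, -7.5):
    assert abs_value(x) == abs(x)  # agrees with the built-in absolute value
    print(x, abs_value(x))
```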
Hyperbolic Tangent
In mathematics, hyperbolic functions are analogues of the ordinary trigonometric functions, but defined using the hyperbola rather than the circle. Just as the points (\cos t, \sin t) form a circle with a unit radius, the points (\cosh t, \sinh t) form the right half of the unit hyperbola. Also, similarly to how the derivatives of \sin(t) and \cos(t) are \cos(t) and -\sin(t) respectively, the derivatives of \sinh(t) and \cosh(t) are \cosh(t) and \sinh(t) respectively. Hyperbolic functions are used to express the angle of parallelism in hyperbolic geometry. They are used to express Lorentz boosts as hyperbolic rotations in special relativity. They also occur in the solutions of many linear differential equations (such as the equation defining a catenary), cubic equations, and Laplace's equation in Cartesian coordinates. Laplace's equations are important in many areas of physics, including electromagnetic theory, heat transfer, and fluid dynamics. The basic hyperbolic functions are: * hyperbolic sine "\sinh", * hyperbolic cosine "\cosh" (''Collins Concise Diction ...
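A brief numeric check of the unit-hyperbola identity cosh²t − sinh²t = 1 and the derivative relations mentioned above; our own illustration using central finite differences:

```python
import math

for t in (-2.0, -0.5, 0.0, 1.0, 3.0):
    # Points (cosh t, sinh t) lie on the right half of the unit hyperbola.
    assert abs(math.cosh(t) ** 2 - math.sinh(t) ** 2 - 1.0) < 1e-9

h = 1e-6
t = 0.7
d_sinh = (math.sinh(t + h) - math.sinh(t - h)) / (2 * h)
d_cosh = (math.cosh(t + h) - math.cosh(t - h)) / (2 * h)
print(d_sinh, math.cosh(t))  # derivative of sinh is cosh
print(d_cosh, math.sinh(t))  # derivative of cosh is sinh
```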