LogSumExp

	LogSumExp The LogSumExp (LSE) (also called RealSoftMax or multivariable softplus) function is a smooth maximum – a smooth approximation to the maximum function, mainly used by machine learning algorithms. It is defined as the logarithm of the sum of the exponentials of the arguments: \mathrm(x_1, \dots, x_n) = \log\left( \exp(x_1) + \cdots + \exp(x_n) \right). Properties The LogSumExp function domain is \R^n, the real coordinate space, and its codomain is \R, the real line. It is an approximation to the maximum \max_i x_i with the following bounds \max \leq \mathrm(x_1, \dots, x_n) \leq \max + \log(n). The first inequality is strict unless n = 1. The second inequality is strict unless all arguments are equal. (Proof: Let m = \max_i x_i. Then \exp(m) \leq \sum_^n \exp(x_i) \leq n \exp(m). Applying the logarithm to the inequality gives the result.) In addition, we can scale the function to make the bounds tighter. Consider the function \frac 1 t \mathrm(tx_1, \dots, tx_n). Then ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Softplus In the context of artificial neural networks, the rectifier or ReLU (rectified linear unit) activation function is an activation function defined as the positive part of its argument: : f(x) = x^+ = \max(0, x), where ''x'' is the input to a neuron. This is also known as a ramp function and is analogous to half-wave rectification in electrical engineering. This activation function started showing up in the context of visual feature extraction in hierarchical neural networks starting in the late 1960s. It was later argued that it has strong biological motivations and mathematical justifications. In 2011 it was found to enable better training of deeper networks, compared to the widely used activation functions prior to 2011, e.g., the logistic sigmoid (which is inspired by probability theory; see logistic regression) and its more practical counterpart, the hyperbolic tangent. The rectifier is, , the most popular activation function for deep neural networks. Rectified linear un ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Gradient In vector calculus, the gradient of a scalar-valued differentiable function of several variables is the vector field (or vector-valued function) \nabla f whose value at a point p is the "direction and rate of fastest increase". If the gradient of a function is non-zero at a point , the direction of the gradient is the direction in which the function increases most quickly from , and the magnitude of the gradient is the rate of increase in that direction, the greatest absolute directional derivative. Further, a point where the gradient is the zero vector is known as a stationary point. The gradient thus plays a fundamental role in optimization theory, where it is used to maximize a function by gradient ascent. In coordinate-free terms, the gradient of a function f(\bf) may be defined by: :df=\nabla f \cdot d\bf where ''df'' is the total infinitesimal change in ''f'' for an infinitesimal displacement d\bf, and is seen to be maximal when d\bf is in the direction of the gradi ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Smooth Maximum In mathematics, a smooth maximum of an indexed family ''x''1, ..., ''x''''n'' of numbers is a smooth approximation to the maximum function \max(x_1,\ldots,x_n), meaning a parametric family of functions m_\alpha(x_1,\ldots,x_n) such that for every , the function is smooth, and the family converges to the maximum function as . The concept of smooth minimum is similarly defined. In many cases, a single family approximates both: maximum as the parameter goes to positive infinity, minimum as the parameter goes to negative infinity; in symbols, as and as . The term can also be used loosely for a specific smooth function that behaves similarly to a maximum, without necessarily being part of a parametrized family. Examples For large positive values of the parameter \alpha > 0, the following formulation is a smooth, differentiable approximation of the maximum function. For negative values of the parameter that are large in absolute value, it approximates the minimum. : ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Log Semiring In mathematics, in the field of tropical analysis, the log semiring is the semiring structure on the logarithmic scale, obtained by considering the extended real numbers as logarithms. That is, the operations of addition and multiplication are defined by conjugation: exponentiate the real numbers, obtaining a positive (or zero) number, add or multiply these numbers with the ordinary algebraic operations on real numbers, and then take the logarithm to reverse the initial exponentiation. Such operations are also known as, e.g., logarithmic addition, etc. As usual in tropical analysis, the operations are denoted by ⊕ and ⊗ to distinguish them from the usual addition + and multiplication × (or ⋅). These operations depend on the choice of base for the exponent and logarithm ( is a choice of logarithmic unit), which corresponds to a scale factor, and are well-defined for any positive base other than 1; using a base is equivalent to using a negative sign and using the inverse . If ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Logarithmic Mean In mathematics, the logarithmic mean is a function of two non-negative numbers which is equal to their difference divided by the logarithm of their quotient. This calculation is applicable in engineering problems involving heat and mass transfer. Definition The logarithmic mean is defined as: :\begin M_\text(x, y) &= \lim_ \frac \\ pt &= \begin x & \textx = y ,\\ \frac & \text \end \end for the positive numbers x, y. Inequalities The logarithmic mean of two numbers is smaller than the arithmetic mean and the generalized mean with exponent one-third but larger than the geometric mean, unless the numbers are the same, in which case all three means are equal to the numbers. : \sqrt \leq \frac\leq \left(\frac2\right)^3 \leq \frac \qquad \text x > 0 \text y > 0. Toyesh Prakash Sharma generalizes the arithmetic logarithmic geometric mean inequality for any n belongs to the whole number as : \sqrt (\ln(\sqrt))^ (\ln(\sqrt)+n)\leq \frac\leq ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Log Semiring In mathematics, in the field of tropical analysis, the log semiring is the semiring structure on the logarithmic scale, obtained by considering the extended real numbers as logarithms. That is, the operations of addition and multiplication are defined by conjugation: exponentiate the real numbers, obtaining a positive (or zero) number, add or multiply these numbers with the ordinary algebraic operations on real numbers, and then take the logarithm to reverse the initial exponentiation. Such operations are also known as, e.g., logarithmic addition, etc. As usual in tropical analysis, the operations are denoted by ⊕ and ⊗ to distinguish them from the usual addition + and multiplication × (or ⋅). These operations depend on the choice of base for the exponent and logarithm ( is a choice of logarithmic unit), which corresponds to a scale factor, and are well-defined for any positive base other than 1; using a base is equivalent to using a negative sign and using the inverse . If ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Tropical Analysis In the mathematical discipline of idempotent analysis, tropical analysis is the study of the tropical semiring. Applications The max tropical semiring can be used appropriately to determine marking times within a given Petri net and a vector filled with marking state at the beginning: -\infty (unit for max, tropical addition) means "never before", while 0 (unit for addition, tropical multiplication) is "no additional time". Tropical cryptography is cryptography based on the tropical semiring. Tropical geometry is an analog to algebraic geometry, using the tropical semiring. References * Further reading * * See also Lunar arithmetic Lunar arithmetic, formerly called dismal arithmetic, is a version of arithmetic in which the addition and multiplication operations on digits are defined as the max and min operations. Thus, in lunar arithmetic, :2+7=\max\=7 and 2\times 7 = \min ... External links MaxPlus algebraworking group, INRIA Rocquencourt {{Mathanalysis- ... [...More Info...] [...Related Items...] OR:* [Wikipedia] [Google] [Baidu]
picture info	Differentiable Function In mathematics, a differentiable function of one real variable is a function whose derivative exists at each point in its domain. In other words, the graph of a differentiable function has a non-vertical tangent line at each interior point in its domain. A differentiable function is smooth (the function is locally well approximated as a linear function at each interior point) and does not contain any break, angle, or cusp. If is an interior point in the domain of a function , then is said to be ''differentiable at'' if the derivative f'(x_0) exists. In other words, the graph of has a non-vertical tangent line at the point . is said to be differentiable on if it is differentiable at every point of . is said to be ''continuously differentiable'' if its derivative is also a continuous function over the domain of the function f. Generally speaking, is said to be of class if its first k derivatives f^(x), f^(x), \ldots, f^(x) exist and are continuous over the domain of the func ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	IT++ IT++ is a C++ library of classes and functions for linear algebra, numerical optimization, signal processing, communications, and statistics. It is being developed by researchers in these areas and is widely used by researchers, both in the communications industry and universities.de Lima, C.H.M.; Stancanelli, E.M.G.; Rodrigues, E.B.; da S. Maciel, J.M.; Cavalcanti, F.R.P., A software development framework based on C++ OOP language for link-level simulation tools, Telecommunications Symposium, 2006 International, Fortaleza, Brazil, The IT++ library originates from the former Department of Information Theory at the Chalmers University of Technology, Gothenburg, Sweden. The kernel of the IT++ library is templated vector and matrix classes, and a set of accompanying functions. Such a kernel makes IT++ library similar to Matlab/ Octave. For increased functionality, speed and accuracy, IT++ can make extensive use of existing free and open source libraries, especially BLAS, LAPACK and ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Log Probability In probability theory and computer science, a log probability is simply a logarithm of a probability. The use of log probabilities means representing probabilities on a logarithmic scale, instead of the standard [0, 1] unit interval. Since the probabilities of Independence (probability theory), independent event (probability theory), events multiply, and logarithms convert multiplication to addition, log probabilities of independent events add. Log probabilities are thus practical for computations, and have an intuitive interpretation in terms of information theory: the negative of the average log probability is the information entropy of an event. Similarly, likelihoods are often transformed to the log scale, and the corresponding log-likelihood can be interpreted as the degree to which an event supports a statistical model. The log probability is widely used in implementations of computations with probability, and is studied as a concept in its own right in some applications of ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Logarithmic Scale A logarithmic scale (or log scale) is a way of displaying numerical data over a very wide range of values in a compact way—typically the largest numbers in the data are hundreds or even thousands of times larger than the smallest numbers. Such a scale is nonlinear: the numbers 10 and 20, and 60 and 70, are not the same distance apart on a log scale. Rather, the numbers 10 and 100, and 60 and 600 are equally spaced. Thus moving a unit of distance along the scale means the number has been ''multiplied'' by 10 (or some other fixed factor). Often exponential growth curves are displayed on a log scale, otherwise they would increase too quickly to fit within a small graph. Another way to think about it is that the ''number of digits'' of the data grows at a constant rate. For example, the numbers 10, 100, 1000, and 10000 are equally spaced on a log scale, because their numbers of digits is going up by 1 each time: 2, 3, 4, and 5 digits. In this way, adding two digits ''multiplies'' the ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Negative Entropy In information theory and statistics, negentropy is used as a measure of distance to normality. The concept and phrase "negative entropy" was introduced by Erwin Schrödinger in his 1944 popular-science book ''What is Life?'' Later, Léon Brillouin shortened the phrase to ''negentropy''. In 1974, Albert Szent-Györgyi proposed replacing the term ''negentropy'' with ''syntropy''. That term may have originated in the 1940s with the Italian mathematician Luigi Fantappiè, who tried to construct a unified theory of biology and physics. Buckminster Fuller tried to popularize this usage, but ''negentropy'' remains common. In a note to ''What is Life?'' Schrödinger explained his use of this phrase. Information theory In information theory and statistics, negentropy is used as a measure of distance to normality. Out of all distributions with a given mean and variance, the normal or Gaussian distribution is the one with the highest entropy. Negentropy measures the difference in entrop ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]