Softmax
The softmax function, also known as softargmax or the normalized exponential function, converts a tuple of real numbers into a probability distribution of possible outcomes. It is a generalization of the logistic function to multiple dimensions, and is used in multinomial logistic regression. The softmax function is often used as the last activation function of a neural network, to normalize the output of the network to a probability distribution over predicted output classes.
Definition
The softmax function takes as input a tuple of real numbers and normalizes it into a probability distribution consisting of probabilities proportional to the exponentials of the input numbers. That is, prior to applying softmax, some tuple components could be negative or greater than one, and might not sum to 1; but after applying softmax, each component lies in the interval (0, 1) and the components add up to 1, so that they can be interpreted as probabilities.
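As a concrete sketch of this definition, here is a minimal NumPy version computing \sigma(z)_i = e^{z_i} / \sum_j e^{z_j}; the max-subtraction is a standard numerical-stability trick, not part of the definition itself:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Map a tuple of reals to a probability distribution.

    Subtracting the maximum first leaves the result unchanged
    (softmax is invariant to adding a constant to every component)
    but avoids overflow in exp for large inputs.
    """
    shifted = z - np.max(z)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

probs = softmax(np.array([-1.0, 0.0, 3.0]))
print(probs)         # each component lies in (0, 1)
print(probs.sum())   # components sum to 1.0
```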
Multinomial Logistic Regression
In statistics, multinomial logistic regression is a classification method that generalizes logistic regression to multiclass problems, i.e. those with more than two possible discrete outcomes. That is, it is a model used to predict the probabilities of the different possible outcomes of a categorically distributed dependent variable, given a set of independent variables (which may be real-valued, binary-valued, categorical-valued, etc.). Multinomial logistic regression is known by a variety of other names, including polytomous LR, multiclass LR, softmax regression, multinomial logit (mlogit), the maximum entropy (MaxEnt) classifier, and the conditional maximum entropy model.
Background
Multinomial logistic regression is used when the dependent variable in question is nominal (equivalently "categorical", meaning that it falls into any one of a set of categories that cannot be ordered in any meaningful way) and there are more than two categories. Some examples ...
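As a hedged illustration of how such a model (under the usual softmax-regression parameterization) turns features into class probabilities, here is a toy sketch; the weight matrix W and bias b are made-up values, not fitted parameters:

```python
import numpy as np

def predict_proba(x, W, b):
    """Class probabilities of a softmax-regression model:
    P(y = k | x) = softmax(W @ x + b)[k]."""
    scores = W @ x + b        # one real-valued score per class
    scores -= scores.max()    # numerical stability; cancels in the ratio
    exps = np.exp(scores)
    return exps / exps.sum()

# toy example: 3 classes, 2 features
W = np.array([[ 1.0, -0.5],
              [ 0.2,  0.8],
              [-1.0,  0.3]])
b = np.array([0.1, 0.0, -0.2])
print(predict_proba(np.array([0.5, 1.5]), W, b))  # sums to 1
```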
Logistic Function
A logistic function or logistic curve is a common S-shaped curve (sigmoid curve) with the equation f(x) = \frac{L}{1 + e^{-k(x - x_0)}}, where L is the supremum of the values of the function, k is the logistic growth rate (the steepness of the curve), and x_0 is the x value of the function's midpoint. The logistic function has domain the real numbers, the limit as x \to -\infty is 0, and the limit as x \to +\infty is L. The exponential function with negated argument (e^{-x}) is used to define the standard logistic function, where L = 1, k = 1, x_0 = 0, which has the equation f(x) = \frac{1}{1 + e^{-x}} and is sometimes simply called the sigmoid. It is also sometimes called the expit, being the inverse function of the logit.
The logistic function finds applications in a range of fields, including biology (especially ecology), biomathematics, chemistry, demography, economics, geoscience, mathematical psychology, probability, sociology, political science, linguistics, statistics, and artificial neural networks. There are various generalizations, depending on the field.
History
The logistic function was introduced in a series of three papers by Pierre François Verhulst ...
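A small sketch of the general curve and its standard special case; the parameter names L, k, and x0 follow the equation above:

```python
import numpy as np

def logistic(x, L=1.0, k=1.0, x0=0.0):
    """General logistic curve f(x) = L / (1 + exp(-k * (x - x0)))."""
    return L / (1.0 + np.exp(-k * (x - x0)))

# the standard logistic (sigmoid): L = 1, k = 1, x0 = 0
for x in (-6.0, 0.0, 6.0):
    print(x, logistic(x))  # tends to 0 as x -> -inf, to 1 as x -> +inf
```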
Smooth Maximum
In mathematics, a smooth maximum of an indexed family x_1, \ldots, x_n of numbers is a smooth approximation to the maximum function \max(x_1, \ldots, x_n), meaning a parametric family of functions m_\alpha(x_1, \ldots, x_n) such that for every \alpha the function m_\alpha is smooth, and the family converges to the maximum function as \alpha \to \infty. The concept of a smooth minimum is similarly defined. In many cases, a single family approximates both: the maximum as the parameter goes to positive infinity and the minimum as the parameter goes to negative infinity; in symbols, m_\alpha \to \max as \alpha \to +\infty and m_\alpha \to \min as \alpha \to -\infty. The term can also be used loosely for a specific smooth function that behaves similarly to a maximum, without necessarily being part of a parametrized family.
Examples
Boltzmann operator
For large positive values of the parameter \alpha > 0, the following formulation is a smooth, differentiable approximation of the maximum function; for negative values of the parameter that are large in absolute value, it approximates the minimum:
\mathrm{S}_\alpha(x_1, \ldots, x_n) = \frac{\sum_{i=1}^n x_i e^{\alpha x_i}}{\sum_{i=1}^n e^{\alpha x_i}} ...
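A sketch of the Boltzmann operator as defined above; the shift by the max or min is only a numerical-stability device and cancels in the ratio:

```python
import numpy as np

def boltzmann_operator(xs, alpha):
    """S_alpha(x) = sum_i x_i * exp(alpha*x_i) / sum_i exp(alpha*x_i)."""
    xs = np.asarray(xs, dtype=float)
    shift = xs.max() if alpha > 0 else xs.min()  # constant factor, cancels
    w = np.exp(alpha * (xs - shift))
    return float(np.sum(xs * w) / np.sum(w))

xs = [1.0, 2.0, 3.0]
print(boltzmann_operator(xs, 100.0))    # close to max(xs) = 3
print(boltzmann_operator(xs, -100.0))   # close to min(xs) = 1
```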
Artificial Neural Network
In machine learning, a neural network (also artificial neural network or neural net, abbreviated ANN or NN) is a computational model inspired by the structure and functions of biological neural networks. A neural network consists of connected units or nodes called artificial neurons, which loosely model the neurons in the brain. Artificial neuron models that mimic biological neurons more closely have also been investigated recently and shown to significantly improve performance. Neurons are connected by edges, which model the synapses in the brain. Each artificial neuron receives signals from connected neurons, then processes them and sends a signal to other connected neurons. The "signal" is a real number, and the output of each neuron is computed by some nonlinear function of the sum of its inputs, called the activation function. The strength of the signal at each connection is determined by a weight, which adjusts during the learning process. Typically, neurons are aggregated into layers ...
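A single artificial neuron as just described, sketched in Python; the particular weights, bias, and the choice of tanh as the activation function are illustrative assumptions, not fixed by the definition:

```python
import numpy as np

def neuron(inputs, weights, bias):
    """One artificial neuron: a nonlinear activation applied to
    the weighted sum of the incoming signals plus a bias."""
    return np.tanh(np.dot(weights, inputs) + bias)

# made-up weights and bias; in practice these adjust during learning
print(neuron(np.array([0.5, -1.2, 3.0]),
             np.array([0.4,  0.1, -0.7]),
             bias=0.2))
```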
LogSumExp
The LogSumExp (LSE) function (also called RealSoftMax or multivariable softplus) is a smooth maximum: a smooth approximation to the maximum function, mainly used by machine learning algorithms. It is defined as the logarithm of the sum of the exponentials of the arguments: \mathrm{LSE}(x_1, \dots, x_n) = \log\left( \exp(x_1) + \cdots + \exp(x_n) \right).
Properties
The LogSumExp function's domain is \R^n, the real coordinate space, and its codomain is \R, the real line. It approximates the maximum \max_i x_i with the following bounds: \max_i x_i \leq \mathrm{LSE}(x_1, \dots, x_n) \leq \max_i x_i + \log(n). The first inequality is strict unless n = 1. The second inequality is strict unless all arguments are equal. (Proof: Let m = \max_i x_i. Then \exp(m) \leq \sum_{i=1}^n \exp(x_i) \leq n \exp(m). Applying the logarithm to the inequality gives the result.) In addition, we can scale the function to make the bounds tighter: consider the function \frac{1}{t} \mathrm{LSE}(t x_1, \dots, t x_n). Then ...
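A minimal stable implementation, using the identity \mathrm{LSE}(x) = m + \mathrm{LSE}(x - m) with m = \max_i x_i so that no exponential can overflow, together with a numeric check of the bounds above:

```python
import numpy as np

def logsumexp(xs):
    """LSE(x) = log(sum_i exp(x_i)), computed via the max-shift identity."""
    xs = np.asarray(xs, dtype=float)
    m = xs.max()
    return m + np.log(np.sum(np.exp(xs - m)))

xs = [1.0, 2.0, 3.0]
# max <= LSE <= max + log(n)
print(max(xs), logsumexp(xs), max(xs) + np.log(len(xs)))
```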
Gibbs Distribution
In statistical mechanics and mathematics, a Boltzmann distribution (also called a Gibbs distribution) is a probability distribution or probability measure that gives the probability that a system will be in a certain state as a function of that state's energy and the temperature of the system. The distribution is expressed in the form
p_i \propto \exp\left(- \frac{\varepsilon_i}{kT} \right)
where p_i is the probability of the system being in state i, \exp is the exponential function, \varepsilon_i is the energy of that state, and the constant kT of the distribution is the product of the Boltzmann constant k and the thermodynamic temperature T. The symbol \propto denotes proportionality. The term "system" here has a wide meaning; it can range from a collection of a 'sufficient number' of atoms, or a single atom, to a macroscopic system such as a natural gas storage tank. Therefore, the Boltzmann distribution can be used to solve ...
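A small numeric sketch: normalizing \exp(-\varepsilon_i / (kT)) over a made-up set of energy levels (the units are chosen so that kT is expressed in the same units as the energies):

```python
import numpy as np

def boltzmann_probabilities(energies, kT):
    """p_i proportional to exp(-E_i / (kT)); normalizing yields the distribution."""
    energies = np.asarray(energies, dtype=float)
    weights = np.exp(-(energies - energies.min()) / kT)  # shift for stability
    return weights / weights.sum()

levels = [0.0, 0.5, 1.0]                          # hypothetical energy levels
print(boltzmann_probabilities(levels, kT=0.25))   # low T: lowest energy dominates
print(boltzmann_probabilities(levels, kT=25.0))   # high T: nearly uniform
```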
Activation Function
The activation function of a node in an artificial neural network is a function that calculates the output of the node based on its individual inputs and their weights. Nontrivial problems can be solved using only a few nodes if the activation function is nonlinear. Modern activation functions include the logistic (sigmoid) function used in the 2012 speech recognition model developed by Hinton et al.; the ReLU used in the 2012 AlexNet computer vision model and in the 2015 ResNet model; and the smooth version of the ReLU, the GELU, which was used in the 2018 BERT model.
Comparison of activation functions
Aside from their empirical performance, activation functions also have different mathematical properties:
Nonlinear: When the activation function is nonlinear, a two-layer neural network can be proven to be a universal function approximator. This is known as the Universal Approximation Theorem. The identity activation function does not satisfy this property: when multiple layers use the identity activation function, the entire network is equivalent to a single-layer model ...
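For comparison, minimal NumPy versions of the three activations mentioned; the GELU here uses the common tanh approximation rather than the exact Gaussian-CDF form:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def gelu(x):
    # tanh approximation of GELU(x) = x * Phi(x)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.linspace(-3.0, 3.0, 7)
for f in (sigmoid, relu, gelu):
    print(f.__name__, np.round(f(x), 3))
```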
Max-plus Semiring
In idempotent analysis, the tropical semiring is a semiring of extended real numbers with the operations of minimum (or maximum) and addition replacing the usual ("classical") operations of addition and multiplication, respectively. The tropical semiring has various applications (see tropical analysis), and forms the basis of tropical geometry. The name "tropical" is a reference to the Hungarian-born computer scientist Imre Simon, so named because he lived and worked in Brazil.
Definition
The min tropical semiring (or min-plus semiring or min-plus algebra) is the semiring (\R \cup \{+\infty\}, \oplus, \otimes), with the operations:
x \oplus y = \min\{x, y\},
x \otimes y = x + y.
The operations \oplus and \otimes are referred to as tropical addition and tropical multiplication respectively. The identity element for \oplus is +\infty, and the identity element for \otimes is 0. Similarly, the max tropical semiring (or max-plus semiring or max-plus algebra) is the semiring (\R \cup \{-\infty\}, \oplus, \otimes), with operations:
x \oplus y = \max\{x, y\},
x \otimes y = x + y.
The identity element for \oplus is -\infty, and the identity element for \otimes is 0 ...
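A tiny sketch of both tropical semirings and a check of their identity elements (the function names are our own):

```python
import math

INF = math.inf

# min-plus semiring on R ∪ {+inf}: ⊕ = min, ⊗ = +
def min_plus_add(x, y):
    return min(x, y)

def tropical_mul(x, y):
    return x + y

# max-plus semiring on R ∪ {-inf}: ⊕ = max, same ⊗
def max_plus_add(x, y):
    return max(x, y)

assert min_plus_add(5.0, INF) == 5.0    # +inf is the ⊕-identity (min-plus)
assert max_plus_add(5.0, -INF) == 5.0   # -inf is the ⊕-identity (max-plus)
assert tropical_mul(5.0, 0.0) == 5.0    # 0 is the ⊗-identity in both
print("identity checks pass")
```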
Log Semiring
In mathematics, in the field of tropical analysis, the log semiring is the semiring structure on the logarithmic scale, obtained by considering the extended real numbers as logarithms. That is, the operations of addition and multiplication are defined by conjugation: exponentiate the real numbers, obtaining a positive (or zero) number; add or multiply these numbers with the ordinary algebraic operations on real numbers; and then take the logarithm to reverse the initial exponentiation. Such operations are also known as, e.g., logarithmic addition. As usual in tropical analysis, the operations are denoted by ⊕ and ⊗ to distinguish them from the usual addition + and multiplication × (or ⋅). These operations depend on the choice of base b for the exponent and logarithm (b is a choice of logarithmic unit), which corresponds to a scale factor, and are well-defined for any positive base other than 1; using a base b < 1 is equivalent to using a negative sign and using the inverse 1/b > 1 ...
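A sketch of the log-semiring operations for base e, leaning on NumPy's logaddexp for the stable logarithmic addition:

```python
import numpy as np

# Log-semiring operations (base e): conjugate ordinary + and * by exp/log.
def log_add(x, y):
    """x ⊕ y = log(exp(x) + exp(y)); np.logaddexp computes this stably."""
    return np.logaddexp(x, y)

def log_mul(x, y):
    """x ⊗ y = log(exp(x) * exp(y)) = x + y."""
    return x + y

a, b = np.log(2.0), np.log(3.0)
print(np.exp(log_add(a, b)))  # 5.0: ordinary addition, done on the log scale
print(np.exp(log_mul(a, b)))  # 6.0: multiplication becomes ordinary +
```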
Deformation Theory
In mathematics, deformation theory is the study of infinitesimal conditions associated with varying a solution P of a problem to slightly different solutions P_\varepsilon, where \varepsilon is a small number, or a vector of small quantities. The infinitesimal conditions are the result of applying the approach of differential calculus to solving a problem with constraints. The name is an analogy to non-rigid structures that deform slightly to accommodate external forces.
Some characteristic phenomena are: the derivation of first-order equations by treating the \varepsilon quantities as having negligible squares; the possibility of isolated solutions, in that varying a solution may not be possible, or does not bring anything new; and the question of whether the infinitesimal constraints actually 'integrate', so that their solution does provide small variations. In some form these considerations have a history of centuries in mathematics, but also in physics and engineering. For example, in the ...
Tropical Analysis
In the mathematical discipline of idempotent analysis, tropical analysis is the study of the tropical semiring.
Applications
The max tropical semiring can be used to determine marking times within a given Petri net, starting from a vector of initial marking states: -\infty (the unit for max, i.e. tropical addition) means "never before", while 0 (the unit for addition, i.e. tropical multiplication) means "no additional time". Tropical cryptography is cryptography based on the tropical semiring. Tropical geometry is an analog of algebraic geometry, using the tropical semiring.
See also
Lunar arithmetic
Compact Convergence
In mathematics, compact convergence (or uniform convergence on compact sets) is a type of convergence that generalizes the idea of uniform convergence. It is associated with the compact-open topology.
Definition
Let (X, \mathcal{T}) be a topological space and (Y, d_Y) be a metric space. A sequence of functions
f_n : X \to Y, \quad n \in \mathbb{N},
is said to converge compactly as n \to \infty to some function f : X \to Y if, for every compact set K \subseteq X,
f_n|_K \to f|_K
uniformly on K as n \to \infty. This means that for all compact K \subseteq X,
\lim_{n \to \infty} \sup_{x \in K} d_Y\left( f_n(x), f(x) \right) = 0.
Examples
* If X = (0, 1) \subseteq \R and Y = \R with their usual topologies, with f_n(x) := x^n, then f_n converges compactly to the constant function with value 0, but not uniformly.
* If X = (0, 1], Y = \R, and f_n(x) = x^n, then f_n converges pointwise to the function that is zero on (0, 1) and one at 1, but the sequence does not converge compactly.
* A very po ...
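A numeric illustration of the first example, f_n(x) = x^n on (0, 1): the sup-distance to the zero function vanishes on a compact subset such as [0.1, 0.9], but stays near 1 when the sample points approach 1 (the grid sizes are arbitrary choices):

```python
import numpy as np

def sup_dist(n, xs):
    """Sup over the sample points of |f_n(x) - 0| for f_n(x) = x**n."""
    return np.max(np.abs(xs ** n))

K = np.linspace(0.1, 0.9, 1000)       # sample of the compact set [0.1, 0.9]
near1 = np.linspace(0.1, 0.999, 1000) # sample points creeping toward 1

for n in (10, 100, 1000):
    print(n, sup_dist(n, K), sup_dist(n, near1))
# The sup over K shrinks like 0.9**n -> 0, while near x = 1 it stays
# close to 1: convergence is compact but not uniform on all of (0, 1).
```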