Flow-based Generative Model
A flow-based generative model is a generative model used in machine learning that explicitly models a probability distribution by leveraging normalizing flow, a statistical method that uses the change-of-variables law of probabilities to transform a simple distribution into a complex one. Direct modeling of the likelihood provides many advantages. For example, the negative log-likelihood can be computed exactly and minimized as the loss function. Additionally, novel samples can be generated by sampling from the initial distribution and applying the flow transformation. In contrast, many alternative generative modeling methods, such as the variational autoencoder (VAE) and the generative adversarial network (GAN), do not explicitly represent the likelihood function.
Method
Let z_0 be a (possibly multivariate) random variable with distribution p_0(z_0). For i = 1, ..., K, let z_i = f_i(z_{i-1}) be a sequence of random variables transformed from z_0. The functions f_1, ..., f_K should be invertible ...
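Because a flow is a composition of invertible maps with tractable Jacobian determinants, both sampling and exact likelihood evaluation are mechanical. Below is a minimal sketch in plain NumPy, assuming elementwise affine layers z_i = a_i * z_{i-1} + b_i with a standard normal base distribution; the class name AffineFlow and all parameter values are illustrative, not from any library.

```python
# Minimal normalizing-flow sketch: K invertible elementwise affine layers.
import numpy as np

class AffineFlow:
    def __init__(self, scales, shifts):
        # Scales a_i must be nonzero so each layer is invertible.
        self.scales = [np.asarray(a, float) for a in scales]
        self.shifts = [np.asarray(b, float) for b in shifts]

    def forward(self, z0):
        # Push a base sample z_0 through f_K ∘ ... ∘ f_1 to get x.
        z = z0
        for a, b in zip(self.scales, self.shifts):
            z = a * z + b
        return z

    def inverse(self, x):
        # Invert each layer in reverse order: z_{i-1} = (z_i - b_i) / a_i.
        z = x
        for a, b in zip(reversed(self.scales), reversed(self.shifts)):
            z = (z - b) / a
        return z

    def log_prob(self, x):
        # Change of variables: log p_X(x) = log p_0(z_0) - sum_i log|det J_{f_i}|.
        # For elementwise affine layers, log|det J| = sum(log|a_i|) per layer.
        z0 = self.inverse(x)
        d = z0.shape[-1]
        base = -0.5 * (np.sum(z0**2, axis=-1) + d * np.log(2 * np.pi))  # N(0, I)
        log_det = sum(np.sum(np.log(np.abs(a))) for a in self.scales)
        return base - log_det

# Usage: a 2-layer flow on R^2.
flow = AffineFlow(scales=[[2.0, 0.5], [1.5, 1.5]], shifts=[[0.0, 1.0], [-1.0, 0.0]])
x = flow.forward(np.random.standard_normal(2))  # sample by pushing z_0 forward
print(flow.log_prob(x))                         # exact log-likelihood of the sample
```

Practical flows (e.g. coupling-layer designs such as RealNVP) follow the same pattern but make the scales and shifts learned functions of part of the input, keeping the Jacobian triangular and cheap to evaluate.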
Generative Model
In statistical classification, two main approaches are called the generative approach and the discriminative approach. These compute classifiers by different means, differing in the degree of statistical modelling. Terminology is inconsistent, but three major types can be distinguished:
# A generative model is a statistical model of the joint probability distribution P(X, Y) on a given observable variable ''X'' and target variable ''Y'': "Generative classifiers learn a model of the joint probability, p(x, y), of the inputs ''x'' and the label ''y'', and make their predictions by using Bayes' rule to calculate p(y\mid x), and then picking the most likely label ''y''."
# A discriminative model is a model of the conditional probability P(Y\mid X = x) of the target ''Y'', given an observation ''x''; and
# Classifiers computed without using a probability model are also referred to loosely as "discriminative".
The distinction between these last two classes is not consistently made ...
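To make the generative recipe in item 1 concrete, here is a minimal sketch assuming a made-up joint distribution over a three-valued input and a binary label; the probability table and the name predict are illustrative, not drawn from any dataset.

```python
# Generative classification: model the joint p(x, y), then apply Bayes' rule
# p(y | x) = p(x, y) / sum_y' p(x, y') and pick the most likely label.
import numpy as np

# joint[x, y]: joint probabilities for x in {0, 1, 2}, y in {0, 1} (made up)
joint = np.array([[0.20, 0.05],
                  [0.10, 0.25],
                  [0.05, 0.35]])

def predict(x):
    posterior = joint[x] / joint[x].sum()   # Bayes' rule: condition on x
    return posterior.argmax(), posterior    # most likely label and p(y|x)

label, post = predict(1)
print(label, post)   # label 1, posterior approx. [0.286, 0.714]
```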
Kullback–Leibler Divergence
In mathematical statistics, the Kullback–Leibler divergence (also called relative entropy and I-divergence), denoted D_{\text{KL}}(P \parallel Q), is a type of statistical distance: a measure of how one probability distribution ''P'' differs from a second, reference probability distribution ''Q''. A simple interpretation of the KL divergence of ''P'' from ''Q'' is the expected excess surprise from using ''Q'' as a model when the actual distribution is ''P''. While it is a statistical distance, it is not a metric, the most familiar type of distance: it is not symmetric in the two distributions (in contrast to variation of information), and it does not satisfy the triangle inequality. Instead, in terms of information geometry, it is a type of divergence, a generalization of squared distance, and for certain classes of distributions (notably an exponential family), it satisfies a generalized Pythagorean theorem (which applies to squared distances). In the simple case, a relative entropy of 0 indicates that the two distributions in question have identical quantities of information ...
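A short sketch of the defining sum D_KL(P || Q) = Σ_i P(i) log(P(i)/Q(i)) for discrete distributions; the distributions p and q below are arbitrary examples chosen to exhibit the asymmetry mentioned above.

```python
# KL divergence between two discrete distributions, showing its asymmetry.
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                      # terms with P(i) = 0 contribute 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.4, 0.1]
q = [1/3, 1/3, 1/3]
print(kl(p, q), kl(q, p))  # ~0.155 vs ~0.205: KL is not symmetric
```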
Bijection
In mathematics, a bijection, also known as a bijective function, one-to-one correspondence, or invertible function, is a function between the elements of two sets, where each element of one set is paired with exactly one element of the other set, and each element of the other set is paired with exactly one element of the first set. There are no unpaired elements. In mathematical terms, a bijective function is a one-to-one (injective) and onto (surjective) mapping of a set ''X'' to a set ''Y''. The term ''one-to-one correspondence'' must not be confused with ''one-to-one function'' (an injective function; see figures). A bijection from the set ''X'' to the set ''Y'' has an inverse function from ''Y'' to ''X''. If ''X'' and ''Y'' are finite sets, then the existence of a bijection means they have the same number of elements. For infinite sets, the picture is more complicated, leading to the concept of cardinal number—a way to distinguish the various sizes of infinite sets. ...
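On finite sets the definition can be checked directly: a function is a bijection exactly when it is injective and surjective. A minimal sketch, with made-up sets and mapping; is_bijection is an illustrative helper, not a library function.

```python
# Check bijectivity on finite sets: injective (no repeated images) and
# surjective (every element of Y is hit).
def is_bijection(f, X, Y):
    images = [f(x) for x in X]
    injective = len(set(images)) == len(images)
    surjective = set(images) == set(Y)
    return injective and surjective

X, Y = {0, 1, 2}, {'a', 'b', 'c'}
f = {0: 'a', 1: 'b', 2: 'c'}.get
print(is_bijection(f, X, Y))              # True: pairs each element exactly once
print(is_bijection(lambda x: 'a', X, Y))  # False: neither injective nor surjective
```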
Inverse Function
In mathematics, the inverse function of a function f (also called the inverse of f) is a function that undoes the operation of f. The inverse of f exists if and only if f is bijective, and if it exists, is denoted by f^{-1}. For a function f\colon X\to Y, its inverse f^{-1}\colon Y\to X admits an explicit description: it sends each element y\in Y to the unique element x\in X such that f(x) = y. As an example, consider the real-valued function of a real variable given by f(x) = 5x - 7. One can think of f as the function which multiplies its input by 5 then subtracts 7 from the result. To undo this, one adds 7 to the input, then divides the result by 5. Therefore, the inverse of f is the function f^{-1}\colon \R\to\R defined by f^{-1}(y) = \frac{y+7}{5}.
Definitions
Let f be a function whose domain is the set X, and whose codomain is the set Y. Then f is ''invertible'' if there exists a function g from Y to X such that g(f(x))=x for all x\in X and f(g(y))=y for all y\in Y. If f is invertible, then there is exactly one function g sat ...
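The worked example above translates directly into code. A minimal sketch verifying that f(x) = 5x - 7 and g(y) = (y + 7)/5 undo each other, where g plays the role of f^{-1}; the sample points are arbitrary.

```python
# Verify the two defining identities g(f(x)) = x and f(g(y)) = y.
import math

def f(x):
    return 5 * x - 7          # multiply by 5, then subtract 7

def g(y):
    return (y + 7) / 5        # undo: add 7, then divide by 5

for t in (-2.0, 0.0, 3.5):
    assert g(f(t)) == t               # g undoes f (exact for these inputs)
    assert math.isclose(f(g(t)), t)   # f undoes g (up to float rounding)
print("f and g are mutual inverses")
```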
Typical Set
In information theory, the typical set is a set of sequences whose probability is close to two raised to the negative power of the entropy of their source distribution. That this set has total probability close to one is a consequence of the asymptotic equipartition property (AEP), which is a kind of law of large numbers. The notion of typicality is concerned only with the probability of a sequence, not the actual sequence itself. This has great use in compression theory, as it provides a theoretical means for compressing data, allowing us to represent any sequence X^n using nH(X) bits on average, and hence justifying the use of entropy as a measure of information from a source. The AEP can also be proven for a large class of stationary ergodic processes, allowing the typical set to be defined in more general cases.
(Weakly) typical sequences (weak typicality, entropy typicality)
If a sequence x_1, ..., x_n is drawn from an i.i.d. distribution ''X'' ...
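The AEP can be observed numerically: for an i.i.d. source, the per-symbol log-probability of a long sample concentrates around the entropy. A minimal sketch with a Bernoulli source; the values of p, n, and the seed are arbitrary choices.

```python
# AEP demo: -(1/n) * log2 p(x^n) concentrates around H(X) for i.i.d. sources,
# so most long sequences have probability close to 2^(-n H(X)).
import numpy as np

rng = np.random.default_rng(0)
p, n = 0.3, 10_000
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))   # entropy of the source

x = rng.random(n) < p                              # one i.i.d. Bernoulli(p) sequence
k = int(x.sum())                                   # number of ones
per_symbol = -(k * np.log2(p) + (n - k) * np.log2(1 - p)) / n

print(f"H(X) = {H:.4f} bits, empirical -log2 p(x^n)/n = {per_symbol:.4f}")
# The two agree closely for large n, as the law of large numbers predicts.
```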