Categorical Data Analysis

	Categorical Data Analysis This is a list of statistical procedures which can be used for the analysis of categorical data, also known as data on the nominal scale and as categorical variables.

General tests
* Bowker's test of symmetry
* Categorical distribution, general model
* Chi-squared test
* Cochran–Armitage test for trend
* Cochran–Mantel–Haenszel statistics
* Correspondence analysis
* Cronbach's alpha
* Diagnostic odds ratio
* G-test
* Generalized estimating equations
* Generalized linear models
* Krichevsky–Trofimov estimator
* Kuder–Richardson Formula 20
* Linear discriminant analysis
* Multinomial distribution
* Multinomial logit
* Multinomial probit
* Multiple correspondence analysis
* Odds ratio
* Poisson regression
* Powered partial least squares discriminant analysis
* Qualitative variation
* Randomization test for goodness of fit
* Relative risk
* Stratified analysis
* Tetrachoric correlation
* Uncertainty coefficient
* Wald test

Binomial data
* Bernstein inequalities (probabili ...
	Statistical Statistics (from German: Statistik, "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments (Dodge, Y. (2006), The Oxford Dictionary of Statistical Terms, Oxford University Press). When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can reasonably extend from the sample to the population as a whole. An experim ...
	Multinomial Logit In statistics, multinomial logistic regression is a classification method that generalizes logistic regression to multiclass problems, i.e. with more than two possible discrete outcomes. That is, it is a model that is used to predict the probabilities of the different possible outcomes of a categorically distributed dependent variable, given a set of independent variables (which may be real-valued, binary-valued, categorical-valued, etc.). Multinomial logistic regression is known by a variety of other names, including polytomous LR, multiclass LR, softmax regression, multinomial logit (mlogit), the maximum entropy (MaxEnt) classifier, and the conditional maximum entropy model. Background Multinomial logistic regression is used when the dependent variable in question is nominal (equivalently categorical, meaning that it falls into any one of a set of categories that cannot be ordered in any meaningful way) and for which there are more than two categories. Some examples ...
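As a rough illustration of how such a model turns explanatory variables into outcome probabilities, the sketch below applies the softmax function to one linear score per category. The coefficient values and the helper names `softmax` and `predict_proba` are hypothetical and chosen only for this example; a real model would estimate the coefficients from data.

```python
import math

def softmax(scores):
    """Convert real-valued scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(x, weights, intercepts):
    """Outcome probabilities for one observation x.

    weights[k] and intercepts[k] are hypothetical, already-fitted
    coefficients for outcome category k (assumed, not estimated here).
    """
    scores = [
        b + sum(w_j * x_j for w_j, x_j in zip(w, x))
        for w, b in zip(weights, intercepts)
    ]
    return softmax(scores)

# Three outcome categories, two explanatory variables (made-up numbers).
weights = [[0.0, 0.0], [1.2, -0.4], [-0.7, 0.9]]
intercepts = [0.0, 0.1, -0.3]
probs = predict_proba([1.0, 2.0], weights, intercepts)
print(probs)
```

Fixing the first category's coefficients at zero is one common identification convention; other parameterizations give the same probabilities.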
	Binomial Proportion Confidence Interval In statistics, a binomial proportion confidence interval is a confidence interval for the probability of success calculated from the outcome of a series of success–failure experiments (Bernoulli trials). In other words, a binomial proportion confidence interval is an interval estimate of a success probability p when only the number of experiments n and the number of successes n_S are known. There are several formulas for a binomial confidence interval, but all of them rely on the assumption of a binomial distribution. In general, a binomial distribution applies when an experiment is repeated a fixed number of times, each trial of the experiment has two possible outcomes (success and failure), the probability of success is the same for each trial, and the trials are statistically independent. Because the binomial distribution is a discrete probability distribution (i.e., not continuous) and difficult to calculate for large numbers of trials, a variety of approximations ...
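Two widely used approximate intervals can be sketched as follows, assuming the normal-approximation (Wald) interval and the Wilson score interval are among the approximations meant. The function names and the default z = 1.96 (roughly a 95% level) are choices made for this example.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval, which behaves better near 0 and 1."""
    p_hat = successes / n
    denom = 1.0 + z**2 / n
    center = (p_hat + z**2 / (2.0 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1.0 - p_hat) / n + z**2 / (4.0 * n**2))
    return center - half, center + half

print(wald_interval(50, 100))
print(wilson_interval(50, 100))
print(wilson_interval(0, 20))  # the Wald interval collapses to a point here
```

With zero observed successes the Wald interval has zero width, which is one reason alternatives such as the Wilson interval are often preferred for small samples.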
	Binomial Regression In statistics, binomial regression is a regression analysis technique in which the response (often referred to as Y) has a binomial distribution: it is the number of successes in a series of independent Bernoulli trials, where each trial has probability of success p. In binomial regression, the probability of a success is related to explanatory variables: the corresponding concept in ordinary regression is to relate the mean value of the unobserved response to explanatory variables. Binomial regression is closely related to binary regression: a binary regression can be considered a binomial regression with n = 1, or a regression on ungrouped binary data, while a binomial regression can be considered a regression on grouped binary data (see comparison). Binomial regression models are essentially the same as binary choice models, one type of discrete choice model: the primary difference is in the theoretical motivation (see comparison). In machine learning, binomial regre ...
	Bernstein Inequalities (Probability Theory) In probability theory, Bernstein inequalities give bounds on the probability that the sum of random variables deviates from its mean. In the simplest case, let X_1, ..., X_n be independent Bernoulli random variables taking values +1 and −1 with probability 1/2 (this distribution is also known as the Rademacher distribution), then for every positive \varepsilon,
: \mathbb{P}\left( \left| \frac{1}{n}\sum_{i=1}^n X_i \right| > \varepsilon \right) \leq 2\exp\left( -\frac{n\varepsilon^2}{2\left(1+\frac{\varepsilon}{3}\right)} \right).
Bernstein inequalities were proved and published by Sergei Bernstein in the 1920s and 1930s (J. V. Uspensky, "Introduction to Mathematical Probability", McGraw-Hill Book Company, 1937). Later, these inequalities were rediscovered several times in various forms. Thus, special cases of the Bernstein inequalities are also known as the Chernoff bound, Hoeffding's inequality and Azuma's inequality. Some of the inequalities 1. Let X_1, \ldots, X_n be independent zero-mean random variables. Suppose that |X_i| ...
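For the Rademacher case, the tail probability can be computed exactly and compared with the bound. The sketch below takes the bound to be 2 exp(−nε² / (2(1 + ε/3))) and uses the fact that the number of +1 values follows a Binomial(n, 1/2) distribution; the function names are invented for this example.

```python
import math

def bernstein_bound(n, eps):
    """Upper bound on P(|mean of n Rademacher variables| > eps)."""
    return 2.0 * math.exp(-n * eps**2 / (2.0 * (1.0 + eps / 3.0)))

def exact_tail(n, eps):
    """Exact P(|mean| > eps) for n independent Rademacher variables."""
    # If k of the n variables equal +1, the sum is 2k - n,
    # and k follows a Binomial(n, 1/2) distribution.
    favourable = sum(
        math.comb(n, k) for k in range(n + 1) if abs(2 * k - n) / n > eps
    )
    return favourable / 2**n

for n in (10, 50, 200):
    for eps in (0.1, 0.3, 0.5):
        print(n, eps, exact_tail(n, eps), bernstein_bound(n, eps))
```

For small n and small ε the bound exceeds 1 and is uninformative; it becomes useful as nε² grows.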
	Wald Test In statistics, the Wald test (named after Abraham Wald) assesses constraints on statistical parameters based on the weighted distance between the unrestricted estimate and its hypothesized value under the null hypothesis, where the weight is the precision of the estimate. Intuitively, the larger this weighted distance, the less likely it is that the constraint is true. While the finite sample distributions of Wald tests are generally unknown, the test statistic has an asymptotic χ²-distribution under the null hypothesis, a fact that can be used to determine statistical significance. Together with the Lagrange multiplier test and the likelihood-ratio test, the Wald test is one of three classical approaches to hypothesis testing. An advantage of the Wald test over the other two is that it only requires the estimation of the unrestricted model, which lowers the computational burden as compared to the likelihood-ratio test. However, a major disadvantage is that (in finite samples) it is not i ...
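For a single parameter, the weighted distance reduces to the squared difference between the estimate and the hypothesized value, divided by the estimated variance. The sketch below assumes that scalar case and uses the asymptotic χ² distribution with one degree of freedom for the p-value; `wald_test` and its argument names are illustrative.

```python
import math

def wald_test(estimate, hypothesized, std_error):
    """Wald statistic and asymptotic p-value for one scalar parameter.

    The statistic is the squared distance between the estimate and the
    hypothesized value, weighted by the precision 1 / std_error**2.
    """
    w = ((estimate - hypothesized) / std_error) ** 2
    # Survival function of a chi-squared variable with 1 degree of freedom:
    # P(chi2_1 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value

w, p = wald_test(estimate=0.8, hypothesized=0.0, std_error=0.35)
print(w, p)
```

Testing several constraints at once would require the full covariance matrix of the estimates and a χ² distribution with more degrees of freedom, which this sketch does not cover.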
	Uncertainty Coefficient In statistics, the uncertainty coefficient, also called proficiency, entropy coefficient or Theil's U, is a measure of nominal association. It was first introduced by Henri Theil and is based on the concept of information entropy. Definition Suppose we have samples of two discrete random variables, X and Y. By constructing the joint distribution, P_{X,Y}(x, y), from which we can calculate the conditional distributions, P_{X|Y}(x|y) = P_{X,Y}(x, y)/P_Y(y) and P_{Y|X}(y|x) = P_{X,Y}(x, y)/P_X(x), and calculating the various entropies, we can determine the degree of association between the two variables. The entropy of a single distribution is given as:
: H(X) = -\sum_x P_X(x) \log P_X(x),
while the conditional entropy is given as:
: H(X|Y) = -\sum_{x,y} P_{X,Y}(x, y) \log P_{X|Y}(x|y).
The uncertainty coefficient or proficiency is defined as:
: U(X|Y) = \frac{H(X) - H(X|Y)}{H(X)} = \frac{I(X;Y)}{H(X)},
and tells us: given Y, what fraction of the bits of X can we predict? In this case we can think of X as containing the total information, and of Y as allowing one to pred ...
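The definition can be sketched for paired samples as follows, estimating every distribution by empirical frequencies and taking U(X|Y) = (H(X) − H(X|Y)) / H(X). The function name and the choice to raise an error when H(X) is zero are assumptions made for this example.

```python
import math
from collections import Counter

def uncertainty_coefficient(xs, ys):
    """Estimate U(X|Y) from paired samples using empirical frequencies."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))

    # H(X) = -sum_x P(x) log P(x)
    h_x = -sum((c / n) * math.log(c / n) for c in count_x.values())
    if h_x == 0.0:
        raise ValueError("X takes a single value, so U(X|Y) is undefined")

    # H(X|Y) = -sum_{x,y} P(x, y) log P(x|y), with P(x|y) = count(x, y) / count(y)
    h_x_given_y = -sum(
        (c / n) * math.log(c / count_y[y]) for (_, y), c in count_xy.items()
    )
    return (h_x - h_x_given_y) / h_x

# Y determines X completely, so all of X's uncertainty is explained.
print(uncertainty_coefficient([0, 0, 1, 1], ["a", "a", "b", "b"]))
# X and Y are independent in this sample, so nothing is explained.
print(uncertainty_coefficient([0, 0, 1, 1], [0, 1, 0, 1]))
```

The measure is not symmetric: swapping the arguments gives U(Y|X), which generally differs.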
	Polychoric Correlation In statistics, polychoric correlation (Base SAS(R) 9.3 Procedures Guide: Statistical Procedures, Second Edition, support.sas.com, https://support.sas.com/documentation/cdl/en/procstat/65543/HTML/default/viewer.htm#procstat_corr_details14.htm, accessed 2018-01-10) is a technique for estimating the correlation between two hypothesised normally distributed continuous latent variables, from two observed ordinal variables. Tetrachoric correlation is a special case of the polychoric correlation applicable when both observed variables are dichotomous. These names derive from the polychoric and tetrachoric series which are used for estimation of these correlations. Applications and examples This technique is frequently applied when analysing items on self-report instruments such as personality tests and surveys that often use rating scales with a small number of response options (e.g., strongly disagree to strongly agree). The smaller the numbe ...
	Stratified Analysis In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense, "correlation" may indicate any type of association, in statistics it usually refers to the degree to which a pair of variables are linearly related. Familiar examples of dependent phenomena include the correlation between the height of parents and their offspring, and the correlation between the price of a good and the quantity the consumers are willing to purchase, as it is depicted in the so-called demand curve. Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. For example, an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In this example, there is a causal relationship, because extreme weather causes people to use more electricity for heating or cooling. Howev ...
	Relative Risk The relative risk (RR) or risk ratio is the ratio of the probability of an outcome in an exposed group to the probability of an outcome in an unexposed group. Together with risk difference and odds ratio, relative risk measures the association between the exposure and the outcome. Statistical use and meaning Relative risk is used in the statistical analysis of the data of ecological, cohort, medical and intervention studies, to estimate the strength of the association between exposures (treatments or risk factors) and outcomes. Mathematically, it is the incidence rate of the outcome in the exposed group, I_e, divided by the rate of the unexposed group, I_u. As such, it is used to compare the risk of an adverse outcome when receiving a medical treatment versus no treatment (or placebo), or for environmental risk factors. For example, in a study examining the effect of the drug apixaban on the occurrence of thromboembolism, 8.8% of placebo-treated patients experienced the disease, b ...
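The ratio I_e / I_u can be computed directly from the four cell counts of a 2×2 table. The sketch below uses hypothetical counts that are not taken from any study; the function and argument names are likewise illustrative.

```python
def relative_risk(exposed_events, exposed_no_events,
                  unexposed_events, unexposed_no_events):
    """Relative risk I_e / I_u from the four cells of a 2x2 table."""
    i_e = exposed_events / (exposed_events + exposed_no_events)
    i_u = unexposed_events / (unexposed_events + unexposed_no_events)
    if i_u == 0.0:
        raise ValueError("no events in the unexposed group; ratio undefined")
    return i_e / i_u

# Hypothetical counts: 10 of 100 exposed and 20 of 100 unexposed had the outcome.
rr = relative_risk(10, 90, 20, 80)
print(rr)
```

A value below 1 indicates a lower risk in the exposed group, a value above 1 a higher risk, and a value of 1 no difference.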
	Qualitative Variation An index of qualitative variation (IQV) is a measure of statistical dispersion in nominal distributions. There are a variety of these, but they have been relatively little-studied in the statistics literature. The simplest is the variation ratio, while more complex indices include the information entropy. Properties There are several types of indices used for the analysis of nominal data. Several are standard statistics that are used elsewhere: range, standard deviation, variance, mean deviation, coefficient of variation, median absolute deviation, interquartile range and quartile deviation. In addition to these, several statistics have been developed with nominal data in mind. A number have been summarized and devised by Wilcox, who requires the following standardization properties to be satisfied:
* Variation varies between 0 and 1.
* Variation is 0 if and only if all cases belong to a single category.
* Variation is 1 if and only if cases are evenly divided across all cat ...
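Two of the indices mentioned can be sketched briefly. The variation ratio is taken here as one minus the relative frequency of the modal category. The entropy-based index is divided by log K, where K is the number of possible categories, so that it ranges from 0 to 1; that normalization is an assumption of this example rather than something stated above.

```python
import math
from collections import Counter

def variation_ratio(values):
    """One minus the share of cases falling in the modal category."""
    counts = Counter(values)
    return 1.0 - max(counts.values()) / len(values)

def normalized_entropy(values, k):
    """Shannon entropy of the observed categories divided by log(k).

    k is the number of possible categories (k >= 2). The result is 0 when
    all cases share one category and 1 when cases are spread evenly over
    all k categories.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    n = len(values)
    counts = Counter(values)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(k)

sample = ["red", "red", "blue", "green"]
print(variation_ratio(sample))
print(normalized_entropy(sample, k=3))
```

The plain variation ratio reaches at most 1 − 1/K for an even split, so it does not by itself equal 1 when cases are evenly divided.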
	Poisson Regression In statistics, Poisson regression is a generalized linear model form of regression analysis used to model count data and contingency tables. Poisson regression assumes the response variable Y has a Poisson distribution, and assumes the logarithm of its expected value can be modeled by a linear combination of unknown parameters. A Poisson regression model is sometimes known as a log-linear model, especially when used to model contingency tables. Negative binomial regression is a popular generalization of Poisson regression because it loosens the highly restrictive assumption that the variance is equal to the mean made by the Poisson model. The traditional negative binomial regression model is based on the Poisson-gamma mixture distribution. This model is popular because it models the Poisson heterogeneity with a gamma distribution. Poisson regression models are generalized linear models with the logarithm as the (canonical) link function, and the Poisson distribution function ...
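A minimal sketch of fitting such a model with one explanatory variable is given below. It assumes the log link, log E[Y] = b0 + b1·x, and maximizes the Poisson log-likelihood by Newton's method using only the standard library. The function name, starting values and fixed iteration count are choices made for this example; a practical analysis would normally rely on a statistics package.

```python
import math

def fit_poisson(xs, ys, iterations=25):
    """Fit log E[Y] = b0 + b1 * x by Newton's method (sketch, one covariate)."""
    # Start from an intercept-only model; requires a positive mean of ys.
    b0 = math.log(sum(ys) / len(ys))
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood.
        g0 = sum(y - m for y, m in zip(ys, mu))
        g1 = sum((y - m) * x for x, y, m in zip(xs, ys, mu))
        # Negative Hessian (Fisher information), a 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(m * x for x, m in zip(xs, mu))
        h11 = sum(m * x * x for x, m in zip(xs, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up count data for illustration only.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 2, 5, 8, 11]
b0, b1 = fit_poisson(xs, ys)
print(b0, b1)
```

Because the link is logarithmic, exp(b1) is the multiplicative change in the expected count for a one-unit increase in x.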