In statistics, a population proportion, generally denoted by P or the Greek letter \pi, is a parameter that describes a percentage value associated with a population. For example, the 2010 United States Census showed that 83.7% of the American population was identified as not being Hispanic or Latino; the value of 0.837 is a population proportion. In general, the population proportion and other population parameters are unknown. A census can be conducted to determine the actual value of a population parameter, but a census is often impractical because of its cost and the time it takes.

A population proportion is usually estimated through an unbiased sample statistic obtained from an observational study or experiment. For example, the National Technological Literacy Conference conducted a national survey of 2,000 adults to determine the percentage of adults who are economically illiterate. The study showed that 72% of the 2,000 adults sampled did not understand what a gross domestic product is. The value of 72% is a sample proportion. The sample proportion is generally denoted by \hat{p} and in some textbooks by p.


Mathematical definition

A ''proportion'' is mathematically defined as the ratio of the number of elements (a countable quantity) in a subset S to the size of a set R:
:P = \frac{X}{N},
where X is the count of successes in the population, and N is the size of the population. This mathematical definition can be generalized to provide the definition for the sample proportion:
:\hat{p} = \frac{x}{n},
where x is the count of successes in the sample, and n is the size of the sample obtained from the population.
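As a small illustration of the two definitions, the following Python sketch computes a population proportion and a sample proportion from counts. The function name `proportion` and all of the counts are hypothetical and chosen only for illustration.

```python
def proportion(successes: int, size: int) -> float:
    """Return successes / size, the proportion of successes in a group."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= successes <= size:
        raise ValueError("successes must lie between 0 and size")
    return successes / size


# Hypothetical population: N = 1000 units, of which X = 837 are successes.
P = proportion(837, 1000)

# Hypothetical sample from that population: n = 50 units, x = 40 successes.
p_hat = proportion(40, 50)

print(P, p_hat)
```

The same function serves both cases because the two definitions differ only in whether the counts come from the whole population or from a sample.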


Estimation

One of the main focuses of study in inferential statistics is determining the "true" value of a parameter. Generally, the actual value of a parameter will never be found unless a census is conducted on the population of study. However, there are statistical methods that can be used to get a reasonable estimate of a parameter. These methods include confidence intervals and hypothesis testing.

Estimating the value of a population proportion can have important implications in the areas of agriculture, business, economics, education, engineering, environmental studies, medicine, law, political science, psychology, and sociology.

A population proportion can be estimated through the usage of a confidence interval known as a one-sample proportion in the Z-interval, whose formula is given below:
:\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
where \hat{p} is the sample proportion, n is the sample size, and z^* is the upper \frac{1-C}{2} critical value of the standard normal distribution for a level of confidence C.
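The interval formula can be sketched in Python using only the standard library. The function name `proportion_z_interval` and the sample values are assumptions made for illustration; `statistics.NormalDist().inv_cdf` supplies the critical value z^* in place of a printed table.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def proportion_z_interval(p_hat: float, n: int, confidence: float) -> Tuple[float, float]:
    """Return (lower, upper) for p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    # z* leaves an upper-tail area of (1 - C) / 2 under the standard normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical sample: 50 successes out of 100 observations, 90% confidence.
low, high = proportion_z_interval(0.5, 100, 0.90)
print(round(low, 4), round(high, 4))
```

This sketch does not check whether the interval is appropriate for the data; it only evaluates the formula.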


Proof

In order to derive the formula for the one-sample proportion in the Z-interval, a sampling distribution of sample proportions needs to be taken into consideration. The mean of the sampling distribution of sample proportions is usually denoted as \mu_{\hat{p}} = P and its standard deviation is denoted as:
:\sigma_{\hat{p}} = \sqrt{\frac{P(1-P)}{n}}
Since the value of P is unknown, an unbiased statistic \hat{p} will be used for P. The mean and standard deviation are rewritten respectively as:
:\mu_{\hat{p}} = \hat{p} and \sigma_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
Invoking the central limit theorem, the sampling distribution of sample proportions is approximately normal, provided that the sample is reasonably large and unskewed. Suppose the following probability is calculated:
:\Pr\left(-z^* < \frac{\hat{p}-P}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}} < z^*\right) = C,
where 0 < C < 1 and \pm z^* are the standard critical values. The inequality
:-z^* < \frac{\hat{p}-P}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}} < z^*
can be algebraically re-written as follows:
:-z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < \hat{p}-P < z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
:\Rightarrow -\hat{p} - z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < -P < -\hat{p} + z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
:\Rightarrow \hat{p} - z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < P < \hat{p} + z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
From the algebraic work done above, it is evident with a level of certainty C that P could fall in between the values of:
:\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.
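The statements about the mean and standard deviation of the sampling distribution can be checked empirically. The following Python sketch is a simulation under assumed values (a true proportion of 0.3, samples of size 200, 5,000 repetitions, and a fixed random seed); none of these numbers come from the text above.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is reproducible

P_TRUE = 0.3   # assumed population proportion
N = 200        # assumed sample size
REPS = 5000    # assumed number of repeated samples


def sample_proportion(p: float, n: int) -> float:
    """Draw n independent success/failure observations and return the sample proportion."""
    successes = sum(random.random() < p for _ in range(n))
    return successes / n


p_hats = [sample_proportion(P_TRUE, N) for _ in range(REPS)]

simulated_mean = mean(p_hats)
simulated_sd = pstdev(p_hats)
theoretical_sd = sqrt(P_TRUE * (1 - P_TRUE) / N)

print(simulated_mean, simulated_sd, theoretical_sd)
```

With these settings the simulated mean should land close to the assumed P and the simulated standard deviation close to \sqrt{P(1-P)/n}, which is what the derivation relies on.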


Conditions for inference

In general, the formula used for estimating a population proportion requires substitutions of known numerical values. However, these numerical values cannot be "blindly" substituted into the formula because statistical inference requires that the estimation of an unknown parameter be justifiable. In order for a parameter's estimation to be justifiable, there are three conditions that need to be verified:
# The data's individual observations have to be obtained from a simple random sample of the population of interest.
# The data's individual observations have to display normality. This can be verified mathematically with the following definition:
#* Let n be the sample size of a given random sample and let \hat{p} be its sample proportion. If n\hat{p} \geq 10 and n(1-\hat{p}) \geq 10, then the data's individual observations display normality.
# The data's individual observations have to be independent of each other. This can be verified mathematically with the following definition:
#* Let N be the size of the population of interest and let n be the sample size of a simple random sample of the population. If N \geq 10n, then the data's individual observations are independent of each other.
The conditions for SRS, normality, and independence are referred to in some statistical textbooks as the conditions for the inference tool box.
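The three conditions can be expressed as a small Python check. This is a sketch: the function name, its arguments, and the survey numbers are hypothetical. Whether the data come from a simple random sample cannot be computed from the numbers, so it is passed in as a flag supplied by the analyst.

```python
def check_inference_conditions(successes: int, n: int, population_size: int,
                               is_simple_random_sample: bool) -> dict:
    """Evaluate the SRS, normality, and independence conditions described above."""
    return {
        # Depends on the study design, so it is taken on the analyst's word.
        "simple_random_sample": bool(is_simple_random_sample),
        # n * p_hat equals the success count and n * (1 - p_hat) the failure count,
        # so comparing the counts with 10 avoids floating-point rounding.
        "normality": successes >= 10 and (n - successes) >= 10,
        # The population must be at least ten times the sample size.
        "independence": population_size >= 10 * n,
    }


# Hypothetical survey: 60 successes among 150 sampled units, population of 20,000.
result = check_inference_conditions(60, 150, 20_000, True)
print(result)
print(all(result.values()))
```

Returning the three results separately, rather than a single true or false, shows which condition fails when the interval is not justified.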


Example

Suppose a presidential election is taking place in a democracy. A random sample of 400 eligible voters in the democracy's voter population shows that 272 voters support candidate B. A political scientist wants to determine what percentage of the voter population supports candidate B. To answer the political scientist's question, a one-sample proportion in the Z-interval with a confidence level of 95% can be constructed in order to determine the population proportion of eligible voters in this democracy that support candidate B.


Solution

It is known from the random sample that \hat{p} = \frac{272}{400} = 0.68 with sample size n = 400. Before a confidence interval is constructed, the conditions for inference will be verified.
* Since a random sample of 400 voters was obtained from the voting population, the condition for a simple random sample has been met.
* Let n = 400 and \hat{p} = 0.68; it will be checked whether n\hat{p} \geq 10 and n(1-\hat{p}) \geq 10:
:(400)(0.68) \geq 10 \Rightarrow 272 \geq 10 and (400)(1-0.68) \geq 10 \Rightarrow 128 \geq 10
:The condition for normality has been met.
* Let N be the size of the voter population in this democracy, and let n = 400. If N \geq 10n, then there is independence.
:N \geq 10(400) \Rightarrow N \geq 4000
:The population size N for this democracy's voters can be assumed to be at least 4,000. Hence, the condition for independence has been met.
With the conditions for inference verified, it is permissible to construct a confidence interval. Let \hat{p} = 0.68, n = 400, and C = 0.95.

To solve for z^*, the expression \frac{1-C}{2} is used:
:\frac{1-C}{2} = \frac{1-0.95}{2} = \frac{0.0500}{2} = 0.0250
By examining a standard normal bell curve, the value for z^* can be determined by identifying which standard score gives the standard normal curve an upper tail area of 0.0250 or an area of 1 - 0.0250 = 0.9750. The value for z^* can also be found through a table of standard normal probabilities. From a table of standard normal probabilities, the value of Z that gives an area of 0.9750 is 1.96. Hence, the value for z^* is 1.96.

The values for \hat{p} = 0.68, n = 400, z^* = 1.96 can now be substituted into the formula for the one-sample proportion in the Z-interval:
:\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \Rightarrow (0.68) \pm (1.96) \sqrt{\frac{(0.68)(1-0.68)}{400}} \Rightarrow 0.68 \pm 1.96 \sqrt{0.000544} \Rightarrow \bigl(0.63429, 0.72571\bigr)
Based on the conditions of inference and the formula for the one-sample proportion in the Z-interval, it can be concluded with a 95% confidence level that the percentage of the voter population in this democracy supporting candidate B is between 63.429% and 72.571%.
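The arithmetic of this solution can be reproduced with a short Python sketch. The variable names are illustrative, and `statistics.NormalDist().inv_cdf` stands in for the table lookup of z^*; it returns a value slightly more precise than 1.96, which does not change the endpoints at five decimal places.

```python
from math import sqrt
from statistics import NormalDist

x, n, C = 272, 400, 0.95   # values taken from the example

p_hat = x / n                                   # sample proportion, 0.68
upper_tail = (1 - C) / 2                        # upper-tail area, 0.025
z_star = NormalDist().inv_cdf(1 - upper_tail)   # critical value, about 1.96
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - margin, p_hat + margin
print(round(z_star, 2))                 # 1.96
print(round(lower, 5), round(upper, 5)) # 0.63429 0.72571
```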


Value of the parameter in the confidence interval range

A commonly asked question in inferential statistics is whether the parameter is included within a confidence interval. The only way to answer this question with certainty is to conduct a census. Referring to the example given above, once the interval has been computed, the probability that the population proportion lies in it is either 1 or 0: the parameter is either inside the interval or it is not. The main purpose of a confidence interval is to indicate a range of plausible values for the parameter.


Common errors and misinterpretations from estimation

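A very common error that arises from the construction of a confidence interval is the belief that the level of confidence, such as C = 95%, means there is a 95% chance that the parameter lies in the computed interval. This is incorrect. The level of confidence describes the reliability of the estimation procedure, not the probability that one particular interval contains the parameter: if the sampling were repeated many times, approximately a proportion C of the resulting intervals would contain the parameter. The values of C fall strictly between 0 and 1.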
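What the level of confidence does describe can be illustrated by simulation. The following Python sketch repeatedly draws samples from a population whose proportion is assumed to be known (0.68, with samples of size 400, 2,000 repetitions, and a fixed seed, all chosen for illustration) and counts how often the computed interval contains that value.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is reproducible

P_TRUE = 0.68  # assumed population proportion, known only inside the simulation
N = 400        # assumed sample size
REPS = 2000    # assumed number of repeated samples
C = 0.95       # level of confidence

z_star = NormalDist().inv_cdf(1 - (1 - C) / 2)

covered = 0
for _ in range(REPS):
    successes = sum(random.random() < P_TRUE for _ in range(N))
    p_hat = successes / N
    margin = z_star * sqrt(p_hat * (1 - p_hat) / N)
    # Each individual interval either contains P_TRUE or it does not.
    if p_hat - margin <= P_TRUE <= p_hat + margin:
        covered += 1

coverage = covered / REPS
print(coverage)
```

Each interval in the loop either covers the assumed proportion or misses it; the fraction of intervals that cover it over many repetitions is the quantity that should come out close to C.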


Estimation of P using ranked set sampling

A more precise estimate of P can be obtained by choosing ranked set sampling instead of simple random sampling.


See also

* Binomial proportion confidence interval
* Confidence interval
* Prevalence
* Statistical hypothesis testing
* Statistical inference
* Statistical parameter
* Tolerance interval

