In statistics, benchmarking is a method of using auxiliary information to adjust the sampling weights used in an

estimation

process, in order to yield more accurate estimates of totals. Suppose we have a

population

where each unit

k

has a "value"

Y(k)

associated with it. For example,

Y(k)

could be a wage of an employee

k

, or the cost of an item

k

. Suppose we want to estimate the sum

Y

of all the

Y(k)

. So we take a sample of the units

k

, obtain a sampling weight W(k) for each sampled

k

, and then sum up

W(k) \cdot Y(k)

for all sampled

k

. One property usually common to the weights

W(k)

described here is that if we

sum

them over all sampled

k

, then this sum is an estimate of the total number of units

k

in the population (for example, the total employment, or the total number of items). Because we have a sample, this estimate of the total number of units in the population will differ from the true population total. Similarly, the estimate of total

Y

(where we sum

W(k) \cdot Y(k)

for all sampled

k

) will also differ from the true population total. We do not know what the true population total

Y

value is (if we did, there would be no point in sampling!). Yet often we do know the true value that the sum of the

W(k)

estimates, namely the total number of units in the population. For example, we may not know the total earnings of the population or the total cost of the population, but often we know the total employment or total volume of sales. Even if we do not know these exactly, other organizations or earlier surveys often provide very accurate estimates of these auxiliary quantities. One important function of a population

census

is to provide data that can be used for benchmarking smaller surveys. The benchmarking procedure begins by first breaking the population into benchmarking cells. Cells are formed by grouping units together that share common characteristics, for example, similar

Y(k)

, although any grouping that improves the accuracy of the final estimates can be used. For each cell

C

, we let

W(C)

be the sum of all

W(k)

, where the sum is taken over all sampled

k

in the cell

C

. For each cell

C

, we let

T(C)

be the auxiliary value for cell

C

, which is commonly called the "benchmark target" for cell

C

. Next, we compute a benchmark factor

F(C) = T(C) / W(C)

. Then, we adjust each weight

W(k)

by multiplying it by the benchmark factor

F(C)

of its cell

C

. The net result is that the estimated

W

formed by summing

F(C) \cdot W(k)

will now equal the benchmark target total

T

. But the more important benefit is that the estimate of the total of

Y

formed by summing

F(C) \cdot W(k) \cdot Y(k)

will tend to be more accurate.
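
The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `benchmark_weights` and `estimate_total`, the dictionary layout, and the toy figures are all hypothetical and do not come from any particular survey or library.

```python
from collections import defaultdict

def benchmark_weights(weights, cells, targets):
    """Return the benchmarked weight F(C) * W(k) for each sampled unit k.

    weights: dict mapping unit k -> sampling weight W(k)
    cells:   dict mapping unit k -> label of its benchmarking cell C
    targets: dict mapping cell label C -> benchmark target T(C)
    (All three structures are assumptions made for this sketch.)
    """
    # W(C): sum of the sampling weights of the sampled units in each cell.
    cell_weight = defaultdict(float)
    for k, w in weights.items():
        cell_weight[cells[k]] += w

    # Benchmark factor F(C) = T(C) / W(C) for every cell present in the sample.
    factors = {c: targets[c] / cell_weight[c] for c in cell_weight}

    # Adjusted weight for each unit: F(C) * W(k), with C the unit's cell.
    return {k: factors[cells[k]] * w for k, w in weights.items()}

def estimate_total(weights, values):
    """Estimate the total of Y as the sum of weight * Y(k) over sampled units."""
    return sum(weights[k] * values[k] for k in weights)

# Toy sample of four employees in two cells (made-up numbers).
weights = {"a": 10.0, "b": 10.0, "c": 20.0, "d": 20.0}
cells = {"a": "small", "b": "small", "c": "large", "d": "large"}
values = {"a": 100.0, "b": 120.0, "c": 300.0, "d": 340.0}
targets = {"small": 25.0, "large": 36.0}  # known employment per cell

adjusted = benchmark_weights(weights, cells, targets)
raw_total = estimate_total(weights, values)
benchmarked_total = estimate_total(adjusted, values)
```

In this toy case the raw weights sum to 20 and 40 in the two cells, so the factors are 1.25 and 0.9. After adjustment the weights sum to the targets 25 and 36, and the estimated total of Y moves from 15000 to 14270.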

Relationship to stratified sampling

Benchmarking is sometimes referred to as 'post-stratification' because of its similarities to stratified sampling. The difference between the two is that in stratified sampling, we decide in advance how many units will be sampled from each stratum (equivalent to benchmarking cells); in benchmarking, we select units from the broader population, and the number chosen from each cell is a matter of chance. The advantage of stratified sampling is that the sample numbers in each stratum can be controlled for desired accuracy outcomes. Without this control, we may end up with too many sampled units in one stratum and not enough in another. Indeed, it's possible that a sample will contain no members from a certain cell, in which case benchmarking fails because

W(C)=0

, leading to a divide-by-zero problem. In such cases, it is necessary to 'collapse' cells together so that each remaining cell has an adequate sample size. For this reason, benchmarking is generally used in situations where stratified sampling is impractical. For instance, when selecting people from a telephone directory, we can't tell what age they are so we can't easily stratify the sample by age. However, we can collect this information from the people sampled, allowing us to benchmark against demographic information.
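
As a rough sketch of the collapsing step, the Python below finds cells that have a benchmark target but no sampled units, then merges cells according to a mapping chosen by the analyst. The helper names `empty_cells` and `collapse`, the age bands, and the target figures are hypothetical; how cells should be merged in practice is a judgment call that this example does not settle.

```python
def empty_cells(cells, targets):
    """List the cells that have a target but no sampled units, so W(C) = 0."""
    sampled = set(cells.values())
    return sorted(c for c in targets if c not in sampled)

def collapse(cells, targets, merge_into):
    """Relabel cells via merge_into (old label -> new label) and sum their targets."""
    new_cells = {k: merge_into.get(c, c) for k, c in cells.items()}
    new_targets = {}
    for c, t in targets.items():
        label = merge_into.get(c, c)
        new_targets[label] = new_targets.get(label, 0.0) + t
    return new_cells, new_targets

# Three sampled people and three age cells; nobody aged 65+ was sampled.
cells = {"p1": "18-34", "p2": "35-64", "p3": "35-64"}
targets = {"18-34": 300.0, "35-64": 500.0, "65+": 200.0}

missing = empty_cells(cells, targets)

# Assumed choice for this example: merge the empty cell with its neighbour.
new_cells, new_targets = collapse(
    cells, targets, {"35-64": "35+", "65+": "35+"}
)
still_missing = empty_cells(new_cells, new_targets)
```

Here `missing` flags the "65+" cell, and after merging it into a combined "35+" cell with target 700 every remaining cell contains at least one sampled unit, so each benchmark factor is well defined.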