The index of dissimilarity is a demographic measure of the evenness with which two groups are distributed across the component geographic areas that make up a larger area. A group is evenly distributed when each geographic unit has the same percentage of group members as the total population. The index score can also be interpreted as the percentage of one of the two groups included in the calculation that would have to move to different geographic areas in order to produce a distribution that matches that of the larger area. The index of dissimilarity can be used as a measure of segregation. A score of 0 (0%) reflects a fully integrated environment; a score of 1 (100%) reflects full segregation. In terms of black–white segregation, a score of 0.60 means that 60 percent of blacks would have to exchange places with whites in other units to achieve an even geographic distribution.

Basic formula

The basic formula for the index of dissimilarity is:

D = \frac{1}{2} \sum_{i=1}^{N} \left| \frac{a_i}{A} - \frac{b_i}{B} \right|

where (comparing a black and white population, for example):
* a_i = the population of group A in the i-th area, e.g. a census tract
* A = the total population of group A in the large geographic entity for which the index of dissimilarity is being calculated
* b_i = the population of group B in the i-th area
* B = the total population of group B in the large geographic entity for which the index of dissimilarity is being calculated
* N = the number of areas

The index of dissimilarity is applicable to any categorical variable (whether demographic or not), and because of its simple properties it is useful as input to multidimensional scaling and clustering programs. It has been used extensively in the study of social mobility to compare distributions of origin (or destination) occupational categories.
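A minimal Python sketch of the basic formula follows. The function name `index_of_dissimilarity` and the three-area counts are hypothetical, chosen only for illustration; the sketch assumes non-negative counts and that each group has at least one member in the region.

```python
from typing import Sequence


def index_of_dissimilarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Basic formula: D = 1/2 * sum_i |a_i/A - b_i/B| (hypothetical helper).

    a[i] and b[i] are the counts of groups A and B in area i.
    """
    if len(a) != len(b):
        raise ValueError("a and b need one count per area")
    A = sum(a)  # total of group A in the large geographic entity
    B = sum(b)  # total of group B in the large geographic entity
    if A == 0 or B == 0:
        raise ValueError("both groups must be present in the region")
    return 0.5 * sum(abs(a_i / A - b_i / B) for a_i, b_i in zip(a, b))


# Hypothetical counts for three areas (illustrative data, not census figures).
group_a = [10, 20, 70]
group_b = [60, 30, 10]
print(round(index_of_dissimilarity(group_a, group_b), 6))  # 0.6
```

With these made-up counts D = 0.6, which, read as described above, means 60 percent of group A would have to move to match the distribution of group B.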

Linear algebra perspective

The formula for the index of dissimilarity can be made much more compact and meaningful by considering it from the perspective of linear algebra. Suppose we are studying the distribution of rich and poor people in a city (e.g. London). Suppose our city contains N blocks, numbered 1, 2, ..., N.

Let's create a vector \mathbf{r} which shows the number of rich people in each block of our city:

\mathbf{r} = [r_1, r_2, \cdots, r_N]

Similarly, let's create a vector \mathbf{p} which shows the number of poor people in each block of our city:

\mathbf{p} = [p_1, p_2, \cdots, p_N]

Now, the L^1-norm of a vector is simply the sum of the magnitudes of its entries (Wolfram MathWorld: L1 Norm). That is, for a vector \mathbf{v} = [v_1, v_2, \cdots, v_N], the L^1-norm is:

\|\mathbf{v}\|_1 = \sum_{i=1}^{N} |v_i|

If we denote by R the total number of rich people in our city, then a compact way to calculate R is to use the L^1-norm:

R = \|\mathbf{r}\|_1 = \sum_{i=1}^{N} |r_i|

Similarly, if we denote by P the total number of poor people in our city, then:

P = \|\mathbf{p}\|_1 = \sum_{i=1}^{N} |p_i|

Because head counts are never negative, |r_i| = r_i and |p_i| = p_i, so these norms are just the ordinary totals.

When we divide a vector \mathbf{v} by its norm, we get what is called the normalized vector or unit vector \hat{\mathbf{v}}:

\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|_1}
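The L^1-norm and the normalization step can be sketched in Python with NumPy. The helper names `l1_norm` and `normalize_l1` and the three-block counts are hypothetical; for a 1-D array, NumPy's own `np.linalg.norm(v, ord=1)` computes the same norm.

```python
import numpy as np


def l1_norm(v: np.ndarray) -> float:
    """||v||_1: the sum of the magnitudes of the entries of v."""
    return float(np.sum(np.abs(v)))


def normalize_l1(v: np.ndarray) -> np.ndarray:
    """Unit vector v / ||v||_1; the magnitudes of its entries sum to 1."""
    norm = l1_norm(v)
    if norm == 0:
        raise ValueError("the zero vector cannot be normalized")
    return v / norm


# Hypothetical city with N = 3 blocks.
r = np.array([3.0, 1.0, 0.0])  # rich people per block
p = np.array([1.0, 1.0, 2.0])  # poor people per block

R, P = l1_norm(r), l1_norm(p)  # total rich, total poor
print(R, P)  # 4.0 4.0
print(normalize_l1(r), normalize_l1(p))
```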

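Let us normalize the rich vector \mathbf{r} and the poor vector \mathbf{p}:

\hat{\mathbf{r}} = \frac{\mathbf{r}}{\|\mathbf{r}\|_1} = \frac{\mathbf{r}}{R}

\hat{\mathbf{p}} = \frac{\mathbf{p}}{\|\mathbf{p}\|_1} = \frac{\mathbf{p}}{P}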

We finally return to the formula for the index of dissimilarity (D): it is simply equal to one-half the L^1-norm of the difference between the vectors \hat{\mathbf{r}} and \hat{\mathbf{p}}:

D = \frac{1}{2} \|\hat{\mathbf{r}} - \hat{\mathbf{p}}\|_1
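In vector form, D reduces to a single NumPy expression. This is a sketch assuming non-negative counts and non-zero totals; the function name `dissimilarity_l1` and the sample vectors are hypothetical.

```python
import numpy as np


def dissimilarity_l1(r: np.ndarray, p: np.ndarray) -> float:
    """D = 1/2 * ||r_hat - p_hat||_1 with r_hat = r/||r||_1, p_hat = p/||p||_1."""
    r_hat = r / np.linalg.norm(r, ord=1)  # normalized rich vector
    p_hat = p / np.linalg.norm(p, ord=1)  # normalized poor vector
    return 0.5 * float(np.linalg.norm(r_hat - p_hat, ord=1))


# Hypothetical three-block city.
r = np.array([5.0, 3.0, 2.0])  # rich people per block
p = np.array([1.0, 3.0, 6.0])  # poor people per block
print(dissimilarity_l1(r, p))  # approximately 0.4
```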

Numerical example

Consider a city consisting of four blocks of 2 people each. One block consists of 2 rich people, one block consists of 2 poor people, and the remaining two blocks each consist of 1 rich and 1 poor person. What is the index of dissimilarity for this city?

[Figure: 2x2 city]

First, let's find the rich vector \mathbf{r} and the poor vector \mathbf{p}:

\mathbf{r} = [2, 0, 1, 1]

\mathbf{p} = [0, 2, 1, 1]

Next, let's calculate the total number of rich people and poor people in our city:

R = 2 + 0 + 1 + 1 = 4

P = 0 + 2 + 1 + 1 = 4

Next, let's normalize the rich and poor vectors:

\hat{\mathbf{r}} = \frac{\mathbf{r}}{R} = \frac{1}{4}[2, 0, 1, 1] = [0.5, 0, 0.25, 0.25]

\hat{\mathbf{p}} = \frac{\mathbf{p}}{P} = \frac{1}{4}[0, 2, 1, 1] = [0, 0.5, 0.25, 0.25]

We can now calculate the difference \hat{\mathbf{r}} - \hat{\mathbf{p}}:

\hat{\mathbf{r}} - \hat{\mathbf{p}} = [0.5, 0, 0.25, 0.25] - [0, 0.5, 0.25, 0.25] = [0.5, -0.5, 0, 0]

Finally, let's find the index of dissimilarity (D):

D = \frac{1}{2} \|\hat{\mathbf{r}} - \hat{\mathbf{p}}\|_1 = \frac{1}{2} (|0.5| + |-0.5| + |0| + |0|) = 0.5
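The worked example can be reproduced step by step with NumPy; the variable names are illustrative and simply mirror the symbols above. Because the counts are non-negative, `r.sum()` equals \|\mathbf{r}\|_1.

```python
import numpy as np

# The 2x2 city: block 1 is all rich, block 2 is all poor,
# blocks 3 and 4 each hold one rich and one poor person.
r = np.array([2, 0, 1, 1], dtype=float)
p = np.array([0, 2, 1, 1], dtype=float)

R, P = r.sum(), p.sum()       # totals: 4.0 and 4.0
r_hat, p_hat = r / R, p / P   # [0.5, 0, 0.25, 0.25] and [0, 0.5, 0.25, 0.25]
diff = r_hat - p_hat          # [0.5, -0.5, 0, 0]
D = 0.5 * np.abs(diff).sum()  # one-half the L1-norm of the difference
print(D)  # 0.5
```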

Equivalence between formulae

We can prove that the linear algebraic formula for D is identical to the basic formula for D. Let's start with the linear algebraic formula:

D = \frac{1}{2} \|\hat{\mathbf{r}} - \hat{\mathbf{p}}\|_1

Let's replace the normalized vectors \hat{\mathbf{r}} and \hat{\mathbf{p}} with \mathbf{r}/R and \mathbf{p}/P:

D = \frac{1}{2} \left\| \frac{\mathbf{r}}{R} - \frac{\mathbf{p}}{P} \right\|_1

Finally, from the definition of the L^1-norm, we know that we can replace it with the summation:

D = \frac{1}{2} \sum_{i=1}^{N} \left| \frac{r_i}{R} - \frac{p_i}{P} \right|

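Writing a_i = r_i, A = R, b_i = p_i and B = P, this is exactly the basic formula. Thus the linear algebraic formula for the index of dissimilarity is equivalent to the basic formula:

D = \frac{1}{2} \|\hat{\mathbf{r}} - \hat{\mathbf{p}}\|_1 = \frac{1}{2} \sum_{i=1}^{N} \left| \frac{r_i}{R} - \frac{p_i}{P} \right|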
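A quick numerical spot check (not a proof) that the two formulas agree can be sketched in Python with NumPy. The random block counts are synthetic and the function names `basic_formula` and `linear_algebra_formula` are hypothetical.

```python
import numpy as np


def basic_formula(a: np.ndarray, b: np.ndarray) -> float:
    """D = 1/2 * sum_i |a_i/A - b_i/B|, written as an explicit sum."""
    A, B = a.sum(), b.sum()
    return 0.5 * sum(abs(a_i / A - b_i / B) for a_i, b_i in zip(a, b))


def linear_algebra_formula(r: np.ndarray, p: np.ndarray) -> float:
    """D = 1/2 * || r/||r||_1 - p/||p||_1 ||_1."""
    r_hat = r / np.linalg.norm(r, ord=1)
    p_hat = p / np.linalg.norm(p, ord=1)
    return 0.5 * np.linalg.norm(r_hat - p_hat, ord=1)


rng = np.random.default_rng(seed=0)
checked = 0
for _ in range(1000):
    n = int(rng.integers(1, 20))  # number of blocks
    r = rng.integers(0, 100, size=n).astype(float)  # synthetic rich counts
    p = rng.integers(0, 100, size=n).astype(float)  # synthetic poor counts
    if r.sum() == 0 or p.sum() == 0:
        continue  # D is undefined when a group is absent
    assert np.isclose(basic_formula(r, p), linear_algebra_formula(r, p))
    checked += 1
print(f"formulas agreed on {checked} random cities")
```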

Zero segregation

When the index of dissimilarity is zero, the community we are studying has zero segregation. For example, if we are studying the segregation of rich and poor people in a city, then D = 0 means that:
* There are no "rich blocks" and no "poor blocks" in the city, i.e. no block holds a disproportionate share of either group
* Rich and poor people are distributed in the same proportions throughout the city

If we set D = 0 in the linear algebraic formula, we get the condition for zero segregation:

\hat{\mathbf{r}} = \hat{\mathbf{p}}

This condition is both necessary and sufficient, because the L^1-norm of a vector is zero exactly when every entry of the vector is zero.
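Because \hat{\mathbf{r}} = \hat{\mathbf{p}} means r_i / R = p_i / P in every block, zero segregation can be tested exactly on integer counts by cross-multiplying (r_i P = p_i R), which avoids floating-point rounding. This is a minimal sketch assuming non-negative counts; the function name `has_zero_segregation` is hypothetical.

```python
from typing import Sequence


def has_zero_segregation(r: Sequence[int], p: Sequence[int]) -> bool:
    """True exactly when r/R == p/P in every block, i.e. when D == 0."""
    if len(r) != len(p):
        raise ValueError("r and p need one count per block")
    R, P = sum(r), sum(p)
    if R == 0 or P == 0:
        raise ValueError("both groups must be present in the city")
    # r_i / R == p_i / P  <=>  r_i * P == p_i * R  (R and P are positive)
    return all(r_i * P == p_i * R for r_i, p_i in zip(r, p))


print(has_zero_segregation([4, 4], [100, 100]))          # True
print(has_zero_segregation([2, 0, 1, 1], [0, 2, 1, 1]))  # False
```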

For example, suppose you have a city with 2 blocks, each containing 4 rich people and 100 poor people:

\mathbf{r} = [4, 4]

\mathbf{p} = [100, 100]

Then the total number of rich people is R = 4 + 4 = 8, and the total number of poor people is P = 100 + 100 = 200. Thus:

\hat{\mathbf{r}} = [4/8, 4/8] = [0.5, 0.5]

\hat{\mathbf{p}} = [100/200, 100/200] = [0.5, 0.5]

Because \hat{\mathbf{r}} = \hat{\mathbf{p}}, this city has zero segregation.
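Exact rational arithmetic with Python's standard `fractions` module reproduces this example without rounding; the variable names are illustrative and mirror the symbols above.

```python
from fractions import Fraction

r = [4, 4]      # rich people per block
p = [100, 100]  # poor people per block

R, P = sum(r), sum(p)                # 8 and 200
r_hat = [Fraction(x, R) for x in r]  # [1/2, 1/2]
p_hat = [Fraction(x, P) for x in p]  # [1/2, 1/2]

# D = 1/2 * ||r_hat - p_hat||_1, computed exactly
D = Fraction(1, 2) * sum(abs(a - b) for a, b in zip(r_hat, p_hat))
print(r_hat == p_hat, D)  # True 0
```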

As another example, suppose you have a city with 3 blocks:

\mathbf{r} = [1, 2, 3]

\mathbf{p} = [100, 200, 300]

Then we have R = 1 + 2 + 3 = 6 rich people in our city, and P = 100 + 200 + 300 = 600 poor people. Thus:

\hat{\mathbf{r}} = [1/6, 2/6, 3/6]

\hat{\mathbf{p}} = [100/600, 200/600, 300/600] = [1/6, 2/6, 3/6]

Again, because \hat{\mathbf{r}} = \hat{\mathbf{p}}, this city also has zero segregation.
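The second example generalizes: whenever \mathbf{p} is a positive multiple of \mathbf{r}, the normalized vectors coincide and D = 0. The sketch below checks a few scale factors with exact fractions; the helper name `dissimilarity_exact`, the extra scale factors, and the reversed vector [300, 200, 100] are hypothetical, added only for contrast.

```python
from fractions import Fraction
from typing import Sequence


def dissimilarity_exact(r: Sequence[int], p: Sequence[int]) -> Fraction:
    """D = 1/2 * sum_i |r_i/R - p_i/P|, computed with exact fractions."""
    R, P = sum(r), sum(p)
    return Fraction(1, 2) * sum(
        abs(Fraction(r_i, R) - Fraction(p_i, P)) for r_i, p_i in zip(r, p)
    )


r = [1, 2, 3]
for k in (1, 7, 100):
    p = [k * x for x in r]  # k = 100 gives the example's p = [100, 200, 300]
    print(k, dissimilarity_exact(r, p))  # D is 0 for every k

# A non-proportional distribution, for contrast.
print(dissimilarity_exact(r, [300, 200, 100]))  # 1/3
```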

References

* Wolfram MathWorld: L1 Norm

External links

* http://enceladus.isr.umich.edu/race/calculate.html