Classical test theory (CTT) is a body of related psychometric theory that predicts outcomes of psychological testing such as the difficulty of items or the ability of test-takers. It is a theory of testing based on the idea that a person's observed or obtained score on a test is the sum of a true score (error-free score) and an error score. Generally speaking, the aim of classical test theory is to understand and improve the reliability of psychological tests.
''Classical test theory'' may be regarded as roughly synonymous with ''true score theory''. The term "classical" not only refers to the chronology of these models but also contrasts them with the more recent psychometric theories, generally referred to collectively as item response theory, which sometimes bear the appellation "modern" as in "modern latent trait theory".
Classical test theory as we know it today was codified by and described in classic texts such as and . The description of classical test theory below follows these seminal publications.
History
Classical test theory was born only after the following three achievements or ideas were conceptualized:
# a recognition of the presence of errors in measurements,
# a conception of that error as a random variable,
# a conception of correlation and how to index it.
In 1904, Charles Spearman worked out how to correct a correlation coefficient for attenuation due to measurement error and how to obtain the index of reliability needed in making the correction. Some regard Spearman's finding as the beginning of classical test theory. Others who influenced the framework of classical test theory include George Udny Yule, Truman Lee Kelley, Fritz Kuder and Marion Richardson (who developed the Kuder–Richardson formulas), Louis Guttman, and, most recently, Melvin Novick, not to mention others over the quarter century following Spearman's initial findings.
Definitions
Classical test theory assumes that each person has a ''true score'', ''T'', that would be obtained if there were no errors in measurement. A person's true score is defined as the expected number-correct score over an infinite number of independent administrations of the test. Unfortunately, test users never observe a person's true score, only an ''observed score'', ''X''. It is assumed that ''observed score'' = ''true score'' plus some ''error'':
: <math>X = T + E</math>
where ''X'' is the observed score, ''T'' is the true score, and ''E'' is the error.
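The definition of the true score as an expected value over repeated administrations can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `administer`, the true score of 30, the error standard deviation of 4, and the use of normally distributed, continuous errors are all illustrative choices, not part of classical test theory itself (which requires only that errors have mean zero).

```python
import random
import statistics


def administer(true_score, error_sd, n_administrations, seed=0):
    """Simulate repeated independent administrations for one person.

    Each observed score is X = T + E, where E is drawn with mean zero.
    Normal errors are an assumption made for this illustration.
    """
    rng = random.Random(seed)
    return [true_score + rng.gauss(0.0, error_sd)
            for _ in range(n_administrations)]


# Hypothetical person with true score T = 30 and error SD = 4.
observed = administer(true_score=30.0, error_sd=4.0, n_administrations=50000)

# The average observed score approaches T as administrations accumulate.
mean_observed = statistics.fmean(observed)
print(round(mean_observed, 2))
```

In practice a test cannot be administered to the same person an unlimited number of times under independent conditions, which is why the true score remains unobservable.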
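Classical test theory is concerned with the relations between the three variables <math>X</math>, <math>T</math>, and <math>E</math> in the population. These relations are used to say something about the quality of test scores. In this regard, the most important concept is that of ''reliability''. The reliability of the observed test scores <math>X</math>, which is denoted as <math>\rho^2_{XT}</math>, is defined as the ratio of true score variance <math>\sigma^2_T</math> to the observed score variance <math>\sigma^2_X</math>:
: <math>\rho^2_{XT} = \frac{\sigma^2_T}{\sigma^2_X}</math>
Because the variance of the observed scores can be shown to equal the sum of the variance of true scores and the variance of error scores, this is equivalent to
: <math>\rho^2_{XT} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}</math>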
This equation, which has the form of a signal-to-noise ratio (more precisely, the ratio of signal to signal plus noise), has intuitive appeal: the reliability of test scores becomes higher as the proportion of error variance in the test scores becomes lower, and vice versa. The reliability is equal to the proportion of the variance in the test scores that we could explain if we knew the true scores. The square root of the reliability is the absolute value of the correlation between true and observed scores.
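The relations above can be checked numerically with a simulated population. The following is a minimal sketch, not a method for real data (where true scores are unknown): the helper names `simulate_scores` and `pearson`, the population size, and the standard deviations of 10 for true scores and 5 for errors are hypothetical values chosen for illustration, and the errors are assumed to be normal and independent of the true scores.

```python
import random
import statistics


def simulate_scores(n_persons, true_sd, error_sd, seed=0):
    """Return (T, E, X) for a simulated population, with X = T + E.

    True scores and errors are drawn independently; normality is an
    assumption of this illustration only.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_persons)]
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_persons)]
    observed = [t + e for t, e in zip(true_scores, errors)]
    return true_scores, errors, observed


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (ss_a * ss_b) ** 0.5


T, E, X = simulate_scores(n_persons=20000, true_sd=10.0, error_sd=5.0)

# Reliability as the ratio of true score variance to observed score variance.
var_ratio = statistics.pvariance(T) / statistics.pvariance(X)

# Squared correlation between true and observed scores.
squared_corr = pearson(T, X) ** 2

# Value implied by the chosen variances: 100 / (100 + 25) = 0.8.
theoretical = 10.0 ** 2 / (10.0 ** 2 + 5.0 ** 2)

print(round(var_ratio, 3), round(squared_corr, 3), theoretical)
```

With these assumed variances, both sample quantities should lie close to the theoretical value of 0.8, differing only by sampling error.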
Evaluating tests and scores: Reliability
Reliability cannot be estimated directly since that would require one to know the true scores, which according to classical test theory is impossible. However, estimates of reliability can be acquired by diverse means. One way of estimating reliability is by constructing a so-called ''parallel test''. The fundamental property of a parallel test is that it yields the same true score and the same observed score variance as the original test for every individual. If we have parallel tests ''X'' and ''X′'', then this means that
: <math>\operatorname{E}(X_i) = \operatorname{E}(X'_i)</math>