Psychometrics is a field of study within psychology concerned with the theory and technique of measurement. Psychometrics generally refers to specialized fields within psychology and education devoted to testing, measurement, assessment, and related activities. Psychometrics is concerned with the objective measurement of latent constructs that cannot be directly observed. Examples of latent constructs include intelligence, introversion, mental disorders, and educational achievement. The levels of individuals on nonobservable latent variables are inferred through mathematical modeling based on what is observed from individuals' responses to items on tests and scales.

Practitioners are described as psychometricians, although not all who engage in psychometric research go by this title. Psychometricians usually possess specific qualifications such as degrees or certifications, and most are psychologists with advanced graduate training in psychometrics and measurement theory. In addition to traditional academic institutions, practitioners also work for organizations such as the Educational Testing Service and Psychological Corporation. Some psychometric researchers focus on the construction and validation of assessment instruments including surveys, scales, and open- or close-ended questionnaires. Others focus on research relating to measurement theory (e.g., item response theory; intraclass correlation) or specialize as learning and development professionals.

Historical foundation

Psychological testing has come from two streams of thought: the first, from Darwin, Galton, and Cattell on the measurement of individual differences, and the second, from Herbart, Weber, Fechner, and Wundt and their psychophysical measurements of a similar construct. The second set of individuals and their research is what has led to the development of experimental psychology and standardized testing (Kaplan, R.M., & Saccuzzo, D.P. (2010). ''Psychological Testing: Principles, Applications, and Issues'' (8th ed.). Belmont, CA: Wadsworth, Cengage Learning).

Victorian stream

Charles Darwin was the inspiration behind Sir Francis Galton, a scientist who advanced the development of psychometrics. In 1859, Darwin published his book ''On the Origin of Species''. Darwin described the role of natural selection in the emergence, over time, of different populations of species of plants and animals. The book showed how individual members of a species differ among themselves and how they possess characteristics that are more or less adaptive to their environment. Those with more adaptive characteristics are more likely to survive to procreate and give rise to another generation. Those with less adaptive characteristics are less likely.

These ideas stimulated Galton's interest in the study of human beings and how they differ one from another and, more importantly, how to measure those differences. Galton wrote a book entitled ''Hereditary Genius''. The book described different characteristics that people possess and how those characteristics make some more "fit" than others. Today these differences, such as sensory and motor functioning (reaction time, visual acuity, and physical strength), are important domains of scientific psychology. Much of the early theoretical and applied work in psychometrics was undertaken in an attempt to measure intelligence. Galton, often referred to as "the father of psychometrics," devised and included mental tests among his anthropometric measures.

James McKeen Cattell, a pioneer in the field of psychometrics, went on to extend Galton's work. Cattell coined the term ''mental test'', and is responsible for research and knowledge that ultimately led to the development of modern tests (Kaplan & Saccuzzo, 2010).

German stream

The origin of psychometrics also has connections to the related field of psychophysics. Around the same time that Darwin, Galton, and Cattell were making their discoveries, Herbart was also interested in "unlocking the mysteries of human consciousness" through the scientific method. Herbart was responsible for creating mathematical models of the mind, which were influential in educational practices for years to come.

E.H. Weber built upon Herbart's work and tried to prove the existence of a psychological threshold, saying that a minimum stimulus was necessary to activate a sensory system. After Weber, G.T. Fechner expanded upon the knowledge he gleaned from Herbart and Weber to devise the law that the strength of a sensation grows as the logarithm of the stimulus intensity. A follower of Weber and Fechner, Wilhelm Wundt is credited with founding the science of psychology. It is Wundt's influence that paved the way for others to develop psychological testing.

20th century

The psychometrician L. L. Thurstone, founder and first president of the Psychometric Society in 1936, developed and applied a theoretical approach to measurement referred to as the law of comparative judgment, an approach that has close connections to the psychophysical theory of Ernst Heinrich Weber and Gustav Fechner. In addition, Spearman and Thurstone both made important contributions to the theory and application of factor analysis, a statistical method developed and used extensively in psychometrics.

In the late 1950s, Leopold Szondi made a historical and epistemological assessment of the impact of statistical thinking on psychology during the previous few decades: "in the last decades, the specifically psychological thinking has been almost completely suppressed and removed, and replaced by a statistical thinking. Precisely here we see the cancer of testology and testomania of today."

More recently, psychometric theory has been applied in the measurement of personality, attitudes, beliefs, and academic achievement. These latent constructs cannot truly be measured, and much of the research and science in this discipline has been developed in an attempt to measure these constructs as close to the true score as possible.

Figures who made significant contributions to psychometrics include Karl Pearson, Henry F. Kaiser, Carl Brigham, L. L. Thurstone, E. L. Thorndike, Georg Rasch, Eugene Galanter, Johnson O'Connor, Frederic M. Lord, Ledyard R Tucker, Louis Guttman, and Jane Loevinger.

Definition of measurement in the social sciences

The definition of measurement in the social sciences has a long history. A current widespread definition, proposed by Stanley Smith Stevens, is that measurement is "the assignment of numerals to objects or events according to some rule." This definition was introduced in a 1946 ''Science'' article in which Stevens proposed four levels of measurement. Although widely adopted, this definition differs in important respects from the more classical definition of measurement adopted in the physical sciences, namely that scientific measurement entails "the estimation or discovery of the ratio of some magnitude of a quantitative attribute to a unit of the same attribute" (p. 358).

Indeed, Stevens's definition of measurement was put forward in response to the British Ferguson Committee, whose chair, A. Ferguson, was a physicist. The committee was appointed in 1932 by the British Association for the Advancement of Science to investigate the possibility of quantitatively estimating sensory events. Although its chair and other members were physicists, the committee also included several psychologists. The committee's report highlighted the importance of the definition of measurement. While Stevens's response was to propose a new definition, which has had considerable influence in the field, this was by no means the only response to the report. Another, notably different, response was to accept the classical definition, as reflected in the following statement:

:Measurement in psychology and physics are in no sense different. Physicists can measure when they can find the operations by which they may meet the necessary criteria; psychologists have to do the same. They need not worry about the mysterious differences between the meaning of measurement in the two sciences (Reese, 1943, p. 49).

These divergent responses are reflected in alternative approaches to measurement. For example, methods based on covariance matrices are typically employed on the premise that numbers, such as raw scores derived from assessments, are measurements. Such approaches implicitly entail Stevens's definition of measurement, which requires only that numbers are ''assigned'' according to some rule. The main research task, then, is generally considered to be the discovery of associations between scores, and of factors posited to underlie such associations.

On the other hand, when measurement models such as the Rasch model are employed, numbers are not assigned based on a rule. Instead, in keeping with Reese's statement above, specific criteria for measurement are stated, and the goal is to construct procedures or operations that provide data that meet the relevant criteria. Measurements are estimated based on the models, and tests are conducted to ascertain whether the relevant criteria have been met.
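As a minimal illustration of the model-based approach described above, the dichotomous Rasch model gives the probability of a correct response as a logistic function of the difference between a person's ability and an item's difficulty. The Python sketch below is not taken from any cited source; the function names and the numeric values are hypothetical and chosen only for illustration. It checks one criterion the model is built around: the difference in log-odds between two persons is the same whichever item is used to compare them.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def log_odds(probability):
    """Natural log of the odds p / (1 - p)."""
    return math.log(probability / (1.0 - probability))

# Hypothetical person abilities on the logit scale (illustrative values only).
person_a, person_b = 1.5, 0.5

# Compare the two persons on items of different (hypothetical) difficulty.
for difficulty in (-1.0, 0.0, 2.0):
    gap = (log_odds(rasch_probability(person_a, difficulty))
           - log_odds(rasch_probability(person_b, difficulty)))
    print(f"item difficulty {difficulty:+.1f}: log-odds gap = {gap:.3f}")
```

In this sketch the reported gap equals the difference between the two abilities for every item, which is the sense in which the comparison of persons does not depend on the particular item chosen.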

Instruments and procedures

The first psychometric instruments were designed to measure intelligence. One early approach to measuring intelligence was the test developed in France by Alfred Binet and Theodore Simon. That test was known as the Binet–Simon test. The French test was adapted for use in the United States by Lewis Terman of Stanford University, and named the Stanford-Binet IQ test.

Another major focus in psychometrics has been on personality testing. There has been a range of theoretical approaches to conceptualizing and measuring personality, though there is no widely agreed upon theory. Some of the better-known instruments include the Minnesota Multiphasic Personality Inventory, the Five-Factor Model (or "Big 5") and tools such as Personality and Preference Inventory and the Myers–Briggs Type Indicator.

Attitudes have also been studied extensively using psychometric approaches. One approach applies unfolding measurement models, the most general being the Hyperbolic Cosine Model (Andrich & Luo, 1993).

Theoretical approaches

Psychometricians have developed a number of different measurement theories. These include classical test theory (CTT) and item response theory (IRT). An approach that seems mathematically to be similar to IRT but also quite distinctive, in terms of its origins and features, is represented by the Rasch model for measurement. The development of the Rasch model, and the broader class of models to which it belongs, was explicitly founded on requirements of measurement in the physical sciences.

Psychometricians have also developed methods for working with large matrices of correlations and covariances. Techniques in this general tradition include the following.

Factor analysis is a method of determining the underlying dimensions of data. One of the main challenges faced by users of factor analysis is a lack of consensus on appropriate procedures for determining the number of latent factors. A usual procedure is to stop factoring when eigenvalues drop below one, since such a factor accounts for less variance than a single standardized variable. The lack of agreed cut-off points also affects other multivariate methods.

Multidimensional scaling is a method for finding a simple representation for data with a large number of latent dimensions.

Cluster analysis is an approach to finding objects that are like each other.

Factor analysis, multidimensional scaling, and cluster analysis are all multivariate descriptive methods used to distill simpler structures from large amounts of data. More recently, structural equation modeling and path analysis represent more sophisticated approaches to working with large covariance matrices. These methods allow statistically sophisticated models to be fitted to data and tested to determine if they are adequate fits. Because at a granular level psychometric research is concerned with the extent and nature of multidimensionality in each of the items of interest, a relatively new procedure known as bi-factor analysis can be helpful. Bi-factor analysis can decompose "an item's systematic variance in terms of, ideally, two sources, a general factor and one source of additional systematic variance."

Key concepts

Key concepts in classical test theory are reliability and validity. A reliable measure is one that measures a construct consistently across time, individuals, and situations. A valid measure is one that measures what it is intended to measure. Reliability is necessary, but not sufficient, for validity.

Both reliability and validity can be assessed statistically. Consistency over repeated measures of the same test can be assessed with the Pearson correlation coefficient, and is often called ''test-retest reliability.'' Similarly, the equivalence of different versions of the same measure can be indexed by a Pearson correlation, and is called ''equivalent forms reliability'' or a similar term. Internal consistency, which addresses the homogeneity of a single test form, may be assessed by correlating performance on two halves of a test, which is termed ''split-half reliability''; the value of this Pearson product-moment correlation coefficient for two half-tests is adjusted with the Spearman–Brown prediction formula to correspond to the correlation between two full-length tests. Perhaps the most commonly used index of reliability is Cronbach's α, which is equivalent to the mean of all possible split-half coefficients. Other approaches include the intra-class correlation, which is the ratio of variance of measurements of a given target to the variance of all targets.

There are a number of different forms of validity. Criterion-related validity refers to the extent to which a test or scale predicts a sample of behavior, i.e., the criterion, that is "external to the measuring instrument itself." That external sample of behavior can be many things, including another test; college grade point average, as when the high school SAT is used to predict performance in college; and even behavior that occurred in the past, for example, when a test of current psychological symptoms is used to predict the occurrence of past victimization (which would accurately represent postdiction). When the criterion measure is collected at the same time as the measure being validated, the goal is to establish ''concurrent validity''; when the criterion is collected later, the goal is to establish ''predictive validity''. A measure has ''construct validity'' if it is related to measures of other constructs as required by theory. ''Content validity'' is a demonstration that the items of a test do an adequate job of covering the domain being measured. In a personnel selection example, test content is based on a defined statement or set of statements of knowledge, skill, ability, or other characteristics obtained from a ''job analysis''.
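The reliability indices described above can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: the function names, the odd/even split, and the small score matrix (five respondents by four items) are all hypothetical. Split-half reliability correlates two half-test totals and applies the Spearman–Brown adjustment for doubled test length; Cronbach's α is computed from the item variances and the variance of the total score.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_brown(r, length_factor=2):
    """Predicted reliability when test length is multiplied by length_factor."""
    return length_factor * r / (1 + (length_factor - 1) * r)

def split_half_reliability(item_scores):
    """Correlate odd-item and even-item half totals, then adjust to full length."""
    first_half = [sum(row[0::2]) for row in item_scores]
    second_half = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson(first_half, second_half))

def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def cronbach_alpha(item_scores):
    """Cronbach's alpha from a respondents-by-items score matrix."""
    k = len(item_scores[0])
    item_variances = [sample_variance([row[j] for row in item_scores])
                      for j in range(k)]
    total_variance = sample_variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: 5 respondents (rows) answering 4 items (columns).
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print("split-half (Spearman-Brown adjusted):", round(split_half_reliability(scores), 3))
print("Cronbach's alpha:", round(cronbach_alpha(scores), 3))
```

The same `pearson` function would serve for test-retest or equivalent-forms reliability by passing scores from two administrations or two forms.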

Item response theory models the relationship between latent traits and responses to test items. Among other advantages, IRT provides a basis for obtaining an estimate of the location of a test-taker on a given latent trait as well as the standard error of measurement of that location. For example, a university student's knowledge of history can be deduced from his or her score on a university test and then be compared reliably with a high school student's knowledge deduced from a less difficult test. Scores derived by classical test theory do not have this characteristic, and actual ability (rather than ability relative to other test-takers) must be assessed by comparing scores to those of a "norm group" randomly selected from the population. In fact, all measures derived from classical test theory are dependent on the sample tested, while, in principle, those derived from item response theory are not.
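The idea of locating a test-taker on a latent trait, together with a standard error, can be illustrated with a short Python sketch. It assumes the simplest IRT model (the one-parameter logistic, or Rasch, model) with item difficulties that are already known; the function names, the difficulties, and the response patterns are hypothetical. Ability is estimated by maximum likelihood using Newton–Raphson steps, and the standard error is taken as the inverse square root of the test information at the estimate.

```python
import math

def p_correct(ability, difficulty):
    """One-parameter logistic (Rasch) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def estimate_ability(responses, difficulties, iterations=25):
    """Maximum-likelihood ability estimate and its standard error.

    responses: list of 0/1 item scores; difficulties: known item difficulties.
    A finite estimate exists only for mixed response patterns.
    """
    if all(responses) or not any(responses):
        raise ValueError("all-correct or all-incorrect patterns have no finite estimate")
    ability = 0.0
    for _ in range(iterations):
        probs = [p_correct(ability, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        ability += gradient / information  # Newton-Raphson update
    probs = [p_correct(ability, b) for b in difficulties]
    information = sum(p * (1.0 - p) for p in probs)
    return ability, 1.0 / math.sqrt(information)

# Hypothetical easier and harder tests whose items sit on the same logit scale.
easy_items = [-2.0, -1.0, 0.0, 1.0, 2.0]
hard_items = [0.0, 1.0, 2.0, 3.0, 4.0]

theta_easy, se_easy = estimate_ability([1, 1, 1, 0, 0], easy_items)
theta_hard, se_hard = estimate_ability([1, 1, 0, 0, 0], hard_items)
print(f"easier test: ability = {theta_easy:.2f}, SE = {se_easy:.2f}")
print(f"harder test: ability = {theta_hard:.2f}, SE = {se_hard:.2f}")
```

Because both sets of item difficulties are expressed on one scale, the two estimates can be compared directly even though the test-takers answered different items, which is the property the paragraph above contrasts with classical test theory scores.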

Standards of quality

The considerations of validity and reliability typically are viewed as essential elements for determining the quality of any test. However, professional and practitioner associations frequently have placed these concerns within broader contexts when developing standards and making overall judgments about the quality of any test as a whole within a given context. A consideration of concern in many applied research settings is whether or not the metric of a given psychological inventory is meaningful or arbitrary.

Testing standards

In 2014, the American Educational Research Association (AERA), American Psychological Association (APA), and National Council on Measurement in Education (NCME) published a revision of the ''Standards for Educational and Psychological Testing'', which describes standards for test development, evaluation, and use. The ''Standards'' cover essential topics in testing including validity, reliability/errors of measurement, and fairness in testing. The book also establishes standards related to testing operations including test design and development, scores, scales, norms, score linking, cut scores, test administration, scoring, reporting, score interpretation, test documentation, and rights and responsibilities of test takers and test users. Finally, the ''Standards'' cover topics related to testing applications, including psychological testing and assessment, workplace testing and credentialing, educational testing and assessment, and testing in program evaluation and public policy.

Evaluation standards

In the field of evaluation, and in particular educational evaluation, the Joint Committee on Standards for Educational Evaluation has published three sets of standards for evaluations. ''The Personnel Evaluation Standards'' was published in 1988, ''The Program Evaluation Standards'' (2nd edition) was published in 1994, and ''The Student Evaluation Standards'' was published in 2003. Each publication presents and elaborates a set of standards for use in a variety of educational settings. The standards provide guidelines for designing, implementing, assessing, and improving the identified form of evaluation. Each of the standards has been placed in one of four fundamental categories to promote educational evaluations that are proper, useful, feasible, and accurate. In these sets of standards, validity and reliability considerations are covered under the accuracy topic. For example, the student accuracy standards help ensure that student evaluations will provide sound, accurate, and credible information about student learning and performance.

Controversy and criticism

Because psychometrics is based on latent psychological processes measured through correlations, there has been controversy about some psychometric measures. Critics, including practitioners in the physical sciences, have argued that such definition and quantification is difficult, and that such measurements are often misused by laymen, such as with personality tests used in employment procedures. The ''Standards for Educational and Psychological Testing'' gives the following statement on test validity: "validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests" (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999, ''Standards for educational and psychological testing'', Washington, DC: American Educational Research Association). Simply put, a test is not valid unless it is used and interpreted in the way it is intended.

Two types of tools used to measure personality traits are objective tests and projective measures. Examples of such tests are the Big Five Inventory (BFI), the Minnesota Multiphasic Personality Inventory (MMPI-2), the Rorschach Inkblot test, the Neurotic Personality Questionnaire KON-2006, and Eysenck's Personality Questionnaire (EPQ-R). Some of these tests are helpful because they have adequate reliability and validity, two factors that make tests consistent and accurate reflections of the underlying construct. The Myers–Briggs Type Indicator (MBTI), however, has questionable validity and has been the subject of much criticism. Psychometric specialist Robert Hogan wrote of the measure: "Most personality psychologists regard the MBTI as little more than an elaborate Chinese fortune cookie."

Lee Cronbach noted in ''American Psychologist'' (1957) that, "correlational psychology, though fully as old as experimentation, was slower to mature. It qualifies equally as a discipline, however, because it asks a distinctive type of question and has technical methods of examining whether the question has been properly put and the data properly interpreted." He would go on to say, "The correlation method, for its part, can study what man has not learned to control or can never hope to control ... A true federation of the disciplines is required. Kept independent, they can give only wrong answers or no answers at all regarding certain important problems."

Non-human: animals and machines

Psychometrics addresses ''human'' abilities, attitudes, traits, and educational evolution. Notably, the study of behavior, mental processes, and abilities of non-human ''animals'' is usually addressed by comparative psychology, or, where humans and other animals are treated as a continuum, by evolutionary psychology. Nonetheless, some advocate a more gradual transition between the approach taken for humans and the approach taken for (non-human) animals. The evaluation of abilities, traits and learning evolution of ''machines'' has been mostly unrelated to the case of humans and non-human animals, with specific approaches in the area of artificial intelligence. A more integrated approach, under the name of universal psychometrics, has also been proposed.


External links

* APA Standards for Educational and Psychological Testing
* International Personality Item Pool
* Joint Committee on Standards for Educational Evaluation
* The Psychometrics Centre, University of Cambridge
* Psychometric Society and Psychometrika homepage
* London Psychometric Laboratory