Content Validity

In psychometrics, content validity (also known as logical validity) refers to the extent to which a measure represents all facets of a given construct. For example, a depression scale may lack content validity if it only assesses the affective dimension of depression but fails to take into account the behavioral dimension. An element of subjectivity exists in determining content validity, which requires a degree of agreement about what a particular personality trait such as extraversion represents. Disagreement about what a trait represents will prevent a measure from attaining high content validity.


Description

Content validity is different from face validity, which refers not to what the test actually measures, but to what it superficially appears to measure. Face validity assesses whether the test "looks valid" to the examinees who take it, the administrative personnel who decide on its use, and other technically untrained observers. Content validity requires the use of recognized subject matter experts to evaluate whether test items assess defined content, and more rigorous statistical tests than does the assessment of face validity. Content validity is most often addressed in academic and vocational testing, where test items need to reflect the knowledge actually required for a given topic area (e.g., history) or job skill (e.g., accounting). In clinical settings, content validity refers to the correspondence between test items and the symptom content of a syndrome.


Measurement

One widely used method of measuring content validity was developed by C. H. Lawshe. It is essentially a method for gauging agreement among raters or judges regarding how essential a particular item is. In an article regarding pre-employment testing, Lawshe proposed that each of the subject matter expert raters (SMEs) on the judging panel respond to the following question for each item: "Is the skill or knowledge measured by this item 'essential,' 'useful, but not essential,' or 'not necessary' to the performance of the job?" According to Lawshe, if more than half the panelists indicate that an item is essential, that item has at least some content validity; greater levels of content validity exist as larger numbers of panelists agree that a particular item is essential. Using these assumptions, Lawshe developed a formula termed the content validity ratio: CVR = (n_e - N/2) / (N/2), where CVR = content validity ratio, n_e = number of SME panelists indicating "essential", and N = total number of SME panelists. The formula yields values ranging from +1 to -1; positive values indicate that at least half the SMEs rated the item as essential. The mean CVR across items may be used as an indicator of overall test content validity.

Lawshe provided a table of critical values for the CVR by which a test evaluator could determine, for a panel of SMEs of a given size, how large a calculated CVR must be to exceed chance expectation. The table had been calculated for Lawshe by his friend, Lowell Schipper. Close examination of the published table reveals an anomaly: the critical value increases monotonically from the case of 40 SMEs (minimum value = .29) to the case of 9 SMEs (minimum value = .78), only to drop unexpectedly at 8 SMEs (minimum value = .75) before reaching its ceiling at 7 SMEs (minimum value = .99). The value of .75 is nonetheless consistent with the formula: with 8 raters, 7 "essential" ratings and 1 other rating yield a CVR of exactly .75. If .75 were not the critical value, all 8 raters would have to rate the item essential, yielding a CVR of 1.00; but requiring a "perfect" CVR for 8 raters, while panels both larger and smaller than 8 require less, would itself break the table's ordering. Whether the departure from the otherwise monotonic progression was due to a calculation error on Schipper's part or an error in typing or typesetting is unclear.

Wilson and colleagues, seeking to correct the error, found no explanation in Lawshe's writings, nor any publication by Schipper describing how the table of critical values was computed. They determined that Schipper's values were close approximations to the normal approximation to the binomial distribution, and, by comparing Schipper's values to newly calculated binomial values, that Lawshe and Schipper had erroneously labeled their published table as representing a one-tailed test when the values in fact mirrored the binomial values for a two-tailed test. Wilson and colleagues published a recalculation of critical values for the content validity ratio, providing critical values in unit steps at multiple alpha levels.
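
Lawshe's ratio is straightforward to compute directly. The following Python sketch is illustrative only (the function name and the item counts are hypothetical, not part of Lawshe's presentation); it reproduces the 7-of-8 case discussed above and the mean CVR across a small pool of items:

    def content_validity_ratio(n_essential, n_panelists):
        """Lawshe's content validity ratio: CVR = (n_e - N/2) / (N/2)."""
        return (n_essential - n_panelists / 2) / (n_panelists / 2)

    # 7 of 8 SME panelists rate an item "essential" -> CVR = 0.75,
    # the figure discussed for the 8-rater row of Schipper's table.
    print(content_validity_ratio(7, 8))  # 0.75

    # Hypothetical "essential" counts for a four-item test judged by 8 SMEs;
    # the mean CVR across items indicates overall test content validity.
    essential_counts = [7, 8, 5, 6]
    panel_size = 8
    mean_cvr = sum(content_validity_ratio(k, panel_size)
                   for k in essential_counts) / len(essential_counts)
    print(round(mean_cvr, 3))  # 0.625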


See also

* Construct validity
* Criterion validity
* Test validity
* Validity (statistics)
* Face validity


References

{{reflist}}


External links

* Handbook of Management Scales, a Wikibook containing previously used multi-item scales to measure constructs in the empirical management research literature. For many scales, content validity is discussed.