:''Concerning rating scales as systems of educational marks, see articles about education in different countries (named "Education in ..."), for example,
Education in Ukraine.''
:''Concerning rating scales used in the practice of medicine, see articles about diagnoses, for example,
Major depressive disorder
.''
A rating scale is a set of categories designed to elicit information about a
quantitative
or a
qualitative attribute. In the
social sciences
, particularly
psychology
, common examples are the
Likert response scale and 1-10 rating scales in which a person selects the number which is considered to reflect the perceived quality of a
product
.
Background
A rating scale is a method that requires the rater to assign a value, sometimes numeric, to the rated object as a measure of some rated attribute.
Types of rating scales
All rating scales can be classified into one of these types:
# Numeric Rating Scale (NRS)
# Verbal Rating Scale (VRS)
# Visual Analogue Scale (VAS)
# Likert
# Graphic rating scale
# Descriptive graphic rating scale
Some data are measured at the
ordinal level
. Numbers indicate the relative position of items, but not the magnitude of difference. Attitude and opinion scales are usually ordinal; one example is a
Likert response scale:
; Statement: e.g. "I could not live without my computer".
; Response options:
:# Strongly disagree
:# Disagree
:# Neutral
:# Agree
:# Strongly agree
Some data are measured at the
interval level. Numbers indicate the magnitude of difference between items, but there is no absolute zero point. A good example is the Fahrenheit or Celsius temperature scale, where differences between values are meaningful but the placement of zero is arbitrary.
Some data are measured at the
ratio level. Numbers indicate magnitude of difference and there is a fixed zero point. Ratios can be calculated. Examples include age, income, price, costs, sales revenue, sales volume and market share.
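The distinction between the three levels can be illustrated with a short Python sketch. The response data, temperatures and prices below are hypothetical values chosen only for illustration; the point is which summaries survive a change of coding or unit.

```python
from statistics import median

# Ordinal: Likert codes record order only, so the median category is a safe summary.
LIKERT = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]
responses = ["Agree", "Neutral", "Strongly agree", "Agree", "Disagree"]  # hypothetical
codes = sorted(LIKERT.index(r) + 1 for r in responses)
median_label = LIKERT[int(median(codes)) - 1]


def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Interval: ratios of values depend on where zero is placed ...
ratio_c = 20 / 10
ratio_f = c_to_f(20) / c_to_f(10)

# ... but ratios of differences are the same in both units.
diff_ratio_c = (30 - 20) / (20 - 10)
diff_ratio_f = (c_to_f(30) - c_to_f(20)) / (c_to_f(20) - c_to_f(10))

# Ratio: with a fixed zero point, ratios of values survive a change of unit.
price_ratio_dollars = 40 / 20
price_ratio_cents = 4000 / 2000

print(median_label)
print(ratio_c, ratio_f)
print(diff_ratio_c, diff_ratio_f)
print(price_ratio_dollars, price_ratio_cents)
```

In this sketch, "twice as warm" holds in Celsius but not in Fahrenheit, whereas "twice the price" holds in any currency unit.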
More than one rating scale question is required to
measure
an attitude or perception due to the requirement for statistical comparisons between the categories in the
polytomous Rasch model
for ordered categories. In terms of
Classical test theory
, more than one question is required to obtain an index of internal reliability such as
Cronbach's alpha
, which is a basic criterion for assessing the effectiveness of a rating scale and, more generally, a psychometric instrument.
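As a minimal sketch of why several questions are needed, Cronbach's alpha can be computed from the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the small data set are hypothetical and used only for illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    Each item is a list of scores, one per respondent, with respondents
    in the same order for every item.
    """
    k = len(items)
    if k < 2:
        # With a single question there is nothing to compare it against.
        raise ValueError("Cronbach's alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical data: three questions answered by five respondents.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 4, 4],
    [1, 3, 3, 5, 5],
]
alpha = cronbach_alpha(items)
print(round(alpha, 2))  # about 0.95 for this data
```

With only one item the formula is undefined (k − 1 is zero), which is why the sketch raises an error in that case.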
Rating scales used online
Rating scales are used widely online in an attempt to provide indications of consumer opinions of products. Examples of sites which employ ratings scales are
IMDb
,
Epinions.com
,
Yahoo! Movies
,
Amazon.com
,
BoardGameGeek
and
TV.com
which use a rating scale from 0 to 100 in order to obtain "personalised film recommendations".
In almost all cases, online rating scales allow only one rating per user per product, though there are exceptions such as ''Ratings.net'', which allows users to rate products in relation to several qualities. Most online rating facilities also provide few or no qualitative descriptions of the rating categories, although again there are exceptions such as ''Yahoo! Movies'', which labels each of the categories between F and A+, and BoardGameGeek, which provides explicit descriptions of each category from 1 to 10. Often, only the top and bottom categories are described, as in the online rating facility of ''IMDb''.
Validity
Validity refers to how well a tool measures what it intends to measure.
With each user rating a product only once, for example in a category from 1 to 10, there is no means for evaluating internal
reliability
using an index such as
Cronbach's alpha
. It is therefore impossible to evaluate the
validity
of the ratings as measures of viewer perceptions. Establishing validity would require establishing both reliability and accuracy (i.e. that the ratings represent what they are supposed to represent). The degree of validity of an instrument is determined through the application of logic and/or statistical procedures. "A measurement procedure is valid to the degree that it measures what it proposes to measure."
Another fundamental issue is that online ratings usually involve convenience
sampling much like television polls, i.e. they represent only the opinions of those inclined to submit ratings.
Validity is concerned with different aspects of the measurement process. Each of these types uses logic, statistical verification or both to determine the degree of validity and has special value under certain conditions. Types of validity include content validity, predictive validity, and construct validity.
Sampling
Sampling errors can lead to results which have a specific bias, or are only relevant to a specific subgroup. Consider this example: suppose that a film appeals mainly to a specialist audience, so that 90% of its viewers are devotees of the genre and only 10% are people with a general interest in movies. Assume the film is very popular among the audience that views it, and that only those who feel most strongly about the film are inclined to rate it online; hence the raters are all drawn from the devotees. This combination may lead to very high ratings of the film, which do not generalize beyond the people who actually see the film (or possibly even beyond those who actually rate it).
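The effect can be made concrete with a small arithmetic sketch. All group sizes and ratings below are hypothetical assumptions (the text gives only the 90%/10% split); the helper `mean_rating` is likewise an illustrative name.

```python
def mean_rating(groups):
    """Weighted mean rating for a list of (number_of_people, rating) pairs."""
    total_people = sum(n for n, _ in groups)
    return sum(n * rating for n, rating in groups) / total_people


# Assumed audience of 1,000 viewers: 900 devotees and 100 general viewers.
audience = [(900, 9), (100, 6)]

# Assumed raters: only the 180 devotees who feel most strongly submit a rating.
raters = [(180, 10)]

# Assumed wider public: the same devotees plus 9,100 people with general tastes.
wider_public = [(900, 9), (9100, 5)]

print(mean_rating(raters))        # what the website displays
print(mean_rating(audience))      # what the actual audience thinks
print(mean_rating(wider_public))  # what the wider public would think
```

Under these assumptions the published average overstates the opinion of the actual audience and, by a much larger margin, that of the wider public.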
Qualitative description
Qualitative descriptions of categories improve the usefulness of a rating scale. For example, if only the points 1-10 are given without description, some people may select 10 rarely, whereas others may select the category often. If, instead, "10" is described as "near flawless", the category is more likely to mean the same thing to different people. This applies to all categories, not just the extreme points.
The above issues are compounded when aggregated statistics such as averages are used for lists and rankings of products. User ratings are at best
ordinal categorizations. While it is not uncommon to calculate averages or means for such data, doing so cannot be justified, because an average is meaningful only if equal intervals between categories represent equal differences in perceived quality. The key issues with aggregate data based on the kinds of rating scales commonly used online are as follows:
*Averages should not be calculated for data of the kind collected.
*It is usually impossible to evaluate the reliability or validity of user ratings.
*Products are not compared with respect to explicit, let alone common, criteria.
*Only users inclined to submit a rating for a product do so.
*Data are not usually published in a form that permits evaluation of the product ratings.
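The first point can be illustrated with a minimal sketch: because ordinal categories carry order only, any order-preserving relabelling of the categories is equally legitimate, yet it can reverse a ranking based on means while leaving a ranking based on medians unchanged. The ratings and the relabelling below are hypothetical.

```python
from statistics import mean, median

# Hypothetical ratings of two films on a 1-5 ordinal scale.
film_a = [3, 3, 3, 3, 3]
film_b = [1, 1, 4, 4, 4]

# An order-preserving relabelling: the categories keep their rank order,
# only the (arbitrary) spacing between them changes.
recode = {1: 1, 2: 2, 3: 3, 4: 10, 5: 11}


def relabel(ratings):
    """Apply the order-preserving relabelling to a list of ratings."""
    return [recode[r] for r in ratings]


# Ranking by mean depends on the arbitrary spacing of the labels.
mean_prefers_a_before = mean(film_a) > mean(film_b)
mean_prefers_a_after = mean(relabel(film_a)) > mean(relabel(film_b))

# Ranking by median uses order only and is unaffected by the relabelling.
median_prefers_b_before = median(film_b) > median(film_a)
median_prefers_b_after = median(relabel(film_b)) > median(relabel(film_a))

print(mean_prefers_a_before, mean_prefers_a_after)
print(median_prefers_b_before, median_prefers_b_after)
```

In this example the mean ranks film A first under one labelling and film B first under the other, while the median ranks film B first under both.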
More developed methodologies include
Choice Modelling
or
Maximum Difference methods, the latter being related to the
Rasch model
due to the connection between Thurstone's law of comparative judgement and the Rasch model.
Rating scale reduction
An international collaborative research effort has introduced a data-driven algorithm for rating scale reduction. It is based on the area under the
receiver operating characteristic
.
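The reduction algorithm itself is not described here, so the following sketch illustrates only the underlying quantity: the area under the ROC curve (AUC) of a single rating item against a binary outcome, computed as the proportion of positive/negative pairs that the item orders correctly (ties count as half). The function name `auc` and the data are hypothetical, and using such a value to decide which items to keep is an assumption about how an AUC-based reduction might proceed, not the published method.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    # A positive scored above a negative counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical item scores on a 1-5 scale and a binary reference outcome.
item_scores = [1, 2, 2, 3, 4, 5]
outcome = [0, 0, 1, 0, 1, 1]

item_auc = auc(item_scores, outcome)
print(round(item_auc, 3))
```

An AUC of 0.5 means the item discriminates no better than chance, and an AUC of 1.0 means it separates the two outcome groups perfectly.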
See also
* Likert scale
* MaxDiff
* Questionnaire
* Questionnaire construction
* Rating scales for depression
* Semantic differential
* Voting system
* Receiver operating characteristic
References
{{reflist}}
External links
UEQ Semantic differential for measuring the User Experience
Psychometrics
Rating
Recommender systems