Content analysis is the study of documents and communication artifacts, known as texts, e.g. photos, speeches or essays. Social scientists use content analysis to examine patterns in communication in a replicable and systematic manner. One of the key advantages of using content analysis to analyse social phenomena is its non-invasive nature, in contrast to simulating social experiences or collecting survey answers.

Practices and philosophies of content analysis vary between academic disciplines. They all involve systematic reading or observation of texts or artifacts which are assigned labels (sometimes called codes) to indicate the presence of interesting, meaningful pieces of content. By systematically labeling the content of a set of texts, researchers can analyse patterns of content quantitatively using statistical methods, or use qualitative methods to analyse meanings of content within texts.

Computers are increasingly used in content analysis to automate the labeling (or coding) of documents. Simple computational techniques can provide descriptive data such as word frequencies and document lengths. Machine learning classifiers can greatly increase the number of texts that can be labeled, but the scientific utility of doing so is a matter of debate. Further, numerous computer-aided text analysis (CATA) programs are available that analyze text for predetermined linguistic, semantic, and psychological characteristics.
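As a minimal illustration of such descriptive statistics, the following Python sketch computes word frequencies and document lengths for a small invented corpus; the documents and the tokenizer are illustrative assumptions, not part of any particular CATA program.

```python
from collections import Counter
import re

# Invented example corpus; in practice texts would be loaded from files or a database.
documents = [
    "The council approved the new housing plan.",
    "Residents protested the housing plan at the council meeting.",
]

def tokenize(text):
    """Lower-case a text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

# Document lengths (in tokens) and corpus-wide word frequencies.
lengths = [len(tokenize(doc)) for doc in documents]
frequencies = Counter(token for doc in documents for token in tokenize(doc))

print("Document lengths:", lengths)
print("Most frequent words:", frequencies.most_common(5))
```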


Goals

Content analysis is best understood as a broad family of techniques. Effective researchers choose techniques that best help them answer their substantive questions. That said, according to Klaus Krippendorff, six questions must be addressed in every content analysis:
# Which data are analyzed?
# How are the data defined?
# From what population are data drawn?
# What is the relevant context?
# What are the boundaries of the analysis?
# What is to be measured?

The simplest and most objective form of content analysis considers unambiguous characteristics of the text such as word frequencies, the page area taken by a newspaper column, or the duration of a radio or television program. Analysis of simple word frequencies is limited because the meaning of a word depends on surrounding text. Key Word In Context (KWIC) routines address this by placing words in their textual context, as in the sketch below. This helps resolve ambiguities such as those introduced by synonyms and homonyms.

A further step in analysis is the distinction between dictionary-based (quantitative) approaches and qualitative approaches. Dictionary-based approaches set up a list of categories derived from the frequency list of words and control the distribution of words and their respective categories over the texts. While methods in quantitative content analysis in this way transform observations of found categories into quantitative statistical data, qualitative content analysis focuses more on the intentionality of content and its implications. There are strong parallels between qualitative content analysis and thematic analysis.
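A Key Word In Context routine can be sketched in a few lines of Python; the window size and the example sentence below are arbitrary choices made for illustration.

```python
def kwic(tokens, keyword, window=4):
    """Return each occurrence of `keyword` with `window` tokens of context on each side."""
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append(f"{left} [{token}] {right}")
    return hits

tokens = "The bank raised interest rates while the river bank flooded".split()
for line in kwic(tokens, "bank"):
    print(line)
# The [bank] raised interest rates while
# rates while the river [bank] flooded
```

Reading the keyword alongside its neighbours is what allows a coder (or a program) to separate, for instance, the financial sense of ''bank'' from the riverside sense.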


Qualitative and quantitative content analysis

Quantitative content analysis highlights frequency counts and statistical analysis of these coded frequencies. Additionally, quantitative content analysis begins with a framed hypothesis, with coding decided on before the analysis begins. These coding categories are strictly relevant to the researcher's hypothesis. Quantitative analysis also takes a deductive approach. Examples of content-analytical variables and constructs can be found, for example, in the open-access database DOCA, which compiles, systematizes, and evaluates relevant content-analytical variables of communication and political science research areas and topics.

Siegfried Kracauer provides a critique of quantitative analysis, asserting that it oversimplifies complex communications in order to be more reliable. On the other hand, qualitative analysis deals with the intricacies of latent interpretations, whereas quantitative analysis focuses on manifest meanings. He also acknowledges an "overlap" of qualitative and quantitative content analysis. Patterns are looked at more closely in qualitative analysis, and based on the latent meanings that the researcher may find, the course of the research could be changed. It is inductive and begins with open research questions, as opposed to a hypothesis.
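The coded frequencies that quantitative content analysis works with can be produced by a dictionary-based routine such as the following Python sketch; the category dictionary and the example text are invented for illustration and would normally come from a pretested codebook.

```python
from collections import Counter
import re

# Invented coding dictionary mapping categories to indicator words.
coding_dictionary = {
    "economy": {"tax", "budget", "inflation", "jobs"},
    "environment": {"climate", "emissions", "pollution"},
}

def code_text(text):
    """Count how often each category's indicator words occur in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for category, words in coding_dictionary.items():
        counts[category] = sum(1 for token in tokens if token in words)
    return counts

print(code_text("The budget cuts taxes but ignores emissions and climate risk."))
# e.g. Counter({'environment': 2, 'economy': 1})
```

The category counts produced this way are the raw material for the statistical analysis described above.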


Content analysis in multi-method qualitative text analysis

Content analysis is frequently combined with other qualitative methods to produce more robust, multi-layered findings. According to the framework developed by Alejandro and Zhao, useful combinations for content analysis include pairings with discourse analysis, thematic analysis, and Foucauldian discourse analysis.

;With discourse analysis (DA)
:This pairing combines the breadth of content analysis with the depth of discourse analysis. Content analysis can systematically identify patterns across a large dataset, while DA provides a close reading of select texts to unpack the linguistic mechanisms and socio-political effects. For example, a researcher might use content analysis to capture examples of identity representation in a large number of texts, and then use DA to explain how wider social issues and discourses shape that representation.

;With thematic analysis (TA)
:This combination uses the deductive, reductive nature of content analysis to describe what is in a text, alongside the inductive, holistic approach of TA to explore underlying meanings and explanations. For instance, a study could use content analysis to generalize the characteristics of a group's self-presentation on a web platform, and then use TA in a second stage to explore the reasons for these identified strategies, such as privacy concerns or technocultural impacts.

;With Foucauldian discourse analysis (FDA)
:This combination uses content analysis to systematically identify and quantify markers of a discourse across many texts, which are then interpreted through the macro-level theoretical lens of FDA. Content analysis provides the empirical evidence for the prevalence of certain themes or categories that constitute a broader Foucauldian discourse. For example, a study could use content analysis on media coverage of natural disasters, and then use FDA to interpret the identified patterns, revealing the underlying systems of meaning and institutional relations in the social construction of a topic such as ''recovery''.


Codebooks

The data collection instrument used in content analysis is the codebook or coding scheme. In qualitative content analysis the codebook is constructed and improved ''during'' coding, while in quantitative content analysis the codebook needs to be developed and pretested for reliability and validity ''before'' coding. The codebook includes detailed instructions for human coders, clear definitions of the respective concepts or variables to be coded, and the values to be assigned.
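The following Python sketch shows one way such a codebook might be formalised for computer-assisted coding; the variables, values and coder instructions are invented examples rather than any standard scheme.

```python
from dataclasses import dataclass

@dataclass
class Variable:
    """One codebook variable: its definition, allowed values and coder instructions."""
    name: str
    definition: str
    values: dict          # numeric code -> value label
    instructions: str = ""

codebook = [
    Variable(
        name="tone",
        definition="Overall evaluative tone of the article toward the policy.",
        values={1: "negative", 2: "neutral", 3: "positive"},
        instructions="Code the dominant tone; if the tone is mixed, code 2 (neutral).",
    ),
    Variable(
        name="main_actor",
        definition="Primary actor quoted or described in the article.",
        values={1: "government", 2: "opposition", 3: "civil society", 9: "other"},
    ),
]

def validate(variable, code):
    """Reject codes that are not defined for the variable in the codebook."""
    if code not in variable.values:
        raise ValueError(f"{code} is not a valid code for '{variable.name}'")
    return code
```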


Computational tools

With the rise of common computing facilities like PCs, computer-based methods of analysis are growing in popularity. Answers to open-ended questions, newspaper articles, political party manifestos, medical records or systematic observations in experiments can all be subject to systematic analysis of textual data. Once the contents of communication are available as machine-readable text, the input is analyzed for frequencies and coded into categories for building up inferences. Computer-assisted analysis can help with large, electronic data sets by saving time and eliminating the need for multiple human coders to establish inter-coder reliability. However, human coders can still be employed for content analysis, as they are often better able to pick out nuanced and latent meanings in text. A study found that human coders were able to evaluate a broader range and make inferences based on latent meanings.
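As a rough sketch of machine-assisted coding, the following example trains a simple bag-of-words classifier with scikit-learn on a toy set of hand-coded texts; the texts and labels are invented, and a real study would require a much larger training set, held-out validation and a check against human coding.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy hand-coded training data: texts and their human-assigned category labels.
train_texts = [
    "Parliament debated the new tax bill today.",
    "The striker scored twice in the final match.",
    "Senators voted to amend the budget proposal.",
    "The coach praised the team's defensive play.",
]
train_labels = ["politics", "sport", "politics", "sport"]

# Bag-of-words features (minus English stop words) feeding a naive Bayes classifier.
model = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
model.fit(train_texts, train_labels)

# Automatically label a new, uncoded document.
print(model.predict(["The committee rejected the tax amendment."]))  # expected: ['politics']
```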


Reliability and validity

Robert Weber notes: "To make valid inferences from the text, it is important that the classification procedure be reliable in the sense of being consistent: Different people should code the same text in the same way". Validity, inter-coder reliability and intra-coder reliability have been the subject of intense methodological research efforts over many years. Neuendorf suggests that when human coders are used in content analysis, at least two independent coders should be used. Reliability of human coding is often measured using a statistical measure of ''inter-coder reliability'', or "the amount of agreement or correspondence among two or more coders". Lacy and Riffe identify the measurement of inter-coder reliability as a strength of quantitative content analysis, arguing that, if content analysts do not measure inter-coder reliability, their data are no more reliable than the subjective impressions of a single reader.

According to today's reporting standards, quantitative content analyses should be published with complete codebooks, and for all variables or measures in the codebook the appropriate inter-coder or inter-rater reliability coefficients should be reported based on empirical pre-tests. Furthermore, the validity of all variables or measures in the codebook must be ensured. This can be achieved through the use of established measures that have proven their validity in earlier studies. Also, the content validity of the measures can be checked by experts from the field who scrutinize and then approve or correct coding instructions, definitions and examples in the codebook.
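For illustration, the sketch below computes simple percent agreement and Cohen's kappa for two hypothetical coders who coded the same ten texts, assuming scikit-learn is available; published studies may instead report Krippendorff's alpha or other coefficients.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical codes assigned to the same ten texts by two independent coders.
coder_a = [1, 2, 2, 3, 1, 1, 2, 3, 3, 1]
coder_b = [1, 2, 3, 3, 1, 2, 2, 3, 3, 1]

# Raw percent agreement ignores agreement expected by chance.
agreement = sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)

# Cohen's kappa corrects for chance agreement between two coders.
kappa = cohen_kappa_score(coder_a, coder_b)

print(f"Percent agreement: {agreement:.2f}")  # 0.80
print(f"Cohen's kappa: {kappa:.2f}")
```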


Kinds of text

There are five types of texts in content analysis:
# written text, such as books and papers
# oral text, such as speech and theatrical performance
# iconic text, such as drawings, paintings, and icons
# audio-visual text, such as TV programs, movies, and videos
# hypertexts, which are texts found on the Internet


History

Content analysis is research using the categorization and classification of speech, written text, interviews, images, or other forms of communication. In its beginnings, using the first newspapers at the end of the 19th century, analysis was done manually by measuring the number of columns devoted to a subject. The approach can also be traced back to a university student studying patterns in Shakespeare's literature in 1893.

Over the years, content analysis has been applied to a variety of scopes. Hermeneutics and philology have long used content analysis to interpret sacred and profane texts and, in many cases, to attribute texts' authorship and authenticity. In recent times, particularly with the advent of mass communication, content analysis has seen increasing use to deeply analyze and understand media content and media logic. The political scientist Harold Lasswell formulated the core questions of content analysis in its early-mid 20th-century mainstream version: "Who says what, to whom, why, to what extent and with what effect?". The strong emphasis on a quantitative approach started by Lasswell was finally carried out by another "father" of content analysis, Bernard Berelson, who proposed a definition of content analysis which, from this point of view, is emblematic: "a research technique for the objective, systematic and quantitative description of the manifest content of communication".

Quantitative content analysis has enjoyed renewed popularity in recent years thanks to technological advances, being fruitfully applied in mass and personal communication research. Content analysis of textual big data produced by new media, particularly social media and mobile devices, has become popular. These approaches take a simplified view of language that ignores the complexity of semiosis, the process by which meaning is formed out of language. Quantitative content analysts have been criticized for limiting the scope of content analysis to simple counting, and for applying the measurement methodologies of the natural sciences without reflecting critically on their appropriateness to social science. Conversely, qualitative content analysts have been criticized for being insufficiently systematic and too impressionistic. Krippendorff argues that quantitative and qualitative approaches to content analysis tend to overlap, and that there can be no generalisable conclusion as to which approach is superior.

Content analysis can also be described as studying traces, which are documents from past times, and artifacts, which are non-linguistic documents. Texts are understood to be produced by communication processes in a broad sense of that phrase, often gaining meaning through abduction.


Latent and manifest content

Manifest content is readily understandable at its face value. Its meaning is direct. Latent content is not as overt, and requires interpretation to uncover the meaning or implication.


Uses

Holsti groups fifteen uses of content analysis into three basic categories:
* make inferences about the antecedents of a communication
* describe and make inferences about characteristics of a communication
* make inferences about the effects of a communication.

He also places these uses into the context of the basic communication paradigm. The following table shows fifteen uses of content analysis in terms of their general purpose, element of the communication paradigm to which they apply, and the general question they are intended to answer.

As a counterpoint, there are limits to the scope of use for the procedures that characterize content analysis. In particular, if access to the goal of analysis can be obtained by direct means without material interference, then direct measurement techniques yield better data. Content analysis attempts to quantifiably describe ''communications'' whose features are primarily categorical (limited usually to a nominal or ordinal scale) via selected conceptual units (the ''unitization''), which are assigned values (the ''categorization'') for ''enumeration'', all while monitoring ''intercoder reliability''. If, instead, the target quantity is already directly measurable (typically on an interval or ratio scale), especially as a continuous physical quantity, then such targets usually are not listed among those needing the "subjective" selections and formulations of content analysis.

For example (from mixed research and clinical application), as medical images ''communicate'' diagnostic features to physicians, neuroimaging's stroke (infarct) volume scale called ASPECTS is ''unitized'' as 10 qualitatively delineated (unequal) brain regions in the middle cerebral artery territory, which it ''categorizes'' as being at least partly versus not at all infarcted in order to ''enumerate'' the latter, with published series often assessing ''intercoder reliability'' by Cohen's kappa. The foregoing ''italicized operations'' impose the uncredited ''form'' of content analysis onto an estimation of infarct extent, which instead is easily enough and more accurately measured as a volume directly on the images. ("Accuracy ... is the highest form of reliability.") The concomitant clinical assessment, however, by the National Institutes of Health Stroke Scale (NIHSS) or the modified Rankin Scale (mRS), retains the necessary form of content analysis.

Recognizing potential limits of content analysis across the contents of language and images alike, Klaus Krippendorff affirms that "comprehension ... may ... not conform at all to the process of classification and/or counting by which most content analyses proceed," suggesting that content analysis might materially distort a message.


Developing the initial coding scheme

The process of developing the initial coding scheme, or approach to coding, is contingent on the particular content analysis approach selected. In a directed content analysis, scholars draft a preliminary coding scheme from pre-existing theory or assumptions, while in the conventional content analysis approach the initial coding scheme is developed from the data.


Conventional process of coding

With either approach above, researchers may immerse themselves in the data to obtain an overall picture. A consistent and clear unit of coding is vital, with choices ranging from a single word to several paragraphs and from texts to iconic symbols. Lastly, researchers construct the relationships between codes by sorting them into specific categories or themes.


See also

* Donald Wayne Foster
* Hermeneutics
* Text mining
* ''The Polish Peasant in Europe and America''
* Transition words
* Video content analysis
* Grounded theory

