What Is Content Validity? | Definition & Examples

Content validity refers to the extent to which a test or instrument accurately represents all aspects of the theoretical concept it aims to measure. This concept, also known as a construct, often cannot be measured directly.

Content validity is critical for making informed decisions and drawing accurate conclusions based on the research data.

Content validity example
A psychology professor creates a test to measure students’ knowledge of primary sources. The test consists of 10 multiple-choice questions, and one of the questions is: “What is the difference between primary and secondary sources?”

This question contributes to the test’s high content validity because it directly addresses the construct of knowledge about primary sources, specifically the difference between primary sources and other types of information.

In contrast, if the test included a question like “What is the capital of England?” (which has nothing to do with primary sources), that would be an example of poor content validity because it isn’t relevant to the construct being measured.

The test as a whole has high content validity if:

  • The test’s questions cover every topic relevant to primary sources.
  • The test doesn’t contain questions that are irrelevant to knowledge of primary sources.

Content validity definition

Content validity is a type of validity in measurement that refers to the degree to which an assessment tool, such as a test, questionnaire, or survey, measures the specific aspects or characteristics of a more abstract concept.

In other words, content validity answers the following question: “Does the assessment tool cover all the relevant aspects of the concept or phenomenon it is supposed to measure?”

Content validity is concerned with ensuring that the assessment tool:

  1. Covers all the important aspects of the concept or phenomenon
  2. Does not include irrelevant or extraneous content
  3. Represents the concept or phenomenon accurately
Content validity example
For example, if you’re creating a test to measure someone’s knowledge of basic arithmetic operations, content validity would ensure that the test questions cover all the essential topics, such as addition, subtraction, multiplication, and division, and do not include irrelevant mathematical topics, like geometry or calculus.
Note
There are four main types of validity in measurement: construct validity, face validity, criterion validity, and content validity.

  • Construct validity evaluates whether the test accurately measures the intended concept. It is often considered the primary concern of validity; other types of validity provide evidence of construct validity.
  • Face validity assesses whether the instrument content appears suitable for its intended purpose.
  • Criterion validity examines whether the test accurately measures the intended outcome. This type of validity has two subtypes: predictive validity and concurrent validity.

Content validity in psychology

Content validity is especially important for research in the field of psychology. Most psychological constructs (e.g., love, anxiety, depression) are complex and can’t be measured directly. Therefore, they have to be operationalized accurately to make sure the results are valid and unbiased.

Content validity example in psychology
A researcher wants to develop a questionnaire to measure anxiety levels in college students before creating an experimental design to investigate how to reduce these levels.

They want to ensure that the survey is a valid measure of anxiety, so they carefully select a range of questions that cover different aspects of anxiety, such as:

  • Worries about school performance
  • Fear of failure
  • Physical symptoms like rapid heartbeat or trembling
  • Avoidance behaviors

The researcher ensures that the questionnaire includes a mix of questions that assess different dimensions of anxiety. They establish content validity by reviewing the survey with experts in the field of anxiety. They also draw a systematic sample to conduct a pilot study with a small group of students.

Based on the results, the researcher removes and adds questions to further refine the new instrument before conducting large-scale survey research.


Instruments with low content validity can be improved by, for example, adding questions that address different relevant aspects of the construct.

Content validity example (psychology)
A researcher wants to investigate stress levels among teachers. They develop a questionnaire with Likert scale questions to assess stress levels.

They ask experts in the field of stress to review the survey questions. These experts unanimously conclude that the instrument has low content validity because all questions focus on either sleep or exercise. The construct “stress” consists of many more aspects, such as social life, financial security, and physical health.

The researcher improves the survey by adding questions related to other aspects of stress to improve its content validity.

Construct vs content validity

Construct validity and content validity are two important types of validity in research, but they are slightly different.

  • Construct validity focuses on ensuring that a combination of indicators accurately measures a construct that’s not directly measurable. Other forms of validity, including content validity, criterion validity, face validity, convergent validity (how well a test correlates with other measures of the same construct), and discriminant validity (the extent to which a test negatively correlates with measures of unrelated constructs), provide evidence of construct validity.
  • Content validity also focuses on ensuring that a test measures what it should, but it addresses one specific aspect of validity: whether the test measures all relevant aspects of the concept being measured.
Construct validity vs content validity example
A researcher creates a survey to assess depression in elderly people.

  • The researcher has a clinical psychologist review the survey to ensure that it captures various aspects of depression. The psychologist determines that the survey addresses various facets of depression, like physical, emotional, cognitive, and behavioral symptoms. They conclude that the survey has high content validity.
  • The researcher then finds that the survey outcomes correlate highly with intelligence, a construct unrelated to depression, but only weakly with a measure of anxiety, a construct that should be related to depression. These findings indicate that, despite the survey’s high content validity, its convergent validity and discriminant validity (and thus its construct validity) are low.

Face validity vs content validity

Face validity and content validity are often confused.

  • Face validity refers to the extent to which an instrument seems to measure what it claims to measure. It’s concerned with external appearance and plausibility, but there is no formal assessment.
  • Content validity refers to the extent to which an instrument actually represents the construct it claims to represent.
Face validity vs content validity example
A school creates a test to assess students’ creativity for a paid design internship.

The test consists of two parts:

  • Part 1: A simple drawing task where participants are asked to draw a picture of their favorite animal
  • Part 2: A question asking participants to write a short essay on the importance of peer feedback in an educational setting

At first glance, this test seems to measure creativity because it involves drawing and writing, which are associated with creativity. The test has high face validity.

However, the test only measures basic artistic skills (part 1) and writing abilities (part 2), which are not necessarily related to creativity. Moreover, there are many other types of creativity (e.g., problem solving) that this test doesn’t assess. It has low content validity.

How to measure content validity

You can measure content validity by following three steps:

Step 1: Collect data from experts in the field

You establish content validity by collecting input from subject matter experts. These are people who can evaluate the content of your instrument because they have experience with both the subject matter and the research process.

The panel of experts rates each question as “unnecessary,” “useful but not necessary,” or “necessary.” The more experts who rate a question “necessary,” the higher the content validity of that item.

Tip
In some cases, you might not be able to request input from a panel of experts. If you’re a student, you can replace the panel of experts with a panel of peers to collect peer feedback. Make sure to mention this in the discussion or limitations section of your research paper, essay, or dissertation.

You can also enlist the help of generative AI tools, such as ChatGPT. Be very cautious when interpreting any AI tool’s responses since the tools tend to make factual mistakes.

In any case, make sure to add accurate in-text citations or references. When in doubt, you can use our free Citation Generator to create error-free citations.

Step 2: Calculate the content validity ratio (CVR)

You can use the following formula to calculate the content validity ratio for each item on the test.

  • Content validity ratio = (ne – N/2) / (N/2)
    • ne = number of panelists indicating “necessary”
    • N = total number of panelists
Content validity ratio example
You ask a panel of 6 experts to assess a test consisting of 5 questions. The first item was rated as “necessary” by 5 experts.

The content validity ratio for this item is:

  • (ne – N/2) / (N/2)
  • (5 – 6/2) / (6/2)
  • 0.67
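The calculation above can be sketched in a few lines of Python (the function name is my own; the formula is the one given in Step 2):

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """CVR = (ne - N/2) / (N/2), where ne is the number of panelists
    rating the item "necessary" and N is the total panel size."""
    half = n_panelists / 2
    return (n_essential - half) / half

# Worked example from above: 5 of 6 experts rate the first item "necessary"
print(round(content_validity_ratio(5, 6), 2))  # 0.67
```

Note that the ratio is +1 when every panelist rates an item “necessary,” 0 when exactly half do, and -1 when none do.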


The outcomes range from -1 to +1. Any value above 0 indicates that at least half of the experts have deemed the question necessary. A higher value indicates a higher content validity.

The agreement could be coincidental, so you need to use a critical values table to determine if the content validity ratio for a specific item falls below a minimum value (critical value). If it falls below the critical value, you can’t rule out coincidence.

Critical value table for content validity ratio

  Number of experts in panel    Critical value
  5                             0.99
  6                             0.99
  7                             0.99
  8                             0.75
  9                             0.78
  10                            0.62
  11                            0.59
  12                            0.56
  20                            0.42
  30                            0.33
  40                            0.29
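As a rough sketch, the critical-value check can be automated by storing the table above as a lookup (`item_meets_critical_value` is a hypothetical helper name; panel sizes not listed here would need the full published table):

```python
# Critical values from the table above, keyed by panel size
CRITICAL_VALUES = {
    5: 0.99, 6: 0.99, 7: 0.99, 8: 0.75, 9: 0.78,
    10: 0.62, 11: 0.59, 12: 0.56, 20: 0.42, 30: 0.33, 40: 0.29,
}

def item_meets_critical_value(cvr: float, n_panelists: int) -> bool:
    """True if the item's CVR reaches the critical value for this panel
    size, i.e., the experts' agreement is unlikely to be coincidental."""
    return cvr >= CRITICAL_VALUES[n_panelists]

# The example item (CVR = 0.67, panel of 6) falls below the 0.99 cutoff
print(item_meets_critical_value(0.67, 6))  # False
```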

Step 3: Calculate the content validity index (CVI)

The content validity index summarizes the content validity of the entire instrument. The index corresponds to the average content validity ratio of all items on the test. The closer the index is to 1, the higher the content validity.

The formula for the content validity index is:

  • ∑ratios / N
    • ∑ratios = total of all content validity ratios for the individual items
    • N = total number of items
Content validity index example
Content validity index (CVI) =

  • (0.67 + 0 + 0.33 + 1 – 0.67) / 5
  • 1.33 / 5
  • 0.27

The critical values table shows a critical value of 0.99 for a panel of 6 experts. The content validity index of 0.27 falls below this number, which means it’s too low. Since the test does not accurately measure the intended concept, the questions with a low CVR need to be adjusted to increase the overall CVI.
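The CVI calculation in Step 3 can be sketched the same way (again, the function name is my own):

```python
def content_validity_index(ratios: list[float]) -> float:
    """CVI = the mean of the per-item content validity ratios."""
    return sum(ratios) / len(ratios)

# Per-item CVRs from the example above (one ratio per test question)
ratios = [0.67, 0, 0.33, 1, -0.67]
print(round(content_validity_index(ratios), 2))  # 0.27
```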

Frequently asked questions about content validity

What is the difference between content validity and predictive validity?

Content validity and predictive validity are two types of validity in research:

  • Content validity ensures that an instrument accurately measures all elements of the construct it intends to measure.
    • A test designed to measure anxiety has high content validity if its questions cover all relevant aspects of the construct “anxiety.”
  • Predictive validity demonstrates that a measure can forecast future behavior, performance, or outcomes. It is a subtype of criterion validity.
    • A test designed to predict student retention has high predictive validity if it accurately predicts which students still participate in the study program 2 years later.
What is the difference between content and criterion validity?

Content validity and criterion validity are two types of validity in research:

  • Content validity ensures that an instrument measures all elements of the construct it intends to measure.
    • A survey to investigate depression has high content validity if its questions cover all relevant aspects of the construct “depression.”
  • Criterion validity ensures that an instrument corresponds with other “gold standard” measures of the same construct.
    • A shortened version of an established anxiety assessment instrument has high criterion validity if the outcomes of the new version are similar to those of the original version.

Julia Merkus, MA

Julia has a bachelor’s degree in Dutch Language and Culture and two master’s degrees, in Linguistics and in Language and Speech Pathology. After a few years as an editor, researcher, and teacher, she now writes articles about her specialist topics: grammar, linguistics, methodology, and statistics.