Construct Validity | Definition & Examples

Construct validity refers to how well a test or instrument measures the theoretical concept it’s supposed to. Demonstrating construct validity is central to establishing the overall validity of a method.

Construct validity tells researchers whether a measurement instrument properly reflects a construct—a phenomenon that cannot be directly measured, such as happiness or stress. Such constructs are common in psychology and other social sciences.

There is no single test to evaluate construct validity. Instead, researchers accumulate evidence for it by assessing other types of validity. These can include face validity, content validity, criterion validity, convergent validity, and divergent validity.

Construct validity example
A team of researchers would like to measure cell phone addiction in teenagers. They develop a questionnaire that asks teenagers about their phone-related attitudes and behaviors. To gauge whether their questionnaire is actually measuring phone addiction (i.e., whether it has construct validity), they perform the following assessments:

  • The team evaluates face validity by reading through their questionnaire and asking themselves whether each question seems related to cell phone addiction.
  • The team measures criterion validity by comparing participants’ questionnaire results with their average daily screen time. They expect to see a high correlation between these two variables.
  • Finally, the researchers examine divergent validity by comparing their questionnaire results to those of a standard creativity test. Because the constructs of phone addiction and creativity should theoretically be unrelated, the researchers expect to see a low correlation between these test results.

If the researchers successfully demonstrate the face validity, criterion validity, and divergent validity of their questionnaire, they have provided compelling evidence that their new measure has high construct validity.

What is a construct?

Measurement is a key part of the research process. However, not all phenomena can be directly measured or observed. Phenomena that cannot be directly measured are called constructs.

Constructs are common in psychology and social sciences. Common examples of constructs include self-esteem, happiness, intelligence, and stress.

Studying something that cannot be directly measured is inherently challenging. To get around this issue, scientists must operationalize a construct—that is, they must clearly define how they will capture it using measurable variables that are related to their construct of interest.

Construct operationalization example
Imagine you’re trying to determine someone’s stress levels. To do so, you must operationalize (i.e., define how you’ll measure) the construct of stress. You might use the following approaches:

  • Psychological self-report: ask your participant to rate their current stress level on a scale from 1 (not stressed) to 10 (extremely stressed)
  • Physiological measure: use a saliva sample to measure your participant’s cortisol levels
  • Behavioral assessment: record how frequently your participant fidgets or exhibits other stress-related behaviors during a stressful task

Though none of these approaches directly measures stress, all capture variables that should in theory be related to it.
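If you collected all three of these measures, one common analysis choice is to standardize each one and combine them into a composite stress score. The sketch below is purely illustrative: the data, the equal weighting, and the decision to average the measures are assumptions, not part of the example above.

```python
from statistics import mean, stdev

# Hypothetical raw measurements for five participants. Each list is one
# operationalization of the stress construct (illustrative values only).
self_report = [7, 3, 8, 5, 6]              # 1-10 stress rating
cortisol = [14.2, 9.1, 16.5, 11.0, 12.8]   # nmol/L, from saliva samples
fidget_count = [12, 4, 15, 7, 9]           # fidgets during a stressful task

def zscores(values):
    """Standardize a list of scores so different units become comparable."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# One possible composite: average each participant's standardized scores.
standardized = [zscores(m) for m in (self_report, cortisol, fidget_count)]
composite = [mean(triple) for triple in zip(*standardized)]
print([round(c, 2) for c in composite])
```

Standardizing first matters because the three measures use different units; averaging raw values would let the measure with the largest numbers dominate the composite.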

What is construct validity?

Construct validity is an assessment of whether a test measures the thing it’s supposed to. Construct validity is especially important in fields like psychology and other social sciences, which contain constructs that cannot be directly measured.

Instead of measuring constructs themselves, researchers must create tests to measure variables that are theoretically related to these constructs. If these tests actually measure the construct they are supposed to, they have construct validity.

Construct validity example
Lara and Ashish are both trying to measure participants’ intelligence. They each choose different operationalizations of this construct:

Lara has noticed that many of her intelligent friends read lots of books. She therefore decides to assess intelligence by asking participants how many books they have read over the past year.

Lara’s measure lacks construct validity. Though there may be an association between reading and intelligence, her measure fails to account for other facets of intelligence (e.g., logical reasoning) and potential confounds, such as spare time and access to education.

Ashish instead reviews how other studies have measured intelligence. They decide to use the Wechsler Adult Intelligence Scale, which measures various dimensions of cognitive ability. This well-established measure does have construct validity.

Many constructs have multiple, interrelated dimensions that must be considered. Well-designed measures must capture these dimensions without inadvertently measuring related concepts.

Construct validity example
A psychologist is developing a questionnaire to measure self-esteem. They develop a list of questions to measure self-esteem:

1. Do you feel you are able to do things at least as well as other people?

2. Do you often prefer being alone to being with others?

3. Do you feel you have a number of good qualities?

4. Overall, are you satisfied with yourself?

5. Are you ashamed of events in your past?

However, upon further review, the psychologist realizes that not all of these questions are relevant. Specifically, Question 2 measures introversion, and Question 5 measures shame.

Though Questions 2 and 5 may be indirectly related to self-esteem, they should be removed to improve the construct validity of this questionnaire.

How to determine construct validity

There is no direct measure of construct validity. Instead, researchers accumulate evidence that a test is measuring the construct it’s supposed to. Other forms of validity can be used as evidence of construct validity. These can be separated into subjective and quantitative categories.

Subjective evidence of construct validity

Subjective evidence relies on expert opinions and knowledge rather than concrete data. Two types of validity fall under this category:

  • Face validity is whether an instrument or test seems to measure what it’s supposed to. In the self-esteem example above, the psychologist assesses face validity by ensuring that each question is related to self-esteem.
  • Content validity assesses whether an instrument measures all aspects of a construct. For example, a measure of intelligence must include all dimensions of this construct (verbal reasoning, spatial intelligence, critical thinking, and so on).

Face and content validity are not quantified by a number; both instead rely on expert opinions to determine whether they are satisfied.

Quantitative evidence of construct validity

Quantitative evidence is associated with a numerical score. The following forms of validity are considered quantitative and involve computing the correlation between the test being validated and some other variable:

  • Criterion validity determines whether a test corresponds to a “gold standard” measure of the same construct. It is quantified as the correlation between the unvalidated test and the gold standard. Concurrent validity and predictive validity are the two types of criterion validity.
  • Convergent validity captures how well a test correlates with other measures of the same or a similar construct. For example, a high school senior’s GPA should be highly correlated with their SAT score, as both measure the construct of academic performance.
  • Divergent validity instead assesses whether a test captures an unrelated construct. For example, though they may share similarities, a measure of introversion should not be the same as a measure of social anxiety. Measures of these constructs should only be weakly correlated.

These quantitative forms of validity can be assessed by considering the strength and statistical significance of the correlation between the test and the measure it’s being compared to.

  • A correlation coefficient greater than .7 is generally considered strong
  • A p value below the conventional alpha level of .05 is generally considered statistically significant
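As a concrete sketch, both checks can be computed with a correlation test. The data below are simulated and the variable names are illustrative; this assumes SciPy is available.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)

# Simulated scores on an established measure of a construct and on the
# new test being validated (illustrative data, not from a real study).
established_measure = rng.normal(50, 10, size=100)
new_test = established_measure + rng.normal(0, 5, size=100)  # related, plus noise

r, p = pearsonr(new_test, established_measure)
print(f"r = {r:.2f}, p = {p:.4f}")

# A strong (r > .7) and statistically significant (p < .05) correlation
# would count as quantitative evidence of validity.
print("strong and significant" if r > 0.7 and p < 0.05 else "weak or nonsignificant")
```

Note that a large sample can make even a weak correlation statistically significant, which is why the strength of the correlation is checked alongside its p value.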

Other approaches

  • Known-groups validity compares the results of two groups expected to differ on a measure or test. For example, a test of physical fitness could be validated by comparing the scores of professional athletes and nonathletes. A significant difference between group scores (as determined using a t test or something similar) would support the construct validity of the test.
  • Factor analysis is a statistical technique used to determine which dimensions a test or measure captures. Questions that are answered similarly are grouped into clusters, or factors. For example, in a personality test, all questions related to introversion would be grouped together.
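For example, the known-groups comparison described above could be run as an independent-samples t test. The group scores below are simulated for illustration; this assumes SciPy is available.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Simulated fitness-test scores for two groups expected to differ
# (illustrative values only).
athletes = rng.normal(85, 5, size=30)
nonathletes = rng.normal(65, 8, size=30)

# Welch's t test (equal_var=False) doesn't assume equal group variances.
t_stat, p_value = ttest_ind(athletes, nonathletes, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# A significant difference (p < .05) in the expected direction would
# support the known-groups validity of the fitness test.
```

The direction of the difference matters: a significant result with nonathletes scoring higher than athletes would undermine, not support, the test's construct validity.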

It is not always possible to evaluate every form of validity for a new test. For example, criterion validity relies on the existence of a “gold standard” measure. If there is no gold standard, it is impossible to establish criterion validity.

Instead, a researcher should evaluate as many types of validity as is feasible to ensure that their test or instrument is measuring what it’s supposed to.

Note: types of construct validity
Some sources state that construct validity is one of four types of validity; the other three are content validity, face validity, and criterion validity. Convergent and divergent validity are often listed as the two subtypes of construct validity.

However, modern theories of validity consider construct validity to be the main focus when evaluating a test or measure. Within this framework, the types of validity discussed above are all considered evidence of construct validity.

Construct validity example

Consider the following example of how a psychology researcher might demonstrate the construct validity of their new measure.

Construct validity example
A psychology researcher is interested in studying how well people adapt to remote work. They create a test they call the Remote Work Flexibility Index (RWFI). It contains items like “I enjoy the challenge of working with people in different time zones” and “I am comfortable interacting with my colleagues using digital technology.”

To evaluate the construct validity of their measure, the researcher assesses several forms of validity:

Content validity: An organizational psychology expert reviews the test. They determine that the RWFI covers all relevant elements of remote work flexibility, demonstrating its content validity.

Convergent validity: The researcher examines how the RWFI corresponds with a measure of adaptability, which should be strongly correlated with remote work flexibility. The two are strongly correlated, indicating that the RWFI has convergent validity.

Divergent validity: Remote work flexibility should not correspond to a construct like self-esteem. The researcher finds that the RWFI is not correlated with a self-esteem questionnaire, suggesting that the RWFI has divergent validity.

Criterion validity: The researcher assesses the correlation between RWFI scores and duration of remote work employment. The strong correlation between the RWFI and this criterion demonstrates the criterion validity of the RWFI.

By assessing these different types of validity, the researcher can conclude with some confidence that the RWFI is indeed measuring remote work flexibility and therefore has construct validity.

Threats to construct validity

Many issues can prevent a test from measuring what it’s supposed to. Common threats to construct validity include the following:

  • Poor operationalization
  • Subject bias
  • Experimenter expectancies

Poor operationalization

Poor operationalization occurs when you have not clearly or properly defined how you will measure your construct.

The operationalization of a construct should be clear and specific. If other people administer your measure in different situations, it should produce consistent results.

A poorly designed measure may not capture all aspects of the construct of interest, or it may measure a different construct altogether.

Subject bias

The behaviors and responses of participants may change when they know they are being observed. For example, when asked about their drinking habits, a participant may give a response they believe is more socially acceptable. They might also have expectations about the study that bias their responses.

To reduce subject bias, researchers often hide the true purpose of a study from participants. This process, called masking or blinding, may lead to more accurate measurements. Ensuring participant anonymity can also help reduce the likelihood of biased responses.

Crucially, any deception must not harm study participants. Researchers must always obtain approval from an ethics board and collect informed consent from participants before the study begins.

Experimenter expectancies

If someone has designed a study, they will have formed a hypothesis related to its outcome. This person may inadvertently bias the measurement process to get the results they expect.

To avoid this issue, people who don’t know the hypothesis can collect data. This is the approach taken in double-blind studies common in medical research.

Construct vs content validity

Both construct validity and content validity assess how well a test measures a construct.

However, construct validity concerns whether a test measures the thing it’s supposed to, whereas content validity concerns whether a test measures all important aspects of a construct.

Content validity has a narrower scope than construct validity. In combination with other types of validity, content validity can provide evidence for construct validity.

Frequently asked questions about construct validity

What is the difference between construct and criterion validity?

Construct validity evaluates how well a test reflects the concept it’s designed to measure.

Criterion validity captures how well a test correlates with another “gold standard” measure or outcome of the same construct.

Although both construct validity and criterion validity reflect the validity of a measure, they are not the same. Construct validity is generally considered the overarching concern of measurement validity; criterion validity can therefore be considered a form of evidence for construct validity.

How do you measure construct validity?

Construct validity assesses how well a test reflects the phenomenon it’s supposed to measure. Construct validity cannot be directly measured; instead, you must gather evidence in favor of it.

This evidence comes in the form of other types of validity, including face validity, content validity, criterion validity, convergent validity, and divergent validity. The stronger the evidence across these measures, the more confident you can be that you are measuring what you intended to.

What is the difference between construct validity and predictive validity?

Construct validity assesses how well a test measures the concept it was meant to measure, whereas predictive validity evaluates to what degree a test can predict a future outcome or behavior.

What is the difference between construct validity and internal validity?

Construct validity refers to the extent to which a study measures the underlying concept or construct that it is supposed to measure.

Internal validity refers to the extent to which observed changes in the dependent variable are caused by the manipulation of the independent variable rather than other factors, such as extraneous variables or research biases.

Construct validity vs. internal validity example
You’re studying the effect of exercise on happiness levels.

  • Construct validity would ask whether your measures of exercise and happiness levels accurately reflect the underlying concepts of physical activity and emotional state.
  • Internal validity would ask whether your study’s results are due to the exercise itself, or if some other factor (e.g., changes in diet or stress levels) might be causing changes in happiness levels.


What is the difference between construct validity and face validity?

Face validity refers to the extent to which a research instrument appears to measure what it’s supposed to measure. For example, a questionnaire created to measure customer loyalty has high face validity if the questions are strongly and clearly related to customer loyalty.

Construct validity refers to the extent to which a tool or instrument actually measures a construct, rather than just its surface-level appearance.

Emily Heffernan, PhD

Emily has a bachelor's degree in electrical engineering, a master's degree in psychology, and a PhD in computational neuroscience. Her areas of expertise include data analysis and research methods.