What is Criterion Validity? | Definition & Examples
Criterion validity (or criterion-related validity) is an assessment of how accurately a test or instrument captures the outcome it was designed to measure. These outcomes are generally constructs that cannot be directly measured, such as intelligence or happiness. Such constructs occur frequently in psychology research.
Criterion validity is determined by comparing your test results to a “gold standard,” or criterion, that acts as a ground truth. If your test and the criterion are measuring the same construct, they should be highly correlated (i.e., have high criterion validity).
There are two types of criterion validity, which differ in their timelines of comparison. Concurrent validity compares two measures administered at the same time, whereas predictive validity captures how one measure correlates with a second measure taken in the future.
What is criterion validity?
Criterion validity is used to measure how well a test corresponds to a well-established measure called a criterion. If the test and criterion measure the same phenomenon, their results should be closely correlated.
In psychology and other social sciences, many phenomena of interest are constructs that cannot be directly measured. Researchers must instead create tests or measures that capture behaviors or performance related to these constructs.
To validate how well such a test captures the construct it’s supposed to measure, it should be compared to a “ground truth” measurement. But how do you determine the ground truth of something that can’t be directly measured?
Criterion validity offers a way around this issue. A new test designed to measure a construct can be validated through comparison to an existing “gold standard,” or criterion, that acts as the ground truth. Criterion variables could include any of the following:
- A clinical assessment
- A well-established questionnaire
- A measure of some related behavior
Most importantly, the criterion should measure the same or a similar construct as the test being validated.
If a clear criterion or gold standard exists, assessing criterion validity is relatively straightforward: you simply obtain your test and criterion measures and compute the correlation between them. However, if no gold standard exists, criterion validity cannot be established.
Criterion validity is also only as good as the criterion used as a benchmark. If the criterion contains biases or errors, high criterion validity merely indicates that the measure being validated reproduces those same problems.
Types of criterion validity
The two types of criterion validity are concurrent validity and predictive validity. Both involve comparing the results from one test to a second target outcome or criterion. However, these types of validity differ in when the test and criterion measurements are taken:
- To assess concurrent validity, the measure of interest should be compared to a widely accepted measure obtained at the same time.
- In contrast, predictive validity is determined by comparing a measure to an outcome that occurs later in time.
Though both concurrent and predictive validity are measures of criterion validity, they provide slightly different information about a measure.
Concurrent validity is most helpful when you are evaluating a new instrument or test that offers improvements (in cost, ease of use, etc.) over an existing approach. By obtaining both measurements at the same time and comparing them, you can determine if your new measure captures the same construct as the existing criterion.
Predictive validity is useful for determining whether an instrument correlates with, and can therefore be used to predict, an outcome that occurs later in time.
Predictive validity may be considered more informative than concurrent validity, as it can help determine whether a test provides information about a future, real-world outcome. However, predictive validity takes much longer to assess than concurrent validity, as there is a necessary delay between administering the test and measuring the criterion.
Criterion validity example
Consider the following example of why and how a psychology researcher might measure criterion validity. Suppose a researcher develops a short new questionnaire to screen for depression. To check its criterion validity, they could administer the questionnaire alongside a full clinical assessment (the criterion) and test whether scores on the two measures are highly correlated.
How to measure criterion validity
Criterion validity is determined by comparing a test with an outcome (the criterion) that measures the same or a similar construct. This outcome may be measured at the same time as the test to be validated (in the case of concurrent validity) or at a later point in time (to determine predictive validity).
The test and outcome are compared by determining the correlation between them. The strength of the correlation between the two variables is quantified with a correlation coefficient, such as Pearson’s r.
If the test and criterion provide similar measurements, they will be highly correlated, and you can conclude that your test has criterion validity.
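To make this concrete, here is a minimal Python sketch of the calculation, using SciPy’s pearsonr and made-up scores for ten hypothetical participants (the data and variable names are for illustration only):

```python
# Minimal sketch: criterion validity as the Pearson correlation between
# scores on a new test and scores on an established criterion measure.
# The scores below are made up for illustration only.
from scipy.stats import pearsonr

new_test_scores = [12, 18, 9, 22, 15, 30, 25, 11, 19, 27]    # new instrument
criterion_scores = [14, 20, 10, 24, 13, 33, 27, 12, 21, 29]  # established "gold standard"

r, p_value = pearsonr(new_test_scores, criterion_scores)
print(f"Pearson's r = {r:.2f} (p = {p_value:.3f})")

# A strong correlation (r close to 1) would be taken as evidence of
# criterion validity; a weak correlation would suggest the new test
# does not capture the same construct as the criterion.
```

If the criterion were instead measured at a later point in time (for example, at a follow-up session), the same correlation would speak to predictive rather than concurrent validity.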
Construct vs criterion validity
Construct validity captures how well a test or instrument measures the phenomenon it’s supposed to measure. Criterion validity assesses how well a test measures a construct by comparing it to some benchmark measured at the same time (concurrent validity) or in the future (predictive validity).
Construct validity is generally considered the primary focus when validating a new test. Other measures of validity—criterion validity, content validity, and face validity—provide evidence of construct validity.
Frequently asked questions about criterion validity
What are the two types of criterion validity?
Criterion validity measures how well a test corresponds to another measure, or criterion. The two types of criterion validity are concurrent and predictive validity.
- Concurrent validity compares two measures obtained at the same time.
- Predictive validity indicates how well a test correlates with a measurement taken later on.
What is a construct?
A construct is a phenomenon that cannot be directly measured, such as intelligence, anxiety, or happiness. Researchers must instead approximate constructs using related, measurable variables.
The process of defining how a construct will be measured is called operationalization. Constructs are common in psychology and other social sciences.
To evaluate how well a test measures the construct it’s supposed to, researchers determine construct validity. Face validity, content validity, criterion validity, convergent validity, and discriminant validity all provide evidence of construct validity.
What is the difference between content and criterion validity?
Content validity and criterion validity are two types of validity in research:
- Content validity ensures that an instrument measures all elements of the construct it intends to measure. For example, a survey to investigate depression has high content validity if its questions cover all relevant aspects of the construct “depression.”
- Criterion validity ensures that an instrument corresponds with other “gold standard” measures of the same construct. For example, a shortened version of an established anxiety assessment instrument has high criterion validity if the outcomes of the new version are similar to those of the original version.