What Is Face Validity? | Definition & Example

Face validity is a type of validity that refers to the extent to which a research instrument, such as a survey, questionnaire, or test, appears to measure what it is supposed to measure.

In other words, face validity is concerned with whether the instrument looks like it is measuring what it claims to measure.

Face validity example
You’re interested in measuring participants’ weight in a medical trial.

You have thought of two methods of recording weight:

  • Participants stand on a scale, and you write down the number.
  • Participants self-report what they eat, and you estimate their weight.

The two methods have very different levels of face validity:

  • The first technique has high face validity because a scale is an appropriate instrument to measure weight.
  • The second technique has low face validity because you can’t reliably deduce someone’s weight from what they eat.

Face validity is typically evaluated by experts in the field, such as researchers or academics who have knowledge about the topic being studied. They review the instrument and assess whether it appears to measure the concepts or variables it claims to measure.

What is face validity?

Face validity is the degree to which a research method, instrument, or procedure appears to measure what it claims to measure, based on its surface-level characteristics. It’s a subjective judgment that assesses whether a measure or procedure “looks like” it’s doing what it’s supposed to do.

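Note: Types of measurement validity
There are four types of measurement validity that are often confused. The other three are:

  • Construct validity: Does the test measure the concept it’s intended to measure?
  • Content validity: Does the test fully cover all aspects of the concept it aims to measure?
  • Criterion validity: Do the results correspond to the concrete outcome they are designed to measure?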

Face validity is important because it’s a quick first step in assessing the overall validity of an instrument. It can also affect the credibility and trustworthiness of your research findings. However, it’s considered a weak form of validity because its assessment is subjective and not supported by statistical testing.

High face validity

If an instrument has high face validity, it is more likely to be perceived as valid and reliable by participants, which can increase response rates and data quality.

High face validity also increases the potential impact of your research, because other researchers and readers are more likely to trust the results and to change practices, policies, or theories in light of them.

Finally, high face validity can reduce the need for pilot testing or revisions, which might lower the total cost of your research.

Low face validity

If an instrument has low face validity, participants may be less likely to take it seriously or respond accurately, which can lead to biased or inaccurate results. Some participants might refuse to participate altogether, lowering response rates.

Readers of your study might also not understand what you’re measuring and why you’re using that particular instrument, which might cause them to doubt the results. This could be harmful to your reputation.

Low face validity can also increase the total cost of your study if you need additional resources to revise the instrument.

Face validity examples

Low face validity example
A researcher is trying to measure the level of job satisfaction among employees in a large corporation.

Instead of using a standardized survey or questionnaire, the researcher decides to ask participants to write a short story about their favorite work experience. The researcher then asks a panel of judges to evaluate the stories based on how well they convey a sense of joy and fulfillment.

This method has low face validity because the task does not obviously relate to the research question: writing a short story about a favorite work experience is not a direct measure of job satisfaction. The evaluation might also be perceived as subjective, which can harm the credibility of the results.

High face validity example
A researcher is trying to measure the level of job satisfaction among employees in a large EdTech corporation.

The researcher decides to use a standardized job satisfaction survey developed for the EdTech sector. The survey consists of Likert-scale questions about employees’ opinions of several aspects of their job, such as compensation, recognition, and growth.

This method has high face validity because the questions are related to the topic at hand, and the standardized survey was designed for this topic and industry.

At a later stage, the researcher also evaluates other types of validity, such as construct validity, because face validity is considered only a superficial check.

How to determine face validity

The best way to assess face validity is by asking other people (e.g., test participants or other researchers) to evaluate your instrument or method.

You can ask them questions such as:

  • Does the test seem appropriate for measuring this variable or topic?
  • Are all aspects of the method relevant to the thing that’s being measured?
  • Does the test seem useful for measuring this specific construct?

You can ask a test panel informally or conduct a short pilot version of your research to assess its face validity.

Who should determine face validity?

There are three strategies for determining face validity:

  1. You can ask experts to evaluate your instrument or test
  2. You can ask test participants to evaluate your instrument or test
  3. You can ask both to evaluate your instrument or test

If possible, you should ask both experts and test participants to evaluate your instrument or test. This way, you get the perspective of researchers, who typically have more background knowledge about the topic and valid research methods, as well as the perspective of test participants, who resemble the participants of your study and need to understand your choice of method.

Face validity example
You send out a survey on the impact of cultural background on food preferences to two groups: other researchers and test participants.

The test participants believe that the survey has high face validity because it directly asks about the variables being studied (food preferences and cultural background). The questions are clear and straightforward, and participants understand what they’re being asked.

Other researchers in the field, however, disagree with their assessment. They argue the survey has low face validity because the questions about cultural background are too broad and superficial and may not capture the nuances of different cultural experiences.
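When you ask both groups, one simple way to make disagreements like this visible is to have each evaluator rate the instrument on questions such as those listed earlier (for example, on a 1 to 5 scale) and compare the average rating per group. The Python sketch below does this with hypothetical ratings; the group names, question labels, and the 1-point disagreement threshold are illustrative assumptions, not a standard procedure. It is only a descriptive summary, not a statistical test, so the judgment of face validity remains subjective.

```python
from statistics import mean

# Hypothetical ratings on a 1-5 scale (1 = "not at all", 5 = "very much"),
# one list of scores per evaluator group and question. Illustrative only.
ratings = {
    "participants": {
        "appropriate_for_topic": [5, 4, 5, 4],
        "all_aspects_relevant": [4, 5, 4, 4],
        "useful_for_construct": [3, 3, 3, 3],
    },
    "researchers": {
        "appropriate_for_topic": [3, 2, 3],
        "all_aspects_relevant": [2, 3, 2],
        "useful_for_construct": [3, 2, 2],
    },
}


def summarize(ratings, gap_threshold=1.0):
    """Return the mean rating per group and question, plus the questions
    where the group means differ by at least gap_threshold points
    (an arbitrary cut-off chosen for illustration)."""
    means = {
        group: {q: mean(scores) for q, scores in questions.items()}
        for group, questions in ratings.items()
    }
    groups = list(means)
    flagged = []
    for q in means[groups[0]]:
        values = [means[g][q] for g in groups]
        if max(values) - min(values) >= gap_threshold:
            flagged.append(q)
    return means, flagged


means, flagged = summarize(ratings)
for group, per_question in means.items():
    print(group, {q: round(m, 2) for q, m in per_question.items()})
print("Groups disagree on:", flagged)
```

Flagged questions are a prompt to talk to both groups about why they disagree, as in the survey example, rather than a verdict on the instrument.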

When to determine face validity

Ideally, you assess the face validity of a new test or instrument in an early stage of your research process. This way, you can still adapt your research method. You should also assess the face validity when you’re using an existing test or instrument for new populations or in new circumstances.

Face validity for a new test

Face validity example: Designing a new test
You develop a new test to assess creativity in children. The test is designed to investigate a child’s ability to generate innovative ideas and solutions. It consists of a series of puzzles and challenges that require children to think outside the box and come up with novel solutions.

Because the test is new, its face validity is unknown. You recruit a small group of children to complete the test, and then you ask a separate group of child development experts to review the responses and give feedback on whether they think the test measures creativity effectively.

Face validity for a new population

Face validity example: New population
You want to assess creativity in children. There’s a standardized test available, but it’s aimed at young adults.

The face validity of the test for this population (children) is unknown. You gather a small group of children to take the test. They find the language confusing and don’t understand most of the questions. According to this group, the test has low face validity.

You revise the language and scenarios used in the test and ask a new group of test participants to try your new version. They find the test easy to understand and are able to complete all the questions. The test now has high face validity.

Face validity for new circumstances

Face validity example: New circumstances
You want to use an existing test to assess creativity in children. However, the original test has 50 scenarios, and children must come back 5 weeks in a row to complete 10 scenarios a week.

You want to collect all your data in 1 week, so you turn the original test into a short-form version with only 10 scenarios.

You ask test participants and other experts in the field to evaluate the face validity of your shortened version of the test. They indicate that the test is clear, accurate, and relevant, demonstrating high face validity.

Face validity vs construct validity

Face validity and construct validity are both types of measurement validity, but they refer to different aspects of a research method or instrument.

  • Face validity, also known as surface validity, refers to the extent to which a research instrument appears to measure what it’s supposed to measure. For example, a survey designed to measure customer satisfaction might have high face validity if it includes questions that are clearly related to customer satisfaction.
  • Construct validity refers to the extent to which a research instrument actually measures a theoretical concept or construct, beyond its surface-level appearance. In other words, construct validity is concerned with whether the instrument really captures the underlying construct. For example, a survey designed to measure happiness has high construct validity if its scores correlate strongly with other established measures of happiness.
Example: Face validity vs. construct validity
You’re investigating the impact of a new exercise program on physical fitness. You create a survey that asks participants about their exercise habits, including how many days a week they exercise, what types of exercises they do, and how long they’ve been exercising.

The survey appears to measure the impact of the exercise program on physical fitness, and it’s easy to understand what respondents are being asked to do.

However, when you analyze the data, you notice that the outcomes are actually more strongly correlated with participants’ self-reported motivation to exercise than with their actual physical fitness levels. Specifically, participants who report higher motivation to exercise tend to report higher levels of physical fitness, regardless of whether they participated in the exercise program or not.

In this case, the survey has high face validity because it appears to measure the impact of the exercise program on physical fitness. However, it has low construct validity because it doesn’t actually measure the underlying construct of physical fitness. Instead, it largely captures a different variable (motivation to exercise), which is not the same thing as physical fitness.

Frequently asked questions about face validity

What is the difference between content validity and face validity?

Content validity and face validity are both types of measurement validity.

  • Content validity refers to the degree to which the items or questions on a measure accurately reflect all elements of the construct or concept that’s being measured. It assesses whether the items are accurate, relevant, and comprehensive in measuring the construct.
  • Face validity refers to the degree to which a measure seems to be measuring what it claims to measure. It assesses whether the measure appears to be relevant.
What is the difference between construct validity and face validity?

Face validity refers to the extent to which a research instrument appears to measure what it’s supposed to measure. For example, a questionnaire created to measure customer loyalty has high face validity if the questions are strongly and clearly related to customer loyalty.

Construct validity refers to the extent to which a tool or instrument actually measures a construct, rather than just its surface-level appearance.

What is the best way for a researcher to judge the face validity of items on a measure?

The best way for a researcher to judge the face validity of items on a measure is by asking both other experts and test participants to evaluate the instrument.

The combination of experts with background knowledge and research experience, along with test participants who form the target audience of the instrument, provides a good idea of the instrument’s face validity.

In which ways are content and face validity similar?

Content validity and face validity are both types of measurement validity. Both aim to ensure that the instrument is measuring what it’s supposed to measure.

However, content validity focuses on how well the instrument covers the entire construct, whereas face validity focuses on the overall superficial appearance of the instrument.

Julia Merkus, MA

Julia has a bachelor’s degree in Dutch language and culture and two master’s degrees, in Linguistics and in Language and speech pathology. After a few years as an editor, researcher, and teacher, she now writes articles about her specialist topics: grammar, linguistics, methodology, and statistics.