What Is Internal Validity? | Definition, Example & Threats

Internal validity refers to the extent to which a research study’s design and methods minimize the likelihood of alternative explanations for an observed effect or relationship between variables.

In other words, internal validity addresses the question: “Is the observed effect or relationship likely due to the independent variable (the variable being manipulated) and not due to other factors?”

A high level of internal validity means that the study’s conclusions are likely to be trustworthy. It’s one of the most important types of validity in research.

Internal validity example
A fitness instructor wants to investigate whether a new exercise program improves cognitive function in adults. The study consists of 20 participants, who are randomly assigned to either a treatment group or a control group.

  • The treatment group participates in the new exercise program for 3 months.
  • The control group does not participate in any exercise.

The participants’ cognitive function is assessed using a standardized test at the beginning and end of the 3-month period.

However, during the 3-month period, the participants are also encouraged to eat a healthy diet as part of their overall health and wellness. The researcher doesn’t control for this extraneous variable, so it’s possible that any changes in cognitive function are due to the diet rather than the exercise program. The study has low internal validity.

What is internal validity?

Internal validity refers to the extent to which a study’s design and methodology ensure that any observed relationships or effects between variables are due to the independent variable being manipulated and not to other factors.

A study with high internal validity is considered to be strong because it’s able to isolate the effect of the independent variable and rule out alternative explanations. It increases the chance that the study is free from flaws and biases that could affect the results.

If your study has low internal validity, you can’t claim a cause-and-effect relationship between your variables because you can’t rule out other explanations.

Internal validity example
You’re investigating the effect of a new treatment for headaches. You use simple random sampling to select your sample from the population. Then you randomly assign your participants to a treatment group and a control group:

  • The treatment group receives the new medication for headaches.
  • The control group receives no medication.

According to a survey with Likert-scale questions, the treatment group reports significantly fewer and less intense headaches than the control group. You want to conclude that the treatment works, but a fellow researcher comments on your results, stating that they might be the result of a placebo effect.

It’s possible that the improvement wasn’t caused by the treatment but by the participants’ belief that the new treatment would work.

You decide to change your experimental design so it includes a control group that receives a placebo. This way, you can determine whether the positive effects are the result of the new treatment or a placebo effect.
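
To make the random assignment step concrete, here is a minimal Python sketch, assuming 30 hypothetical participants and illustrative group sizes (none of these labels come from the study itself):

```python
# Minimal sketch: random assignment to treatment, placebo, and control groups.
# Participant labels and group sizes are hypothetical.
import random

participants = [f"P{i:02d}" for i in range(1, 31)]  # 30 hypothetical participants
random.shuffle(participants)  # every ordering is equally likely

groups = {
    "treatment": participants[0:10],   # new headache medication
    "placebo": participants[10:20],    # identical-looking pill with no active ingredient
    "control": participants[20:30],    # no medication
}
for name, members in groups.items():
    print(name, sorted(members))
```

Because chance alone decides who ends up in which group, pre-existing differences between participants are spread roughly evenly across conditions, which is what protects internal validity.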

Internal vs external validity

The degree of both internal and external validity influences the interpretation of study findings:

  • Internal validity refers to the extent to which a study’s design and methods allow researchers to infer causality between variables.
  • External validity, on the other hand, refers to the extent to which the results of a study can be generalized to other populations, settings, and situations beyond the specific study.
|                               | Internal validity                           | External validity                                                                   |
|-------------------------------|---------------------------------------------|-------------------------------------------------------------------------------------|
| Goal                          | Establish causality between variables       | Generalize results to other contexts                                                |
| Main threat                   | Extraneous variables                        | Sampling bias                                                                       |
| Measures to increase validity | Randomization, control groups, and blinding | Representative probability sampling, larger sample size, and naturalistic settings |

Trade-off between internal and external validity

There’s always a trade-off between internal and external validity. The more you control for extraneous variables (internal validity), the less you can generalize your results to different populations or contexts (external validity).

Internal vs external validity (trade-off)
You want to investigate the effect of a new language learning app on improving vocabulary retention in children.

You could conduct a laboratory study with a small sample size (e.g., 20 children) and a controlled environment. For example, you could show the children a series of words on a screen and then immediately test their ability to recall those words.

This would allow you to carefully control the setting and minimize confounding variables, such as distractions or interruptions, which contributes to high internal validity. However, this approach might not be representative of real-world learning situations, resulting in low external validity.

You could also conduct a field study with a large sample size (e.g., 100 children) and collect data in real-world settings, such as schools or homes.

This would allow you to collect data in a more naturalistic setting and potentially recruit participants from a broader range of demographics, which contributes to high ecological validity (a subtype of external validity). However, this approach might introduce confounding variables and research biases, as you can’t control for extraneous variables (e.g., distractions). This can harm the study’s internal validity.

You have to find a balance to maintain acceptable levels of internal and external validity.

Internal validity examples

There are several factors that can harm or strengthen a study’s internal validity.

Low internal validity example

You want to investigate the effect of playing video games on children’s attention span. You give 20 volunteers from your local primary school a new video game to play for 30 minutes and then ask them to complete a simple puzzle task. You measure the time it takes for the children to complete the puzzle task.

This research design has several threats to internal validity:

  • There’s no control group to compare the results to, so you can’t attribute any observed effects to the video game.
  • The measure of attention span (how long it takes to complete the puzzle) is only an indirect proxy; it’s not a direct measure of concentration.
  • The sampling method is a type of non-probability sampling, which means the sample may not be representative of the broader population. The sample size is also rather small.
  • The design doesn’t take into account any other factors that might affect attention span (e.g., amount of sleep the children got last night, their diet, or their previous experience with puzzles).

High internal validity example

You want to study the effect of a new medication on blood pressure in patients with hypertension.

You conduct a randomized controlled trial with 100 patients who are recruited with cluster sampling and randomly assigned to a treatment group or control group (between-subjects factor). The control group receives a placebo that looks and tastes the same as the new medication but does not contain any active ingredients. Your mixed design has a low drop-out rate.

It’s a double-blind study, so both you and the patients are blinded to the treatment assignment until after data collection. You collect data using a standardized protocol and instrument to measure blood pressure at multiple time points (within-subjects factor).

You control for potential confounding variables, such as age, gender identity, and body mass index, by using appropriate statistical tests to adjust for these variables.

There are several factors that contribute to this study’s high internal validity:

  • The design involves randomization, which ensures that both groups have similar characteristics at the start of the study. There was also a low drop-out rate, which minimizes attrition bias and the risk of missing data.
  • You’ve used blinding, which minimizes observer bias and reduces the risk of measurement error.
  • The standardized protocol for measuring blood pressure ensures consistent and reliable data.
  • You’ve controlled for confounding variables, which helps ensure that any observed effects can be attributed to the medication instead of other factors.
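
As a rough sketch of the confounder adjustment described above, the following Python example (assuming pandas and statsmodels; the variable names and simulated data are hypothetical, not from the study) estimates the treatment effect on blood pressure while adjusting for age and body mass index:

```python
# Hypothetical sketch: regression adjustment for measured confounders in an RCT.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 100
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),   # 0 = placebo, 1 = new medication
    "age": rng.normal(55, 10, n),
    "bmi": rng.normal(28, 4, n),
})
# Simulated outcome: change in systolic blood pressure over the study period
df["bp_change"] = -8 * df["treatment"] + 0.1 * df["age"] + 0.3 * df["bmi"] + rng.normal(0, 5, n)

# The coefficient on `treatment` estimates the medication effect
# while holding age and BMI constant.
model = smf.ols("bp_change ~ treatment + age + bmi", data=df).fit()
print(model.params["treatment"], model.pvalues["treatment"])
```

Only the confounders you actually measure can be adjusted for this way; anything unmeasured remains a threat (residual confounding).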

Internal validity threats and solutions

It’s essential to recognize and solve threats to internal validity in any research design. There are different threats for single-group and multi-group studies.

Internal validity threats for single-group studies

A single-group study involves a single group of participants who are assessed at one point in time and then again at a later point in time (pre-post design). The goal is to examine the change within the same group over time.

Single-group study example: Internal validity
You want to evaluate the effectiveness of a new mindfulness-based program on reducing symptoms of anxiety in a group of college students.

You assess the students’ anxiety symptoms before the program (pre-test) and at the end of the 8-week program (post-test).
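
In a pre-post design like this, the change within the group is usually analyzed with a paired test. Below is a minimal sketch, assuming Python with SciPy and made-up anxiety scores on a numeric scale:

```python
# Minimal sketch of a pre-post comparison in a single-group study.
# The anxiety scores are simulated, not real data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pre = rng.normal(30, 5, size=40)            # anxiety scores before the program
post = pre - rng.normal(3, 4, size=40)      # scores after the 8-week program

t_stat, p_value = stats.ttest_rel(pre, post)  # paired t-test on the same students
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# Even a significant change cannot, by itself, be attributed to the program:
# maturation, testing, instrumentation, and history remain uncontrolled.
```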

Threats to internal validity for single-group studies
| Internal validity threat | Explanation | Example |
|---|---|---|
| Maturation | The research findings vary as a natural result of time. | Most students just started college at the time of the pre-test. At the time of the post-test, they’re much more familiar with college life and feel less anxious. |
| Instrumentation | Outcomes differ because different measures are used in the pre- and post-test. | In the pre-test, anxiety was measured for 45 minutes. In the post-test, anxiety was measured for only 15 minutes. This may have affected the outcomes. |
| Testing | Results of the post-test are influenced by the pre-test. | Students show decreased anxiety at the end of the research because the same test was administered twice; they may also have responded differently because they became aware of the study’s aim. |
| History | Outcomes are affected by an unrelated event. | Two weeks before the end of the research, the students are told the rules for graduation have become more strict. The students’ anxiety has increased, which might influence the outcomes during the post-test evaluation. |

Solving threats in single-group studies

Making changes to the research design can counter threats to internal validity.

  • Use of a control group. Single-group studies are often used in a quasi-experimental design when it’s not feasible or ethical to conduct randomized controlled trials. Although you might not be able to randomly assign participants to a control group, you can use existing groups as controls. For example, you can compare the outcomes of participants who have received an intervention or treatment with those of people who haven’t.
  • Use of filler tasks. You can use filler tasks (i.e., random other questions that have nothing to do with the purpose of your study) to obscure the goal of your research. This approach counters demand characteristics, which inadvertently prompt participants to respond in a certain way, as well as testing threats.
  • Use a larger sample size. A larger sample provides more precise estimates of the effect, even if assumptions about the data are not fully met (e.g., homoscedasticity, which is the assumption of similar variances in different groups, and normal distribution). This can help reduce the impact of random error and measurement error. It also makes it less likely that outliers will have a disproportionate impact on the results, which is particularly important in single-group studies with no control group (see the sketch after this list).
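
The following sketch (assuming Python with NumPy; the effect size and noise level are invented) simulates how the spread of estimated effects shrinks as the sample grows:

```python
# Rough simulation: larger samples yield more precise estimates of an effect
# and are less swayed by random error or individual outliers.
import numpy as np

rng = np.random.default_rng(1)
true_effect = 5.0

for n in (10, 50, 200, 1000):
    # Draw many samples of size n and see how much the estimated mean varies.
    estimates = [rng.normal(true_effect, 10, n).mean() for _ in range(2000)]
    print(f"n = {n:4d}: spread of estimates (SD) = {np.std(estimates):.2f}")
```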

Internal validity threats for multi-group studies

A multi-group study involves multiple groups or populations that are compared to each other. This design allows researchers to examine differences between groups, identify potential sources of variation, and control for extraneous factors.

Multi-group study example: Internal validity
You want to evaluate the effectiveness of three different methods for teaching math skills to elementary school students. The students are placed in a group based on their initial baseline score (Group A, Group B, and Group C). One group serves as a control group, receiving no intervention.

Each group is assessed at the beginning and end of the 12-week study period through a math skill test (ratio data).

Threats to internal validity for multi-group studies
| Internal validity threat | Explanation | Example |
|---|---|---|
| Regression to the mean | The statistical tendency for participants with extremely low or high scores on a test to score closer to the mean the next time. | The students were assigned to a group based on their pre-test score. It’s difficult to attribute any change in outcomes to the teaching method instead of statistical norms (see the simulation below this table). |
| Social desirability and social interaction | Participants from different groups can interact with each other or with researchers and either figure out the goal of the study, feel resentful of or superior to others, or feel pressured to perform a certain way. | Group A gets to play a fun math game, whereas the other groups don’t. Students from Group B or Group C might resent the students from Group A for getting to play a game. This might be demotivating, which in turn could lead to poor performance. |
| Attrition bias | Bias caused by the dropout of participants. | Many of the Group A students provided unusable data. This makes it hard to compare the results of the two treatment groups to the control group. |
| Selection bias | Groups differ at the beginning of the study. | Students with high math skills were placed in Group A, whereas students with low math scores were placed in Group B. This caused the groups to be systematically different from the start. Because of these differences, any observed change in test scores might be due to reasons other than the manipulation of the independent variable (teaching method). |
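
The first threat in the table, regression to the mean, is easy to see in a small simulation. Here is a rough sketch (assuming Python with NumPy; all numbers are invented): students selected for extremely low pre-test scores improve on the post-test even though nothing about their ability changed.

```python
# Rough simulation of regression to the mean with invented numbers.
import numpy as np

rng = np.random.default_rng(7)
true_skill = rng.normal(50, 10, 1000)        # stable math ability
pre = true_skill + rng.normal(0, 8, 1000)    # pre-test = ability + measurement noise
post = true_skill + rng.normal(0, 8, 1000)   # post-test = ability + new noise

lowest = pre < np.percentile(pre, 20)        # students grouped by an extreme pre-test score
print(f"mean pre-test of lowest-scoring group:  {pre[lowest].mean():.1f}")
print(f"mean post-test of lowest-scoring group: {post[lowest].mean():.1f}")
# The post-test mean is higher even though no intervention took place, which
# could be mistaken for an effect of the teaching method.
```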

Solving threats in multi-group studies

Making changes to the research design can counter threats to internal validity.

  • Blinding. A single-blind, double-blind, or even triple-blind research design can help you counter the effects of social interaction and desirability. If participants, researchers, and data collectors are unaware of the treatment a participant has received, they’re less likely to influence the behavior and outcomes.
  • Random assignment (randomization). By randomly assigning participants to groups, you counter the effects of selection bias and regression to the mean because you’re making sure groups are comparable at the beginning of the study.

How to assess internal validity

Three conditions need to be met for internal validity, allowing you to establish a cause-and-effect relationship between an independent variable (the one you manipulate) and the dependent variable (the one you measure):

  1. Your manipulation of the independent variable (treatment variable) precedes any changes in your dependent variable (outcome variable or response variable).
  2. Your independent variable and dependent variable change together.
  3. There are no extraneous variables or confounding variables that can explain the research findings.
Assessing internal validity example
You’re testing the following hypothesis:

  • Drinking a glass of water before bed increases the number of hours of sleep.

You randomly assign participants to a treatment group (drinking a glass of water) or control group (drinking a different beverage). In the morning, participants self-report on the number of hours of sleep they got.

You analyze the results and notice that the treatment group indeed gets more sleep than the control group.

In the example above, two out of three conditions have been met:

  • Drinking water before bed preceded the increase in hours of sleep.
  • Drinking water and the number of hours of sleep increased together.

However, the third condition is not met: the caffeine that may be present in the control group’s beverages is an extraneous variable that can explain the results equally well. Participants in the control group might have drunk coffee or soda with caffeine, which caused them to sleep for a shorter amount of time.

This means your study suffers from low internal validity, and you can’t establish causality between drinking a glass of water and improved sleep.

Frequently asked questions about internal validity

What is the difference between construct validity and internal validity?

Construct validity refers to the extent to which a study measures the underlying concept or construct that it is supposed to measure.

Internal validity refers to the extent to which observed changes in the dependent variable are caused by the manipulation of the independent variable rather than other factors, such as extraneous variables or research biases.

Construct validity vs. internal validity example
You’re studying the effect of exercise on happiness levels.

  • Construct validity would ask whether your measures of exercise and happiness levels accurately reflect the underlying concepts of physical activity and emotional state.
  • Internal validity would ask whether your study’s results are due to the exercise itself, or if some other factor (e.g., changes in diet or stress levels) might be causing changes in happiness levels.

 

What are the 12 threats to internal validity?

The 12 main threats to internal validity are:

  1. History: Changes in the environment or events that occur outside of the study can affect the outcome.
  2. Maturation: Changes in the participants over time (e.g., age, skill level) can affect the outcome.
  3. Testing: The act of testing or measurement itself can affect the outcome (testing effect, practice effect, or carryover effect).
  4. Instrumentation: Changes in the measuring instrument or tool used to collect data can affect the outcome.
  5. Statistical regression to the mean: The tendency of extreme scores to regress towards the mean on retesting, which can be mistaken for a genuine effect.
  6. Selection: The selection of participants for the study can affect the outcome (selection bias), especially in the case of non-probability sampling.
  7. Experimental mortality or attrition bias: The loss of participants or dropouts during the study can affect the outcome.
  8. Multiple-treatment interference: The interaction between different treatments or conditions can affect the outcome.
  9. Social desirability bias: The participants’ awareness of being in a study and their desire to be well-liked by researchers can affect the outcome.
  10. Social interaction: The participants’ awareness of being treated differently than people in other groups can affect the outcome.
  11. Residual confounding: The presence of unmeasured or uncontrolled extraneous or confounding variables that affect the outcome and are not accounted for in the analysis.
  12. Order effect: The order of the independent variable levels affects the dependent variable.

There are several ways to counter these threats to internal validity, for example, through randomization, the addition of control groups, and blinding.

Julia Merkus, MA

Julia has a bachelor’s degree in Dutch Language and Culture and two master’s degrees, in Linguistics and in Language and Speech Pathology. After a few years as an editor, researcher, and teacher, she now writes articles about her specialist topics: grammar, linguistics, methodology, and statistics.