What Is Stratified Sampling? | Examples & Definition

Stratified sampling is a probability sampling method where researchers divide a population into homogeneous subpopulations (strata) based on specific characteristics, such as gender, age, or socioeconomic status. Every member of the population should be in precisely one stratum.

A sample is then drawn from each stratum using another probability sampling method (e.g., simple random sampling). This way, researchers can estimate statistics (e.g., averages) for each subpopulation.

Stratified sampling is used when the characteristics of a population vary and researchers need to make sure that every subgroup is represented in the sample. When the strata are well chosen, this sampling method supports external validity and generalizability and reduces the risk of some research biases.

Stratified sampling example 
A university wants to survey students about their satisfaction with campus facilities. The student population is diverse, including undergraduates, graduates, and doctoral students from various departments.

To ensure all groups are represented, the university decides to use stratified sampling based on academic level and department. They use a disproportionate sample to ensure the sample size of each subgroup is large enough to draw statistical conclusions.

When do you use stratified sampling?

If you want to use stratified sampling, you need to be able to assign each member of the population to exactly one stratum (subgroup). Your groups need to be mutually exclusive (no one fits into more than one subgroup) and exhaustive (every member of the population fits into a subgroup).
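The two requirements can be checked mechanically before sampling. The following is a minimal sketch, assuming each stratum is described by a simple rule; the student records, the `check_strata` helper, and the rules are hypothetical and only serve as an illustration.

```python
def check_strata(members, strata):
    """Return members that match no stratum or more than one stratum.

    `strata` maps a stratum name to a rule (a function returning True/False).
    """
    problems = {}
    for member in members:
        matches = [name for name, rule in strata.items() if rule(member)]
        if len(matches) != 1:
            problems[member["id"]] = matches
    return problems

# Hypothetical student records and strata rules, for illustration only.
students = [
    {"id": 1, "level": "undergraduate"},
    {"id": 2, "level": "graduate"},
    {"id": 3, "level": "doctoral"},
    {"id": 4, "level": "exchange"},  # fits none of the strata below
]
strata = {
    "undergraduate": lambda s: s["level"] == "undergraduate",
    "graduate": lambda s: s["level"] == "graduate",
    "doctoral": lambda s: s["level"] == "doctoral",
}

problems = check_strata(students, strata)
print(problems)  # {4: []}
```

An empty list means the strata are not exhaustive for that member. A list with two or more names means the strata are not mutually exclusive.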

A stratified sample is a good choice if you think subgroups will have different mean values for the variables you’re interested in.

Stratified vs cluster sampling

Stratified sampling and cluster sampling overlap in that both divide the population into subgroups, but there are also some major differences.

  • Stratified sampling is a sampling technique in which a population is split into strata (subgroups) based on a specific characteristic. Next, you choose members at random from every stratum for data collection. Units (e.g., people) in each stratum are similar to one another with respect to the variable of interest (e.g., they are of the same age). A stratum does not resemble a miniature version of the population.
Stratified sampling example
You’re interested in customer engagement for a phone store. You want to distinguish between people of different genders.

You split the population into groups based on their gender. Then, you draw a random sample from each gender group for data collection.

  • Cluster sampling is a sampling technique in which the population is divided into naturally occurring clusters (e.g., because of geographical differences between groups). You then randomly select entire clusters and collect data from the units within them, rather than drawing participants from every group. Ideally, each cluster is a miniature version of the population as a whole.
Cluster sampling example
You’re investigating how happy people in your town are with gyms.

You divide the population (everyone who lives in the town) into clusters based on the neighborhood they live in. Then, you randomly select a number of neighborhoods and collect data from the people who live in them.
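The contrast between the two methods can be sketched in a few lines of Python. This is a rough illustration, assuming single-stage cluster sampling in which everyone in a chosen neighborhood is surveyed; the neighborhood names, resident IDs, and sample sizes are made up.

```python
import random

# Hypothetical town: resident IDs grouped by neighborhood.
neighborhoods = {
    "North": [1, 2, 3, 4],
    "East": [5, 6, 7],
    "South": [8, 9, 10, 11, 12],
    "West": [13, 14, 15],
}

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Cluster sampling: randomly pick whole clusters, then survey everyone in them.
chosen = rng.sample(sorted(neighborhoods), k=2)
cluster_sample = [r for name in chosen for r in neighborhoods[name]]

# Stratified sampling, for contrast: take a few people from every group.
stratified_sample = [r for residents in neighborhoods.values()
                     for r in rng.sample(residents, k=2)]

print(chosen)                  # the two selected neighborhoods
print(len(stratified_sample))  # 8
```

In the cluster sample, some neighborhoods contribute no one at all. In the stratified sample, every group contributes participants.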

Step-by-step guide to stratified sampling

Stratified sampling consists of four steps:

  1. Define the population and subgroups
  2. Divide the population into strata
  3. Determine the sample size for each stratum
  4. Draw a random sample from each stratum

Step 1: Define the population and subgroups

As with other random sampling methods, start by defining the population you are interested in.

You also need to decide on the characteristic you’ll use to divide the population into subgroups. Assigning each participant to a subgroup should be a clear and unambiguous process because each member must be placed in exactly one stratum.

If you’re interested in multiple traits, you can stratify based on multiple characteristics, provided you can still unambiguously assign each participant to one subgroup. To calculate the total number of subgroups, multiply the number of levels for each trait.

Suppose you stratify based on age and gender, with three groups for the former and four for the latter. Then you would have a total of 3 x 4 = 12 subgroups.
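The multiplication can be checked by listing every combination. In this small sketch, the level names are placeholders chosen for illustration; only the counts (three and four) come from the example above.

```python
from itertools import product

# Hypothetical levels: three age groups and four gender groups.
age_groups = ["<30", "30-50", ">50"]
gender_groups = ["female", "male", "non-binary", "other"]

# Every stratum is one combination of an age group and a gender group.
subgroups = list(product(age_groups, gender_groups))
print(len(subgroups))  # 12
print(subgroups[0])    # ('<30', 'female')
```

The number of strata grows quickly with each added trait, which is worth keeping in mind when your total sample size is limited.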

Stratified sampling example: Determining the population
Your population consists of all entrepreneurs who started their businesses in the last 10 years. You stratify by age and gender identity.

Step 2: Divide the population into strata

You create a list of each member of the population and assign each entrepreneur to a stratum based on their age and gender identity.

You need to make sure that every member fits into exactly one stratum. This means the strata should cover the entire population (i.e., they should be exhaustive) without showing any overlap (i.e., they should be mutually exclusive).

Stratified sampling example: Dividing the population 
You create a list of each entrepreneur’s name, age, and gender identity. You stratify based on two characteristics: age, with three levels (younger than 30, between 30 and 50, and older than 50), and gender identity, with three levels (female, male, other).

If you multiply the number of levels of characteristic 1 (age) by the number of levels of characteristic 2 (gender identity), you have a total of nine groups. Each entrepreneur should be assignable to exactly one group.

Trait: Age
  • Younger than 30
  • Between 30 and 50
  • Older than 50

Trait: Gender identity
  • Female
  • Male
  • Other

Groups (one for each combination of age and gender identity):
  1. Female entrepreneurs (<30)
  2. Male entrepreneurs (<30)
  3. Other entrepreneurs (<30)
  4. Female entrepreneurs (30–50)
  5. Male entrepreneurs (30–50)
  6. Other entrepreneurs (30–50)
  7. Female entrepreneurs (>50)
  8. Male entrepreneurs (>50)
  9. Other entrepreneurs (>50)
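Assigning each entrepreneur to one of the nine groups could look roughly like this. The sketch assumes that ages of exactly 30 and exactly 50 belong to the middle band, which the example does not specify; the records and the function names `age_band` and `stratum` are hypothetical.

```python
from collections import Counter

def age_band(age):
    """Map an age in years to one of the three age strata.

    Assumption: ages of exactly 30 and 50 fall in the middle band.
    """
    if age < 30:
        return "<30"
    if age <= 50:
        return "30-50"
    return ">50"

def stratum(person):
    """Combine gender identity and age band into a single stratum label."""
    return f"{person['gender']} ({age_band(person['age'])})"

# Hypothetical list of entrepreneurs, for illustration only.
entrepreneurs = [
    {"name": "A", "age": 27, "gender": "Female"},
    {"name": "B", "age": 30, "gender": "Male"},
    {"name": "C", "age": 50, "gender": "Other"},
    {"name": "D", "age": 63, "gender": "Female"},
]

counts = Counter(stratum(p) for p in entrepreneurs)
print(counts["Male (30-50)"])  # 1
```

Because each function returns exactly one label, every entrepreneur lands in exactly one stratum, as long as the gender identity field only takes the three listed values.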

Step 3: Determine the sample size for each stratum

You first need to decide whether your sample should be proportionate or disproportionate.

  • A proportionate sample means that the sample size of each stratum is proportional to the share of that subgroup in the entire population. Subgroups that are less strongly represented in the larger population will therefore also be less strongly represented in the sample. For example, women generally make up a lower portion of the IT student population, so you also include fewer women in your sample.
  • A disproportionate sample means the sample size of each stratum is not proportional to the share of the subgroups in the entire population. For example, men generally make up a lower portion of the nursing student population, but you still include an equal number of men, women, and people with another gender identity in your sample.

Researchers tend to use disproportionate samples when they want to draw statistical conclusions about an underrepresented or marginalized subgroup whose sample size would be too low if they used the actual proportions.
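The two options can be compared with a short calculation. This is a minimal sketch: the stratum counts are invented, equal allocation is only one of several possible disproportionate designs, and the rounding rule (largest remainder) is an assumption made here so that the stratum sizes add up to the total.

```python
def proportionate_allocation(stratum_sizes, total_sample):
    """Allocate the sample in proportion to stratum size (largest-remainder rounding)."""
    population = sum(stratum_sizes.values())
    exact = {s: total_sample * n / population for s, n in stratum_sizes.items()}
    alloc = {s: int(x) for s, x in exact.items()}
    leftover = total_sample - sum(alloc.values())
    # Hand out remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc

def equal_allocation(stratum_sizes, total_sample):
    """One disproportionate option: the same sample size for every stratum."""
    per_stratum = total_sample // len(stratum_sizes)
    # Never ask for more units than a stratum contains.
    return {s: min(per_stratum, n) for s, n in stratum_sizes.items()}

# Hypothetical population counts per age stratum.
sizes = {"<30": 5000, "30-50": 4000, ">50": 1000}

print(proportionate_allocation(sizes, 300))  # {'<30': 150, '30-50': 120, '>50': 30}
print(equal_allocation(sizes, 300))          # {'<30': 100, '30-50': 100, '>50': 100}
```

With proportionate allocation, the smallest stratum ends up with only 30 participants. Equal allocation raises that to 100 at the cost of no longer mirroring the population.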

After deciding on the proportions, you can use a free online sample size calculator to determine the total sample size based on your estimated population size, chosen confidence level and margin of error, and estimated standard deviation. If your sample is too small, you may not be able to draw reliable statistical conclusions.
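To show how these inputs fit together, here is a sketch of one common textbook formula for estimating a mean, with a finite population correction. Online calculators may use a different formula, and the input values below are purely illustrative.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(population_size, confidence, margin_of_error, std_dev):
    """Total sample size needed to estimate a mean within the margin of error."""
    # z-score for a two-sided confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for a very large population.
    n0 = (z * std_dev / margin_of_error) ** 2
    # Finite population correction for smaller populations.
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)

# Hypothetical inputs: 10,000 people, 95% confidence,
# margin of error of 0.5 points, estimated standard deviation of 5 points.
print(sample_size_for_mean(10_000, 0.95, 0.5, 5))  # 370
```

The result is the total sample size. It still has to be divided over the strata according to the proportions you chose.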

Stratified sampling example: Sample size
You need to make sure your sample of entrepreneurs over 50 years old is large enough to draw statistical conclusions, so you draw a disproportionate sample.

Entrepreneurs over 50 are less strongly represented in the total entrepreneur population, but the sample is still made up of roughly ⅓ entrepreneurs younger than 30, ⅓ entrepreneurs between 30 and 50 years old, and ⅓ entrepreneurs over 50.

Step 4: Draw a random sample from each stratum

For the final step, you use another probability sampling method to draw a sample from each stratum. Popular probability sampling methods for this step are simple random sampling and systematic sampling.

If you do this correctly, the random nature of these methods makes it likely that the sample drawn from each stratum is representative of that stratum.
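Drawing a simple random sample within each stratum could be sketched as follows. The `stratified_sample` helper, the population of 90 people, and the sample size of five per stratum are hypothetical and chosen only to keep the example short.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, sizes, seed=None):
    """Draw a simple random sample of the requested size from each stratum."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for unit in population:
        groups[stratum_of(unit)].append(unit)
    # Sample without replacement; cap the size at the stratum's population.
    return {name: rng.sample(groups[name], k=min(n, len(groups[name])))
            for name, n in sizes.items()}

# Hypothetical population of 90 people, each labeled with an age band.
bands = ["<30", "30-50", ">50"]
population = [{"id": i, "band": bands[i % 3]} for i in range(90)]

sample = stratified_sample(
    population,
    stratum_of=lambda p: p["band"],
    sizes={"<30": 5, "30-50": 5, ">50": 5},
    seed=1,
)
print({name: len(units) for name, units in sample.items()})
# {'<30': 5, '30-50': 5, '>50': 5}
```

Passing a seed makes the draw reproducible, which is useful if you need to document exactly how participants were selected.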

Stratified sampling example: Drawing a random sample from each stratum
You select approximately the same number of participants for each stratum using simple random sampling.

Then, you collect data on the costs and profits of all participants to answer your research question about the relationship between age, gender, and the success of entrepreneurship.

Stratified sampling advantages

A stratified sample has the following advantages:

  • Guarantees a diverse sample

A stratified sample reflects the diversity of your population because it’s guaranteed to contain participants from every subgroup of the population. Other random sampling methods, such as simple random sampling, make this likely for large subgroups but do not guarantee it.

  • Allows similar precision across groups

If you want your estimates for each stratum to have a similar level of precision, the sample size for each subgroup should be similar (e.g., equal numbers of women, men, and people with a different gender identity).

With other sampling methods, you may have many participants from one subgroup but almost no participants from another subgroup.

  • Lowers the variance of your estimates

The total population may be relatively heterogeneous, but subgroups are likely more homogeneous in nature.

Suppose you are investigating how a new teaching method affects the reading test scores of adults learning a new language. There is a good chance that both the test scores before and the test scores after the new method are strongly correlated with the number of hours participants have studied.

A stratified sample would help to get a better picture of the variable you’re interested in because you can group participants by the number of hours they studied. Scores vary less within each subgroup than across the whole population, so estimates based on the stratified sample are more precise.
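A small simulation can make this effect visible. The sketch below uses invented test scores for two equally large groups of learners and compares how much the sample mean fluctuates under simple random sampling and under stratified sampling; all numbers are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical test scores: learners who studied more tend to score higher.
strata = {
    "few hours": [rng.gauss(50, 5) for _ in range(1000)],
    "many hours": [rng.gauss(80, 5) for _ in range(1000)],
}
everyone = strata["few hours"] + strata["many hours"]

def srs_mean(n):
    """Mean of a simple random sample of size n from the whole population."""
    return statistics.fmean(rng.sample(everyone, n))

def stratified_mean(n):
    """Mean of a stratified sample; strata are equal in size, so n/2 from each."""
    per_stratum = n // 2
    values = [x for scores in strata.values() for x in rng.sample(scores, per_stratum)]
    return statistics.fmean(values)

# Repeat each design many times and compare how much the estimates vary.
srs = [srs_mean(20) for _ in range(2000)]
strat = [stratified_mean(20) for _ in range(2000)]

print(statistics.pvariance(strat) < statistics.pvariance(srs))  # True
```

The gain depends on how strongly the stratification variable is related to the outcome. If the groups had similar scores, the two designs would perform about the same.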

  • Allows for different data collection methods

In some cases, you’ll have to use different data collection methods for different subgroups.

For example, if you have limited time and money to conduct your survey research, it might be most practical to survey elderly participants door-to-door but younger participants by email.

Stratified sampling disadvantages

Stratified sampling also comes with some disadvantages:

  • Risk of misclassification

Identifying appropriate strata and accurately classifying the population into these strata can be complex and time-consuming. If the strata are not well defined or are incorrectly identified, the sample may not accurately represent the population, leading to biased results.

Once strata are defined, it may be difficult to adapt the sampling method if new information suggests a different stratification would be more appropriate.

  • Complex and time-consuming

Determining the right proportion of samples for each stratum and ensuring each is adequately represented can be challenging, particularly in large or heterogeneous populations.

In small populations, the benefits of stratified sampling may not outweigh the additional complexity, and other sampling methods might be more efficient.

  • Need for detailed and up-to-date information

Stratified sampling requires detailed information about the population to create appropriate strata, which might not always be available or reliable. In dynamic populations where characteristics change frequently, maintaining up-to-date stratification data can be difficult.

Frequently asked questions about stratified sampling

What’s the difference between stratified and systematic sampling?

Stratified sampling and systematic sampling are both probabilistic sampling methods used to obtain representative samples from a population, but they differ significantly in their approach and execution.

  • Stratified sampling involves dividing the population into distinct subgroups (strata) based on specific characteristics (e.g., age, gender, income level) and then randomly sampling from each stratum. It ensures representation of all subgroups within the population.
  • Systematic sampling involves selecting elements from an ordered population at regular intervals, starting from a randomly chosen point. For example, you have a list of students from a school and you choose every fifth student on the list. This is a useful method when the population is homogeneous or when there is no clear basis for stratification. It’s generally easier to design and carry out than stratified sampling.

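The systematic approach described above can be sketched in a few lines. The student list and the `systematic_sample` helper are hypothetical; the interval of 5 comes from the example.

```python
import random

def systematic_sample(ordered_units, interval, seed=None):
    """Pick every `interval`-th unit, starting from a random position."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start within the first interval
    return ordered_units[start::interval]

# Hypothetical ordered list of 20 students.
students = [f"student_{i:02d}" for i in range(1, 21)]

picked = systematic_sample(students, interval=5, seed=3)
print(len(picked))  # 4
```

Only the starting point is random here. Every later pick follows from it, which is why the order of the list matters for this method.
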
What is disproportionate stratified sampling?

Disproportionate sampling in stratified sampling is a technique where the sample sizes for each stratum are not proportional to their sizes in the overall population.

Instead, the sample size for each stratum is determined based on specific research needs, such as ensuring sufficient representation of small subgroups to draw statistical conclusions.

For example, the population you’re interested in consists of approximately 60% women, 30% men, and 10% people with a different gender identity. With disproportionate sampling, your sample could consist of roughly one third women, one third men, and one third people with a different gender identity. The sample’s distribution does not match the population’s.

What is proportionate stratified sampling?

Proportionate sampling in stratified sampling is a technique where the sample size from each stratum is proportional to the size of that stratum in the overall population.

This ensures that each stratum is represented in the sample in the same proportion as it is in the population, representing the population’s overall structure and diversity in the sample.

For example, the population you’re investigating consists of approximately 60% women, 30% men, and 10% people with a different gender identity. With proportionate sampling, your sample would have a similar distribution instead of equal parts.

Is stratified sampling random?

Yes, stratified sampling is a random sampling method (also known as a probability sampling method). Within each stratum, a random sample is drawn, which ensures that each member of that stratum has an equal chance of being selected. With a disproportionate sample, the chance of being selected can differ between strata.
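The selection chances can be made concrete with a short calculation. The sketch assumes simple random sampling within each stratum and uses invented counts that follow the 60/30/10 split from the examples above.

```python
# Hypothetical stratum sizes and sample sizes (a disproportionate design).
population_sizes = {"women": 600, "men": 300, "other": 100}
sample_sizes = {"women": 50, "men": 50, "other": 50}

# With simple random sampling inside a stratum, every member of that stratum
# has the same selection probability: sample size / stratum size.
inclusion_prob = {s: sample_sizes[s] / population_sizes[s] for s in population_sizes}

print(inclusion_prob["other"])  # 0.5
```

In this design, everyone in the same stratum has the same chance of selection, but members of the smallest stratum are six times more likely to be selected than members of the largest one.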

Julia Merkus, MA

Julia has a bachelor’s degree in Dutch language and culture and two master’s degrees, in Linguistics and in Language and speech pathology. After a few years as an editor, researcher, and teacher, she now writes articles about her specialist topics: grammar, linguistics, methodology, and statistics.