15 Common Mistakes When Studying Epidemiology (And How to Fix Them)
Epidemiology is the science of figuring out what makes people sick and what keeps them healthy, using observational data from populations where controlled experiments are often impossible. Rigorous methodological thinking is the core skill. Here are 15 mistakes that commonly trip up epidemiology students.
Underestimating Confounding
Confounding is the central threat to valid inference in observational studies. Students can define it but consistently fail to identify and control for it when designing studies or interpreting results.
Concluding that coffee drinking causes heart disease because coffee drinkers have higher rates of heart disease, without controlling for smoking — a confounder associated with both coffee consumption and heart disease.
How to fix it
For every exposure-outcome relationship, draw a DAG (directed acyclic graph) showing all plausible causal pathways. Identify variables that are associated with both the exposure and the outcome and are not on the causal pathway. These are confounders that must be controlled through design (matching, restriction) or analysis (stratification, multivariable regression).
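To make the coffee-smoking example concrete, here is a short Python sketch with made-up counts (not real data): within each smoking stratum coffee has no effect, yet the crude analysis that ignores smoking suggests one.

```python
# Hypothetical counts illustrating confounding by smoking in the
# coffee -> heart disease example. Within each smoking stratum the
# risk ratio is 1.0, but the crude (pooled) risk ratio looks elevated
# because smokers both drink more coffee and have more heart disease.

def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk ratio from cohort-style counts."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

# stratum: (coffee cases, coffee total, no-coffee cases, no-coffee total)
strata = {
    "smokers":     (200, 1000, 100, 500),
    "non-smokers": (25,  500,  50,  1000),
}

for name, (a, n1, c, n0) in strata.items():
    print(f"RR among {name}: {risk_ratio(a, n1, c, n0):.2f}")   # 1.00 in both strata

# Collapse the strata and ignore smoking entirely
a  = sum(s[0] for s in strata.values())   # exposed cases
n1 = sum(s[1] for s in strata.values())   # exposed total
c  = sum(s[2] for s in strata.values())   # unexposed cases
n0 = sum(s[3] for s in strata.values())   # unexposed total
print(f"Crude RR ignoring smoking: {risk_ratio(a, n1, c, n0):.2f}")  # 1.50
```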
Confusing Relative Risk and Odds Ratio
Relative risk (risk ratio) and odds ratio measure association differently, apply to different study designs, and are calculated differently. Students who interchange them make fundamental analytical errors.
Reporting an odds ratio from a case-control study as if it were a relative risk, when the odds ratio only approximates the relative risk when the outcome is rare (the rare disease assumption).
How to fix it
Match the measure to the study design: relative risk is calculated from cohort studies (where you can measure incidence). Odds ratio is calculated from case-control studies (where you cannot measure incidence directly). Know the rare disease assumption and when it breaks down.
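A quick Python illustration with hypothetical 2x2 counts shows where the rare disease assumption holds and where it breaks down:

```python
# Hypothetical cohort-style 2x2 counts showing when the odds ratio
# approximates the risk ratio and when it does not.

def measures(a, b, c, d):
    """a=exposed cases, b=exposed non-cases, c=unexposed cases, d=unexposed non-cases."""
    rr = (a / (a + b)) / (c / (c + d))
    or_ = (a * d) / (b * c)
    return rr, or_

for label, counts in [("rare outcome (2% vs 1%)",    (20, 980, 10, 990)),
                      ("common outcome (40% vs 20%)", (400, 600, 200, 800))]:
    rr, or_ = measures(*counts)
    print(f"{label}: RR = {rr:.2f}, OR = {or_:.2f}")
    # rare:   RR = 2.00, OR = 2.02  -> OR is a good approximation
    # common: RR = 2.00, OR = 2.67  -> OR overstates the RR
```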
Not Distinguishing Between Study Designs
Cohort studies, case-control studies, cross-sectional studies, and randomized controlled trials have different strengths, limitations, and appropriate measures of association. Applying the wrong interpretation to a study design produces invalid conclusions.
Interpreting a cross-sectional study showing an association between exercise and lower depression as evidence that exercise prevents depression, when the cross-sectional design cannot establish temporal sequence — perhaps depression causes decreased exercise.
How to fix it
For each study design, know: the direction of inquiry (exposure to outcome or outcome to exposure), the measure of association (RR, OR, prevalence ratio), the ability to establish temporal sequence, and the susceptibility to specific biases. Create a comparison table and reference it when reading studies.
Misinterpreting P-Values
Students treat p-values as the probability that the null hypothesis is true, which is incorrect. A p-value is the probability of obtaining results at least as extreme as observed, assuming the null hypothesis is true.
Saying 'the p-value of 0.03 means there is a 3% chance the null hypothesis is true,' when it actually means 'if the null hypothesis were true, there would be a 3% chance of observing an association this strong or stronger by chance.'
How to fix it
Memorize the correct definition: the p-value is the probability of the observed data (or more extreme) given that the null hypothesis is true. It is NOT the probability that the null is true. Also understand that statistical significance (p < 0.05) does not imply clinical or public health significance.
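If the definition still feels abstract, a small simulation can help. The toy coin-flip sketch below (illustrative only, not a real study) generates data under the null many times and counts how often the result is at least as extreme as the one observed:

```python
# What a p-value is: the chance of data at least as extreme as what was
# observed, *assuming the null hypothesis is true*. Here the "study" observed
# 60 heads in 100 tosses; the null hypothesis is a fair coin.
import random

random.seed(1)
observed = 60
n_sims = 20_000
extreme = 0
for _ in range(n_sims):
    heads = sum(random.random() < 0.5 for _ in range(100))  # simulate under the null
    if abs(heads - 50) >= abs(observed - 50):                # at least as extreme (two-sided)
        extreme += 1

print(f"Simulated p-value: {extreme / n_sims:.3f}")  # roughly 0.06 (exact two-sided p ≈ 0.057)
```

Note that nothing in the simulation says anything about the probability that the coin is fair; it only describes how surprising the data would be if it were.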
Ignoring Selection Bias
Selection bias occurs when the study population is not representative of the target population due to how participants were selected or retained. Students focus on confounding and forget that flawed selection can invalidate results entirely.
Conducting a study on the health effects of a workplace exposure using only current workers (the 'healthy worker effect'), missing that the most severely affected workers have already left the job, biasing results toward showing the exposure is less harmful than it actually is.
How to fix it
At the design stage, ask: who is included and excluded from this study, and could the inclusion/exclusion process be related to both exposure and outcome? Common forms include self-selection bias, loss to follow-up, and the healthy worker effect. Address it in the study design, not just the analysis.
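A toy calculation with assumed numbers shows how the healthy worker effect can attenuate an association:

```python
# Hypothetical illustration of the healthy worker effect. The true risk ratio
# is 2.0, but many severely affected exposed workers have left the job, so a
# study restricted to current workers sees a much weaker association.

# True cohort: 2,000/10,000 exposed workers diseased, 1,000/10,000 unexposed
print("True RR:", (2_000 / 10_000) / (1_000 / 10_000))          # 2.0

# Assume 50% of exposed cases and 10% of unexposed cases have left the workforce
exp_cases   = 2_000 * 0.5          # exposed cases still employed
unexp_cases = 1_000 * 0.9          # unexposed cases still employed
exp_total   = 8_000 + exp_cases    # healthy exposed workers all stayed
unexp_total = 9_000 + unexp_cases  # healthy unexposed workers all stayed

observed_rr = (exp_cases / exp_total) / (unexp_cases / unexp_total)
print(f"RR among current workers only: {observed_rr:.2f}")      # ~1.22
```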
Not Understanding Incidence vs Prevalence
Incidence measures new cases over time; prevalence measures existing cases at a point in time. Confusing them leads to wrong interpretations about disease burden and risk.
Claiming that a disease is becoming more common because its prevalence increased, when the increase could be due to longer survival (people living longer with the disease) rather than more new cases.
How to fix it
Prevalence ≈ Incidence × Duration (in a steady-state population). High prevalence can mean high incidence (many new cases), long duration (people survive longer with the condition), or both. Always specify whether you are discussing incidence or prevalence, and be precise about the time frame.
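A short worked example with assumed, steady-state numbers makes the point:

```python
# Toy numbers showing how prevalence can rise with no change in incidence
# when survival with the disease lengthens (e.g., after better treatment).

incidence = 0.002          # 2 new cases per 1,000 person-years (assumed)
duration_before = 5        # average years lived with the disease
duration_after = 10        # improved treatment doubles survival

print(f"Prevalence before: {incidence * duration_before:.1%}")  # 1.0%
print(f"Prevalence after:  {incidence * duration_after:.1%}")   # 2.0%, same incidence
```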
Not Calculating Measures of Association by Hand
Students rely on software to compute relative risk, odds ratios, and attributable risk without understanding the calculations. This prevents them from interpreting results meaningfully or catching computational errors.
Running a logistic regression and reporting the odds ratio without being able to calculate the crude odds ratio from a 2x2 table (ad/bc) or interpret what an OR of 2.5 means in practical terms.
How to fix it
Practice calculating RR, OR, attributable risk, and NNT from 2x2 tables by hand until automatic. When you can compute these by hand, you understand what they measure. Then use software for complex analyses, but always sanity-check against your manual understanding.
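To make the hand calculations concrete, here is a minimal sketch with made-up counts; the quantities mirror the manual 2x2 arithmetic rather than any statistical package.

```python
# RR, OR, attributable risk, and NNT/NNH from a single 2x2 table.
#                 disease   no disease
#   exposed          a           b
#   unexposed        c           d
a, b, c, d = 30, 70, 10, 90        # hypothetical cohort counts

risk_exposed   = a / (a + b)                         # 0.30
risk_unexposed = c / (c + d)                         # 0.10

rr = risk_exposed / risk_unexposed                   # risk ratio = 3.0
odds_ratio = (a * d) / (b * c)                       # ad/bc ≈ 3.86
attributable_risk = risk_exposed - risk_unexposed    # risk difference = 0.20
nnh = 1 / attributable_risk                          # for a harmful exposure this is the
                                                     # number needed to harm; NNT is the same
                                                     # calculation for a protective exposure

print(f"RR={rr:.2f} OR={odds_ratio:.2f} AR={attributable_risk:.2f} NNH={nnh:.0f}")
```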
Confusing Association with Causation
Observational studies can demonstrate association, but establishing causation requires additional evidence. Students often leap from a statistically significant association to a causal claim without applying causal criteria.
Concluding that a dietary supplement prevents cancer based on a single observational study showing lower cancer rates among supplement users, without considering confounding, reverse causation, or applying Hill's criteria.
How to fix it
Use Bradford Hill's criteria for evaluating causation: strength, consistency, specificity, temporality, biological gradient, plausibility, coherence, experiment, and analogy. No single study establishes causation — it requires a body of evidence evaluated against these criteria.
Not Understanding Information Bias
Information bias (measurement error) occurs when exposure or outcome is measured inaccurately. Differential misclassification can bias results in either direction; non-differential misclassification typically biases toward the null.
Not recognizing recall bias in a case-control study of birth defects and prenatal exposures — mothers of children with birth defects may recall exposures more thoroughly than mothers of healthy children, producing a spurious association.
How to fix it
For every study, evaluate how exposure and outcome were measured. Could the measurement be inaccurate? Could the accuracy differ between groups (differential) or be equally inaccurate in all groups (non-differential)? Use validated measurement instruments and blinding to minimize information bias.
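To see why non-differential misclassification pulls estimates toward the null, here is a small worked example with assumed sensitivity and specificity for the exposure measurement:

```python
# Hypothetical cohort counts showing that non-differential misclassification of
# exposure (the same measurement error in cases and non-cases) pulls the risk
# ratio toward the null.

sens, spec = 0.8, 0.9   # assumed accuracy of the exposure questionnaire

# True counts: exposed 200/1000 diseased, unexposed 100/1000 diseased (true RR = 2.0)
true = {"exp_cases": 200, "exp_noncases": 800, "unexp_cases": 100, "unexp_noncases": 900}

def misclassify(true_exposed, true_unexposed):
    """People classified as exposed = true exposed caught + false positives."""
    as_exposed = sens * true_exposed + (1 - spec) * true_unexposed
    as_unexposed = (1 - sens) * true_exposed + spec * true_unexposed
    return as_exposed, as_unexposed

exp_cases, unexp_cases = misclassify(true["exp_cases"], true["unexp_cases"])
exp_non, unexp_non = misclassify(true["exp_noncases"], true["unexp_noncases"])

observed_rr = (exp_cases / (exp_cases + exp_non)) / (unexp_cases / (unexp_cases + unexp_non))
print(f"True RR: 2.00, observed RR: {observed_rr:.2f}")  # ~1.60, biased toward the null
```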
Ignoring Effect Modification
Effect modification (interaction) occurs when the effect of an exposure differs across levels of a third variable. Students who do not look for it miss important findings about who is most affected by an exposure.
Reporting a single overall relative risk for the association between smoking and lung cancer without checking whether the association is stronger in asbestos workers (synergistic interaction) — a finding with major implications for occupational health policy.
How to fix it
Always stratify your analysis by potential effect modifiers (age, sex, comorbidities) before reporting a single overall measure. If the stratum-specific estimates differ meaningfully, report them separately rather than providing a single misleading summary measure.
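A minimal sketch with hypothetical counts shows why a pooled estimate can mislead when stratum-specific risk ratios differ:

```python
# Hypothetical stratified counts: the smoking -> lung cancer risk ratio differs
# by asbestos exposure, so a single pooled RR would hide the interaction.

def risk_ratio(a, n1, c, n0):
    return (a / n1) / (c / n0)

# (smoker cases, smokers, non-smoker cases, non-smokers) per stratum
strata = {
    "no asbestos": (50, 10_000, 5, 10_000),    # RR = 10
    "asbestos":    (500, 10_000, 10, 10_000),  # RR = 50
}

for name, counts in strata.items():
    print(f"{name}: RR = {risk_ratio(*counts):.0f}")
```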
Not Learning DAGs for Causal Reasoning
Directed acyclic graphs formalize causal assumptions and identify which variables to control for. Students who skip DAGs make ad hoc adjustment decisions that may introduce bias rather than reduce it.
Adjusting for a collider variable (a common effect of both exposure and outcome) in a regression, which opens a biasing pathway and introduces an association that did not exist in the unadjusted analysis.
How to fix it
Learn DAG construction and the rules for identifying confounders, mediators, and colliders. A confounder should be adjusted for, a mediator should not be adjusted for (if you want the total effect), and a collider must never be adjusted for. DAGs prevent these errors systematically.
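Collider bias is easy to demonstrate by simulation. In the sketch below (illustrative only, with arbitrary probabilities), exposure and outcome are generated independently, yet restricting the analysis to the collider stratum manufactures an association:

```python
# Simulation of collider bias: exposure and outcome are independent by
# construction, but both increase the chance of hospitalization. Restricting
# to hospitalized people (conditioning on the collider) induces a spurious
# inverse association (Berkson's bias).
import random

random.seed(42)
n = 100_000
hospitalized = []
for _ in range(n):
    exposure = random.random() < 0.3
    outcome = random.random() < 0.2                   # independent of exposure
    p_hosp = 0.05 + 0.4 * exposure + 0.4 * outcome    # collider: caused by both
    if random.random() < p_hosp:
        hospitalized.append((exposure, outcome))

# Odds ratio for exposure-outcome within the collider stratum
a = sum(e and o for e, o in hospitalized)
b = sum(e and not o for e, o in hospitalized)
c = sum((not e) and o for e, o in hospitalized)
d = sum((not e) and not o for e, o in hospitalized)
print(f"OR within hospitalized: {(a * d) / (b * c):.2f}")  # well below 1.0 despite no true effect
```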
Not Reading Classic Epidemiological Studies
The methodology of epidemiology is best understood through its landmark studies. Students who learn methods abstractly without seeing them applied to famous investigations miss the context that makes the methods memorable.
Learning about cohort study design without reading about the Framingham Heart Study, or studying outbreak investigation without examining John Snow's cholera investigation — the studies that defined these methods.
How to fix it
Read summaries of landmark studies: John Snow (cholera), Doll and Hill (smoking and lung cancer), Framingham Heart Study (cardiovascular risk factors), and the Nurses' Health Study. For each, identify the design, measures, biases, and lasting contributions to methodology.
Ignoring Confidence Intervals in Favor of P-Values
Confidence intervals provide more information than p-values: they show the range of plausible effect sizes, not just whether the result is 'significant.' Students who focus only on p < 0.05 miss the magnitude and precision of the association.
Reporting that an odds ratio is 'statistically significant (p = 0.04)' without noting that the 95% confidence interval is 1.01 to 8.5 — an extremely wide interval that includes both trivially small and very large effects, indicating imprecise estimation.
How to fix it
Always report and interpret confidence intervals alongside (or instead of) p-values. A narrow CI around a clinically meaningful effect is more informative than a p-value. Ask: is the range of plausible effects all clinically important, all trivial, or mixed?
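For reference, a Wald-type confidence interval for an odds ratio can be computed directly from the 2x2 counts; the sketch below uses hypothetical numbers chosen to give a "significant" but uninformatively wide interval:

```python
# Wald-type 95% CI for an odds ratio from a 2x2 table.
import math

a, b, c, d = 12, 88, 4, 96                      # assumed case-control counts
or_ = (a * d) / (b * c)
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)    # standard error of log(OR)
lo = math.exp(math.log(or_) - 1.96 * se_log_or)
hi = math.exp(math.log(or_) + 1.96 * se_log_or)

print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")  # ≈ 3.27 (1.02 to 10.5)
```

The interval excludes 1, so the result is "significant", yet the plausible effects range from trivial to very large.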
Not Considering External Validity
A study may be internally valid (correct within its study population) but not externally valid (generalizable to other populations). Students focus on internal validity and forget to evaluate whether findings apply beyond the study sample.
Applying the results of a clinical trial conducted exclusively in young, healthy male volunteers to elderly women with comorbidities, without questioning whether the effect would be the same in this different population.
How to fix it
After evaluating internal validity, always ask: to whom do these results apply? Consider differences in demographics, geography, healthcare systems, and time period between the study population and the population you are interested in. State the limits of generalizability explicitly.
Treating Screening Test Characteristics as Fixed
Sensitivity and specificity are properties of the test, but predictive values depend on prevalence. Students who do not understand this calculate predictive values incorrectly or apply them to the wrong population.
Assuming that a test with 99% sensitivity and 99% specificity has a 99% positive predictive value, when in a population with 1% disease prevalence, the PPV is only about 50% — half of positive results are false positives.
How to fix it
Always calculate predictive values using Bayes' theorem or a 2x2 table with the correct prevalence for your population. Remember: in low-prevalence populations, even highly specific tests produce many false positives relative to true positives.
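Here is the calculation from the example above written out, assuming the stated sensitivity, specificity, and prevalence:

```python
# PPV and NPV from sensitivity, specificity, and prevalence: the 99%/99% test
# applied to a population with 1% disease prevalence.

def predictive_values(sensitivity, specificity, prevalence):
    true_pos  = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg  = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

ppv, npv = predictive_values(0.99, 0.99, 0.01)
print(f"PPV = {ppv:.1%}, NPV = {npv:.3%}")       # PPV = 50.0%, NPV ≈ 99.990%

# The same test in a hypothetical high-prevalence (20%) referral clinic
print(f"PPV = {predictive_values(0.99, 0.99, 0.20)[0]:.1%}")  # ≈ 96.1%
```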
Quick Self-Check
- Can I calculate relative risk, odds ratio, and attributable risk from a 2x2 table by hand?
- Can I draw a DAG for an exposure-outcome relationship and identify which variables are confounders, mediators, and colliders?
- Can I explain why a cross-sectional study cannot establish causation?
- Can I correctly define a p-value without saying 'the probability the null hypothesis is true'?
- Can I calculate the positive predictive value of a screening test given its sensitivity, specificity, and the population prevalence?
Pro Tips
- ✓ Draw a DAG before analyzing any dataset. It forces you to make your causal assumptions explicit and prevents common adjustment errors like conditioning on a collider.
- ✓ Practice 2x2 table calculations with real-world numbers until they are automatic. The ability to compute and interpret RR, OR, and attributable risk from a table is the most fundamental quantitative skill in epidemiology.
- ✓ Study confounding through the smoking-coffee-heart disease example until you can explain it to a non-epidemiologist. If you can teach confounding clearly, you understand it.
- ✓ Read one published epidemiological study per week and critically evaluate its design, potential biases, and whether the conclusions are supported by the study design. This builds the critical appraisal skills that define a good epidemiologist.
- ✓ Use the STrengthening the Reporting of Observational Studies in Epidemiology (STROBE) checklist when reading or designing observational studies — it ensures you consider all methodological elements.