Research Notes & Learning

Insights

Short-form research notes, methods explanations, and learning posts on machine learning, statistics, epidemiology, climate health, and public health.

Machine Learning 📅 10 April 2026 ⏱ 6 min
Threshold Selection in Outbreak Prediction Models
Outbreak prediction models generate probability outputs representing risk of disease occurrence. These probabilities cannot be directly used for decision making — they must be converted into binary outcomes using a threshold, which strongly affects how early we detect outbreaks and how many false alarms we generate.

What is a Threshold?

A classification threshold is the probability cutoff at or above which a model predicts "outbreak" and below which it predicts "no outbreak." The default threshold of 0.5 implicitly assumes roughly balanced classes and equal error costs, a poor assumption for rare disease outbreaks, where outbreak periods are far less common than non-outbreak periods.

Why Threshold Matters More Than Model Accuracy

In outbreak surveillance, the cost of missing a true outbreak (false negative) is usually far higher than the cost of a false alarm (false positive). A model with 95% accuracy can still fail catastrophically if the threshold is wrong — triggering no alerts during the early phase of an epidemic when probabilities hover around 0.3.
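A small worked example may make the accuracy trap concrete. The sketch below uses invented surveillance weeks and invented model probabilities (both are assumptions for illustration, not data from any real system) to show how a 0.5 cutoff can score 95% accuracy while detecting no outbreak weeks at all.

```python
# Hypothetical data: 95 non-outbreak weeks (0) and 5 early-outbreak weeks (1).
y_true = [0] * 95 + [1] * 5
# Assumed model outputs: early-outbreak probabilities hover around 0.3.
y_prob = [0.05] * 95 + [0.30, 0.32, 0.28, 0.35, 0.31]


def evaluate(y_true, y_prob, threshold):
    """Return (accuracy, recall) when alerting at probability >= threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    return correct / len(y_true), true_pos / sum(y_true)


acc_default, recall_default = evaluate(y_true, y_prob, 0.5)
acc_low, recall_low = evaluate(y_true, y_prob, 0.25)
print(f"threshold 0.50: accuracy={acc_default:.2f}, recall={recall_default:.2f}")
print(f"threshold 0.25: accuracy={acc_low:.2f}, recall={recall_low:.2f}")
```

In this toy setting the default cutoff never raises an alert, so its high accuracy comes entirely from the non-outbreak weeks.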

Threshold Strategy | Effect | Best Used When
Precision-Recall optimised | Balances detection vs precision | Imbalanced outbreak data
Cost-sensitive | Minimises real-world loss function | Miss cost >> false alarm cost
Seasonal adaptive | Changes threshold by season | Climate-sensitive diseases (dengue, cholera)
Spatial adaptive | Region-specific cutoff | Multi-region surveillance systems
Percentile-based | Flags top-risk observations | Early warning dashboards
Default 0.5 | Assumes balanced classes | Balanced datasets only, NOT outbreaks

Practical Recommendation

For outbreak prediction: (1) always use the Precision-Recall curve rather than ROC for threshold selection; (2) compute the F-beta score with β > 1 to weight recall higher than precision; (3) validate the selected threshold on held-out surveillance data from at least two epidemic seasons before deployment. The threshold is not a model parameter — it is a policy decision that should involve epidemiologists alongside data scientists.
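Recommendation (2) can be sketched in a few lines. The example below scans every observed probability as a candidate cutoff and keeps the one with the highest F-beta score. The function names `f_beta` and `best_threshold` and the ten data points are hypothetical, chosen only to show how β = 2 favours a lower, more sensitive threshold than β = 1.

```python
def f_beta(precision, recall, beta):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_threshold(y_true, y_prob, beta=2.0):
    """Return (threshold, score) maximising F-beta over observed probabilities."""
    best = (None, -1.0)
    for threshold in sorted(set(y_prob)):
        y_pred = [1 if p >= threshold else 0 for p in y_prob]
        tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
        fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        score = f_beta(precision, recall, beta)
        if score > best[1]:
            best = (threshold, score)
    return best


# Illustrative validation data (assumed, not from a real surveillance system).
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.12, 0.20, 0.22, 0.35, 0.30, 0.40, 0.55, 0.80]

threshold_f2, score_f2 = best_threshold(y_true, y_prob, beta=2.0)
threshold_f1, score_f1 = best_threshold(y_true, y_prob, beta=1.0)
print(f"F2-optimal threshold: {threshold_f2} (score {score_f2:.3f})")
print(f"F1-optimal threshold: {threshold_f1} (score {score_f1:.3f})")
```

On this toy data the F2 criterion selects 0.30 while F1 selects 0.55, which is the recall-versus-precision trade the recommendation describes. In practice the scan should run on held-out data, as recommendation (3) says.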

Public Health 📅 6 April 2026 ⏱ 5 min
Understanding GATHER and Making Health Estimates Transparent
GATHER (Guidelines for Accurate and Transparent Health Estimates Reporting), introduced in 2016, provides a structured 18-item reporting standard ensuring health estimates are reproducible, clear, and accompanied by uncertainty quantification.

Why Transparent Reporting Matters

Health estimates — national disease burden, mortality rates, future projections — directly drive policy decisions worth billions of dollars and millions of lives. Without a reporting standard, results may be unverifiable, selectively presented, or impossible to reproduce.

The 18-Item GATHER Checklist (Summary)

Domain | Items | Purpose
Study Definition | Population, outcomes, time period, geography | Defines the scope of the estimate
Data Inputs | Sources, access method, inclusion/exclusion criteria | Ensures traceability and replicability
Data Processing | Cleaning, adjustment, availability | Enables independent verification
Statistical Model | Framework, methods, assumptions, justification | Explains how estimates were produced
Validation & Results | Model checking, point estimates, uncertainty intervals | Assesses reliability and communicates uncertainty
Transparency | Code sharing, data sharing | Supports full reproducibility

The Uncertainty Imperative

GATHER specifically requires that uncertainty intervals accompany every estimate. Reporting "prevalence = 12.5%" is scientifically incomplete. The correct form is: 12.5% (95% UI: 10.2%–14.8%). The interval communicates how much confidence we can place in the point estimate, which is essential for policy-makers weighing intervention costs against disease burden.
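As one possible illustration of attaching an interval to a point estimate, the sketch below uses a percentile bootstrap on invented survey data. This is a swapped-in, simple technique: GATHER does not prescribe a method, and real health estimates often derive uncertainty intervals from model draws instead. The function name and the sample are assumptions.

```python
import random


def bootstrap_prevalence_interval(outcomes, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for a proportion of 0/1 outcomes."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(outcomes)
    estimates = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower_idx = int((1 - level) / 2 * n_resamples)
    upper_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lower_idx], estimates[upper_idx]


# Hypothetical survey: 25 positives among 200 respondents (prevalence 12.5%).
outcomes = [1] * 25 + [0] * 175
point = sum(outcomes) / len(outcomes)
lower, upper = bootstrap_prevalence_interval(outcomes)
print(f"{point:.1%} (95% interval: {lower:.1%} to {upper:.1%})")
```

The printed line follows the reporting form described above: a point estimate is never shown without its interval.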

Key Takeaway: GATHER is not a statistical method; it is a scientific communication standard. Following it does not change your analysis; it makes your analysis trustworthy to others.
Machine Learning 📅 6 April 2026 ⏱ 4 min
Normalization vs. Standardization in Data Preprocessing
Features with different units and scales distort distance-based and gradient-based models. Normalization and standardization are the two primary scaling techniques — but they work differently and suit different scenarios. Choosing the wrong one quietly degrades model performance.

Normalization (Min-Max Scaling)

Rescales all values to fall in [0, 1]: X_norm = (X − X_min) / (X_max − X_min). Preserves the shape of the original distribution. Best for: k-NN, K-Means, neural networks requiring bounded inputs. Fatal weakness: a single outlier can compress all other values into a tiny range.
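The outlier weakness is easy to see numerically. The following minimal sketch implements the min-max formula above in plain Python (the helper name `min_max_scale` and the sample values are illustrative assumptions) and compares a clean feature with the same feature plus one extreme value.

```python
def min_max_scale(values):
    """Apply X_norm = (X - X_min) / (X_max - X_min) to a list of numbers."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


clean = [10, 20, 30, 40, 50]
with_outlier = clean + [1000]

scaled_clean = min_max_scale(clean)
scaled_outlier = min_max_scale(with_outlier)
print(scaled_clean)
print([round(v, 3) for v in scaled_outlier])
```

With the outlier present, the five original values are squeezed into roughly the bottom 4% of the [0, 1] range, leaving almost no contrast between them.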

Standardization (Z-score Scaling)

Transforms to mean = 0, std = 1: X_std = (X − μ) / σ. Does not bound the range and is less sensitive to outliers than min-max scaling, although extreme values still shift μ and inflate σ. Best for: PCA, SVM, logistic regression, linear discriminant analysis, and any other method that assumes or benefits from Gaussian-like feature distributions.

Property | Normalization | Standardization
Output range | [0, 1] | Unbounded (typically −3 to +3)
Outlier sensitivity | Very high | Moderate
Preserves shape | Yes | Yes
Best algorithms | k-NN, K-Means, Neural Nets | PCA, SVM, Logistic Reg, LDA
Use when Gaussian? | Not required | Preferred

The Golden Rule

Always fit the scaler on training data only. Apply the fitted scaler (same μ, σ or min, max) to test and validation data. Fitting on test data causes data leakage — your model will appear better than it actually is on new data.
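A minimal sketch of the rule, assuming a single numeric feature: the hypothetical `TrainOnlyScaler` class below learns μ and σ from the training split and reuses those same values on the test split. It is written in plain Python for illustration and is not the API of any particular library.

```python
from statistics import mean, pstdev


class TrainOnlyScaler:
    """Z-score scaler whose parameters come from training data only."""

    def fit(self, train_values):
        self.mean_ = mean(train_values)
        self.std_ = pstdev(train_values)  # population standard deviation
        return self

    def transform(self, values):
        if self.std_ == 0:
            # Constant training feature: centre only, avoid division by zero.
            return [v - self.mean_ for v in values]
        return [(v - self.mean_) / self.std_ for v in values]


train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

scaler = TrainOnlyScaler().fit(train)   # parameters learned from train only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)    # same mean and std reused on test
print(train_scaled)
print(test_scaled)
```

The test values are deliberately not re-centred on their own mean; calling `fit` on `test` would let information from the evaluation data leak into preprocessing.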

Quick Decision Rule: Using tree-based models (Random Forest, XGBoost)? → No scaling needed. Distance-based? → Normalize. Statistics/regression/PCA? → Standardize. Neural networks? → Either, but normalize for image pixels.
Epidemiology 📅 18 March 2026 ⏱ 7 min
Fundamentals of Epidemiologic Research Design
Epidemiology is the study of how disease is distributed in populations and what determines that distribution. Choosing the right study design — observational or experimental — is the most consequential methodological decision a researcher makes, because different designs answer fundamentally different causal questions.

Exposure, Outcome, and Population

Every epidemiologic study investigates the relationship between an exposure (risk factor, treatment, environment) and an outcome (disease, death, recovery) in a defined population. The fundamental measures are incidence (new cases per person-time), prevalence (existing cases at a point in time), and risk (probability of developing disease).
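The three measures differ mainly in their denominators, which a short calculation can show. The numbers below are invented for illustration and the function names are assumptions.

```python
def incidence_rate(new_cases, person_time):
    """New cases per unit of person-time at risk."""
    return new_cases / person_time


def prevalence(existing_cases, population):
    """Proportion of the population with the disease at one point in time."""
    return existing_cases / population


def cumulative_risk(new_cases, population_at_risk):
    """Probability of developing disease over the follow-up period."""
    return new_cases / population_at_risk


# Hypothetical cohort: 4,000 disease-free people followed for 12,000 person-years.
rate = incidence_rate(new_cases=30, person_time=12_000)
risk = cumulative_risk(new_cases=30, population_at_risk=4_000)
# Hypothetical survey: 150 existing cases in a town of 10,000.
prev = prevalence(existing_cases=150, population=10_000)

print(f"incidence rate: {rate * 1000:.1f} per 1,000 person-years")
print(f"risk over follow-up: {risk:.2%}")
print(f"point prevalence: {prev:.1%}")
```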

Observational Study Designs

Design | Direction | Measure | Best for | Key Limitation
Cohort (prospective) | Exposure → Outcome | Risk Ratio, Rate Ratio | Rare exposures, incidence estimation | Expensive, long follow-up
Case-Control | Outcome → Exposure | Odds Ratio | Rare outcomes, quick & cheap | Recall bias, selection bias
Cross-sectional | Simultaneous | Prevalence Ratio | Prevalence, hypothesis generation | Cannot establish temporality
Ecological | Group-level | Correlation | Policy-level analysis | Ecological fallacy

Randomised Controlled Trial (RCT) — The Gold Standard

Random allocation of participants to treatment and control arms balances both known and unknown confounders across groups on average, removing the systematic confounding that biases observational studies. However, RCTs are expensive, sometimes unethical (one cannot randomise people to smoke), and often have limited external validity because trial populations differ from real-world patients.

Measures of Association

Risk Ratio (RR): RR = Risk_exposed / Risk_unexposed. RR = 1 → no association; RR > 1 → positive association; RR < 1 → protective. Odds Ratio (OR): OR = (a/b) / (c/d) in a 2×2 table. When outcome is rare, OR ≈ RR. Attributable Risk (AR): Risk_exposed − Risk_unexposed → how much disease is attributable to the exposure in absolute terms.
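These three measures can be computed directly from a 2×2 table. The sketch below assumes the common layout a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease; the counts are invented, and the second table illustrates the rare-outcome case where OR ≈ RR.

```python
def measures_of_association(a, b, c, d):
    """Return (RR, OR, AR) for a 2x2 table.

    a: exposed, diseased      b: exposed, not diseased
    c: unexposed, diseased    d: unexposed, not diseased
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    odds_ratio = (a / b) / (c / d)
    ar = risk_exposed - risk_unexposed
    return rr, odds_ratio, ar


# Common outcome (hypothetical counts): OR overstates RR.
rr_common, or_common, ar_common = measures_of_association(30, 70, 10, 90)
# Rare outcome (hypothetical counts): OR is close to RR.
rr_rare, or_rare, ar_rare = measures_of_association(3, 997, 1, 999)

print(f"common outcome: RR={rr_common:.2f}, OR={or_common:.2f}, AR={ar_common:.2f}")
print(f"rare outcome:   RR={rr_rare:.2f}, OR={or_rare:.2f}, AR={ar_rare:.4f}")
```

In the first table the odds ratio is noticeably larger than the risk ratio, which is why the two should not be read interchangeably when the outcome is common.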

Hierarchy of Evidence: Systematic review/meta-analysis → RCT → Prospective cohort → Case-control → Cross-sectional → Case report. Designs earlier in this sequence generally provide stronger causal evidence, but observational studies remain essential for questions where trials are impossible or unethical.
Climate Health 📅 5 March 2026 ⏱ 6 min
Climate Change and Its Effects on Human Health
The World Health Organization estimates that climate change will cause approximately 250,000 additional deaths per year between 2030 and 2050 from malnutrition, malaria, diarrhoea, and heat stress alone. The pathways are multiple and intersecting: temperature, extreme events, vector ecology, food systems, and mental health.

Pathways from Climate to Health

Climate change affects health through direct pathways (heat stress, extreme weather injuries, UV exposure) and indirect pathways (vector-borne diseases, food insecurity, water contamination, displacement, mental health). The indirect pathways are more complex and harder to model, but often cause greater total morbidity.

Climate Driver | Health Outcome | Vulnerable Population
Rising temperatures | Heat stroke, cardiovascular stress, preterm birth | Elderly, outdoor workers, pregnant women
Vector habitat expansion | Dengue, malaria, chikungunya shifting northward | Previously non-endemic regions
Flooding & storms | Diarrhoeal disease, leptospirosis, trauma, displacement | Coastal and low-lying communities
Drought & crop failure | Malnutrition, stunting, child mortality | Subsistence farmers in South Asia, Africa
Air quality degradation | COPD, asthma exacerbation, lung cancer | Urban poor, children

Bangladesh as a Case Study

Bangladesh is ranked among the most climate-vulnerable countries. Cyclone Amphan (2020) displaced 2.4 million people. Annual flooding submerges 20–25% of the country, contaminating tube wells and triggering diarrhoeal outbreaks. Average temperatures have risen 0.5°C since 1960, extending the dengue transmission season by approximately 3 weeks per decade. These are not future projections — they are present realities demanding urgent public health action.

Climate-Health Research Methods

Key analytical tools include: distributed lag non-linear models (DLNM) for modelling temperature-mortality relationships; time-series analysis linking weather patterns to disease surveillance data; spatial epidemiology mapping disease burden changes with climate projections; and scenario modelling using IPCC pathways (SSP2-4.5, SSP5-8.5) to project future health impacts.

Climate Health 📅 20 February 2026 ⏱ 5 min
One Health: Bridging Human, Animal, and Ecosystem Health
Approximately 75% of emerging infectious diseases are zoonotic — originating in animals before jumping to humans. One Health is the integrated approach that unites human medicine, veterinary science, and environmental health to prevent pandemics at their source rather than respond after spillover.

What is One Health?

The One Health concept, formally endorsed by the WHO, FAO, UNEP, and WOAH (the "Quadripartite"), recognises that human health cannot be protected in isolation from animal health and healthy ecosystems. The three domains are not parallel — they overlap, interact, and mutually determine each other. Deforestation creates human-wildlife interfaces; factory farming breeds antimicrobial resistance; wetland destruction eliminates natural barriers to vector proliferation.

Classic One Health Examples

Disease | Animal Reservoir | Ecosystem Driver | Human Impact
COVID-19 | Bats (likely) | Wildlife trade, urbanisation | Global pandemic, millions of deaths
Nipah virus | Fruit bats | Deforestation, mango farming | Outbreaks in Bangladesh annually
Avian influenza H5N1 | Wild birds, poultry | Live bird markets, migration routes | High CFR (>60%) in humans
Antimicrobial resistance | Livestock (all species) | Agriculture overuse of antibiotics | 700,000 deaths/year; projected 10M by 2050

One Health in Bangladesh

Bangladesh has experienced multiple Nipah virus outbreaks traced to raw date palm sap contaminated by bat urine — a direct human-animal-environment interface. The response required simultaneous action in public health (case detection), veterinary surveillance (bat monitoring), and environmental management (sap collection practices). This is One Health in practice: no single sector could have solved it alone.

Research Implication: One Health research requires interdisciplinary teams: epidemiologists, veterinarians, ecologists, anthropologists, and data scientists. Methods include zoonotic disease modelling, genomic surveillance of pathogens across species, and network analysis of human-animal interfaces.
Explainable AI 📅 10 February 2026 ⏱ 7 min
Explainable AI: Making Black-Box Models Interpretable in Healthcare
Deep learning models can match or exceed specialist-level accuracy on some medical imaging tasks, yet clinicians cannot use them safely without understanding why a prediction was made. Explainable AI (XAI) bridges this trust gap by providing model-agnostic or model-specific explanations that connect predictions to clinical reasoning.

The Black-Box Problem in Clinical AI

A neural network predicting sepsis risk from 48 hours of ICU vital signs might achieve AUROC = 0.91 — but if the model triggers an alert, the clinician needs to know: which features drove this prediction? Was it lactate, temperature, or a subtle pattern across multiple vitals? Without this, clinicians cannot verify clinical plausibility, catch spurious correlations, or take targeted action.

Key XAI Methods

Method | Type | How it Works | Best For
SHAP (SHapley Additive exPlanations) | Model-agnostic | Attributes prediction to each feature using game-theoretic Shapley values | Tabular data, global + local explanations
LIME | Model-agnostic | Fits a local linear model in the neighbourhood of each prediction | Any model, quick local explanations
Grad-CAM | CNN-specific | Highlights image regions driving the classification using gradient flow | Medical imaging (X-ray, MRI, pathology)
Attention Weights | Transformer-specific | Visualises which tokens/timepoints the model attends to | NLP clinical notes, time-series EHR
Integrated Gradients | Deep learning | Attributes prediction to inputs by integrating gradients from baseline to input | Genomics, EHR, images
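To make the Shapley idea behind SHAP less abstract, here is a brute-force sketch that computes exact Shapley values for a three-feature toy model by averaging each feature's marginal contribution over all feature orderings. It is not how the SHAP library works internally (which relies on faster approximations and a background dataset). The model `toy_risk_score`, the single all-zeros baseline, and the input values are all assumptions for illustration.

```python
from itertools import permutations


def toy_risk_score(x):
    """Hypothetical model: two main effects plus an x0*x2 interaction."""
    return 0.5 * x[0] + 0.3 * x[1] + 0.2 * x[0] * x[2]


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every feature ordering."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)      # start from the reference input
        previous = predict(current)
        for i in order:
            current[i] = instance[i]  # switch feature i to its real value
            updated = predict(current)
            totals[i] += updated - previous
            previous = updated
    return [t / len(orderings) for t in totals]


instance = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk_score, instance, baseline)
print([round(v, 3) for v in phi])
print(round(sum(phi), 3), toy_risk_score(instance) - toy_risk_score(baseline))
```

The attributions sum to the difference between the prediction and the baseline prediction, and the interaction term is split evenly between the two features involved. Enumeration grows factorially with the number of features, so this approach is only practical for tiny examples.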

SHAP in Practice: Dengue Severity Prediction

In a study predicting severe dengue from clinical features at admission, SHAP values revealed that platelet count and haematocrit rise were the dominant predictors — consistent with clinical knowledge of dengue pathophysiology. Crucially, SHAP also flagged that the model was partly using "hospital ID" as a proxy feature — a spurious correlation that would fail catastrophically at a new site. XAI caught what accuracy metrics could not.

Regulatory and Ethical Dimensions

The EU AI Act (2024) treats most clinical AI systems as high-risk and imposes transparency and technical documentation requirements on them. The FDA's 2021 AI/ML-based Software as a Medical Device Action Plan similarly emphasises transparency. Explainability is increasingly a regulatory expectation as well as an ethical requirement for clinical deployment.

Mental Health 📅 15 January 2026 ⏱ 5 min
Mental Health in the Era of Climate Anxiety and Ecological Grief
A 2021 global survey of 10,000 young people found that 59% were very or extremely worried about climate change, and 45% said their feelings about it negatively affected their daily functioning. Climate distress is emerging as a significant mental health burden requiring recognition, frameworks, and clinical response.

The Emerging Taxonomy of Climate-Related Distress

Eco-anxiety: Chronic fear of environmental doom and catastrophe.
Solastalgia: Grief arising from the degradation of one's home environment (coined by philosopher Glenn Albrecht).
Ecological grief: Mourning the loss of species, ecosystems, and places.
Climate trauma: PTSD-like symptoms following direct exposure to extreme weather events such as cyclones, floods, or wildfires.
These are not disorders of irrational thinking; they are rational emotional responses to real threats.

Epidemiology of Climate Mental Health

Post-disaster mental health studies consistently show elevated rates of depression, PTSD, and anxiety following climate events. After Cyclone Sidr (2007) in Bangladesh, researchers found PTSD prevalence of 31% among directly affected individuals one year post-disaster. After the 2022 Pakistan floods — the worst in its history — mental health services were overwhelmed simultaneously with physical trauma care.

Differential Vulnerability

Climate mental health impacts are not equally distributed. Young people (who face the longest future of climate impacts) show the highest eco-anxiety rates. Farmers and fisherfolk in climate-vulnerable livelihoods experience the highest rates of chronic stress and depression. Indigenous communities suffer unique solastalgia from the loss of culturally significant landscapes. Women in disaster-affected areas face compounded risks due to caregiving burdens and reduced autonomy.

Research and Clinical Responses

The field needs validated screening tools (the Climate Change Worry Scale, the Climate Distress Scale), longitudinal cohort studies linking climate exposures to mental health trajectories, and climate-informed psychotherapy adaptations. Crucially, addressing eco-anxiety is not merely therapeutic — it is also a driver of climate action. Channelling distress into meaningful engagement reduces paralysis and builds resilience.

Health Policy 📅 5 January 2026 ⏱ 6 min
Health Inequality: Understanding the Social Determinants of Disease
The social determinants of health — income, education, housing, employment, neighbourhood environment — explain more of the variation in population health outcomes than medical care does. Addressing health inequality requires upstream policy action, not just more clinics.

What are Social Determinants of Health (SDH)?

The WHO Commission on Social Determinants of Health (Marmot Commission) defines SDH as "the conditions in which people are born, grow, live, work and age." These structural conditions — income distribution, educational access, housing quality, food security, occupational safety, social protection — determine health long before a person ever enters a clinic.

The Social Gradient of Health

Health outcomes follow a near-continuous social gradient: with each step down the socioeconomic ladder, health worsens. This is not simply a binary "poor vs rich" effect — managers have worse health than executives; clerks have worse health than managers. The Whitehall Studies of British civil servants demonstrated this gradient across an employed, non-destitute population — ruling out absolute deprivation as the sole explanation.

Measurement in Bangladesh

SDH Domain | Metric | Bangladesh Inequality Gap
Income | Under-5 mortality | Poorest quintile: 65/1000; Richest: 20/1000
Education | Stunting prevalence | No education mothers: 47%; Higher education: 21%
Geography | Skilled birth attendance | Urban: 74%; Rural: 41%
Gender | Anaemia | Women: 36%; Men: 15%

Policy Implications: Upstream vs Downstream

Downstream interventions (treating sick people) are necessary but insufficient. Upstream interventions — progressive taxation, housing standards, universal education, social protection — produce the largest and most equitable health gains per taka invested. The economic case is strong: the Marmot Review estimated that health inequalities cost England £31–33 billion annually in lost productivity. Reducing inequality is not just a moral imperative; it is economically rational.

Public Health 📅 20 December 2025 ⏱ 6 min
The Concept of Spillover Effects in Health Policy Evaluation
Standard RCT analysis assumes that each individual's outcomes depend only on their own treatment status (SUTVA, the Stable Unit Treatment Value Assumption). In public health, this assumption is routinely violated: vaccine coverage protects the unvaccinated; deworming some children improves outcomes for their untreated classmates; mental health programs change household dynamics. These spillovers are the rule, not the exception.

What is SUTVA and Why Does it Fail?

The Stable Unit Treatment Value Assumption (SUTVA) requires that (1) there is only one version of each treatment and (2) potential outcomes for any unit are unaffected by the treatment of other units. In infectious disease control, (2) is obviously false: vaccinating your neighbour reduces your infection risk. In nutrition programs, treating a sibling changes household food allocation. In mental health interventions, treating a depressed parent changes outcomes for untreated children.

Types of Spillover Effects

Type | Mechanism | Direction | Public Health Example
Herd immunity | Reduced pathogen circulation | Positive | Vaccine coverage protects unvaccinated
Behavioural spillover | Social norms change | Positive/Negative | Hand-washing programs change untreated neighbours
Resource reallocation | Household budget effects | Positive/Negative | Cash transfer to mother improves sibling nutrition
General equilibrium | Market/labour market changes | Often positive | Deworming increases wages for untreated workers
Congestion | Overcrowded services | Negative | Large vaccine campaign overwhelms clinics

Methods for Estimating Spillovers

Two-stage randomisation: Randomise the proportion treated within clusters (villages, schools), then randomise individuals within clusters. Compare outcomes for untreated individuals in high-coverage vs low-coverage clusters. Network-based approaches: Model spillovers through social networks, estimating how treatment of network-connected individuals affects outcomes. Geographic regression discontinuity: Compare outcomes at boundaries of treated and untreated areas.
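The two-stage design can be sketched as an assignment procedure. In the example below, stage one randomly allocates each cluster to a target coverage level and stage two randomly picks which individuals within the cluster are treated. The function name, the two coverage levels (25% and 75%), and the village rosters are hypothetical choices for illustration.

```python
import random


def two_stage_randomise(clusters, coverages=(0.25, 0.75), seed=0):
    """Assign coverage levels to clusters, then treatment to individuals.

    clusters: dict mapping cluster id -> list of individual ids.
    Returns a dict: cluster id -> {"coverage", "treated", "untreated"}.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible allocation
    cluster_ids = sorted(clusters)
    rng.shuffle(cluster_ids)   # stage 1: random order decides coverage level
    assignment = {}
    for position, cluster_id in enumerate(cluster_ids):
        coverage = coverages[position % len(coverages)]
        members = list(clusters[cluster_id])
        n_treated = round(coverage * len(members))
        treated = set(rng.sample(members, n_treated))  # stage 2: individuals
        assignment[cluster_id] = {
            "coverage": coverage,
            "treated": treated,
            "untreated": set(members) - treated,
        }
    return assignment


# Hypothetical study: four villages with eight residents each.
villages = {f"village_{v}": [f"v{v}_p{i}" for i in range(8)] for v in range(4)}
assignment = two_stage_randomise(villages)
for cluster_id in sorted(assignment):
    arm = assignment[cluster_id]
    print(cluster_id, arm["coverage"], len(arm["treated"]), len(arm["untreated"]))
```

A spillover estimate would then compare outcomes among the `untreated` members of high-coverage clusters with those of `untreated` members of low-coverage clusters, since neither group received the intervention directly.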

The Kenya Deworming Controversy

The famous Kenya Primary School Deworming Project (Miguel & Kremer, 2004) showed enormous positive externalities — the benefit to untreated children in treated schools was so large that the total social benefit was many times larger than the direct benefit to treated children. This spillover-inclusive evaluation completely changed the cost-effectiveness of deworming, making it one of the most cost-effective development interventions ever measured. Ignoring spillovers can therefore produce catastrophically misleading policy guidance in either direction.

Health Policy 📅 1 December 2025 ⏱ 5 min
Health Inequality and Climate Change: Intersecting Crises
Sub-Saharan Africa contributes 3% of global cumulative CO₂ emissions but bears 25% of climate-related disease burden. Bangladesh emits 0.5 tonnes of CO₂ per capita vs 14.7 tonnes in the USA, yet faces existential climate risks. This post examines the climate-inequality nexus and its implications for global health equity.

The Double Injustice of Climate and Health

The communities with the smallest carbon footprints face the largest climate health burdens. This is a double injustice: they bear costs they did not create, AND their lower adaptive capacity means they suffer worse health consequences per unit of climate exposure. A Bangladeshi farmer facing monsoon floods has fewer options — financial, geographic, informational — than a Dutch farmer facing the same flood risk, because the Netherlands has invested centuries of wealth into flood protection infrastructure.

Mechanisms of Compounding Disadvantage

Occupational exposure: The poor are disproportionately employed in outdoor, climate-exposed work (agriculture, construction, fishing) with no option to work remotely or in air-conditioned environments. Housing quality: Informal settlements without adequate insulation, cooling, or storm protection amplify heat and flood risk. Healthcare access: When climate disasters strike, the poorest are furthest from functioning health infrastructure. Nutrition: Climate crop failure first hits subsistence farmers, not supermarket shoppers.

Loss and Damage: A Policy Framework

The COP27 (2022) agreement to establish a Loss and Damage fund acknowledged that, beyond mitigation and adaptation, there are irreversible climate harms (loss of land, lives, and cultural heritage) that require dedicated financing. It is widely regarded as the first formal commitment within the UN climate negotiations to fund loss and damage, and it has direct implications for health financing in vulnerable nations.

Research Priority: Developing country researchers must lead climate-health equity research because they understand local contexts, have access to disaggregated data, and bear personal witness to the realities being studied.