Short-form research notes, methods explanations, and learning posts on machine learning, statistics, epidemiology, climate health, and public health.
A classification threshold is the probability cutoff at or above which a model predicts "outbreak" and below which it predicts "no outbreak." The default threshold of 0.5 implicitly treats the two classes as equally likely and the two kinds of error as equally costly. That is a poor assumption for disease outbreaks, where outbreak periods are far rarer than non-outbreak periods.
In outbreak surveillance, the cost of missing a true outbreak (false negative) is usually far higher than the cost of a false alarm (false positive). A model with 95% accuracy can still fail catastrophically if the threshold is wrong — triggering no alerts during the early phase of an epidemic when probabilities hover around 0.3.
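The cost asymmetry described above can be turned into a cutoff directly. The sketch below assumes the model's probabilities are well calibrated and that correct decisions cost nothing; the 4:1 cost ratio and the function name `cost_optimal_threshold` are illustrative assumptions, not values from any surveillance system.

```python
def cost_optimal_threshold(cost_false_alarm: float, cost_missed_outbreak: float) -> float:
    """Probability cutoff that minimises expected cost for a calibrated model.

    At predicted probability p, alerting costs (1 - p) * cost_false_alarm in
    expectation and staying silent costs p * cost_missed_outbreak. Alerting is
    cheaper once p exceeds cost_false_alarm / (cost_false_alarm + cost_missed_outbreak).
    """
    if cost_false_alarm <= 0 or cost_missed_outbreak <= 0:
        raise ValueError("costs must be positive")
    return cost_false_alarm / (cost_false_alarm + cost_missed_outbreak)


# Hypothetical costs: a missed outbreak is judged 4x as costly as a false alarm.
threshold = cost_optimal_threshold(cost_false_alarm=1.0, cost_missed_outbreak=4.0)
early_phase_probability = 0.3  # the early-epidemic probability mentioned in the text

print(f"cost-optimal threshold: {threshold:.2f}")
print("alert at default 0.5:", early_phase_probability >= 0.5)
print("alert at cost-optimal threshold:", early_phase_probability >= threshold)
```

Under these assumed costs the cutoff falls to 0.2, so an early-phase probability of 0.3 raises an alert that the default 0.5 would have suppressed.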
| Threshold Strategy | Effect | Best Used When |
|---|---|---|
| Precision-Recall optimised | Balances recall (detection) against precision | Imbalanced outbreak data |
| Cost-sensitive | Minimises real-world loss function | Miss cost >> false alarm cost |
| Seasonal adaptive | Changes threshold by season | Climate-sensitive diseases (dengue, cholera) |
| Spatial adaptive | Region-specific cutoff | Multi-region surveillance systems |
| Percentile-based | Flags top-risk observations | Early warning dashboards |
| Default 0.5 | Assumes balanced classes | Balanced datasets only — NOT outbreaks |
For outbreak prediction: (1) always use the Precision-Recall curve rather than ROC for threshold selection; (2) compute the F-beta score with β > 1 to weight recall higher than precision; (3) validate the selected threshold on held-out surveillance data from at least two epidemic seasons before deployment. The threshold is not a model parameter — it is a policy decision that should involve epidemiologists alongside data scientists.
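As a minimal sketch of step (2), the code below scans candidate cutoffs and keeps the one with the highest F-beta score at β = 2. The twelve weekly labels and probabilities are invented for illustration, and `select_threshold` is a hypothetical helper rather than a library function. In practice the scan would be run on held-out seasons, as step (3) requires.

```python
def f_beta(precision, recall, beta):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def select_threshold(y_true, y_prob, beta=2.0):
    """Try each distinct predicted probability as a cutoff; return (threshold, score)."""
    best_threshold, best_score = None, -1.0
    for t in sorted(set(y_prob)):
        pairs = list(zip(y_true, y_prob))
        tp = sum(1 for y, p in pairs if p >= t and y == 1)
        fp = sum(1 for y, p in pairs if p >= t and y == 0)
        fn = sum(1 for y, p in pairs if p < t and y == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        score = f_beta(precision, recall, beta)
        if score > best_score:  # strict '>' keeps the lowest cutoff on ties
            best_threshold, best_score = t, score
    return best_threshold, best_score


# Invented example: 12 surveillance weeks, 3 of them true outbreak weeks.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.08, 0.12, 0.20, 0.15, 0.25, 0.40, 0.30, 0.18, 0.45, 0.70]

threshold, score = select_threshold(y_true, y_prob, beta=2.0)
print(f"selected threshold = {threshold:.2f}, F2 = {score:.4f}")
```

On this toy data the selected cutoff is 0.30, which catches all three outbreak weeks at the price of one false alarm, whereas 0.5 would catch only one of the three.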
Health estimates — national disease burden, mortality rates, future projections — directly drive policy decisions worth billions of dollars and millions of lives. Without a reporting standard, results may be unverifiable, selectively presented, or impossible to reproduce.
| Domain | Items | Purpose |
|---|---|---|
| Study Definition | Population, outcomes, time period, geography | Defines the scope of the estimate |
| Data Inputs | Sources, access method, inclusion/exclusion criteria | Ensures traceability and replicability |
| Data Processing | Cleaning, adjustment, availability | Enables independent verification |
| Statistical Model | Framework, methods, assumptions, justification | Explains how estimates were produced |
| Validation & Results | Model checking, point estimates, uncertainty intervals | Assesses reliability and communicates uncertainty |
| Transparency | Code sharing, data sharing | Supports full reproducibility |
GATHER (Guidelines for Accurate and Transparent Health Estimates Reporting) specifically requires that a quantitative measure of uncertainty accompany every estimate. Reporting "prevalence = 12.5%" is scientifically incomplete. The correct form is: 12.5% (95% UI: 10.2%–14.8%). The interval communicates how much confidence we can place in the point estimate, which is essential for policy-makers weighing intervention costs against disease burden.
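One common way to produce an estimate in that form is to summarise draws from the estimation model with the median and the 2.5th and 97.5th percentiles. The sketch below assumes such draws exist; here they are simulated from a normal distribution purely as a stand-in, and the helper names are hypothetical.

```python
import random
import statistics


def summarise_with_ui(draws):
    """Return (median, lower, upper) using the 2.5th and 97.5th percentiles."""
    # n=40 yields 39 cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return statistics.median(draws), cuts[0], cuts[-1]


def format_estimate(point, lower, upper):
    """Render an estimate in the 'point (95% UI: lower–upper)' form."""
    return f"{point:.1%} (95% UI: {lower:.1%}–{upper:.1%})"


# Simulated draws standing in for real model output (an assumption).
rng = random.Random(42)
prevalence_draws = [rng.gauss(0.125, 0.012) for _ in range(4000)]

point, lower, upper = summarise_with_ui(prevalence_draws)
report = format_estimate(point, lower, upper)
print(report)
```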
Normalization (min-max scaling) rescales all values to fall in [0, 1]: X_norm = (X − X_min) / (X_max − X_min). It preserves the shape of the original distribution. Best for: k-NN, K-Means, neural networks requiring bounded inputs. Main weakness: a single outlier can compress all other values into a tiny range.
Standardization (z-score scaling) transforms each feature to mean = 0, std = 1: X_std = (X − μ) / σ. It does not bound the range, and it is less sensitive to outliers than min-max scaling, although μ and σ are themselves pulled by extreme values. Best for: PCA, SVM, logistic regression, linear discriminant analysis, and other methods that are sensitive to feature scale or benefit from roughly Gaussian feature distributions.
| Property | Normalization | Standardization |
|---|---|---|
| Output range | [0, 1] | Unbounded (typically −3 to +3) |
| Outlier sensitivity | Very high | Moderate |
| Preserves shape | Yes | Yes |
| Best algorithms | k-NN, K-Means, Neural Nets | PCA, SVM, Logistic Reg, LDA |
| Use when Gaussian? | Not required | Preferred |
Always fit the scaler on training data only. Apply the fitted scaler (same μ, σ or min, max) to test and validation data. Fitting on test data causes data leakage — your model will appear better than it actually is on new data.
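The fit-on-train rule can be sketched in plain Python as follows. The rainfall values are invented, and the helper functions are hypothetical stand-ins for whatever scaling utility is actually used.

```python
import statistics


def fit_standardizer(train):
    """Learn mu and sigma from the training data only."""
    return statistics.fmean(train), statistics.pstdev(train)


def standardize(values, mu, sigma):
    return [(x - mu) / sigma for x in values]


def fit_minmax(train):
    """Learn min and max from the training data only."""
    return min(train), max(train)


def minmax(values, lo, hi):
    return [(x - lo) / (hi - lo) for x in values]


# Invented weekly rainfall (mm), split into training and test weeks.
train = [10.0, 12.0, 14.0, 16.0, 18.0]
test = [20.0, 11.0]

mu, sigma = fit_standardizer(train)
lo, hi = fit_minmax(train)

# The test data reuses the training parameters; it is never used for fitting.
test_std = standardize(test, mu, sigma)
test_norm = minmax(test, lo, hi)
print("standardized test:", [round(z, 3) for z in test_std])
print("min-max test:", test_norm)  # values outside [0, 1] are expected here

# Outlier sensitivity of min-max: one extreme value squeezes the rest.
with_outlier = [10.0, 12.0, 14.0, 16.0, 500.0]
o_lo, o_hi = fit_minmax(with_outlier)
compressed = minmax(with_outlier, o_lo, o_hi)
print("min-max with outlier:", [round(v, 3) for v in compressed])
```

A test value above the training maximum maps to more than 1 under min-max scaling. That is the correct behaviour, since refitting on the test set to force it into [0, 1] would be the leakage the text warns about.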
Every epidemiologic study investigates the relationship between an exposure (risk factor, treatment, environment) and an outcome (disease, death, recovery) in a defined population. The fundamental measures are the incidence rate (new cases per unit of person-time at risk), prevalence (the proportion of the population with the disease at a point in time), and risk (the probability of developing disease over a specified period).
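A short worked example may help keep the three measures apart. All counts below are invented for illustration, and the function names are hypothetical.

```python
def incidence_rate(new_cases, person_time_at_risk):
    """New cases per unit of person-time at risk."""
    return new_cases / person_time_at_risk


def point_prevalence(existing_cases, population):
    """Proportion of the population with the disease at one point in time."""
    return existing_cases / population


def cumulative_risk(new_cases, at_risk_at_start):
    """Probability of developing disease over the follow-up period."""
    return new_cases / at_risk_at_start


# Invented cohort: 1,000 disease-free people followed for up to 2 years,
# contributing 1,900 person-years in total and yielding 40 new cases.
rate = incidence_rate(new_cases=40, person_time_at_risk=1900)
risk = cumulative_risk(new_cases=40, at_risk_at_start=1000)

# Invented survey: 75 existing cases among 1,500 people examined on one day.
prevalence = point_prevalence(existing_cases=75, population=1500)

print(f"incidence rate: {rate * 1000:.1f} per 1,000 person-years")
print(f"2-year risk: {risk:.1%}")
print(f"point prevalence: {prevalence:.1%}")
```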
| Design | Direction | Measure | Best for | Key Limitation |
|---|---|---|---|---|
| Cohort (prospective) | Exposure → Outcome | Risk Ratio, Rate Ratio | Rare exposures, incidence estimation | Expensive, long follow-up |
| Case-Control | Outcome → Exposure | Odds Ratio | Rare outcomes, quick & cheap | Recall bias, selection bias |
| Cross-sectional | Simultaneous | Prevalence Ratio | Prevalence, hypothesis generation | Cannot establish temporality |
| Ecological | Group-level | Correlation | Policy-level analysis | Ecological fallacy |
Random allocation of participants to treatment or control balances confounders across arms in expectation, including the unknown lurking variables that bias observational studies, so confounding is controlled by design. However, RCTs are expensive, sometimes unethical (participants cannot be randomised to smoke), and often have limited external validity (trial populations differ from real-world patients).
Risk Ratio (RR): RR = Risk_exposed / Risk_unexposed. RR = 1 → no association; RR > 1 → positive association; RR < 1 → negative (protective) association. Odds Ratio (OR): OR = (a/b) / (c/d) = ad/bc in a 2×2 table, where a and b are the exposed with and without the outcome, and c and d are the unexposed with and without the outcome. When the outcome is rare, OR ≈ RR. Attributable Risk (AR): AR = Risk_exposed − Risk_unexposed → how much disease is attributable to the exposure in absolute terms, assuming the association is causal.
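The three measures can be computed from one 2×2 table, as in the sketch below. It uses the conventional cell layout (a, b = exposed with and without the outcome; c, d = unexposed with and without the outcome). Both tables are invented, chosen so that one has a rare outcome and the other a common one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    a: int  # exposed, with outcome
    b: int  # exposed, without outcome
    c: int  # unexposed, with outcome
    d: int  # unexposed, without outcome

    @property
    def risk_exposed(self):
        return self.a / (self.a + self.b)

    @property
    def risk_unexposed(self):
        return self.c / (self.c + self.d)

    @property
    def risk_ratio(self):
        return self.risk_exposed / self.risk_unexposed

    @property
    def odds_ratio(self):
        return (self.a / self.b) / (self.c / self.d)

    @property
    def attributable_risk(self):
        return self.risk_exposed - self.risk_unexposed


# Invented data: outcome is rare (1-3%) in the first table, common in the second.
rare = TwoByTwo(a=30, b=970, c=10, d=990)
common = TwoByTwo(a=600, b=400, c=200, d=800)

for name, table in [("rare outcome", rare), ("common outcome", common)]:
    print(
        f"{name}: RR = {table.risk_ratio:.2f}, "
        f"OR = {table.odds_ratio:.2f}, AR = {table.attributable_risk:.3f}"
    )
```

Both tables have RR = 3, but the OR stays close to 3 only when the outcome is rare. With the common outcome it doubles to 6, which is why an OR from a case-control study should not be read as an RR unless the rare-outcome condition is plausible.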
Climate change affects health through direct pathways (heat stress, extreme weather injuries, UV exposure) and indirect pathways (vector-borne diseases, food insecurity, water contamination, displacement, mental health). The indirect pathways are more complex and harder to model, but often cause greater total morbidity.
| Climate Driver | Health Outcome | Vulnerable Population |
|---|---|---|
| Rising temperatures | Heat stroke, cardiovascular stress, preterm birth | Elderly, outdoor workers, pregnant women |
| Vector habitat expansion | Dengue, malaria, chikungunya shifting poleward and to higher altitudes | Previously non-endemic regions |
| Flooding & storms | Diarrhoeal disease, leptospirosis, trauma, displacement | Coastal and low-lying communities |
| Drought & crop failure | Malnutrition, stunting, child mortality | Subsistence farmers in South Asia, Africa |
| Air quality degradation | COPD, asthma exacerbation, lung cancer | Urban poor, children |
Bangladesh is ranked among the most climate-vulnerable countries. Cyclone Amphan (2020) displaced 2.4 million people. Annual flooding submerges 20–25% of the country, contaminating tube wells and triggering diarrhoeal outbreaks. Average temperatures have risen 0.5°C since 1960, extending the dengue transmission season by approximately 3 weeks per decade. These are not future projections — they are present realities demanding urgent public health action.
Key analytical tools include: distributed lag non-linear models (DLNM) for modelling temperature-mortality relationships; time-series analysis linking weather patterns to disease surveillance data; spatial epidemiology mapping disease burden changes with climate projections; and scenario modelling using IPCC pathways (SSP2-4.5, SSP5-8.5) to project future health impacts.
The One Health concept, formally endorsed by the WHO, FAO, UNEP, and WOAH (the "Quadripartite"), recognises that human health cannot be protected in isolation from animal health and healthy ecosystems. The three domains are not parallel — they overlap, interact, and mutually determine each other. Deforestation creates human-wildlife interfaces; factory farming breeds antimicrobial resistance; wetland destruction eliminates natural barriers to vector proliferation.
| Disease | Animal Reservoir | Ecosystem Driver | Human Impact |
|---|---|---|---|
| COVID-19 | Bats (likely) | Wildlife trade, urbanisation | Global pandemic, millions of deaths |
| Nipah virus | Fruit bats | Deforestation, mango farming | Outbreaks in Bangladesh annually |
| Avian influenza H5N1 | Wild birds, poultry | Live bird markets, migration routes | High CFR (roughly 50% of reported human cases) |
| Antimicrobial resistance | Livestock (all species) | Agriculture overuse of antibiotics | 700,000 deaths/year; projected 10M by 2050 |
Bangladesh has experienced multiple Nipah virus outbreaks traced to raw date palm sap contaminated by bat urine — a direct human-animal-environment interface. The response required simultaneous action in public health (case detection), veterinary surveillance (bat monitoring), and environmental management (sap collection practices). This is One Health in practice: no single sector could have solved it alone.
A neural network predicting sepsis risk from 48 hours of ICU vital signs might achieve AUROC = 0.91 — but if the model triggers an alert, the clinician needs to know: which features drove this prediction? Was it lactate, temperature, or a subtle pattern across multiple vitals? Without this, clinicians cannot verify clinical plausibility, catch spurious correlations, or take targeted action.
| Method | Type | How it Works | Best For |
|---|---|---|---|
| SHAP (SHapley Additive exPlanations) | Model-agnostic | Attributes prediction to each feature using game-theoretic Shapley values | Tabular data, global + local explanations |
| LIME | Model-agnostic | Fits a local linear model in the neighbourhood of each prediction | Any model, quick local explanations |
| Grad-CAM | CNN-specific | Highlights image regions driving the classification using gradient flow | Medical imaging (X-ray, MRI, pathology) |
| Attention Weights | Transformer-specific | Visualises which tokens/timepoints the model attends to | NLP clinical notes, time-series EHR |
| Integrated Gradients | Deep learning | Attributes prediction to inputs by integrating gradients from baseline to input | Genomics, EHR, images |
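
To make the Shapley idea behind SHAP concrete, the sketch below computes exact Shapley values for a three-feature toy model by enumerating every feature coalition. It is not the `shap` library's API or its approximation algorithms. The model, its coefficients, and the feature order (lactate, temperature, heart rate) are invented, and "absent" features are simply replaced by a baseline value, which is one of several possible conventions.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley attributions; cost grows as 2^n, so only for small n."""
    n = len(instance)

    def value(present):
        # Features outside the coalition are set to their baseline value.
        x = [instance[i] if i in present else baseline[i] for i in range(n)]
        return model(x)

    attributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        attributions.append(total)
    return attributions


def toy_risk_model(x):
    """Invented score: two main effects plus an interaction; heart rate is unused."""
    lactate, temperature, heart_rate = x
    return 2.0 * lactate + 1.0 * temperature + 0.5 * lactate * temperature


instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk_model, instance, baseline)

for name, contribution in zip(["lactate", "temperature", "heart_rate"], phi):
    print(f"{name}: {contribution:+.2f}")
print("sum of attributions:", sum(phi))
print("prediction minus baseline:", toy_risk_model(instance) - toy_risk_model(baseline))
```

Two properties are visible in the output: the attributions sum to the difference between the prediction and the baseline prediction, and the feature the model never uses receives zero attribution. The interaction term is split equally between the two features involved.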
In a study predicting severe dengue from clinical features at admission, SHAP values revealed that platelet count and haematocrit rise were the dominant predictors — consistent with clinical knowledge of dengue pathophysiology. Crucially, SHAP also flagged that the model was partly using "hospital ID" as a proxy feature — a spurious correlation that would fail catastrophically at a new site. XAI caught what accuracy metrics could not.
The EU AI Act (2024) classifies most AI-enabled medical devices as high-risk and requires transparency and technical documentation. The FDA's 2021 action plan for AI/ML-based software as a medical device similarly emphasises transparency and traceability. Explainability is no longer optional: it is a regulatory and ethical requirement for clinical deployment.
Eco-anxiety: Chronic fear of environmental doom and catastrophe. Solastalgia: Grief arising from the degradation of one's home environment (coined by philosopher Glenn Albrecht). Ecological grief: Mourning the loss of species, ecosystems, and places. Climate trauma: PTSD-like symptoms following direct exposure to extreme weather events such as cyclones, floods, or wildfires. These are not disorders of irrational thinking — they are rational emotional responses to real threats.
Post-disaster mental health studies consistently show elevated rates of depression, PTSD, and anxiety following climate events. After Cyclone Sidr (2007) in Bangladesh, researchers found PTSD prevalence of 31% among directly affected individuals one year post-disaster. After the 2022 Pakistan floods — the worst in its history — mental health services were overwhelmed simultaneously with physical trauma care.
Climate mental health impacts are not equally distributed. Young people (who face the longest future of climate impacts) show the highest eco-anxiety rates. Farmers and fisherfolk in climate-vulnerable livelihoods experience the highest rates of chronic stress and depression. Indigenous communities suffer unique solastalgia from the loss of culturally significant landscapes. Women in disaster-affected areas face compounded risks due to caregiving burdens and reduced autonomy.
The field needs validated screening tools (the Climate Change Worry Scale, the Climate Distress Scale), longitudinal cohort studies linking climate exposures to mental health trajectories, and climate-informed psychotherapy adaptations. Crucially, addressing eco-anxiety is not merely therapeutic — it is also a driver of climate action. Channelling distress into meaningful engagement reduces paralysis and builds resilience.
The WHO Commission on Social Determinants of Health (Marmot Commission) defines SDH as "the conditions in which people are born, grow, live, work and age." These structural conditions — income distribution, educational access, housing quality, food security, occupational safety, social protection — determine health long before a person ever enters a clinic.
Health outcomes follow a near-continuous social gradient: with each step down the socioeconomic ladder, health worsens. This is not simply a binary "poor vs rich" effect — managers have worse health than executives; clerks have worse health than managers. The Whitehall Studies of British civil servants demonstrated this gradient across an employed, non-destitute population — ruling out absolute deprivation as the sole explanation.
| SDH Domain | Metric | Bangladesh Inequality Gap |
|---|---|---|
| Income | Under-5 mortality | Poorest quintile: 65/1000; Richest: 20/1000 |
| Education | Stunting prevalence | No education mothers: 47%; Higher education: 21% |
| Geography | Skilled birth attendance | Urban: 74%; Rural: 41% |
| Gender | Anaemia | Women: 36%; Men: 15% |
Downstream interventions (treating sick people) are necessary but insufficient. Upstream interventions — progressive taxation, housing standards, universal education, social protection — produce the largest and most equitable health gains per taka invested. The economic case is strong: the Marmot Review estimated that health inequalities cost England £31–33 billion annually in lost productivity. Reducing inequality is not just a moral imperative; it is economically rational.
The Stable Unit Treatment Value Assumption (SUTVA) requires that (1) there is only one version of each treatment and (2) potential outcomes for any unit are unaffected by the treatment of other units. In infectious disease control, (2) is obviously false: vaccinating your neighbour reduces your infection risk. In nutrition programs, treating a sibling changes household food allocation. In mental health interventions, treating a depressed parent changes outcomes for untreated children.
| Type | Mechanism | Direction | Public Health Example |
|---|---|---|---|
| Herd immunity | Reduced pathogen circulation | Positive | Vaccine coverage protects unvaccinated |
| Behavioural spillover | Social norms change | Positive/Negative | Hand-washing programs change untreated neighbours |
| Resource reallocation | Household budget effects | Positive/Negative | Cash transfer to mother improves sibling nutrition |
| General equilibrium | Market/labour market changes | Often positive | Deworming increases wages for untreated workers |
| Congestion | Overcrowded services | Negative | Large vaccine campaign overwhelms clinics |
Two-stage randomisation: First randomise clusters (villages, schools) to different treatment coverage levels, then randomise individuals within each cluster to treatment at the assigned coverage. Comparing untreated individuals in high-coverage vs low-coverage clusters isolates the spillover effect. Network-based approaches: Model spillovers through social networks, estimating how treatment of network-connected individuals affects outcomes. Geographic regression discontinuity: Compare outcomes at boundaries of treated and untreated areas.
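The two-stage design can be illustrated with a small simulation. Everything in the data-generating process below is an assumption made for the sketch: the coverage levels (20% and 80%), the baseline infection risk, and the sizes of the direct and spillover effects.

```python
import random
import statistics

COVERAGE = {"low": 0.2, "high": 0.8}  # assumed coverage arms


def simulate_two_stage(n_clusters=400, cluster_size=100, seed=0):
    """Return infection outcomes (0/1) of untreated individuals, by coverage arm."""
    rng = random.Random(seed)

    # Stage 1: randomise clusters to a coverage level (balanced arms).
    arms = ["low", "high"] * (n_clusters // 2)
    rng.shuffle(arms)

    untreated = {"low": [], "high": []}
    for arm in arms:
        # Stage 2: randomise individuals to treatment at the assigned coverage.
        n_treated = round(COVERAGE[arm] * cluster_size)
        treated_ids = set(rng.sample(range(cluster_size), n_treated))
        for person in range(cluster_size):
            is_treated = person in treated_ids
            # Assumed risk model: own treatment and cluster coverage both lower risk.
            risk = 0.30 - 0.15 * is_treated - 0.10 * COVERAGE[arm]
            infected = 1 if rng.random() < risk else 0
            if not is_treated:
                untreated[arm].append(infected)
    return untreated


outcomes = simulate_two_stage()
risk_low = statistics.fmean(outcomes["low"])
risk_high = statistics.fmean(outcomes["high"])
spillover = risk_high - risk_low  # effect of higher coverage on the untreated

print(f"untreated risk, low coverage:  {risk_low:.3f}")
print(f"untreated risk, high coverage: {risk_high:.3f}")
print(f"estimated spillover effect:    {spillover:+.3f}")
```

Under the assumed risk model the true spillover on untreated individuals is −0.10 × (0.8 − 0.2) = −0.06, so the estimate should land near that value. An individually randomised trial that ignored coverage would have no comparison from which to recover it.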
The famous Kenya Primary School Deworming Project (Miguel & Kremer, 2004) showed enormous positive externalities — the benefit to untreated children in treated schools was so large that the total social benefit was many times larger than the direct benefit to treated children. This spillover-inclusive evaluation completely changed the cost-effectiveness of deworming, making it one of the most cost-effective development interventions ever measured. Ignoring spillovers can therefore produce catastrophically misleading policy guidance in either direction.
The communities with the smallest carbon footprints face the largest climate health burdens. This is a double injustice: they bear costs they did not create, AND their lower adaptive capacity means they suffer worse health consequences per unit of climate exposure. A Bangladeshi farmer facing monsoon floods has fewer options — financial, geographic, informational — than a Dutch farmer facing the same flood risk, because the Netherlands has invested centuries of wealth into flood protection infrastructure.
Occupational exposure: The poor are disproportionately employed in outdoor, climate-exposed work (agriculture, construction, fishing) with no option to work remotely or in air-conditioned environments. Housing quality: Informal settlements without adequate insulation, cooling, or storm protection amplify heat and flood risk. Healthcare access: When climate disasters strike, the poorest are furthest from functioning health infrastructure. Nutrition: Climate crop failure first hits subsistence farmers, not supermarket shoppers.
The historic COP27 (2022) agreement to establish a Loss and Damage fund acknowledged that, beyond mitigation and adaptation, there are irreversible climate harms (loss of land, lives, cultural heritage) that require dedicated financing. It is the first UN climate fund dedicated to loss and damage, although the decision text frames it as support rather than liability or compensation, and it has direct implications for health financing in vulnerable nations.