
Validation Against WHO Official Fact Sheets

Abhijit Pakhare

2026-05-06

Acknowledgements

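This validation relies on several resources developed and maintained by the World Health Organization (WHO), including the STEPS Surveillance Manual, the NCD Microdata Repository, and the STEPS data analysis scripts (References 1–3).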

We gratefully acknowledge the national STEPS survey teams and coordinators in all nine validation countries for designing, conducting, and managing these surveys and for making the data available for research use through the WHO NCD Microdata Repository. The quality and completeness of these datasets made systematic validation possible.

We also acknowledge the WHO NCD Surveillance team for developing and maintaining the STEPS methodology, analysis tools, and data sharing infrastructure that underpin this work.

Overview

The stepssurvey package has been validated against WHO-published fact sheet values for nine national STEPS surveys spanning different WHO regions, time periods, and STEPS instrument versions:

Country | Year | WHO Region | Age range | Instrument
Republic of Moldova | 2021 | EUR | 18–69 | v3.2
Mongolia | 2019 | WPR | 15–69 | v3.2
Georgia | 2016 | EUR | 18–69 | v3.1
Afghanistan | 2018 | EMR | 18–69 | v3.2
Algeria | 2016 | AFR | 18–69 | v3.1
Ukraine | 2019 | EUR | 18–69 | v3.2
Ecuador | 2018 | AMR | 18–69 | v3.2
Cabo Verde | 2020 | AFR | 18–69 | v3.2
Bahamas | 2019 | AMR | 18–69 | v3.2

For each country, the package pipeline (import -> detect -> clean -> survey design -> indicator computation) was run end-to-end, and the results were compared with the “Both Sexes” estimates published in the corresponding WHO STEPS country fact sheet.

Validation criteria

Two complementary criteria were used to judge agreement:

  1. Point-estimate match — the package estimate falls within 1.0 percentage point (pp) of the fact sheet value for proportions, or 1.0 unit for continuous measures (BMI, SBP, cholesterol).
  2. Confidence-interval overlap — the 95 % confidence intervals from the package and the fact sheet share at least one common value.

The 1 pp tolerance accounts for rounding at different stages and minor differences in the treatment of edge cases (e.g. “don’t know” or “refused” responses coded as 77 or 88).
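
The two criteria can be sketched in a few lines of base R. This is an illustration only, not the code in the validation script: the function names point_match() and ci_overlap() are hypothetical, and the text does not say whether a difference of exactly 1.0 counts as a match. The sketch assumes a strict comparison on the rounded difference, which is consistent with the Ecuador 2018 row, where a –1.0 pp difference is reported as a mismatch.

```r
# Criterion 1 (assumed strict): the difference, rounded to one decimal,
# must be smaller than the tolerance.
point_match <- function(ours, who, tol = 1.0) {
  abs(round(ours - who, 1)) < tol
}

# Criterion 2: two closed intervals share a value when each one starts
# no later than the other one ends.
ci_overlap <- function(ours_lo, ours_hi, who_lo, who_hi) {
  ours_lo <= who_hi & who_lo <= ours_hi
}

# Moldova 2021, current tobacco use (values from the Moldova table below)
moldova_match   <- point_match(27.7, 29.9)
moldova_overlap <- ci_overlap(25.6, 29.8, 27.7, 32.1)
```

With the Moldova values, the point estimates differ by more than the tolerance while the intervals still share the range 27.7–29.8.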

Indicators compared

Up to 17 indicators were compared per country, covering all three steps of the STEPS instrument:

Step 1 — Behavioural risk factors: current tobacco use, current smoking, second-hand smoke exposure (home and workplace), current alcohol use (past 30 days), heavy episodic drinking, insufficient fruit and vegetable consumption, insufficient physical activity.

Step 2 — Physical measurements: mean BMI, overweight (BMI >= 25), obesity (BMI >= 30), mean systolic blood pressure, raised blood pressure or on medication.

Step 3 — Biochemical measurements: mean total cholesterol, raised cholesterol or on medication, raised fasting glucose or on medication, impaired fasting glucose.

Not all indicators were available in every country fact sheet. Some fact sheets omit second-hand smoke, and the Bahamas fact sheet reports Step 3 biochemical measurements as unweighted estimates (response rate < 60 %), so those were excluded from comparison.

Results summary

Validation summary: stepssurvey vs. WHO fact sheets (9 countries)
Country | Indicators compared | Within 1 pp | CI overlap | Match rate
Moldova 2021 | 17 | 16 | 17 | 94%
Mongolia 2019 | 15 | 15 | 15 | 100%
Georgia 2016 | 15 | 15 | 15 | 100%
Afghanistan 2018 | 14 | 14 | 14 | 100%
Algeria 2016 | 14 | 14 | 14 | 100%
Ukraine 2019 | 14 | 13 | 14 | 93%
Ecuador 2018 | 14 | 13 | 14 | 93%
Cabo Verde 2020 | 15 | 14 | 15 | 93%
Bahamas 2019 | 10 | 9 | 10 | 90%
TOTAL | 128 | 123 | 128 | 96%

Overall, 123 of 128 indicators (96 %) match within 1 pp, and all 128 (100 %) have overlapping confidence intervals. The five remaining mismatches are small (1.0–2.2 pp) and all have overlapping CIs, suggesting they reflect minor methodological differences rather than errors in the package.
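
The totals and match rates in the summary table can be re-derived from the per-country counts. The following base R sketch copies the counts from the table above; the object and column names are chosen for illustration, and rounding to whole percentages is an assumption about how the table was prepared.

```r
# Per-country counts copied from the validation summary table
summary_tbl <- data.frame(
  country  = c("Moldova 2021", "Mongolia 2019", "Georgia 2016",
               "Afghanistan 2018", "Algeria 2016", "Ukraine 2019",
               "Ecuador 2018", "Cabo Verde 2020", "Bahamas 2019"),
  compared = c(17, 15, 15, 14, 14, 14, 14, 15, 10),
  within   = c(16, 15, 15, 14, 14, 13, 13, 14, 9)
)

# Match rate per country, rounded to a whole percentage
summary_tbl$match_rate <- round(100 * summary_tbl$within / summary_tbl$compared)

# Pooled totals across all nine countries
total_compared <- sum(summary_tbl$compared)
total_within   <- sum(summary_tbl$within)
overall_rate   <- round(100 * total_within / total_compared)
```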

Detailed results by country

Moldova 2021

Moldova 2021: 16/17 within 1 pp, 17/17 CI overlap
Indicator | Ours | WHO | Diff | Match | CI overlap
Current tobacco use | 27.7 | 29.9 | -2.2 | No | Yes
Current smoking | 27.6 | 27.6 | 0.0 | Yes | Yes
Second-hand smoke (home) | 23.3 | 23.2 | 0.1 | Yes | Yes
Second-hand smoke (work) | 26.4 | 26.4 | 0.0 | Yes | Yes
Current alcohol (30 d) | 63.2 | 63.2 | 0.0 | Yes | Yes
Heavy episodic drinking | 13.8 | 13.8 | 0.0 | Yes | Yes
Insufficient fruit/veg | 63.2 | 63.4 | -0.2 | Yes | Yes
Insufficient physical activity | 9.1 | 9.1 | 0.0 | Yes | Yes
Overweight (BMI >= 25) | 63.9 | 63.9 | 0.0 | Yes | Yes
Obese (BMI >= 30) | 22.7 | 22.7 | 0.0 | Yes | Yes
Raised BP or on meds | 35.0 | 34.8 | 0.2 | Yes | Yes
Raised glucose or on meds | 6.3 | 6.3 | 0.0 | Yes | Yes
Raised cholesterol or on meds | 28.4 | 27.7 | 0.7 | Yes | Yes
Impaired fasting glucose | 9.9 | 9.9 | 0.0 | Yes | Yes
Mean BMI | 26.9 | 26.9 | 0.0 | Yes | Yes
Mean SBP | 129.2 | 129.2 | 0.0 | Yes | Yes
Mean total cholesterol | 4.4 | 4.4 | 0.0 | Yes | Yes

The single mismatch is current tobacco use (–2.2 pp). The confidence intervals overlap (ours: 25.6–29.8 vs. WHO: 27.7–32.1). Because current smoking matches exactly (27.6 in both sources), the gap is more consistent with a slightly different definition of tobacco use in the published fact sheet than with rounding alone.

Mongolia 2019

Mongolia 2019: 15/15 within 1 pp, 15/15 CI overlap
Indicator | Ours | WHO | Diff | Match | CI overlap
Current tobacco use | 25.0 | 24.2 | 0.8 | Yes | Yes
Current smoking (daily) | 21.6 | 21.6 | 0.0 | Yes | Yes
Current alcohol (30 d) | 34.8 | 34.8 | 0.0 | Yes | Yes
Heavy episodic drinking | 20.2 | 19.8 | 0.4 | Yes | Yes
Insufficient fruit/veg | 83.2 | 83.4 | -0.2 | Yes | Yes
Insufficient physical activity | 22.5 | 21.9 | 0.6 | Yes | Yes
Overweight (BMI >= 25) | 49.3 | 49.4 | -0.1 | Yes | Yes
Obese (BMI >= 30) | 18.5 | 18.5 | 0.0 | Yes | Yes
Raised BP or on meds (130/80) | 44.3 | 44.0 | 0.3 | Yes | Yes
Raised glucose or on meds | 8.3 | 8.3 | 0.0 | Yes | Yes
Raised cholesterol or on meds | 27.8 | 27.8 | 0.0 | Yes | Yes
Impaired fasting glucose | 17.4 | 17.4 | 0.0 | Yes | Yes
Mean BMI | 25.6 | 25.5 | 0.1 | Yes | Yes
Mean SBP | 120.5 | 120.5 | 0.0 | Yes | Yes
Mean total cholesterol | 4.4 | 4.4 | 0.0 | Yes | Yes

Mongolia uses a raised blood-pressure threshold of 130/80 mmHg (rather than the standard 140/90), and the age range begins at 15 (rather than 18). Second-hand smoke indicators were not reported in the Mongolia fact sheet.

Note: the Mongolia WHO fact sheet reports daily smoking under the “current smoking” label. The package computes both any-current and daily smoking; the daily-smoking variable is used for this comparison.

Georgia 2016

Georgia 2016: 15/15 within 1 pp, 15/15 CI overlap
Indicator | Ours | WHO | Diff | Match | CI overlap
Current tobacco use | 31.1 | 31.0 | 0.1 | Yes | Yes
Current smoking (daily) | 28.0 | 28.0 | 0.0 | Yes | Yes
Current alcohol (30 d) | 39.0 | 39.1 | -0.1 | Yes | Yes
Heavy episodic drinking | 18.7 | 18.3 | 0.4 | Yes | Yes
Insufficient fruit/veg | 62.9 | 63.0 | -0.1 | Yes | Yes
Insufficient physical activity | 18.2 | 17.4 | 0.8 | Yes | Yes
Overweight (BMI >= 25) | 64.6 | 64.6 | 0.0 | Yes | Yes
Obese (BMI >= 30) | 33.4 | 33.2 | 0.2 | Yes | Yes
Raised BP or on meds | 37.7 | 37.7 | 0.0 | Yes | Yes
Raised glucose or on meds | 4.5 | 4.5 | 0.0 | Yes | Yes
Raised cholesterol or on meds | 27.7 | 27.7 | 0.0 | Yes | Yes
Impaired fasting glucose | 2.0 | 2.0 | 0.0 | Yes | Yes
Mean BMI | 28.2 | 28.1 | 0.1 | Yes | Yes
Mean SBP | 129.4 | 129.4 | 0.0 | Yes | Yes
Mean total cholesterol | 4.3 | 4.3 | 0.0 | Yes | Yes

As with Mongolia, the Georgia WHO fact sheet reports daily smoking under the “current smoking” label. Second-hand smoke indicators were not reported in the Georgia fact sheet.

Afghanistan 2018

Afghanistan 2018: 14/14 within 1 pp, 14/14 CI overlap
Indicator | Ours | WHO | Diff | Match | CI overlap
Current smoking | 8.6 | 8.6 | 0.0 | Yes | Yes
Current alcohol (30 d) | 0.2 | 0.2 | 0.0 | Yes | Yes
Heavy episodic drinking | 0.1 | 0.1 | 0.0 | Yes | Yes
Insufficient fruit/veg | 97.3 | 97.3 | 0.0 | Yes | Yes
Insufficient physical activity | 26.6 | 26.5 | 0.1 | Yes | Yes
Overweight (BMI >= 25) | 42.7 | 42.7 | 0.0 | Yes | Yes
Obese (BMI >= 30) | 17.2 | 17.0 | 0.2 | Yes | Yes
Raised BP or on meds | 29.2 | 29.2 | 0.0 | Yes | Yes
Raised glucose or on meds | 9.2 | 9.2 | 0.0 | Yes | Yes
Raised cholesterol or on meds | 17.8 | 18.1 | -0.3 | Yes | Yes
Impaired fasting glucose | 4.9 | 4.7 | 0.2 | Yes | Yes
Mean BMI | 25.2 | 25.1 | 0.1 | Yes | Yes
Mean SBP | 125.5 | 125.5 | 0.0 | Yes | Yes
Mean total cholesterol | 3.8 | 3.8 | 0.0 | Yes | Yes

All 14 comparable indicators match within 1 pp. The Afghanistan fact sheet reports cholesterol in mg/dl; the package auto-converts to mmol/L, and the raised-cholesterol threshold was aligned to 4.914 mmol/L (= 190 mg/dl) to match the WHO definition.

Algeria 2016

Algeria 2016: 14/14 within 1 pp, 14/14 CI overlap
Indicator | Ours | WHO | Diff | Match | CI overlap
Current smoking | 16.4 | 16.5 | -0.1 | Yes | Yes
Current alcohol (30 d) | 2.1 | 2.1 | 0.0 | Yes | Yes
Heavy episodic drinking | 1.3 | 1.3 | 0.0 | Yes | Yes
Insufficient fruit/veg | 85.2 | 85.3 | -0.1 | Yes | Yes
Insufficient physical activity | 23.7 | 23.7 | 0.0 | Yes | Yes
Overweight (BMI >= 25) | 55.5 | 55.6 | -0.1 | Yes | Yes
Obese (BMI >= 30) | 21.9 | 21.8 | 0.1 | Yes | Yes
Raised BP or on meds | 23.7 | 23.6 | 0.1 | Yes | Yes
Raised glucose or on meds | 8.8 | 9.0 | -0.2 | Yes | Yes
Raised cholesterol or on meds | 23.5 | 24.0 | -0.5 | Yes | Yes
Impaired fasting glucose | 8.6 | 8.2 | 0.4 | Yes | Yes
Mean BMI | 26.4 | 26.4 | 0.0 | Yes | Yes
Mean SBP | 126.4 | 126.3 | 0.1 | Yes | Yes
Mean total cholesterol | 4.2 | 4.2 | 0.0 | Yes | Yes

All 14 comparable indicators match within 1 pp.

Ukraine 2019

Ukraine 2019: 13/14 within 1 pp, 14/14 CI overlap
Indicator | Ours | WHO | Diff | Match | CI overlap
Current smoking | 33.7 | 33.9 | -0.2 | Yes | Yes
Current alcohol (30 d) | 55.5 | 55.6 | -0.1 | Yes | Yes
Heavy episodic drinking | 20.3 | 19.7 | 0.6 | Yes | Yes
Insufficient fruit/veg | 66.1 | 66.4 | -0.3 | Yes | Yes
Insufficient physical activity | 10.7 | 10.0 | 0.7 | Yes | Yes
Overweight (BMI >= 25) | 59.1 | 59.0 | 0.1 | Yes | Yes
Obese (BMI >= 30) | 24.9 | 24.8 | 0.1 | Yes | Yes
Raised BP or on meds | 36.7 | 34.8 | 1.9 | No | Yes
Raised glucose or on meds | 7.1 | 7.1 | 0.0 | Yes | Yes
Raised cholesterol or on meds | 40.7 | 40.7 | 0.0 | Yes | Yes
Impaired fasting glucose | 9.0 | 8.8 | 0.2 | Yes | Yes
Mean BMI | 26.9 | 26.8 | 0.1 | Yes | Yes
Mean SBP | 129.2 | 129.1 | 0.1 | Yes | Yes
Mean total cholesterol | 4.7 | 4.7 | 0.0 | Yes | Yes

The single mismatch is raised BP or on meds (+1.9 pp). The CIs overlap (ours: 33.7–39.6 vs. WHO: 31.2–38.4), suggesting a minor difference in medication question coding.

Ecuador 2018

Ecuador 2018: 13/14 within 1 pp, 14/14 CI overlap
Indicator | Ours | WHO | Diff | Match | CI overlap
Current smoking | 13.7 | 13.7 | 0.0 | Yes | Yes
Current alcohol (30 d) | 39.3 | 39.3 | 0.0 | Yes | Yes
Heavy episodic drinking | 24.1 | 23.8 | 0.3 | Yes | Yes
Insufficient fruit/veg | 94.6 | 94.6 | 0.0 | Yes | Yes
Insufficient physical activity | 17.8 | 17.8 | 0.0 | Yes | Yes
Overweight (BMI >= 25) | 63.6 | 63.6 | 0.0 | Yes | Yes
Obese (BMI >= 30) | 25.7 | 25.7 | 0.0 | Yes | Yes
Raised BP or on meds | 20.5 | 19.8 | 0.7 | Yes | Yes
Raised glucose or on meds | 6.9 | 7.1 | -0.2 | Yes | Yes
Raised cholesterol or on meds | 33.7 | 34.7 | -1.0 | No | Yes
Impaired fasting glucose | 8.3 | 7.8 | 0.5 | Yes | Yes
Mean BMI | 27.2 | 27.2 | 0.0 | Yes | Yes
Mean SBP | 119.7 | 119.7 | 0.0 | Yes | Yes
Mean total cholesterol | 4.4 | 4.4 | 0.0 | Yes | Yes

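The single mismatch is raised cholesterol or on meds (–1.0 pp). The rounded difference equals the 1 pp tolerance and is classed as a mismatch, so this indicator sits right at the threshold boundary. The CIs overlap (ours: 31.7–35.8 vs. WHO: 32.6–36.8).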

Cabo Verde 2020

Cabo Verde 2020: 14/15 within 1 pp, 15/15 CI overlap
Indicator | Ours | WHO | Diff | Match | CI overlap
Current smoking | 9.6 | 9.6 | 0.0 | Yes | Yes
Second-hand smoke (work) | 15.0 | 15.0 | 0.0 | Yes | Yes
Current alcohol (30 d) | 45.0 | 45.0 | 0.0 | Yes | Yes
Heavy episodic drinking | 17.5 | 17.5 | 0.0 | Yes | Yes
Insufficient fruit/veg | 77.3 | 79.0 | -1.7 | No | Yes
Insufficient physical activity | 31.6 | 31.8 | -0.2 | Yes | Yes
Overweight (BMI >= 25) | 44.2 | 44.2 | 0.0 | Yes | Yes
Obese (BMI >= 30) | 14.3 | 14.3 | 0.0 | Yes | Yes
Raised BP or on meds | 31.1 | 30.8 | 0.3 | Yes | Yes
Raised glucose or on meds | 3.6 | 3.7 | -0.1 | Yes | Yes
Raised cholesterol or on meds | 17.9 | 18.8 | -0.9 | Yes | Yes
Impaired fasting glucose | 2.4 | 2.3 | 0.1 | Yes | Yes
Mean BMI | 25.1 | 25.1 | 0.0 | Yes | Yes
Mean SBP | 128.8 | 128.8 | 0.0 | Yes | Yes
Mean total cholesterol | 4.0 | 4.0 | 0.0 | Yes | Yes

The single mismatch is insufficient fruit/veg (–1.7 pp). The CIs overlap (ours: 75.3–79.4 vs. WHO: 77.1–80.9). The Cabo Verde fact sheet is one of the few that also reports second-hand smoke at work, which matched exactly.

Bahamas 2019

Bahamas 2019: 9/10 within 1 pp, 10/10 CI overlap
Indicator | Ours | WHO | Diff | Match | CI overlap
Current smoking | 17.4 | 17.4 | 0.0 | Yes | Yes
Current alcohol (30 d) | 49.5 | 49.6 | -0.1 | Yes | Yes
Heavy episodic drinking | 18.1 | 17.6 | 0.5 | Yes | Yes
Insufficient fruit/veg | 85.0 | 85.3 | -0.3 | Yes | Yes
Insufficient physical activity | 30.1 | 30.2 | -0.1 | Yes | Yes
Overweight (BMI >= 25) | 71.7 | 71.6 | 0.1 | Yes | Yes
Obese (BMI >= 30) | 44.5 | 43.6 | 0.9 | Yes | Yes
Raised BP or on meds | 35.3 | 36.7 | -1.4 | No | Yes
Mean BMI | 30.6 | 29.8 | 0.8 | Yes | Yes
Mean SBP | 125.4 | 125.4 | 0.0 | Yes | Yes

The single mismatch is raised BP or on meds (–1.4 pp). Step 3 biochemical indicators were excluded from comparison because the Bahamas fact sheet reports them as unweighted estimates (Step 3 response rate was below 60 %).

Analysis of mismatches

Across all nine countries, only five of 128 indicators exceed the 1 pp tolerance. The mismatches are:

Current tobacco use (Moldova –2.2 pp): the largest remaining mismatch. Current smoking for Moldova matches exactly, so the gap is more consistent with a slightly different indicator definition in the published fact sheet (e.g. whether smokeless tobacco is included) than with a data-handling error. The confidence intervals overlap.

Raised blood pressure or on medication (Ukraine +1.9, Bahamas –1.4): the mismatches are in opposite directions, suggesting country-specific differences in how the medication question is coded rather than a systematic package issue.

Raised cholesterol or on medication (Ecuador –1.0 pp): right at the 1 pp threshold boundary. The CIs overlap.

Insufficient fruit/veg (Cabo Verde –1.7 pp): the CIs overlap.

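All five mismatches have overlapping 95 % confidence intervals, so each difference is small relative to the sampling uncertainty of the estimates. Interval overlap is a descriptive check rather than a formal significance test: the package and fact sheet estimates derive from the same survey data, and overlapping intervals do not by themselves establish that two estimates are statistically indistinguishable.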

Key methodological alignments

During validation, several methodological details were identified where alignment with the WHO official analysis scripts was essential for reproducibility:

Survey design. Sampling weights are used as-is without trimming, matching the WHO official STEPS analysis scripts. Lone PSUs are handled with options(survey.lonely.psu = "adjust").
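
The lonely-PSU setting can be illustrated with a toy design built directly with the survey package. This is a minimal sketch, not the design code used inside stepssurvey: the data and the column names (stratum, psu, wt, smoker) are made up for illustration.

```r
library(survey)

# With "adjust", a stratum that contains a single PSU contributes to the
# variance through its deviation from the overall mean instead of
# stopping the computation with an error.
options(survey.lonely.psu = "adjust")

# Toy data: stratum "B" has only one PSU (a lonely PSU).
toy <- data.frame(
  stratum = c("A", "A", "A", "A", "B", "B"),
  psu     = c(1, 1, 2, 2, 3, 3),
  wt      = c(10, 10, 12, 12, 8, 8),   # sampling weights, used untrimmed
  smoker  = c(1, 0, 0, 1, 1, 0)
)

design <- svydesign(ids = ~psu, strata = ~stratum, weights = ~wt,
                    data = toy, nest = TRUE)

est <- svymean(~smoker, design)   # weighted prevalence with design-based SE
ci  <- confint(est)               # 95 % confidence interval
```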

Alcohol skip patterns. Non-drinkers who skip the heavy-episodic drinking question (A9) are coded as FALSE (not NA), ensuring the denominator covers the total population rather than only current drinkers.
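
The effect of this recoding on the denominator can be seen in a small unweighted example. The column names current_drinker and hed are hypothetical and do not correspond to the package's internal variable names.

```r
# Toy data: rows 3 and 4 are non-drinkers who skipped the question;
# row 5 is a drinker whose answer is genuinely missing.
toy <- data.frame(
  current_drinker = c(TRUE, TRUE, FALSE, FALSE, TRUE),
  hed             = c(TRUE, FALSE, NA, NA, NA)
)

# Non-drinkers skipped the question by design, so they count as "no".
skipped_by_design <- !toy$current_drinker & is.na(toy$hed)
toy$hed_total <- ifelse(skipped_by_design, FALSE, toy$hed)

rate_drinkers_only <- mean(toy$hed, na.rm = TRUE)        # 2 respondents
rate_total_pop     <- mean(toy$hed_total, na.rm = TRUE)  # 4 respondents
```

The genuinely missing answer in row 5 stays NA in both versions; only the structural skips are recoded.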

Physical activity screening questions. GPAQ screening questions (P1, P4, P7, P10, P13) where the respondent answers “No” set the domain contribution to zero, rather than leaving it as NA. Special codes 77 and 88 (“don’t know” / “refused”) are cleaned to NA.

Diet zero-days handling. When a respondent reports eating fruit or vegetables on zero days per week, the servings-per-day variable (which is skipped and therefore NA) is set to zero rather than excluded from the denominator.
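
A toy example of the zero-days rule follows. The column names fruit_days and fruit_servings are hypothetical; only the recoding logic is taken from the description above.

```r
# Toy data: rows 3 and 4 report zero days, so the servings question was
# skipped; row 5 reports 5 days but the servings answer is missing.
toy <- data.frame(
  fruit_days     = c(7, 3, 0, 0, 5),
  fruit_servings = c(2, 1, NA, NA, NA)
)

n_before <- sum(!is.na(toy$fruit_servings))

# Zero days per week implies zero servings, so fill the structural NA.
zero_days <- !is.na(toy$fruit_days) & toy$fruit_days == 0
toy$fruit_servings[zero_days & is.na(toy$fruit_servings)] <- 0

n_after <- sum(!is.na(toy$fruit_servings))
```

The two zero-day respondents re-enter the denominator, while the respondent in row 5 remains missing.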

Configurable blood-pressure thresholds. The package supports custom SBP/DBP thresholds via clean_steps_data(bp_sbp_threshold, bp_dbp_threshold), needed because some countries (e.g. Mongolia) use thresholds other than the standard 140/90.
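
To show what a configurable threshold changes, the sketch below classifies four made-up respondents under both cut-offs. The helper raised_bp() is hypothetical and is not the package API; it assumes the usual "raised or on medication" rule, in which a respondent qualifies if either pressure reaches its threshold or if they are on medication.

```r
# Hypothetical helper, not part of stepssurvey
raised_bp <- function(sbp, dbp, on_meds, sbp_thr = 140, dbp_thr = 90) {
  sbp >= sbp_thr | dbp >= dbp_thr | on_meds
}

# Made-up measurements for four respondents
sbp     <- c(118, 132, 145, 125)
dbp     <- c(76, 82, 88, 70)
on_meds <- c(FALSE, FALSE, FALSE, TRUE)

standard <- raised_bp(sbp, dbp, on_meds)             # 140/90
lowered  <- raised_bp(sbp, dbp, on_meds, 130, 80)    # 130/80, as in Mongolia
```

The second respondent (132/82) is classified as raised only under the 130/80 thresholds, which is why prevalence estimates cannot be compared unless the thresholds agree.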

Cholesterol unit conversion and threshold alignment. Some country datasets store cholesterol in mg/dl rather than mmol/L. The package auto-detects units and converts to mmol/L. For validation, the raised-cholesterol threshold was set to 4.914 mmol/L (= 190 mg/dl, the WHO standard) rather than the default 5.0 mmol/L, aligning with the WHO fact sheet definition.
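
The relationship between the two thresholds can be checked numerically. The conversion factor of 38.665 mg/dl per mmol/L used here is an assumption based on the molar mass of cholesterol (about 386.65 g/mol); the factor and the detection rule used inside the package are not stated in this document.

```r
# Assumed conversion factor: mg/dl per mmol/L for cholesterol
MGDL_PER_MMOL <- 38.665

mgdl_to_mmol <- function(x) x / MGDL_PER_MMOL

# 190 mg/dl expressed in mmol/L, rounded to three decimals
threshold_mmol <- round(mgdl_to_mmol(190), 3)

# Made-up total cholesterol values in mg/dl
chol_mgdl <- c(150, 189, 190, 240)
chol_mmol <- mgdl_to_mmol(chol_mgdl)

raised_aligned <- chol_mmol >= mgdl_to_mmol(190)  # threshold aligned to 190 mg/dl
raised_default <- chol_mmol >= 5.0                # default 5.0 mmol/L
```

A respondent measured at exactly 190 mg/dl is classified as raised under the aligned threshold but not under 5.0 mmol/L, which is the kind of boundary case that shifts a prevalence estimate by a fraction of a percentage point.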

Physical activity data quality (P_clean). Following the WHO GPAQ Analysis Guide, respondents with inconsistent physical activity data are excluded from all PA computations. Specifically, if a respondent answers “Yes” to a GPAQ screening question (e.g. “Do you do vigorous work?”) but has missing or invalid follow-up data for that domain, all PA variables are set to NA. Additionally, if any single domain reports more than 960 minutes (16 hours) per day, all PA data for that respondent is invalidated. Without this cleaning, partial domain data would be silently dropped by rowSums(na.rm = TRUE), underestimating total PA and systematically inflating the insufficient_pa indicator.
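
The difference between naive summation and the cleaning rule can be reproduced with two domains and four made-up respondents. The column names and the coding 1 = yes, 2 = no are assumptions for this sketch, and the logic is a simplified reading of the rule described above rather than the package implementation.

```r
toy <- data.frame(
  p1_vig_work   = c(1, 2, 1, 1),          # screening question: 1 = yes, 2 = no
  vig_work_min  = c(60, NA, NA, 1000),    # minutes per day in the work domain
  transport_min = c(30, 20, 40, 10)       # minutes per day in transport
)

# Naive total: missing follow-up data is silently treated as zero.
naive_total <- unname(rowSums(toy[, c("vig_work_min", "transport_min")],
                              na.rm = TRUE))

# Screening answer "no" sets the domain contribution to zero.
work <- ifelse(toy$p1_vig_work == 2, 0, toy$vig_work_min)

# "Yes" at screening but no usable follow-up: inconsistent record.
inconsistent <- toy$p1_vig_work == 1 & is.na(toy$vig_work_min)

# Any single domain above 960 minutes (16 hours) per day: implausible record.
implausible <- (!is.na(work) & work > 960) | toy$transport_min > 960

p_clean     <- !inconsistent & !implausible
clean_total <- ifelse(p_clean, work + toy$transport_min, NA_real_)
```

Respondent 3 would count as 40 minutes under naive summation even though the work-domain minutes are unknown; after cleaning, that respondent and the implausible respondent 4 are excluded.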

Data quality filters. The WHO smk_cln filter is applied: logically inconsistent tobacco respondents (e.g. “I currently smoke” combined with “I never smoked in the past”) are excluded from the tobacco denominator.

Tobacco indicator mapping. WHO fact sheets vary in which tobacco indicator they report under “current smoking”: some report daily smoking (Mongolia, Georgia), while others report any-frequency tobacco smoking (Afghanistan, Algeria, Ukraine, Ecuador, Cabo Verde, Bahamas). The package computes all variants and the appropriate one is used for each country comparison.

Column mapping. Country datasets often use non-standard variable names. The package’s read_column_mapping() function and Excel mapping template were used to handle datasets where auto-detection required manual overrides.

Running the validation yourself

The full validation script is included in the package at inst/validation/validate_all.R. To run it, you need the licensed STEPS microdata files placed in a STEPS Licensed Datasets directory alongside the package:

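library(stepssurvey)
## Locate the validation script shipped with the installed package
script <- system.file("validation", "validate_all.R", package = "stepssurvey")
stopifnot(nzchar(script))  # system.file() returns "" if the file is not found
## Run from the package source directory
source(script)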

Per-country results are saved as CSV files in inst/validation/.

Reproducibility notes

The validation was performed with R 4.5.1, using survey package version 4.5, and stepssurvey version 0.1.0.

The licensed STEPS microdata files are not redistributable and are therefore not included in the package. Researchers with access to the WHO STEPS microdata repository can request datasets for the nine countries listed above and reproduce these results independently.

References

  1. World Health Organization. (2017). WHO STEPS Surveillance Manual: The WHO STEPwise Approach to Noncommunicable Disease Risk Factor Surveillance. Geneva: WHO. https://www.who.int/teams/noncommunicable-diseases/surveillance/systems-tools/steps

  2. World Health Organization. WHO NCD Microdata Repository: STEPS Survey Data. https://extranet.who.int/ncdsmicrodata

  3. World Health Organization. STEPS data analysis scripts (R). GitHub repository. https://github.com/WorldHealthOrganization

  4. Republic of Moldova STEPS Survey 2021. WHO NCD Country Fact Sheet. Geneva: WHO, 2022.

  5. Mongolia STEPS Survey 2019. WHO NCD Country Fact Sheet. Geneva: WHO, 2020.

  6. Georgia STEPS Survey 2016. WHO NCD Country Fact Sheet. Geneva: WHO, 2017.

  7. Afghanistan STEPS Survey 2018. WHO NCD Country Fact Sheet. Geneva: WHO, 2019.

  8. Algeria STEPS Survey 2016. WHO NCD Country Fact Sheet. Geneva: WHO, 2017.

  9. Ukraine STEPS Survey 2019. WHO NCD Country Fact Sheet. Geneva: WHO, 2020.

  10. Ecuador STEPS Survey 2018. WHO NCD Country Fact Sheet. Geneva: WHO, 2019.

  11. Cabo Verde STEPS Survey 2020. WHO NCD Country Fact Sheet. Geneva: WHO, 2021.

  12. Bahamas STEPS Survey 2019. WHO NCD Country Fact Sheet. Geneva: WHO, 2020.

  13. Lumley, T. (2004). Analysis of complex survey samples. Journal of Statistical Software, 9(8), 1–19. R package survey version 4.5.
