nhanesA was developed to enable fully customizable retrieval of data from the National Health and Nutrition Examination Survey (NHANES). The survey is conducted by the National Center for Health Statistics (NCHS), and data are publicly available at: http://www.cdc.gov/nchs/nhanes.htm. Approximately 5,000 people are surveyed annually, and the results are grouped in two-year intervals. NHANES data are reported in well over one thousand peer-reviewed journal publications every year.
Since 1999, the NHANES survey has been conducted continuously, and the surveys during that period are referred to as “continuous NHANES” to distinguish them from several prior surveys. Continuous NHANES surveys are grouped in two-year intervals, with the first interval being 1999-2000.
Most NHANES data are in the form of tables in SAS ‘XPT’ format. The survey is grouped into five data categories that are publicly available, as well as an additional category (Limited access data) that requires written justification and prior approval before access. Package nhanesA is intended mostly for use with the publicly available data, but some information pertaining to the limited access data can also be retrieved.
The five publicly available data categories are:
- Demographics (DEMO)
- Dietary (DIET)
- Examination (EXAM)
- Laboratory (LAB)
- Questionnaire (Q)
The abbreviated forms in parentheses may be substituted for the long form in nhanesA commands.
To quickly get familiar with NHANES data, it is helpful to display a listing of tables. Use nhanesTables to get information on tables that are available for a given category for a given year.
suppressWarnings(library(nhanesA))
nhanesTables('EXAM', 2005)
## FileName Description
## 1 AUX_D Audiometry
## 2 AUXAR_D Audiometry - Acoustic Reflex
## 3 AUXTYM_D Audiometry - Tympanometry
## 4 BPX_D Blood Pressure
## 5 BMX_D Body Measures
## 6 DXXAG_D Dual Energy X-ray Absorptiometry - Android/Gynoid
## 7 DXXFEM_D Dual Energy X-ray Absorptiometry - Femur
## 8 DXXSPN_D Dual Energy X-ray Absorptiometry - Spine
## 9 OPXFDT_D Ophthalmology - Frequency Doubling Technology
## 10 OPXRET_D Ophthalmology - Retinal Imaging
## 11 OHX_D Oral Health
## 12 PAXRAW_D Physical Activity Monitor
## 13 VIX_D Vision
Note that the survey is grouped in two-year intervals beginning with the odd year. For convenience, only a single 4-digit year is entered, such that nhanesTables('EXAM', 2005) and nhanesTables('EXAM', 2006) yield identical output. In the preceding example we see that the Examination data for the 2005-2006 survey consist of thirteen tables.
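The mapping from a single year to its two-year interval can be sketched in base R. The helper nhanes_cycle below is hypothetical (it is not part of nhanesA) and assumes only what is stated above: intervals start on odd years, beginning with 1999.

```r
# Hypothetical helper, not an nhanesA function: map a year to its survey interval.
nhanes_cycle <- function(year) {
  stopifnot(is.numeric(year), all(year >= 1999))
  # Odd years are already the start of an interval; even years step back by one.
  start <- year - (year + 1) %% 2
  paste(start, start + 1, sep = "-")
}

cycles <- nhanes_cycle(c(2005, 2006))
print(cycles)
```

Both years resolve to the same interval, which is why either one can be passed to nhanesTables.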
After viewing the output, we decide we are interested in table ‘BMX_D’ that contains body measures data. To better determine if that table is of interest, we can display detailed information on the table contents using nhanesTableVars.
nhanesTableVars('EXAM', 'BMX_D')
## Variable.Name Variable.Description
## 1 SEQN Respondent sequence number.
## 2 BMDSTATS Body Measures Component status Code
## 3 BMXWT Weight (kg)
## 4 BMIWT Weight Comment
## 5 BMXRECUM Recumbent Length (cm)
## 6 BMIRECUM Recumbent Length Comment
## 7 BMXHEAD Head Circumference (cm)
## 8 BMIHEAD Head Circumference Comment
## 9 BMXHT Standing Height (cm)
## 10 BMIHT Standing Height Comment
## 11 BMXBMI Body Mass Index (kg/m**2)
## 12 BMXLEG Upper Leg Length (cm)
## 13 BMILEG Upper Leg Length Comment
## 14 BMXCALF Maximal Calf Circumference (cm)
## 15 BMICALF Maximal Calf Comment
## 16 BMXARML Upper Arm Length (cm)
## 17 BMIARML Upper Arm Length Comment
## 18 BMXARMC Arm Circumference (cm)
## 19 BMIARMC Arm Circumference Comment
## 20 BMXWAIST Waist Circumference (cm)
## 21 BMIWAIST Waist Circumference Comment
## 22 BMXTHICR Thigh Circumference (cm)
## 23 BMITHICR Thigh Circumference Comment
## 24 BMXTRI Triceps Skinfold (mm)
## 25 BMITRI Triceps Skinfold Comment
## 26 BMXSUB Subscapular Skinfold (mm)
## 27 BMISUB Subscapular Skinfold Comment
We see that there are 27 columns in table BMX_D. The first column (SEQN) is the respondent sequence number and is included in every NHANES table. Effectively, SEQN is a subject identifier that is used to join information across tables. We now import BMX_D along with the demographics table DEMO_D.
bmx_d <- nhanes('BMX_D')
## Processing SAS dataset BMX_D ..
demo_d <- nhanes('DEMO_D')
## Processing SAS dataset DEMO_D ..
We then merge the tables and compute average values by gender for several variables:
bmx_demo <- merge(demo_d, bmx_d)
aggregate(cbind(bmxht,bmxwt, bmxleg, bmxcalf, bmxthicr)~riagendr, bmx_demo, mean)
## riagendr bmxht bmxwt bmxleg bmxcalf bmxthicr
## 1 1 170.0105 76.90659 40.49696 37.47678 51.45514
## 2 2 158.9209 68.17988 37.19337 36.88751 51.09288
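To make the join and the averaging explicit, here is a minimal sketch on toy data frames. The names demo_toy and bmx_toy and all of their values are made up for illustration; only the column names follow the tables above. It shows two behaviours of base R that affect the result: merge keeps only respondents present in both tables, and the formula interface of aggregate drops any row with a missing value in any of the listed variables.

```r
# Toy stand-ins for demo_d and bmx_d; every value here is invented.
demo_toy <- data.frame(seqn = c(1, 2, 3, 4), riagendr = c(1, 2, 1, 2))
bmx_toy  <- data.frame(seqn  = c(1, 2, 3),
                       bmxht = c(170, 160, 180),
                       bmxwt = c(70, 60, NA))

# Inner join on the shared identifier; respondent 4 has no body measures
# and is therefore absent from the merged table.
toy <- merge(demo_toy, bmx_toy, by = "seqn")

# With a formula, aggregate uses na.action = na.omit by default, so
# respondent 3 (missing weight) is excluded from both means.
agg <- aggregate(cbind(bmxht, bmxwt) ~ riagendr, data = toy, FUN = mean)
print(agg)
```

Naming the key with by = "seqn" is optional here, since merge joins on the shared column names by default, but it documents the intent.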
NHANES uses coded values for many fields. In the preceding example, gender is coded as 1 or 2. To determine what the values mean, we can list the code translations for the gender field RIAGENDR in table DEMO_D:
nhanesTranslate('DEMO_D', 'RIAGENDR')
## $RIAGENDR
## Code.or.Value Value.Description
## 1 1 Male
## 2 2 Female
## 3 . Missing
If desired, we can use nhanesTranslate to apply the code translation to demo_d directly by assigning data=demo_d.
levels(as.factor(demo_d$riagendr))
## [1] "1" "2"
demo_d <- nhanesTranslate('DEMO_D', 'RIAGENDR', data=demo_d)
## Translated columns: RIAGENDR
levels(demo_d$riagendr)
## [1] "Male" "Female"
bmx_demo <- merge(demo_d, bmx_d)
aggregate(cbind(bmxht,bmxwt, bmxleg, bmxcalf, bmxthicr)~riagendr, bmx_demo, mean)
## riagendr bmxht bmxwt bmxleg bmxcalf bmxthicr
## 1 Male 170.0105 76.90659 40.49696 37.47678 51.45514
## 2 Female 158.9209 68.17988 37.19337 36.88751 51.09288
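Conceptually, applying a code translation amounts to relabelling the codes as factor levels. The base R sketch below illustrates that idea on a made-up vector; it is not necessarily how nhanesTranslate is implemented, and the object names codes, riagendr and translated are hypothetical.

```r
# Lookup table in the same shape as the nhanesTranslate listing above.
codes <- data.frame(Code.or.Value     = c("1", "2"),
                    Value.Description = c("Male", "Female"),
                    stringsAsFactors  = FALSE)

# Made-up coded column with one missing value.
riagendr <- c(1, 2, 2, NA, 1)

# Values not listed in `levels` (here only NA) stay NA after relabelling.
translated <- factor(riagendr,
                     levels = codes$Code.or.Value,
                     labels = codes$Value.Description)
print(levels(translated))
```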
The primary goal of nhanesA is to enable fully customizable processing of select NHANES tables. However, it is quite easy to download entire surveys using nhanesA functions. Say we want to download every questionnaire in the 2007-2008 survey. We first get a list of the table names by using nhanesTables with namesonly = TRUE. The tables can then be downloaded using nhanes with lapply.
q2007names <- nhanesTables('Q', 2007, namesonly=TRUE)
q2007tables <- lapply(q2007names, nhanes)
names(q2007tables) <- q2007names
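When downloading many tables, a single failed download would abort the whole lapply call. One way to guard against that is sketched below. The wrapper fetch_all and the stub fake_fetch are hypothetical and the table names are placeholders; the stub stands in for nhanes so the example runs without network access.

```r
# Hypothetical wrapper: apply `fetch` to each name, returning NULL on error
# instead of stopping.
fetch_all <- function(table_names, fetch) {
  out <- lapply(table_names, function(nm) {
    tryCatch(fetch(nm), error = function(e) NULL)
  })
  names(out) <- table_names
  out
}

# Stub standing in for nhanes(); fails for one placeholder name.
fake_fetch <- function(nm) {
  if (nm == "BAD") stop("table unavailable")
  data.frame(seqn = 1:2)
}

res    <- fetch_all(c("TABLE_A", "BAD", "TABLE_B"), fake_fetch)
failed <- names(res)[vapply(res, is.null, logical(1))]
print(failed)
```

With the package loaded, fetch_all(q2007names, nhanes) would take the place of the lapply call above, and failed would list any tables worth retrying.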
An NHANES table may have dozens of columns with coded values. Translating all possible columns is a three-step process.
1: Download the table
2: Download the list of table variables using nhanesTableVars with namesonly=TRUE
3: Pass the table and variable list to nhanesTranslate
bpx_d <- nhanes('BPX_D')
## Processing SAS dataset BPX_D ..
head(bpx_d[,6:11])
## bpq150a bpq150b bpq150c bpq150d bpaarm bpacsz
## 1 NA NA NA NA NA NA
## 2 2 2 2 2 1 3
## 3 1 2 2 2 1 4
## 4 2 2 2 2 1 3
## 5 2 2 2 2 1 4
## 6 2 2 2 2 1 4
bpx_d_vars <- nhanesTableVars('EXAM', 'BPX_D', namesonly=TRUE)
bpx_d <- suppressWarnings(nhanesTranslate('BPX_D', bpx_d_vars, data=bpx_d))
## Translated columns: PEASCST1 PEASCCT1 BPQ150A BPQ150B BPQ150C BPQ150D BPAARM BPACSZ BPXPULS BPXPTY BPAEN2 BPAEN3 BPAEN4
head(bpx_d[,6:11])
## bpq150a bpq150b bpq150c bpq150d bpaarm bpacsz
## 1 <NA> <NA> <NA> <NA> <NA> <NA>
## 2 No No No No Right Adult (12X22)
## 3 Yes No No No Right Large (15X32)
## 4 No No No No Right Adult (12X22)
## 5 No No No No Right Large (15X32)
## 6 No No No No Right Large (15X32)
nhanesTranslate applies some discretion, so not every coded column is translated. In general, columns that have at least two categories (e.g. Male, Female) are translated. Some code translations are quite long, so to improve readability the translated strings are limited to a maximum length. The default maximum length is 32 characters, and it can be set as high as 128.
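The effect of the length limit can be illustrated in base R. The labels below are made up, and max_chars is a hypothetical variable used only for this sketch. In the package itself the limit is, to my understanding, set through an argument of nhanesTranslate (named nchar in the versions I am aware of); treat that name as an assumption and confirm it with ?nhanesTranslate.

```r
# Made-up translation strings; the last one exceeds the default limit.
labels <- c("Yes", "No",
            "Could not obtain because of equipment failure or refusal")

max_chars <- 32  # mirrors the default length described above

# Keep at most the first `max_chars` characters of each label.
short <- substr(labels, 1, max_chars)
print(nchar(short))
```

Short labels pass through unchanged, while long ones are cut to the limit, which keeps printed tables readable at the cost of some detail.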
Sincerely,
Christopher Endres