Help for package nhanesdata

Title:

Harmonized Access to NHANES Survey Data

Version:

0.2.1

Description:

Instant access to harmonized National Health and Nutrition Examination Survey (NHANES) data spanning 1999-2023. Retrieve pre-processed datasets from reliable cloud storage with automatic type reconciliation and integrated search tools for variables and datasets. Simplifies NHANES data workflows by handling cycle management and maintaining data consistency across survey waves. Data is sourced from https://www.cdc.gov/nchs/nhanes/.

License:

MIT + file LICENSE

URL:

https://github.com/kyleGrealis/nhanesdata, https://www.kylegrealis.com/nhanesdata/

BugReports:

https://github.com/kyleGrealis/nhanesdata/issues

Language:

en-US

Encoding:

UTF-8

RoxygenNote:

7.3.3

Depends:

R (≥ 4.1.0)

Imports:

arrow, dplyr, nhanesA, rlang, scales, stringr, srvyr, tibble

Suggests:

cli, curl, devtools, fs, ggplot2, httptest2, janitor, jsonlite, knitr, paws.storage, purrr, reactable, rmarkdown, roxygen2, testthat (≥ 3.0.0), tools, withr, yaml

Config/testthat/edition:

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2026-02-23 19:09:49 UTC; kyle

Author:

Kyle Grealis [aut, cre], Amrit Baral [ctb], Natalie Neugaard [ctb], Raymond Balise [ctb], Johannes Thrul [ctb], Janardan Devkota [ctb]

Maintainer:

Kyle Grealis <kylegrealis@proton.me>

Repository:

CRAN

Date/Publication:

2026-02-28 20:40:02 UTC

nhanesdata: Harmonized Access to NHANES Survey Data

Description

Author(s)

Maintainer: Kyle Grealis kylegrealis@proton.me (ORCID)

Other contributors:

Amrit Baral abaral3@jhu.edu (ORCID) [contributor]
Natalie Neugaard nataliegoulett@gmail.com (ORCID) [contributor]
Raymond Balise balise@miami.edu (ORCID) [contributor]
Johannes Thrul jthrul@jhu.edu (ORCID) [contributor]
Janardan Devkota jdevkot1@jhu.edu (ORCID) [contributor]

Calculate survey design weight within a NHANES dataset

Description

Takes an NHANES dataset and applies the appropriate weight calculation. There are three categories of weights:

Interview weight
Mobile Exam Center (MEC) weight
Fasting weight

The probability of being sampled decreases from the interview sample to the MEC sample to the fasting subsample. When an analysis combines variables across categories, use the weight for the category with the lowest probability of selection. For example, an analysis using demographics (interview), diabetes information (interview), and DEXA scanning (MEC) should use the MEC weight.
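The selection rule above can be sketched in a few lines of base R. This is only an illustration: `pick_wt_type()` is a hypothetical helper, not a function exported by nhanesdata, and it assumes each analysis variable has already been labeled with its category.

```r
# Hypothetical helper (not part of nhanesdata): return the most restrictive
# weight category among those the analysis variables come from.
pick_wt_type <- function(categories) {
  # Ordered from highest to lowest probability of selection
  ordering <- c("interview", "mec", "fasting")
  stopifnot(length(categories) > 0, all(categories %in% ordering))
  ordering[max(match(categories, ordering))]
}

# Demographics (interview), diabetes information (interview), DEXA (MEC)
pick_wt_type(c("interview", "interview", "mec"))  # "mec"
```

The returned string could then be passed as the wt_type argument of create_design().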

It is also important to select the proper year grouping for the cycles. The 1999-2000 and 2001-2002 cycles use 4-year sample weights, while all subsequent cycles use 2-year sample weights. Combining cycles therefore requires careful attention to:

The variables used, to determine weight category (interview, MEC, fasting).
The cycles (years) used, to select proper year grouping variable.

This function lets the user input a dataset, select the analysis start and end years, and specify the weight category. It then calculates the proper weight and applies it when creating the survey design object.

NOTE: This function does not require you to specify analysis variables. It is highly recommended to preprocess variables before creating the complex design object.

Usage

create_design(
  dsn,
  start_yr,
  end_yr,
  wt_type = c("interview", "mec", "fasting")
)

Arguments

dsn

Tibble or data-frame.

start_yr

Numeric. Lower bound for year filtering (inclusive). Must be an odd year representing a valid NHANES cycle start: 1999, 2001, 2003, ..., 2019, 2021. For example, use 2007 for the 2007-2008 cycle. Data will be filtered to include years between start_yr and end_yr.

end_yr

Numeric. Upper bound for year filtering (inclusive). Must be an odd year >= start_yr. Weight calculations are based on the number of cycles actually present in the filtered data, so it is valid to have gaps (e.g., start_yr=1999, end_yr=2017 with 2007-2010 missing).

wt_type

Character. Category of weight to be used. Use the weight category with the lowest probability of selection, but only if at least one variable from that category is to be used. Accepts full names ("interview", "mec", "fasting") or abbreviations ("int", "mec", "fast").
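To illustrate the note on gaps under end_yr, the sketch below counts the cycles actually present after filtering. It uses made-up cycle years rather than package internals, and `years`, `kept`, and `n_cycles` are illustrative names only.

```r
# Illustrative cycle start years; the 2007 and 2009 cycles are absent
years <- c(1999, 2001, 2003, 2005, 2011, 2013, 2015, 2017)
start_yr <- 1999
end_yr <- 2017

# Keep cycles inside the inclusive range, then count those present
kept <- years[years >= start_yr & years <= end_yr]
n_cycles <- length(unique(kept))
n_cycles  # 8, although the range spans 10 possible cycles
```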

Details

Weight Calculation for Combined Cycles

NHANES provides 4-year weights for the 1999-2000 and 2001-2002 cycles, while all subsequent cycles provide only 2-year weights. When combining multiple cycles:

For the 1999 and 2001 cycles, when included: use the 4-year weight variable multiplied by 2/n, where n is the total number of cycles. The numerator is 2 because the 4-year weight represents two 2-year cycles.
For cycles 2003 and beyond: use the 2-year weight variable multiplied by 1/n.
The denominator n is always the total number of cycles in the analysis.

Example: Combining 4 cycles (1999, 2001, 2003, 2005):

1999 & 2001: wtmec4yr * 2/4
2003 & 2005: wtmec2yr * 1/4
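The four-cycle example can be worked through in base R. This is a minimal sketch of the rule as stated above, not the package's internal code: `combine_mec_weights()` is a hypothetical helper, the weight values are made up, and `year` is assumed to hold the cycle start year.

```r
# Hypothetical helper (not part of nhanesdata): rescale MEC weights
# according to the 2/n and 1/n rule described above.
combine_mec_weights <- function(df) {
  n_cycles <- length(unique(df$year))     # total cycles present in the data
  uses_4yr <- df$year %in% c(1999, 2001)  # cycles covered by the 4-year weight
  ifelse(uses_4yr,
         df$wtmec4yr * 2 / n_cycles,
         df$wtmec2yr * 1 / n_cycles)
}

# One made-up participant per cycle
toy <- data.frame(
  year     = c(1999, 2001, 2003, 2005),
  wtmec4yr = c(1000, 1200, NA, NA),
  wtmec2yr = c(NA, NA, 800, 900)
)
toy$wt_combined <- combine_mec_weights(toy)
toy$wt_combined  # 500 600 200 225
```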

Fasting weights (wtsaf2yr) are multiplied by 1/n.

NOTE: 4-year fasting weights (wtsaf4yr) exist in NHANES laboratory files for 1999-2002 but are not currently supported by this function.

Fasting Subsample Weights

For fasting subsample analyses combining 1999-2002 cycles, the 4-year fasting weight (WTSAF4YR) exists in laboratory files (e.g., LAB10AM, LAB13AM) but is typically NOT in demographic files obtained via nhanesA. If your dataset includes merged laboratory fasting data from 1999-2002, ensure WTSAF4YR is present. Otherwise, this function assumes only 2-year fasting weights (WTSAF2YR) are available.

Value

A survey design object of class tbl_svy (from srvyr package) containing the calculated design weights and survey design metadata (PSUs, strata). Participants without valid weights for the specified weight type are automatically filtered out before design object creation. Participants with zero weights are retained in the design object but will be automatically excluded from most survey analyses.
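To give a sense of what a tbl_svy object is and how it is used, the sketch below builds one directly with srvyr from synthetic data. It does not call create_design(): the column names (`strata`, `psu`, `wt`, `age`) and values are invented for illustration, and the NA filter only mimics the documented behavior of dropping participants without valid weights.

```r
# Synthetic data; column names are illustrative, not NHANES variable names
toy <- data.frame(
  strata = c(1, 1, 2, 2, 2),
  psu    = c(1, 2, 1, 2, 2),
  wt     = c(100, 150, 200, 120, NA),
  age    = c(34, 51, 47, 29, 62)
)

# Drop rows without a valid weight before building the design
toy_valid <- toy[!is.na(toy$wt), ]

design <- srvyr::as_survey_design(
  toy_valid,
  ids = psu, strata = strata, weights = wt, nest = TRUE
)

# Weighted mean with its standard error
srvyr::summarise(design, mean_age = srvyr::survey_mean(age))
```

A design returned by create_design() should accept the same srvyr verbs, with the NHANES strata, PSU, and calculated weight columns already set.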

Examples


# Load demographics data
demo <- read_nhanes("demo")

# Create design object with interview weights
design <- create_design(
  dsn = demo,
  start_yr = 1999,
  end_yr = 2011,
  wt_type = "interview"
)

# Combine with examination data and use MEC weights
bmx <- read_nhanes("bmx")
combined <- demo |>
  dplyr::left_join(bmx, by = c("seqn", "year"))

design_mec <- create_design(
  dsn = combined,
  start_yr = 2007,
  end_yr = 2017,
  wt_type = "mec"
)

Get CDC Documentation URL for NHANES Table

Description

Constructs and returns the full CDC documentation URL for a given NHANES table. The function handles table names with or without cycle suffixes (e.g., "DEMO_J" for 2017-2018 or "DEMO" for 1999-2000) and automatically maps the suffix to the appropriate survey cycle year.

Usage

get_url(table)

Arguments

table

Character. The table for which documentation is needed. Can include a cycle suffix (e.g., "DEMO_J") or not (e.g., "DEMO"). Not case-sensitive.

Value

Character string, returned invisibly. The full URL to the CDC data documentation, codebook, and frequencies is also printed to the console via message() for interactive use.

Examples

# These examples will run and display URLs
get_url("DEMO_J") # Demographics 2017-2018
get_url("diq_j") # Case-insensitive: Diabetes 2017-2018
get_url("DIQ") # No suffix = 1999-2000 cycle
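The suffix-to-cycle mapping mentioned in the description can be sketched as follows. `cycle_start_year()` is a hypothetical helper, not the package's implementation, and it deliberately covers only unsuffixed tables and the single-letter suffixes _B through _J. How get_url() handles later cycles is not shown here.

```r
# Hypothetical helper (not part of nhanesdata): map a table name to the
# start year of its survey cycle.
cycle_start_year <- function(table) {
  table <- toupper(trimws(table))
  suffix <- sub("^.*_([A-Z])$", "\\1", table)
  if (identical(suffix, table)) return(1999)  # no suffix: 1999-2000 cycle
  idx <- match(suffix, LETTERS)               # B = 2, ..., J = 10
  stopifnot(idx >= 2, idx <= 10)
  1999 + 2 * (idx - 1)
}

cycle_start_year("DEMO_J")  # 2017
cycle_start_year("diq")     # 1999
```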

Read NHANES Data from Cloud Storage

Description

Downloads pre-processed NHANES data files from cloud storage. Data includes all survey cycles (1999-2023) automatically merged and harmonized, with quarterly updates.

Usage

read_nhanes(dataset)

Arguments

dataset

Character. NHANES dataset base name (e.g., "trigly", "demo"). Case-insensitive - use 'demo', 'DEMO', or 'Demo' interchangeably. Must be a single string (length 1). Leading/trailing whitespace is automatically trimmed.

Details

This function downloads NHANES datasets from cloud storage (hosted at nhanes.kylegrealis.com). All datasets combine multiple survey cycles with automatic type harmonization. Data is updated quarterly via automated workflows that pull fresh data from CDC servers.

Dataset names are case-insensitive throughout this package. Use uppercase (matches CDC documentation) or lowercase (easier to type) - both work identically.
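The input rules for dataset (single string, case-insensitive, whitespace trimmed) amount to a small normalization step. The sketch below is an assumption about how such a step might look: `normalize_dataset_name()` is hypothetical, and the choice of lowercase as the canonical form is arbitrary.

```r
# Hypothetical helper (not part of nhanesdata): validate and normalize
# a dataset name so that case and surrounding whitespace do not matter.
normalize_dataset_name <- function(dataset) {
  stopifnot(is.character(dataset), length(dataset) == 1)
  tolower(trimws(dataset))
}

normalize_dataset_name("demo")
normalize_dataset_name("  DEMO ")
normalize_dataset_name("Demo")
# All three calls return "demo"
```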

Error handling: The function validates inputs and provides informative error messages if the dataset fails to load (e.g., network issues, non-existent datasets, misspelled names). Error messages include the attempted URL and suggestions for troubleshooting.
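In scripts that load several datasets, it can be useful to catch a failed download rather than stop. The sketch below shows one way to do that with base R's tryCatch(). `try_read()` and `failing_reader()` are hypothetical names, and the failing reader stands in for a network error so the example runs offline.

```r
# Hypothetical wrapper (not part of nhanesdata): call a reader function and
# return NULL with a warning instead of stopping when loading fails.
try_read <- function(reader, dataset) {
  tryCatch(
    reader(dataset),
    error = function(e) {
      warning("Could not load '", dataset, "': ", conditionMessage(e),
              call. = FALSE)
      NULL
    }
  )
}

# Stand-in reader that always fails, to show the fallback without a network call
failing_reader <- function(dataset) stop("simulated network failure")
result <- suppressWarnings(try_read(failing_reader, "demo"))
is.null(result)  # TRUE
```

With the package loaded, the same wrapper could be called as try_read(read_nhanes, "demo").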

Value

A tibble containing the requested NHANES dataset across all available survey cycles. Always includes year and seqn columns plus dataset-specific variables.

Examples


# All case variations work identically:
trigly <- read_nhanes("trigly") # Lowercase
demo <- read_nhanes("DEMO") # Uppercase
acq <- read_nhanes("Acq") # Mixed case

Search NHANES Variables by Term or Phrase

Description

A convenience wrapper around nhanesA::nhanesSearch that returns a simplified, concise output focused on variable names, descriptions, and survey years. Results are sorted by year (most recent first) and then by variable name.

Usage

term_search(var)

Arguments

var

Character. Search term or phrase to find in variable names or descriptions. Case-insensitive. Special regex characters are automatically escaped for literal matching.

Value

A data.frame with 4 columns:

Variable.Name: NHANES variable code
Variable.Description: Description of the variable
Data.File.Name: Name of the data file containing the variable
Begin.Year: Starting year of the survey cycle (numeric)

Results are sorted by Begin.Year (descending) then Variable.Name. Returns an empty data.frame with correct structure if no matches found.
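The documented sort order can be reproduced on any data.frame with the same columns. The rows below are a small hand-made stand-in for search results, not output from term_search().

```r
# Hand-made rows with the four documented columns (illustrative only)
results <- data.frame(
  Variable.Name = c("DIQ010", "DIQ050", "DIQ010"),
  Variable.Description = c(
    "Doctor told you have diabetes",
    "Taking insulin now",
    "Doctor told you have diabetes"
  ),
  Data.File.Name = c("DIQ_J", "DIQ", "DIQ"),
  Begin.Year = c(2017, 1999, 1999)
)

# Most recent cycle first, then alphabetical by variable name
sorted <- results[order(-results$Begin.Year, results$Variable.Name), ]
sorted$Data.File.Name  # "DIQ_J" "DIQ" "DIQ"
```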

Examples


# Search for diabetes-related variables (showing first 5 results)
term_search("diabetes") |> head(5)

# Search for blood pressure measurements (showing first 5 results)
term_search("blood pressure") |> head(5)

Internal Utility Functions

Description

This file contains internal helper functions used across the nhanesdata package. These functions are not exported and are meant for internal package use only.

Search for NHANES Variable by Exact Name

Description

A convenience wrapper around nhanesA::nhanesSearchVarName that searches for variables by exact variable name match. The function automatically converts input to uppercase to match NHANES naming conventions. Use this when you know the variable code; use term_search() for text-based searches.

Usage

var_search(var)

Arguments

var

Character. Variable name to search for. Not case-sensitive: input is automatically converted to uppercase.

Value

A character vector of CDC table names containing the variable (e.g., "DEMO", "DEMO_B", "DEMO_C"). Returns character(0) if the variable is not found.

Examples


# Search for specific variable (case-insensitive)
var_search("RIDAGEYR") # Age variable across all DEMO cycles
var_search("BPXSY1") # Systolic blood pressure
