Package {weightflow}


Title: Declarative API for Staged Survey Weights
Version: 0.1.0
Description: Builds survey weights from design base weights by chaining hierarchical adjustments (unknown eligibility, nonresponse and calibration) through a declarative, pipeable, 'tidymodels'-style API. Calibration follows Deville and Sarndal (1992) <doi:10.2307/2290268>. Variances are obtained with a bootstrap that resamples primary sampling units and re-applies the whole recipe on each replicate, following the rescaling bootstrap of Rao and Wu (1988) <doi:10.1080/01621459.1988.10478591>, so the replicate weights carry the variability of every adjustment. The weights also bridge to the 'survey' and 'srvyr' packages for design-based inference.
License: MIT + file LICENSE
Encoding: UTF-8
Language: en-US
Depends: R (>= 4.1.0)
Imports: stats, utils, graphics
Suggests: rpart, ranger, testthat (>= 3.0.0), survey, srvyr, dplyr, knitr, rmarkdown, spelling
Config/roxygen2/version: 8.0.0
URL: https://github.com/jpferreira33/weightflow, https://jpferreira33.github.io/weightflow/
BugReports: https://github.com/jpferreira33/weightflow/issues
Config/testthat/edition: 3
LazyData: true
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2026-06-24 12:07:46 UTC; jp
Author: Juan Pablo Ferreira [aut, cre]
Maintainer: Juan Pablo Ferreira <juanpablo.ferreira@fcea.edu.uy>
Repository: CRAN
Date/Publication: 2026-06-30 11:30:02 UTC

weightflow: declarative survey weighting

Description

Build survey weights from design base weights by chaining hierarchical adjustments (unknown eligibility, nonresponse, trimming, calibration, rounding, rescaling, assertions) through a declarative, pipeable, tidymodels-style API. prep() computes the point weights; for variance estimation and inference, either build replicate weights with bootstrap_weights() or export the weights to the 'survey' package with as_svydesign() / as_svrepdesign().

Details

Start with weighting_spec(), add step_*() adjustments, estimate the cascade with prep(), and extract the weights with collect_weights(). Inspect with summary(), plot() and report_weighting().
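As a sketch of that workflow, the chain below strings together steps documented in this manual, using the bundled sample_survey and population data. It only runs when 'weightflow' is installed, and the particular combination of steps is illustrative rather than a recommended recipe.

```r
out <- NULL
if (requireNamespace("weightflow", quietly = TRUE)) {
  library(weightflow)
  fitted <- weighting_spec(sample_survey, base_weights = pw) |>
    step_unknown_eligibility(unknown = unknown_elig, by = "region") |>
    step_nonresponse(respondent = responded, method = "weighting_class",
                     by = "region") |>
    step_calibrate(method = "raking",
                   margins = list(region = c(table(population$region)))) |>
    prep()
  summary(fitted)                 # per-step diagnostics
  out <- collect_weights(fitted)  # data plus the final ".weight" column
}
```

Nothing is computed until prep() is reached; the step_*() calls only append to the specification.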

Author(s)

Maintainer: Juan Pablo Ferreira juanpablo.ferreira@fcea.edu.uy

See Also

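Useful links:

https://github.com/jpferreira33/weightflow

https://jpferreira33.github.io/weightflow/

Report bugs at https://github.com/jpferreira33/weightflow/issues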


Export weightflow weights to a survey design

Description

as_svydesign() builds a linearization (ultimate-cluster) design from a prepped recipe; as_svrepdesign() builds a replicate-weights design from a bootstrap object, so survey/srvyr standard errors include the recipe's adjustments. Both require the 'survey' package.

Usage

as_svydesign(object, ids, strata = NULL, weight_name = ".weight", ...)

as_svrepdesign(boot, ...)

Arguments

object

a prepped recipe (for as_svydesign) or a data frame with the weight and design columns.

ids, strata

column names of the PSU and the stratum.

weight_name

name of the weight column.

...

passed to the survey constructor.

boot

a weightflow_boot object.

Value

A survey.design / svyrep.design object.


Bootstrap estimate, standard error and confidence interval

Description

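Applies a statistic to the point weights and to every replicate, and summarises it with the bootstrap variance v = (1/B) \sum_{b=1}^{B} (\theta^*_b - \hat\theta)^2, where B is the number of replicates, \theta^*_b is the statistic computed with the b-th replicate weights and \hat\theta is the statistic computed with the point weights. The standard error is \sqrt{v}, and the interval is the normal-approximation interval \hat\theta \pm z_{1-\alpha/2}\sqrt{v} with \alpha = 1 - level.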

Usage

bootstrap_estimate(boot, statistic, level = 0.95)

boot_total(boot, variable)

boot_mean(boot, variable)

Arguments

boot

a weightflow_boot object.

statistic

a function function(w, data) returning a numeric scalar (or vector) given a weight vector and the data.

level

confidence level for the (normal) interval.

variable

name of the variable to estimate.

Value

A data frame with estimate, se, ci_lower, ci_upper.
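The variance formula in the Description can be sketched in base R as follows. This is not the package's code: the replicate matrix is a random stand-in for what bootstrap_weights() would produce, and the statistic takes the outcome vector directly instead of a data frame.

```r
# Hypothetical inputs: an outcome y, point weights w, and a
# units x B matrix of replicate weights (random stand-in here).
set.seed(42)
y    <- c(3, 5, 2, 8, 6)
w    <- c(10, 12, 9, 11, 8)
B    <- 200
reps <- sapply(seq_len(B), function(b) w * rexp(length(w)))

stat <- function(w, y) sum(w * y) / sum(w)    # weighted mean

theta_hat  <- stat(w, y)                      # point estimate
theta_star <- apply(reps, 2, stat, y = y)     # one value per replicate
se <- sqrt(mean((theta_star - theta_hat)^2))  # sqrt of (1/B) * sum of squared deviations
z  <- qnorm(1 - (1 - 0.95) / 2)               # normal quantile for level = 0.95
res <- data.frame(estimate = theta_hat, se = se,
                  ci_lower = theta_hat - z * se,
                  ci_upper = theta_hat + z * se)
```

The deviations are taken around the point estimate, not around the mean of the replicates, which matches the mse = TRUE convention of the 'survey' package.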


Bootstrap replicate weights that re-apply the recipe

Description

Builds bootstrap replicate weights by resampling primary sampling units (PSUs) with replacement within strata and re-running the whole recipe on each replicate. Because every adjustment (nonresponse, calibration, ...) is recomputed per replicate, the resulting replicate weights propagate the variability introduced by each weighting stage.

Usage

bootstrap_weights(
  object,
  replicates = 200L,
  strata = NULL,
  psu = NULL,
  m = NULL,
  seed = NULL,
  progress = TRUE
)

Arguments

object

a weighting_spec (or a prepped one) holding the recipe.

replicates

number of bootstrap replicates.

strata, psu

column names of the stratum and the PSU. If psu is NULL each unit is its own PSU; if strata is NULL a single stratum is assumed.

m

number of PSUs drawn with replacement in each stratum (default n - 1, with n the number of PSUs in the stratum).

seed

optional RNG seed.

progress

print progress every 25 replicates.

Details

The multiplier is the Rao-Wu rescaling bootstrap: within a stratum with n PSUs, m PSUs are drawn with replacement (default m = n - 1) and unit i in PSU k gets \lambda = 1 - \sqrt{m/(n-1)} + \sqrt{m/(n-1)}\,(n/m)\,t_k, with t_k the number of times its PSU was drawn.
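A minimal base-R sketch of the multiplier for a single stratum, written directly from the formula above. The function name rao_wu_multiplier is hypothetical and is not part of the package.

```r
# lambda for each of the n PSUs of one stratum, after drawing m PSUs
# with replacement (default m = n - 1).
rao_wu_multiplier <- function(n, m = n - 1) {
  stopifnot(n >= 2, m >= 1)
  drawn <- sample.int(n, size = m, replace = TRUE)
  t_k   <- tabulate(drawn, nbins = n)   # times each PSU was drawn
  s     <- sqrt(m / (n - 1))
  1 - s + s * (n / m) * t_k
}

set.seed(1)
lambda <- rao_wu_multiplier(n = 6)      # default m = 5
```

With the default m = n - 1 the multiplier reduces to (n / (n - 1)) * t_k, so PSUs that were not drawn get zero. For any m the multipliers of a stratum sum to n, because the t_k sum to m.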

Value

An object of class weightflow_boot with the replicates matrix (units x replicates), the point weights, and the design metadata.

Examples

spec <- weighting_spec(sample_survey, base_weights = pw) |>
  step_calibrate(method = "raking",
                 margins = list(region = c(table(population$region))))
boot <- bootstrap_weights(spec, replicates = 50, strata = "region",
                          psu = "psu", seed = 1)
boot_total(boot, "responded")

Collect replicate weights into a data frame ready for srvyr

Description

Returns the data with the point weight and the bootstrap replicate weights as columns, so it can be fed directly to srvyr::as_survey_rep() (or survey::svrepdesign()). Replicate columns are full weights, so use combined.weights = TRUE, scale = 1 / R (with R the number of replicates), rscales = 1, mse = TRUE.

Usage

collect_replicate_weights(
  boot,
  weight_name = ".weight",
  prefix = "rep_",
  drop_zero = TRUE
)

Arguments

boot

a weightflow_boot object.

weight_name

name of the point-weight column to add.

prefix

prefix for the replicate-weight columns (rep_1, rep_2, ...).

drop_zero

keep only active units (point weight > 0).

Value

A data frame: the original columns, weight_name, and one column per replicate. The number of replicates is stored in attribute "R".

Examples

spec <- weighting_spec(sample_survey, base_weights = pw) |>
  step_calibrate(method = "raking",
                 margins = list(region = c(table(population$region))))
boot <- bootstrap_weights(spec, replicates = 30, strata = "region",
                          psu = "psu", seed = 1, progress = FALSE)
df <- collect_replicate_weights(boot)
if (requireNamespace("srvyr", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  srvyr::as_survey_rep(df, weights = .weight,
                       repweights = dplyr::starts_with("rep_"),
                       type = "bootstrap", combined.weights = TRUE,
                       scale = 1 / attr(df, "R"), rscales = 1, mse = TRUE)
}

Extract the data with the computed weights

Description

Returns the input data with the final weight appended as a column and, optionally, one column per intermediate stage.

Usage

collect_weights(
  object,
  drop_zero = TRUE,
  keep_intermediate = FALSE,
  weight_name = ".weight"
)

Arguments

object

a prepped object (output of prep()).

drop_zero

logical. If TRUE, drops rows with final weight 0 (ineligible / nonresponse). Default TRUE.

keep_intermediate

logical. If TRUE, adds one column per stage.

weight_name

name of the final weight column. Default ".weight".

Value

data.frame.

Examples

fitted <- weighting_spec(sample_survey, base_weights = pw) |>
  step_nonresponse(respondent = responded, method = "weighting_class", by = "region") |>
  prep()
head(collect_weights(fitted))

Kish design effect from unequal weighting

Description

deff = 1 + CV^2(w) = n * sum(w^2) / (sum(w))^2, computed over the n active (nonzero) weights. The effective sample size is n_eff = n / deff.

Usage

design_effect(w)

Arguments

w

vector of weights (zeros are dropped).

Value

list with deff, n_eff, cv and n.

Examples

design_effect(sample_survey$pw)
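
The formula can be checked by hand with a few lines of base R. The helper kish_deff is a hypothetical stand-in written from the Description, not the package's function.

```r
kish_deff <- function(w) {
  w <- w[w > 0]                          # active weights only
  n <- length(w)
  deff <- n * sum(w^2) / sum(w)^2
  list(deff = deff, n_eff = n / deff,
       cv = sqrt(max(deff - 1, 0)),      # guard against tiny negative round-off
       n = n)
}

out <- kish_deff(c(1, 1, 2, 2, 0))
```

Here the four active weights give deff = 4 * 10 / 36, about 1.11, and an effective sample size of 3.6. Equal weights give deff = 1.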

Diagnostic plots for the weights

Description

Diagnostic plots for the weights

Usage

## S3 method for class 'prepped_weighting_spec'
plot(x, type = c("all", "factors", "summary"), ...)

Arguments

x

a prepped object (output of prep()).

type

"all" (default): per-step adjustment-factor histograms PLUS the summary panel (final weights, cumulative factor, base vs final, deff by stage), all in one grid. "factors": only the per-step factor histograms. "summary": only the summary panel.

...

ignored.

Value

(invisibly) x. Called for its side effect of drawing the diagnostic plots.

Examples

fitted <- weighting_spec(sample_survey, base_weights = pw) |>
  step_nonresponse(respondent = responded, method = "weighting_class", by = "region") |>
  prep()
plot(fitted)

Example target population for weightflow

Description

A simulated population (sampling frame) of individuals nested in households and primary sampling units (PSUs) within strata (regions), with demographic auxiliaries and two outcomes. Used to illustrate calibration targets and model calibration, and to validate weighted estimates. Generated by data-raw/weightflow_data.R.

Usage

population

Format

A data frame with one row per person:

person_id

individual identifier

household_id

household identifier (cluster)

psu

primary sampling unit (segment) within the stratum

region

stratum: North, South, East or West

sex

F or M

age

age in years (18-95)

income

annual income

employed

employment indicator (0/1)


Estimate the weighting cascade

Description

Walks the steps in the order they were added, starting from the base weights. Each step multiplies the current weight by its adjustment factor.

Usage

prep(spec)

Arguments

spec

a weighting_spec.

Value

a "prepped_weighting_spec" object.

Examples

rec <- weighting_spec(sample_survey, base_weights = pw) |>
  step_nonresponse(respondent = responded, method = "weighting_class", by = "region")
prep(rec)
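
The multiplicative cascade can be pictured with a base-R sketch. The per-unit factors below are made up for illustration; in the package each step estimates its own factor from the data.

```r
base    <- c(10, 10, 20, 20)
factors <- list(unknown_elig = c(1.25, 1.25, 0, 1),   # hypothetical factors
                nonresponse  = c(2, 0, 1, 1.5))

# Stage weights: base, then base * f1, then base * f1 * f2.
stages <- Reduce(function(w, f) w * f, factors, base, accumulate = TRUE)
final  <- stages[[length(stages)]]
```

A unit whose factor is zero at some stage stays at zero for the rest of the cascade, which is how dropped units are excluded from later steps.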

Build a nice HTML report of the weighting recipe

Description

Writes a self-contained HTML file (no dependencies, no server) showing the pipeline, the parameters requested at each step, the per-stage summary (n, sum, CV, Kish deff, effective n) and per-step diagnostics, and opens it in the browser.

Usage

report_weighting(object, file = NULL, open = TRUE, plots = TRUE)

Arguments

object

a prepped object (output of prep()).

file

output path; if NULL, a temporary .html file.

open

logical; open the file in the browser.

plots

logical; add per-step plots (weight before-vs-after scatter and adjustment-factor histogram). Uses ggplot2 if installed, else base graphics.

Value

(invisibly) the path to the written HTML file.

Examples

fitted <- weighting_spec(sample_survey, base_weights = pw) |>
  step_nonresponse(respondent = responded, method = "weighting_class", by = "region") |>
  prep()
f <- tempfile(fileext = ".html")
report_weighting(fitted, file = f, open = FALSE)

Example survey sample (select-one-person, multistage)

Description

A realistic multistage design (stratum -> PSU -> household, then one selected person per household). Unknown-eligibility and ineligible addresses appear as single rows with no roster; resolved eligible households are either reached (a roster is obtained) or are household nonresponse; in reached households one person is selected with an unequal within-household probability and may or may not respond. Supports the full household pipeline: household-level eligibility (cluster), dropping ineligibles, household and person nonresponse, and step_select_within. Generated by data-raw/weightflow_data.R.

Usage

sample_one

Format

A data frame with one row per sampled household (the selected person, or a single placeholder row for non-roster cases):

person_id, household_id, psu

identifiers

region

stratum

sex, age

selected person's attributes (NA on non-roster rows)

pw

design base weight (product of the stage selection probabilities)

status

"eligible", "ineligible" or "unknown"

unknown_elig

1 if eligibility is unknown (no roster)

ineligible

1 if the address is out of scope (no roster)

hh_responded

1 reached, 0 household nonresponse, NA for non-eligible

responded

1 if the selected person responded (NA on non-roster rows)

n_elig

number of eligible persons in the household (NA on non-roster rows)

p_within

within-household selection probability of the selected person

income, employed

survey outcomes; NA unless the person responded


Example survey sample (take-all roster)

Description

A stratified household sample drawn from population where every eligible person in the household is kept (take-all roster). Carries unequal design base weights, an unknown-eligibility flag and a person-level response indicator; survey outcomes (income, employed) are observed only for respondents. Generated by data-raw/weightflow_data.R.

Usage

sample_survey

Format

A data frame with one row per sampled person:

person_id, household_id, psu

identifiers

region, sex, age

frame auxiliaries, known for all units

pw

design base weight (inverse sampling fraction)

unknown_elig

1 if eligibility is unknown

responded

1 if the person responded

income, employed

survey outcomes; NA for nonrespondents


Assert conditions on the weights at this point of the cascade

Description

A checkpoint that does NOT change the weights; it verifies conditions and fails (error) or warns if they are not met. Useful to guard a production pipeline (tidymodels-style tests inside the recipe).

Usage

step_assert(
  spec,
  max_deff = NULL,
  max_weight_ratio = NULL,
  min_n_eff = NULL,
  on_fail = c("error", "warning")
)

Arguments

spec

a weighting_spec.

max_deff

numeric or NULL. Maximum acceptable Kish design effect.

max_weight_ratio

numeric or NULL. Maximum allowed final/base weight ratio (per active unit).

min_n_eff

numeric or NULL. Minimum acceptable effective sample size.

on_fail

"error" (stop the cascade) or "warning".

Value

The input weighting_spec with this step appended (for piping). The step is evaluated later by prep().

Examples

weighting_spec(sample_survey, base_weights = pw) |>
  step_assert(max_deff = 5, on_fail = "warning") |> prep()

Calibration to population totals

Description

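Adjusts the weights so that weighted sample totals match known population totals, by raking, poststratification or linear (GREG) calibration.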

Usage

step_calibrate(
  spec,
  margins = NULL,
  method = c("raking", "poststratify", "linear"),
  formula = NULL,
  totals = NULL,
  cluster = NULL,
  equal_within_cluster = FALSE,
  calfun = c("linear", "logit"),
  bounds = NULL,
  maxit = 50L,
  tol = 1e-06
)

Arguments

spec

a weighting_spec.

margins

named list (for "raking"/"poststratify"). Each element is a named numeric vector with the target totals per category. E.g.: list(sex = c(M = 5000, F = 5200), region = c(N = 3000, S = 7200)).

method

"raking" (IPF, categorical margins), "poststratify" (a single categorical variable) or "linear" (GREG / regression estimator; handles continuous and categorical auxiliaries together).

formula

(only "linear") auxiliary formula, e.g. ~ sex + income. Uses model.matrix; includes the intercept unless you write ~ 0 + ...

totals

(only "linear") named numeric vector with the population totals, names matching the model.matrix columns (including "(Intercept)" = N if there is an intercept). If names do not match, the error lists the expected ones.

cluster

(only "linear") name of the cluster id column (e.g. "household"), for equal weights within the cluster.

equal_within_cluster

(only "linear") logical. If TRUE, Lemaitre-Dufour (1987) integrative calibration: a single weight per cluster. Requires cluster. Final weights are equal within the cluster provided the incoming weight is also uniform within the cluster.

calfun

(only "linear") distance function: "linear" (g = 1 + u) or "logit" (bounded by construction). With "logit", bounds is required.

bounds

(only "linear") numeric c(L, U) with L < 1 < U. Bounds on the calibration factor g (g-weights). With "linear" it truncates; with "logit" it is enforced smoothly. Avoids extreme/negative weights without a separate trimming step.

maxit, tol

convergence control for raking and bounded calibration.

Value

The input weighting_spec with this step appended (for piping). The step is evaluated later by prep().

Examples

# Raking to population margins
weighting_spec(sample_survey, base_weights = pw) |>
  step_nonresponse(respondent = responded, method = "weighting_class", by = "region") |>
  step_calibrate(method = "raking",
                 margins = list(sex    = c(table(population$sex)),
                                region = c(table(population$region)))) |>
  prep()
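
For readers unfamiliar with raking, the following base-R sketch shows iterative proportional fitting on two categorical margins. It is a generic illustration under simple assumptions (every category present in the sample, margins with the same grand total), not the package's implementation; rake_sketch is a hypothetical name.

```r
rake_sketch <- function(w, data, margins, maxit = 50L, tol = 1e-6) {
  for (it in seq_len(maxit)) {
    max_gap <- 0
    for (v in names(margins)) {
      g       <- as.character(data[[v]])
      current <- vapply(split(w, g), sum, numeric(1))  # weighted total per category
      target  <- margins[[v]][names(current)]
      max_gap <- max(max_gap, abs(current / target - 1))
      w       <- w * unname((target / current)[g])     # scale each category to its target
    }
    if (max_gap < tol) break                           # all margins already matched
  }
  w
}

d <- data.frame(sex    = c("F", "M", "F", "M", "F", "M"),
                region = c("N", "N", "S", "S", "S", "N"))
margins <- list(sex = c(F = 50, M = 50), region = c(N = 40, S = 60))
w_raked <- rake_sketch(rep(10, nrow(d)), d, margins)
```

Each pass matches one margin exactly and slightly disturbs the others; the loop stops once no margin needed more than a relative correction of tol.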

Drop ineligible (out-of-scope) units

Description

Sets the weight of known-ineligible units to zero so they leave the cascade (excluded from every later step and from collect_weights). No redistribution is done.

Usage

step_drop_ineligible(spec, ineligible)

Arguments

spec

a weighting_spec.

ineligible

a 0/1 dummy column (1 = ineligible) or any logical condition (unquoted) that is TRUE for out-of-scope units.

Details

Apply it AFTER step_unknown_eligibility: ineligibles must be present and NOT flagged as unknown during that step, so they take part in the known-eligibility group and receive their share of the redistributed unknown weight. Their weight is then correctly discarded here (it represents the ineligible share of the unknown units, which are out of scope).
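
A small worked example in base R shows why the order matters. It uses one adjustment cell and made-up weights, and it mimics the two steps by hand rather than calling the package.

```r
d <- data.frame(
  status = c("eligible", "eligible", "ineligible", "unknown", "unknown"),
  w      = c(10, 10, 10, 10, 10)
)
known <- d$status != "unknown"

# Step 1: the unknown-eligibility weight is spread over all known cases.
f  <- sum(d$w) / sum(d$w[known])
w1 <- ifelse(known, d$w * f, 0)

# Step 2: known ineligibles are zeroed, with no redistribution.
w2 <- ifelse(d$status == "ineligible", 0, w1)

# Wrong order for comparison: ineligibles dropped first, so the eligible
# cases absorb the whole unknown weight.
elig    <- d$status == "eligible"
w_wrong <- elig * d$w * sum(d$w[d$status != "ineligible"]) / sum(d$w[elig])
```

In the right order the eligible cases end with a total of 100/3, reflecting that one third of the known cases were ineligible; in the wrong order they end with 40, which treats every unknown case as eligible.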

Value

The input weighting_spec with this step appended (for piping). The step is evaluated later by prep().

Examples

df <- transform(sample_survey,
                ineligible = as.integer(region == "West" & age > 90))
weighting_spec(df, base_weights = pw) |>
  step_drop_ineligible(ineligible = ineligible) |>
  prep()

Model calibration (model-assisted, Wu & Sitter 2001)

Description

Fits a working model for each study variable y, predicts over the population, and calibrates the weights so that the sample total of each prediction equals its population total (model-assisted efficiency). It also calibrates to the X totals (consistency with the auxiliary controls).

Usage

step_model_calibration(
  spec,
  x_formula,
  models,
  population,
  cluster = NULL,
  equal_within_cluster = FALSE
)

Arguments

spec

a weighting_spec.

x_formula

formula of the consistency auxiliaries, e.g. ~ sex + region.

models

named list of models created with y_model(). The names label the prediction constraints.

population

population data.frame with the auxiliary and predictor columns (the y variables are not needed; they are predicted).

cluster

name of the cluster id column (e.g. "household"), for equal weights within the cluster.

equal_within_cluster

logical. If TRUE, integrative calibration: a single weight per cluster. Requires cluster and that the incoming weight be uniform within the cluster.

Details

Requires COMPLETE auxiliary information: a data.frame population with the x_formula columns and the model predictors for the whole population (or a reference frame/census).

Value

The input weighting_spec with this step appended (for piping). The step is evaluated later by prep().

Examples

weighting_spec(sample_survey, base_weights = pw) |>
  step_nonresponse(respondent = responded, method = "weighting_class", by = "region") |>
  step_model_calibration(
    x_formula  = ~ sex + region,
    models     = list(income = y_model(income ~ age + sex, engine = "glm")),
    population = population) |>
  prep()
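
The idea behind model calibration can be sketched in base R on simulated data. This is a simplified illustration, not the package's algorithm: it uses one working model (lm), calibrates only to the population size and to the population total of the predictions, and uses the linear distance g = 1 + x'lambda. All object names are hypothetical.

```r
set.seed(1)
pop <- data.frame(age = round(runif(500, 18, 95)))
pop$income <- 1000 + 50 * pop$age + rnorm(500, sd = 300)

smp <- pop[sample(nrow(pop), 60), ]
w   <- rep(nrow(pop) / nrow(smp), nrow(smp))       # equal base weights

# Working model fitted on the sample, predicted over sample and population.
fit    <- lm(income ~ age, data = smp, weights = w)
X      <- cbind(1, predict(fit, newdata = smp))    # constraints: N and predicted total
totals <- c(nrow(pop), sum(predict(fit, newdata = pop)))

# Linear calibration: solve (sum w x x') lambda = totals - sum w x.
lambda <- solve(t(X) %*% (w * X), totals - colSums(w * X))
g      <- 1 + drop(X %*% lambda)
w_cal  <- w * g
```

After calibration the weighted sample total of the predictions equals their population total, which is the constraint that carries the efficiency gain when the working model predicts y well.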

Nonresponse adjustment

Description

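Adjusts the weights of respondents to compensate for eligible nonrespondents, either within weighting classes or with estimated response propensities. Nonrespondents end with weight zero.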

Usage

step_nonresponse(
  spec,
  respondent,
  method = c("weighting_class", "propensity"),
  by = NULL,
  formula = NULL,
  engine = c("logit", "tree", "forest"),
  num_classes = 5L,
  cluster = NULL
)

Arguments

spec

a weighting_spec.

respondent

a 0/1 dummy column (1 = responded) or any logical condition (unquoted) TRUE for respondents. Eligible cases that are not respondents are treated as nonresponse.

method

"weighting_class" (cells) or "propensity" (predictive model).

by

character. Adjustment cells for method = "weighting_class".

formula

predictor formula (right-hand side only), e.g. ~ age + region, used when method = "propensity".

engine

engine to estimate the propensity when method = "propensity": "logit" (logistic regression, base R), "tree" (CART via package 'rpart') or "forest" (random forest via package 'ranger'). 'rpart' and 'ranger' are optional: only needed if you pick that engine.

num_classes

integer or NULL. Controls how propensities are used: an integer forms that many propensity classes (cell adjustment within each class); NULL applies the direct factor 1/p to each unit.

cluster

character or NULL. If given, the adjustment is done at the cluster (e.g. household) level for whole-household nonresponse: each household counts once with its (uniform) weight; in "weighting_class" the redistribution is between responding and nonresponding households within the cells, and in "propensity" the model is fitted with one row per household (household auxiliaries), predicting the household response. The resulting factor is assigned to every member; nonresponding households go to zero. As always, only active units (weight > 0) take part, so units already dropped (unknown eligibility, ineligible) are excluded automatically.

Value

The input weighting_spec with this step appended (for piping). The step is evaluated later by prep().

Examples

weighting_spec(sample_survey, base_weights = pw) |>
  step_nonresponse(respondent = responded, method = "weighting_class",
                   by = "region")

# household-level nonresponse (whole household responds or not)
weighting_spec(sample_survey, base_weights = pw) |>
  step_nonresponse(respondent = responded, method = "weighting_class",
                   by = "region", cluster = "household_id") |>
  prep()
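
The weighting-class adjustment amounts to a ratio within each cell, as this base-R sketch with made-up data shows. It is written from the description above and is not the package's code.

```r
d <- data.frame(
  region    = c("N", "N", "N", "S", "S"),
  responded = c(1, 0, 1, 1, 0),
  w         = c(10, 10, 20, 15, 15)
)

cell_total <- ave(d$w, d$region, FUN = sum)                # all eligible weight in the cell
resp_total <- ave(d$w * d$responded, d$region, FUN = sum)  # respondent weight in the cell
adj  <- ifelse(d$responded == 1, cell_total / resp_total, 0)
w_nr <- d$w * adj
```

Respondents in region N are multiplied by 40/30 and those in region S by 30/15, so each cell keeps its weighted total while nonrespondents drop to zero.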

Rescale (normalize) the weights

Description

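Multiplies the active weights by a constant so that they sum to the number of active units or to a given total, optionally within groups.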

Usage

step_rescale(spec, to = c("n", "total"), total = NULL, by = NULL)

Arguments

spec

a weighting_spec.

to

"n" (weights sum to the number of active units, i.e. mean weight 1) or "total" (weights sum to total).

total

numeric. Target sum when to = "total".

by

character. Rescale within these groups (optional). With to = "n", each group sums to its own active count.

Value

The input weighting_spec with this step appended (for piping). The step is evaluated later by prep().

Examples

weighting_spec(sample_survey, base_weights = pw) |>
  step_rescale(to = "n") |> prep()

Round the final weights

Description

Optional step, typically the last one (after calibration). Simple rounding ("nearest") slightly breaks the calibrated totals; "preserve_total" uses the largest-remainder method to keep the exact total.

Usage

step_round(spec, digits = 0L, method = c("nearest", "preserve_total"))

Arguments

spec

a weighting_spec.

digits

integer. Decimals to keep (0 = integers).

method

"nearest" (simple rounding) or "preserve_total" (keeps the sum of weights). Note: "preserve_total" can break equality of weights within a cluster; if you need integer and equal weights per household, use "nearest".

Value

The input weighting_spec with this step appended (for piping). The step is evaluated later by prep().

Examples

weighting_spec(sample_survey, base_weights = pw) |>
  step_round(digits = 0) |> prep()
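
The difference between the two methods can be seen with a base-R sketch of largest-remainder rounding. The helper round_preserve_total is hypothetical and only illustrates the technique named above; ties and other details may be handled differently in the package.

```r
round_preserve_total <- function(w, digits = 0L) {
  scale  <- 10^digits
  x      <- w * scale
  target <- round(sum(x))              # total to preserve, at the requested precision
  lo     <- floor(x)
  short  <- target - sum(lo)           # units still to hand out
  if (short > 0) {
    idx <- order(x - lo, decreasing = TRUE)[seq_len(short)]
    lo[idx] <- lo[idx] + 1             # largest remainders are rounded up
  }
  lo / scale
}

w <- c(1.4, 2.3, 3.3)                  # sums to 7
nearest   <- round(w)                  # 1 2 3, sums to 6
preserved <- round_preserve_total(w)   # 2 2 3, sums to 7
```

Simple rounding loses one unit of total here, while the largest-remainder version gives it back to the weight with the largest fractional part.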

Within-household selection adjustment

Description

When one (or a subsample) of the eligible persons is selected within each household, the selected person represents all eligible persons, so the weight is multiplied by the inverse of the within-household selection probability. Apply it after the (household-level) eligibility adjustment and before the nonresponse adjustment.

Usage

step_select_within(spec, prob = NULL, n_eligible = NULL)

Arguments

spec

a weighting_spec.

prob

unquoted column with the within-household selection probability of the selected person (need not be 1/n_eligible). The weight is multiplied by 1/prob.

n_eligible

unquoted column with the number of eligible persons in the household, for simple random selection of one person. The weight is multiplied by n_eligible (equivalent to prob = 1/n_eligible).

Value

The input weighting_spec with this step appended (for piping). The step is evaluated later by prep().

Examples

# simple random selection of one eligible person per household
df <- transform(sample_survey,
                n_elig = ave(seq_along(household_id), household_id, FUN = length))
weighting_spec(df, base_weights = pw) |>
  step_select_within(n_eligible = n_elig)

Trim extreme weights

Description

Caps weights above a limit and, optionally, redistributes the excess among the others to preserve the weighted total (Potter 1988, 1990; Liu et al. 2004). Optional step that can be inserted anywhere in the recipe, even several times. Operates on the CURRENT weights at that point of the cascade.

Usage

step_trim(
  spec,
  max_ratio,
  min_ratio = NULL,
  reference = c("base", "median", "value"),
  redistribute = TRUE,
  by = NULL,
  maxit = 50L
)

Arguments

spec

a weighting_spec.

max_ratio

number. Upper cap. Its meaning depends on reference. E.g. with reference = "base" and max_ratio = 4, no weight may exceed 4 times its design weight.

min_ratio

number or NULL. Lower floor (same units as max_ratio).

reference

"base" (multiple of each unit's base weight), "median" (multiple of the median of current weights) or "value" (absolute weight value).

redistribute

logical. If TRUE, redistributes the trimmed excess among the uncapped weights to preserve the total (iterating). If you calibrate afterwards you can use FALSE: calibration restores the totals.

by

character. Groups within which to redistribute (optional).

maxit

integer. Maximum cap+redistribution iterations.

Details

There is no standard threshold: max_ratio is an analyst decision, a bias-variance trade-off. Use Kish's design effect (see summary) to judge whether trimming is worth it.

Value

The input weighting_spec with this step appended (for piping). The step is evaluated later by prep().

Examples

weighting_spec(sample_survey, base_weights = pw) |>
  step_trim(max_ratio = 3, reference = "base")
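
A base-R sketch of capping with redistribution, for reference = "base". It assumes the excess is spread proportionally over the uncapped weights, which is one common choice and may differ from what the package does; trim_sketch is a hypothetical name.

```r
trim_sketch <- function(w, cap, maxit = 50L) {
  cap   <- rep_len(cap, length(w))
  total <- sum(w)
  for (it in seq_len(maxit)) {
    over <- w > cap
    if (!any(over)) break
    w[over] <- cap[over]                  # cap the extreme weights
    free <- w < cap
    if (!any(free)) break                 # nothing left to absorb the excess
    excess  <- total - sum(w)
    w[free] <- w[free] * (1 + excess / sum(w[free]))  # proportional redistribution
  }
  w
}

base      <- rep(10, 5)
w         <- c(12, 11, 9, 8, 60)
w_trimmed <- trim_sketch(w, cap = 3 * base)   # max_ratio = 3 times the base weight
```

The weight of 60 is cut to 30 and the 30 units of excess are shared among the other four, so the total stays at 100. Redistribution can push another weight over its cap, hence the loop.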

Automatic weight trimming (survey-style)

Description

Caps weights into [lower, upper] and redistributes the change among the untrimmed units to preserve the total, mirroring survey::trimWeights(). By default no weight may fall below 1, and the upper cap is set by an automatic empirical rule (Tukey far-out fence: Q3 + 3*IQR).

Usage

step_trim_weights(spec, lower = 1, upper = NULL, strict = TRUE, maxit = 50L)

Arguments

spec

a weighting_spec.

lower

numeric. Lower floor (default 1: no weight below 1).

upper

numeric or NULL. Upper cap. If NULL, automatic rule Q3 + 3*IQR of the active weights.

strict

logical. If TRUE (default), iterate cap+redistribution until no weight is outside [lower, upper] (like survey's strict = TRUE). If FALSE, a single pass (redistribution may push some weights slightly past the cap).

maxit

integer. Maximum iterations when strict = TRUE.

Value

The input weighting_spec with this step appended (for piping). The step is evaluated later by prep().

Examples

weighting_spec(sample_survey, base_weights = pw) |>
  step_nonresponse(respondent = responded, method = "weighting_class", by = "region") |>
  step_trim_weights(lower = 1, strict = TRUE) |> prep()
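
The automatic upper cap can be reproduced by hand in base R. The sketch uses R's default quantile definition; which quantile type the package uses is not stated here, so treat the exact value as an assumption.

```r
w     <- c(1:9, 100)                                 # made-up weights with one outlier
upper <- unname(quantile(w, 0.75) + 3 * IQR(w))      # Tukey far-out fence
flagged <- which(w > upper)                          # units that would be capped
```

With these values Q3 is 7.75 and the IQR is 4.5, so the fence sits at 21.25 and only the weight of 100 exceeds it.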

Unknown-eligibility adjustment

Description

Redistributes the weight of unknown-eligibility cases among the known-eligibility cases, within the cells defined by by.

Usage

step_unknown_eligibility(spec, unknown, by = NULL, cluster = NULL)

Arguments

spec

a weighting_spec.

unknown

a 0/1 dummy column (1 = eligibility unknown) or any logical condition (unquoted) that is TRUE for unknown-eligibility cases. Evaluated on the data.

by

character. Variables defining the adjustment cells (optional).

cluster

character. Cluster (e.g. household) id column. If given, the redistribution is done at the cluster level: each cluster counts once with its (uniform) weight, the weight of unknown-eligibility clusters is redistributed among the known ones, and the adjusted weight is assigned to every member. Use this when unknown-eligibility units have no roster (one row per address) while resolved units are expanded by person.

Value

The input weighting_spec with this step appended (for piping). The step is evaluated later by prep().

Examples

weighting_spec(sample_survey, base_weights = pw) |>
  step_unknown_eligibility(unknown = unknown_elig, by = "region")

# household-level redistribution (unknown units without roster)
weighting_spec(sample_survey, base_weights = pw) |>
  step_unknown_eligibility(unknown = unknown_elig, by = "region",
                           cluster = "household_id")

Detailed per-step diagnostics

Description

Detailed per-step diagnostics

Usage

## S3 method for class 'prepped_weighting_spec'
summary(object, ...)

Arguments

object

a prepped object (output of prep()).

...

ignored.

Value

(invisibly) the prepped object.

Examples

fitted <- weighting_spec(sample_survey, base_weights = pw) |>
  step_nonresponse(respondent = responded, method = "weighting_class", by = "region") |>
  prep()
summary(fitted)

Per-unit adjustment factors table

Description

Returns a data.frame with the weight at each stage and the factor of each step (stage weight / previous-stage weight), handy for custom plots.

Usage

weight_factors(object)

Arguments

object

a prepped object (output of prep()).

Value

data.frame with one weight column per stage and one factor per step.

Examples

fitted <- weighting_spec(sample_survey, base_weights = pw) |>
  step_nonresponse(respondent = responded, method = "weighting_class", by = "region") |>
  prep()
head(weight_factors(fitted))

Start a weighting specification

Description

Creates an inert recipe object. Nothing is computed until prep() is called.

Usage

weighting_spec(data, base_weights)

Arguments

data

data.frame with the sample units (one row per case).

base_weights

unquoted name of the design base-weight column.

Value

an object of class "weighting_spec".

Examples

rec <- weighting_spec(sample_survey, base_weights = pw)
rec

Specify a working model for a study variable y

Description

Specify a working model for a study variable y

Usage

y_model(formula, engine = c("glm", "tree", "forest"), family = NULL)

Arguments

formula

full formula, e.g. income ~ sex + age_g.

engine

"glm", "tree" (rpart) or "forest" (ranger).

family

for engine = "glm": "gaussian", "binomial" or "poisson". For tree/forest, regression vs classification is inferred from y.

Value

a model specification list.

Examples

y_model(income ~ age + sex, engine = "glm")