Overview

tabbitR automates the production of large sets of weighted crosstabulation tables and exports them directly to ‘Excel’.
It is designed for analysts who need many tables at once, typically involving:

multiple outcome variables
multiple explanatory (breakdown) variables
weighted percentages
unweighted counts
clear and transparent reporting of missing values.

This is a routine but time-consuming task in survey research, monitoring and evaluation, and exploratory data analysis. Doing it manually is repetitive, error-prone, and difficult to keep consistent.

tabbitR::tabbit_excel() solves this by producing, for each outcome × breakdown variable pair:

a weighted percentage table
a matching unweighted N table
a clearly labelled summary of missing responses
light formatting via openxlsx
one or multiple Excel sheets depending on user preference.

The goal is to reduce manual work, enforce consistency, and accelerate the early stages of analysis.

tabbitR use cases

Survey teams, academic researchers, and data analysts often need to deliver hundreds or thousands of tables, for example:

one table per survey measure
for each demographic variable
across multiple countries or survey waves.

Manual workflows struggle with:

maintaining consistent formatting
ensuring missing values are handled transparently
avoiding accidental errors in weights or denominators
producing both weighted and unweighted summaries for each outcome variable
generating reproducible outputs.

tabbitR automates all of this in one reproducible command.

Key features

Weighted percentages

Percentages are computed using user-supplied weights:

column percentages by default
row percentages if row_pct = TRUE

Percentage formatting respects the decimals = argument.
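
The difference between column and row percentages can be illustrated with a minimal base R sketch. This is not tabbitR's internal code; it only assumes that percentages are weighted cell totals divided by the relevant margin, and it reuses the toy data from the minimal example in the Usage section.

```r
# Toy data (same values as the minimal example in the Usage section).
df <- data.frame(
  outcome = factor(c("A", "B", "A", NA, "C", NA)),
  sex     = factor(c("Male", "Male", "Female", "Female",
                     "Prefer not to say", "Male")),
  weight  = c(1, 2, 1, 1, 0.75, 3)
)

# Keep only rows with a valid (non-missing) outcome.
valid <- df[!is.na(df$outcome), ]

# Weighted cell totals: sum of weights in each outcome x sex cell.
wt_tab <- xtabs(weight ~ outcome + sex, data = valid)

# Column percentages: each breakdown column sums to 100.
col_pct <- round(100 * prop.table(wt_tab, margin = 2), 1)

# Row percentages: each outcome row sums to 100.
row_pct <- round(100 * prop.table(wt_tab, margin = 1), 1)

print(col_pct)
print(row_pct)
```

The second argument of round() plays the same role here as decimals = does in tabbit_excel().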

Unweighted counts

Alongside the percentage table, tabbit_excel() writes:

an unweighted N table
including or excluding missing values (depending on user options).

Missing values: clear and explicit

tabbitR does not silently drop missing responses.

Users may choose:

default: missing outcomes are excluded from the percentage rows but summarised in a “Missing %” line
missingasrow = TRUE: missing values appear as a full "Response missing" row in both the percentage and N tables
nomissing = TRUE: no missing summary is shown.
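
To make the missingasrow = TRUE behaviour concrete, the base R sketch below recodes missing outcomes into an explicit "Response missing" level before tabulating. It is an illustration only: the helper column outcome_full is hypothetical, and the assumption that the missing row counts towards the percentage denominator is ours rather than something confirmed by the package.

```r
# Toy data (same values as the minimal example in the Usage section).
df <- data.frame(
  outcome = factor(c("A", "B", "A", NA, "C", NA)),
  sex     = factor(c("Male", "Male", "Female", "Female",
                     "Prefer not to say", "Male")),
  weight  = c(1, 2, 1, 1, 0.75, 3)
)

# Recode NA into an explicit final level (hypothetical helper column).
outcome_chr <- as.character(df$outcome)
outcome_chr[is.na(outcome_chr)] <- "Response missing"
df$outcome_full <- factor(
  outcome_chr,
  levels = c(levels(df$outcome), "Response missing")
)

# Unweighted N table with missing responses as a full row.
n_tab <- table(outcome = df$outcome_full, sex = df$sex)

# Weighted column percentages with the missing row in the denominator
# (an assumption about how the percentages are defined).
pct_tab <- round(
  100 * prop.table(xtabs(weight ~ outcome_full + sex, data = df), margin = 2),
  1
)

print(n_tab)
print(pct_tab)
```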

Flexible layout

one sheet per breakdown variable (default)
or all tables in a single sheet
variable labels included when available (e.g. from haven-labelled data)
‘Excel’ formatting: bold headers, borders, readable layout.

Designed for large projects

tabbitR is especially useful when producing:

formatted frequencies or breakdowns for hundreds of outcome variables
for multiple countries or survey waves
publication-ready Excel files for clients.

Usage

A minimal example

library(tabbitR)

df <- data.frame(
  outcome = factor(c("A", "B", "A", NA, "C", NA)),
  sex     = factor(c("Male", "Male", "Female", "Female",
                     "Prefer not to say", "Male")),
  weight  = c(1, 2, 1, 1, 0.75, 3)
)

tmp <- tempfile(fileext = ".xlsx")

tabbit_excel(
  data        = df,
  vars        = "outcome",
  breakdown   = "sex",
  wtvar       = "weight",
  file        = tmp,
  decimals    = 1
)

tmp
#> [1] "C:\\Users\\siobh\\AppData\\Local\\Temp\\RtmpIftjlZ\\file482c63c8662c.xlsx"

# The workbook is written to a temporary location (tmp).
# Open the file in a spreadsheet application to inspect the output.

Multiple outcomes and multiple breakdowns

# Example toy survey data
set.seed(123)

survey_df <- data.frame(
  outcome1       = factor(sample(c("Agree", "Neutral", "Disagree"), 200, replace = TRUE)),
  outcome2       = factor(sample(c("Often", "Sometimes", "Never"), 200, replace = TRUE)),
  outcome3       = factor(sample(c("Yes", "No"), 200, replace = TRUE)),
  sex            = factor(sample(c("Male", "Female"), 200, replace = TRUE)),
  age            = factor(sample(c("18-34", "35-54", "55+"), 200, replace = TRUE)),
  region         = factor(sample(c("North", "Midlands", "South"), 200, replace = TRUE)),
  survey_weight  = runif(200, 0.5, 2)
)

vars   <- c("outcome1", "outcome2", "outcome3")
breaks <- c("sex", "age", "region")

tmp2 <- tempfile(fileext = ".xlsx")

tabbit_excel(
  data        = survey_df,
  vars        = vars,
  breakdown   = breaks,
  wtvar       = "survey_weight",
  file        = tmp2,
  by_breakdown = TRUE,
  decimals    = 1
)

tmp2
#> [1] "C:\\Users\\siobh\\AppData\\Local\\Temp\\RtmpIftjlZ\\file482c5a837032.xlsx"

Understanding the options

Required

data: a data frame
vars: outcome variables (character vector)
breakdown: explanatory (breakdown) variables (character vector)

Main options

wtvar: name of the weight variable
by_breakdown: one sheet per breakdown variable (default TRUE)
decimals: decimal places for percentages (0-6)
row_pct: compute row percentages rather than column percentages.

Missingness

missingasrow: include missing outcomes as a row
nomissing: suppress all missing-value summaries.

Display

nooverall: drop the “Overall %” column
nototal: drop the “Total %” row.

Treatment of missing values

By default, tabbitR makes missingness explicit; it is hidden only when nomissing = TRUE is set.

Default

missing values are excluded from the valid rows of the weighted table but shown in a separate “Missing %” row
the denominator for Missing % is valid + missing responses within each column.
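
The default Missing % calculation can be sketched in base R as follows. The denominator (valid + missing within each breakdown column) follows the description above; applying the survey weights to the missing share is an assumption made for this illustration, and the variable names are hypothetical.

```r
# Toy data (same values as the minimal example in the Usage section).
df <- data.frame(
  outcome = factor(c("A", "B", "A", NA, "C", NA)),
  sex     = factor(c("Male", "Male", "Female", "Female",
                     "Prefer not to say", "Male")),
  weight  = c(1, 2, 1, 1, 0.75, 3)
)

is_missing <- is.na(df$outcome)

# Weighted total of missing outcomes within each breakdown column.
missing_w <- tapply(df$weight * is_missing, df$sex, sum)

# Denominator: weighted total of valid + missing responses per column.
total_w <- tapply(df$weight, df$sex, sum)

# Missing % per breakdown column (weighting is an assumption here).
missing_pct <- round(100 * missing_w / total_w, 1)

print(missing_pct)
```

In this toy data, half of the weighted responses in the Male and Female columns are missing, and none are missing in the "Prefer not to say" column.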

`missingasrow = TRUE`

Missing values appear as a "Response missing" row inside both the percentage and N tables.

`nomissing = TRUE`

No missing information is displayed in either table.

How weighted bases are computed

For each breakdown column:

weighted base W = sum(weights for non-missing outcomes)

Weighted bases are rounded to whole numbers for readability.
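
As a rough illustration of the formula above, the weighted base for each breakdown column can be reproduced in base R. This is a sketch under the stated definition, not the package's own implementation; the unweighted counts are shown alongside for comparison.

```r
# Toy data (same values as the minimal example in the Usage section).
df <- data.frame(
  outcome = factor(c("A", "B", "A", NA, "C", NA)),
  sex     = factor(c("Male", "Male", "Female", "Female",
                     "Prefer not to say", "Male")),
  weight  = c(1, 2, 1, 1, 0.75, 3)
)

# Rows with a non-missing outcome.
valid <- !is.na(df$outcome)

# Weighted base W: sum of weights for non-missing outcomes, per column.
weighted_base <- tapply(df$weight[valid], df$sex[valid], sum)

# Rounded to whole numbers for readability.
weighted_base_rounded <- round(weighted_base)

# Unweighted N among valid responses, for comparison.
unweighted_n <- table(df$sex[valid])

print(weighted_base_rounded)
print(unweighted_n)
```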

Recommended workflow

Convert labelled variables using haven::as_factor() when needed.
Select outcome variables.
Choose breakdown variables.
Run tabbit_excel().
Inspect the Excel workbook.
Store the code alongside the output for reproducibility.

Citation

If you use tabbitR in published work, please cite:

McAndrew, S. (2025). tabbitR: Automated weighted cross-tabulations for survey analysis. R package.
https://github.com/smmcandrew/tabbitR

A BibTeX entry is available via:

citation("tabbitR")

Session info

sessionInfo()
#> R version 4.5.0 (2025-04-11 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26200)
#> 
#> Matrix products: default
#>   LAPACK version 3.12.1
#> 
#> locale:
#> [1] LC_COLLATE=C                           
#> [2] LC_CTYPE=English_United Kingdom.utf8   
#> [3] LC_MONETARY=English_United Kingdom.utf8
#> [4] LC_NUMERIC=C                           
#> [5] LC_TIME=English_United Kingdom.utf8    
#> 
#> time zone: Europe/London
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] tabbitR_0.1.3
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37     R6_2.6.1          fastmap_1.2.0     xfun_0.54        
#>  [5] cachem_1.1.0      knitr_1.50        htmltools_0.5.9   rmarkdown_2.30   
#>  [9] lifecycle_1.0.4   cli_3.6.5         zip_2.3.3         openxlsx_4.2.8.1 
#> [13] sass_0.4.10       jquerylib_0.1.4   compiler_4.5.0    rstudioapi_0.17.1
#> [17] tools_4.5.0       evaluate_1.0.5    bslib_0.9.0       Rcpp_1.1.0       
#> [21] yaml_2.3.11       rlang_1.1.6       jsonlite_2.0.0    stringi_1.8.7

Introduction to tabbitR

Siobhan McAndrew