fastreg

Project status: Active. The project has reached a stable, usable state and is being actively developed.

Overview

fastreg converts large SAS register files (.sas7bdat) into Apache Parquet format. This is particularly useful for researchers working with Danish registers at Statistics Denmark, where large SAS files are common. Parquet files are smaller on disk, faster to read, and work well with modern tools like DuckDB and Arrow.

A register in this context is a collection of related data files, typically yearly snapshots such as bef2020.sas7bdat and bef2021.sas7bdat (from the BEF register).
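The naming convention above can be sketched in base R. This only illustrates how a register name and a year could be read off a file name. parse_register_file() is a hypothetical helper, not part of fastreg, and the pattern assumes the year is the last four digits before the extension.

```r
# Hypothetical helper (not part of fastreg): split a register file name
# such as "bef2020.sas7bdat" into its register name and year.
parse_register_file <- function(path) {
  # Drop the directory and the .sas7bdat extension.
  stem <- sub("\\.sas7bdat$", "", basename(path), ignore.case = TRUE)
  # Assumption: the year is the trailing four digits of the stem.
  has_year <- grepl("[0-9]{4}$", stem)
  year <- ifelse(has_year, substr(stem, nchar(stem) - 3, nchar(stem)), NA_character_)
  register <- ifelse(has_year, substr(stem, 1, nchar(stem) - 4), stem)
  data.frame(
    register = tolower(register),
    year = as.integer(year),
    stringsAsFactors = FALSE
  )
}

parse_register_file(c("bef2020.sas7bdat", "bef2021.sas7bdat"))
```

Files without a trailing year get NA in the year column, so they remain visible instead of being dropped silently.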

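fastreg provides functions to convert a single SAS file to Parquet (convert_file()), convert all SAS files of a register into a Hive partitioned Parquet dataset (convert_register()), copy a targets template that converts multiple registers in parallel (use_targets_template()), and read a converted register as a DuckDB table (read_register()).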

Purpose

The primary purpose of fastreg is to simplify converting the large Danish registers to Parquet and reading the resulting Parquet files. Moving data from SAS to the more efficient Parquet format reduces storage costs and aims to improve performance in data analysis workflows.

Installation

Install from CRAN:

install.packages("fastreg")

Install the latest development version from GitHub:

pak::pak("dp-next/fastreg")

Usage

Use convert_file() to convert a single SAS file to Parquet, written in Hive partitioned format:

library(fastreg)

convert_file(
  path = "path/to/file.sas7bdat",
  output_dir = "path/to/output_dir/"
)

Use convert_register() to convert several SAS files from the same register into a Hive partitioned Parquet dataset. To list all SAS files in a directory, you can use the helper function list_sas_files():

convert_register(
  path = list_sas_files("path/to/sas_register/"),
  output_dir = "path/to/output_dir/"
)
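For readers curious what a listing step involves, the following base R sketch collects .sas7bdat paths from a directory. It is not the implementation of list_sas_files(). find_sas_files() is a hypothetical stand-in, and whether the real helper searches subdirectories or ignores case is an assumption made here.

```r
# Hypothetical stand-in for illustration only; fastreg's list_sas_files()
# may behave differently (e.g. regarding recursion or case sensitivity).
find_sas_files <- function(dir) {
  list.files(
    path = dir,
    pattern = "\\.sas7bdat$",
    full.names = TRUE,
    recursive = TRUE,
    ignore.case = TRUE
  )
}

# Demonstrate on a throwaway directory with empty placeholder files.
demo_dir <- file.path(tempdir(), "sas_register_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("bef2020.sas7bdat", "bef2021.sas7bdat", "notes.txt")))

# Only the two .sas7bdat files match; notes.txt is skipped.
basename(find_sas_files(demo_dir))
```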

Use use_targets_template() to copy a targets template into your project. The template converts multiple registers in parallel:

use_targets_template()

Use read_register() to read a Parquet register as a DuckDB table:

read_register("path/to/parquet_register/")
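To make the idea of a Hive partitioned Parquet dataset concrete, the sketch below writes a tiny one with plain DuckDB and reads it back. It does not show how read_register() works internally. It assumes the DBI and duckdb packages are installed, and the table contents and the year partition key are invented for the demonstration.

```r
# Assumes the DBI and duckdb packages are installed.
library(DBI)

con <- dbConnect(duckdb::duckdb())

# Write a tiny Hive partitioned Parquet dataset to a fresh temporary directory.
# The rows and the "year" partition key are made up for this demo.
out_dir <- tempfile("parquet_register_demo_")
dbExecute(con, sprintf(
  "COPY (
     SELECT * FROM (VALUES (1, 2020), (2, 2020), (3, 2021)) AS t(id, year)
   ) TO '%s' (FORMAT PARQUET, PARTITION_BY (year))",
  out_dir
))

# Each partition value becomes a directory named key=value.
list.files(out_dir)

# Read all partitions back; "year" is recovered from the directory names.
register <- dbGetQuery(con, sprintf(
  "SELECT year, count(*) AS n
   FROM read_parquet('%s/**/*.parquet', hive_partitioning = true)
   GROUP BY year
   ORDER BY year",
  out_dir
))
register

dbDisconnect(con, shutdown = TRUE)
```

Because the partition key is encoded in the directory names, a query that filters on it only needs to open the matching directories.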

See vignette("fastreg") for a complete guide.

Getting help

If you find a bug or have a question, please open an issue on GitHub and include a minimal reproducible example.

Code of conduct

Please note that the fastreg project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.