SCIproj

Author: Saskia Otto
License: MIT

An R package for the initialization and organization of a scientific project following reproducible research and FAIR principles.

Overview

SCIproj is an R package for initializing a scientific project with its function create_proj() and managing that project as an R package or research compendium. This approach combines structure (where files are located) with workflow (how analyses are reproduced or replicated).

The package is built on modern reproducibility standards and guidelines such as:

Defaults

The package uses several default settings that support reproducibility. These include:

Project structure

your-project/
├── DESCRIPTION             # Project metadata, dependencies, and author info (with ORCID).
├── README.Rmd              # Top-level project description.
├── your-project.Rproj      # RStudio project file.
├── CITATION.cff            # Machine-readable citation metadata for FAIR compliance.
├── CONTRIBUTING.md         # Contribution guidelines.
├── LICENSE.md              # Full license text (optional, requires add_license).
├── NAMESPACE               # Auto-generated by roxygen2 (do not edit by hand).
│
├── data-raw/               # Raw data files and pre-processing scripts.
│   ├── clean_data.R        # Script template for data cleaning.
│   ├── DATA_SOURCES.md     # Data provenance: source, license, DOI, download date.
│   └── ...
│
├── data/                   # Cleaned datasets stored as .rda files.
│
├── R/                      # Custom R functions and dataset documentation.
│   ├── function_ex.R       # Template for custom functions.
│   ├── data.R              # Template for dataset documentation.
│   └── ...
│
├── analyses/               # R scripts or R Markdown/Quarto documents for analyses.
│   ├── figures/            # Generated plots.
│   └── ...
│
├── docs/                   # Publication-ready documents (article, report, presentation).
├── trash/                  # Temporary files that can be safely deleted.
│
├── _targets.R              # Pipeline definition for reproducible workflow (default).
├── renv/                   # renv library and settings (default).
├── renv.lock               # Lockfile for reproducible package versions (default).
└── Dockerfile              # Container definition for full reproducibility (optional).
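The tree above can also be checked from R. The sketch below uses a hypothetical base-R helper, check_layout(), which is not part of SCIproj. It reports which top-level entries exist in a project folder. The default list of expected entries is an assumption taken from the tree, so edit it to match the options you chose in create_proj().

```r
# Hypothetical helper (not part of SCIproj): report which expected
# top-level entries of the project layout exist in a given folder.
check_layout <- function(path = ".",
                         expected = c("DESCRIPTION", "README.Rmd", "CITATION.cff",
                                      "data-raw", "data", "R", "analyses", "docs")) {
  # file.exists() returns TRUE for both files and directories
  present <- file.exists(file.path(path, expected))
  data.frame(entry = expected, present = present, stringsAsFactors = FALSE)
}
```

For example, check_layout("my_research_project") returns one row per expected entry, with present = FALSE for anything missing.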

Why an R package as a research compendium?

Installation and usage

Install the development version from GitHub:

### Using remotes
# install.packages("remotes")
remotes::install_github("saskiaotto/SCIproj")

### Alternatively, using pak (recommended)
# install.packages("pak")
pak::pkg_install("saskiaotto/SCIproj")

Creating the project

library("SCIproj")
create_proj("my_research_project")

This creates a project with renv, targets, CITATION.cff, and DATA_SOURCES.md by default.

Customize with parameters:

### Full-featured project with GitHub, CI, and ORCID
create_proj("my_research_project",
  add_license = "MIT",
  license_holder = "Jane Doe",
  orcid = "0000-0001-2345-6789",
  create_github_repo = TRUE,
  ci = "gh-actions"
)

### Minimal project without workflow tools
create_proj("my_research_project",
  use_renv = FALSE,
  use_targets = FALSE
)

Parameters

Parameter            Default      Description
data_raw             TRUE         Add data-raw/ folder with templates
makefile             FALSE        Add makefile.R template
testthat             FALSE        Add testthat infrastructure
use_pipe             FALSE        Add magrittr pipe (native |> recommended)
add_license          NULL         License type: "MIT", "GPL", "Apache", etc.
license_holder       "Your name"  License holder / project author
orcid                NULL         ORCID iD for CITATION.cff
use_git              TRUE         Initialize local git repo
create_github_repo   FALSE        Create GitHub repo (needs GITHUB_PAT)
ci                   "none"       CI type: "none" or "gh-actions"
use_renv             TRUE         Initialize renv for dependency management
use_targets          TRUE         Add _targets.R pipeline template
use_docker           FALSE        Add Dockerfile template
open_proj            FALSE        Open new project in RStudio

Developing the project

  1. Create the project with create_proj().

  2. Edit DESCRIPTION with project metadata: title, summary, contributors (with ORCID), license, dependencies.

  3. Edit README.Rmd with project details: objectives, timeline, workflow.

  4. Document your data provenance in data-raw/DATA_SOURCES.md: source, license, download date, DOI for each dataset.

  5. Place original (raw) data in data-raw/. Use clean_data.R (or additional scripts) for pre-processing. Save cleaned datasets to data/ with usethis::use_data().

  6. Document cleaned datasets with roxygen2 in R/ (see the data.R template). For details, see Documenting data.

  7. Place custom functions in R/ with roxygen documentation. See the documentation chapter in the R Packages book.

  8. Write tests for your functions in tests/ (set testthat = TRUE in create_proj()). See Testing basics.

  9. Place analysis scripts/notebooks in analyses/. Save plots in analyses/figures/.

  10. Place final manuscripts, reports, and presentations in docs/. Use R Markdown, Quarto, or templates from rticles, thesisdown, or Quarto journal extensions.

  11. Keep dependencies in sync: usethis::use_package() for DESCRIPTION, renv::snapshot() for the lockfile.

  12. Update CITATION.cff when you archive your project or publish.
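The following is a minimal sketch of what data-raw/clean_data.R could contain for step 5. The file name survey.csv, the columns date and value, and the helper clean_survey() are hypothetical and are not part of the SCIproj template. Only the cleaning logic runs as written. The lines that read the raw file and call usethis::use_data() are commented out because they need a real project.

```r
# data-raw/clean_data.R (sketch; file, column, and function names are hypothetical)

# Tidy the raw table: normalize column names, drop rows without a
# measurement, and parse the date column.
clean_survey <- function(raw) {
  names(raw) <- tolower(trimws(names(raw)))
  raw <- raw[!is.na(raw$value), , drop = FALSE]
  raw$date <- as.Date(raw$date)
  raw
}

# In the real script, read from data-raw/ and save the result to data/ as .rda:
# survey_raw <- read.csv("data-raw/survey.csv")
# survey <- clean_survey(survey_raw)
# usethis::use_data(survey, overwrite = TRUE)
```

Keeping the cleaning logic in a function makes it easy to rerun whenever the raw file changes.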
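Steps 7 and 8 can be illustrated with a hypothetical function, standardize(), as it might appear in a file under R/. It is documented with standard roxygen2 tags (@param, @return, @export, @examples). The function name and its behaviour are illustrative only; base R's scale() does something similar. A matching testthat test is shown as a comment because it belongs in tests/testthat/.

```r
#' Standardize a numeric vector
#'
#' Centers `x` on its mean and divides by its standard deviation.
#'
#' @param x A numeric vector.
#' @param na.rm Logical; ignore missing values when computing mean and SD?
#' @return A numeric vector of the same length as `x`.
#' @export
#' @examples
#' standardize(c(1, 2, 3))
standardize <- function(x, na.rm = TRUE) {
  stopifnot(is.numeric(x))
  (x - mean(x, na.rm = na.rm)) / stats::sd(x, na.rm = na.rm)
}

# A matching test could go in tests/testthat/test-standardize.R:
# test_that("standardize centers and scales", {
#   expect_equal(standardize(c(1, 2, 3)), c(-1, 0, 1))
# })
```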

Workflow

For a detailed introduction to targets, see the user manual.

For maximum reproducibility, consider also containerizing the project with Docker (use_docker = TRUE). See the Rocker Project for R-specific Docker images.
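The sketch below shows the kind of pipeline _targets.R can hold. It is not the template SCIproj generates. The raw file name and the function clean_raw_data() are hypothetical, and clean_raw_data() is assumed to be defined in a script under R/. The calls themselves (tar_source(), tar_option_set(), tar_target()) are part of the targets API.

```r
# _targets.R (sketch; survey.csv and clean_raw_data() are hypothetical)
library(targets)

# Source every R script in R/ so targets can call project functions.
tar_source("R")

# Packages loaded before each target runs; adjust to your project.
tar_option_set(packages = c("stats"))

list(
  # format = "file" tracks the file's contents, so editing the raw data
  # invalidates every downstream target.
  tar_target(raw_file, "data-raw/survey.csv", format = "file"),
  tar_target(raw_data, read.csv(raw_file)),
  tar_target(clean_data, clean_raw_data(raw_data)),
  tar_target(value_summary, summary(clean_data$value))
)
```

Run the pipeline with targets::tar_make(); targets that are already up to date are skipped. targets::tar_read(value_summary) returns a stored result, and targets::tar_visnetwork() draws the dependency graph.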

Archiving and DOI

When your project is finalized:

  1. Archive the GitHub repo to make it read-only.
  2. Get a DOI via Zenodo (integrates directly with GitHub) or another DOI Registration Agency.
  3. Update CITATION.cff with the DOI.
  4. Optionally, generate a codemeta.json with codemetar::write_codemeta() for richer metadata.
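CITATION.cff is plain YAML, so step 3 can be done by hand. As an alternative, the hypothetical base-R helper below, set_cff_doi() (not part of SCIproj), sets the top-level doi: field. It replaces an existing top-level doi: line or appends one if none exists. Indented doi entries, for example under identifiers:, are left untouched. The Zenodo DOI in the usage line is a placeholder.

```r
# Hypothetical helper: set or replace the top-level `doi:` key in CITATION.cff.
set_cff_doi <- function(path = "CITATION.cff", doi) {
  lines <- readLines(path, warn = FALSE)
  doi_line <- paste0("doi: ", doi)
  # "^doi:" only matches unindented (top-level) keys
  idx <- grep("^doi:", lines)
  if (length(idx) > 0) {
    lines[idx[1]] <- doi_line
  } else {
    lines <- c(lines, doi_line)
  }
  writeLines(lines, path)
  invisible(lines)
}

# Usage (placeholder DOI): set_cff_doi("CITATION.cff", "10.5281/zenodo.1234567")
```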

Useful resources

Guidelines and standards

R packages and tools

Research compendium concept

Credits