Quick Start

Gilles Colling

2026-03-17

Why Your Test Accuracy Might Be Wrong

A model shows 95% accuracy on test data, then drops to 60% in production. The usual culprit: data leakage.

Leakage happens when information from your test set contaminates training. Common causes: overlapping or duplicated rows between train and test, features derived from the outcome, the same group (patient, site) appearing in both sets, and preprocessing fitted on the full data.

BORG checks for these problems before you compute metrics.
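As a minimal base-R illustration (independent of BORG) of why leakage inflates metrics: a memorizing model looks perfect on test rows that are duplicates of training rows.

```r
# Duplicated rows across the split make a memorizing model look perfect.
set.seed(1)
train <- data.frame(x = rnorm(50), y = rnorm(50))
test  <- rbind(train[1:10, ],                      # 10 leaked duplicates
               data.frame(x = rnorm(10), y = rnorm(10)))

# 1-nearest-neighbour "model": predict the y of the closest training x
pred <- sapply(test$x, function(x0) train$y[which.min(abs(train$x - x0))])

mean(abs(pred[1:10]  - test$y[1:10]))   # 0 on the leaked duplicates
mean(abs(pred[11:20] - test$y[11:20]))  # > 0 on genuinely new rows
```

The leaked rows contribute zero error, dragging the apparent test error toward zero.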

Basic Usage

# Create sample data
set.seed(42)
data <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100),
  y = rnorm(100)
)

# Define a split
train_idx <- 1:70
test_idx <- 71:100

# Inspect the split
result <- borg_inspect(data, train_idx = train_idx, test_idx = test_idx)
result
#> BorgRisk Assessment
#> ===================
#> 
#> Status: VALID (no hard violations)
#>   Hard violations:  0
#>   Soft inflations:  0
#>   Train indices:    70 rows
#>   Test indices:     30 rows
#>   Inspected at:     2026-03-17 13:44:34
#> 
#> No risks detected.

No violations detected. But what if we made a mistake?

# Accidental overlap in indices
bad_result <- borg_inspect(data, train_idx = 1:60, test_idx = 51:100)
bad_result
#> BorgRisk Assessment
#> ===================
#> 
#> Status: INVALID (1 hard violation) — Resistance is futile
#>   Hard violations:  1
#>   Soft inflations:  0
#>   Train indices:    60 rows
#>   Test indices:     50 rows
#>   Inspected at:     2026-03-17 13:44:34
#> 
#> --- HARD VIOLATIONS (must fix) ---
#> 
#> [1] index_overlap
#>     Train and test indices overlap (10 shared indices). This invalidates evaluation.
#>     Source: train_idx/test_idx
#>     Affected: 10 indices (first 5: 51, 52, 53, 54, 55)

BORG caught the overlap immediately.
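The index_overlap check amounts to a set intersection, which you can reproduce in base R:

```r
# What the index_overlap check amounts to, in base R:
shared <- intersect(1:60, 51:100)
length(shared)   # 10 shared indices
head(shared, 5)  # 51 52 53 54 55
```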

The Main Entry Point: borg()

For most workflows, borg() is all you need. It handles two modes:

Mode 1: Diagnose Data Dependencies

When you have structured data (spatial coordinates, time column, or groups), BORG diagnoses dependencies and generates appropriate CV folds:

# Spatial data with coordinates
set.seed(42)
spatial_data <- data.frame(
  lon = runif(200, -10, 10),
  lat = runif(200, -10, 10),
  elevation = rnorm(200, 500, 100),
  response = rnorm(200)
)

# Let BORG diagnose and create CV folds
result <- borg(spatial_data, coords = c("lon", "lat"), target = "response")
result
#> BORG Result
#> ===========
#> 
#> Dependency:  NONE (none severity)
#> Strategy:    random
#> Folds:       5
#> 
#> Fold sizes:  train 160-160, test 40-40
#> 
#> Access components:
#>   $diagnosis  - BorgDiagnosis object
#>   $folds      - List of train/test index vectors
#>   $cv         - Full borg_cv object

Here the simulated coordinates carry no real spatial structure, so BORG recommends plain random CV. With genuinely autocorrelated data it would recommend spatial block CV instead.

Mode 2: Validate Existing Splits

When you have your own train/test indices, BORG validates them:

# Validate a manual split
risk <- borg(spatial_data, train_idx = 1:150, test_idx = 151:200)
risk
#> BorgRisk Assessment
#> ===================
#> 
#> Status: VALID (no hard violations)
#>   Hard violations:  0
#>   Soft inflations:  0
#>   Train indices:    150 rows
#>   Test indices:     50 rows
#>   Inspected at:     2026-03-17 13:44:34
#> 
#> No risks detected.

Visualizing Results

Use standard R plot() and summary():

# Plot the risk assessment
plot(risk)

# Generate methods text for publications
summary(result)
#> 
#> Model performance was evaluated using random k-fold cross-validation (k = 5). Cross-validation strategy was determined using the BORG package (version 0.2.5) for R.

Data Dependency Types

BORG handles three types of data dependencies:

Spatial Autocorrelation

Points close together tend to have similar values, so random CV underestimates error when train and test points are intermixed. Note that the simulated data below has no true spatial structure, which is why BORG still recommends random CV:

result_spatial <- borg(spatial_data, coords = c("lon", "lat"), target = "response")
result_spatial$diagnosis@recommended_cv
#> [1] "random"
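A quick base-R diagnostic (independent of BORG) shows what "spatial structure" means: for a spatially smooth response, pairwise value differences grow with pairwise distance.

```r
# Base-R intuition check: a smooth spatial trend makes nearby points similar.
set.seed(42)
lon <- runif(200, -10, 10)
lat <- runif(200, -10, 10)
response <- 0.5 * lon + 0.5 * lat + rnorm(200, sd = 0.2)  # smooth trend

d  <- as.vector(dist(cbind(lon, lat)))  # pairwise spatial distances
dv <- as.vector(dist(response))         # pairwise value differences
cor(d, dv)  # clearly positive here; near zero for spatially random data
```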

Temporal Autocorrelation

Sequential observations are correlated. Future data must not leak into past predictions.

temporal_data <- data.frame(
  date = seq(as.Date("2020-01-01"), by = "day", length.out = 200),
  value = cumsum(rnorm(200))
)

result_temporal <- borg(temporal_data, time = "date", target = "value")
result_temporal$diagnosis@recommended_cv
#> [1] "temporal_block"
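Temporal blocking keeps test data strictly later than training data. A rolling-origin sketch in base R (an approximation; BORG's exact fold construction may differ):

```r
# Rolling-origin folds: train on everything before a contiguous test block.
n <- 200
v <- 5
block <- split(seq_len(n), cut(seq_len(n), v, labels = FALSE))
folds <- lapply(2:v, function(k) list(
  train = unlist(block[seq_len(k - 1)]),  # all blocks before block k
  test  = block[[k]]                      # the next contiguous block
))

length(folds)            # 4 rolling folds
range(folds[[1]]$train)  # 1 40: train always precedes test
```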

Clustered/Grouped Data

Observations within groups (patients, sites, species) are more similar than between groups. As in the spatial example, the simulated measurements below are independent of site, so BORG detects no group dependence and recommends random CV:

grouped_data <- data.frame(
  site = rep(1:20, each = 10),
  measurement = rnorm(200)
)

result_grouped <- borg(grouped_data, groups = "site", target = "measurement")
result_grouped$diagnosis@recommended_cv
#> [1] "random"
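Group-aware CV holds out whole groups together, so no group straddles a fold boundary. A base-R sketch (the detail of BORG's group_fold strategy may differ):

```r
# Group-aware folds: assign whole sites to folds, never split a site.
set.seed(42)
site <- rep(1:20, each = 10)
fold_of_site <- sample(rep(1:5, length.out = 20))  # site -> fold mapping
folds <- lapply(1:5, function(k) list(
  train = which(fold_of_site[site] != k),
  test  = which(fold_of_site[site] == k)
))

# No site ever appears in both train and test of the same fold:
intersect(unique(site[folds[[1]]$train]), unique(site[folds[[1]]$test]))
```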

Risk Categories

BORG classifies risks into two categories:

Hard Violations (Evaluation Invalid)

These invalidate your results completely:

Risk                   Description
index_overlap          Same row in both train and test
duplicate_rows         Identical observations in train and test
target_leakage         Feature near-perfectly correlated with the target (likely derived from it)
group_leakage          Same group in train and test
temporal_leakage       Test data predates training data
preprocessing_leakage  Scaler/PCA fitted on full data
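The preprocessing_leakage entry is the easiest to commit by accident. Scaling fitted on the full data uses the test rows' mean and standard deviation, leaking test information into training. A base-R demonstration:

```r
# Statistics fitted on the full data carry test-set information into training.
set.seed(42)
x <- rnorm(100)
train_idx <- 1:70

scaled_wrong <- as.vector(scale(x))     # mean/sd computed on ALL rows
mu <- mean(x[train_idx])                # fit the scaler on train only...
s  <- sd(x[train_idx])
scaled_right <- (x - mu) / s            # ...then apply it everywhere

# The two versions differ: the first quietly used the test rows
isTRUE(all.equal(scaled_wrong, scaled_right))  # FALSE
```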

Soft Inflation (Results Biased)

These inflate metrics but don’t completely invalidate:

Risk                 Description
proxy_leakage        Feature strongly correlated with the target (a likely proxy)
spatial_proximity    Test points too close to training points
random_cv_inflation  Random CV on dependent data
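The spatial_proximity idea can be sanity-checked by hand: compute each test point's minimum distance to the training set and look for suspiciously small values.

```r
# Base-R version of the spatial_proximity idea: minimum distance from
# each test point to any training point.
set.seed(42)
coords <- cbind(runif(200, -10, 10), runif(200, -10, 10))
train_idx <- 1:150
test_idx  <- 151:200

dmat <- as.matrix(dist(coords))
min_dist <- apply(dmat[test_idx, train_idx], 1, min)
summary(min_dist)  # very small minima suggest near-duplicate locations
```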

Detecting Specific Leakage Types

Target Leakage

Features derived from the outcome:

# Simulate target leakage
leaky_data <- data.frame(
  x = rnorm(100),
  leaked_feature = rnorm(100),  # Will be made leaky
  outcome = rnorm(100)
)
# Make leaked_feature highly correlated with outcome
leaky_data$leaked_feature <- leaky_data$outcome + rnorm(100, sd = 0.05)

result <- borg_inspect(leaky_data, train_idx = 1:70, test_idx = 71:100,
                       target = "outcome")
result
#> BorgRisk Assessment
#> ===================
#> 
#> Status: INVALID (1 hard violation) — Resistance is futile
#>   Hard violations:  1
#>   Soft inflations:  0
#>   Train indices:    70 rows
#>   Test indices:     30 rows
#>   Inspected at:     2026-03-17 13:44:34
#> 
#> --- HARD VIOLATIONS (must fix) ---
#> 
#> [1] target_leakage_direct
#>     Feature 'leaked_feature' has correlation 0.998 with target 'outcome'. Likely derived from outcome.
#>     Source: data.frame$leaked_feature
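The underlying screen is essentially a correlation check, which you can reproduce in base R:

```r
# A feature built from the outcome is nearly perfectly correlated with it.
set.seed(42)
outcome <- rnorm(100)
leaked  <- outcome + rnorm(100, sd = 0.05)  # derived from the outcome
cor(leaked, outcome)  # close to 1
```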

Group Leakage

Same entity in train and test:

# Simulate clinical data with patient IDs
clinical_data <- data.frame(
  patient_id = rep(1:10, each = 10),
  visit = rep(1:10, times = 10),
  measurement = rnorm(100)
)

# Random split ignoring patients (BAD)
set.seed(123)
all_idx <- sample(100)
train_idx <- all_idx[1:70]
test_idx <- all_idx[71:100]

# Check for group leakage
result <- borg_inspect(clinical_data, train_idx = train_idx, test_idx = test_idx,
                       groups = "patient_id")
result
#> BorgRisk Assessment
#> ===================
#> 
#> Status: VALID (no hard violations)
#>   Hard violations:  0
#>   Soft inflations:  0
#>   Train indices:    70 rows
#>   Test indices:     30 rows
#>   Inspected at:     2026-03-17 13:44:34
#> 
#> No risks detected.

Working with CV Folds

Access the generated folds directly:

result <- borg(spatial_data, coords = c("lon", "lat"), target = "response", v = 5)

# Number of folds
length(result$folds)
#> [1] 5

# First fold's train/test sizes
cat("Fold 1 - Train:", length(result$folds[[1]]$train),
    "Test:", length(result$folds[[1]]$test), "\n")
#> Fold 1 - Train: 160 Test: 40

Exporting Results

For reproducibility, export validation certificates:

# Create a certificate
cert <- borg_certificate(result$diagnosis, data = spatial_data)
cert
#> BORG Validation Certificate
#> ===========================
#> 
#> Generated: 2026-03-17T13:44:34+0100
#> BORG version: 0.2.5
#> Validation: PASSED
#> 
#> Data Characteristics:
#>   Observations: 200
#>   Features: 4
#>   Hash: sig:200|4|lon,lat,elevation,response
#> 
#> Dependency Diagnosis:
#>   Type: NONE
#>   Severity: none
#>   Recommended CV: random
#> 
#>   Spatial Analysis:
#>     Moran's I: 0.000 (p = 0.6171)
#>     Range: 2.0
#> 
#>   Temporal Analysis:
#> 
#>   Clustered Analysis:

# Export to file
borg_export(result$diagnosis, spatial_data, "validation.yaml")
borg_export(result$diagnosis, spatial_data, "validation.json")

Writing Methods Sections

summary() generates publication-ready methods paragraphs that include the statistical tests BORG ran, the dependency type detected, and the CV strategy chosen. Three citation styles are supported:

# Default APA style
result <- borg(spatial_data, coords = c("lon", "lat"), target = "response")
methods_text <- summary(result)
#> 
#> Model performance was evaluated using random k-fold cross-validation (k = 5). Cross-validation strategy was determined using the BORG package (version 0.2.5) for R.

# Nature style
summary(result, style = "nature")

# Ecology style
summary(result, style = "ecology")

The returned text is a character string you can paste directly into a manuscript. If you also ran borg_compare_cv(), pass the comparison object to include empirical inflation estimates:

comparison <- borg_compare_cv(spatial_data, response ~ lon + lat,
                              coords = c("lon", "lat"))
summary(result, comparison = comparison)

Empirical CV Comparison

When reviewers ask “does it really matter?”, borg_compare_cv() runs both random and blocked CV on the same data and model, then tests whether the difference is statistically significant:

comparison <- borg_compare_cv(
  spatial_data,
  formula = response ~ lon + lat,
  coords = c("lon", "lat"),
  v = 5,
  repeats = 5  # Use more repeats in practice
)
#> Computing dependency diagnosis...
#> Dependency: none (severity: none)
#> Blocked CV strategy: random
#> Repeat 1/5...
#> 
#> CV Comparison Results
#> =====================
#> 
#> Metric: rmse
#> Random CV:  1.0534 (SD: 0.0039)
#> Blocked CV: 1.0425 (SD: 0.0062)
#> 
#> Inflation: 1.1% deflated*
#> p-value: 0.01345 (paired t-test, n=5 repeats)
#> 
#> Random CV significantly deflates rmse estimates.
print(comparison)
#> CV Comparison Results
#> =====================
#> 
#> Metric: rmse
#> Random CV:  1.0534 (SD: 0.0039)
#> Blocked CV: 1.0425 (SD: 0.0062)
#> 
#> Inflation: 1.1% deflated*
#> p-value: 0.01345 (paired t-test, n=5 repeats)
#> 
#> Random CV significantly deflates rmse estimates.
plot(comparison)
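The reported p-value comes from a paired t-test across repeats, which you can reproduce with base R (the per-repeat RMSE values below are hypothetical; in practice use the values stored in your comparison object):

```r
# Hypothetical per-repeat RMSE values, paired by repeat:
random_rmse  <- c(1.051, 1.049, 1.057, 1.055, 1.055)
blocked_rmse <- c(1.040, 1.038, 1.050, 1.044, 1.041)

# Paired t-test on the per-repeat differences:
t.test(random_rmse, blocked_rmse, paired = TRUE)$p.value
```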

Power Analysis After Blocking

Switching from random to blocked CV reduces effective sample size. Before committing to blocked CV, check whether your dataset is large enough:

# Clustered data: 20 sites, 10 observations each
clustered_data <- data.frame(
  site = rep(1:20, each = 10),
  value = rep(rnorm(20, sd = 2), each = 10) + rnorm(200, sd = 0.5)
)

pw <- borg_power(clustered_data, groups = "site", target = "value")
print(pw)
#> BORG Power Analysis
#> ====================
#> 
#> Sample size:      200 observations
#> Effective size:   22 (after blocking)
#> Design effect:    8.93
#>   Components:
#>     clustered     8.93
#> 
#> Target power:     80%
#> Alpha:            0.050
#> 
#> Min detectable effect (blocked): d = 1.184
#> Min detectable effect (random):  d = 0.396
#> Effect size ratio:               2.99x
#> 
#> Sufficient data:  YES
#> 
#> Blocking reduces effective sample size from 200 to 22 (DEFF = 8.9).
#> Minimum detectable effect: d = 1.18 (blocked) vs d = 0.40 (random).
#> Large design effect. Consider whether a coarser blocking strategy could reduce power loss.
#> Despite power loss, blocked CV (group_fold) is required for valid inference.
summary(pw)
#> BORG Power Analysis Summary
#> ===========================
#> 
#> N actual:     200
#> N effective:  22
#> Design effect: 8.93
#> 
#> Sufficient: YES
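The effective sample size in the report follows the standard clustered-design approximation, n_eff = n / DEFF with DEFF = 1 + (m - 1) * ICC. A sketch with a hypothetical ICC chosen to reproduce the printed design effect:

```r
# Standard cluster design-effect approximation (ICC value is hypothetical,
# chosen to roughly match the DEFF printed above):
m   <- 10                   # observations per site
icc <- 0.88                 # hypothetical intraclass correlation
deff  <- 1 + (m - 1) * icc  # design effect, ~8.9
n_eff <- 200 / deff         # effective sample size, ~22
c(deff = deff, n_eff = n_eff)
```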

Interface Summary

Function            Purpose
borg()              Main entry point — diagnose data or validate splits
borg_inspect()      Detailed inspection of train/test split
borg_diagnose()     Analyze data dependencies only
borg_compare_cv()   Empirical random vs blocked CV comparison
borg_power()        Power analysis after blocking
plot()              Visualize results
summary()           Generate methods text for papers
borg_certificate()  Create validation certificate
borg_export()       Export certificate to YAML/JSON

See Also