CharAnalysis is a tool to help reconstruct local fire history from lake-sediment charcoal records. Charcoal preserved in lake sediments is a direct proxy for past fire activity, and peaks in charcoal deposition above a slowly varying background level have been shown to record individual fire events near a lake. CharAnalysis formalizes this peak-detection logic as a reproducible, quantitative workflow.
This R package is a direct translation of CharAnalysis v2.0 (MATLAB), validated against reference outputs on four benchmark datasets spanning a range of record lengths, sampling resolutions, ecosystems, and analysis configurations. Analytical methods are described in Higuera et al. (2009).
The full workflow proceeds in five steps:
# Install from GitHub (requires devtools)
devtools::install_github("phiguera/CharAnalysis",
subdir = "CharAnalysis_2_0_R")
# Suggested packages for figures
install.packages(c("ggplot2", "patchwork", "ggtext"))Code Lake (site code CO) in Alaska, USA, is the primary
validation dataset for CharAnalysis. The record spans
approximately the past 7,300 years and is analyzed here using a local
Gaussian mixture model (GMM) for threshold determination, with the
working threshold percentile set at 0.95 to compensate for
language-induced GMM drift relative to the MATLAB v2.0 implementation;
see the Comparison with MATLAB v2.0 section below for the
rationale.
CharAnalysis reads two CSV files:
CO_charData.csv — the charcoal data
table (depth, age, volume, count).CO_charParams.csv — the analysis
parameter file.Both files ship with the package in inst/validation/ and
are located with system.file(). The data file is
auto-derived from the site name embedded inside the params file, so only
the params path needs to be supplied to CharAnalysis(). In
your own use you would substitute the path to a
*_charParams.csv file of your choosing.
For this worked example we use
CO_compensated_charParams.csv, a variant of the standard
CO_charParams.csv in which the working threshold percentile
is lowered from 0.99 to 0.95. This compensates for a language-induced
shift in the GMM-fitted noise distribution; see the Comparison with
MATLAB v2.0 section below for details.
A single call to CharAnalysis() runs all five analytical
steps and returns a named list of results. The function prints progress
messages as it works through each step.
library(CharAnalysis)
out <- CharAnalysis(params_file)
#> (1) Reading input file...
#> ...done.
#> (1b) Validating input parameters...
#> Parameter validation passed.
#> (2) Pretreating charcoal data...
#> NOTE: zoneDiv[end] (7500 yr BP) exceeds the bottom age of the last raw sample (7444 yr BP). zoneDiv[end] corrected to 7444 yr BP.
#> ...done.
#> (3) Smoothing resampled CHAR to estimate low-frequency trends
#> and calculating peak CHAR...
#> ...done.
#> (4) Defining possible thresholds for peak identification...
#> ...done.
#> (5) Identifying peaks and applying minimum-count screening...
#> ...done.
#> (6) Post-processing: fire-return intervals, Weibull statistics...
#> ...done.
#> (7) Analysis complete.
#> Save CSV: char_write_results(out$char_results, out$site, out_dir = "<your/path>")
#> All figures: char_plot_all(out) [Figs 1-2 only when allFigures = 1]
#> char_plot_all(out, save = TRUE, out_dir = "<your/path>") # save PDFs
#> One figure: char_plot_raw(out) # Fig 1: C_raw, C_interp, C_back options
#> char_plot_thresh_diag(out) # Fig 2: threshold diagnostics
#> char_plot_peaks(out) # Fig 3: peak analysis
#> char_plot_sni(out) # Fig 4: threshold sensitivity and SNI
#> char_plot_cumulative(out) # Fig 5: cumulative peaks
#> char_plot_fri(out) # Fig 6: FRI distributions
#> char_plot_fire_history(out) # Fig 7: continuous fire history
#> char_plot_zones(out) # Fig 8: CHAR zone comparisonsCharAnalysis() returns a named list. The most commonly
used elements are:
names(out)
#> [1] "charcoal" "pretreatment" "smoothing" "peak_analysis"
#> [5] "results" "site" "gap_in" "char_thresh"
#> [9] "post" "char_results"charcoal objectout$charcoal holds all time-series outputs, both raw and
processed:
# Inspect the first few rows of key time series
head(data.frame(
age_BP = out$charcoal$ybpI, # interpolated age (yr BP)
CHAR = out$charcoal$accI, # C_interpolated (pieces cm-2 yr-1)
C_bkg = out$charcoal$accIS, # C_background
C_peak = out$charcoal$peak, # C_peak (residuals)
peaks = out$charcoal$charPeaks[, 4] # final-threshold peak flags (0/1)
))
#> age_BP CHAR C_bkg C_peak peaks
#> 1 -51 0.01185185 0.06231143 -5.045958e-02 0
#> 2 -36 0.08556833 0.06170566 2.386266e-02 0
#> 3 -21 0.10344828 0.06107716 4.237112e-02 0
#> 4 -6 0.09309350 0.06042475 3.266875e-02 0
#> 5 9 0.05964912 0.05974774 -9.862134e-05 0
#> 6 24 0.05964912 0.05904593 6.031933e-04 0char_thresh objectout$char_thresh holds threshold values, SNI, and
goodness-of-fit results:
# Threshold at the final percentile (column 4 = threshValues[4])
range(out$char_thresh$pos[, 4], na.rm = TRUE)
#> [1] 0.00721715 0.10081465
# Signal-to-noise index (SNI): values > 3 indicate a strong signal
summary(out$char_thresh$SNI)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 1.927 3.775 4.652 4.746 5.447 9.513out$post holds fire-history summary metrics:
# Fire-return intervals (FRIs) and mean FRI
cat("Number of FRIs:", length(out$post$FRI), "\n")
#> Number of FRIs: 49
cat("Mean FRI:", round(mean(out$post$FRI), 1), "yr\n")
#> Mean FRI: 147.6 yr
# Per-zone Weibull statistics (zone 1)
fri_z1 <- out$post$FRI_params_zone[1, ]
cat(sprintf(
"Zone 1 — nFRI: %d mFRI: %.1f yr WBLb: %.1f WBLc: %.2f\n",
fri_z1[1], fri_z1[2], fri_z1[5], fri_z1[8]
))
#> Zone 1 — nFRI: 41 mFRI: 128.8 yr WBLb: 147.5 WBLc: 1.81CharAnalysis provides eight publication-quality ggplot2
figures. The five analytical figures (3, 5, 6, 7, 8) summarise the
fire-history analysis at progressively higher levels of integration; the
three diagnostic figures (1, 2, 4) support inspection of the
pretreatment, smoothing, and threshold-determination steps. Each figure
is produced by its own char_plot_*() function; the
convenience wrapper char_plot_all() writes the five
analytical figures to PDF in one call. The figures below render live
from the Code Lake analysis just produced.
The figures require the ggplot2,
patchwork, and ggtext packages, all in
Suggests. If any of these is unavailable in the current R
environment the figure chunks below will be skipped silently.
char_plot_raw() is the diagnostic figure for the
pretreatment and smoothing steps. It overlays the raw CHAR series, the
resampled (interpolated) CHAR at the constant time-step set by
yrInterp, and the Cbackground curves produced by
all five smoothing methods (1: lowess; 2: robust lowess; 3: moving
average; 4: moving median; 5: moving mode) over the user-specified
smoothing window. Inspecting these alternatives helps assess whether the
chosen smoothing method is well-behaved on the record at hand and
whether the smoothing window is appropriate.
char_plot_thresh_diag() shows how the local threshold is
determined within sliding windows along the record. The figure is
organized as a 5×5 grid of windows, each panel showing the empirical
Cpeak distribution within that window, the fitted noise
component (Gaussian or Gaussian mixture, per threshMethod),
and the resulting threshold value at the working percentile. This
diagnostic is useful for confirming that the local threshold algorithm
is behaving sensibly across the full record, and for spotting any
windows where the noise model fits poorly.
char_plot_peaks() is the diagnostic figure for the
peak-detection step. The top panel shows the resampled charcoal
accumulation rate (CHAR, black bars) with the fitted
Cbackground trend overlaid as a grey line. The bottom panel
shows the Cpeak residual series with the positive and
negative threshold lines (red), identified fire events (+ symbols), and
candidate peaks that failed the minimum-count significance screen (grey
dots). This is typically the first figure to inspect when assessing
whether the smoothing window and threshold settings are appropriate for
a given record.
char_plot_cumulative() shows the cumulative count of
identified fire events plotted against age. The slope of the curve at
any point reflects the instantaneous fire frequency: steeper slopes
indicate higher fire frequency, gentler slopes lower fire frequency.
Visible changes in slope identify periods of regime change in the
record.
char_plot_fri() summarises the distribution of
fire-return intervals (FRIs) within each user-defined stratigraphic
zone. Each panel shows a histogram of FRIs in 20-year bins, normalised
to proportions of the zone’s FRI population. A two-parameter Weibull
probability density function is overlaid where the Kolmogorov-Smirnov
goodness-of-fit test passes (p > 0.10 for n < 30; p > 0.05 for
n ≥ 30). Weibull scale (b) and shape (c) parameters,
mean FRI, and sample size are annotated on each panel.
char_plot_fire_history() is the integrated fire-history
summary, with three stacked panels sharing a common time axis. The top
panel shows peak magnitude (integrated Cpeak above threshold,
pieces cm-2 peak-1) for each fire event. The
middle panel shows fire-return intervals through time as filled squares,
with the smoothed mean FRI curve (black line) and bootstrapped 95%
confidence ribbon (grey). The bottom panel shows lowess-smoothed fire
frequency (fires per 1000 yr), the most common single summary in
fire-history publications.
char_plot_zones() tests whether raw charcoal
accumulation differs between user-defined stratigraphic zones. The left
panel shows empirical cumulative distribution functions of raw CHAR
within each zone, with pairwise two-sample Kolmogorov-Smirnov p-values
annotated. The right panel shows box plots (10th, 25th, 50th, 75th, 90th
percentiles) of raw CHAR by zone, allowing direct visual comparison of
central tendency and spread.
char_plot_all() saves all five figures to PDF in one
call. out_dir is required when save = TRUE.
The example below writes to a temporary directory so it runs on any
system; in your own work you would substitute a path of your
choosing.
char_plot_all(out, save = TRUE, out_dir = tempdir())
# Saves to tempdir():
# CO_03_CHAR_analysis.pdf
# CO_05_cumulative_peaks.pdf
# CO_06_FRI_distributions.pdf
# CO_07_continuous_fire_hx.pdf
# CO_08_zone_comparisons.pdfNote: Figures 9 (threshold sensitivity detail) and 10 (multi-site comparisons) from the MATLAB v2.0 interface are not implemented in this R package. Figure 4 (
char_plot_sni(), sensitivity to alternative threshold values and signal-to-noise index) is implemented but not displayed in this vignette; call it directly on theoutobject to inspect.
char_write_results() writes the 33-column output matrix
to a CSV file whose column names and format match the MATLAB
charResults output exactly. out_dir is
required (no default); substitute a path of your choosing for the
temporary directory used here.
char_write_results(out$char_results,
site = out$site,
out_dir = tempdir())
# Writes: <tempdir>/CO_charResults.csvThe output CSV contains one row per interpolated time step and 33 columns covering all analytical outputs from Cinterp through to per-zone Weibull confidence intervals. Column headers match the MATLAB reference file exactly to facilitate direct numerical comparison.
The parameter file (*_charParams.csv) controls all
aspects of the analysis. The most commonly adjusted parameters are:
| Parameter | Default | Description |
|---|---|---|
yrInterp |
15 | Resampling resolution (yr). Set to 0 for automatic (median raw resolution). |
yr |
500 | Smoothing window width (yr) for Cbackground estimation. |
threshType |
2 | Threshold type: 1 = global, 2 = local (sliding window). |
threshMethod |
3 | Noise distribution: 2 = Gaussian, 3 = Gaussian mixture model. |
threshValues |
0.95, 0.99, 0.999, 0.99 | Percentile thresholds; the final value defines the working threshold. |
minCountP |
0.05 | Alpha level for the minimum-count significance screen. |
peakFrequ |
1000 | Window width (yr) for smoothed fire frequency and FRI statistics. |
zoneDiv |
record bounds | Zone boundaries (yr BP) for stratified FRI and Weibull analysis. |
CharAnalysis in R is a direct translation of
CharAnalysis v2.0 (MATLAB). Outputs are quantitatively
equivalent on all validated reference datasets. The table below
summarises results across the four validation datasets; full details are
in inst/z_Validation_report_R_vs_MATLAB_V_2.0.md.
| Dataset | Site | Smoothing | Threshold | charBkg max|diff| | Peaks R v2.0.x | Peaks MATLAB v2.0 |
|---|---|---|---|---|---|---|
| CO | Code Lake, AK | Method 4 (moving median) | 0.99 | ~5 × 10-6 | 39 | 48 |
| CO (compensated) | Code Lake, AK | Method 4 (moving median) | 0.95* | ~5 × 10-6 | 50 | 48 |
| CH10 | Chickaree Lake, CO | Method 2 (robust lowess) | 0.99 | 0.267 | 59 | 50 |
| SI17 | Silver Lake, CO | Method 2 (robust lowess) | 0.99 | 0.130 | 25 | 31 |
| RA07 | Raven Lake, AK | Method 2 (robust lowess) | 0.99 | < 0.001 | 15 | 17 |
* The Threshold column lists the R-side working percentile
(threshValues[4]); MATLAB uses 0.99 in all rows. The “CO
(compensated)” row therefore compares R @ 0.95 with MATLAB @ 0.99,
illustrating the parameter offset that brings R into qualitative
agreement with the published MATLAB result. See Compensating for GMM
drift below.
Two sources of numerical difference are documented:
1. Robust lowess background (smoothing method 2) —
MATLAB’s Curve Fitting Toolbox smooth(..., 'rlowess') and
the R char_lowess() port produce slightly different
Cbackground series. For gap-free records (RA07) the
difference is negligible (< 0.001). For records with NaN gaps (CH10,
SI17) the difference is larger (≤ 0.267) because the two implementations
handle gap positions differently inside the bisquare robustness
iteration. The other smoothing methods (1 plain lowess, 3 moving
average, 4 moving median, 5 moving mode) are unaffected and agree to
within floating-point noise on all datasets — including Code Lake (CO),
which uses method 4.
2. Gaussian mixture model (GMM) peak counts — The R
package ports the MATLAB GaussianMixture.m EM algorithm
directly. Floating-point arithmetic accumulates differently across
languages during EM iterations, causing the two implementations to reach
slightly different threshold values in some local windows. Peak counts
differ by 10–20% across datasets, with the direction varying (R
sometimes higher, sometimes lower). All threshold and peak differences
are downstream consequences of this single source; interpolation and
peak-magnitude outputs agree to within numerical precision.
Compensating for GMM drift. The peak-count
differences in the strict-comparison rows above reflect a 1-to-1
reproduction of each MATLAB v2.0 reference run, with all parameters held
identical across languages. The differences are not irreducible. For
Code Lake, the R implementation produces a slightly higher GMM threshold
than MATLAB does, so fewer Cpeak values pass the screen;
lowering the working threshold percentile (threshValues[4])
from 0.99 to 0.95 compensates for this drift and brings the R peak count
substantially closer to the MATLAB v2.0 count. The required compensation
is dataset-specific because the direction of GMM drift varies by record.
We report unmodified results in the validation table to characterize the
language-induced numerical drift directly, rather than mask it.
The worked example earlier in this vignette uses the bundled
CO_compensated_charParams.csv (identical to
CO_charParams.csv except
threshValues[4] = 0.95). With this compensation, R
identifies 50 peaks for Code Lake. This is much closer to the published
MATLAB v2.0 result of 48 peaks, and it captures the significant decrease
in FRIs from Zone 1 to Zone 2, highlighted in Higuera et al. (2009).
Weibull confidence intervals — Bootstrap CIs use random resampling and will differ between R and MATLAB runs regardless of any other differences. Weibull point estimates (scale b, shape c) agree within a few percent on datasets where peak counts are similar.
If you use CharAnalysis in published research, please cite Higuera et al. (2009), the first study to apply the core analytical tools implemented in the program. If you used CharAnalysis v2.0 (MATLAB or R v2.0.x) specifically, please also cite the software:
Higuera, P.E., L.B. Brubaker, P.M. Anderson, F.S. Hu, and T.A. Brown. 2009. Vegetation mediated the impacts of postglacial climate change on fire regimes in the south-central Brooks Range, Alaska. Ecological Monographs 79:201–219. https://doi.org/10.1890/07-2019.1
Higuera, P.E. 2026. CharAnalysis: Diagnostic and analytical tools for peak analysis in sediment-charcoal records (Version 2.0). Zenodo. https://doi.org/10.5281/zenodo.19304064
Higuera, P.E., L.B. Brubaker, P.M. Anderson, F.S. Hu, and T.A. Brown. 2009. Vegetation mediated the impacts of postglacial climate change on fire regimes in the south-central Brooks Range, Alaska. Ecological Monographs 79:201–219. https://doi.org/10.1890/07-2019.1