Title: Robust Hotelling-Type T² Control Chart Based on the Dual STATIS Approach
Version: 0.1.0
Description: Implements a robust multivariate control-chart methodology for batch-based industrial processes with multiple correlated variables using the Dual STATIS (Structuration des Tableaux A Trois Indices de la Statistique) framework. A robust compromise covariance matrix is constructed from Phase I batches with the Minimum Covariance Determinant (MCD) estimator, and a Hotelling-type T² statistic is applied for anomaly detection in Phase II. The package includes functions to simulate clean and contaminated batches, to compute both robust and classical Hotelling T² control charts, to visualize results via robust biplots, and to launch an interactive 'shiny' dashboard. An internal dataset (pharma_data) is provided for reproducibility. See Lavit, Escoufier, Sabatier and Traissac (1994) <doi:10.1016/0167-9473(94)90134-1> for the original STATIS methodology, and Rousseeuw and Van Driessen (1999) <doi:10.1080/00401706.1999.10485670> for the MCD estimator.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.2
Imports: dplyr, MASS, ggplot2, rrcov, shiny, ggrepel, forcats, Matrix
Depends: R (>= 3.5)
URL: https://github.com/SergioDanielFG/robustT2
BugReports: https://github.com/SergioDanielFG/robustT2/issues
Suggests: spelling
Language: en-US
NeedsCompilation: no
Packaged: 2025-09-05 12:19:03 UTC; sergi
Author: Sergio Daniel Frutos Galarza [aut, cre], Omar Ruiz Barzola [aut], Purificación Galindo Villardón [aut]
Maintainer: Sergio Daniel Frutos Galarza <sergio_dan88@hotmail.com>
Repository: CRAN
Date/Publication: 2025-09-10 08:00:02 UTC

Classical Hotelling T2 Chart - Phase 1

Description

Applies the classical Hotelling T2 methodology to Phase 1 data, using the sample mean and covariance matrix.
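As a rough illustration of the classical approach, the sketch below computes a batch-level T2 from the sample mean and sample covariance using base R only. The simulated data, the object names, and the "n times squared Mahalanobis distance of the batch mean" form of the statistic are assumptions made for illustration. The formula used inside hotelling_t2_phase1() is not stated in this manual and may differ.

```r
# Sketch only: one common batch-level Hotelling T2, not necessarily the
# package's exact formula. All object names below are illustrative.
set.seed(1)
p <- 4    # number of variables
n <- 30   # observations per batch

# Hypothetical Phase 1 data: 10 batches of 30 observations, 4 variables
x <- matrix(rnorm(10 * n * p), ncol = p)
batch <- rep(paste0("Batch_", 1:10), each = n)

center <- colMeans(x)       # sample mean vector
covariance <- cov(x)        # sample covariance matrix

# T2 for each batch mean: n * (xbar - center)' S^{-1} (xbar - center)
batch_ids <- unique(batch)
t2 <- sapply(batch_ids, function(b) {
  xbar <- colMeans(x[batch == b, , drop = FALSE])
  n * mahalanobis(xbar, center, covariance)
})
batch_statistics <- data.frame(Batch = batch_ids, T2_Stat = unname(t2))
```

Because sample estimates are sensitive to outlying observations, a contaminated Phase 1 batch can inflate the covariance and mask itself, which is the motivation for the robust variant documented later in this manual.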

Usage

hotelling_t2_phase1(data, variables)

Arguments

data

A data frame containing Phase 1 data (under control batches).

variables

A character vector with the names of the quantitative variables to be used.

Value

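A list. The examples in this manual use its components center (Phase 1 mean vector), covariance (Phase 1 covariance matrix), and batch_statistics (data frame with columns Batch and T2_Stat).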


Classical Hotelling T2 Chart - Phase 2

Description

Evaluates new batches (Phase 2) using T2 statistics based on Phase 1 estimators.

Usage

hotelling_t2_phase2(new_data, variables, center, covariance)

Arguments

new_data

A data frame with new batches to evaluate (Phase 2).

variables

Character vector of quantitative variables.

center

Mean vector from Phase 1.

covariance

Covariance matrix from Phase 1.

Value

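A list. The examples in this manual use its component batch_statistics (data frame with columns Batch and T2_Stat).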


Simulated Pharmaceutical Manufacturing Data

Description

This dataset contains simulated pharmaceutical manufacturing data generated by simulate_pharma_batches() with seed = 780 and obs_per_batch = 30.

Usage

data("pharma_data")

Format

A data frame with 450 rows and 7 variables:

Batch

Batch identifier (factor)

Phase

Phase indicator: "Phase 1" or "Phase 2" (factor)

Status

Batch status: "Under Control" or "Out of Control" (factor)

Concentration

Concentration of active ingredient (mg/mL)

Humidity

Humidity percentage (% w/w)

Dissolution

Dissolution percentage (% released)

Density

Density (g/cm^3)

Details

Phase 1 includes 10 under-control batches with natural variability in mean and covariance, without contamination.

Phase 2 includes 2 additional under-control batches and 3 out-of-control batches. The out-of-control batches exhibit shifts in both mean and variability, along with moderate contamination in a portion of their observations.

Each batch contains 30 observations measured across four quantitative quality-control variables.

Source

Simulated using simulate_pharma_batches with seed = 780 and obs_per_batch = 30.


Plot Classical Hotelling T2 Control Chart

Description

Plots the classical Hotelling T2 statistic for each batch as a line in a single color. Batches are evaluated against a control threshold taken from the chi-squared distribution with degrees of freedom equal to the number of variables.

Usage

plot_classical_hotelling_t2_chart(
  t2_statistics,
  num_vars,
  title = "Classical Hotelling T2 Control Chart"
)

Arguments

t2_statistics

A data frame with columns Batch and T2_Stat.

num_vars

Integer. Number of variables used in the multivariate analysis (to compute the Chi² threshold).

title

Optional string. Plot title.

Value

A ggplot2 object representing the control chart.

Examples

# Simulate pharmaceutical manufacturing batches
sim_batches <- simulate_pharma_batches()

# Phase 1 analysis: use Phase 1 data
phase1_data <- subset(sim_batches, Phase == "Phase 1")

# Apply classical Hotelling T2 methodology
t2_result <- hotelling_t2_phase1(
  data = phase1_data,
  variables = c("Concentration", "Humidity", "Dissolution", "Density")
)

# Plot classical Hotelling T2 control chart
plot_classical_hotelling_t2_chart(
  t2_statistics = t2_result$batch_statistics,
  num_vars = 4
)

Plot Classical Hotelling T2 Control Chart - Phase 2

Description

Plots the classical Hotelling T² statistics per batch for Phase 2 data, using the reference mean and covariance matrix estimated from Phase 1. Batches are color-coded by control status ("Under Control" = blue, "Out of Control" = red).

Usage

plot_classical_hotelling_t2_phase2_chart(
  t2_statistics,
  num_vars,
  title = "Classical Hotelling T2 Control Chart (Phase 2)"
)

Arguments

t2_statistics

A data frame with columns Batch, T2_Stat, and Status.

num_vars

Integer. Number of variables used in the multivariate analysis (degrees of freedom for Chi²).

title

Optional string. Plot title.

Value

A ggplot2 object with the Phase 2 control chart.

Examples

# Simulate pharmaceutical manufacturing batches
sim_batches <- simulate_pharma_batches()

# Split by phase
phase1_data <- subset(sim_batches, Phase == "Phase 1")
phase2_data <- subset(sim_batches, Phase == "Phase 2")

# Fit Phase 1 classical estimators
t2_phase1 <- hotelling_t2_phase1(
  data = phase1_data,
  variables = c("Concentration", "Humidity", "Dissolution", "Density")
)

# Evaluate Phase 2 batches
t2_phase2 <- hotelling_t2_phase2(
  new_data = phase2_data,
  variables = c("Concentration", "Humidity", "Dissolution", "Density"),
  center = t2_phase1$center,
  covariance = t2_phase1$covariance
)

# Combine with status for plotting
status_info <- phase2_data[!duplicated(phase2_data$Batch), "Status"]
t2_phase2_plot <- cbind(t2_phase2$batch_statistics, Status = status_info)

# Plot Phase 2 control chart
plot_classical_hotelling_t2_phase2_chart(
  t2_statistics = t2_phase2_plot,
  num_vars = 4
)

HJ-Biplot Projection - Robust STATIS Dual (Phase 2)

Description

Projects new batches from Phase 2 into the HJ-Biplot space defined by the robust compromise matrix and eigen decomposition from Phase 1.

Usage

plot_statis_biplot_projection(phase1_result, phase2_result, dims = c(1, 2))

Arguments

phase1_result

Result from robust_statis_phase1().

phase2_result

Result from robust_statis_phase2() (must include standardized_data, t2_stats_by_batch and threshold).

dims

Dimensions to plot (default: c(1, 2)).

Details

This implementation follows the HJ-Biplot formulation of Galindo-Villardón (1986). The compromise matrix C, being symmetric and positive semidefinite, is decomposed via an eigen decomposition (not a rectangular SVD). The square roots of eigenvalues are used to build the biplot scaling, consistent with robust STATIS Dual.
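The eigen decomposition step described above can be sketched in base R as follows. The matrix C here is a small hypothetical stand-in for the compromise matrix, and the object names are illustrative rather than taken from the package.

```r
# Hypothetical 3 x 3 symmetric positive definite matrix standing in for C
C <- matrix(c(2.0, 0.6, 0.3,
              0.6, 1.5, 0.4,
              0.3, 0.4, 1.0), nrow = 3, byrow = TRUE)

eig <- eigen(C, symmetric = TRUE)   # eigen decomposition, not a rectangular SVD
V <- eig$vectors
lambda <- pmax(eig$values, 0)       # guard against tiny negative round-off
D <- diag(sqrt(lambda))             # square roots of the eigenvalues

H <- V %*% D                        # variable markers, H = V D
dims <- c(1, 2)
H_plot <- H[, dims]                 # coordinates on the plotted dimensions

# With all dimensions retained, H reproduces C: H %*% t(H) equals C
reconstruction_error <- max(abs(H %*% t(H) - C))
```

Keeping only the columns selected by dims gives the two-dimensional approximation that is drawn. How new batches are projected onto these axes is specific to the package and is not reproduced here.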

Value

A ggplot2 object with the projected HJ-Biplot for Phase 2 batches.

Examples

sim_batches <- simulate_pharma_batches()
phase1_data <- subset(sim_batches, Phase == "Phase 1" & Status == "Under Control")
phase2_data <- subset(sim_batches, Phase == "Phase 2")

phase1 <- robust_statis_phase1(
  data = phase1_data,
  variables = c("Concentration", "Humidity", "Dissolution", "Density")
)

phase2 <- robust_statis_phase2(
  new_data = phase2_data,
  variables = c("Concentration", "Humidity", "Dissolution", "Density"),
  medians = phase1$global_medians,
  mads = phase1$global_mads,
  compromise_matrix = phase1$compromise_matrix,
  global_center = phase1$global_center
)

plot_statis_biplot_projection(phase1, phase2)

HJ-Biplot of Robust STATIS Dual Compromise (Galindo-Villardón)

Description

Generates an HJ-Biplot using the compromise matrix obtained from robust STATIS Dual. Individuals (batch centers) are projected as G = U D, and variables as H = V D, where D is the diagonal matrix of square roots of eigenvalues.

Usage

plot_statis_hj_biplot(
  phase1_result,
  dims = c(1, 2),
  color_by = c("none", "weight", "distance"),
  highlight_batches = NULL
)

Arguments

phase1_result

Result from robust_statis_phase1().

dims

Dimensions to plot (default: c(1, 2)).

color_by

One of "none", "weight", or "distance" for coloring batches.

highlight_batches

Optional vector of batch names to emphasize.

Value

ggplot2 object with HJ-Biplot.

Examples

sim_batches <- simulate_pharma_batches()
phase1 <- robust_statis_phase1(
  data = subset(sim_batches, Phase == "Phase 1" & Status == "Under Control"),
  variables = c("Concentration", "Humidity", "Dissolution", "Density")
)
plot_statis_hj_biplot(phase1)

Plot Control Chart - Robust STATIS Dual (Phase 1)

Description

Plots the Hotelling T² statistic per batch using the robust center and compromise matrix estimated in robust_statis_phase1(). The control limit is based on a Chi-squared distribution with degrees of freedom equal to the number of variables.

Usage

plot_statis_phase1_chart(
  batch_statistics,
  num_vars,
  title = "Robust STATIS Dual Control Chart - Phase 1"
)

Arguments

batch_statistics

A data frame with columns Batch and T2_Stat, typically from phase1_result$batch_statistics.

num_vars

Integer. Number of variables used in the multivariate analysis (to compute the Chi² threshold).

title

Optional string. Plot title.

Value

A ggplot2 object.

Examples

sim_batches <- simulate_pharma_batches()

# Phase 1 analysis: select under control batches from Phase 1
phase1_result <- robust_statis_phase1(
  data = subset(sim_batches, Phase == "Phase 1" & Status == "Under Control"),
  variables = c("Concentration", "Humidity", "Dissolution", "Density")
)

# Plot the Phase 1 robust control chart
plot_statis_phase1_chart(
  batch_statistics = phase1_result$batch_statistics,
  num_vars = 4
)

Plot STATIS Dual Robust Control Chart - Phase 2 Only

Description

Plots the robust Hotelling T² statistics for Phase 2 batches only, using the results from the robust STATIS Dual method.

Usage

plot_statis_phase2_chart(
  phase2_result,
  title = "Robust STATIS Dual Control Chart - Phase 2"
)

Arguments

phase2_result

A list returned by robust_statis_phase2(), including t2_stats_by_batch with Hotelling T² values and a control threshold.

title

Optional string. Plot title.

Value

A ggplot2 object representing the control chart for Phase 2 batches.

Examples

sim_batches <- simulate_pharma_batches()
phase1 <- robust_statis_phase1(
  data = subset(sim_batches, Phase == "Phase 1" & Status == "Under Control"),
  variables = c("Concentration", "Humidity", "Dissolution", "Density")
)
phase2 <- robust_statis_phase2(
  new_data = subset(sim_batches, Phase == "Phase 2"),
  variables = c("Concentration", "Humidity", "Dissolution", "Density"),
  medians = phase1$global_medians,
  mads = phase1$global_mads,
  compromise_matrix = phase1$compromise_matrix,
  global_center = phase1$global_center
)
plot_statis_phase2_chart(phase2_result = phase2)

Robust STATIS Dual - Phase 1 (Under Control Batches)

Description

Applies the Robust STATIS Dual methodology to Phase 1 data (under-control batches), using robust batch-wise standardization (median and MAD). Covariance matrices are robustly estimated with the MCD method and used directly (without trace normalization) to construct the compromise matrix.
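A minimal sketch of the two per-batch steps named above (median/MAD standardization followed by an MCD estimate) is given below. It uses MASS::cov.mcd() as a readily available MCD implementation. The package also imports rrcov, so its internal call may differ, and the simulated batch and all object names are hypothetical.

```r
library(MASS)  # mvrnorm() and cov.mcd(); the package may use rrcov instead

set.seed(1)
# Hypothetical batch: 30 observations of 4 variables
batch_x <- mvrnorm(30, mu = c(10, 5, 80, 1.2),
                   Sigma = diag(c(1, 0.5, 4, 0.01)))
colnames(batch_x) <- c("Concentration", "Humidity", "Dissolution", "Density")

# Robust standardization: subtract the column median, divide by the column MAD
med <- apply(batch_x, 2, median)
mads <- apply(batch_x, 2, mad)   # mad() applies the 1.4826 consistency factor
z <- sweep(sweep(batch_x, 2, med, "-"), 2, mads, "/")

# MCD estimate of location and scatter for the standardized batch
mcd <- cov.mcd(z)
robust_center <- mcd$center
robust_cov <- mcd$cov
```

After this standardization each column of z has median 0 and MAD 1, so variables measured in different units contribute on a comparable scale.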

Usage

robust_statis_phase1(data, variables)

Arguments

data

A data frame containing the process data with batch information.

variables

Character vector with the names of the variables to be used in the analysis.

Value

A list containing:

compromise_matrix

Robust compromise matrix (without trace normalization)

global_center

Global robust center of the batches

batch_statistics

Data frame with Batch, T2_Stat (Hotelling-type robust statistic), and Weight

batch_medians

List of medians per batch and variable

batch_mads

List of MADs per batch and variable

global_medians

Global medians per variable (for use in Phase 2)

global_mads

Global MADs per variable

robust_means

List of robust centers of each batch (estimated by MCD)

standardized_data

Data set standardized batch by batch

robust_covariances

List of robust covariance matrices per batch

similarity_matrix

Hilbert-Schmidt similarity matrix between batches

statis_weights

Weights obtained from the first eigenvector of the similarity matrix

first_eigenvector

First eigenvector of the similarity matrix (unnormalized)
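To show how the similarity matrix, weights, and compromise matrix listed above relate to one another, here is a small base R sketch. The per-batch covariance matrices are simulated placeholders, and the sum-to-one normalization of the weights and the weighted-sum form of the compromise are assumptions; the manual only states that the weights come from the first eigenvector.

```r
set.seed(1)
# Hypothetical per-batch covariance matrices (3 batches, 4 variables);
# in the package these would be the MCD estimates in robust_covariances
covs <- lapply(1:3, function(i) cov(matrix(rnorm(30 * 4), ncol = 4)))
k <- length(covs)

# Hilbert-Schmidt inner product: <S_i, S_j> = trace(S_i %*% S_j)
similarity <- matrix(0, k, k)
for (i in seq_len(k)) {
  for (j in seq_len(k)) {
    # For symmetric matrices the elementwise sum equals the trace
    similarity[i, j] <- sum(covs[[i]] * covs[[j]])
  }
}

# First eigenvector of the similarity matrix; abs() fixes the arbitrary sign
first_eigenvector <- abs(eigen(similarity, symmetric = TRUE)$vectors[, 1])
weights <- first_eigenvector / sum(first_eigenvector)  # assumed normalization

# Compromise: weighted sum of the per-batch covariance matrices
compromise <- Reduce(`+`, Map(function(w, S) w * S, weights, covs))
```

Batches whose covariance structure agrees with the majority receive larger weights, so an atypical batch has less influence on the compromise.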

Examples

# Simulate new pharmaceutical manufacturing batches
sim_batches <- simulate_pharma_batches()

# Select only Phase 1 under control batches
phase1_data <- subset(sim_batches, Phase == "Phase 1" & Status == "Under Control")

# Apply robust STATIS Dual methodology
result <- robust_statis_phase1(
  data = phase1_data,
  variables = c("Concentration", "Humidity", "Dissolution", "Density")
)

# View main outputs
result$compromise_matrix
result$batch_statistics
result$robust_covariances
result$similarity_matrix
result$statis_weights
result$robust_means

Robust STATIS Dual - Phase 2 (New Batches Evaluation)

Description

Applies the robust STATIS Dual control chart methodology to evaluate new batches, using the compromise matrix and the global robust center obtained in Phase 1. Each batch is summarized using a robust Hotelling-type T² statistic.

Usage

robust_statis_phase2(
  new_data,
  variables,
  medians,
  mads,
  compromise_matrix,
  global_center
)

Arguments

new_data

A data frame containing the new batches to evaluate.

variables

Character vector with the names of the variables to be used.

medians

Named numeric vector containing the global medians obtained in Phase 1.

mads

Named numeric vector containing the scaled MADs obtained in Phase 1.

compromise_matrix

Robust compromise matrix computed in Phase 1.

global_center

Robust global center obtained in Phase 1.

Value

A list containing:

standardized_data

Data frame with the new batches standardized using the global medians and scaled MADs.

t2_stats_by_batch

Data frame with the Hotelling-type T² statistics per batch.

threshold

Control limit based on the Chi-squared distribution (0.9973 quantile, degrees of freedom equal to the number of variables).
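The control limit described above can be reproduced with qchisq() from base R. In the sketch below the T² values and the Signal column are hypothetical and only illustrate how a batch would be compared against the limit.

```r
variables <- c("Concentration", "Humidity", "Dissolution", "Density")

# Control limit: 0.9973 quantile of a chi-squared distribution with
# degrees of freedom equal to the number of variables
threshold <- qchisq(0.9973, df = length(variables))

# Hypothetical T2 values, shaped like t2_stats_by_batch
t2_stats_by_batch <- data.frame(
  Batch = c("Batch_11", "Batch_12", "Batch_13"),
  T2_Stat = c(3.2, 7.9, 41.5)
)

# Illustrative flag (not a documented column): TRUE when the limit is exceeded
t2_stats_by_batch$Signal <- t2_stats_by_batch$T2_Stat > threshold
```

The 0.9973 probability corresponds to the coverage of the conventional three-sigma limits of a univariate normal chart.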

Examples

# Simulate new pharmaceutical manufacturing batches
sim_batches <- simulate_pharma_batches()

# Phase 1 analysis: use only Phase 1 and under control batches
phase1_data <- subset(sim_batches, Phase == "Phase 1" & Status == "Under Control")
phase1 <- robust_statis_phase1(
  data = phase1_data,
  variables = c("Concentration", "Humidity", "Dissolution", "Density")
)

# Phase 2 analysis: evaluate new batches (Phase 2)
new_data <- subset(sim_batches, Phase == "Phase 2")
result_phase2 <- robust_statis_phase2(
  new_data = new_data,
  variables = c("Concentration", "Humidity", "Dissolution", "Density"),
  medians = phase1$global_medians,
  mads = phase1$global_mads,
  compromise_matrix = phase1$compromise_matrix,
  global_center = phase1$global_center
)

# View main outputs
result_phase2$t2_stats_by_batch
result_phase2$threshold

Launch STATIS Dual Robust Dashboard (Shiny)

Description

Launches an interactive Shiny dashboard for the robust STATIS Dual methodology implemented in this package.

Usage

run_statis_dashboard()

Value

No return value, called for side effects (launches a Shiny application).

Examples

if (interactive()) {
  run_statis_dashboard()
}

Simulate Pharmaceutical Manufacturing Batches (Realistic Variability)

Description

Simulates pharmaceutical manufacturing batches across two phases. Phase 1 includes 10 under-control batches, each with natural variability in mean and covariance. Phase 2 includes 2 clean under-control batches and 3 out-of-control batches with shifted mean, increased dispersion, and moderate contamination.

Usage

simulate_pharma_batches(obs_per_batch = 30, seed = 780)

Arguments

obs_per_batch

Integer. Number of observations per batch. Default is 30.

seed

Optional integer. If provided, sets a random seed for reproducibility. Default is 780.

Details

The simulated data includes four quality control variables: Concentration, Humidity, Dissolution, and Density.
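The kind of generating scheme described for this function (a clean batch, and an out-of-control batch with shifted mean, increased dispersion, and moderate contamination) might be sketched as follows. Every numeric value here (means, covariances, shift sizes, the 10% contamination rate) is hypothetical and is not taken from the package source.

```r
library(MASS)  # for mvrnorm()

set.seed(780)
vars <- c("Concentration", "Humidity", "Dissolution", "Density")
mu <- c(10, 5, 80, 1.2)                 # hypothetical in-control means
Sigma <- diag(c(0.25, 0.1, 4, 0.0025))  # hypothetical in-control covariance
n <- 30                                 # observations per batch

# Under-control batch
clean <- mvrnorm(n, mu = mu, Sigma = Sigma)

# Out-of-control batch: shifted mean and inflated dispersion
shifted <- mvrnorm(n, mu = mu * 1.05, Sigma = Sigma * 2)

# Moderate contamination: replace 10% of the rows with far-off observations
n_out <- round(0.10 * n)
idx <- sample(n, n_out)
shifted[idx, ] <- mvrnorm(n_out, mu = mu * 1.3, Sigma = Sigma * 4)

sim <- data.frame(
  Batch = factor(rep(c("Batch_1", "Batch_2"), each = n)),
  Status = factor(rep(c("Under Control", "Out of Control"), each = n)),
  rbind(clean, shifted)
)
names(sim)[3:6] <- vars
```

The resulting data frame mirrors the documented column layout apart from the Phase column, which is omitted to keep the sketch short.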

Value

A data frame with 15 × obs_per_batch observations (450 with the default obs_per_batch = 30) and the following columns:

Batch

Factor. Batch identifier (Batch_1 to Batch_15).

Phase

Factor. Phase of the process: "Phase 1" or "Phase 2".

Status

Factor. Control status: "Under Control" or "Out of Control".

Concentration, Humidity, Dissolution, Density

Numeric quality control variables.