Title: | Robust Hotelling-Type T² Control Chart Based on the Dual STATIS Approach |
Version: | 0.1.0 |
Description: | Implements a robust multivariate control-chart methodology for batch-based industrial processes with multiple correlated variables using the Dual STATIS (Structuration des Tableaux A Trois Indices de la Statistique) framework. A robust compromise covariance matrix is constructed from Phase I batches with the Minimum Covariance Determinant (MCD) estimator, and a Hotelling-type T² statistic is applied for anomaly detection in Phase II. The package includes functions to simulate clean and contaminated batches, to compute both robust and classical Hotelling T² control charts, to visualize results via robust biplots, and to launch an interactive 'shiny' dashboard. An internal dataset (pharma_data) is provided for reproducibility. See Lavit, Escoufier, Sabatier and Traissac (1994) <doi:10.1016/0167-9473(94)90134-1> for the original STATIS methodology, and Rousseeuw and Van Driessen (1999) <doi:10.1080/00401706.1999.10485670> for the MCD estimator. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Imports: | dplyr, MASS, ggplot2, rrcov, shiny, ggrepel, forcats, Matrix |
Depends: | R (≥ 3.5) |
URL: | https://github.com/SergioDanielFG/robustT2 |
BugReports: | https://github.com/SergioDanielFG/robustT2/issues |
Suggests: | spelling |
Language: | en-US |
NeedsCompilation: | no |
Packaged: | 2025-09-05 12:19:03 UTC; sergi |
Author: | Sergio Daniel Frutos Galarza |
Maintainer: | Sergio Daniel Frutos Galarza <sergio_dan88@hotmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-09-10 08:00:02 UTC |
Classical Hotelling T2 Chart - Phase 1
Description
Applies the classical Hotelling T2 methodology to Phase 1 data, using the sample mean and covariance matrix.
Usage
hotelling_t2_phase1(data, variables)
Arguments
data |
A data frame containing Phase 1 data (under control batches). |
variables |
A character vector with the names of the quantitative variables to be used. |
Value
A list with:
center: classical mean vector
covariance: classical covariance matrix
batch_statistics: data frame with T2_Stat per batch
threshold: Chi-squared control limit (0.9973 quantile)
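The returned quantities can be illustrated with a minimal base-R sketch. It is not the package's implementation: the toy data are made up, and the per-batch aggregation (batch size times the squared Mahalanobis distance of the batch mean) is an assumption, since this page does not state how T2_Stat is aggregated within a batch.

```r
# Minimal base-R sketch of a classical Phase 1 computation (illustrative only).
set.seed(1)
vars <- c("Concentration", "Humidity", "Dissolution", "Density")
p <- length(vars)

# Toy Phase 1 data: 5 batches of 30 observations (made-up values).
toy <- data.frame(
  Batch = rep(paste0("Batch_", 1:5), each = 30),
  matrix(rnorm(5 * 30 * p), ncol = p, dimnames = list(NULL, vars))
)

# Classical estimators: sample mean vector and sample covariance matrix.
center <- colMeans(toy[, vars])
covariance <- cov(toy[, vars])

# ASSUMPTION: per-batch statistic = n_b * squared Mahalanobis distance
# of the batch mean from the overall center.
t2_by_batch <- sapply(split(toy[, vars], toy$Batch), function(b) {
  nrow(b) * mahalanobis(colMeans(b), center, covariance)
})
batch_statistics <- data.frame(
  Batch = names(t2_by_batch),
  T2_Stat = unname(t2_by_batch)
)

# Chi-squared control limit at the 0.9973 quantile, df = number of variables.
threshold <- qchisq(0.9973, df = p)
```

A batch would be flagged when its T2_Stat exceeds threshold.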
Classical Hotelling T2 Chart - Phase 2
Description
Evaluates new batches (Phase 2) using T2 statistics based on Phase 1 estimators.
Usage
hotelling_t2_phase2(new_data, variables, center, covariance)
Arguments
new_data |
A data frame with new batches to evaluate (Phase 2). |
variables |
Character vector of quantitative variables. |
center |
Mean vector from Phase 1. |
covariance |
Covariance matrix from Phase 1. |
Value
A list with:
batch_statistics: data frame with T2_Stat per new batch
threshold: Chi-squared control limit (0.9973 quantile)
Simulated Pharmaceutical Manufacturing Data
Description
This dataset contains simulated pharmaceutical manufacturing data generated by simulate_pharma_batches() with seed = 780 and obs_per_batch = 30.
Usage
data("pharma_data")
Format
A data frame with 450 rows and 7 variables:
- Batch
Batch identifier (factor)
- Phase
Phase indicator: "Phase 1" or "Phase 2" (factor)
- Status
Batch status: "Under Control" or "Out of Control" (factor)
- Concentration
Concentration of active ingredient (mg/mL)
- Humidity
Humidity percentage (% w/w)
- Dissolution
Dissolution percentage (% released)
- Density
Density (g/cm³)
Details
Phase 1 includes 10 under-control batches with natural variability in mean and covariance, without contamination.
Phase 2 includes 2 additional under-control batches and 3 out-of-control batches. The out-of-control batches exhibit shifts in both mean and variability, along with moderate contamination in a portion of their observations.
Each batch contains 30 observations measured across four quantitative quality-control variables.
Source
Simulated using simulate_pharma_batches() with seed = 780 and obs_per_batch = 30.
Plot Classical Hotelling T2 Control Chart
Description
Plots the classical Hotelling T2 statistics per batch with a uniform color line. Batches are evaluated against a control threshold obtained from the chi-squared distribution with degrees of freedom equal to the number of variables.
Usage
plot_classical_hotelling_t2_chart(
t2_statistics,
num_vars,
title = "Classical Hotelling T2 Control Chart"
)
Arguments
t2_statistics |
A data frame with columns |
num_vars |
Integer. Number of variables used in the multivariate analysis (to compute the Chi² threshold). |
title |
Optional string. Plot title. |
Value
A ggplot2 object representing the control chart.
Examples
# Simulate pharmaceutical manufacturing batches
sim_batches <- simulate_pharma_batches()
# Phase 1 analysis: use Phase 1 data
phase1_data <- subset(sim_batches, Phase == "Phase 1")
# Apply classical Hotelling T2 methodology
t2_result <- hotelling_t2_phase1(
data = phase1_data,
variables = c("Concentration", "Humidity", "Dissolution", "Density")
)
# Plot classical Hotelling T2 control chart
plot_classical_hotelling_t2_chart(
t2_statistics = t2_result$batch_statistics,
num_vars = 4
)
Plot Classical Hotelling T2 Control Chart - Phase 2
Description
Plots the classical Hotelling T² statistics per batch for Phase 2 data, using the reference mean and covariance matrix estimated from Phase 1. Batches are color-coded by control status ("Under Control" = blue, "Out of Control" = red).
Usage
plot_classical_hotelling_t2_phase2_chart(
t2_statistics,
num_vars,
title = "Classical Hotelling T2 Control Chart (Phase 2)"
)
Arguments
t2_statistics |
A data frame with columns |
num_vars |
Integer. Number of variables used in the multivariate analysis (degrees of freedom for Chi²). |
title |
Optional string. Plot title. |
Value
A ggplot2 object with the Phase 2 control chart.
Examples
# Simulate pharmaceutical manufacturing batches
sim_batches <- simulate_pharma_batches()
# Split by phase
phase1_data <- subset(sim_batches, Phase == "Phase 1")
phase2_data <- subset(sim_batches, Phase == "Phase 2")
# Fit Phase 1 classical estimators
t2_phase1 <- hotelling_t2_phase1(
data = phase1_data,
variables = c("Concentration", "Humidity", "Dissolution", "Density")
)
# Evaluate Phase 2 batches
t2_phase2 <- hotelling_t2_phase2(
new_data = phase2_data,
variables = c("Concentration", "Humidity", "Dissolution", "Density"),
center = t2_phase1$center,
covariance = t2_phase1$covariance
)
# Combine with status for plotting
status_info <- phase2_data[!duplicated(phase2_data$Batch), "Status"]
t2_phase2_plot <- cbind(t2_phase2$batch_statistics, Status = status_info)
# Plot Phase 2 control chart
plot_classical_hotelling_t2_phase2_chart(
t2_statistics = t2_phase2_plot,
num_vars = 4
)
HJ-Biplot Projection - Robust STATIS Dual (Phase 2)
Description
Projects new batches from Phase 2 into the HJ-Biplot space defined by the robust compromise matrix and eigen decomposition from Phase 1.
Usage
plot_statis_biplot_projection(phase1_result, phase2_result, dims = c(1, 2))
Arguments
phase1_result |
Result from |
phase2_result |
Result from |
dims |
Dimensions to plot (default: c(1, 2)). |
Details
This implementation follows the HJ-Biplot formulation of Galindo-Villardón (1986).
The compromise matrix C, being symmetric and positive semidefinite, is decomposed via an eigen decomposition (not a rectangular SVD). The square roots of the eigenvalues are used to build the biplot scaling, consistent with robust STATIS Dual.
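The eigen decomposition and square-root scaling described above can be sketched in base R. The matrix C below is a made-up stand-in for a compromise matrix, and only the variable markers are shown; how the package computes the batch coordinates is not reproduced here.

```r
# Illustrative stand-in for a compromise matrix (symmetric, positive definite).
C <- matrix(c(2.0, 0.6, 0.3,
              0.6, 1.5, 0.4,
              0.3, 0.4, 1.0), nrow = 3, byrow = TRUE)

# Eigen decomposition of a symmetric matrix: C = V diag(lambda) V'.
eig <- eigen(C, symmetric = TRUE)
lambda <- pmax(eig$values, 0)   # guard against tiny negative round-off
V <- eig$vectors

# Marker coordinates scaled by the square roots of the eigenvalues.
H <- V %*% diag(sqrt(lambda))

# Keeping all dimensions reproduces C exactly: H H' = C.
recon <- H %*% t(H)

# Coordinates for the plotted plane and the share of inertia it carries.
dims <- c(1, 2)
H_plot <- H[, dims]
explained <- lambda[dims] / sum(lambda)
```

Because C is symmetric, the left and right singular vectors coincide, which is why an eigen decomposition is sufficient here.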
Value
A ggplot2 object with the projected HJ-Biplot for Phase 2 batches.
Examples
sim_batches <- simulate_pharma_batches()
phase1_data <- subset(sim_batches, Phase == "Phase 1" & Status == "Under Control")
phase2_data <- subset(sim_batches, Phase == "Phase 2")
phase1 <- robust_statis_phase1(
data = phase1_data,
variables = c("Concentration", "Humidity", "Dissolution", "Density")
)
phase2 <- robust_statis_phase2(
new_data = phase2_data,
variables = c("Concentration", "Humidity", "Dissolution", "Density"),
medians = phase1$global_medians,
mads = phase1$global_mads,
compromise_matrix = phase1$compromise_matrix,
global_center = phase1$global_center
)
plot_statis_biplot_projection(phase1, phase2)
HJ-Biplot of Robust STATIS Dual Compromise (Galindo-Villardón)
Description
Generates an HJ-Biplot using the compromise matrix obtained from robust STATIS Dual. Individuals (batch centers) are projected as G = U D, and variables as H = V D, where D is the diagonal matrix of square roots of eigenvalues.
Usage
plot_statis_hj_biplot(
phase1_result,
dims = c(1, 2),
color_by = c("none", "weight", "distance"),
highlight_batches = NULL
)
Arguments
phase1_result |
Result from robust_statis_phase1(). |
dims |
Dimensions to plot (default: c(1, 2)). |
color_by |
One of "none", "weight", or "distance" for coloring batches. |
highlight_batches |
Optional vector of batch names to emphasize. |
Value
ggplot2 object with HJ-Biplot.
Examples
sim_batches <- simulate_pharma_batches()
phase1 <- robust_statis_phase1(
data = subset(sim_batches, Phase == "Phase 1" & Status == "Under Control"),
variables = c("Concentration", "Humidity", "Dissolution", "Density")
)
plot_statis_hj_biplot(phase1)
Plot Control Chart - Robust STATIS Dual (Phase 1)
Description
Plots the Hotelling T² statistic per batch using the robust center and compromise matrix estimated in robust_statis_phase1(). The control limit is based on a Chi-squared distribution with degrees of freedom equal to the number of variables.
Usage
plot_statis_phase1_chart(
batch_statistics,
num_vars,
title = "Robust STATIS Dual Control Chart - Phase 1"
)
Arguments
batch_statistics |
A data frame with columns |
num_vars |
Integer. Number of variables used in the multivariate analysis (to compute the Chi² threshold). |
title |
Optional string. Plot title. |
Value
A ggplot2 object.
Examples
sim_batches <- simulate_pharma_batches()
# Phase 1 analysis: select under control batches from Phase 1
phase1_result <- robust_statis_phase1(
data = subset(sim_batches, Phase == "Phase 1" & Status == "Under Control"),
variables = c("Concentration", "Humidity", "Dissolution", "Density")
)
# Plot the Phase 1 robust control chart
plot_statis_phase1_chart(
batch_statistics = phase1_result$batch_statistics,
num_vars = 4
)
Plot STATIS Dual Robust Control Chart - Phase 2 Only
Description
Plots the robust Hotelling T² statistics for Phase 2 batches only, using the results from the robust STATIS Dual method.
Usage
plot_statis_phase2_chart(
phase2_result,
title = "Robust STATIS Dual Control Chart - Phase 2"
)
Arguments
phase2_result |
A list returned by |
title |
Optional string. Plot title. |
Value
A ggplot2 object representing the control chart for Phase 2 batches.
Examples
sim_batches <- simulate_pharma_batches()
phase1 <- robust_statis_phase1(
data = subset(sim_batches, Phase == "Phase 1" & Status == "Under Control"),
variables = c("Concentration", "Humidity", "Dissolution", "Density")
)
phase2 <- robust_statis_phase2(
new_data = subset(sim_batches, Phase == "Phase 2"),
variables = c("Concentration", "Humidity", "Dissolution", "Density"),
medians = phase1$global_medians,
mads = phase1$global_mads,
compromise_matrix = phase1$compromise_matrix,
global_center = phase1$global_center
)
plot_statis_phase2_chart(phase2_result = phase2)
Robust STATIS Dual - Phase 1 (Under Control Batches)
Description
Applies the Robust STATIS Dual methodology to Phase 1 data (under control batches), using robust batch-wise standardization (median and MAD). Covariance matrices are robustly estimated using the MCD method and used directly (without trace normalization) to construct the compromise matrix.
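The two robust steps named above, median/MAD standardization and MCD covariance estimation, can be sketched for a single batch as follows. This is an illustration under assumptions: the data are made up, and MASS::cov.mcd() is used as a stand-in MCD estimator; the package may rely on a different implementation (rrcov is also among its imports).

```r
library(MASS)

set.seed(42)
vars <- c("Concentration", "Humidity", "Dissolution", "Density")

# One toy batch of 30 observations, with three gross outliers added.
x <- matrix(rnorm(30 * 4, mean = 10, sd = 2), ncol = 4,
            dimnames = list(NULL, vars))
x[1:3, ] <- x[1:3, ] + 15

# Robust standardization: center by the median, scale by the MAD.
# stats::mad() applies the consistency constant 1.4826 by default.
med <- apply(x, 2, median)
mad_scaled <- apply(x, 2, mad)
z <- sweep(sweep(x, 2, med, "-"), 2, mad_scaled, "/")

# MCD estimate of location and scatter on the standardized batch
# (stand-in estimator, see the note above).
mcd <- MASS::cov.mcd(z)
robust_center <- mcd$center
robust_cov <- mcd$cov
```

The outlying rows barely affect med, mad_scaled, or the MCD estimates, which is the point of using robust estimators in Phase 1.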
Usage
robust_statis_phase1(data, variables)
Arguments
data |
A data frame containing the process data with batch information. |
variables |
Character vector with the names of the variables to be used in the analysis. |
Value
A list containing:
- compromise_matrix
Robust compromise matrix (without trace normalization)
- global_center
Global robust center of the batches
- batch_statistics
Data frame with Batch, T2_Stat (Hotelling-type robust statistic), and Weight
- batch_medians
List of medians per batch and variable
- batch_mads
List of MADs per batch and variable
- global_medians
Global medians per variable (for use in Phase 2)
- global_mads
Global MADs per variable
- robust_means
List of robust centers of each batch (estimated by MCD)
- standardized_data
Data set standardized batch by batch
- robust_covariances
List of robust covariance matrices per batch
- similarity_matrix
Hilbert-Schmidt similarity matrix between batches
- statis_weights
Weights obtained from the first eigenvector of the similarity matrix
- first_eigenvector
First eigenvector of the similarity matrix (unnormalized)
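The relationship between similarity_matrix, first_eigenvector, statis_weights, and compromise_matrix can be sketched in base R. The covariance matrices below are made up, and rescaling the first eigenvector so that the weights sum to 1 is an assumption; this page does not state which normalization the package uses.

```r
set.seed(7)
p <- 4   # variables
k <- 5   # batches

# Toy per-batch covariance matrices (stand-ins for the robust MCD estimates).
covs <- lapply(seq_len(k), function(i) cov(matrix(rnorm(20 * p), ncol = p)))

# Hilbert-Schmidt inner product between two matrices: trace(A B).
hs_inner <- function(a, b) sum(diag(a %*% b))

# k x k similarity matrix between batches.
S <- outer(seq_len(k), seq_len(k),
           Vectorize(function(i, j) hs_inner(covs[[i]], covs[[j]])))

# First eigenvector of S; its sign is arbitrary, so take absolute values.
eig <- eigen(S, symmetric = TRUE)
u1 <- abs(eig$vectors[, 1])

# ASSUMPTION: weights are the first eigenvector rescaled to sum to 1.
w <- u1 / sum(u1)

# Compromise: weighted sum of the covariance matrices, no trace normalization.
compromise <- Reduce(`+`, Map(function(wi, Wi) wi * Wi, w, covs))
```

Batches whose covariance structure agrees with the majority receive larger weights, so atypical batches contribute less to the compromise.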
Examples
# Simulate new pharmaceutical manufacturing batches
sim_batches <- simulate_pharma_batches()
# Select only Phase 1 under control batches
phase1_data <- subset(sim_batches, Phase == "Phase 1" & Status == "Under Control")
# Apply robust STATIS Dual methodology
result <- robust_statis_phase1(
data = phase1_data,
variables = c("Concentration", "Humidity", "Dissolution", "Density")
)
# View main outputs
result$compromise_matrix
result$batch_statistics
result$robust_covariances
result$similarity_matrix
result$statis_weights
result$robust_means
Robust STATIS Dual - Phase 2 (New Batches Evaluation)
Description
Applies the robust STATIS Dual control chart methodology to evaluate new batches, using the compromise matrix and the global robust center obtained in Phase 1. Each batch is summarized using a robust Hotelling-type T² statistic.
Usage
robust_statis_phase2(
new_data,
variables,
medians,
mads,
compromise_matrix,
global_center
)
Arguments
new_data |
A data frame containing the new batches to evaluate. |
variables |
Character vector with the names of the variables to be used. |
medians |
Named numeric vector containing the global medians obtained in Phase 1. |
mads |
Named numeric vector containing the scaled MADs obtained in Phase 1. |
compromise_matrix |
Robust compromise matrix computed in Phase 1. |
global_center |
Robust global center obtained in Phase 1. |
Value
A list containing:
- standardized_data
Data frame with the new batches standardized using the global medians and scaled MADs.
- t2_stats_by_batch
Data frame with the Hotelling-type T² statistics per batch.
- threshold
Control limit based on the Chi-squared distribution (0.9973 quantile, degrees of freedom equal to the number of variables).
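How the Phase 1 quantities feed into a Phase 2 evaluation can be sketched for one new batch. Everything numeric below is hypothetical, and the batch summary (column medians of the standardized batch, then batch size times the squared Mahalanobis distance) is an assumption used for illustration; the package's exact T² formula is not given on this page.

```r
set.seed(3)
vars <- c("Concentration", "Humidity", "Dissolution", "Density")
p <- length(vars)

# Hypothetical Phase 1 outputs (stand-ins for global_medians, global_mads,
# compromise_matrix and global_center).
medians <- setNames(c(10, 5, 80, 1.2), vars)
mads <- setNames(c(0.5, 0.3, 2.0, 0.05), vars)
compromise_matrix <- matrix(0.3, p, p)
diag(compromise_matrix) <- 1
global_center <- setNames(rep(0, p), vars)

# One toy Phase 2 batch of 30 observations.
new_batch <- sapply(vars, function(v) {
  rnorm(30, mean = medians[[v]], sd = mads[[v]])
})

# Standardize with the Phase 1 global medians and scaled MADs.
z <- sweep(sweep(new_batch, 2, medians[vars], "-"), 2, mads[vars], "/")

# ASSUMPTION: batch summary = column medians of the standardized batch;
# statistic = n * squared Mahalanobis distance to the global center.
batch_center <- apply(z, 2, median)
t2 <- nrow(z) * mahalanobis(batch_center, global_center, compromise_matrix)

# Control limit: 0.9973 quantile of Chi-squared with df = number of variables.
threshold <- qchisq(0.9973, df = p)
flagged <- t2 > threshold
```

The key design point is that new batches are standardized with the Phase 1 medians and MADs rather than their own, so a shift in level or spread remains visible in the statistic.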
Examples
# Simulate new pharmaceutical manufacturing batches
sim_batches <- simulate_pharma_batches()
# Phase 1 analysis: use only Phase 1 and under control batches
phase1_data <- subset(sim_batches, Phase == "Phase 1" & Status == "Under Control")
phase1 <- robust_statis_phase1(
data = phase1_data,
variables = c("Concentration", "Humidity", "Dissolution", "Density")
)
# Phase 2 analysis: evaluate new batches (Phase 2)
new_data <- subset(sim_batches, Phase == "Phase 2")
result_phase2 <- robust_statis_phase2(
new_data = new_data,
variables = c("Concentration", "Humidity", "Dissolution", "Density"),
medians = phase1$global_medians,
mads = phase1$global_mads,
compromise_matrix = phase1$compromise_matrix,
global_center = phase1$global_center
)
# View main outputs
result_phase2$t2_stats_by_batch
result_phase2$threshold
Launch STATIS Dual Robust Dashboard (Shiny)
Description
Launches an interactive Shiny dashboard that includes:
Phase 1 control chart (sum of robust Mahalanobis distances)
Phase 2 control chart (for new batches)
HJ-Biplot visualization
Usage
run_statis_dashboard()
Value
No return value, called for side effects (launches a Shiny application).
Examples
if (interactive()) {
run_statis_dashboard()
}
Simulate Pharmaceutical Manufacturing Batches (Realistic Variability)
Description
Simulates pharmaceutical manufacturing batches across two phases. Phase 1 includes 10 under-control batches, each with natural variability in mean and covariance. Phase 2 includes 2 clean under-control batches and 3 out-of-control batches with shifted mean, increased dispersion, and moderate contamination.
Usage
simulate_pharma_batches(obs_per_batch = 30, seed = 780)
Arguments
obs_per_batch |
Integer. Number of observations per batch. Default is 30. |
seed |
Optional integer. If provided, sets a random seed for reproducibility. |
Details
The simulated data includes four quality control variables: Concentration, Humidity, Dissolution, and Density.
Value
A data frame with 450 observations under the default obs_per_batch = 30 (15 batches × 30 observations) and the following columns:
- Batch
Factor. Batch identifier (Batch_1 to Batch_15).
- Phase
Factor. Phase of the process: "Phase 1" or "Phase 2".
- Status
Factor. Control status: "Under Control" or "Out of Control".
- Concentration, Humidity, Dissolution, Density
Numeric quality control variables.
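The difference between a clean batch and an out-of-control batch (shifted mean, increased dispersion, moderate contamination) can be illustrated with MASS::mvrnorm(). All parameter values, the 10% contamination rate, and the batch labels below are hypothetical and are not the settings used by simulate_pharma_batches().

```r
library(MASS)

set.seed(780)
vars <- c("Concentration", "Humidity", "Dissolution", "Density")
n <- 30

# Hypothetical in-control mean vector and (diagonal) covariance matrix.
mu <- c(10, 5, 80, 1.2)
Sigma <- diag(c(0.25, 0.09, 4, 0.0025))
sds <- sqrt(diag(Sigma))

# Clean, under-control batch.
clean <- MASS::mvrnorm(n, mu = mu, Sigma = Sigma)

# Out-of-control batch: shifted mean and inflated dispersion.
shifted <- MASS::mvrnorm(n, mu = mu + 1.5 * sds, Sigma = 2 * Sigma)

# Moderate contamination: replace 10% of rows with far-off observations.
n_bad <- round(0.1 * n)
idx <- sample(n, n_bad)
shifted[idx, ] <- MASS::mvrnorm(n_bad, mu = mu + 6 * sds, Sigma = Sigma)

# Assemble a data frame with the same kind of columns as the package output.
sim <- data.frame(
  Batch = factor(rep(c("Batch_A", "Batch_B"), each = n)),
  Status = factor(rep(c("Under Control", "Out of Control"), each = n)),
  rbind(clean, shifted)
)
names(sim)[3:6] <- vars
```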