| Type: | Package |
| Title: | Inference on the Overlap Coefficient |
| Version: | 0.1.1 |
| Maintainer: | Alba M. Franco-Pereira <albfranc@ucm.es> |
| Description: | Provides functions to construct confidence intervals for the Overlap Coefficient (OVL). OVL measures the similarity between two distributions through the overlapping area of their distribution functions. Given its intuitive description and ease of visual representation by the straightforward depiction of the amount of overlap between the two corresponding histograms based on samples of measurements from each one of the two distributions, the development of accurate methods for confidence interval construction can be useful for applied researchers. Implements methods based on the work of Franco-Pereira, A.M., Nakas, C.T., Reiser, B., and Pardo, M.C. (2021) <doi:10.1177/09622802211046386> as well as extensions for multimodal distributions proposed by Alcaraz-Peñalba, A., Franco-Pereira, A., and Pardo, M.C. (2025) <doi:10.1007/s10182-025-00545-2>. |
| License: | GPL-2 |
| Encoding: | UTF-8 |
| Language: | en-US |
| LazyData: | true |
| RoxygenNote: | 7.3.2 |
| Imports: | ks, Matrix, mixtools, stats |
| Depends: | R (≥ 3.5) |
| Suggests: | testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2026-01-23 14:39:32 UTC; Alba |
| Author: | Alba M. Franco-Pereira [aut, cre, cph], Christos T. Nakas [aut], Benjamin Reiser [aut], M.Carmen Pardo [aut], Alba Alcaraz-Peñalba [aut, cph] |
| Repository: | CRAN |
| Date/Publication: | 2026-01-23 19:50:39 UTC |
EM algorithm for a univariate Gaussian mixture
Description
Fits a univariate Gaussian mixture model using the Expectation-Maximization (EM) algorithm. The function is intended as a lightweight fallback implementation (e.g., when mixtools is unavailable or fails).
Usage
EM(X, K = 2, max_iter = 100, tol = 1e-05)
Arguments
X |
Numeric vector of observations. |
K |
Integer. Number of mixture components. |
max_iter |
Integer. Maximum number of EM iterations. |
tol |
Positive numeric. Convergence tolerance for the absolute change in the log-likelihood. |
Details
The algorithm is initialized using the k-means clustering procedure and then alternates between:
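E-step: computing the expected complete-data log-likelihood given the current parameter estimates, which amounts to computing the posterior probabilities (responsibilities) of each component.
M-step: maximizing that expectation to update the component means, standard deviations, and mixing proportions.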
Value
A list with the following components:
- mu
Numeric vector of estimated component means (length K).
- sigma
Numeric vector of estimated component standard deviations (length K).
- pi
Numeric vector of estimated mixing proportions (length K).
- num_iteraciones
Number of iterations performed.
- posterior
Matrix of posterior probabilities (responsibilities) with dimension length(X) by K.
Examples
set.seed(1)
x <- c(rnorm(100, -2, 1), rnorm(100, 2, 1))
fit <- EM(x, K = 2)
fit$mu
fit$pi
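The E-step and M-step described in Details can be sketched in base R as follows. This is a minimal illustration and not the package's implementation: the function name em_sketch, the variable names, and the use of stats::kmeans for initialization are assumptions made for the example.

```r
# Hypothetical helper (not the package's EM): a bare-bones EM loop for a
# univariate Gaussian mixture with K components.
em_sketch <- function(X, K = 2, max_iter = 100, tol = 1e-05) {
  n <- length(X)
  # Assumed initialization: hard k-means partition of the data.
  km <- stats::kmeans(X, centers = K, nstart = 5)
  mu <- as.numeric(km$centers)
  sigma <- as.numeric(tapply(X, km$cluster, stats::sd))
  pi_k <- as.numeric(table(km$cluster)) / n
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: weighted component densities and responsibilities.
    dens <- sapply(seq_len(K),
                   function(k) pi_k[k] * stats::dnorm(X, mu[k], sigma[k]))
    posterior <- dens / rowSums(dens)
    # Log-likelihood at the parameters used in this E-step.
    loglik <- sum(log(rowSums(dens)))
    # M-step: responsibility-weighted parameter updates.
    nk <- colSums(posterior)
    mu <- colSums(posterior * X) / nk
    sigma <- sqrt(colSums(posterior * outer(X, mu, "-")^2) / nk)
    pi_k <- nk / n
    # Stop when the log-likelihood changes by less than tol.
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(mu = mu, sigma = sigma, pi = pi_k,
       iterations = iter, posterior = posterior)
}

set.seed(1)
x_demo <- c(rnorm(200, -2, 1), rnorm(200, 2, 1))
fit_demo <- em_sketch(x_demo, K = 2)
```

With well-separated components such as these, the estimated means should land close to the generating values; poorly separated components generally need more iterations and are more sensitive to initialization.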
Fisher information matrix for a two-component Gaussian mixture (working approximation).
Description
Computes a Fisher information matrix approximation based on the outer product of gradients for a two-component univariate Gaussian mixture model.
Usage
FIM_mixture_normals(data, params)
Arguments
data |
Numeric vector of observations. |
params |
List with elements |
Value
Fisher information approximation
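As an illustration of the outer-product-of-gradients idea, the sketch below sums the outer products of per-observation score vectors. It is not the package's code: the function names, the parameter ordering c(p, mu1, mu2, sigma1, sigma2), and the use of finite-difference scores are all assumptions, and the package may scale the result differently or use analytic gradients.

```r
# Hypothetical sketch: outer-product-of-gradients (OPG) approximation to the
# Fisher information of a two-component Gaussian mixture.
# Assumed parameter ordering: theta = c(p, mu1, mu2, sigma1, sigma2).
log_dens_mix2 <- function(x, theta) {
  log(theta[1] * dnorm(x, theta[2], theta[4]) +
        (1 - theta[1]) * dnorm(x, theta[3], theta[5]))
}

opg_information <- function(data, theta, eps = 1e-6) {
  # n x 5 matrix of per-observation scores, by central differences.
  scores <- sapply(seq_along(theta), function(j) {
    up <- theta; up[j] <- up[j] + eps
    dn <- theta; dn[j] <- dn[j] - eps
    (log_dens_mix2(data, up) - log_dens_mix2(data, dn)) / (2 * eps)
  })
  # Sum over observations of s_i %*% t(s_i).
  crossprod(scores)
}

set.seed(1)
x_fim <- c(rnorm(150, 0, 1), rnorm(150, 4, 1))
I_hat <- opg_information(x_fim, c(0.5, 0, 4, 1, 1))
```

Inverting such a matrix gives an approximate covariance matrix for the parameter estimates, which is the input a delta-method variance needs.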
OVL.BCAN
Description
Parametric approach that estimates the variance via the bootstrap.
Usage
OVL.BCAN(x, y, alpha = 0.05, B = 100, h_ini = -0.6)
Arguments
x |
Numeric vector. Data from the first group. |
y |
Numeric vector. Data from the second group. |
alpha |
Significance level; the interval has nominal confidence level 1 - alpha. |
B |
Number of bootstrap resamples. |
h_ini |
initial value in the optimization problem. |
Value
confidence interval.
Examples
controls = rnorm(50,6,1)
cases = rnorm(100,6.5,0.5)
OVL.BCAN (controls,cases)
OVL.BCPB
Description
Parametric approach using the bootstrap percentile method.
Usage
OVL.BCPB(x, y, alpha = 0.05, B = 100, h_ini = -0.6)
Arguments
x |
Numeric vector. Data from the first group. |
y |
Numeric vector. Data from the second group. |
alpha |
Significance level; the interval has nominal confidence level 1 - alpha. |
B |
Number of bootstrap resamples. |
h_ini |
initial value in the optimization problem. |
Value
confidence interval.
Examples
controls = rnorm(50,6,1)
cases = rnorm(100,6.5,0.5)
OVL.BCPB (controls,cases)
OVL.BCbias
Description
Parametric approach using a bootstrap bias-corrected approach.
Usage
OVL.BCbias(x, y, alpha = 0.05, B = 100, h_ini = -0.6)
Arguments
x |
Numeric vector. Data from the first group. |
y |
Numeric vector. Data from the second group. |
alpha |
Significance level; the interval has nominal confidence level 1 - alpha. |
B |
Number of bootstrap resamples. |
h_ini |
initial value in the optimization problem. |
Value
confidence interval.
Examples
controls = rnorm(50,6,1)
cases = rnorm(100,6.5,0.5)
OVL.BCbias (controls,cases)
OVL.D
Description
Parametric approach using the delta method.
Usage
OVL.D(x, y, alpha = 0.05)
Arguments
x |
Numeric vector. Data from the first group. |
y |
Numeric vector. Data from the second group. |
alpha |
Significance level; the interval has nominal confidence level 1 - alpha. |
Value
confidence interval.
Examples
controls = rnorm(50,6,1)
cases = rnorm(100,6.5,0.5)
OVL.D (controls,cases)
OVL.DBC
Description
Parametric approach using the delta method after the Box-Cox transformation.
Usage
OVL.DBC(x, y, alpha = 0.05, h_ini = -0.6)
Arguments
x |
Numeric vector. Data from the first group. |
y |
Numeric vector. Data from the second group. |
alpha |
Significance level; the interval has nominal confidence level 1 - alpha. |
h_ini |
initial value in the optimization problem. |
Value
confidence interval.
Examples
controls = rnorm(50,6,1)
cases = rnorm(100,6.5,0.5)
OVL.DBC (controls,cases)
OVL.DBCL
Description
Parametric approach using the delta method after the Box-Cox transformation taking into account the variability of the estimated transformation parameter.
Usage
OVL.DBCL(x, y, alpha = 0.05, h_ini = -0.6)
Arguments
x |
Numeric vector. Data from the first group. |
y |
Numeric vector. Data from the second group. |
alpha |
Significance level; the interval has nominal confidence level 1 - alpha. |
h_ini |
initial value in the optimization problem. |
Value
confidence interval.
Examples
controls = rnorm(50,6,1)
cases = rnorm(100,6.5,0.5)
OVL.DBCL (controls,cases)
EM-Delta
Description
Computes a confidence interval for the OVL between two populations when one is modeled as Gaussian and the other as a two-component Gaussian mixture, or when both are modeled as two-component Gaussian mixtures, using EM-based estimation and the delta method.
Usage
OVL.Delta.mix(
x,
y,
alpha = 0.05,
h = 10^(-5),
interv = c(0, 20),
all_mix = FALSE
)
Arguments
x |
Numeric vector. Data from the first group. When |
y |
Numeric vector. Data from the second group, modeled as a two-component Gaussian mixture. |
alpha |
Significance level; the interval has nominal confidence level 1 - alpha. |
h |
Step size used to compute numerical derivatives. |
interv |
Numeric vector of length 2. Search interval for intersection points between the corresponding densities. |
all_mix |
Logical. If |
Value
A list containing a confidence interval.
Additional elements (e.g., var_OVL, parameter estimates, OVL_hat) may also be returned.
Examples
set.seed(1)
x <- ifelse(runif(100) < 0.5, rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 2, sd = 1))
y <- ifelse(runif(100) < 0.5, rnorm(100, mean = 2.5, sd = 1), rnorm(100, mean = 2, sd = 1))
res <- OVL.Delta.mix(x, y, all_mix = TRUE, interv = c(-10, 10))
res$IC1
res$IC2
OVL.GPQ
Description
Parametric approach based on generalized inference.
Usage
OVL.GPQ(x, y, alpha = 0.05, K = 2500, h_ini = -1.6, BC = FALSE)
Arguments
x |
Numeric vector. Data from the first group. |
y |
Numeric vector. Data from the second group. |
alpha |
Significance level; the interval has nominal confidence level 1 - alpha. |
K |
Number of simulated generalized pivotal quantities. |
h_ini |
initial value in the optimization problem. |
BC |
Logical. Indicates whether a Box–Cox transformation is applied to the data. |
Value
confidence interval.
Examples
controls = rnorm(50,6,1)
cases = rnorm(100,6.5,0.5)
OVL.GPQ (controls,cases)
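To illustrate what generalized inference means here, the following sketch builds generalized pivotal quantities (GPQs) for the mean and variance of each normal sample, evaluates the OVL at each simulated parameter pair, and takes empirical quantiles. It is a simplified sketch under stated assumptions, not the package's algorithm: the helper names are hypothetical, no Box-Cox step is included, and the OVL is approximated by a Riemann sum of the pointwise minimum of the two normal densities.

```r
# Hypothetical sketch of a GPQ-based interval for the OVL of two normals.

# OVL of N(m1, s1) and N(m2, s2), approximated by a Riemann sum of the
# pointwise minimum of the two densities on a wide grid.
ovl_normal <- function(m1, s1, m2, s2, n_grid = 4001) {
  lo <- min(m1 - 8 * s1, m2 - 8 * s2)
  hi <- max(m1 + 8 * s1, m2 + 8 * s2)
  t <- seq(lo, hi, length.out = n_grid)
  sum(pmin(dnorm(t, m1, s1), dnorm(t, m2, s2))) * (t[2] - t[1])
}

# Standard GPQs for the mean and variance of a normal sample.
gpq_draws <- function(x, K) {
  n <- length(x)
  r_var <- (n - 1) * var(x) / rchisq(K, df = n - 1)  # GPQ for sigma^2
  r_mu <- mean(x) - rnorm(K) * sqrt(r_var / n)       # GPQ for mu
  list(mu = r_mu, sd = sqrt(r_var))
}

ovl_gpq_ci <- function(x, y, alpha = 0.05, K = 500) {
  gx <- gpq_draws(x, K)
  gy <- gpq_draws(y, K)
  ovl <- mapply(ovl_normal, gx$mu, gx$sd, gy$mu, gy$sd)
  quantile(ovl, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(1)
controls_demo <- rnorm(50, 6, 1)
cases_demo <- rnorm(100, 6.5, 0.5)
ci_gpq <- ovl_gpq_ci(controls_demo, cases_demo, K = 300)
```

A larger K reduces the Monte Carlo error in the quantiles at the cost of more OVL evaluations.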
GPQ-Mix
Description
Computes a confidence interval for the OVL between two populations when one is modeled as Gaussian and the other as a two-component Gaussian mixture, or when both are modeled as two-component Gaussian mixtures, using generalized inference.
Usage
OVL.GPQ.mix(x, y, alpha = 0.05, interv = c(0, 20), k = 1000, all_mix = FALSE)
Arguments
x |
Numeric vector. Data from the first group. When |
y |
Numeric vector. Data from the second group, modeled as a two-component Gaussian mixture. |
alpha |
Significance level; the interval has nominal confidence level 1 - alpha. |
interv |
Numeric vector of length 2. Search interval for intersection points between the corresponding densities. |
k |
Number of simulated generalized pivotal quantities. |
all_mix |
Logical. If |
Value
confidence interval.
Examples
set.seed(1)
x <- ifelse(runif(100) < 0.5,
rnorm(100, mean = 0, sd = 1),
rnorm(100, mean = 2, sd = 1))
y <- ifelse(runif(100) < 0.5,
rnorm(100, mean = 2.5, sd = 1),
rnorm(100, mean = 2, sd = 1))
res <- OVL.GPQ.mix(x, y, all_mix = TRUE, interv = c(-10, 10))
res$IC1
res$IC2
OVL.K
Description
Kernel approach estimating the variance via bootstrap.
Usage
OVL.K(x, y, alpha = 0.05, B = 100, k = 1, h = 1)
Arguments
x |
Numeric vector. Data from the first group. |
y |
Numeric vector. Data from the second group. |
alpha |
Significance level; the interval has nominal confidence level 1 - alpha. |
B |
Number of bootstrap resamples. |
k |
kernel. When k=1 (default value) the kernel used in the estimation is the Gaussian kernel. Otherwise, the Epanechnikov kernel is used instead. |
h |
bandwidth. When h=1 (default value) the cross-validation bandwidth is chosen. Otherwise, the bandwidth considered by Schmid and Schmidt (2006) is used instead. |
Value
confidence interval.
Examples
controls = rnorm(50,6,1)
cases = rnorm(100,6.5,0.5)
OVL.K (controls,cases)
OVL.KPB
Description
Kernel approach using a bootstrap percentile approach.
Usage
OVL.KPB(x, y, alpha = 0.05, B = 100, k = 1, h = 1)
Arguments
x |
Numeric vector. Data from the first group. |
y |
Numeric vector. Data from the second group. |
alpha |
Significance level; the interval has nominal confidence level 1 - alpha. |
B |
Number of bootstrap resamples. |
k |
kernel. When k=1 (default value) the kernel used in the estimation is the Gaussian kernel. Otherwise, the Epanechnikov kernel is used instead. |
h |
bandwidth. When h=1 (default value) the cross-validation bandwidth is chosen. Otherwise, the bandwidth considered by Schmid and Schmidt (2006) is used instead. |
Value
confidence interval.
Examples
controls = rnorm(50,6,1)
cases = rnorm(100,6.5,0.5)
OVL.KPB (controls,cases)
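The kernel percentile-bootstrap idea can be sketched with base R alone. This sketch is an illustration only: the helper names are hypothetical, and it relies on stats::density with its default Gaussian kernel and default bandwidth, which is an assumption and differs from the cross-validation and Schmid and Schmidt (2006) bandwidths offered by the package.

```r
# Hypothetical sketch: kernel OVL estimate plus a percentile bootstrap.

# Kernel estimate of the OVL: integrate the pointwise minimum of two
# kernel density estimates evaluated on a common grid.
ovl_kernel <- function(x, y, n_grid = 512) {
  pad <- sd(c(x, y))
  lo <- min(x, y) - pad
  hi <- max(x, y) + pad
  fx <- density(x, from = lo, to = hi, n = n_grid)
  fy <- density(y, from = lo, to = hi, n = n_grid)
  sum(pmin(fx$y, fy$y)) * (fx$x[2] - fx$x[1])
}

# Percentile bootstrap: resample each group with replacement, recompute
# the estimate, and take empirical quantiles of the replicates.
ovl_kernel_pb <- function(x, y, alpha = 0.05, B = 200) {
  boot <- replicate(B, ovl_kernel(sample(x, replace = TRUE),
                                  sample(y, replace = TRUE)))
  quantile(boot, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(1)
controls_kpb <- rnorm(50, 6, 1)
cases_kpb <- rnorm(100, 6.5, 0.5)
ci_kpb <- ovl_kernel_pb(controls_kpb, cases_kpb)
```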
OVL.LogitBCAN
Description
BCAN procedure carried out in the logit scale and back-transformed.
Usage
OVL.LogitBCAN(x, y, alpha = 0.05, B = 100, h_ini = -0.6)
Arguments
x |
Numeric vector. Data from the first group. |
y |
Numeric vector. Data from the second group. |
alpha |
Significance level; the interval has nominal confidence level 1 - alpha. |
B |
Number of bootstrap resamples. |
h_ini |
initial value in the optimization problem. |
Value
confidence interval.
Examples
controls = rnorm(50,6,1)
cases = rnorm(100,6.5,0.5)
OVL.LogitBCAN (controls,cases)
OVL.LogitD
Description
Parametric approach using the delta method after switching to a logit scale and then transforming back.
Usage
OVL.LogitD(x, y, alpha = 0.05)
Arguments
x |
Numeric vector. Data from the first group. |
y |
Numeric vector. Data from the second group. |
alpha |
Significance level; the interval has nominal confidence level 1 - alpha. |
Value
confidence interval.
Examples
controls = rnorm(50,6,1)
cases = rnorm(100,6.5,0.5)
OVL.LogitD (controls,cases)
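The logit-scale construction can be illustrated generically. Given a point estimate of the OVL and a standard error for it, the sketch below moves to the logit scale with the delta method, forms a normal-theory interval there, and maps it back to (0, 1). The function name logit_ci and its arguments are hypothetical, and how the package obtains the estimate and its standard error is not shown.

```r
# Hypothetical helper: Wald interval on the logit scale, back-transformed.
logit_ci <- function(ovl_hat, se_hat, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  # Delta method: d/dp logit(p) = 1 / (p * (1 - p)).
  se_logit <- se_hat / (ovl_hat * (1 - ovl_hat))
  # Interval on the logit scale, mapped back with the inverse logit.
  plogis(qlogis(ovl_hat) + c(-1, 1) * z * se_logit)
}

ci_logit <- logit_ci(ovl_hat = 0.6, se_hat = 0.05)
```

The back-transformed interval always lies inside (0, 1), which a plain Wald interval on the original scale does not guarantee when the estimate is near 0 or 1.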
OVL.LogitDBC
Description
Parametric approach using the delta method after the Box-Cox transformation, carried out on the logit scale and then transformed back.
Usage
OVL.LogitDBC(x, y, alpha = 0.05, h_ini = -0.6)
Arguments
x |
Numeric vector. Data from the first group. |
y |
Numeric vector. Data from the second group. |
alpha |
Significance level; the interval has nominal confidence level 1 - alpha. |
h_ini |
initial value in the optimization problem. |
Value
confidence interval.
Examples
controls = rnorm(50,6,1)
cases = rnorm(100,6.5,0.5)
OVL.LogitDBC (controls,cases)
OVL.LogitDBCL
Description
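Parametric approach using the delta method after the Box-Cox transformation, taking into account the variability of the estimated transformation parameter, carried out on the logit scale and then transformed back.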
Usage
OVL.LogitDBCL(x, y, alpha = 0.05, h_ini = -0.6)
Arguments
x |
Numeric vector. Data from the first group. |
y |
Numeric vector. Data from the second group. |
alpha |
Significance level; the interval has nominal confidence level 1 - alpha. |
h_ini |
initial value in the optimization problem. |
Value
confidence interval.
Examples
controls = rnorm(50,6,1)
cases = rnorm(100,6.5,0.5)
OVL.LogitDBCL (controls,cases)
OVL.LogitK
Description
Kernel approach estimating the variance via bootstrap in the logit scale and back-transformed.
Usage
OVL.LogitK(x, y, alpha = 0.05, B = 100, k = 1, h = 1)
Arguments
x |
Numeric vector. Data from the first group. |
y |
Numeric vector. Data from the second group. |
alpha |
Significance level; the interval has nominal confidence level 1 - alpha. |
B |
Number of bootstrap resamples. |
k |
kernel. When k=1 (default value) the kernel used in the estimation is the Gaussian kernel. Otherwise, the Epanechnikov kernel is used instead. |
h |
bandwidth. When h=1 (default value) the cross-validation bandwidth is chosen. Otherwise, the bandwidth considered by Schmid and Schmidt (2006) is used instead. |
Value
confidence interval.
Examples
controls = rnorm(50,6,1)
cases = rnorm(100,6.5,0.5)
OVL.LogitK (controls,cases)
Numerical derivatives of the overlap functional (normal vs. 2-component mixture).
Description
Computes finite-difference approximations of the partial derivatives of the overlap coefficient with respect to the parameters in the case where the first group is modeled as a normal distribution and the second group as a two-component Gaussian mixture.
Usage
OVL_derivates_1(
mu1_hat,
mu2_hat,
sigma1_hat,
sigma2_hat,
pi2_hat,
h,
intersec,
OVL_mix
)
Arguments
mu1_hat |
Numeric scalar. Estimated mean for the normal group. |
mu2_hat |
Numeric vector of length 2. Estimated means for the mixture group. |
sigma1_hat |
Numeric scalar. Estimated standard deviation for the normal group. |
sigma2_hat |
Numeric vector of length 2. Estimated standard deviations for the mixture group. |
pi2_hat |
Either a numeric scalar in [0,1] giving the mixing proportion of the first component in the mixture, or a numeric vector of length 2 with elements summing to 1. |
h |
Positive numeric scalar. Base step size for finite differences. |
intersec |
Numeric vector. Intersection points used as cutpoints. |
OVL_mix |
Function that evaluates the overlap-related expression at cutpoints. |
Details
This function is intended for internal use (delta-method variance estimation).
Value
A list with components deriv1–deriv7.
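For readers unfamiliar with how finite-difference derivatives feed a delta-method variance, here is a generic sketch. The names num_gradient and delta_variance are hypothetical, the plain central-difference scheme with a fixed step h is an assumption (the package may scale the step per parameter), and the toy function g_demo merely stands in for the overlap functional.

```r
# Hypothetical helpers: central-difference gradient and delta-method variance.
num_gradient <- function(f, theta, h = 1e-5) {
  vapply(seq_along(theta), function(j) {
    up <- theta; up[j] <- up[j] + h
    dn <- theta; dn[j] <- dn[j] - h
    (f(up) - f(dn)) / (2 * h)
  }, numeric(1))
}

# Delta method: Var(g(theta_hat)) is approximately grad' Sigma grad,
# where Sigma is the covariance matrix of theta_hat.
delta_variance <- function(grad, Sigma) {
  as.numeric(crossprod(grad, Sigma %*% grad))
}

# Toy functional standing in for the overlap coefficient.
g_demo <- function(t) t[1]^2 + 3 * t[2]
grad_demo <- num_gradient(g_demo, c(2, 5))
var_demo <- delta_variance(grad_demo, diag(c(0.1, 0.2)))
```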
Numerical derivatives of the overlap functional (2-component mixture vs. 2-component mixture).
Description
Computes finite-difference approximations of the partial derivatives of the overlap coefficient with respect to the parameters when both groups are modeled as two-component Gaussian mixtures.
Usage
OVL_derivates_2(
mu1_hat,
mu2_hat,
sigma1_hat,
sigma2_hat,
pi1_hat,
pi2_hat,
h,
intersec,
OVL_mix
)
Arguments
mu1_hat |
Numeric vector of length 2. Estimated means for the first mixture. |
mu2_hat |
Numeric vector of length 2. Estimated means for the second mixture. |
sigma1_hat |
Numeric vector of length 2. Estimated standard deviations for the first mixture. |
sigma2_hat |
Numeric vector of length 2. Estimated standard deviations for the second mixture. |
pi1_hat |
Either a numeric scalar in [0,1] giving the mixing proportion of the first component in the first mixture, or a numeric vector of length 2 with elements summing to 1. |
pi2_hat |
Either a numeric scalar in [0,1] giving the mixing proportion of the first component in the second mixture, or a numeric vector of length 2 with elements summing to 1. |
h |
Positive numeric scalar. Base step size for finite differences. |
intersec |
Numeric vector. Intersection points used as cutpoints. |
OVL_mix |
Function that evaluates the overlap-related expression at cutpoints. |
Details
This function is intended for internal use (delta-method variance estimation).
Value
A list with components deriv1–deriv10.
Computes the overlap coefficient (OVL) between two cumulative distribution functions corresponding to finite mixtures of two normal distributions.
Description
Computes the overlap coefficient (OVL) between two cumulative distribution functions corresponding to finite mixtures of two normal distributions.
Usage
OVL_mix(mu1, mu2, sigma1, sigma2, pi_H, pi_D, x)
Arguments
mu1 |
Numeric vector of length 2 containing the means of the first mixture. |
mu2 |
Numeric vector of length 2 containing the means of the second mixture. |
sigma1 |
Numeric vector of length 2 containing the standard deviations of the first mixture. |
sigma2 |
Numeric vector of length 2 containing the standard deviations of the second mixture. |
pi_H |
Either a numeric scalar in [0,1] giving the mixing proportion of the first component in the first mixture, or a numeric vector of length 2 with elements summing to 1. |
pi_D |
Either a numeric scalar in [0,1] giving the mixing proportion of the first component in the second mixture, or a numeric vector of length 2 with elements summing to 1. |
x |
Numeric vector of intersection points between the two mixture densities. |
Details
Mixing proportions equal to 0 or 1 are allowed, in which case the corresponding mixture reduces to a single normal distribution.
Value
A numeric value corresponding to the OVL between the two mixture distributions.
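One way to see how intersection points enter the computation: between two consecutive crossing points of the densities, one density lies below the other throughout, so the area under the pointwise minimum on that interval equals the smaller of the two CDF increments. The sketch below applies this idea; it is an illustration under the assumption that the supplied cutpoints contain every crossing point, and the helper names and argument order are hypothetical rather than the package's.

```r
# Hypothetical sketch: OVL of two normal mixtures from their crossing points.

# CDF of a finite normal mixture at the points q (length(q) >= 2 assumed).
pmix <- function(q, mu, sigma, p) {
  rowSums(sapply(seq_along(mu),
                 function(k) p[k] * pnorm(q, mu[k], sigma[k])))
}

ovl_from_cutpoints <- function(cuts, mu1, sigma1, p1, mu2, sigma2, p2) {
  # Interval boundaries: the sorted cutpoints plus the two infinite ends.
  b <- c(-Inf, sort(cuts), Inf)
  inc1 <- diff(pmix(b, mu1, sigma1, p1))
  inc2 <- diff(pmix(b, mu2, sigma2, p2))
  # On each interval the lower density contributes its full CDF increment.
  sum(pmin(inc1, inc2))
}

# Degenerate mixtures (weights 1 and 0) reduce to N(0,1) versus N(1,1),
# whose densities cross only at 0.5.
ovl_demo <- ovl_from_cutpoints(0.5,
                               mu1 = c(0, 0), sigma1 = c(1, 1), p1 = c(1, 0),
                               mu2 = c(1, 1), sigma2 = c(1, 1), p2 = c(1, 0))
```

For two unit-variance normals one unit apart, the result can be checked against the closed form 2 * pnorm(-0.5).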
Evaluates an auxiliary function.
Description
Evaluates an auxiliary function.
Usage
U(mu1, mu2, sigma1, sigma2)
Arguments
mu1 |
sample mean of a vector x. |
mu2 |
sample mean of a vector y. |
sigma1 |
sample standard deviation of a vector x. |
sigma2 |
sample standard deviation of a vector y. |
Value
evaluation of an auxiliary function.
Computes the probability density function of a finite mixture of normal distributions at a given point or vector of points.
Description
Computes the probability density function of a finite mixture of normal distributions at a given point or vector of points.
Usage
dnorm_mixture(x, mu, sigma, pi)
Arguments
x |
Numeric vector of points at which the density is evaluated. |
mu |
Numeric vector of component means. |
sigma |
Numeric vector of component standard deviations. |
pi |
Numeric vector of mixing proportions. Must have the same length
as |
Value
A numeric vector containing the values of the mixture density evaluated at x.
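The mixture density is the weighted sum of the component normal densities. A minimal sketch, with the hypothetical name mixture_density so that it is not confused with the package function:

```r
# Hypothetical re-implementation for illustration: density of a finite
# normal mixture, sum_k pi[k] * dnorm(x, mu[k], sigma[k]).
mixture_density <- function(x, mu, sigma, pi) {
  stopifnot(length(mu) == length(sigma), length(mu) == length(pi))
  dens <- vapply(seq_along(mu),
                 function(k) pi[k] * dnorm(x, mu[k], sigma[k]),
                 numeric(length(x)))
  # vapply returns a matrix when length(x) > 1 and a plain vector otherwise.
  if (is.matrix(dens)) rowSums(dens) else sum(dens)
}

d_demo <- mixture_density(c(-1, 0, 2), mu = c(0, 2), sigma = c(1, 1),
                          pi = c(0.3, 0.7))
```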
Identifies the roots of a univariate function over a given interval by
subdividing the interval into smaller subintervals and applying
uniroot on those subintervals where a sign change is detected.
Description
Identifies the roots of a univariate function over a given interval by
subdividing the interval into smaller subintervals and applying
uniroot on those subintervals where a sign change is detected.
Usage
encontrar_raices(intersection_function, interval, n_subintervals = 10)
Arguments
intersection_function |
A univariate numeric function whose roots are to be located. |
interval |
Numeric vector of length 2 specifying the lower and upper bounds of the search interval. |
n_subintervals |
Integer. Number of subintervals used to partition
|
Value
A numeric vector containing the distinct roots found within interval.
If no roots are detected, an empty numeric vector is returned.
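The subdivide-and-bracket strategy described above can be sketched as follows. The function name find_roots is hypothetical and the details (handling of exact zeros at grid points, the uniroot tolerance) are assumptions. As with any sign-change search, two roots that fall inside the same subinterval are missed, so n_subintervals should be large enough to separate them.

```r
# Hypothetical sketch: locate roots by scanning subintervals for sign changes.
find_roots <- function(f, interval, n_subintervals = 10) {
  grid <- seq(interval[1], interval[2], length.out = n_subintervals + 1)
  vals <- vapply(grid, f, numeric(1))
  roots <- numeric(0)
  for (i in seq_len(n_subintervals)) {
    if (vals[i] == 0) {
      # A grid point that is itself a root.
      roots <- c(roots, grid[i])
    } else if (vals[i] * vals[i + 1] < 0) {
      # Sign change: refine the bracketed root with uniroot.
      roots <- c(roots,
                 uniroot(f, c(grid[i], grid[i + 1]), tol = 1e-10)$root)
    }
  }
  if (vals[n_subintervals + 1] == 0) roots <- c(roots, grid[n_subintervals + 1])
  unique(roots)
}

# Crossing points of the N(0,1) and N(1,2) densities.
diff_dens <- function(t) dnorm(t, 0, 1) - dnorm(t, 1, 2)
roots_demo <- find_roots(diff_dens, c(-10, 10), n_subintervals = 40)
```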
Evaluates the Epanechnikov kernel.
Description
Evaluates the Epanechnikov kernel.
Usage
kernel.e(u)
Arguments
u |
vector of observations. |
Value
evaluation of the Epanechnikov kernel.
Estimates the density function using the Epanechnikov kernel.
Description
Estimates the density function using the Epanechnikov kernel.
Usage
kernel.e.density(data, points, h)
Arguments
data |
vector of observations. |
points |
Points at which the density is evaluated. |
h |
bandwidth. |
Value
density estimation.
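For reference, a kernel density estimate averages scaled kernel evaluations centered at the observations. The sketch below uses the standard Epanechnikov kernel supported on [-1, 1]; this is an assumption, since the package may use a differently scaled version, and the function names are hypothetical.

```r
# Hypothetical sketch: standard Epanechnikov kernel on [-1, 1].
epan_kernel <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Kernel density estimate: f_hat(p) = mean(K((p - data) / h)) / h.
epan_density <- function(data, points, h) {
  vapply(points,
         function(p) mean(epan_kernel((p - data) / h)) / h,
         numeric(1))
}

set.seed(1)
obs_demo <- rnorm(100)
dens_demo <- epan_density(obs_demo, points = c(-1, 0, 1), h = 0.5)
```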
Evaluates the Gaussian kernel.
Description
Evaluates the Gaussian kernel.
Usage
kernel.g(u)
Arguments
u |
vector of observations. |
Value
evaluation of the Gaussian kernel.
Estimates the density function using the Gaussian kernel.
Description
Estimates the density function using the Gaussian kernel.
Usage
kernel.g.density(data, points, h)
Arguments
data |
vector of observations. |
points |
Points at which the density is evaluated. |
h |
bandwidth. |
Value
density estimation.
Computation of the likelihood function of the Box-Cox transformation.
Description
Computation of the likelihood function of the Box-Cox transformation.
Usage
likbox(h, data, n)
Arguments
h |
parameter of the Box-Cox transformation. |
data |
joint vector of controls (first) and cases. |
n |
length of the vector of controls. |
Value
the likelihood function of the Box-Cox transformation.
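To make the role of the transformation parameter concrete, the sketch below writes a two-group Box-Cox profile log-likelihood with a common transformation parameter and group-specific means and variances, then maximizes it from an initial value. This is an assumed form for illustration: the function names are hypothetical, constants are dropped, and the package's likbox may differ in sign, scaling, or parameterization.

```r
# Hypothetical sketch: two-group Box-Cox profile log-likelihood.
boxcox_transform <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# data: controls first, then cases; n: number of controls.
boxcox_profile_loglik <- function(lambda, data, n) {
  x <- boxcox_transform(data[seq_len(n)], lambda)
  y <- boxcox_transform(data[-seq_len(n)], lambda)
  m <- length(y)
  # Means and variances are profiled out; the last term is the Jacobian.
  -n / 2 * log(mean((x - mean(x))^2)) -
    m / 2 * log(mean((y - mean(y))^2)) +
    (lambda - 1) * sum(log(data))
}

set.seed(1)
ctrl_bc <- exp(rnorm(200, 1, 0.7))    # log-normal data, so lambda near 0
case_bc <- exp(rnorm(200, 1.5, 0.7))
joint_bc <- c(ctrl_bc, case_bc)
lambda_hat <- optim(par = -0.6,
                    fn = function(l) -boxcox_profile_loglik(l, joint_bc, 200),
                    method = "BFGS")$par
```

Because the simulated data are log-normal, the maximizer should fall near zero, which corresponds to the log transformation.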
Simulated data with normal and mixture of normal distributions
Description
Contains control and case samples generated from a normal distribution and a two-component normal mixture distribution, respectively.
Usage
data(mixnorm_data)
Format
A data frame with 100 rows and 2 variables:
- controls
Simulated data from a N(5,1) normal distribution.
- cases
Simulated data from a two-component normal mixture distribution: 0.8N(2,1) + 0.2N(3,1).
References
This dataset was artificially generated for the OVL.CI package.
Examples
data(mixnorm_data)
Computes the sample variance of a vector of observations.
Description
Computes the sample variance of a vector of observations.
Usage
ssdd(x)
Arguments
x |
vector of observations. |
Value
the sample variance.
Simulated data with normal distributions
Description
Contains controls and cases data from normal distributions.
Usage
data(test_data)
Format
A data frame with 100 rows and 2 variables:
- controls
Simulated data from a N(10,1) distribution for the control group.
- cases
Simulated data from a N(10.5,0.5) distribution for the case group.
References
This data set was artificially created for the OVL.CI package.
Examples
data(test_data)