Type: Package
Title: Inference on the Overlap Coefficient
Version: 0.1.1
Maintainer: Alba M. Franco-Pereira <albfranc@ucm.es>
Description: Provides functions to construct confidence intervals for the Overlap Coefficient (OVL). The OVL measures the similarity between two distributions through the overlapping area of their density functions. Because the OVL is intuitive and easy to visualize, as the amount of overlap between the histograms of samples drawn from the two distributions, accurate methods for constructing confidence intervals for it can be useful to applied researchers. Implements methods based on the work of Franco-Pereira, A.M., Nakas, C.T., Reiser, B., and Pardo, M.C. (2021) <doi:10.1177/09622802211046386> as well as extensions for multimodal distributions proposed by Alcaraz-Peñalba, A., Franco-Pereira, A., and Pardo, M.C. (2025) <doi:10.1007/s10182-025-00545-2>.
License: GPL-2
Encoding: UTF-8
Language: en-US
LazyData: true
RoxygenNote: 7.3.2
Imports: ks, Matrix, mixtools, stats
Depends: R (≥ 3.5)
Suggests: testthat (≥ 3.0.0)
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2026-01-23 14:39:32 UTC; Alba
Author: Alba M. Franco-Pereira [aut, cre, cph], Christos T. Nakas [aut], Benjamin Reiser [aut], M.Carmen Pardo [aut], Alba Alcaraz-Peñalba [aut, cph]
Repository: CRAN
Date/Publication: 2026-01-23 19:50:39 UTC

EM algorithm for a univariate Gaussian mixture

Description

Fits a univariate Gaussian mixture model using the Expectation-Maximization (EM) algorithm. The function is intended as a lightweight fallback implementation (e.g., when mixtools is unavailable or fails).

Usage

EM(X, K = 2, max_iter = 100, tol = 1e-05)

Arguments

X

Numeric vector of observations.

K

Integer. Number of mixture components.

max_iter

Integer. Maximum number of EM iterations.

tol

Positive numeric. Convergence tolerance for the absolute change in the log-likelihood.

Details

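The algorithm is initialized with k-means clustering and then alternates between the following two steps until the absolute change in the log-likelihood is smaller than tol or max_iter iterations have been performed:

  1. E-step: computing the posterior probabilities (responsibilities) of each component for each observation, which determine the conditional expectation of the complete-data log-likelihood.

  2. M-step: updating the component means, standard deviations and mixing proportions by maximizing that expectation.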
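The two steps above can be sketched in base R. This is a minimal illustration, not the package's EM() code: em_sketch is a hypothetical name, the starting standard deviations are assumed to be the within-cluster sample standard deviations, and convergence is assumed to be monitored on the observed-data log-likelihood.

```r
# Minimal EM sketch for a univariate K-component Gaussian mixture.
# em_sketch() is hypothetical and is not the package's EM() function.
em_sketch <- function(X, K = 2, max_iter = 100, tol = 1e-5) {
  n <- length(X)
  # Initialization from k-means clusters (assumed starting values)
  cl <- stats::kmeans(X, centers = K)$cluster
  mu <- as.numeric(tapply(X, cl, mean))
  sigma <- as.numeric(tapply(X, cl, sd))
  w <- as.numeric(table(cl)) / n
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: weighted component densities (n x K) and posterior probabilities
    dens <- sapply(seq_len(K), function(k) w[k] * dnorm(X, mu[k], sigma[k]))
    total <- rowSums(dens)
    post <- dens / total
    # M-step: closed-form maximizers of the expected complete-data log-likelihood
    nk <- colSums(post)
    mu <- colSums(post * X) / nk
    sigma <- sqrt(colSums(post * outer(X, mu, "-")^2) / nk)
    w <- nk / n
    # Convergence check on the observed-data log-likelihood
    loglik <- sum(log(total))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(mu = mu, sigma = sigma, pi = w, num_iteraciones = iter, posterior = post)
}

set.seed(1)
x <- c(rnorm(100, -2, 1), rnorm(100, 2, 1))
fit <- em_sketch(x, K = 2)
fit$mu
fit$pi
```

Because k-means uses random starts, the component order (and, for poorly separated data, the fitted values) can depend on the random seed.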

Value

A list with the following components:

mu

Numeric vector of estimated component means (length K).

sigma

Numeric vector of estimated component standard deviations (length K).

pi

Numeric vector of estimated mixing proportions (length K).

num_iteraciones

Number of iterations performed.

posterior

Matrix of posterior probabilities (responsibilities) with dimension length(X) by K.

Examples

set.seed(1)
x <- c(rnorm(100, -2, 1), rnorm(100, 2, 1))
fit <- EM(x, K = 2)
fit$mu
fit$pi


Fisher information matrix for a two-component Gaussian mixture (working approximation).

Description

Computes a Fisher information matrix approximation based on the outer product of gradients for a two-component univariate Gaussian mixture model.
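The outer-product-of-gradients idea can be sketched as follows: compute the per-observation score vector (the gradient of the log mixture density) and sum the outer products over observations. fim_opg_sketch is a hypothetical name; the closed-form scores, the parameter order (pi, mu1, mu2, sigma1, sigma2) and the reading of pi as the weight of the first component are assumptions, and the package may, for example, scale the result differently.

```r
# OPG approximation of the Fisher information for
# f(x) = pi * N(mu1, sigma1^2) + (1 - pi) * N(mu2, sigma2^2).
# fim_opg_sketch() and its parameter order are hypothetical.
fim_opg_sketch <- function(data, params) {
  p <- params$pi
  m1 <- params$mu1; m2 <- params$mu2
  s1 <- params$sigma1; s2 <- params$sigma2
  f1 <- dnorm(data, m1, s1)
  f2 <- dnorm(data, m2, s2)
  f <- p * f1 + (1 - p) * f2              # mixture density at each observation
  # Per-observation scores (gradient of log f), one column per parameter
  scores <- cbind(
    pi     = (f1 - f2) / f,
    mu1    = p * f1 * (data - m1) / s1^2 / f,
    mu2    = (1 - p) * f2 * (data - m2) / s2^2 / f,
    sigma1 = p * f1 * ((data - m1)^2 / s1^3 - 1 / s1) / f,
    sigma2 = (1 - p) * f2 * ((data - m2)^2 / s2^3 - 1 / s2) / f
  )
  crossprod(scores)                        # sum over i of g_i %*% t(g_i)
}

set.seed(2)
x <- c(rnorm(150, 0, 1), rnorm(150, 4, 1))
I_hat <- fim_opg_sketch(x, list(pi = 0.5, mu1 = 0, mu2 = 4, sigma1 = 1, sigma2 = 1))
sqrt(diag(solve(I_hat)))  # approximate standard errors of the five parameters
```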

Usage

FIM_mixture_normals(data, params)

Arguments

data

Numeric vector of observations.

params

List with elements pi, mu1, mu2, sigma1, sigma2.

Value

Numeric matrix giving the outer-product-of-gradients approximation of the Fisher information.


OVL.BCAN

Description

Parametric approach in which the variance of the OVL estimator is estimated by bootstrap and then used to build the confidence interval.
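The general idea (a bootstrap standard error combined with a normal-approximation interval) can be sketched as follows, assuming both groups are normal. This sketch omits the transformation step and the h_ini optimization used by the package, resamples the data nonparametrically (an assumption about the resampling scheme), and uses the hypothetical names ovl_binormal and ovl_boot_normal_ci. The binormal OVL is obtained by numerical integration of the pointwise minimum of the two fitted densities.

```r
# Binormal OVL: area under the pointwise minimum of two fitted normal densities
ovl_binormal <- function(m1, s1, m2, s2) {
  lo <- min(m1 - 10 * s1, m2 - 10 * s2)
  hi <- max(m1 + 10 * s1, m2 + 10 * s2)
  integrate(function(t) pmin(dnorm(t, m1, s1), dnorm(t, m2, s2)), lo, hi)$value
}

# Hypothetical sketch: bootstrap standard error + normal-approximation interval
ovl_boot_normal_ci <- function(x, y, alpha = 0.05, B = 100) {
  est <- ovl_binormal(mean(x), sd(x), mean(y), sd(y))
  boot <- replicate(B, {
    xb <- sample(x, replace = TRUE)   # nonparametric resampling (assumption)
    yb <- sample(y, replace = TRUE)
    ovl_binormal(mean(xb), sd(xb), mean(yb), sd(yb))
  })
  se <- sd(boot)
  z <- qnorm(1 - alpha / 2)
  c(estimate = est, lower = est - z * se, upper = est + z * se)
}

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
ovl_boot_normal_ci(controls, cases)
```

Because this interval is symmetric around the estimate, it can extend beyond [0, 1].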

Usage

OVL.BCAN(x, y, alpha = 0.05, B = 100, h_ini = -0.6)

Arguments

x

Numeric vector. Data from the first group.

y

Numeric vector. Data from the second group.

alpha

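Significance level; the confidence level of the interval is 1 - alpha.

B

Number of bootstrap replicates.

h_ini

Initial value in the optimization problem used to estimate the Box-Cox transformation parameter.

Value

A confidence interval for the OVL.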

Examples

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.BCAN(controls, cases)

OVL.BCPB

Description

Parametric approach using a bootstrap percentile interval.
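A bootstrap percentile interval takes the empirical alpha/2 and 1 - alpha/2 quantiles of the bootstrap replicates of the OVL estimate as the interval endpoints. The sketch below assumes both groups are normal, omits the package's transformation step, resamples nonparametrically (an assumption), and uses the hypothetical names ovl_binormal and ovl_boot_percentile_ci.

```r
# Binormal OVL by numerical integration of the pointwise minimum of the densities
ovl_binormal <- function(m1, s1, m2, s2) {
  lo <- min(m1 - 10 * s1, m2 - 10 * s2)
  hi <- max(m1 + 10 * s1, m2 + 10 * s2)
  integrate(function(t) pmin(dnorm(t, m1, s1), dnorm(t, m2, s2)), lo, hi)$value
}

# Hypothetical sketch: bootstrap percentile interval for the binormal OVL
ovl_boot_percentile_ci <- function(x, y, alpha = 0.05, B = 100) {
  boot <- replicate(B, {
    xb <- sample(x, replace = TRUE)
    yb <- sample(y, replace = TRUE)
    ovl_binormal(mean(xb), sd(xb), mean(yb), sd(yb))
  })
  # Endpoints are empirical quantiles of the bootstrap replicates
  quantile(boot, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(1)
ovl_boot_percentile_ci(rnorm(50, 6, 1), rnorm(100, 6.5, 0.5))
```

Unlike a symmetric normal-approximation interval, percentile endpoints stay within the range of the bootstrap replicates.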

Usage

OVL.BCPB(x, y, alpha = 0.05, B = 100, h_ini = -0.6)

Arguments

x

Numeric vector. Data from the first group.

y

Numeric vector. Data from the second group.

alpha

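Significance level; the confidence level of the interval is 1 - alpha.

B

Number of bootstrap replicates.

h_ini

Initial value in the optimization problem used to estimate the Box-Cox transformation parameter.

Value

A confidence interval for the OVL.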

Examples

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.BCPB(controls, cases)

OVL.BCbias

Description

Parametric approach using a bias-corrected bootstrap interval.
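One common form of bias correction is Efron's bias-corrected (BC) percentile interval: the percentile levels are shifted by 2 * z0, where z0 is the normal quantile of the proportion of bootstrap replicates below the original estimate. Whether OVL.BCbias uses exactly this correction is an assumption. The sketch also assumes both groups are normal, omits the transformation step, and uses the hypothetical names ovl_binormal and ovl_boot_bc_ci.

```r
# Binormal OVL by numerical integration of the pointwise minimum of the densities
ovl_binormal <- function(m1, s1, m2, s2) {
  lo <- min(m1 - 10 * s1, m2 - 10 * s2)
  hi <- max(m1 + 10 * s1, m2 + 10 * s2)
  integrate(function(t) pmin(dnorm(t, m1, s1), dnorm(t, m2, s2)), lo, hi)$value
}

# Hypothetical sketch: Efron's bias-corrected (BC) percentile interval
ovl_boot_bc_ci <- function(x, y, alpha = 0.05, B = 100) {
  est <- ovl_binormal(mean(x), sd(x), mean(y), sd(y))
  boot <- replicate(B, {
    xb <- sample(x, replace = TRUE)
    yb <- sample(y, replace = TRUE)
    ovl_binormal(mean(xb), sd(xb), mean(yb), sd(yb))
  })
  # Bias-correction constant: normal quantile of P(replicate < estimate)
  z0 <- qnorm(mean(boot < est))
  # Adjusted percentile levels; z0 = 0 gives the plain percentile interval
  levels <- pnorm(2 * z0 + qnorm(c(alpha / 2, 1 - alpha / 2)))
  quantile(boot, levels, names = FALSE)
}

set.seed(1)
ovl_boot_bc_ci(rnorm(50, 6, 1), rnorm(100, 6.5, 0.5))
```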

Usage

OVL.BCbias(x, y, alpha = 0.05, B = 100, h_ini = -0.6)

Arguments

x

Numeric vector. Data from the first group.

y

Numeric vector. Data from the second group.

alpha

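Significance level; the confidence level of the interval is 1 - alpha.

B

Number of bootstrap replicates.

h_ini

Initial value in the optimization problem used to estimate the Box-Cox transformation parameter.

Value

A confidence interval for the OVL.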

Examples

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.BCbias(controls, cases)

OVL.D

Description

Parametric approach using the delta method.
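The delta method can be sketched for two normal samples as follows. The binormal OVL is computed in closed form from the crossing points of the two fitted densities (the roots of a quadratic), its gradient with respect to the means and standard deviations is approximated by central differences, and the variance combines that gradient with the usual large-sample variances sigma^2/n of a sample mean and sigma^2/(2n) of a sample standard deviation, treating the four estimates as independent. This is an illustration under those assumptions, not the package's OVL.D code; ovl_binormal and ovl_delta_ci are hypothetical names.

```r
# Closed-form binormal OVL using the crossing points of the two densities
ovl_binormal <- function(m1, s1, m2, s2) {
  # log f1(t) = log f2(t)  <=>  a t^2 + b t + cc = 0
  a <- 1 / (2 * s1^2) - 1 / (2 * s2^2)
  b <- m2 / s2^2 - m1 / s1^2
  cc <- m1^2 / (2 * s1^2) - m2^2 / (2 * s2^2) + log(s1 / s2)
  if (abs(a) > 1e-12) {
    cuts <- (-b + c(-1, 1) * sqrt(b^2 - 4 * a * cc)) / (2 * a)
  } else if (abs(b) > 1e-12) {
    cuts <- -cc / b                       # equal variances: one crossing
  } else {
    cuts <- numeric(0)                    # identical densities
  }
  edges <- c(-Inf, sort(cuts), Inf)
  total <- 0
  for (j in seq_len(length(edges) - 1)) {
    lo <- edges[j]
    hi <- edges[j + 1]
    # Any interior point shows which density is lower on this interval
    probe <- if (is.finite(lo) && is.finite(hi)) {
      (lo + hi) / 2
    } else if (is.finite(hi)) {
      hi - 1
    } else if (is.finite(lo)) {
      lo + 1
    } else {
      0
    }
    # Add the probability mass of the lower density on this interval
    if (dnorm(probe, m1, s1, log = TRUE) <= dnorm(probe, m2, s2, log = TRUE)) {
      total <- total + pnorm(hi, m1, s1) - pnorm(lo, m1, s1)
    } else {
      total <- total + pnorm(hi, m2, s2) - pnorm(lo, m2, s2)
    }
  }
  total
}

# Hypothetical sketch: delta-method interval for the binormal OVL
ovl_delta_ci <- function(x, y, alpha = 0.05, eps = 1e-5) {
  theta <- c(mean(x), sd(x), mean(y), sd(y))
  f <- function(p) ovl_binormal(p[1], p[2], p[3], p[4])
  est <- f(theta)
  # Central finite-difference gradient with respect to (m1, s1, m2, s2)
  grad <- vapply(1:4, function(j) {
    step <- replace(numeric(4), j, eps)
    (f(theta + step) - f(theta - step)) / (2 * eps)
  }, numeric(1))
  nx <- length(x)
  ny <- length(y)
  # Large-sample variances of the four estimates (assumed independent)
  v <- c(theta[2]^2 / nx, theta[2]^2 / (2 * nx), theta[4]^2 / ny, theta[4]^2 / (2 * ny))
  se <- sqrt(sum(grad^2 * v))
  z <- qnorm(1 - alpha / 2)
  c(estimate = est, lower = est - z * se, upper = est + z * se)
}

set.seed(1)
ovl_delta_ci(rnorm(50, 6, 1), rnorm(100, 6.5, 0.5))
```

A closed-form OVL is used here rather than numerical integration because finite differences amplify integration error.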

Usage

OVL.D(x, y, alpha = 0.05)

Arguments

x

Numeric vector. Data from the first group.

y

Numeric vector. Data from the second group.

alpha

Significance level; the confidence level of the interval is 1 - alpha.

Value

A confidence interval for the OVL.

Examples

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.D(controls, cases)

OVL.DBC

Description

Parametric approach using the delta method after the Box-Cox transformation.

Usage

OVL.DBC(x, y, alpha = 0.05, h_ini = -0.6)

Arguments

x

Numeric vector. Data from the first group.

y

Numeric vector. Data from the second group.

alpha

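Significance level; the confidence level of the interval is 1 - alpha.

h_ini

Initial value in the optimization problem used to estimate the Box-Cox transformation parameter.

Value

A confidence interval for the OVL.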

Examples

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.DBC(controls, cases)

OVL.DBCL

Description

Parametric approach using the delta method after the Box-Cox transformation, taking into account the variability of the estimated transformation parameter.

Usage

OVL.DBCL(x, y, alpha = 0.05, h_ini = -0.6)

Arguments

x

Numeric vector. Data from the first group.

y

Numeric vector. Data from the second group.

alpha

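Significance level; the confidence level of the interval is 1 - alpha.

h_ini

Initial value in the optimization problem used to estimate the Box-Cox transformation parameter.

Value

A confidence interval for the OVL.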

Examples

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.DBCL(controls, cases)

EM-Delta

Description

Computes a confidence interval for the OVL between two populations using EM-based parameter estimation and the delta method. Either the first population is modeled as Gaussian and the second as a two-component Gaussian mixture (all_mix = FALSE), or both populations are modeled as two-component Gaussian mixtures (all_mix = TRUE).

Usage

OVL.Delta.mix(
  x,
  y,
  alpha = 0.05,
  h = 10^(-5),
  interv = c(0, 20),
  all_mix = FALSE
)

Arguments

x

Numeric vector. Data from the first group. When all_mix = FALSE, this group is modeled as Gaussian.

y

Numeric vector. Data from the second group, modeled as a two-component Gaussian mixture.

alpha

Significance level; the confidence level of the interval is 1 - alpha.

h

Step size used to compute numerical derivatives.

interv

Numeric vector of length 2. Search interval for intersection points between the corresponding densities.

all_mix

Logical. If TRUE, both groups are modeled as two-component Gaussian mixtures. If FALSE, only y is modeled as a mixture and x is Gaussian.

Value

A list containing a confidence interval. Additional elements (e.g., var_OVL, parameter estimates, OVL_hat) may also be returned.

Examples

set.seed(1)
x <- ifelse(runif(100) < 0.5, rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 2, sd = 1))
y <- ifelse(runif(100) < 0.5, rnorm(100, mean = 2.5, sd = 1), rnorm(100, mean = 2, sd = 1))
res <- OVL.Delta.mix(x, y, all_mix = TRUE, interv = c(-10, 10))
res$IC1
res$IC2


OVL.GPQ

Description

Parametric approach based on generalized inference, using simulated generalized pivotal quantities.
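A generalized-pivotal-quantity interval for two normal samples can be sketched with the standard pivotal quantities for a normal variance, (n - 1) s^2 / U with U ~ chi-square(n - 1), and for a normal mean, xbar - Z * sqrt(sigma^2_G / n) with Z ~ N(0, 1). Each simulated parameter set is plugged into the binormal OVL, and the interval is formed from the empirical quantiles. This sketch omits the Box-Cox option (BC), and ovl_binormal and ovl_gpq_ci are hypothetical names.

```r
# Binormal OVL by numerical integration of the pointwise minimum of the densities
ovl_binormal <- function(m1, s1, m2, s2) {
  lo <- min(m1 - 10 * s1, m2 - 10 * s2)
  hi <- max(m1 + 10 * s1, m2 + 10 * s2)
  integrate(function(t) pmin(dnorm(t, m1, s1), dnorm(t, m2, s2)), lo, hi)$value
}

# Hypothetical sketch: GPQ interval for the binormal OVL (no Box-Cox step)
ovl_gpq_ci <- function(x, y, alpha = 0.05, K = 2500) {
  # Standard GPQs for the variance and the mean of a normal sample
  gpq <- function(v) {
    n <- length(v)
    sig2 <- (n - 1) * var(v) / rchisq(K, df = n - 1)
    mu <- mean(v) - rnorm(K) * sqrt(sig2 / n)
    cbind(mu = mu, sd = sqrt(sig2))
  }
  gx <- gpq(x)
  gy <- gpq(y)
  # Plug each simulated parameter set into the OVL to obtain its GPQ
  draws <- vapply(seq_len(K), function(k) {
    ovl_binormal(gx[k, "mu"], gx[k, "sd"], gy[k, "mu"], gy[k, "sd"])
  }, numeric(1))
  quantile(draws, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(1)
ovl_gpq_ci(rnorm(50, 6, 1), rnorm(100, 6.5, 0.5), K = 1000)
```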

Usage

OVL.GPQ(x, y, alpha = 0.05, K = 2500, h_ini = -1.6, BC = FALSE)

Arguments

x

Numeric vector. Data from the first group.

y

Numeric vector. Data from the second group.

alpha

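Significance level; the confidence level of the interval is 1 - alpha.

K

Number of simulated generalized pivotal quantities.

h_ini

Initial value in the optimization problem used to estimate the Box-Cox transformation parameter.

BC

Logical. Indicates whether a Box-Cox transformation is applied to the data.

Value

A confidence interval for the OVL.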

Examples

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.GPQ(controls, cases)

GPQ-Mix

Description

Computes a confidence interval for the OVL between two populations using generalized inference. Either the first population is modeled as Gaussian and the second as a two-component Gaussian mixture (all_mix = FALSE), or both populations are modeled as two-component Gaussian mixtures (all_mix = TRUE).

Usage

OVL.GPQ.mix(x, y, alpha = 0.05, interv = c(0, 20), k = 1000, all_mix = FALSE)

Arguments

x

Numeric vector. Data from the first group. When all_mix = FALSE, this group is modeled as Gaussian.

y

Numeric vector. Data from the second group, modeled as a two-component Gaussian mixture.

alpha

Significance level; the confidence level of the interval is 1 - alpha.

interv

Numeric vector of length 2. Search interval for intersection points between the corresponding densities.

k

Number of simulated generalized pivotal quantities.

all_mix

Logical. If TRUE, both groups are modeled as two-component Gaussian mixtures. If FALSE, only y is modeled as a mixture and x is Gaussian.

Value

confidence interval.

Examples

set.seed(1)
x <- ifelse(runif(100) < 0.5,
            rnorm(100, mean = 0, sd = 1),
            rnorm(100, mean = 2, sd = 1))
y <- ifelse(runif(100) < 0.5,
            rnorm(100, mean = 2.5, sd = 1),
            rnorm(100, mean = 2, sd = 1))
res <- OVL.GPQ.mix(x, y, all_mix = TRUE, interv = c(-10, 10))
res$IC1
res$IC2

OVL.K

Description

Kernel-based approach in which the variance of the OVL estimator is estimated by bootstrap.
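A kernel OVL estimate integrates the pointwise minimum of two kernel density estimates; a bootstrap standard error then gives a normal-approximation interval. The sketch below uses stats::density on a common grid and the trapezoidal rule. It assumes the cross-validation bandwidth is the unbiased (least-squares) cross-validation selector bw.ucv, computed once on the original samples and held fixed in the bootstrap (a design choice that avoids bandwidth collapse on tied resamples), and it does not implement the Schmid and Schmidt (2006) bandwidth. ovl_kernel and ovl_kernel_boot_ci are hypothetical names.

```r
# Kernel OVL on a common grid, with the bandwidths held fixed
ovl_kernel <- function(x, y, hx, hy, kernel = "gaussian") {
  pad <- 6 * max(hx, hy)
  lo <- min(x, y) - pad
  hi <- max(x, y) + pad
  fx <- density(x, bw = hx, kernel = kernel, from = lo, to = hi, n = 2048)
  fy <- density(y, bw = hy, kernel = kernel, from = lo, to = hi, n = 2048)
  m <- pmin(fx$y, fy$y)
  sum(diff(fx$x) * (head(m, -1) + tail(m, -1)) / 2)   # trapezoidal rule
}

# Hypothetical sketch: kernel OVL with a bootstrap standard error
ovl_kernel_boot_ci <- function(x, y, alpha = 0.05, B = 100, k = 1) {
  kern <- if (k == 1) "gaussian" else "epanechnikov"
  # Cross-validation bandwidths from the original samples (assumption: bw.ucv)
  hx <- bw.ucv(x)
  hy <- bw.ucv(y)
  est <- ovl_kernel(x, y, hx, hy, kern)
  boot <- replicate(B, ovl_kernel(sample(x, replace = TRUE),
                                  sample(y, replace = TRUE), hx, hy, kern))
  z <- qnorm(1 - alpha / 2)
  c(estimate = est, lower = est - z * sd(boot), upper = est + z * sd(boot))
}

set.seed(1)
ovl_kernel_boot_ci(rnorm(50, 6, 1), rnorm(100, 6.5, 0.5))
```

bw.ucv may warn that the minimum occurred at one end of its search range; the sketch still returns a result in that case.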

Usage

OVL.K(x, y, alpha = 0.05, B = 100, k = 1, h = 1)

Arguments

x

Numeric vector. Data from the first group.

y

Numeric vector. Data from the second group.

alpha

Significance level; the confidence level of the interval is 1 - alpha.

B

Number of bootstrap replicates.

k

Kernel. If k = 1 (the default), the Gaussian kernel is used; otherwise, the Epanechnikov kernel is used.

h

Bandwidth selector. If h = 1 (the default), the cross-validation bandwidth is used; otherwise, the bandwidth considered by Schmid and Schmidt (2006) is used.

Value

A confidence interval for the OVL.

Examples

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.K(controls, cases)

OVL.KPB

Description

Kernel-based approach using a bootstrap percentile interval.

Usage

OVL.KPB(x, y, alpha = 0.05, B = 100, k = 1, h = 1)

Arguments

x

Numeric vector. Data from the first group.

y

Numeric vector. Data from the second group.

alpha

Significance level; the confidence level of the interval is 1 - alpha.

B

Number of bootstrap replicates.

k

Kernel. If k = 1 (the default), the Gaussian kernel is used; otherwise, the Epanechnikov kernel is used.

h

Bandwidth selector. If h = 1 (the default), the cross-validation bandwidth is used; otherwise, the bandwidth considered by Schmid and Schmidt (2006) is used.

Value

A confidence interval for the OVL.

Examples

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.KPB(controls, cases)

OVL.LogitBCAN

Description

BCAN procedure carried out on the logit scale, with the resulting interval transformed back to the original scale.
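The logit-scale variant can be sketched by computing the bootstrap standard error of logit(OVL), building a symmetric interval on that scale, and mapping its endpoints back with the inverse logit, so the interval stays inside (0, 1). As in a plain bootstrap sketch, both groups are assumed normal, the transformation step is omitted, resampling is nonparametric (an assumption), and ovl_binormal and ovl_logit_boot_ci are hypothetical names.

```r
# Binormal OVL by numerical integration of the pointwise minimum of the densities
ovl_binormal <- function(m1, s1, m2, s2) {
  lo <- min(m1 - 10 * s1, m2 - 10 * s2)
  hi <- max(m1 + 10 * s1, m2 + 10 * s2)
  integrate(function(t) pmin(dnorm(t, m1, s1), dnorm(t, m2, s2)), lo, hi)$value
}

# Hypothetical sketch: bootstrap standard error on the logit scale, back-transformed
ovl_logit_boot_ci <- function(x, y, alpha = 0.05, B = 100) {
  clamp <- function(p) pmin(pmax(p, 1e-10), 1 - 1e-10)  # keep qlogis() finite
  est <- ovl_binormal(mean(x), sd(x), mean(y), sd(y))
  boot <- replicate(B, {
    xb <- sample(x, replace = TRUE)
    yb <- sample(y, replace = TRUE)
    ovl_binormal(mean(xb), sd(xb), mean(yb), sd(yb))
  })
  se_logit <- sd(qlogis(clamp(boot)))          # bootstrap SE of logit(OVL)
  z <- qnorm(1 - alpha / 2)
  ends <- plogis(qlogis(clamp(est)) + c(-1, 1) * z * se_logit)  # back-transform
  c(estimate = est, lower = ends[1], upper = ends[2])
}

set.seed(1)
ovl_logit_boot_ci(rnorm(50, 6, 1), rnorm(100, 6.5, 0.5))
```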

Usage

OVL.LogitBCAN(x, y, alpha = 0.05, B = 100, h_ini = -0.6)

Arguments

x

Numeric vector. Data from the first group.

y

Numeric vector. Data from the second group.

alpha

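Significance level; the confidence level of the interval is 1 - alpha.

B

Number of bootstrap replicates.

h_ini

Initial value in the optimization problem used to estimate the Box-Cox transformation parameter.

Value

A confidence interval for the OVL.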

Examples

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.LogitBCAN(controls, cases)

OVL.LogitD

Description

Parametric approach using the delta method on the logit scale, with the resulting interval transformed back to the original scale.
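On the logit scale, the delta method rescales the standard error by the derivative of logit(p), which is 1 / (p (1 - p)); the symmetric interval built there is then mapped back with the inverse logit. The sketch below takes an OVL estimate and its original-scale standard error as inputs (for instance, from a delta-method computation) and is not the package's OVL.LogitD code; logit_delta_ci is a hypothetical name.

```r
# Hypothetical sketch: delta-method interval on the logit scale, back-transformed
logit_delta_ci <- function(est, se, alpha = 0.05) {
  stopifnot(est > 0, est < 1)
  se_logit <- se / (est * (1 - est))   # d/dp qlogis(p) = 1 / (p (1 - p))
  z <- qnorm(1 - alpha / 2)
  ci <- plogis(qlogis(est) + c(-1, 1) * z * se_logit)
  names(ci) <- c("lower", "upper")
  ci
}

logit_delta_ci(0.6, 0.05)
logit_delta_ci(0.98, 0.02)  # the upper endpoint stays below 1
```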

Usage

OVL.LogitD(x, y, alpha = 0.05)

Arguments

x

Numeric vector. Data from the first group.

y

Numeric vector. Data from the second group.

alpha

Significance level; the confidence level of the interval is 1 - alpha.

Value

A confidence interval for the OVL.

Examples

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.LogitD(controls, cases)

OVL.LogitDBC

Description

Parametric approach using the delta method after the Box-Cox transformation, carried out on the logit scale, with the resulting interval transformed back to the original scale.

Usage

OVL.LogitDBC(x, y, alpha = 0.05, h_ini = -0.6)

Arguments

x

Numeric vector. Data from the first group.

y

Numeric vector. Data from the second group.

alpha

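Significance level; the confidence level of the interval is 1 - alpha.

h_ini

Initial value in the optimization problem used to estimate the Box-Cox transformation parameter.

Value

A confidence interval for the OVL.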

Examples

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.LogitDBC(controls, cases)

OVL.LogitDBCL

Description

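Parametric approach using the delta method after the Box-Cox transformation, taking into account the variability of the estimated transformation parameter, carried out on the logit scale, with the resulting interval transformed back to the original scale.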

Usage

OVL.LogitDBCL(x, y, alpha = 0.05, h_ini = -0.6)

Arguments

x

Numeric vector. Data from the first group.

y

Numeric vector. Data from the second group.

alpha

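Significance level; the confidence level of the interval is 1 - alpha.

h_ini

Initial value in the optimization problem used to estimate the Box-Cox transformation parameter.

Value

A confidence interval for the OVL.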

Examples

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.LogitDBCL(controls, cases)

OVL.LogitK

Description

Kernel-based approach in which the variance is estimated by bootstrap on the logit scale, with the resulting interval transformed back to the original scale.

Usage

OVL.LogitK(x, y, alpha = 0.05, B = 100, k = 1, h = 1)

Arguments

x

Numeric vector. Data from the first group.

y

Numeric vector. Data from the second group.

alpha

Significance level; the confidence level of the interval is 1 - alpha.

B

Number of bootstrap replicates.

k

Kernel. If k = 1 (the default), the Gaussian kernel is used; otherwise, the Epanechnikov kernel is used.

h

Bandwidth selector. If h = 1 (the default), the cross-validation bandwidth is used; otherwise, the bandwidth considered by Schmid and Schmidt (2006) is used.

Value

A confidence interval for the OVL.

Examples

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.LogitK(controls, cases)

Numerical derivatives of the overlap functional (normal vs. 2-component mixture).

Description

Computes finite-difference approximations of the partial derivatives of the overlap coefficient with respect to the parameters in the case where the first group is modeled as a normal distribution and the second group as a two-component Gaussian mixture.
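The finite-difference approach can be sketched generically with central differences, (f(theta + step e_j) - f(theta - step e_j)) / (2 step), one parameter at a time. num_grad is a hypothetical helper, and scaling each step by max(1, |theta_j|) is an assumption about how a base step size is used; OVL_derivates_1 may use a different scheme. Given an estimated covariance matrix Sigma of the parameter estimates, the delta-method variance is then drop(t(g) %*% Sigma %*% g).

```r
# Central finite-difference gradient of a scalar function f at theta.
# num_grad() is hypothetical; the step scaling is an assumption.
num_grad <- function(f, theta, h = 1e-5) {
  vapply(seq_along(theta), function(j) {
    step <- h * max(1, abs(theta[j]))      # scaled step for parameter j
    up <- theta
    up[j] <- up[j] + step
    dn <- theta
    dn[j] <- dn[j] - step
    (f(up) - f(dn)) / (2 * step)           # central difference
  }, numeric(1))
}

# Example: OVL of two normals with a common sd, 2 * pnorm(-|m1 - m2| / (2 s)),
# differentiated with respect to (m1, m2, s)
ovl_equal_sd <- function(p) 2 * pnorm(-abs(p[1] - p[2]) / (2 * p[3]))
num_grad(ovl_equal_sd, c(0, 1, 1))
```

For this example the exact gradient is (dnorm(0.5), -dnorm(0.5), dnorm(0.5)).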

Usage

OVL_derivates_1(
  mu1_hat,
  mu2_hat,
  sigma1_hat,
  sigma2_hat,
  pi2_hat,
  h,
  intersec,
  OVL_mix
)

Arguments

mu1_hat

Numeric scalar. Estimated mean for the normal group.

mu2_hat

Numeric vector of length 2. Estimated means for the mixture group.

sigma1_hat

Numeric scalar. Estimated standard deviation for the normal group.

sigma2_hat

Numeric vector of length 2. Estimated standard deviations for the mixture group.

pi2_hat

Either a numeric scalar in [0,1] giving the mixing proportion of the first component in the mixture, or a numeric vector of length 2 with elements summing to 1.

h

Positive numeric scalar. Base step size for finite differences.

intersec

Numeric vector. Intersection points used as cutpoints.

OVL_mix

Function that evaluates the overlap-related expression at cutpoints.

Details

This function is intended for internal use (delta-method variance estimation).

Value

A list with components deriv1 to deriv7.


Numerical derivatives of the overlap functional (2-component mixture vs. 2-component mixture).

Description

Computes finite-difference approximations of the partial derivatives of the overlap coefficient with respect to the parameters when both groups are modeled as two-component Gaussian mixtures.

Usage

OVL_derivates_2(
  mu1_hat,
  mu2_hat,
  sigma1_hat,
  sigma2_hat,
  pi1_hat,
  pi2_hat,
  h,
  intersec,
  OVL_mix
)

Arguments

mu1_hat

Numeric vector of length 2. Estimated means for the first mixture.

mu2_hat

Numeric vector of length 2. Estimated means for the second mixture.

sigma1_hat

Numeric vector of length 2. Estimated standard deviations for the first mixture.

sigma2_hat

Numeric vector of length 2. Estimated standard deviations for the second mixture.

pi1_hat

Either a numeric scalar in [0,1] giving the mixing proportion of the first component in the first mixture, or a numeric vector of length 2 with elements summing to 1.

pi2_hat

Either a numeric scalar in [0,1] giving the mixing proportion of the first component in the second mixture, or a numeric vector of length 2 with elements summing to 1.

h

Positive numeric scalar. Base step size for finite differences.

intersec

Numeric vector. Intersection points used as cutpoints.

OVL_mix

Function that evaluates the overlap-related expression at cutpoints.

Details

This function is intended for internal use (delta-method variance estimation).

Value

A list with components deriv1 to deriv10.


Computes the overlap coefficient (OVL) between two distributions, each a finite mixture of two normal distributions.

Description

Computes the overlap coefficient (OVL) between two distributions, each a finite mixture of two normal distributions, using their cumulative distribution functions evaluated at the intersection points of the two densities.

Usage

OVL_mix(mu1, mu2, sigma1, sigma2, pi_H, pi_D, x)

Arguments

mu1

Numeric vector of length 2 containing the means of the first mixture.

mu2

Numeric vector of length 2 containing the means of the second mixture.

sigma1

Numeric vector of length 2 containing the standard deviations of the first mixture.

sigma2

Numeric vector of length 2 containing the standard deviations of the second mixture.

pi_H

Either a numeric scalar in [0,1] giving the mixing proportion of the first component in the first mixture, or a numeric vector of length 2 with elements summing to 1.

pi_D

Either a numeric scalar in [0,1] giving the mixing proportion of the first component in the second mixture, or a numeric vector of length 2 with elements summing to 1.

x

Numeric vector of intersection points between the two mixture densities.

Details

Mixing proportions equal to 0 or 1 are allowed, in which case the corresponding mixture reduces to a single normal distribution.
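Given all intersection points, the computation can be sketched as follows: between consecutive intersection points one density stays below the other, so the OVL is the sum, over those intervals, of the probability mass of the lower density, obtained from differences of the mixture CDFs. ovl_mix_sketch is a hypothetical name, it is not the package's OVL_mix code, and it assumes x contains every intersection point.

```r
# Hypothetical sketch: OVL of two 2-component normal mixtures from their
# intersection points. Assumes `x` contains every intersection point.
ovl_mix_sketch <- function(mu1, mu2, sigma1, sigma2, pi_H, pi_D, x) {
  # A scalar weight is the proportion of the first component
  w1 <- if (length(pi_H) == 1) c(pi_H, 1 - pi_H) else pi_H
  w2 <- if (length(pi_D) == 1) c(pi_D, 1 - pi_D) else pi_D
  dmix <- function(t, mu, s, w) sum(w * dnorm(t, mu, s))   # mixture density
  pmix <- function(t, mu, s, w) sum(w * pnorm(t, mu, s))   # mixture CDF
  edges <- c(-Inf, sort(unique(x)), Inf)
  total <- 0
  for (j in seq_len(length(edges) - 1)) {
    lo <- edges[j]
    hi <- edges[j + 1]
    # Any interior point shows which density is lower on this interval
    probe <- if (is.finite(lo) && is.finite(hi)) {
      (lo + hi) / 2
    } else if (is.finite(hi)) {
      hi - 1
    } else if (is.finite(lo)) {
      lo + 1
    } else {
      0
    }
    if (dmix(probe, mu1, sigma1, w1) <= dmix(probe, mu2, sigma2, w2)) {
      total <- total + pmix(hi, mu1, sigma1, w1) - pmix(lo, mu1, sigma1, w1)
    } else {
      total <- total + pmix(hi, mu2, sigma2, w2) - pmix(lo, mu2, sigma2, w2)
    }
  }
  total
}

# Degenerate mixtures (weights 1) reduce to N(0, 1) vs N(1, 1), crossing at 0.5
ovl_mix_sketch(c(0, 5), c(1, 5), c(1, 1), c(1, 1), 1, 1, 0.5)
```

With weights equal to 1, the result matches the binormal value 2 * pnorm(-0.5).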

Value

A numeric value corresponding to the OVL between the two mixture distributions.


Evaluates an auxiliary function.

Description

Evaluates an auxiliary function.

Usage

U(mu1, mu2, sigma1, sigma2)

Arguments

mu1

sample mean of a vector x.

mu2

sample mean of a vector y.

sigma1

sample standard deviation of a vector x.

sigma2

sample standard deviation of a vector y.

Value

evaluation of an auxiliary function.


Computes the probability density function of a finite mixture of normal distributions at a given point or vector of points.

Description

Computes the probability density function of a finite mixture of normal distributions at a given point or vector of points.
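The mixture density is the weighted sum f(t) = sum_k pi_k * N(t; mu_k, sigma_k^2). A minimal sketch, using the hypothetical name dnorm_mixture_sketch:

```r
# Hypothetical sketch of a normal-mixture density evaluated at each point of x
dnorm_mixture_sketch <- function(x, mu, sigma, pi) {
  stopifnot(length(mu) == length(sigma), length(mu) == length(pi),
            abs(sum(pi) - 1) < 1e-8)
  # For each point t, sum the weighted component densities
  vapply(x, function(t) sum(pi * dnorm(t, mu, sigma)), numeric(1))
}

dnorm_mixture_sketch(c(-2, 0, 2), mu = c(-2, 2), sigma = c(1, 0.5), pi = c(0.3, 0.7))
```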

Usage

dnorm_mixture(x, mu, sigma, pi)

Arguments

x

Numeric vector of points at which the density is evaluated.

mu

Numeric vector of component means.

sigma

Numeric vector of component standard deviations.

pi

Numeric vector of mixing proportions. Must have the same length as mu and sigma, and sum to 1.

Value

A numeric vector containing the values of the mixture density evaluated at x.


Identifies the roots of a univariate function over a given interval by subdividing the interval into smaller subintervals and applying uniroot on those subintervals where a sign change is detected.

Description

Identifies the roots of a univariate function over a given interval by subdividing the interval into smaller subintervals and applying uniroot on those subintervals where a sign change is detected.
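The subdivision strategy can be sketched as follows. find_roots_sketch is a hypothetical name; the tighter uniroot tolerance and the handling of roots that fall exactly on a subinterval boundary are assumptions. Like any sign-change search, it misses tangential roots (where the function touches zero without changing sign) and can miss two roots lying in the same subinterval, so a larger n_subintervals is safer for densities with several crossings.

```r
# Hypothetical sketch: sign-change search on a grid, refined with uniroot()
find_roots_sketch <- function(f, interval, n_subintervals = 10) {
  knots <- seq(interval[1], interval[2], length.out = n_subintervals + 1)
  vals <- vapply(knots, f, numeric(1))
  roots <- numeric(0)
  for (i in seq_len(n_subintervals)) {
    if (vals[i] == 0) {
      roots <- c(roots, knots[i])                       # exact hit at a knot
    } else if (vals[i] * vals[i + 1] < 0) {
      roots <- c(roots, uniroot(f, c(knots[i], knots[i + 1]), tol = 1e-10)$root)
    }
  }
  if (vals[n_subintervals + 1] == 0) roots <- c(roots, knots[n_subintervals + 1])
  unique(roots)
}

find_roots_sketch(function(t) (t - 1) * (t + 2) * (t - 3.5), c(-5, 5))
```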

Usage

encontrar_raices(intersection_function, interval, n_subintervals = 10)

Arguments

intersection_function

A univariate numeric function whose roots are to be located.

interval

Numeric vector of length 2 specifying the lower and upper bounds of the search interval.

n_subintervals

Integer. Number of subintervals used to partition interval.

Value

A numeric vector containing the distinct roots found within interval. If no roots are detected, an empty numeric vector is returned.


Evaluates the Epanechnikov kernel.

Description

Evaluates the Epanechnikov kernel.

Usage

kernel.e(u)

Arguments

u

vector of observations.

Value

evaluation of the Epanechnikov kernel.


Estimates the density function using the Epanechnikov kernel.

Description

Estimates the density function using the Epanechnikov kernel.
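Assuming the standard form of the Epanechnikov kernel, K(u) = 3/4 (1 - u^2) for |u| <= 1 and 0 otherwise (some implementations rescale its support), the kernel and the density estimate f_hat(t) = (1 / (n h)) sum_i K((t - x_i) / h) can be sketched as follows. kernel_e_sketch and kernel_e_density_sketch are hypothetical names.

```r
# Hypothetical sketch: standard Epanechnikov kernel on [-1, 1]
kernel_e_sketch <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Kernel density estimate at each of `points`, with bandwidth h
kernel_e_density_sketch <- function(data, points, h) {
  vapply(points, function(t) mean(kernel_e_sketch((t - data) / h)) / h, numeric(1))
}

set.seed(3)
d <- rnorm(30)
kernel_e_density_sketch(d, c(-1, 0, 1), h = 0.5)
```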

Usage

kernel.e.density(data, points, h)

Arguments

data

vector of observations.

points

Points at which the density estimate is evaluated.

h

bandwidth.

Value

Density estimate evaluated at points.


Evaluates the Gaussian kernel.

Description

Evaluates the Gaussian kernel.

Usage

kernel.g(u)

Arguments

u

vector of observations.

Value

evaluation of the Gaussian kernel.


Estimates the density function using the Gaussian kernel.

Description

Estimates the density function using the Gaussian kernel.
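With the Gaussian kernel, the estimate is f_hat(t) = (1 / (n h)) sum_i phi((t - x_i) / h), where phi is the standard normal density. The direct sum below is a minimal sketch with the hypothetical name kernel_g_density_sketch; stats::density computes a binned (FFT) approximation of the same estimator.

```r
# Hypothetical sketch: Gaussian kernel density estimate by direct summation
kernel_g_density_sketch <- function(data, points, h) {
  vapply(points, function(t) mean(dnorm((t - data) / h)) / h, numeric(1))
}

set.seed(3)
d <- rnorm(30)
kernel_g_density_sketch(d, c(-1, 0, 1), h = 0.5)
```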

Usage

kernel.g.density(data, points, h)

Arguments

data

vector of observations.

points

Points at which the density estimate is evaluated.

h

bandwidth.

Value

Density estimate evaluated at points.


Computes the likelihood function of the Box-Cox transformation.

Description

Computes the likelihood function of the Box-Cox transformation.
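A two-sample Box-Cox profile log-likelihood can be sketched as follows, assuming that after the transformation (y^h - 1) / h (log(y) at h = 0) the controls and the cases are normal with their own means and variances, and dropping additive constants. Whether likbox returns the likelihood or the log-likelihood, and which variance model it uses, is not stated here, so this is an assumption; boxcox_loglik_sketch is a hypothetical name. The package starts a numerical optimizer from h_ini; this sketch simply uses stats::optimize over a bracket.

```r
# Hypothetical sketch: profile log-likelihood of the Box-Cox parameter h for
# data = c(controls, cases), with n controls; data must be positive.
boxcox_loglik_sketch <- function(h, data, n) {
  stopifnot(all(data > 0))
  z <- if (abs(h) < 1e-8) log(data) else (data^h - 1) / h
  zx <- z[seq_len(n)]
  zy <- z[-seq_len(n)]
  # Maximum-likelihood variances of each transformed group
  v1 <- mean((zx - mean(zx))^2)
  v2 <- mean((zy - mean(zy))^2)
  # Normal log-likelihood terms plus the log-Jacobian (h - 1) * sum(log(y))
  -length(zx) / 2 * log(v1) - length(zy) / 2 * log(v2) + (h - 1) * sum(log(data))
}

set.seed(4)
x <- exp(rnorm(200, 0, 1))
y <- exp(rnorm(200, 1, 1))
optimize(boxcox_loglik_sketch, c(-2, 2), data = c(x, y), n = 200, maximum = TRUE)$maximum
```

For log-normal data, as in this example, the maximizing h should be close to 0.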

Usage

likbox(h, data, n)

Arguments

h

parameter of the Box-Cox transformation.

data

Numeric vector containing the controls followed by the cases.

n

Number of controls (length of the control sample).

Value

The likelihood function of the Box-Cox transformation evaluated at h.


Simulated data with normal and mixture of normal distributions

Description

Contains control and case samples generated from a normal distribution and a two-component normal mixture distribution, respectively.

Usage

data(mixnorm_data)

Format

A data frame with 100 rows and 2 variables:

controls

Simulated data from a N(5,1) normal distribution.

cases

Simulated data from a two-component normal mixture distribution: 0.8N(2,1) + 0.2N(3,1).

References

This dataset was artificially generated for the OVL.CI package.

Examples

data(mixnorm_data)

Computes the sample variance of a vector of observations.

Description

Computes the sample variance of a vector of observations.

Usage

ssdd(x)

Arguments

x

vector of observations.

Value

the sample variance.


Simulated data with normal distributions

Description

Contains controls and cases data from normal distributions.

Usage

data(test_data)

Format

A data frame with 100 rows and 2 variables:

controls

Simulated data from a N(10,1) distribution for the control group.

cases

Simulated data from a N(10.5,0.5) distribution for the case group.

References

This data set was artificially created for the OVL.CI package.

Examples


data(test_data)