| Title: | Information-Theoretic Measures for Revealing Variable Interactions |
| Version: | 0.1 |
| Description: | Implements information-theoretic measures for exploring variable interactions, including KSG mutual information estimation for continuous variables from Kraskov et al. (2004) <doi:10.1103/PhysRevE.69.066138>, knockoff conditional mutual information described in Zhang & Chen (2025) <doi:10.1126/sciadv.adu6464>, and synergistic-unique-redundant decomposition as introduced by Martinez-Sanchez et al. (2024) <doi:10.1038/s41467-024-53373-4>, allowing detection of complex and diverse relationships among variables. |
| License: | GPL-3 |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| URL: | https://stscl.github.io/infoxtr/, https://github.com/stscl/infoxtr |
| BugReports: | https://github.com/stscl/infoxtr/issues |
| Depends: | R (≥ 4.1.0) |
| LinkingTo: | Rcpp, RcppThread |
| Imports: | methods, sdsfun, sf, terra |
| Suggests: | knitr, Rcpp, RcppThread, readr, rmarkdown, spEDM, tEDM |
| VignetteBuilder: | knitr |
| NeedsCompilation: | yes |
| Packaged: | 2026-03-16 15:10:08 UTC; dell |
| Author: | Wenbo Lyu |
| Maintainer: | Wenbo Lyu <lyu.geosocial@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-19 15:00:08 UTC |
Conditional Entropy
Description
Estimate the conditional entropy of target variables given conditioning variables.
Usage
ce(data, target, conds, base = exp(1), type = c("cont", "disc"), k = 3)
Arguments
data |
Observation data. |
target |
Integer vector of column indices for the target variables. |
conds |
Integer vector of column indices for the conditioning variables. |
base |
(optional) Logarithm base of the entropy.
Defaults to exp(1). |
type |
(optional) Estimation method: "cont" for continuous data or "disc" for discrete data. |
k |
(optional) Number of nearest neighbors used by the continuous estimator.
Ignored when type = "disc". |
Value
A numerical value.
Examples
infoxtr::ce(matrix(1:100,ncol=2),1,2)
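For intuition about the discrete case, the sketch below computes a plug-in conditional entropy H(Y|X) from a contingency table in base R. The helper `cond_entropy_discrete` is hypothetical and is not part of infoxtr; the package's own estimator may differ in detail.

```r
# Plug-in conditional entropy H(Y | X) for two discrete vectors.
# Hypothetical helper for illustration only; not an infoxtr function.
cond_entropy_discrete <- function(y, x, base = exp(1)) {
  p_xy <- table(x, y) / length(x)               # joint probabilities, rows indexed by x
  p_y_given_x <- prop.table(p_xy, margin = 1)   # each row rescaled to sum to 1
  nz <- p_xy > 0                                # skip empty cells (0 * log 0 := 0)
  -sum(p_xy[nz] * log(p_y_given_x[nz], base = base))
}

# y is a deterministic function of x, so no uncertainty is left given x
x <- rep(1:4, each = 5)
h_det <- cond_entropy_discrete(x %% 2, x)

# y and x are independent and y is uniform on two classes, so H(Y | X) = log(2)
g <- expand.grid(y = 1:2, x = 1:3)
h_ind <- cond_entropy_discrete(g$y, g$x)
```

The two cases bracket the possible range: zero when the conditioning variable determines the target, and the full entropy of the target when the two are independent.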
Conditional Mutual Information
Description
Estimate the conditional mutual information between target and interacting variables given conditioning variables.
Usage
cmi(
data,
target,
interact,
conds,
base = exp(1),
type = c("cont", "disc"),
k = 3,
normalize = FALSE
)
Arguments
data |
Observation data. |
target |
Integer vector of column indices for the target variables. |
interact |
Integer vector of column indices for the interacting variables. |
conds |
Integer vector of column indices for the conditioning variables. |
base |
(optional) Logarithm base of the entropy.
Defaults to exp(1). |
type |
(optional) Estimation method: "cont" for continuous data or "disc" for discrete data. |
k |
(optional) Number of nearest neighbors used by the continuous estimator.
Ignored when type = "disc". |
normalize |
(optional) Logical; if |
Value
A numerical value.
Examples
set.seed(42)
infoxtr::cmi(matrix(stats::rnorm(99,1,10),ncol=3),1,2,3)
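To illustrate what conditional mutual information measures, the following base R sketch evaluates the plug-in formula I(X;Y|Z) = sum p(x,y,z) log[p(z) p(x,y,z) / (p(x,z) p(y,z))] on discrete vectors. `cmi_discrete` is a hypothetical helper written for this illustration, not the estimator used by infoxtr.

```r
# Plug-in conditional mutual information I(X; Y | Z) for discrete vectors.
# Hypothetical helper for illustration only; not an infoxtr function.
cmi_discrete <- function(x, y, z, base = exp(1)) {
  p_xyz <- table(x, y, z) / length(x)       # joint probabilities
  p_xz <- apply(p_xyz, c(1, 3), sum)        # marginal over y
  p_yz <- apply(p_xyz, c(2, 3), sum)        # marginal over x
  p_z  <- apply(p_xyz, 3, sum)              # marginal over x and y
  idx <- which(p_xyz > 0, arr.ind = TRUE)   # one row (i, j, k) per non-empty cell
  terms <- apply(idx, 1, function(i) {
    p <- p_xyz[i[1], i[2], i[3]]
    p * log(p * p_z[i[3]] / (p_xz[i[1], i[3]] * p_yz[i[2], i[3]]), base = base)
  })
  sum(terms)
}

# XOR example: x and y are independent bits, z = xor(x, y).
# Once z is known, x determines y, so I(X; Y | Z) is one bit.
x <- rep(c(0, 0, 1, 1), times = 25)
y <- rep(c(0, 1, 0, 1), times = 25)
z <- as.integer(xor(x, y))
cmi_xor <- cmi_discrete(x, y, z, base = 2)

# If x and y are both copies of z, nothing is left to share once z is known.
cmi_copy <- cmi_discrete(z, z, z, base = 2)
```

The XOR case is a standard example of information that only appears after conditioning, which is why the conditional and unconditional measures can disagree.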
Discretization
Description
Discretize a numeric vector into categorical classes using several
commonly used discretization methods. Missing values (NA/NaN)
are ignored and returned as class 0.
Usage
discretize(
x,
n = 5,
method = "natural",
large = 3000,
prop = 0.15,
seed = 42,
thr = 0.4,
iter = 100,
bps = NULL,
right_closed = TRUE
)
Arguments
x |
A vector. |
n |
(optional) Number of classes. |
method |
(optional) Discretization method. One of
|
large |
(optional) Threshold sample size for natural breaks sampling. |
prop |
(optional) Sampling proportion used when |
seed |
(optional) Random seed used for sampling in natural breaks. |
thr |
(optional) Threshold used in the head/tail breaks algorithm. |
iter |
(optional) Maximum number of iterations for head/tail breaks. |
bps |
(optional) Numeric vector of manual breakpoints used when
|
right_closed |
(optional) Logical. If |
Value
A discretized integer vector.
Note
If x is not numeric, it will be converted to
integer categories via as.factor().
Examples
set.seed(42)
infoxtr::discretize(stats::rnorm(99,1,10))
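As a rough illustration of the idea, equal-width binning with missing values mapped to class 0 can be sketched in base R. The helper `equal_width_bins` is hypothetical; treating intervals as right-closed is an assumption made here, and the package's own methods and edge-case handling may differ.

```r
# Equal-width discretization into n classes, with NA/NaN mapped to class 0.
# Hypothetical helper for illustration only; not an infoxtr function.
equal_width_bins <- function(x, n = 5) {
  breaks <- seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = n + 1)
  # Right-closed intervals (a, b]; include.lowest keeps the minimum in class 1
  cls <- cut(x, breaks = breaks, include.lowest = TRUE, right = TRUE, labels = FALSE)
  cls[is.na(x)] <- 0L   # mirror the documented convention for missing values
  cls
}

bins <- equal_width_bins(c(0, 2.5, 5, 7.5, 10, NA), n = 4)
```

With breaks at 0, 2.5, 5, 7.5 and 10, the values 0 and 2.5 share the first class because the intervals are closed on the right.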
Shannon Entropy
Description
Estimate the entropy of a vector using either category counts (for discrete data) or a k-nearest neighbor estimator (for continuous data).
Usage
entropy(vec, base = exp(1), type = c("cont", "disc"), k = 3)
Arguments
vec |
A vector. |
base |
(optional) Logarithm base of the entropy.
Defaults to exp(1). |
type |
(optional) Estimation method: "cont" for continuous data or "disc" for discrete data. |
k |
(optional) Number of nearest neighbors used by the continuous estimator.
Ignored when type = "disc". |
Value
A numerical value.
Examples
set.seed(42)
infoxtr::entropy(stats::rnorm(100), type = "cont")
infoxtr::entropy(sample(letters[1:5], 100, TRUE), base = 2, type = "disc")
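The category-count approach mentioned in the description can be sketched as a plug-in estimate in base R. `plugin_entropy` is a hypothetical helper shown for illustration; it is not the package's implementation and applies no bias correction.

```r
# Plug-in Shannon entropy from category counts.
# Hypothetical helper for illustration only; not an infoxtr function.
plugin_entropy <- function(x, base = exp(1)) {
  p <- as.numeric(table(x)) / length(x)   # empirical class probabilities
  p <- p[p > 0]                           # drop empty factor levels (0 * log 0 := 0)
  -sum(p * log(p, base = base))
}

# Four equally likely classes carry log2(4) = 2 bits
h_uniform <- plugin_entropy(rep(letters[1:4], each = 25), base = 2)

# A constant vector carries no information
h_constant <- plugin_entropy(rep("a", 10))
```

Changing `base` only rescales the result: base 2 gives bits, and the default natural logarithm gives nats.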
Joint Entropy
Description
Estimate the joint entropy of selected variables.
Usage
je(data, indices, base = exp(1), type = c("cont", "disc"), k = 3)
Arguments
data |
Observation data. |
indices |
Integer vector of column indices to include in joint entropy calculation. |
base |
(optional) Logarithm base of the entropy.
Defaults to exp(1). |
type |
(optional) Estimation method: "cont" for continuous data or "disc" for discrete data. |
k |
(optional) Number of nearest neighbors used by the continuous estimator.
Ignored when type = "disc". |
Value
A numerical value.
Examples
infoxtr::je(matrix(1:100,ncol=2),1:2)
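For discrete data, joint entropy reduces to the entropy of the row-wise combinations of the selected columns. The base R sketch below makes that explicit; `joint_entropy` is a hypothetical helper, and the values it returns need not match infoxtr's estimators, in particular the continuous one.

```r
# Plug-in joint entropy of selected columns, treating each distinct row as one class.
# Hypothetical helper for illustration only; not an infoxtr function.
joint_entropy <- function(data, indices, base = exp(1)) {
  cols <- as.list(as.data.frame(data)[indices])
  key <- do.call(paste, c(unname(cols), sep = "|"))   # one label per row
  p <- as.numeric(table(key)) / length(key)
  -sum(p * log(p, base = base))
}

# All 50 rows of this matrix are distinct, so the plug-in value is log(50)
h_unique <- joint_entropy(matrix(1:100, ncol = 2), 1:2)

# Joint entropy never exceeds the sum of the marginal entropies
g <- expand.grid(a = 1:2, b = 1:3)
h_ab <- joint_entropy(g, 1:2)
h_a  <- joint_entropy(g, 1)
h_b  <- joint_entropy(g, 2)
```

In the second example the two columns are independent, so the joint entropy equals the sum of the marginals; dependence between columns would make it strictly smaller.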
KOCMI
Description
Knockoff Conditional Mutual Information
Usage
kocmi(
data,
target,
agent,
conds,
knockoff,
null_knockoff = NULL,
type = c("cont", "disc"),
nboots = 10000,
k = 3,
threads = 1,
seed = 42,
base = exp(1),
method = "equal",
contain_null = TRUE
)
Arguments
data |
Observation data. |
target |
Integer vector of column indices for the target variables. |
agent |
Integer vector of column indices for the source (agent) variables. |
conds |
Integer vector of column indices for the conditioning variables. |
knockoff |
Knockoff realizations constructed for the |
null_knockoff |
(optional) Knockoff realizations generated under the
null setting where all variables are jointly used to construct knockoffs.
Each column represents one Monte Carlo sample. If |
type |
(optional) Estimation method: "cont" for continuous data or "disc" for discrete data. |
nboots |
(optional) Number of permutations used in the sign-flipping permutation test for evaluating the significance of the mean information difference. |
k |
(optional) For |
threads |
(optional) Number of threads used. |
seed |
(optional) Random seed used for permutation test. |
base |
(optional) Logarithm base of the entropy.
Defaults to exp(1). |
method |
(optional) Discretization method. One of
|
contain_null |
(optional) Logical. If |
Value
A named numeric vector.
References
Zhang, X., Chen, L., 2025. Quantifying interventional causality by knockoff operation. Science Advances 11.
Examples
set.seed(42)
kn1 = replicate(50, stats::rnorm(100))
kn2 = replicate(50, stats::rnorm(100))
mat = replicate(3, stats::rnorm(100))
infoxtr::kocmi(mat, 1, 2, 3, kn1, kn2)
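The `nboots` and `seed` arguments refer to a sign-flipping permutation test on the mean information difference. A generic version of such a test can be sketched in base R as follows. The helper `sign_flip_test`, the one-sided alternative, and the +1 correction are assumptions made for this illustration; the package's actual test statistic and return values may differ.

```r
# Generic sign-flipping permutation test for H0: the mean of d is zero.
# Hypothetical helper for illustration only; not an infoxtr function.
sign_flip_test <- function(d, nboots = 2000, seed = 42) {
  set.seed(seed)
  observed <- mean(d)
  # Each column holds one random sign assignment for all differences
  signs <- matrix(sample(c(-1, 1), length(d) * nboots, replace = TRUE),
                  nrow = length(d))
  null_means <- colMeans(signs * d)   # d is recycled down each column
  # One-sided p-value; the +1 correction keeps it strictly positive
  p_value <- (1 + sum(null_means >= observed)) / (nboots + 1)
  c(mean_diff = observed, p_value = p_value)
}

# Consistently positive differences: strong evidence against H0
res_pos <- sign_flip_test(rep(1, 20))

# Differences symmetric around zero: no evidence against H0
res_sym <- sign_flip_test(rep(c(-1, 1), 10))
```

Under the null hypothesis the differences are symmetric around zero, so flipping their signs at random generates the reference distribution without any distributional assumption beyond symmetry.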
Mutual Information
Description
Estimate the mutual information between target and interacting variables.
Usage
mi(
data,
target,
interact,
base = exp(1),
type = c("cont", "disc"),
k = 3,
normalize = FALSE
)
Arguments
data |
Observation data. |
target |
Integer vector of column indices for the target variables. |
interact |
Integer vector of column indices for the interacting variables. |
base |
(optional) Logarithm base of the entropy.
Defaults to exp(1). |
type |
(optional) Estimation method: "cont" for continuous data or "disc" for discrete data. |
k |
(optional) Number of nearest neighbors used by the continuous estimator.
Ignored when type = "disc". |
normalize |
(optional) Logical; if |
Value
A numerical value.
Examples
infoxtr::mi(matrix(1:100,ncol=2),1,2)
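For discrete data, mutual information compares the joint distribution with the product of its marginals. The following base R sketch shows that computation; `mi_discrete` is a hypothetical helper, not the package's estimator (for continuous data the package description cites the KSG method instead).

```r
# Plug-in mutual information I(X; Y) for two discrete vectors.
# Hypothetical helper for illustration only; not an infoxtr function.
mi_discrete <- function(x, y, base = exp(1)) {
  p_xy <- table(x, y) / length(x)                 # joint probabilities
  p_ind <- outer(rowSums(p_xy), colSums(p_xy))    # product of the marginals
  nz <- p_xy > 0                                  # skip empty cells
  sum(p_xy[nz] * log(p_xy[nz] / p_ind[nz], base = base))
}

# Identical variables share all of their information: I(X; X) = H(X) = 1 bit
x <- rep(0:1, 50)
mi_same <- mi_discrete(x, x, base = 2)

# Independent variables share none
g <- expand.grid(a = 1:2, b = 1:3)
mi_indep <- mi_discrete(g$a, g$b, base = 2)
```

Both results follow from the definition: the ratio inside the logarithm is 1 everywhere under independence and 1/p(x) on the diagonal when the variables coincide.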
SURD
Description
Synergistic-Unique-Redundant Decomposition
Usage
## S4 method for signature 'data.frame'
surd(
data,
target,
agent,
lag = 1,
bin = 5,
method = "equal",
max.combs = 10,
threads = 1,
base = 2,
normalize = TRUE
)
## S4 method for signature 'sf'
surd(
data,
target,
agent,
lag = 1,
bin = 5,
method = "equal",
max.combs = 10,
threads = 1,
base = 2,
normalize = TRUE,
nb = NULL
)
## S4 method for signature 'SpatRaster'
surd(
data,
target,
agent,
lag = 1,
bin = 5,
method = "equal",
max.combs = 10,
threads = 1,
base = 2,
normalize = TRUE
)
Arguments
data |
Observation data. |
target |
Integer vector of column indices for the target variables. |
agent |
Integer vector of column indices for the source (agent) variables. |
lag |
(optional) Lag of the agent variables. |
bin |
(optional) Number of discretization bins. |
method |
(optional) Discretization method. One of
|
max.combs |
(optional) Maximum combination order. |
threads |
(optional) Number of threads used. |
base |
(optional) Logarithm base of the entropy.
Defaults to 2. |
normalize |
(optional) Logical; if |
nb |
(optional) Neighbours list. |
Value
A list.
- vars
Character vector indicating the variable combination associated with each information component.
- types
Character vector indicating the information type of each component.
- values
Numeric vector giving the magnitude of each information component.
Note
SURD only supports numeric data.
References
Martinez-Sanchez, A., Arranz, G., Lozano-Duran, A., 2024. Decomposing causality into its synergistic, unique, and redundant components. Nature Communications 15.
Examples
columbus = sf::read_sf(system.file("case/columbus.gpkg", package="spEDM"))
infoxtr::surd(columbus, 1, 2:3)
Transfer Entropy
Description
Estimate the transfer entropy from agent variables to target variables.
Usage
te(
data,
target,
agent,
lag_p = 3,
lag_q = 3,
base = exp(1),
type = c("cont", "disc"),
k = 3,
normalize = FALSE,
lag_single = FALSE
)
Arguments
data |
Observation data. |
target |
Integer vector of column indices for the target variables. |
agent |
Integer vector of column indices for the source (agent) variables. |
lag_p |
(optional) Lag of the target variables. |
lag_q |
(optional) Lag of the agent variables. |
base |
(optional) Logarithm base of the entropy.
Defaults to exp(1). |
type |
(optional) Estimation method: "cont" for continuous data or "disc" for discrete data. |
k |
(optional) Number of nearest neighbors used by the continuous estimator.
Ignored when type = "disc". |
normalize |
(optional) Logical; if |
lag_single |
(optional) Logical; if |
Value
A numerical value.
Examples
set.seed(42)
infoxtr::te(matrix(stats::rnorm(100,1,10),ncol=2),1,2)
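Transfer entropy can be read as the conditional mutual information between the target's present and the agent's past, given the target's own past. The base R sketch below builds the lagged columns and combines four plug-in entropies for discrete series. The helpers `joint_entropy` and `transfer_entropy_discrete` are hypothetical, the lag convention (lags 1 to p and 1 to q) is an assumption, and the package's handling of `lag_p`, `lag_q`, and `lag_single` may differ.

```r
# Plug-in entropy of the row-wise combinations of the supplied columns.
# Hypothetical helper for illustration only; not an infoxtr function.
joint_entropy <- function(cols, base = exp(1)) {
  key <- do.call(paste, c(unname(as.list(as.data.frame(cols))), sep = "|"))
  p <- as.numeric(table(key)) / length(key)
  -sum(p * log(p, base = base))
}

# TE(X -> Y) = H(Y_t, Y_past) - H(Y_past) - H(Y_t, Y_past, X_past) + H(Y_past, X_past)
transfer_entropy_discrete <- function(x, y, p = 1, q = 1, base = 2) {
  t_idx <- (max(p, q) + 1):length(y)                       # time points with a full history
  y_now  <- y[t_idx]
  y_past <- sapply(seq_len(p), function(l) y[t_idx - l])   # target lags 1..p
  x_past <- sapply(seq_len(q), function(l) x[t_idx - l])   # agent lags 1..q
  joint_entropy(cbind(y_now, y_past), base) - joint_entropy(y_past, base) -
    joint_entropy(cbind(y_now, y_past, x_past), base) +
    joint_entropy(cbind(y_past, x_past), base)
}

# y copies x with a delay of one step, so information flows from x to y only
set.seed(42)
x <- sample(0:1, 500, replace = TRUE)
y <- c(0, x[-500])
te_xy <- transfer_entropy_discrete(x, y)   # close to 1 bit
te_yx <- transfer_entropy_discrete(y, x)   # close to 0 bits
```

The asymmetry between the two directions is the point of the measure: the past of x removes all uncertainty about y, while the past of y says almost nothing new about x.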