Mathematical Definitions of Sufficient Statistics

Overview

The redeem package models the intensities of interaction formation and dissolution using a log-linear formulation. The intensity (rate) for a dyad \((i,j)\) at time \(t\) is:

\[\lambda_{i,j}(t) = \exp(s_{i,j}(\mathscr{H}_t)^\top \beta + \alpha_i + \alpha_j + f(t, \gamma))\]

where:

This vignette provides precise mathematical definitions for all sufficient network statistics implemented in the package, as well as a guide for adding custom statistics.

The complete list of available formula terms is documented in the redeem_terms reference manual page. For model estimation, see the help pages for the models via dem and rem.


Statistic Transformations

Each sufficient statistic \(s_{i,j}(\mathscr{H}_t)\) is defined as a transformed count \(f(N(t))\), where \(N(t)\) is a raw network statistic. The package supports five standard transformations \(f(\cdot)\):

Transformation Mathematical Definition
identity \(f(s) = s\)
log \(f(s) = \log(1 + s)\)
recip \(f(s) = 1/(1+s)\)
bin \(f(s) = \mathbb{I}(s > 0)\)
sig \(f(s) = \frac{s}{s + K}\)

Endogenous Network Statistics

These statistics capture structural dependencies within the network evolution.

1. Intercept (Intercept / intercept)

A constant term representing the baseline log-intensity. \[s_{i,j}(\mathscr{H}_t) = 1\]

2. Inertia (inertia / number_interaction)

Counts how many times the dyad \((i,j)\) has initiated an interaction in the past. \[s_{i,j}(\mathscr{H}_t) = f(N_{i,j}(t))\] where \(N_{i,j}(t) = \sum_{k: t_k < t} \mathbb{I}(i_k = i, j_k = j)\) (or the windowed version \(N_{i,j}^w(t) = \sum_{k: t-w < t_k < t} \mathbb{I}(i_k = i, j_k = j)\)).

3. Reciprocity (reciprocity)

Models the tendency to reciprocate past interactions (directed only). \[s_{i,j}(\mathscr{H}_t) = f(N_{j,i}(t))\]

4. Duration (duration / current_interaction)

Measures the dependency on the time since the current interaction started (DEM only). \[s_{i,j}(\mathscr{H}_t) = \begin{cases} f(t - t_{\text{start, } i,j}) & \text{if dyad } (i,j) \text{ is interacting at } t \\ 0 & \text{otherwise} \end{cases}\] where \(t_{\text{start, } i,j}\) is the timestamp of the last formation event.

5. Participation Shifts (P-shifts)

P-shifts capture the sequential dependencies between consecutive events (REM only). Let the preceding event in the stream be \(A \to B\). For a candidate event \(i \to j\) at time \(t\):


Triadic Closure and Shared Partners

Triadic statistics capture structural closure. They can be calculated over active edges (designated as current_ in DEM) or historical event existence (designated as general_ in REM/DEM). Let \(\mathcal{A}_t\) represent the set of interacting dyads (for current_) or previously interacted dyads (for general_) at time \(t\).

Common Partners (general_common_partners / current_common_partners)

Counts the number of third-party actors \(k\) sharing a connection of a specified type with both \(i\) and \(j\):

Triangles (general_triangle / current_triangle)

Similar to common partners, but only non-zero if the focal dyad itself is active (directed only):


Degree and Centrality Statistics

These statistics capture actor-level activity or popularity. Let \(\mathcal{D}_t\) be the network state at time \(t\), and \(d_{i, \text{out}}(t)\), \(d_{i, \text{in}}(t)\) represent the out-degree and in-degree of \(i\) in \(\mathcal{D}_t\). Let \(c_{i, \text{out}}(t)\), \(c_{i, \text{in}}(t)\) be the total out-events and in-events involving \(i\).

Degree Statistics (general_degree_out_sender, etc.)

Count Statistics (general_count_out_sender, etc.)

Count statistics are identical to degree statistics but use total interaction counts (\(c\)) rather than binary degrees (\(d\)):

All of these degree and count statistics can be conveniently specified in model formulas using the degree() (or degrees()) and count() helper functions:


Exogenous Statistics

1. Dyadic Covariate (dyadic_cov)

\[s_{i,j}(\mathscr{H}_t) = f(X_{i,j}(t))\] where \(X(t)\) is an \(N \times N\) matrix.

2. Monadic Covariate (monadic_cov)

\[s_{i,j}(\mathscr{H}_t) = g(x_i(t), x_j(t))\] where \(x\) is a monadic vector and \(g(\cdot)\) is a user-defined function.


Developer Guide: Adding Custom Sufficient Statistics

To add a new sufficient statistic (e.g., my_stat) to the package, follow this two-step process:

Step 1: Implement the Update Logic in C++

Define a C++ function in the src/ directory (e.g., in a new file or in src/sufficient_statistics.cpp) with the signature ValidateFunction. This function computes the change to the statistic state matrix when an event occurs:

#include "redeem/sufficient_statistics.h"
#include "redeem/extension_api.hpp"

arma::uvec stat_my_stat(
    Data_DEM &object, 
    arma::mat &data, 
    unsigned int &from, 
    unsigned int &to, 
    unsigned int &number_event, 
    unsigned int col_number, 
    std::string transformation, 
    unsigned int type
) {
  if (from == 0 || to == 0) return arma::uvec();
  
  // 1. Calculate which dyad indices are affected and the change value (val)
  double val = (type == 1) ? 1.0 : -1.0; 
  arma::uvec affected_indices = { object.current_stats.find_from_to(from, to) };
  
  // 2. Apply the update using the helper (handles transformations internally)
  apply_update(object, affected_indices, col_number, val, transformation, data);
  
  // 3. Return the indices of modified dyads
  return affected_indices;
}

// 4. Register the term in the global C++ Registry
TERM_REGISTER("my_stat", stat_my_stat);

Step 2: Write the R Initializer Function

In R/init_terms.R, create an initialization function named InitRedeemTerm.my_stat. This function validates user-provided arguments and sets up initial values:

#' @keywords internal
InitRedeemTerm.my_stat <- function(arglist, n_nodes, model_type, directed, ...) {
  # Validate arguments against expected types and models
  arglist <- check.RedeemTerm(arglist,
    model_type = model_type, directed = directed,
    expected = list(transformation = c("identity", "log", "recip", "bin", "sig"), K = "numeric"),
    defaults = list(transformation = "identity", K = 1)
  )
  
  # Define initial statistic values at time 0 (optional)
  eval_at_zero <- matrix(0, n_nodes, n_nodes)
  
  # Return the configuration list
  list(
    base_name = "my_stat",
    transformation = arglist$transformation,
    eval_at_zero = eval_at_zero,
    event_stream = arglist$event_stream
  )
}

Once these steps are completed, run devtools::document() to update namespaces and documentation, and build the package. The new term is now ready for use in dem() and rem() by specifying it directly in the formula (e.g., ~ my_stat(transformation = "log")).