Type: Package
Title: Compositional Maximum Likelihood Estimation
Version: 1.0.2
Description: Provides composable optimization strategies for maximum likelihood estimation (MLE). Solvers are first-class functions that combine via sequential chaining, parallel racing, and random restarts. Implements gradient ascent, Newton-Raphson, quasi-Newton (BFGS), and derivative-free methods with support for constrained optimization and tracing. Returns 'mle' objects compatible with 'algebraic.mle' for downstream analysis. Methods based on Nocedal J, Wright SJ (2006) "Numerical Optimization" <doi:10.1007/978-0-387-40065-5>.
License: MIT + file LICENSE
Encoding: UTF-8
ByteCompile: true
Depends: R (>= 3.5.0), algebraic.mle
Imports: MASS, numDeriv
Suggests: rmarkdown, dplyr, knitr, ggplot2, tibble, testthat (>= 3.0.0), cli, future, hypothesize
URL: https://github.com/queelius/compositional.mle, https://queelius.github.io/compositional.mle/
BugReports: https://github.com/queelius/compositional.mle/issues
VignetteBuilder: knitr
Config/testthat/edition: 3
RoxygenNote: 7.3.3
NeedsCompilation: no
Packaged: 2026-02-05 09:31:43 UTC; spinoza
Author: Alexander Towell [aut, cre]
Maintainer: Alexander Towell <queelius@gmail.com>
Repository: CRAN
Date/Publication: 2026-02-09 12:20:02 UTC

compositional.mle: Compositional Maximum Likelihood Estimation

Description

Provides composable optimization strategies for maximum likelihood estimation (MLE). Solvers are first-class functions that combine via sequential chaining, parallel racing, and random restarts. Implements gradient ascent, Newton-Raphson, quasi-Newton (BFGS), and derivative-free methods with support for constrained optimization and tracing. Returns 'mle' objects compatible with 'algebraic.mle' for downstream analysis. Methods based on Nocedal J, Wright SJ (2006) "Numerical Optimization" doi:10.1007/978-0-387-40065-5.

A domain-specific language for maximum likelihood estimation where solvers are first-class composable functions. Following SICP principles, solvers combine via sequential chaining, parallel racing, and iteration to build sophisticated optimization strategies from simple primitives.

The Problem Abstraction

mle_problem encapsulates a statistical estimation problem: the log-likelihood, optional analytic score and Fisher information functions, domain constraints (via mle_constraint), parameter names, and the number of observations.

Solver Factories

Solver factories return solver functions with signature (problem, theta0, trace) -> mle_result. Factories include gradient_ascent, newton_raphson, fisher_scoring, bfgs, lbfgsb, nelder_mead, coordinate_ascent, sim_anneal, grid_search, and random_search.

Composition Operators

Combine solvers to build complex strategies: %>>% chains solvers sequentially, %|% races two solvers and keeps the best result, and chain, compose, race, with_restarts, and unless_converged provide more flexible variants.

Tracing

mle_trace configures what to track during optimization (values, path, gradients, timing) for diagnostics and visualization.
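
For example, a minimal trace-enabled run (mirroring the mle_trace, plot, and optimization_path examples later in this manual):

trace_cfg <- mle_trace(values = TRUE, path = TRUE, gradients = TRUE)
problem <- mle_problem(loglike = function(theta) -sum((theta - c(3, 2))^2))
result <- gradient_ascent(max_iter = 50)(problem, c(0, 0), trace = trace_cfg)
plot(result)                     # convergence diagnostics from the trace
head(optimization_path(result))  # trace as a tidy data frame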

Author(s)

Maintainer: Alexander Towell <queelius@gmail.com>

See Also

Useful links:

https://github.com/queelius/compositional.mle

https://queelius.github.io/compositional.mle/

Report bugs at https://github.com/queelius/compositional.mle/issues

Examples


# Define problem
set.seed(42)
x <- rnorm(100, mean = 5, sd = 2)
problem <- mle_problem(
  loglike = function(theta) sum(dnorm(x, theta[1], theta[2], log = TRUE)),
  constraint = mle_constraint(support = function(theta) theta[2] > 0,
                              project = function(theta) c(theta[1], max(theta[2], 1e-8)))
)

# Simple solve
result <- gradient_ascent()(problem, c(0, 1))

# Composed strategy: grid -> gradient -> Newton
strategy <- grid_search(lower = c(-10, 0.1), upper = c(10, 5), n = 5) %>>%
  gradient_ascent() %>>% newton_raphson()
result <- strategy(problem, c(0, 1))

# Race different methods
strategy <- gradient_ascent() %|% bfgs() %|% nelder_mead()
result <- strategy(problem, c(0, 1))



Sequential Solver Composition

Description

Chains two solvers sequentially. The result of the first solver becomes the starting point for the second. This enables coarse-to-fine strategies.

Usage

s1 %>>% s2

Arguments

s1

First solver function

s2

Second solver function

Details

Trace data from all solvers in the chain is merged into a single trace with stage boundaries preserved.

Value

A new solver function that runs s1 then s2

Examples

# Coarse-to-fine: grid search to find good region, then gradient ascent
strategy <- grid_search(lower = c(-10, 0.1), upper = c(10, 5), n = 5) %>>%
  gradient_ascent()

# Three-stage refinement
strategy <- grid_search(lower = c(-10, 0.1), upper = c(10, 5), n = 3) %>>%
  gradient_ascent() %>>%
  newton_raphson()
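
# Traced composition (a sketch): each stage's trace is kept, with stage
# boundaries preserved in the merged trace. Assumes a `problem` object like
# the one defined in the package-level examples.
trace_cfg <- mle_trace(values = TRUE, path = TRUE)
result <- strategy(problem, c(0, 1), trace = trace_cfg)
head(optimization_path(result))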


Backtracking line search

Description

Backtracking line search

Usage

.backtracking_line_search(
  loglike,
  theta,
  direction,
  max_step,
  backtrack_ratio,
  min_step,
  constraint
)

Golden section line search along one coordinate

Description

Golden section line search along one coordinate

Usage

.coordinate_line_search(
  loglike,
  theta,
  coord,
  constraint,
  max_iter = 50,
  tol = 1e-08
)

Arguments

loglike

Log-likelihood function

theta

Current parameter vector

coord

Index of coordinate to optimize

constraint

Constraint object


Check if cli package is available

Description

Check if cli package is available

Usage

.has_cli()

Create a progress handler for optimization

Description

Create a progress handler for optimization

Usage

.progress_handler(
  verbose = FALSE,
  solver_name = "Solver",
  max_iter = NULL,
  show_every = 1L
)

Arguments

verbose

Logical; whether to show progress

solver_name

Name of the solver for display

max_iter

Maximum iterations (for progress bar)

show_every

Show progress every N iterations

Value

A list with start(), update(), and finish() functions


BFGS Solver

Description

Creates a solver using the BFGS quasi-Newton method via optim(). BFGS approximates the Hessian from gradient information, providing second-order-like convergence without computing the Hessian directly.

Usage

bfgs(max_iter = 100L, tol = 1e-08, report = 0L)

Arguments

max_iter

Maximum number of iterations

tol

Convergence tolerance (passed to optim's reltol)

report

Reporting frequency (0 = no reporting)

Details

BFGS is often a good default choice: it's more robust than Newton-Raphson (no matrix inversion issues) and faster than gradient ascent (uses curvature information).

The solver automatically uses the score function from the problem if available, otherwise computes gradients numerically.

Value

A solver function with signature (problem, theta0, trace) -> mle_result

Examples


set.seed(42)
x <- rnorm(50, 5, 2)
problem <- mle_problem(
  loglike = function(theta) sum(dnorm(x, theta[1], theta[2], log = TRUE)),
  constraint = mle_constraint(support = function(theta) theta[2] > 0,
                              project = function(theta) c(theta[1], max(theta[2], 1e-8)))
)
# Basic usage
result <- bfgs()(problem, c(4, 1.5))

# Race BFGS against gradient ascent
strategy <- bfgs() %|% gradient_ascent()



Chain Solvers with Early Stopping

Description

Chains multiple solvers sequentially with optional early stopping. More flexible than %>>% operator.

Usage

chain(..., early_stop = NULL)

Arguments

...

Solver functions to chain

early_stop

Optional function that takes a result and returns TRUE to stop the chain early. Default is NULL (no early stopping).

Details

The chain runs solvers in order, passing each result's theta.hat to the next solver. If early_stop is provided and returns TRUE for any intermediate result, the chain stops early.

A common early stopping condition is convergence of an intermediate result, e.g. early_stop = function(r) isTRUE(r$converged).

Value

A new solver function that runs solvers in sequence

Examples

# Chain with early stopping when converged
strategy <- chain(
  grid_search(lower = c(-10, 0.1), upper = c(10, 5), n = 5),
  gradient_ascent(max_iter = 50),
  newton_raphson(max_iter = 20),
  early_stop = function(r) isTRUE(r$converged)
)

# Standard chain (no early stopping)
strategy <- chain(gradient_ascent(), newton_raphson())


Clear derivative cache

Description

Clears the cached numerical derivatives (score and Fisher) from an mle_problem. This is useful when you want to force recomputation, for example after modifying data that the log-likelihood depends on.

Usage

clear_cache(problem)

Arguments

problem

An mle_problem object

Value

The problem object (invisibly), modified in place

Examples


loglike <- function(theta) -sum((theta - c(1, 2))^2)
problem <- mle_problem(loglike, cache_derivatives = TRUE)
# ... run some optimization ...
clear_cache(problem)  # Force fresh derivative computation


Compose Multiple Solvers Sequentially

Description

Chains any number of solvers sequentially. Each solver's result becomes the starting point for the next. Alternative to using %>>% operator.

Usage

compose(...)

Arguments

...

Solver functions to compose

Details

Trace data from all solvers is merged into a single trace with stage boundaries preserved.

Value

A new solver function that runs all solvers in sequence

Examples


set.seed(42)
x <- rnorm(50, 5, 2)
problem <- mle_problem(
  loglike = function(theta) sum(dnorm(x, theta[1], theta[2], log = TRUE)),
  constraint = mle_constraint(support = function(theta) theta[2] > 0,
                              project = function(theta) c(theta[1], max(theta[2], 1e-8)))
)
# Three-stage strategy
strategy <- compose(
  grid_search(lower = c(-10, 0.1), upper = c(10, 5), n = 5),
  gradient_ascent(max_iter = 50),
  newton_raphson(max_iter = 20)
)
result <- strategy(problem, c(0, 1))



Compose Multiple Function Transformations

Description

Applies transformations right-to-left (like mathematical composition). This allows building complex log-likelihood transformations from simple ones.

Usage

compose_transforms(...)

Arguments

...

Transformer functions

Details

Note: For composing solvers, use compose instead.

Value

Composed transformer function

Examples


# Create a composition of transformations
transform <- compose_transforms(
  function(f) with_penalty(f, penalty_l1(), lambda = 0.01),
  function(f) with_penalty(f, penalty_l2(), lambda = 0.05)
)

# Apply to log-likelihood
loglike <- function(theta) -sum((theta - c(1, 2))^2)
loglike_transformed <- transform(loglike)
loglike_transformed(c(1, 2))


Coordinate Ascent Solver

Description

Creates a solver that optimizes one parameter at a time while holding others fixed. This is useful when parameters have different scales or when the likelihood decomposes nicely along coordinate directions.

Usage

coordinate_ascent(
  max_cycles = 50L,
  tol = 1e-08,
  line_search = TRUE,
  cycle_order = c("sequential", "random"),
  verbose = FALSE
)

Arguments

max_cycles

Maximum number of full cycles through all parameters

tol

Convergence tolerance on log-likelihood change

line_search

Use line search for each coordinate (slower but more robust)

cycle_order

Order of cycling: "sequential" (1,2,...,p) or "random"

verbose

Logical; if TRUE and the cli package is installed, display progress during optimization. Default is FALSE.

Details

Each cycle consists of optimizing each coordinate in turn using a simple golden section search. The algorithm converges when the log-likelihood improvement in a full cycle is less than tol.

Coordinate ascent can be effective when parameters are on very different scales or when the likelihood decomposes nicely along coordinate directions. However, it may converge slowly for problems with strong parameter correlations.

Value

A solver function with signature (problem, theta0, trace) -> mle_result

See Also

gradient_ascent for gradient-based optimization, nelder_mead for another derivative-free method

Examples

# Basic coordinate ascent
solver <- coordinate_ascent()

# With more cycles for difficult problems
solver <- coordinate_ascent(max_cycles = 100)

# Random cycling to avoid systematic bias
solver <- coordinate_ascent(cycle_order = "random")


Finalize trace recorder into trace data

Description

Finalize trace recorder into trace data

Usage

finalize_trace(recorder)

Arguments

recorder

Trace recorder

Value

List of trace data or NULL


Fisher Scoring Solver

Description

Variant of Newton-Raphson that uses the expected Fisher information instead of the observed Fisher. Can be more stable for some problems.

Usage

fisher_scoring(
  line_search = TRUE,
  max_iter = 50L,
  tol = 1e-08,
  backtrack_ratio = 0.5,
  min_step = 1e-12,
  verbose = FALSE
)

Arguments

line_search

Use backtracking line search for stability

max_iter

Maximum number of iterations

tol

Convergence tolerance (on parameter change)

backtrack_ratio

Step size reduction factor for line search

min_step

Minimum step size before giving up

verbose

Logical; if TRUE and the cli package is installed, display progress during optimization. Default is FALSE.

Details

Fisher scoring is identical to Newton-Raphson when the expected and observed Fisher information are equal (e.g., exponential families). For other models, it may have different convergence properties.

Value

A solver function with signature (problem, theta0, trace) -> mle_result

Examples


set.seed(42)
x <- rnorm(50, 5, 2)
problem <- mle_problem(
  loglike = function(theta) sum(dnorm(x, theta[1], theta[2], log = TRUE)),
  constraint = mle_constraint(
    support = function(theta) theta[2] > 0,
    project = function(theta) c(theta[1], max(theta[2], 1e-8))
  )
)
solver <- fisher_scoring()
result <- solver(problem, c(4, 1.5))



Get Fisher information function from problem

Description

Returns the Fisher information matrix function, computing numerically if not provided. If cache_derivatives = TRUE was set in the problem and Fisher is computed numerically, results are cached using a single-value cache.

Usage

get_fisher(problem)

Arguments

problem

An mle_problem object

Value

Fisher information function that takes a parameter vector and returns the Fisher information matrix (negative Hessian of log-likelihood).

Examples

problem <- mle_problem(
  loglike = function(theta) -sum((theta - c(1, 2))^2)
)
fisher_fn <- get_fisher(problem)
fisher_fn(c(1, 2))  # Fisher information at the optimum

Get score function from problem

Description

Returns the score (gradient) function, computing numerically if not provided. If cache_derivatives = TRUE was set in the problem and score is computed numerically, results are cached using a single-value cache.

Usage

get_score(problem)

Arguments

problem

An mle_problem object

Value

Score function that takes a parameter vector and returns the gradient of the log-likelihood.

Examples

problem <- mle_problem(
  loglike = function(theta) -sum((theta - c(1, 2))^2)
)
score_fn <- get_score(problem)
score_fn(c(0, 0))  # Gradient at (0, 0)

Gradient Ascent Solver

Description

Creates a solver that uses gradient ascent (steepest ascent) to find the MLE. Optionally uses backtracking line search for adaptive step sizes.

Usage

gradient_ascent(
  learning_rate = 1,
  line_search = TRUE,
  max_iter = 100L,
  tol = 1e-08,
  backtrack_ratio = 0.5,
  min_step = 1e-12,
  verbose = FALSE
)

Arguments

learning_rate

Base learning rate / maximum step size

line_search

Use backtracking line search for adaptive step sizes

max_iter

Maximum number of iterations

tol

Convergence tolerance (on parameter change)

backtrack_ratio

Step size reduction factor for line search (0 < r < 1)

min_step

Minimum step size before giving up

verbose

Logical; if TRUE and the cli package is installed, display progress during optimization. Default is FALSE.

Details

Gradient ascent iteratively moves in the direction of the score (gradient of log-likelihood). With line search enabled, the step size is adaptively chosen to ensure the log-likelihood increases.

The solver respects constraints defined in the problem via projection.

Value

A solver function with signature (problem, theta0, trace) -> mle_result

See Also

newton_raphson for second-order optimization, bfgs for quasi-Newton, %>>% and %|% for solver composition

Examples

# Create a solver with default parameters
solver <- gradient_ascent()

# Create a solver with custom parameters
solver <- gradient_ascent(
  learning_rate = 0.5,
  max_iter = 500,
  tol = 1e-10
)

# Without line search (fixed step size)
solver <- gradient_ascent(learning_rate = 0.01, line_search = FALSE)


Grid Search Solver

Description

Creates a solver that evaluates the log-likelihood on a grid of points and returns the best. Useful for finding good starting points or for low-dimensional problems.

Usage

grid_search(lower, upper, n = 10L)

Arguments

lower

Lower bounds for the grid

upper

Upper bounds for the grid

n

Number of points per dimension (scalar or vector)

Details

Grid search is deterministic and exhaustive within its bounds. It's most useful for 1-3 dimensional problems or as the first stage of a multi-stage strategy (e.g., grid_search %>>% gradient_ascent).

The theta0 argument is ignored; the grid is determined by lower/upper/n. Points outside the problem's constraint support are skipped.

Value

A solver function with signature (problem, theta0, trace) -> mle_result

Examples


set.seed(42)
x <- rnorm(50, 5, 2)
problem <- mle_problem(
  loglike = function(theta) sum(dnorm(x, theta[1], theta[2], log = TRUE)),
  constraint = mle_constraint(support = function(theta) theta[2] > 0,
                              project = function(theta) c(theta[1], max(theta[2], 1e-8)))
)
# Simple grid search
solver <- grid_search(lower = c(-10, 0.1), upper = c(10, 5), n = 20)
result <- solver(problem, c(0, 1))

# Coarse-to-fine: grid then gradient
strategy <- grid_search(c(-10, 0.1), c(10, 5), n = 5) %>>% gradient_ascent()



Check if solver converged

Description

Check if solver converged

Usage

is_converged(x, ...)

Arguments

x

An mle result object

...

Additional arguments (unused)

Value

Logical indicating convergence

Examples


problem <- mle_problem(
  loglike = function(theta) -sum((theta - c(1, 2))^2)
)
result <- gradient_ascent(max_iter = 50)(problem, c(0, 0))
is_converged(result)


Check if object is an mle_constraint

Description

Check if object is an mle_constraint

Usage

is_mle_constraint(x)

Arguments

x

Object to test

Value

Logical indicating whether x is an mle_constraint.

Examples

constraint <- mle_constraint(support = function(theta) all(theta > 0))
is_mle_constraint(constraint)  # TRUE
is_mle_constraint(list())      # FALSE

Check if object is an mle_numerical

Description

Check if object is an mle_numerical

Usage

is_mle_numerical(x)

Arguments

x

Object to test

Value

Logical indicating whether x inherits from mle_numerical.

Examples


problem <- mle_problem(
  loglike = function(theta) -sum((theta - c(1, 2))^2)
)
result <- gradient_ascent(max_iter = 20)(problem, c(0, 0))
is_mle_numerical(result)  # TRUE
is_mle_numerical(list())  # FALSE


Check if object is an mle_problem

Description

Check if object is an mle_problem

Usage

is_mle_problem(x)

Arguments

x

Object to test

Value

Logical indicating whether x is an mle_problem.

Examples

problem <- mle_problem(
  loglike = function(theta) -sum((theta - c(1, 2))^2)
)
is_mle_problem(problem)  # TRUE
is_mle_problem(list())   # FALSE

Check if tracing is enabled

Description

Check if tracing is enabled

Usage

is_tracing(trace)

Arguments

trace

An mle_trace object

Value

Logical indicating if any tracing is enabled

Examples

# Tracing disabled (default)
trace <- mle_trace()
is_tracing(trace)  # FALSE

# Tracing enabled
trace <- mle_trace(values = TRUE)
is_tracing(trace)  # TRUE

L-BFGS-B Solver (Box Constrained)

Description

Creates a solver using L-BFGS-B, a limited-memory BFGS variant that supports box constraints (lower and upper bounds on parameters).

Usage

lbfgsb(lower = -Inf, upper = Inf, max_iter = 100L, tol = 1e-08)

Arguments

lower

Lower bounds on parameters (can be -Inf)

upper

Upper bounds on parameters (can be Inf)

max_iter

Maximum number of iterations

tol

Convergence tolerance

Details

Unlike the constraint system in mle_problem (which uses projection), L-BFGS-B handles box constraints natively within the algorithm. Use this when you have simple bound constraints.

Value

A solver function

Examples


set.seed(42)
x <- rnorm(50, 5, 2)
problem <- mle_problem(
  loglike = function(theta) sum(dnorm(x, theta[1], theta[2], log = TRUE))
)
# Positive sigma via box constraint
solver <- lbfgsb(lower = c(-Inf, 0.01), upper = c(Inf, Inf))
result <- solver(problem, c(4, 1.5))



Merge trace data from multiple results

Description

Concatenates trace data from a sequence of results (e.g., from composed solvers). The merged trace preserves stage boundaries for later analysis.

Usage

merge_traces(results)

Arguments

results

List of mle_numerical results with trace_data

Value

A merged mle_trace_data object with stage information
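
A minimal sketch of assumed usage (not taken from the package documentation): run two trace-enabled solvers in sequence, handing off theta.hat as chain() does, then merge their trace data.

problem <- mle_problem(loglike = function(theta) -sum((theta - c(1, 2))^2))
trace_cfg <- mle_trace(values = TRUE, path = TRUE)
r1 <- gradient_ascent(max_iter = 20)(problem, c(0, 0), trace = trace_cfg)
r2 <- newton_raphson(max_iter = 10)(problem, r1$theta.hat, trace = trace_cfg)
merged <- merge_traces(list(r1, r2))  # mle_trace_data with stage information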


Create domain constraint specification

Description

Specifies domain constraints for optimization. The support function checks if parameters are valid, and the project function maps invalid parameters back to valid ones.

Usage

mle_constraint(support = function(theta) TRUE, project = function(theta) theta)

Arguments

support

Function testing if theta is in support (returns TRUE/FALSE)

project

Function projecting theta onto support

Value

An mle_constraint object

Examples

# Positive parameters only
constraint <- mle_constraint(
  support = function(theta) all(theta > 0),
  project = function(theta) pmax(theta, 1e-8)
)

# Parameters in [0, 1]
constraint <- mle_constraint(
  support = function(theta) all(theta >= 0 & theta <= 1),
  project = function(theta) pmax(0, pmin(1, theta))
)

# No constraints (default)
constraint <- mle_constraint()

Create an MLE Problem Specification

Description

Encapsulates a maximum likelihood estimation problem, separating the statistical specification from the optimization strategy.

Usage

mle_problem(
  loglike,
  score = NULL,
  fisher = NULL,
  constraint = NULL,
  theta_names = NULL,
  n_obs = NULL,
  cache_derivatives = FALSE
)

## S3 method for class 'mle_problem'
print(x, ...)

Arguments

loglike

Log-likelihood function taking parameter vector theta

score

Score function (gradient of log-likelihood). If NULL, computed numerically via numDeriv::grad when needed.

fisher

Fisher information matrix function. If NULL, computed numerically via numDeriv::hessian when needed.

constraint

Domain constraints as mle_constraint object

theta_names

Character vector of parameter names for nice output

n_obs

Number of observations (for AIC/BIC computation)

cache_derivatives

Logical; if TRUE and score/fisher are computed numerically, cache the most recent result to avoid redundant computation. This is particularly useful during line search where the same point may be evaluated multiple times. Default is FALSE.

x

An mle_problem object.

...

Additional arguments (unused).

Details

The problem object provides lazy evaluation of derivatives. If you don't provide analytic score or fisher functions, they will be computed numerically when requested.

When cache_derivatives = TRUE, numerical derivatives are cached using a single-value cache (stores the most recent theta and result). This is efficient for optimization where consecutive calls often evaluate at the same point (e.g., during line search or convergence checking). Use clear_cache to manually clear the cache if needed.

Value

An mle_problem object

The input object, invisibly (for method chaining).

Examples

# With analytic derivatives
set.seed(42)
data <- rnorm(100, mean = 5, sd = 2)
problem <- mle_problem(
  loglike = function(theta) sum(dnorm(data, theta[1], theta[2], log = TRUE)),
  score = function(theta) {
    c(sum(data - theta[1]) / theta[2]^2,
      -length(data)/theta[2] + sum((data - theta[1])^2) / theta[2]^3)
  },
  constraint = mle_constraint(
    support = function(theta) theta[2] > 0,
    project = function(theta) c(theta[1], max(theta[2], 1e-8))
  ),
  theta_names = c("mu", "sigma")
)

# Without analytic derivatives (computed numerically)
problem <- mle_problem(
  loglike = function(theta) sum(dnorm(data, theta[1], theta[2], log = TRUE)),
  constraint = mle_constraint(
    support = function(theta) theta[2] > 0
  )
)


Create a Trace Configuration

Description

Specifies what information to track during optimization.

Usage

mle_trace(
  values = FALSE,
  path = FALSE,
  gradients = FALSE,
  timing = FALSE,
  every = 1L
)

## S3 method for class 'mle_trace'
print(x, ...)

Arguments

values

Track log-likelihood values at each iteration

path

Track parameter values at each iteration

gradients

Track gradient norms at each iteration

timing

Track wall-clock time

every

Record every nth iteration (1 = all iterations)

x

An mle_trace object.

...

Additional arguments (unused).

Value

An mle_trace configuration object

The input object, invisibly (for method chaining).

Examples

# Track everything
trace <- mle_trace(values = TRUE, path = TRUE, gradients = TRUE)

# Minimal tracing (just convergence path)
trace <- mle_trace(values = TRUE)

# Sample every 10th iteration for long runs
trace <- mle_trace(values = TRUE, path = TRUE, every = 10)


Nelder-Mead Solver (Derivative-Free)

Description

Creates a solver using the Nelder-Mead simplex method via optim(). This is a derivative-free method useful when gradients are unavailable or unreliable.

Usage

nelder_mead(max_iter = 500L, tol = 1e-08)

Arguments

max_iter

Maximum number of iterations

tol

Convergence tolerance

Details

Nelder-Mead doesn't use gradient information, making it robust but potentially slower. It's useful as a fallback when gradient-based methods fail, or for problems with non-smooth likelihoods.

Value

A solver function

Examples


set.seed(42)
x <- rnorm(50, 5, 2)
problem <- mle_problem(
  loglike = function(theta) sum(dnorm(x, theta[1], theta[2], log = TRUE)),
  constraint = mle_constraint(support = function(theta) theta[2] > 0,
                              project = function(theta) c(theta[1], max(theta[2], 1e-8)))
)
# Use when gradients are problematic
result <- nelder_mead()(problem, c(4, 1.5))

# Race against gradient methods
strategy <- gradient_ascent() %|% nelder_mead()



Create a trace recorder

Description

Internal function to create a mutable trace recorder.

Usage

new_trace_recorder(trace, n_params)

Arguments

trace

mle_trace configuration

n_params

Number of parameters (for pre-allocation)

Value

A trace recorder environment


Newton-Raphson Solver

Description

Creates a solver that uses Newton-Raphson (second-order) optimization. Uses the Fisher information matrix to scale the gradient for faster convergence near the optimum.

Usage

newton_raphson(
  line_search = TRUE,
  max_iter = 50L,
  tol = 1e-08,
  backtrack_ratio = 0.5,
  min_step = 1e-12,
  verbose = FALSE
)

Arguments

line_search

Use backtracking line search for stability

max_iter

Maximum number of iterations

tol

Convergence tolerance (on parameter change)

backtrack_ratio

Step size reduction factor for line search

min_step

Minimum step size before giving up

verbose

Logical; if TRUE and the cli package is installed, display progress during optimization. Default is FALSE.

Details

Newton-Raphson computes the search direction as I(theta)^{-1} s(theta), where I is the Fisher information and s is the score. This accounts for parameter scaling and typically converges faster than gradient ascent when near the optimum.

Requires the problem to have a Fisher information function (either analytic or computed numerically).

Value

A solver function with signature (problem, theta0, trace) -> mle_result

See Also

gradient_ascent for first-order optimization, fisher_scoring (alias), %>>% for chaining

Examples


set.seed(42)
x <- rnorm(50, 5, 2)
problem <- mle_problem(
  loglike = function(theta) sum(dnorm(x, theta[1], theta[2], log = TRUE)),
  constraint = mle_constraint(support = function(theta) theta[2] > 0,
                              project = function(theta) c(theta[1], max(theta[2], 1e-8)))
)
# Basic usage
solver <- newton_raphson()
result <- solver(problem, c(4, 1.5))

# Often used after gradient ascent for refinement
strategy <- gradient_ascent(max_iter = 50) %>>% newton_raphson(max_iter = 20)



Normal Sampler Factory

Description

Creates a sampler function for use with with_restarts that generates normally distributed starting points around a center.

Usage

normal_sampler(center, sd = 1)

Arguments

center

Mean of the normal distribution

sd

Standard deviation (scalar or vector)

Value

A sampler function

Examples

sampler <- normal_sampler(c(0, 1), sd = c(5, 0.5))
strategy <- with_restarts(gradient_ascent(), n = 20, sampler = sampler)


Get number of iterations

Description

Get number of iterations

Usage

num_iterations(x, ...)

Arguments

x

An mle result object

...

Additional arguments (unused)

Value

Number of iterations, or NA_integer_ if not available.

Examples


problem <- mle_problem(
  loglike = function(theta) -sum((theta - c(1, 2))^2)
)
result <- gradient_ascent(max_iter = 50)(problem, c(0, 0))
num_iterations(result)


Extract Optimization Path as Data Frame

Description

Converts the trace data from an MLE result into a tidy data frame for custom analysis and plotting (e.g., with ggplot2).

Usage

optimization_path(x, ...)

Arguments

x

An mle_numerical result with trace_data, or an mle_trace_data object

...

Additional arguments (unused)

Value

A data frame with one row per recorded iteration. Depending on the trace configuration, columns include the iteration index, the log-likelihood value, the parameter values, the gradient norm, and a stage identifier when traces from composed solvers are merged.

Examples


# Get optimization path as data frame
problem <- mle_problem(
  loglike = function(theta) -sum((theta - c(3, 2))^2),
  constraint = mle_constraint(support = function(theta) TRUE)
)
trace_cfg <- mle_trace(values = TRUE, path = TRUE)
result <- gradient_ascent(max_iter = 30)(problem, c(0, 0), trace = trace_cfg)

path_df <- optimization_path(result)
head(path_df)



Elastic net penalty (combination of L1 and L2)

Description

Creates a penalty combining L1 and L2 norms. The parameter alpha controls the balance: alpha=1 is pure LASSO, alpha=0 is pure Ridge.

Usage

penalty_elastic_net(alpha = 0.5, weights = NULL)

Arguments

alpha

Balance between L1 and L2 (numeric in [0,1], default: 0.5)

weights

Optional parameter weights (default: all 1)

Value

Penalty function

Examples

# Equal mix of L1 and L2
penalty <- penalty_elastic_net(alpha = 0.5)

# More L1 (more sparsity)
penalty <- penalty_elastic_net(alpha = 0.9)

# More L2 (more shrinkage)
penalty <- penalty_elastic_net(alpha = 0.1)

L1 penalty function (LASSO)

Description

Creates a penalty function that computes the L1 norm (sum of absolute values). Used for sparsity-inducing regularization.

Usage

penalty_l1(weights = NULL)

Arguments

weights

Optional parameter weights (default: all 1)

Value

Penalty function

Examples

penalty <- penalty_l1()
penalty(c(1, -2, 3))  # Returns 6

# Weighted L1
penalty <- penalty_l1(weights = c(1, 2, 1))
penalty(c(1, -2, 3))  # Returns 1*1 + 2*2 + 1*3 = 8

L2 penalty function (Ridge)

Description

Creates a penalty function that computes the L2 norm squared (sum of squares). Used for parameter shrinkage.

Usage

penalty_l2(weights = NULL)

Arguments

weights

Optional parameter weights (default: all 1)

Value

Penalty function

Examples

penalty <- penalty_l2()
penalty(c(1, -2, 3))  # Returns 14

# Weighted L2
penalty <- penalty_l2(weights = c(1, 2, 1))
penalty(c(1, -2, 3))  # Returns 1^2 + (2*2)^2 + 3^2 = 26

Plot Optimization Convergence

Description

Visualizes the optimization trajectory from an MLE result with tracing enabled. Shows log-likelihood progression, gradient norm decay, and optionally the parameter path (for 2D problems).

Usage

## S3 method for class 'mle_numerical'
plot(x, which = c("loglike", "gradient"), main = NULL, ...)

Arguments

x

An mle_numerical result object with trace_data

which

Character vector specifying which plots to show: "loglike" (log-likelihood), "gradient" (gradient norm), "path" (2D parameter path)

main

Optional title

...

Additional arguments passed to plot

Details

This function requires that the solver was run with tracing enabled via mle_trace(). Without trace data, the function will warn and return invisibly.

The "path" plot is only shown for 2D parameter problems.

Value

Invisibly returns the trace data

Examples


# Enable tracing when solving
problem <- mle_problem(
  loglike = function(theta) -sum((theta - c(3, 2))^2),
  constraint = mle_constraint(support = function(theta) TRUE)
)
trace_cfg <- mle_trace(values = TRUE, gradients = TRUE, path = TRUE)
result <- gradient_ascent(max_iter = 50)(problem, c(0, 0), trace = trace_cfg)

# Plot convergence diagnostics
plot(result)



Plot Trace Data Directly

Description

Plot Trace Data Directly

Usage

## S3 method for class 'mle_trace_data'
plot(x, ...)

Arguments

x

An mle_trace_data object

...

Arguments passed to plotting functions

Value

Called for side effects (generates a plot). Returns the input object invisibly.
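
A hedged sketch of direct use, building an mle_trace_data object via merge_traces (the workflow is assumed, not taken from the package documentation):

problem <- mle_problem(loglike = function(theta) -sum((theta - c(3, 2))^2))
trace_cfg <- mle_trace(values = TRUE, path = TRUE, gradients = TRUE)
r1 <- gradient_ascent(max_iter = 30)(problem, c(0, 0), trace = trace_cfg)
r2 <- newton_raphson(max_iter = 10)(problem, r1$theta.hat, trace = trace_cfg)
plot(merge_traces(list(r1, r2)))  # dispatches to plot.mle_trace_data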


Print MLE Trace Data

Description

Print MLE Trace Data

Usage

## S3 method for class 'mle_trace_data'
print(x, ...)

Arguments

x

An mle_trace_data object.

...

Additional arguments (unused).

Value

The input object, invisibly (for method chaining).


Race Multiple Solvers

Description

Runs multiple solvers (optionally in parallel) and returns the best result (highest log-likelihood). More flexible than %|% operator.

Usage

race(..., parallel = FALSE)

Arguments

...

Solver functions to race

parallel

Logical; if TRUE and the future package is installed, solvers are run in parallel using the current future plan. Default is FALSE.

Details

When parallel = TRUE, solvers are executed using future::future() and results collected with future::value(). The current future plan determines how parallelization happens (e.g., plan(multisession) for multi-process execution).

Failed solvers (those that throw errors) are ignored. If all solvers fail, an error is thrown.

Value

A new solver function that races all solvers and picks the best

Examples

# Race three methods sequentially
strategy <- race(gradient_ascent(), bfgs(), nelder_mead())

# Race with parallel execution (requires future package)
## Not run: 
future::plan(future::multisession)
strategy <- race(gradient_ascent(), bfgs(), nelder_mead(), parallel = TRUE)

## End(Not run)


Parallel Solver Racing (Operator)

Description

Runs multiple solvers and returns the best result (highest log-likelihood). Useful when unsure which method will work best for a given problem.

Usage

s1 %|% s2

Arguments

s1

First solver function

s2

Second solver function

Details

For parallel execution or more than 2 solvers, use race.

Value

A new solver function that runs both and picks the best

See Also

race for parallel execution

Examples

# Race gradient-based vs derivative-free
strategy <- gradient_ascent() %|% nelder_mead()

# Race multiple methods
strategy <- gradient_ascent() %|% bfgs() %|% nelder_mead()


Random Search Solver

Description

Creates a solver that evaluates the log-likelihood at random points and returns the best. Useful for high-dimensional problems where grid search is infeasible.

Usage

random_search(sampler, n = 100L)

Arguments

sampler

Function generating random parameter vectors

n

Number of random points to evaluate

Details

Unlike grid search, random search scales better to high dimensions. The sampler should generate points in a reasonable region; points outside the problem's constraint support are skipped.

Value

A solver function

Examples

# Create a random search solver with uniform sampling
solver <- random_search(
  sampler = uniform_sampler(c(-10, 0.1), c(10, 5)),
  n = 100
)


Record an iteration to trace

Description

Record an iteration to trace

Usage

record_iteration(recorder, theta, value = NULL, gradient = NULL)

Arguments

recorder

Trace recorder from new_trace_recorder

theta

Current parameters

value

Current log-likelihood (or NULL)

gradient

Current gradient (or NULL)


Simulated Annealing Solver

Description

Creates a solver using simulated annealing for global optimization. Simulated annealing can escape local optima by probabilistically accepting worse solutions, with the acceptance probability decreasing over time (controlled by a "temperature" parameter).

Usage

sim_anneal(
  temp_init = 10,
  cooling_rate = 0.95,
  max_iter = 1000L,
  neighbor_sd = 1,
  min_temp = 1e-10,
  verbose = FALSE
)

Arguments

temp_init

Initial temperature (higher = more exploration)

cooling_rate

Temperature reduction factor per iteration (0 < r < 1)

max_iter

Maximum number of iterations

neighbor_sd

Standard deviation for generating neighbor proposals

min_temp

Minimum temperature before stopping

verbose

Logical; if TRUE and the cli package is installed, display progress during optimization. Default is FALSE.

Details

At each iteration:

1. Generate a neighbor by adding Gaussian noise to the current parameters.
2. If the neighbor improves the objective, accept it.
3. If the neighbor is worse, accept it with probability exp(delta / temp).
4. Reduce the temperature: temp = temp * cooling_rate.

The algorithm is stochastic and may find different solutions on different runs. For best results, use with with_restarts() or combine with a local optimizer via %>>%.
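
A minimal sketch of the acceptance rule above (accept_proposal and its arguments are hypothetical names for illustration, not part of the package):

accept_proposal <- function(ll_new, ll_current, temp) {
  delta <- ll_new - ll_current
  if (delta > 0) return(TRUE)    # always accept improvements
  runif(1) < exp(delta / temp)   # worse moves accepted with probability exp(delta / temp)
}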

Value

A solver function with signature (problem, theta0, trace) -> mle_result

See Also

with_restarts for multi-start optimization, gradient_ascent for local refinement

Examples

# Basic simulated annealing
solver <- sim_anneal()

# More exploration (higher initial temp, slower cooling)
solver <- sim_anneal(temp_init = 100, cooling_rate = 0.999)

# Coarse global search, then local refinement
strategy <- sim_anneal(max_iter = 500) %>>% gradient_ascent()


Uniform Sampler Factory

Description

Creates a sampler function for use with with_restarts that generates uniformly distributed starting points.

Usage

uniform_sampler(lower, upper)

Arguments

lower

Lower bounds for each parameter

upper

Upper bounds for each parameter

Value

A sampler function

Examples

sampler <- uniform_sampler(c(-10, 0.1), c(10, 5))
strategy <- with_restarts(gradient_ascent(), n = 20, sampler = sampler)


Conditional Refinement

Description

Applies a refinement solver only if the first solver did not converge. If refinement is applied, trace data from both solvers is merged.

Usage

unless_converged(solver, refinement)

Arguments

solver

Primary solver function

refinement

Solver to use if primary doesn't converge

Value

A new solver function with conditional refinement

Examples

# Use Newton-Raphson to refine if gradient ascent doesn't converge
strategy <- unless_converged(gradient_ascent(max_iter = 50), newton_raphson())


Update an mle_problem

Description

Create a new problem with some fields updated.

Usage

## S3 method for class 'mle_problem'
update(object, ...)

Arguments

object

An mle_problem

...

Named arguments to update

Value

New mle_problem
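
A brief sketch of assumed usage, updating fields that mle_problem() accepts (theta_names and n_obs here):

problem <- mle_problem(loglike = function(theta) -sum((theta - c(1, 2))^2))
problem2 <- update(problem, theta_names = c("a", "b"), n_obs = 100)
problem2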


Verbose Output Utilities

Description

Internal functions for progress reporting during optimization.


Add penalty term to log-likelihood

Description

Transforms a log-likelihood by subtracting a penalty term. Useful for regularized estimation (e.g., LASSO, Ridge regression).

Usage

with_penalty(loglike, penalty, lambda = 1)

Arguments

loglike

Base log-likelihood function

penalty

Penalty function taking theta and returning numeric

lambda

Penalty weight (non-negative numeric, default: 1.0)

Value

Transformed log-likelihood function

Examples


# Regression with L2 penalty (Ridge)
loglike <- function(theta) -sum((theta - c(1, 2))^2)

# Add L2 penalty
loglike_penalized <- with_penalty(
  loglike,
  penalty = penalty_l2(),
  lambda = 0.1
)
loglike_penalized(c(1, 2))  # Evaluate penalized likelihood


Multiple Random Restarts

Description

Runs a solver from multiple starting points and returns the best result. Essential for problems with multiple local optima.

Usage

with_restarts(solver, n, sampler, max_reject = 100L)

Arguments

solver

A solver function

n

Number of restarts (including the provided theta0)

sampler

Function that generates random starting points. Called with no arguments, should return a parameter vector. Samples are automatically constrained using problem$constraint.

max_reject

Maximum rejection attempts per sample before projection

Details

The sampler generates candidate starting points, which are automatically filtered/projected using the problem's constraint. This means samplers can be simple distributions without constraint awareness.

Value

A new solver function with restart capability

Examples

# 20 random restarts - constraint applied automatically from problem
sampler <- uniform_sampler(c(-10, 0), c(10, 5))
strategy <- with_restarts(gradient_ascent(), n = 20, sampler = sampler)

# Can also compose with other operators
strategy <- with_restarts(gradient_ascent(), n = 10, sampler = sampler) %>>%
  newton_raphson()


Create stochastic log-likelihood with subsampling

Description

Transforms a log-likelihood function to use only a random subsample of observations. Useful for stochastic gradient ascent on large datasets.

Usage

with_subsampling(loglike, data, subsample_size, replace = FALSE)

Arguments

loglike

Base log-likelihood function. Should accept theta and data.

data

Observations (vector, matrix, or data.frame)

subsample_size

Number of observations to sample per evaluation

replace

Sample with replacement (logical, default: FALSE)

Value

Transformed log-likelihood function

Examples


# Original likelihood uses all data
data <- rnorm(10000, mean = 5, sd = 2)

loglike <- function(theta, obs = data) {
  sum(dnorm(obs, mean = theta[1], sd = theta[2], log = TRUE))
}

# Stochastic version uses random subsample
loglike_stoch <- with_subsampling(
  loglike,
  data = data,
  subsample_size = 100
)

# Each call uses different random subsample
loglike_stoch(c(5, 2))
loglike_stoch(c(5, 2))  # Different value