---
title: "Manual Symbolic Regression: Testing Hypotheses"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Manual Symbolic Regression: Testing Hypotheses}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction

In addition to automated symbolic regression, `leaf` allows users to define their own candidate equations using the `"manual"` engine. This enables direct testing of hypotheses and incorporation of prior knowledge, while still leveraging `leaf`'s tools for parameter fitting, evaluation, and multi-view modeling.

# Installation

Before using `leaf`, ensure the Python backend is installed:


``` r
leaf::install_leaf()
```

# Load the package

``` r
library(leaf)
if (!backend_available()) {
  message("Install backend with leaf::install_leaf()")
}
```




# Define the formula and custom equations

User-defined equations are specified as character strings. They can refer to:

- `x1`, `x2`, ...: inputs listed in the formula, referenced by position
- Variable names: columns of the data frame, used directly (a variable used this way must also appear in the formula)
- `u1`, `u2`, ...: group-specific parameters
- `c1`, `c2`, ...: global parameters


``` r
model_formula <- "y ~ f(log(A), T, T**2, A | Archipelago, species)"
eqs <- c(
  "T**2*(u1 + u2*log(A) + u3*T)",
  "x3*(u1 + u2*x1 + u3*x2)",  # same equation, written with positional placeholders
  # Placeholders and variable names can be mixed, but a variable used by name
  # (here A) must also be listed in the formula
  "exp(u1 + u2*log(T) + u3*A*x2)"
)
```
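
To make the positional convention concrete, the following base R sketch splits the formula string into its inputs and grouping variables and checks numerically that the first two equations agree. It is an illustration only and not how `leaf` parses formulas internally; the helper names (`inner`, `placeholders`, `vals`) and the numeric values are made up for this example.

``` r
model_formula <- "y ~ f(log(A), T, T**2, A | Archipelago, species)"

# Extract the text between "f(" and the final ")".
inner <- sub("^.*f\\((.*)\\)$", "\\1", model_formula)

# Inputs come before "|", grouping variables after it.
parts  <- strsplit(inner, "|", fixed = TRUE)[[1]]
inputs <- trimws(strsplit(parts[1], ",", fixed = TRUE)[[1]])
groups <- trimws(strsplit(parts[2], ",", fixed = TRUE)[[1]])

# Positional placeholders: x1 = "log(A)", x2 = "T", x3 = "T**2", x4 = "A".
placeholders <- setNames(inputs, paste0("x", seq_along(inputs)))

# Arbitrary illustrative values (hypothetical, not taken from any dataset).
# Supplying T here shadows base R's T (shorthand for TRUE) during evaluation.
vals <- list(A = 12.5, T = 3, u1 = 0.4, u2 = -0.2, u3 = 0.1)

# Evaluate each input expression to obtain numeric values for x1, x2, ...
# R's parser treats `**` as the power operator `^`.
x_vals <- lapply(placeholders, function(e) eval(parse(text = e), vals))
env <- c(vals, x_vals)

r1 <- eval(parse(text = "T**2*(u1 + u2*log(A) + u3*T)"), env)
r2 <- eval(parse(text = "x3*(u1 + u2*x1 + u3*x2)"), env)
isTRUE(all.equal(r1, r2))
```

Under this reading, the third equation can use `A` by name only because `A` appears as the fourth input in the formula.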

# Define the manual search

``` r
regressor <- SymbolicRegressor$new(
  engine = "manual",
  loss = "PoissonDeviance",
  equation_list = eqs
)
```
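
For reference, the textbook Poisson deviance can be sketched in a few lines of base R. This is the standard definition, not necessarily the exact computation behind `loss = "PoissonDeviance"`: whether `leaf` sums or averages over observations, or applies weights, is not stated here, and the function name `poisson_deviance` is introduced only for this example.

``` r
# Standard Poisson deviance between observed counts y and predicted means mu:
#   D = 2 * sum( y * log(y / mu) - (y - mu) )
poisson_deviance <- function(y, mu) {
  stopifnot(length(y) == length(mu), all(y >= 0), all(mu > 0))
  # y * log(y / mu) is taken to be 0 when y == 0 (its limiting value).
  log_term <- ifelse(y > 0, y * log(y / mu), 0)
  2 * sum(log_term - (y - mu))
}

# Hypothetical counts and two sets of predictions.
y       <- c(0, 1, 3, 5)
mu_near <- c(0.2, 1.1, 2.8, 5.3)
mu_far  <- c(2, 2, 2, 2)

poisson_deviance(y, mu_near)  # smaller: predictions track the counts
poisson_deviance(y, mu_far)   # larger: a constant prediction fits worse
```

Because the predicted mean must be strictly positive, equations with an outer `exp()` are a natural fit for this loss.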

# Load the data


``` r
train_data <- leaf_data("GMD")
#> Warning in leaf_data("GMD"): Invalid data name. Run leaf_data() for a
#> full list of options.
head(train_data)
#> NULL
```

Note that in this rendered output `leaf_data("GMD")` did not recognize the dataset name and returned `NULL`, so the calls in the following sections fail. Running `leaf_data()` with no arguments lists the valid dataset names.

# Register equations

Even in manual mode, `search_equations()` must be called: it registers and preprocesses the equations, but performs no search.

``` r
regressor$search_equations(
  data = train_data,
  formula = model_formula
)
#> Error in `py_call_impl()`:
#> ! TypeError: object of type 'NoneType' has no len()
#> Run `reticulate::py_last_error()` for details.
```

# Fit parameters and inspect results


``` r
# Only one equation gets a finite loss
fit_results <- regressor$fit(data = train_data)
#> Error in `py_call_impl()`:
#> ! RuntimeError: You must run equation_search() before fitting parameters.
#> Run `reticulate::py_last_error()` for details.
pareto_front <- regressor$evaluate(metrics = c("RMSE", "PseudoR2"))
#> Error in `py_call_impl()`:
#> ! RuntimeError: You must run equation_search() before scoring.
#> Run `reticulate::py_last_error()` for details.
head(pareto_front)
#> Error:
#> ! object 'pareto_front' not found
```
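
For intuition about what fitting group-specific (`u`) and global (`c`) parameters means, the sketch below fits an equation of the form `exp(u1 + c1*log(A))` with base R's `glm()` rather than with `leaf`. Maximum likelihood for a Poisson GLM is equivalent to minimizing the Poisson deviance, so this mirrors the loss used above, but it is only an analogy: the data are synthetic, and the names `g`, `u_true`, and `c_true` are invented for this example.

``` r
# Synthetic data, invented purely for illustration (not the vignette's dataset).
set.seed(1)
n_per_group <- 50
g <- factor(rep(c("a", "b", "c"), each = n_per_group))
A <- runif(length(g), min = 1, max = 100)

u_true <- c(a = 0.5, b = 1.0, c = 1.5)  # one intercept per group (plays the role of u1)
c_true <- 0.3                            # one slope shared by all groups (plays the role of c1)
mu <- exp(u_true[as.character(g)] + c_true * log(A))
y  <- rpois(length(g), lambda = mu)

# exp(u1 + c1*log(A)) as a Poisson GLM with log link:
# "0 + g" gives each group its own intercept, while log(A) gets a single
# coefficient shared across groups.
fit <- glm(y ~ 0 + g + log(A), family = poisson(link = "log"))

u_hat <- coef(fit)[paste0("g", levels(g))]  # group-specific estimates
c_hat <- coef(fit)[["log(A)"]]              # global estimate

# Root mean squared error on the response scale.
rmse <- sqrt(mean((y - fitted(fit))^2))
```

Here `u_hat` holds one value per group while `c_hat` is a single number, which is the distinction the `u` and `c` prefixes encode in manual equations.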
