Getting Started with earthUI

Introduction

earthUI provides both an interactive Shiny GUI and a set of composable R functions for building Earth (MARS-style) models using the earth package.

This vignette demonstrates the programmatic API. To launch the interactive app, simply run:

library(earthUI)
launch()

Basic Workflow

1. Import Data

library(earthUI)

# For this example, we use the built-in mtcars dataset
df <- mtcars
head(df)
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

You can also import from files:

df <- import_data("my_data.csv")        # CSV
df <- import_data("my_data.xlsx")       # Excel

2. Detect Categorical Variables

cats <- detect_categoricals(df)
cats
#>   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb 
#> FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE

Variables with few unique values (default: 10 or fewer) are flagged as likely categorical. Character and factor columns are always flagged.
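The flagging rule just described can be sketched in base R. `flag_categoricals()` is a hypothetical stand-in for `detect_categoricals()`, written only from the description above. The real function may treat edge cases such as logical columns or missing values differently.

```r
# Hypothetical re-implementation of the heuristic described above;
# earthUI's own detect_categoricals() may differ in details.
flag_categoricals <- function(df, max_unique = 10) {
  vapply(df, function(col) {
    if (is.character(col) || is.factor(col)) {
      return(TRUE)                      # character/factor columns: always flagged
    }
    length(unique(col)) <= max_unique   # numeric columns: few distinct values
  }, logical(1))
}

flags <- flag_categoricals(mtcars)
names(flags)[flags]
```

On `mtcars` this flags `cyl`, `vs`, `am`, `gear` and `carb`, which matches the `detect_categoricals()` output shown earlier.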

3. Fit the Model

result <- fit_earth(
  df = df,
  target = "mpg",
  predictors = c("cyl", "disp", "hp", "wt", "qsec", "am", "gear"),
  categoricals = c("am", "gear"),
  degree = 1
)

Important defaults:

4. Examine Results

# Model summary
s <- format_summary(result)
cat(sprintf("R²: %.4f\nGRSq: %.4f\nTerms: %d\n",
            s$r_squared, s$grsq, s$n_terms))
#> R²: 0.8591
#> GRSq: 0.8143
#> Terms: 3

# Coefficients
s$coefficients
#>            term       mpg
#> 1   (Intercept) 20.436170
#> 2 h(disp-146.7) -0.024758
#> 3 h(146.7-disp)  0.145722

# Variable importance
format_variable_importance(result)
#>   variable nsubsets gcv rss
#> 1     disp        2 100 100

# ANOVA decomposition
format_anova(result)
#>   term     description variables       mpg
#> 1    1     (Intercept)           20.436170
#> 2    2 h(disp - 146.7)      disp -0.024758
#> 3    3 h(146.7 - disp)      disp  0.145722
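GRSq is earth's generalized cross-validation (GCV) analogue of R², and it penalizes model size. According to the earth package documentation, GCV = RSS / (n (1 - enp/n)^2). Here the effective number of parameters is enp = nterms + penalty * (nterms - 1) / 2, and penalty defaults to 2 when degree = 1. GRSq is then 1 - GCV / GCV_null, where the null model is intercept-only. Because the ratio of the two residual sums of squares is 1 - R², GRSq can be computed from R² alone.

The sketch below does this. It assumes that `fit_earth()` keeps earth's default penalty and uses all 32 rows of `mtcars`. The helper name `grsq_from_rsq()` is hypothetical.

```r
# Sketch of how earth's GRSq relates to the training R-squared, assuming
# earth's documented GCV formula and default penalty (2 for degree = 1).
grsq_from_rsq <- function(rsq, n, nterms, penalty = 2) {
  # Effective number of parameters as documented for earth
  enp_model <- nterms + penalty * (nterms - 1) / 2
  enp_null  <- 1                      # intercept-only model
  # GCV = RSS / (n * (1 - enp / n)^2); RSS_model / RSS_null equals (1 - R^2)
  gcv_ratio <- (1 - rsq) * ((1 - enp_null / n) / (1 - enp_model / n))^2
  1 - gcv_ratio
}

grsq_from_rsq(rsq = 0.8591, n = nrow(mtcars), nterms = 3)
```

The result, about 0.8143, agrees with the GRSq printed above. This suggests, but does not prove, that `fit_earth()` uses earth's default penalty.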

5. Plots

plot_variable_importance(result)

plot_partial_dependence(result, "wt")

plot_actual_vs_predicted(result)

plot_residuals(result)
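The model fitted above contains only an intercept and a mirrored pair of hinge functions on `disp`, where h(x) = max(0, x). The predictions plotted by `plot_actual_vs_predicted()` and used by `plot_residuals()` should therefore be reproducible by hand from the printed coefficients.

The base-R sketch below does this. It assumes the model is exactly as printed. The coefficients are copied from the rounded output, so tiny differences from `predict()` are expected. `predict_mpg_from_disp()` is a hypothetical helper.

```r
# Hinge (rectifier) function used by MARS/earth basis terms
h <- function(x) pmax(0, x)

# Coefficients copied from the printed model (rounded, so results are approximate)
predict_mpg_from_disp <- function(disp) {
  20.436170 +
    -0.024758 * h(disp - 146.7) +
     0.145722 * h(146.7 - disp)
}

pred  <- predict_mpg_from_disp(mtcars$disp)
resid <- mtcars$mpg - pred                      # what plot_residuals() displays
r2    <- 1 - sum(resid^2) / sum((mtcars$mpg - mean(mtcars$mpg))^2)
round(r2, 4)
```

The recomputed R² should come out close to the 0.8591 reported by `format_summary()`.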

Controlling Interactions

When using degree >= 2, you can control which variable pairs are allowed to interact:

# Build default all-allowed matrix
preds <- c("wt", "hp", "cyl", "disp")
mat <- build_allowed_matrix(preds)

# Block wt-cyl interaction
mat["wt", "cyl"] <- FALSE
mat["cyl", "wt"] <- FALSE

# Convert to earth-compatible function
allowed_fn <- build_allowed_function(mat)

# Fit with interactions
result2 <- fit_earth(
  df = df,
  target = "mpg",
  predictors = preds,
  degree = 2,
  allowed_func = allowed_fn
)

s2 <- format_summary(result2)
cat(sprintf("Training R²: %.4f\nCV R²: %s\n",
            s2$r_squared,
            if (!is.na(s2$cv_rsq)) sprintf("%.4f", s2$cv_rsq) else "N/A"))
#> Training R²: 0.8938
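#> CV R²: -1.0884

A negative cross-validated R², as here, means the interaction model predicts the held-out folds worse than simply predicting the mean response. The higher training R² therefore reflects overfitting rather than a better model, which is easy to provoke with degree = 2 on the 32 rows of mtcars.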
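The earth package's `allowed` argument accepts a function that earth calls for each candidate term. As we understand earth's documentation, the function receives these arguments:

- `degree`: the interaction degree of the candidate term.
- `pred`: the index of the predictor being added.
- `parents`: an indicator vector of the predictors already in the parent term.
- `namesx` and `first`: optional extras.

The sketch below shows one way a symmetric allowed-matrix could be wrapped in such a function. `make_allowed_matrix()` and `make_allowed_function()` are hypothetical stand-ins, not earthUI's actual `build_allowed_matrix()` / `build_allowed_function()`. The sketch also assumes that `namesx` matches the matrix's row and column names. If factor predictors are expanded into dummy columns, those names would need mapping.

```r
# Hypothetical helpers mirroring build_allowed_matrix() / build_allowed_function();
# earthUI's actual implementation may differ.
make_allowed_matrix <- function(preds) {
  # Square logical matrix; every pair is allowed by default
  matrix(TRUE, nrow = length(preds), ncol = length(preds),
         dimnames = list(preds, preds))
}

make_allowed_function <- function(mat) {
  # Prototype accepted by earth's `allowed` argument
  function(degree, pred, parents, namesx, first) {
    if (degree <= 1) return(TRUE)           # main effects are always allowed
    new_var     <- namesx[pred]             # predictor being added to the term
    parent_vars <- namesx[parents != 0]     # predictors already in the parent term
    all(mat[new_var, parent_vars])          # every pairing must be permitted
  }
}

preds <- c("wt", "hp", "cyl", "disp")
mat <- make_allowed_matrix(preds)
mat["wt", "cyl"] <- FALSE                   # block the wt-cyl pair in both directions
mat["cyl", "wt"] <- FALSE

allowed <- make_allowed_function(mat)
allowed(degree = 2, pred = 1, parents = c(0, 0, 1, 0), namesx = preds, first = FALSE)  # wt under cyl: FALSE
allowed(degree = 2, pred = 2, parents = c(0, 0, 1, 0), namesx = preds, first = FALSE)  # hp under cyl: TRUE
```

Blocking both `mat["wt", "cyl"]` and `mat["cyl", "wt"]` matters. Earth may try either variable as the parent, so an asymmetric matrix would only block half the cases.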

Exporting Reports

Generate publication-quality reports in HTML, PDF, or Word:

render_report(result, output_format = "html", output_file = "my_report.html")

Rendering requires both the quarto R package and a working installation of the Quarto command-line tool.
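Before calling `render_report()`, it can help to check that both requirements are met. The sketch below is a hypothetical pre-flight check using only base R. It assumes the Quarto executable is on the `PATH`. Quarto can also be located in other ways (for example, bundled with an IDE), so a `FALSE` result is not conclusive.

```r
# Hypothetical pre-flight check before calling render_report():
# looks for the quarto R package and a quarto executable on the PATH.
can_render_reports <- function() {
  has_pkg <- requireNamespace("quarto", quietly = TRUE)
  has_cli <- nzchar(Sys.which("quarto"))
  has_pkg && has_cli
}

if (!can_render_reports()) {
  message("Install the quarto R package and the Quarto CLI before rendering.")
}
```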