| Version: | 1.0.0 |
| Date: | 2026-05-05 |
| Title: | Random Forest Super Greedy Trees |
| Author: | Min Lu [aut], Udaya B. Kogalur [aut, cre], Hemant Ishwaran [aut] |
| Maintainer: | Udaya B. Kogalur <ubk@kogalur.com> |
| BugReports: | https://github.com/kogalur/randomForestSGT/issues/ |
| Depends: | R (≥ 4.3.0) |
| Imports: | randomForestSRC (≥ 3.6.2), varPro (≥ 3.1.0) |
| Suggests: | mlbench, interp, glmnet |
| Description: | Implements random forest Super Greedy Trees (SGTs) for regression. SGTs extend classification and regression tree splitting by fitting lasso-penalized local parametric models at tree nodes, producing sparse univariate and multivariate geometric cuts such as axis-aligned splits, hyperplanes, ellipsoids, hyperboloids, and interaction-based cuts. Trees are grown best-split-first by selecting cuts that reduce empirical risk, and ensembles provide out-of-bag error estimation, prediction on new data, variable filtering, tuning of the hcut complexity parameter, coordinate-descent lasso fitting, variable importance, and local coefficient summaries. For the underlying method, see Ishwaran (2026) <doi:10.1007/s10462-026-11541-6>. |
| License: | GPL (≥ 3) |
| URL: | https://ishwaran.org/ |
| NeedsCompilation: | yes |
| Packaged: | 2026-05-05 21:52:24 UTC; kogalur |
| Repository: | CRAN |
| Date/Publication: | 2026-05-11 18:50:07 UTC |
Coordinate Descent Lasso
Description
Fit lasso for regression using coordinate descent.
Usage
cdlasso(formula,
data,
nfolds = 0,
weights = NULL,
nlambda = 100,
lambda.min.ratio = ifelse(n < n.xvar, 0.01, 1e-04),
lambda = NULL,
threshold = 1e-7,
eps = .0001,
maxit = 5000,
efficiency = ifelse(n.xvar < 500, "covariance", "naive"),
seed = NULL,
do.trace = FALSE)
Arguments
formula: Formula describing the model to be fit.
data: Data frame containing response and features.
nfolds: Number of cross-validation folds; the default 0 corresponds to no cross-validation.
weights: Observation weights. Default is 1 for each observation.
nlambda: Number of lambda values in the regularization path.
lambda.min.ratio: Smallest value of lambda, expressed as a fraction of the largest (data-derived) value in the path.
lambda: Optional user-supplied sequence of lasso penalty values. If NULL, the sequence is generated internally from nlambda and lambda.min.ratio.
threshold: Convergence threshold for coordinate descent. Each inner coordinate-descent loop continues until the maximum change in the objective after any coefficient update is less than threshold.
eps: Multiplication factor applied to the convergence threshold.
maxit: Maximum number of passes over the data for all lambda values.
efficiency: Switches the algorithm between "covariance" (efficiency) mode and "naive" mode depending on the number of variables. Efficiency mode saves and reuses inner products and is the default when the number of variables is below 500.
seed: Negative integer specifying seed for the random number generator.
do.trace: Number of seconds between updates to the user on approximate time to completion.
Details
Use coordinate descent to fit lasso to a regression model.
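The update at the heart of the algorithm is the coordinate-wise soft-threshold step. Below is a minimal pure-R sketch of that step, assuming standardized predictors; soft and cd.cycle are hypothetical names used only for illustration, since the package performs these updates in compiled code.
## soft-threshold operator
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)
## one full cycle of coordinate descent for the lasso objective
## (1/(2n)) * sum((y - X b)^2) + lambda * sum(|b|)
cd.cycle <- function(X, y, b, lambda) {
  n <- nrow(X)
  for (j in seq_len(ncol(X))) {
    r.j <- y - X[, -j, drop = FALSE] %*% b[-j]  ## partial residual
    b[j] <- soft(crossprod(X[, j], r.j) / n, lambda)
  }
  b
}
## toy usage
set.seed(1)
X <- scale(matrix(rnorm(100 * 5), 100))
y <- X[, 1] - 2 * X[, 3] + rnorm(100)
b <- rep(0, 5)
for (k in 1:50) b <- cd.cycle(X, y, b, lambda = 0.1)
print(round(b, 3))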
Value
A list containing the fitted lasso solution path. The list contains:
convgCount: Convergence counter returned by the coordinate-descent routine.
lambdaCount: Number of lambda values in the fitted solution path.
lambda: The sequence of lambda values used.
beta: Matrix of regression coefficients for the lasso solution path. Rows correspond to values in lambda; columns contain the intercept followed by the encoded predictor variables.
xvar: Numeric predictor matrix used in the fit, after any internal encoding of the supplied data.
yvar: Response vector used in the fit.
yHat: Cross-validated fitted values or predictions by lambda and observation. Returned only when cross-validation output is available, such as when nfolds is greater than 1.
lambda.min.indx: Index of the lambda value with minimum cross-validation error. Returned only when cross-validation output is available.
lambda.1se.min.indx: Index of the minimum lambda value within one standard error of the minimum cross-validation error. Returned only when cross-validation output is available.
lambda.1se.max.indx: Index of the maximum lambda value within one standard error of the minimum cross-validation error. Returned only when cross-validation output is available.
lambda.cvm: Mean cross-validation error for each lambda. Returned only when cross-validation output is available.
lambda.cvsd: Cross-validation standard-error values for each lambda. Returned only when cross-validation output is available.
ctime.internal: Timing information reported by the native coordinate-descent routine.
ctime.external: Elapsed R-side timing, computed from proc.time().
Author(s)
Hemant Ishwaran and Udaya B. Kogalur
References
Friedman J., Hastie T. and Tibshirani R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22.
Examples
## ------------------------------------------------------------
## regression example: boston housing
## ------------------------------------------------------------
if (requireNamespace("mlbench", quietly = TRUE)) {
## load the data
data(BostonHousing, package = "mlbench")
## 10-fold validation
o <- cdlasso(medv ~., BostonHousing, nfolds=10)
## lasso solution
bhat <- data.frame(bhat.min=o$beta[o$lambda.min.indx,],
bhat.1se=o$beta[o$lambda.1se.max.indx[1],])
print(bhat)
## compare to results from glmnet
if (library("glmnet", logical.return = TRUE)) {
oo <- cv.glmnet(data.matrix(o$xvar), o$yvar, nfolds=10)
bhat2 <- cbind(data.matrix(coef(oo, s=oo$lambda.min)),
data.matrix(coef(oo, s=oo$lambda.1se)))
rownames(bhat2) <- rownames(bhat)
print(bhat2)
}
}
Utility Functions for Random Forest Super Greedy Trees
Description
These are internal utility functions exported for advanced usage. This Rd file is used solely to register aliases.
Value
Return values depend on the helper function used:
filter.rfsgt and filter.custom.rfsgt: Character vector of variables or generated base-learner terms retained by the filtering step.
get.beta: A list containing beta, betaZ, lasso.percent, predicted, and partial. These components summarize forest-averaged local coefficients, splitting scores, the percentage of lasso terminal-node fits, reconstructed predictions, and partial term contributions.
make.baselearner: A data frame or matrix containing the original predictors together with generated polynomial or interaction base-learner terms.
tune.hcut: An object of class "tune.hcut". This is a character vector of selected terms with attributes including all, formula, formula.all, term.map, term.map.all, cv, bsf, hcut, and hcutSeq.
use.tune.hcut: A "tune.hcut" object restricted to the requested hcut, with tuning attributes preserved.
vimp.rfsgt: Variable-importance scores, typically returned as a named numeric vector.
Prediction on Test Data for Super Greedy Forests
Description
Obtain predicted values on test data using a trained super greedy forest.
Usage
## S3 method for class 'rfsgt'
predict(object, newdata, get.tree = NULL,
block.size = 10, seed = NULL, do.trace = FALSE,...)
Arguments
object: rfsgt object obtained from a previous training call to rfsgt.
newdata: Test data. If not provided, the training data is used and the original training forest is restored.
get.tree: Vector of integer(s) identifying the trees over which the ensemble is calculated. By default, uses all trees in the forest.
block.size: Determines how the cumulative error rate is calculated. To obtain the cumulative error rate on every nth tree, set the value to an integer between 1 and ntree.
seed: Negative integer specifying seed for the random number generator.
do.trace: Number of seconds between updates to the user on approximate time to completion.
...: Additional options.
Details
Returns the predicted values for a super greedy forest.
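A short sketch of the two calling modes, restore versus test; a small fit is used purely for illustration:
o <- rfsgt(mpg ~ ., mtcars, ntree = 5, treesize = 1)
## restore mode: no newdata, the training forest is restored
p.restore <- predict(o)
## test mode: predictions on new observations
p.test <- predict(o, mtcars[1:5, ])
## restrict the ensemble to a subset of trees
p.sub <- predict(o, mtcars[1:5, ], get.tree = 1:2)
print(p.test$predicted)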
Value
An object of class c("rfsgt", "predict", family) containing
predictions and prediction-time summaries. When newdata is
omitted, the object corresponds to the restored training forest; when
newdata is supplied, it corresponds to predictions on the new
data. The returned list contains:
call: The matched prediction call.
family: Model family inherited from the fitted rfsgt object.
n: Number of observations predicted.
samptype: Sampling type inherited from the fitted forest.
sampsize: Tree sample size inherited from the fitted forest.
ntree: Number of trees in the fitted forest.
hcut: hcut value used by the fitted forest.
splitrule: Split rule used by the fitted forest.
yvar: Response values used for prediction-error calculation. This is the training response in restore mode, the response from newdata when present, and NULL when newdata has no response column.
yvar.names: Response variable name from the fitted forest.
xvar: Predictor data used for prediction, after applying the same encoding and filtering as in training.
xvar.augment: Augmented base-learner data used for prediction, or NULL when no augmented terms are used.
xvar.names: Names of the retained predictor variables.
xvar.augment.names: Names of augmented base-learner terms, or NULL when no augmented terms are used.
xvar.info: Predictor-encoding metadata used to align new data with the training design.
term.map: Term map describing generated base-learner terms.
leaf.count: Number of terminal nodes in each tree.
forest: Stored forest object used to make the predictions.
membership: Matrix of terminal-node membership by observation and tree when membership output is requested; otherwise NULL.
inbag: Matrix of bootstrap membership counts in restore mode when membership output is requested; otherwise NULL.
block.size: Block size used for cumulative prediction error calculations.
perf.type: Internal performance-measure type used for prediction-error calculation.
predicted: Ensemble predicted values.
predicted.oob: Out-of-bag predictions in restore mode when available; otherwise NULL.
ambrOffset: Terminal-node offset matrix used internally to recover local terminal-node quantities for new observations; NULL in restore mode.
err.rate: Prediction error when response values are available; otherwise NULL.
ctime.internal: Timing information reported by the native prediction routine.
ctime.external: Elapsed R-side timing, computed from proc.time().
Author(s)
Hemant Ishwaran and Udaya B. Kogalur
References
Ishwaran H. (2025). Super greedy trees. To appear in Artificial Intelligence Review.
Examples
## ------------------------------------------------------------
##
## mtcars: for CRAN testing
##
## ------------------------------------------------------------
o <- rfsgt(mpg~., mtcars[1:20,], ntree=3, treesize=1)
p <- predict(o, mtcars[-(1:20),])
print(o)
print(p)
## ------------------------------------------------------------
##
## train/test using friedman 3
##
## ------------------------------------------------------------
if (requireNamespace("mlbench", quietly = TRUE)) {
## train/test using Friedman 3
d.trn <- data.frame(mlbench::mlbench.friedman3(100))
o <- rfsgt(y ~ ., d.trn, hcut = 1, ntree = 3, treesize = 1)
print(o)
d.tst <- data.frame(mlbench::mlbench.friedman3(200))
y.tst <- d.tst$y
x.tst <- d.tst[, colnames(d.tst) != "y"]
yhat <- predict(o, x.tst)$predicted
mean((yhat - y.tst)^2)
## train sgf on friedman 3
d.trn <- data.frame(mlbench::mlbench.friedman3(500))
o <- rfsgt(y~.,d.trn, hcut=1)
print(o)
## test sgf
d.tst <- data.frame(mlbench::mlbench.friedman3(1000))
y.tst <- d.tst$y
x.tst <- d.tst[, colnames(d.tst)!= "y"]
yhat <- predict(o, x.tst)$predicted
cat("test set mse:", mean((yhat - y.tst)^2), "\n")
## ------------------------------------------------------------
##
## restore a trained super greedy forest using boston
##
## ------------------------------------------------------------
## run sgf on boston
data(BostonHousing, package = "mlbench")
o <- rfsgt(medv~., BostonHousing)
print(o)
## restore the forest
print(predict(o))
## ------------------------------------------------------------
##
## coherence check using boston housing with factors
##
## ------------------------------------------------------------
## boston housing data: make factors
data(BostonHousing, package = "mlbench")
Boston <- BostonHousing[1:40,]
Boston$zn <- factor(Boston$zn)
Boston$chas <- factor(Boston$chas)
Boston$lstat <- factor(round(0.2 * Boston$lstat))
Boston$nox <- factor(round(20 * Boston$nox))
Boston$rm <- factor(round(Boston$rm))
## grow a single tree - save inbag information
o <- rfsgt(medv~., Boston, hcut=2, filter=FALSE, ntree=1, membership=TRUE, nodesize=3)
## coherence matrix
pred <- data.frame(
inbag=o$inbag,
pred.inb=o$predicted,
pred.oob=o$predicted.oob,
pred.inb.restore=predict(o)$predicted,
pred.oob.restore=predict(o)$predicted.oob,
pred.test=predict(o,Boston)$predicted)
print(pred)
## coherence check
cat("coherence for inbag data:", sum(pred$pred.inb-pred$pred.test,na.rm=TRUE)==0, "\n")
cat(" coherence for oob data:", sum(pred$pred.oob-pred$pred.test,na.rm=TRUE)==0, "\n")
## canonical example of train/test with prediction
trn <- sample(1:nrow(Boston), nrow(Boston)/2, replace=FALSE)
o.trn <- rfsgt(medv~., Boston[trn,], hcut=2)
predict(o.trn,Boston[-trn,])
## ------------------------------------------------------------
## prediction using tuning hcut and pre-filtering with tune.hcut
## ------------------------------------------------------------
## fit the forest to the tuned hcut
dta <- data.frame(mlbench::mlbench.friedman3(500))
f <- tune.hcut(y~., dta, hcut=5, verbose=TRUE)
o <- rfsgt(y~., dta, filter=f)
print(o)
## test the tuned forest on new data
print(predict(o, data.frame(mlbench::mlbench.friedman3(25000))))
## override the optimized hcut
o2 <- rfsgt(y~., dta, filter=use.tune.hcut(f, hcut=2))
print(o2)
print(predict(o2, data.frame(mlbench::mlbench.friedman3(25000))))
}
Print Output from a Random Forest Super Greedy Tree Analysis
Description
Print summary output from a Random Forest SGT analysis. This is the default print method for the package.
Usage
## S3 method for class 'rfsgt'
print(x, ...)
## S3 method for class 'vimp.rfsgt'
print(x, ...)
Arguments
x: An object of class rfsgt (from a grow or predict call) or vimp.rfsgt.
...: Further arguments passed to or from other methods.
Value
Called for its side effect of printing a summary of an rfsgt grow, predict, or variable-importance object. The return value is NULL.
Author(s)
Hemant Ishwaran and Udaya B. Kogalur
Random Forest Super Greedy Trees
Description
Grow a forest of Super Greedy Trees (SGTs) using lasso. In addition to prediction, the fitted forest supports local beta-value and partial-contribution summaries that show how predictions are assembled.
Usage
rfsgt(formula,
data,
ntree = if (hcut == 0) 500 else 100,
hcut = 1,
treesize = NULL,
nodesize = NULL,
filter = (hcut > 1),
bsf = if (hcut > 0) "oob" else "inbag",
keep.only = NULL,
fast = TRUE,
pure.lasso = FALSE,
eps = .005,
maxit = 500,
nfolds = 10,
block.size = 10,
bootstrap = c("by.root", "none", "by.user"),
samptype = c("swor", "swr"), samp = NULL, membership = TRUE,
sampsize = if (samptype == "swor") function(x){x * .632} else function(x){x},
seed = NULL,
do.trace = FALSE,
...)
Arguments
formula: Formula describing the model to be fit.
data: Data frame containing response and features.
ntree: Number of trees to grow.
hcut: Integer value indexing the type of parametric regression model used for splitting. See Details below.
treesize: Function specifying an upper bound for the size of a tree (number of terminal nodes), where the first input is the sample size. Set internally if not specified.
nodesize: Minimum size of a terminal node. Set internally if not specified.
filter: Logical value specifying whether dimension reduction (filtering) of features should be performed. Can also be specified using the helper function tune.hcut.
bsf: Best split first (BSF) empirical risk minimization strategy. Accepted values are "inbag" and "oob".
keep.only: Character vector specifying the features of interest. The data is pre-filtered to keep only these requested variables. Ignored if filter is specified using tune.hcut.
fast: Use fast filtering?
pure.lasso: Logical value specifying whether lasso splitting should be strictly adhered to. In general, lasso splits are replaced with CART whenever numerical instability occurs (for example, small node sample sizes may make it impossible to obtain the cross-validated lasso parameter). This option will generally produce shallow trees, which may not be appropriate in all settings.
eps: Parameter passed to cdlasso.
maxit: Parameter passed to cdlasso.
nfolds: Number of cross-validation folds to be used for the lasso.
block.size: Determines how the cumulative error rate is calculated. To obtain the cumulative error rate on every nth tree, set the value to an integer between 1 and ntree.
bootstrap: Bootstrap protocol. Default is "by.root".
samptype: Type of bootstrap used when bootstrap = "by.root".
samp: Bootstrap specification when bootstrap = "by.user".
membership: Should terminal node membership and inbag information be returned?
sampsize: Function specifying the bootstrap sample size when bootstrap = "by.root".
seed: Negative integer specifying seed for the random number generator.
do.trace: Number of seconds between updates to the user on approximate time to completion.
...: Additional options.
Details
Super Greedy Trees (SGTs) are tree-based models that extend ordinary CART-style splitting in a fundamental way. In a standard tree, a split typically tests one variable at a time, so tilted, curved, or interaction-driven decision boundaries must be approximated by many small axis-aligned cuts. SGTs instead learn a sparse score inside a node and then split the node using that score. This allows a split to depend on several variables at once, so the fitted tree can represent hyperplane, elliptical, hyperboloid, and other higher-order geometric boundaries much more directly.
Operationally, the procedure is organized in stages. First, a family of
candidate score functions is chosen through hcut. Second, within
each node, a lasso model is fit by coordinate descent. Third, the fitted
node-wise score is used to order observations, and an allowable threshold
on that score is searched for the best split. The resulting daughter
nodes are then re-fit and compared using empirical risk. In this way,
the difficult multivariate split problem is converted into a manageable
one-dimensional threshold search along a learned score.
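As a toy illustration of this reduction, the sketch below fits a stand-in score with lm (in place of the node-wise lasso), orders observations by the score, and searches the one-dimensional threshold minimizing the within-daughter sum of squares. Here node.split is a hypothetical helper, not part of the package:
node.split <- function(x, y) {
  score <- fitted(lm(y ~ x))             ## stand-in for the lasso score
  ord <- order(score)
  ys <- y[ord]
  n <- length(ys)
  sse <- sapply(1:(n - 1), function(k) { ## empirical risk of each threshold
    sum((ys[1:k] - mean(ys[1:k]))^2) +
      sum((ys[-(1:k)] - mean(ys[-(1:k)]))^2)
  })
  list(threshold = sort(score)[which.min(sse)], risk = min(sse))
}
set.seed(1)
x <- matrix(runif(200 * 3), 200)
y <- as.numeric(x[, 1] + x[, 2] > 1) + rnorm(200, sd = 0.1)
print(node.split(x, y))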
Tree growth uses a best split first (BSF) strategy. Rather than expanding nodes in strict depth-first or breadth-first order, BSF scans the current terminal nodes, measures the reduction in empirical risk for each candidate split, and grows the node giving the largest gain. This makes the search aggressive but focused: computation is directed to the part of the tree that is most promising at the current stage of growth. In a forest, repeating this over bootstrap samples yields an ensemble of trees with the flexibility of multivariate cuts and the stabilizing effect of aggregation.
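The following runnable toy (hypothetical code, not the package internals) illustrates the BSF expansion order on a one-dimensional problem with simple mean-split nodes: at each step every terminal node proposes its best cut, and only the node giving the largest risk reduction is actually split.
best.mean.split <- function(idx, x, y) {
  if (length(idx) < 4) return(list(gain = -Inf))
  cuts <- sort(unique(x[idx]))
  cuts <- cuts[-length(cuts)]                 ## keep both daughters nonempty
  risk0 <- sum((y[idx] - mean(y[idx]))^2)
  risks <- sapply(cuts, function(ct) {
    l <- idx[x[idx] <= ct]
    r <- idx[x[idx] > ct]
    sum((y[l] - mean(y[l]))^2) + sum((y[r] - mean(y[r]))^2)
  })
  k <- which.min(risks)
  list(gain = risk0 - risks[k], cut = cuts[k])
}
set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)
nodes <- list(seq_along(x))                   ## start from the root
for (step in 1:5) {                           ## five BSF expansions
  cand <- lapply(nodes, best.mean.split, x = x, y = y)
  b <- which.max(sapply(cand, `[[`, "gain"))
  idx <- nodes[[b]]
  cpt <- cand[[b]]$cut
  nodes <- c(nodes[-b], list(idx[x[idx] <= cpt], idx[x[idx] > cpt]))
  cat("step", step, ": split node of size", length(idx), "at", round(cpt, 3), "\n")
}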
The main user control for split geometry is hcut. Smaller
values give simpler cut families; larger values allow richer
polynomial and interaction structure. Thus rfsgt is able to
span random-forest-like axis-aligned splits all the way to
genuinely multivariate geometric partitioning. When the signal is
approximately linear or only mildly nonlinear, smaller hcut
values may be sufficient. When the signal involves curvature or
interactions, larger hcut values can reduce bias and often
achieve the same structural fit with fewer splits.
An equally important feature of SGTs is that the fitted forest is not only a prediction device. Because each split score and each terminal-node predictor is a sparse lasso model, the forest also carries local coefficient information. For a given observation, each tree contributes the coefficient vector from the terminal node containing that observation, and averaging across trees yields forest-level beta summaries that are usually more stable than the coefficients from any single tree. These beta values are local and data-adaptive: they can change from one region of the feature space to another, so they should be viewed as coefficient functions rather than a single global regression vector.
This gives SGTs a genuinely hybrid character. The partition of the
feature space is nonparametric, as in a tree or forest, but within each
local region the fitted response is represented by a sparse parametric
expansion. Built-in helpers such as get.beta expose this
structure by returning both beta summaries and corresponding partial
term contributions. For a main effect, the contribution is the local
beta multiplied by the covariate value; for an interaction, it is the
local interaction coefficient multiplied by the associated product term.
In many applications, these summaries can be as informative as the
prediction itself because they show which variables, nonlinear terms,
and interactions are driving the fitted value near the observation of
interest.
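A tiny numeric illustration of a partial contribution, with assumed local coefficients beta1 (main effect) and beta12 (interaction):
beta1 <- 0.8; beta12 <- -0.3                  ## assumed local coefficients
x1 <- 2.0; x2 <- 0.5                          ## covariate values at the point
c(main = beta1 * x1, interaction = beta12 * x1 * x2)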
Parametric models used for splitting are indexed by hcut, corresponding to the following geometric regions:
- hcut=1 (hyperplane): linear model using all variables.
- hcut=2 (ellipse): plus all quadratic terms.
- hcut=3 (oblique ellipse): plus all pairwise interactions.
- hcut=4: plus all polynomials of degree 3 of two variables.
- hcut=5: plus all polynomials of degree 4 of three variables.
- hcut=6: plus all three-way interactions.
- hcut=7: plus all four-way interactions.
Setting hcut=0 gives CART splits where cuts are parallel to the coordinate axes (axis-aligned cuts). Thus, hcut=0 is similar to random forests and can be viewed as the baseline case of the SGT framework.
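A rough base-R sketch of how the candidate dictionary grows with hcut for two variables; the exact term construction is internal to the package, so this is an assumption-laden illustration only:
x1 <- runif(5); x2 <- runif(5)
d1 <- cbind(x1, x2)              ## hcut=1: hyperplane (linear) terms
d2 <- cbind(d1, x1^2, x2^2)      ## hcut=2: add quadratics (ellipse)
d3 <- cbind(d2, x1 * x2)         ## hcut=3: add pairwise interactions (oblique ellipse)
c(hcut1 = ncol(d1), hcut2 = ncol(d2), hcut3 = ncol(d3))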
A major part of the implementation is devoted to regularization and
stabilization, because richer cut families can otherwise become unstable
in small nodes or in the presence of collinearity. The first safeguard
is the lasso itself. At each node, the split-defining score is fit with
an L1 penalty, and the penalty is chosen by cross-validation.
This induces sparsity, controls local complexity, and lets the procedure
adapt to the amount of information available in the node. Near the root,
where sample sizes are larger, the fitted score can support richer
geometry. Deeper in the tree, lasso sparsity and smaller node sizes tend
to simplify the split automatically.
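This adaptivity can be seen directly with cdlasso by comparing the cross-validated fit on a large and a small subsample, standing in for a shallow and a deep node (a sketch, using only the components documented above):
set.seed(1)
x <- matrix(rnorm(200 * 20), 200)
y <- x[, 1] - x[, 2] + 0.5 * x[, 3] + rnorm(200)
d <- data.frame(y = y, x)
count.nonzero <- function(dd) {
  o <- cdlasso(y ~ ., dd, nfolds = 10)
  sum(o$beta[o$lambda.min.indx[1], -1] != 0)  ## drop the intercept column
}
## larger "node" versus smaller "node"
c(n200 = count.nonzero(d), n40 = count.nonzero(d[1:40, ]))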
A second safeguard is feature filtering. When the predictor dimension is
moderate or large, the candidate dictionary implied by hcut can be
very large. The implementation can therefore pre-filter variables using
shallow pilot fits and retain only variables that appear with nonzero
lasso coefficients. This reduces runtime and variance before the final
forest is grown. The helper tune.hcut is the intended front-end
for this step: it can pre-filter variables and also choose a suitable
hcut value before the final call to rfsgt. In practice,
this is often the safest workflow when interactions or higher-order terms
are being considered.
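In compact form, the intended workflow is (see also the examples below):
if (requireNamespace("mlbench", quietly = TRUE)) {
  dta <- data.frame(mlbench::mlbench.friedman1(200))
  f <- tune.hcut(y ~ ., dta, hcut = 3)   ## pre-filter and choose hcut
  o <- rfsgt(y ~ ., dta, filter = f)     ## final forest uses both
  print(attr(f, "hcut"))
  print(o)
}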
A third safeguard is automatic simplification of a branch when the lasso modeling is no longer paying off. The algorithm then replaces the lasso-induced split with the best CART coordinate-threshold split at that node and, in place of local lasso node estimators, uses simple CART-style sample-average predictors along that branch. In other words, in the presence of numerical instability, the procedure can simplify both the split geometry and the local node model, which prevents unstable or unnecessary parametric structure from being pushed deeper into the tree.
The argument pure.lasso controls whether this simplification is
allowed. With the default behavior, a branch may switch from the
richer lasso-based representation to a simpler CART-style
representation when the latter is more stable or gives better
empirical risk reduction. Setting pure.lasso=TRUE shuts this
switch off. This is useful when a user wants a fully lasso-defined
tree for methodological reasons, but in difficult data settings it can
also lead to shallower trees because branches that would otherwise be
stabilized by CART are instead left unsplit.
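A side-by-side sketch on a small sample, where the CART fallback is more likely to engage under the default behavior:
if (requireNamespace("mlbench", quietly = TRUE)) {
  dta <- data.frame(mlbench::mlbench.friedman1(75))
  print(rfsgt(y ~ ., dta, hcut = 2, ntree = 25))                    ## fallback allowed
  print(rfsgt(y ~ ., dta, hcut = 2, ntree = 25, pure.lasso = TRUE)) ## strictly lasso
}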
The argument bsf, although related, addresses a different
question. It does not turn CART fallback on or off. Instead, it
decides which data drive empirical risk minimization during BSF
search. With bsf="inbag", the split comparison is based on
in-bag data. With bsf="oob", the same comparison uses
out-of-bag (OOB) data as a held-out check. This indirectly has
implications when deciding on CART fallback, however, because it
allows the algorithm to use out-of-sample data to decide whether the
current SGT candidate really improves generalization relative to the
best CART candidate at the same node. If the CART candidate wins
under the chosen risk criterion, the branch is simplified exactly as
described above: CART split geometry is used and CART-style node
estimators replace the local lasso fits down that branch. This guard
is especially helpful under potentially numerically unstable models
when hcut > 0. As with any model-selection procedure that
reuses held-out data, however, users should remember that the reported
OOB error can then be mildly optimistic.
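A sketch comparing the two BSF risk criteria; any difference is data dependent:
if (requireNamespace("mlbench", quietly = TRUE)) {
  dta <- data.frame(mlbench::mlbench.friedman1(200))
  o.inbag <- rfsgt(y ~ ., dta, hcut = 1, bsf = "inbag")
  o.oob <- rfsgt(y ~ ., dta, hcut = 1, bsf = "oob")
  c(inbag = tail(o.inbag$err.rate, 1), oob = tail(o.oob$err.rate, 1))
}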
From a practical point of view, the main tuning parameters have clear
roles. Use hcut to control cut richness, treesize and
nodesize to control tree complexity, filter or
tune.hcut when the feature space is large, and
pure.lasso only when you explicitly want to override CART
fallback. Users new to the method can usually start with the
defaults. Users working with interaction-heavy or high-dimensional
data will typically benefit from filtering, tune.hcut, and
OOB-based BSF. Users interested mainly in prediction can simply call
predict.rfsgt. Users interested in the local parametric
structure can query the same fitted forest with get.beta to
recover beta summaries and partial contributions without fitting a
separate surrogate model.
Value
An object of class c("rfsgt", "grow", family) containing the
trained super greedy forest and associated training-data summaries. The
object is a list with the following components:
family: Model family.
n: Number of observations in the training data.
bootstrap: Bootstrap protocol used to grow the forest.
samptype: Sampling type used for root-node bootstrap samples.
sampsize: Tree sample size used by the bootstrap protocol.
ntree: Number of trees grown.
hcut: Final hcut value used to construct the splitting model.
splitrule: Split rule used by the forest.
splitinfo: Internal split-rule metadata, including the processed hcut, split-rule index, number of random split points, and lasso cross-validation setting.
yvar: Training response values.
yvar.names: Response variable name.
yvar.factor: Factor-level metadata for the response.
yvar.types: Internal response type code.
xvar: Training predictor data after hot encoding, optional filtering, and any user-requested variable restriction.
xvar.augment: Augmented base-learner data used for higher-order hcut terms, or NULL when no augmented terms are used.
xvar.names: Names of the retained predictor variables.
xvar.types: Internal predictor type codes.
xvar.info: Predictor-encoding metadata used internally; this is NULL for standard grow objects.
xvar.augment.names: Names of augmented base-learner terms, or NULL when no augmented terms are used.
term.map: Term map describing how generated base-learner terms correspond to original variables and powers.
forest: Stored forest object used by predict.rfsgt. This component contains the native node array, per-tree leaf counts, terminal-node offsets, bootstrap membership identifiers, fitting options, and package-version metadata.
nodeStat: Node statistics returned when empirical-risk output is available; otherwise NULL.
empr.risk: Inbag empirical-risk values by candidate tree size and tree, or NULL when unavailable.
oob.empr.risk: Out-of-bag empirical-risk values by candidate tree size and tree, or NULL when unavailable.
empr.risk.cart: Inbag empirical-risk values for CART fallback splits, or NULL when unavailable.
oob.empr.risk.cart: Out-of-bag empirical-risk values for CART fallback splits, or NULL when unavailable.
bsf: Best-split-first empirical-risk strategy used.
bsf.order: Best-split-first node expansion order, or NULL when unavailable.
predicted: Training-data ensemble predictions.
predicted.oob: Out-of-bag training-data predictions when available; otherwise NULL.
membership: Matrix of terminal-node membership by observation and tree when membership = TRUE; otherwise NULL.
inbag: Matrix of bootstrap counts by observation and tree when membership = TRUE; otherwise NULL.
ensemble: Internal ensemble type used for prediction summaries.
err.rate: Cumulative prediction error, typically the out-of-bag mean squared error for regression when available; otherwise NULL.
ctime.internal: Timing information reported by the native grow routine.
ctime.external: Elapsed R-side timing, computed from proc.time().
Author(s)
Hemant Ishwaran and Udaya B. Kogalur
References
Ishwaran H. (2025). Super greedy trees. To appear in Artificial Intelligence Review.
Examples
## ------------------------------------------------------------
##
## mtcars: for CRAN testing
##
## ------------------------------------------------------------
print(rfsgt(mpg~., mtcars, ntree=3, treesize=1))
## ------------------------------------------------------------
##
## boston housing
##
## ------------------------------------------------------------
if (requireNamespace("mlbench", quietly = TRUE)) {
## load the data
data(BostonHousing, package = "mlbench")
## default basic call
print(rfsgt(medv~., BostonHousing))
## variable selection
sort(vimp.rfsgt(medv~.,BostonHousing))
## examples of hcut=0 (similar to random forests ... but using BSF)
print(rfsgt(medv~., BostonHousing, hcut=0))
print(rfsgt(medv~., BostonHousing, hcut=0, nodesize=1))
## hcut=1 with smaller nodesize
print(rfsgt(medv~., BostonHousing, nodesize=1))
## ------------------------------------------------------------
##
## boston housing with factors
##
## ------------------------------------------------------------
## load the data
data(BostonHousing, package = "mlbench")
## make some features into factors
Boston <- BostonHousing
Boston$zn <- factor(Boston$zn)
Boston$chas <- factor(Boston$chas)
Boston$lstat <- factor(round(0.2 * Boston$lstat))
Boston$nox <- factor(round(20 * Boston$nox))
Boston$rm <- factor(round(Boston$rm))
## random forest: hcut=0
print(rfsgt(medv~., Boston, hcut=0, nodesize=1))
## hcut=3
print(rfsgt(medv~., Boston, hcut=3))
## ------------------------------------------------------------
##
## ozone
##
## ------------------------------------------------------------
## load the data
data(Ozone, package = "mlbench")
print(rfsgt(V4~., na.omit(Ozone), hcut=0, nodesize=1))
print(rfsgt(V4~., na.omit(Ozone), hcut=1))
print(rfsgt(V4~., na.omit(Ozone), hcut=2))
print(rfsgt(V4~., na.omit(Ozone), hcut=3))
}
## ------------------------------------------------------------
##
## non-linear boundary illustrates hcut using single tree
##
## ------------------------------------------------------------
## simulate non-linear boundary
n <- 500
p <- 5
signal <- 10
treesize <- 10
ngrid <- 200
## train
x <- matrix(runif(n * p), ncol = p)
fx <- signal * sin(pi * x[, 1] * x[, 2])
nl2d <- data.frame(y = fx, x)
## truth
x1 <- x2 <- seq(0, 1, length = ngrid)
truth <- signal * sin(outer(pi * x1, x2, "*"))
## test
x.tst <- do.call(rbind, lapply(x1, function(x1j) {
cbind(x1j, x2, matrix(runif(length(x2) * (p-2)), ncol=(p-2)))
}))
colnames(x.tst) <- colnames(x)
fx.tst <- signal * sin(pi * x.tst[, 1] * x.tst[, 2])
nl2d.tst <- data.frame(y = fx.tst, x.tst)
## SGT for different hcut values
## Enforce pure lasso
rO <- lapply(0:4, function(hcut) {
cat("hcut", hcut, "\n")
rfsgt(y~., nl2d, ntree=1, hcut=hcut, treesize=treesize, bootstrap="none",
pure.lasso = TRUE, nodesize=1, filter=FALSE)
})
## nice little wrapper for plotting results
if (library("interp", logical.return = TRUE)) {
## nice little wrapper for plotting results
plot.image <- function(x, y, z, linear=TRUE, nlevels=40, points=FALSE) {
xo <- x; yo <- y
so <- interp(x=x, y=y, z=z, linear=linear, nx=nlevels, ny=nlevels)
x <- so$x; y <- so$y; z <- so$z
xlim <- ylim <- range(c(x, y), na.rm = TRUE, finite = TRUE)
z[is.infinite(z)] <- NA
zlim <- q <- quantile(z, c(.01, .99), na.rm = TRUE)
z[z<=q[1]] <- q[1]
z[z>=q[2]] <- q[2]
levels <- pretty(zlim, nlevels)
col <- hcl.colors(length(levels)-1, "YlOrRd", rev = TRUE)
plot.new()
plot.window(xlim, ylim, "", xaxs = "i", yaxs = "i", asp = NA)
.filled.contour(x, y, z, levels, col)
axis(1);axis(2)
if (points)
points(xo,yo ,pch=16, cex=.25)
box()
invisible()
}
oldpar <- par(mfrow=c(3,2))
image(x1, x2, truth, xlab="", ylab="")
contour(x1, x2, truth, nlevels = 15, add = TRUE, drawlabels = FALSE)
mtext(expression(x[1]),1,line=2)
mtext(expression(x[2]),2,line=2)
mtext(expression("truth"),3,line=1)
pO <- lapply(0:4, function(j) {
plot.image(nl2d.tst[,"X1"],nl2d.tst[,"X2"], predict(rO[[j+1]], nl2d.tst)$predicted)
contour(x1, x2, truth, nlevels = 15, add = TRUE, drawlabels = FALSE)
mtext(expression(x[1]),1,line=2)
mtext(expression(x[2]),2,line=2)
mtext(paste0("hcut=",j),3,line=1)
NULL
})
par(oldpar)
}
## ------------------------------------------------------------
##
## friedman illustration of OOB empirical risk
##
## ------------------------------------------------------------
if (requireNamespace("mlbench", quietly = TRUE)) {
## simulate friedman
n <- 500
dta <- data.frame(mlbench::mlbench.friedman1(n, sd=0))
## rf versus rfsgt
o1 <- rfsgt(y~., dta, hcut=0, block.size=1)
o2 <- rfsgt(y~., dta, hcut=3, block.size=1)
## compute running average of OOB empirical risk
runavg <- function(x, lag = 8) {
x <- c(na.omit(x))
lag <- min(lag, length(x))
cx <- c(0,cumsum(x))
rx <- cx[2:lag] / (1:(lag-1))
c(rx, (cx[(lag+1):length(cx)] - cx[1:(length(cx) - lag)]) / lag)
}
risk1 <- lapply(data.frame(o1$oob.empr.risk), runavg)
leaf1 <- o1$forest$leafCount
risk2 <- lapply(data.frame(o2$oob.empr.risk), runavg)
leaf2 <- o2$forest$leafCount
## compare OOB empirical tree risk to OOB forest error
oldpar <- par(mfrow=c(2,2))
plot(c(1,max(leaf1)), range(c(risk1)), type="n",
xlab="Tree size", ylab="RF OOB empirical risk")
l1 <- do.call(rbind, lapply(risk1, function(rsk){
lines(rsk,col=grey(0.8))
cbind(1:length(rsk), rsk)
}))
lines(tapply(l1[,2], l1[,1], mean), lwd=3)
plot(c(1,max(leaf2)), range(c(risk2)), type="n",
xlab="Tree size", ylab="SGF OOB empirical risk")
l2 <- do.call(rbind, lapply(risk2, function(rsk){
lines(rsk,col=grey(0.8))
cbind(1:length(rsk), rsk)
}))
lines(tapply(l2[,2], l2[,1], mean), lwd=3)
plot(1:o1$ntree, o1$err.rate, type="s", xlab="Trees", ylab="RF OOB error")
plot(1:o2$ntree, o2$err.rate, type="s", xlab="Trees", ylab="SGF OOB error")
par(oldpar)
}
## ------------------------------------------------------------
##
## synthetic regression examples with different hcut
##
## ------------------------------------------------------------
if (requireNamespace("mlbench", quietly = TRUE)) {
## simulation functions
sim <- list(
friedman1=function(n){data.frame(mlbench::mlbench.friedman1(n))},
friedman2=function(n){data.frame(mlbench::mlbench.friedman2(n))},
friedman3=function(n){data.frame(mlbench::mlbench.friedman3(n))},
peak=function(n){data.frame(mlbench::mlbench.peak(n, 10))},
linear=function(n, sd=.1){
x=matrix(runif(n*10), n)
y=3*x[,1]^3-2*x[,2]^2+3*x[,3]+rnorm(n,sd=sd)
data.frame(y=y,x)
})
## run rfsgt on the simulations
n <- 500
max.hcut <- 3
results <- setNames(lapply(names(sim), function(nm) {
cat("simulation:", nm, "\n")
d <- sim[[nm]](n=n)
rO <- data.frame(do.call(rbind, lapply(0:max.hcut, function(hcut) {
cat(" hcut:", hcut, "\n")
o <- rfsgt(y~.,d,hcut=hcut)
c(hcut, tail(o$err.rate, 1), tail(o$err.rate, 1) / var(o$yvar))
})))
colnames(rO) <- c("hcut", "mse", "smse")
rO
}), names(sim))
## print results
print(results)
## ------------------------------------------------------------
##
## synthetic regression example showing how to tune hcut
##
## ------------------------------------------------------------
hcut.opt <- setNames(sapply(names(sim), function(nm) {
cat("optimize hcut for simulation:", nm, "\n")
f <- tune.hcut(y~., sim[[nm]](n=n), hcut=4)
attr(f, "hcut")
}), names(sim))
## print the optimal hcut
print(hcut.opt)
}
## ------------------------------------------------------------
##
## iowa housing data
##
## ------------------------------------------------------------
data(housing, package = "randomForestSRC")
## remove PID
housing$PID <- NULL
## rough missing data imputation
d <- randomForestSRC::impute(data = data.frame(data.matrix(housing)))
d$SalePrice <- log(d$SalePrice)
d <- data.frame(data.matrix(d))
print(rfsgt(SalePrice~.,d))
## ------------------------------------------------------------
##
## high-dimensional model with variable selection
##
## ------------------------------------------------------------
## simulate big p small n data
n <- 50
p <- 500
d <- data.frame(y = rnorm(n), x = matrix(rnorm(n * p), n))
## we have a big p small n pure noise setting: let's see how well we do
cat("variables selected by vimp.rfsgt:\n")
vmp <- sort(vimp.rfsgt(y~.,d))
print(vmp[vmp > .05])
## internal filtering function can also be used
cat("variables selected by filter.rfsgt:\n")
print(filter.rfsgt(y~.,d, method="conserve"))
## ------------------------------------------------------------
##
## pre-filtering using keep.only
##
## ------------------------------------------------------------
if (requireNamespace("mlbench", quietly = TRUE)) {
## simulate the data
n <- 100
p <- 50
noise <- matrix(runif(n * p), ncol=p)
dta <- data.frame(mlbench::mlbench.friedman1(n, sd=0), noise=noise)
## filter the variables
f <- filter.rfsgt(y~., dta)
## use keep.only to pre-filter the features
print(rfsgt(y~.,dta, keep.only=f, hcut=1))
print(rfsgt(y~.,dta, keep.only=f, hcut=2))
print(rfsgt(y~.,dta, keep.only=f, hcut=3))
## ------------------------------------------------------------
##
## tuning hcut and pre-filtering using tune.hcut
##
## ------------------------------------------------------------
## simulate the data
n <- 100
p <- 50
noise <- matrix(runif(n * p), ncol=p)
dta <- data.frame(mlbench::mlbench.friedman1(n, sd=0), noise=noise)
## tune hcut
f <- tune.hcut(y~., dta, hcut=3)
## use the optimized hcut
print(rfsgt(y~.,dta, filter=f))
## override the tuned hcut value
print(rfsgt(y~.,dta, filter=use.tune.hcut(f, hcut=1)))
print(rfsgt(y~.,dta, filter=use.tune.hcut(f, hcut=2)))
print(rfsgt(y~.,dta, filter=use.tune.hcut(f, hcut=3)))
## ------------------------------------------------------------
##
## get local beta values and partial contributions
##
## SGTs are not only predictive; the same fit can be queried
## for local parametric summaries. We use friedman 1 for
## illustration.
##
## ------------------------------------------------------------
n <- 100
dta <- data.frame(mlbench::mlbench.friedman1(n))
o <- rfsgt(y~., dta, hcut=3, pure.lasso=TRUE, treesize=10)
bO <- get.beta(o)
print(str(bO$beta))
print(str(bO$partial))
}
Show the NEWS file
Description
Show the NEWS file of the randomForestSGT package.
Usage
rfsgt.news(...)
Arguments
... |
Further arguments passed to or from other methods. |
Value
None.
Author(s)
Hemant Ishwaran and Udaya B. Kogalur