The ml package implements the split-fit-evaluate-assess
workflow from Hastie, Tibshirani, and Friedman (2009), Chapter 7. The
key idea: keep a held-out test set sacred until you are done
experimenting, then assess once.
Formula interfaces are not supported. Pass the data
frame and the target column name as a string:
`ml_fit(data, "target", seed = 42)`.
Before modeling, understand what you have:
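A quick look with base R is usually enough at this stage; nothing below depends on the ml package (the `iris` data set is used purely for illustration):

```r
data(iris)  # built-in example data set

str(iris)              # column types and dimensions
summary(iris)          # per-column ranges, quartiles, NA counts
table(iris$Species)    # class balance of the target
colSums(is.na(iris))   # missingness per column
```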
Three-way split (60/20/20), stratified by default for classification.
Access partitions with `$train`, `$valid`, and
`$test`. The `$dev` property combines train and
valid for final retraining.
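A sketch of the split step. The partition accessors are as documented above; `ml_split()` is an assumed name for the splitting verb, since the text does not give one:

```r
library(ml)

# ml_split() is an assumed function name; the accessors
# ($train, $valid, $test, $dev) are documented above.
parts <- ml_split(iris, "Species", seed = 42)  # stratified 60/20/20

nrow(parts$train)  # ~60% of rows
nrow(parts$valid)  # ~20%
nrow(parts$test)   # ~20%, untouched until final assessment
nrow(parts$dev)    # train + valid, for retraining before assessment
```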
Find candidates quickly before tuning:
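A screening sketch. Both `ml_split()` and `ml_screen()` are assumed names; only the algorithm strings come from the table below:

```r
library(ml)

# ml_split() and ml_screen() are hypothetical names; the idea is to
# fit several algorithms with default settings and rank them on the
# validation set.
parts <- ml_split(iris, "Species", seed = 42)
shortlist <- ml_screen(
  parts,
  algorithms = c("logistic", "xgboost", "random_forest"),
  seed = 42
)
shortlist  # ranked by validation metric, best first
```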
Iterate freely on the validation set:
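A tuning-loop sketch. `ml_split()` and `ml_evaluate()` are assumed names, and the extra `ml_fit()` arguments (`algorithm`, `max_depth`) are assumptions on top of the documented `ml_fit(data, "target", seed)` signature:

```r
library(ml)

# ml_split() and ml_evaluate() are assumed names; the algorithm and
# max_depth arguments to ml_fit() are also assumptions.
parts <- ml_split(iris, "Species", seed = 42)
for (depth in c(3, 6, 9)) {
  fit <- ml_fit(parts$train, "Species",
                algorithm = "xgboost", max_depth = depth, seed = 42)
  print(ml_evaluate(fit, parts$valid))  # iterate freely; $test stays sacred
}
```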
Gate your model before final assessment:
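One way the gate might look; `ml_gate()`, its `metric` and `threshold` arguments, and `ml_split()` are all assumed names for illustration:

```r
library(ml)

# ml_gate() is an assumed name for a pass/fail check (e.g. a minimum
# validation metric) before spending the single test-set assessment;
# ml_split() is likewise assumed.
parts <- ml_split(iris, "Species", seed = 42)
fit <- ml_fit(parts$train, "Species", seed = 42)
stopifnot(ml_gate(fit, parts$valid, metric = "accuracy", threshold = 0.90))
```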
The final exam. Call this only when done experimenting.
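A sketch of the final step, retraining on `$dev` before the one-shot test-set evaluation. `ml_split()` and `ml_assess()` are assumed names:

```r
library(ml)

# ml_split() and ml_assess() are assumed names. Retrain on $dev
# (train + valid), then touch $test exactly once.
parts <- ml_split(iris, "Species", seed = 42)
final <- ml_fit(parts$dev, "Species", seed = 42)
ml_assess(final, parts$test)  # report these numbers; no further iteration
```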
All functions are also available via the `ml$verb()`
pattern, which mirrors Python's `import ml; ml.fit(...)`:
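For instance, under the assumption that the verbs include `split` and `evaluate` (only `fit` is confirmed by the text):

```r
library(ml)

# Namespaced form of the same workflow; ml$split() and ml$evaluate()
# are assumed verbs, ml$fit() mirrors the documented ml_fit().
parts <- ml$split(iris, "Species", seed = 42)
fit   <- ml$fit(parts$train, "Species", seed = 42)
ml$evaluate(fit, parts$valid)
```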
The same workflow applies to regression:
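A regression sketch on `mtcars` with the numeric target `mpg`. `ml_split()` and `ml_evaluate()` are assumed names, and the `algorithm` argument is an assumption; the `"elastic_net"` string comes from the table below:

```r
library(ml)

# Regression sketch: ml_split(), ml_evaluate(), and the algorithm
# argument are assumptions; "elastic_net" is listed in the table below.
parts <- ml_split(mtcars, "mpg", seed = 42)
fit   <- ml_fit(parts$train, "mpg", algorithm = "elastic_net", seed = 42)
ml_evaluate(fit, parts$valid)  # e.g. RMSE, MAE, R-squared
```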
| Algorithm | Classification | Regression | Package |
|---|---|---|---|
| `"logistic"` | yes | – | base R (`nnet`) |
| `"xgboost"` | yes | yes | `xgboost` |
| `"random_forest"` | yes | yes | `ranger` |
| `"linear"` (Ridge) | – | yes | `glmnet` |
| `"elastic_net"` | – | yes | `glmnet` |
| `"svm"` | yes | yes | `e1071` |
| `"knn"` | yes | yes | `kknn` |
| `"naive_bayes"` | yes | – | `naivebayes` |
LightGBM support is planned for v1.1.