Sorted L-One Penalized Estimation (SLOPE) is the procedure of minimizing objectives of the type \[ \mathrm{minimize}\left\{F(\beta_0, \beta; X, y) + J(\beta; \alpha,\lambda) \right\}, \] where \[ J(\beta; \alpha,\lambda) = \alpha \sum_{j=1}^p \lambda_j \lvert \beta \rvert_{(j)}, \] with \(\alpha \in \mathbb{R}_{+}\), \(\lambda \in \mathbb{R}_+^p\), and \(\lvert \beta \rvert_{(j)}\) denoting the \(j\)th largest magnitude of \(\beta\). \(\lambda\) controls the shape of the penalty sequence, which must be non-increasing, and \(\alpha\) controls its scale.
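To make the penalty concrete, here is a small stand-alone illustration (not part of the package interface) that evaluates \(J(\beta; \alpha, \lambda)\) for a toy coefficient vector:

```r
# Toy illustration of the sorted L1 penalty: the magnitudes of beta are sorted
# in descending order and paired with the non-increasing lambda sequence.
beta <- c(0.5, -2, 0, 1)
lambda <- c(2, 1.5, 1, 0.5)  # non-increasing penalty shape
alpha <- 0.1                 # scale

alpha * sum(lambda * sort(abs(beta), decreasing = TRUE))
#> [1] 0.6
```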
\(X\) and \(y\) are the design matrix and the response, of dimensions \(n \times p\) and \(n \times m\), respectively. Except for multinomial logistic regression, \(m = 1\).
We assume that \(F\) takes the following form: \[ F(\beta_0, \beta) = \frac{1}{n} \sum_{i=1}^n f(\beta_0 + x_i^\intercal \beta, y_i), \] where \(f\) is a smooth and convex loss function from the family of generalized linear models (GLMs), \(x_i\) is the \(i\)th row of the design matrix \(X\), and \(y_i\) is the \(i\)th row of the response matrix \(Y\).
SLOPE currently supports four models from the family of generalized linear models (GLMs): Gaussian (least-squares), binomial (logistic), Poisson, and multinomial logistic regression.
Due to the way GLMs are formulated, each model is defined by three components: a loss function \(f(\eta, y)\), a link function \(g(\mu)\), and an inverse link function \(g^{-1}(\eta)\). Here, \(\eta\) is the linear predictor, and \(\mu\) is the mean of the response variable.
| Model | \(f(\eta, y)\) | \(g(\mu)\) | \(g^{-1}(\eta)\) |
|---|---|---|---|
| Gaussian | \(\frac{1}{2}(y - \eta)^2\) | \(\mu\) | \(\eta\) |
| Binomial | \(\log(1 + e^\eta) - \eta y\) | \(\log \left(\frac{\mu}{1 - \mu}\right)\) | \(\frac{e^\eta}{1 + e^\eta}\) |
| Poisson | \(e^\eta - \eta y\) | \(\log(\mu)\) | \(e^\eta\) |
| Multinomial | \(\log \left( 1 + \sum_{j=1}^{m-1} e^{\eta_j}\right) - \sum_{k=1}^{m-1} y_k \eta_k\) | \(\log\left(\frac{\mu_k}{1 - \sum_{j=1}^{m-1} \mu_j}\right)\) | \(\frac{e^{\eta_k}}{1 + \sum_{j=1}^{m-1} e^{\eta_j}}\) |
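As a small illustration of the last two columns, the binomial inverse link (the logistic function) maps a linear predictor to a mean in \((0, 1)\), and the link maps it back:

```r
# Binomial (logistic) link and inverse link, evaluated by hand
eta <- 1.5
mu <- exp(eta) / (1 + exp(eta))  # inverse link g^{-1}(eta)
mu
#> [1] 0.8175745
log(mu / (1 - mu))               # link g(mu) recovers eta
#> [1] 1.5
```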
Gaussian regression, also known as least-squares regression, is used for continuous response variables and takes the following form: \[ F(\beta_0, \beta; X, y) = \frac{1}{2n} \sum_{i=1}^n \left(y_i - x_i^\intercal \beta - \beta_0\right)^2. \]
You select it by setting family = "gaussian" in the
SLOPE() function. In the following example, we fit a
Gaussian SLOPE model to the bodyfat data set, which is
included in the package:
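```r
# Fit a Gaussian SLOPE model to the bodyfat data: predictors in bodyfat$x,
# response in bodyfat$y. By default, SLOPE() fits a full regularization path.
library(SLOPE)

fit_gaussian <- SLOPE(bodyfat$x, bodyfat$y, family = "gaussian")
```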
Often it’s instructive to look at the solution path of the fitted model, which you can do by simply plotting the fitted object:
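```r
# Plot the solution path of the fitted model
plot(fit_gaussian)
```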
We can also print a summary of the fitted model:
summary(fit_gaussian)
#>
#> Call:
#> SLOPE(x = bodyfat$x, y = bodyfat$y, family = "gaussian")
#>
#> Family: gaussian
#> Observations: 252
#> Predictors: 13
#> Intercept: Yes
#>
#> Regularization path:
#> Length: 82 steps
#> Alpha range: 0.00136 to 2.55
#> Deviance ratio range: 0 to 0.749
#> Null deviance: 69.8
#>
#> Path summary (first and last 5 steps):
#> alpha deviance_ratio n_nonzero
#> 2.55000 0.000 0
#> 2.32000 0.112 1
#> 2.12000 0.206 1
#> 1.93000 0.283 1
#> 1.76000 0.347 1
#> . . .
#> 0.00197 0.749 13
#> 0.00180 0.749 13
#> 0.00164 0.749 13
#> 0.00149 0.749 13
#> 0.00136 0.749 13

SLOPE also contains standard methods for returning coefficients and making predictions. We can use coef() with a specified level of regularization alpha to obtain the coefficients at that level (or omit alpha to get the coefficients for the full path).
coef(fit_gaussian, alpha = 0.05)
#> 14 x 1 sparse Matrix of class "dgCMatrix"
#>
#> [1,] -8.23495024
#> [2,] 0.06136933
#> [3,] -0.02255158
#> [4,] -0.14420771
#> [5,] -0.33756633
#> [6,] .
#> [7,] 0.78053158
#> [8,] -0.06724996
#> [9,] 0.04675465
#> [10,] .
#> [11,] .
#> [12,] 0.05947536
#> [13,] 0.31293830
#> [14,] -1.51314119

To make predictions on new data, we can use the
predict() function, specifying the fitted model, new design
matrix x, and the level of regularization
alpha:
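(The argument names x and alpha below are assumptions chosen to mirror coef(); see ?predict.SLOPE for the exact interface.)

```r
# Sketch: predictions from the Gaussian fit at alpha = 0.05, reusing the
# training predictors as the new design matrix. Argument names are assumed.
predict(fit_gaussian, x = bodyfat$x, alpha = 0.05)
```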
Logistic regression is used for binary classification problems and takes the following form: \[ F(\beta_0, \beta; X, y) = \frac{1}{n} \sum_{i=1}^n \left\{ \log\left(1 + \exp\left(x_i^\intercal \beta + \beta_0\right)\right) - y_i \left(x_i^\intercal \beta + \beta_0\right) \right\}, \] where \(x_i\) is the \(i\)th row of the design matrix \(X\).
You select it by setting family = "binomial" in the
SLOPE() function.
You can plot the solution path, print a summary, extract coefficients, and make predictions in the same way as for Gaussian regression:
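For example, assuming a data set with predictors x and a binary response y (the heart data that ships with the package is one such candidate), the workflow mirrors the Gaussian case:

```r
# Sketch: fit and inspect a binomial SLOPE model. The heart data set and its
# x/y components are assumptions; substitute any binary classification data.
fit_binomial <- SLOPE(heart$x, heart$y, family = "binomial")

plot(fit_binomial)               # solution path
summary(fit_binomial)            # model summary
coef(fit_binomial, alpha = 0.05) # coefficients at a given regularization level
```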
Poisson regression is used for count data and takes the following form: \[ F(\beta_0, \beta; X, y) = \frac{1}{n} \sum_{i=1}^n \left\{ \exp\left(x_i^\intercal \beta + \beta_0\right) - y_i \left(x_i^\intercal \beta + \beta_0\right) \right\}. \]
You select it by setting family = "poisson" in the
SLOPE() function. Note that the solving the Poisson
regression problem is a notoriously difficult optimization problem,
which is not Lipschitz-smooth. As such, convergence may be slow. SLOPE
features safeguards to ensure convergence by modifying step sizes, but
this may lead to long computation times in some cases.
In the next example, we fit a Poisson SLOPE model to the abalone data set, which contains physical measurements of abalones together with a count-valued response.
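Assuming the data set exposes x and y components like the other data sets in the package, the fit is:

```r
# Fit a Poisson SLOPE model to the abalone data (x/y components assumed)
fit_poisson <- SLOPE(abalone$x, abalone$y, family = "poisson")
```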
Sometimes it may be beneficial to pick a gradient descent solver instead of the default coordinate descent solver.
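A sketch of how that might look, assuming the solver argument accepts a FISTA-style proximal gradient option (check ?SLOPE for the solver names supported by your version):

```r
# Sketch: use a proximal gradient (FISTA) solver instead of the default.
# The value "fista" is an assumption; see ?SLOPE for the supported choices.
fit_poisson_pgd <- SLOPE(
  abalone$x,
  abalone$y,
  family = "poisson",
  solver = "fista"
)
```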
Multinomial logistic regression is used for multi-class classification problems and takes the following form: \[ F(\beta_0, \beta; X, Y) = \frac{1}{n} \sum_{i=1}^n \left\{ \log\left(1 + \sum_{k=1}^{m-1} \exp\left(x_i^\intercal \beta_k + \beta_{0k}\right)\right) - \sum_{k=1}^{m-1} y_{ik} \left(x_i^\intercal \beta_k + \beta_{0k}\right) \right\}, \] where \(x_i\) is the \(i\)th row of the design matrix \(X\) and \(Y\) is the \(n \times m\) response matrix, with \(m\) being the number of classes.
In our package, we use the non-redundant formulation, in which the last class is treated as the baseline: only \(m-1\) sets of coefficients are estimated. This differs from other packages, such as glmnet (Friedman, Hastie, and Tibshirani 2010), which estimate a full set of coefficients for every class.
To select multinomial logistic regression, set
family = "multinomial" in the SLOPE()
function. In the following example, we fit a multinomial logistic SLOPE
model to the wine data set, which consists of 178
observations of wines from three different cultivars. The task is to
classify the cultivar based on 13 different chemical properties.
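As before, assuming the data set exposes x and y components, the model is fit with:

```r
# Fit a multinomial SLOPE model to the wine data (x/y components assumed)
fit_multinomial <- SLOPE(wine$x, wine$y, family = "multinomial")
```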
Note that the coefficients are now a matrix. To retain sparsity, we have chosen to return the full set of coefficients along the path as a list of sparse matrices. Here we extract the coefficients for the second step in the solution path: a single sparse matrix with one column for each of the first two classes (the third class is the baseline):
fit_multinomial$coefficients[[2]]
#> 13 x 2 sparse Matrix of class "dgCMatrix"
#>
#> [1,] . -0.07330687
#> [2,] . .
#> [3,] . .
#> [4,] . .
#> [5,] . .
#> [6,] . .
#> [7,] 0.0259610440 .
#> [8,] . .
#> [9,] . .
#> [10,] . -0.02567089
#> [11,] . .
#> [12,] . .
#> [13,] 0.0003541809 .

By default, coef() simplifies the output and returns a list of coefficient vectors, one for each class (excluding the baseline class).
For similar reasons, plotting the paths is also a bit different: SLOPE plots one path per class (excluding the baseline class), which mirrors the output of coef(). Because we use the base R plotting system, you need to set up a multi-panel layout to see all the paths:
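```r
# Show the paths for both non-baseline classes of the wine model side by side
old_par <- par(mfrow = c(1, 2))  # 1 row, 2 columns
plot(fit_multinomial)
par(old_par)                     # restore the previous layout
```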