Using the fabPrediction Package to obtain Nonparametric Prediction Regions Using Indirect Information

Elizabeth Bersson

December 2023

The fabPrediction package provides functions to implement FAB conformal prediction algorithms for continuous data as in Bersson and Hoff, 2022 and categorical counts data as in Bersson and Hoff, 2023. The package also includes capabilities to obtain standard direct and Bayesian prediction regions for both types of data.

Installation

devtools::install_github("betsybersson/fabPrediction")
library(fabPrediction)

or

install_packages("fabPrediction")
library(fabPrediction)

Demo - Continuous Data

We will demonstrate FAB prediction with log radon values from counties in Minnesota. See Price et al., 1996.

data(radon)
data(W)

Single Group Analysis

If we have prior information regarding the population mean of the response, we can obtain FAB prediction regions for one county.

y_county9 = radon$radon[radon$group==9]
A FAB prediction region, although nonparametric, constructs prediciton intervals based on a standard normal-normal model, \[\begin{aligned} y \sim{}& N(\theta,\sigma^2)\\ \theta \sim{}& N(\mu,\tau^2\sigma^2), \end{aligned}\]

and the algorithm requires input values of \(\{\mu,\tau^2\}\).

Say we are fairly confident log radon values in County 9 should be centered at 0.5. Then, we can obtain a prediction region with the fab_prediction function with a nonparametric guarantee of \(1-\alpha\) conservative coverage.

fab.region = predictionInterval(y_county9,method = "FAB",
                          alpha = .15,
                          mu = 0.5,tau2 = 1)
fab.region$bounds

So, with at least \(85\%\) probability, the next observed radon value in County 9 will be in the range contained in the object fab.region. We can easily plot the output:

plot(fab.region,
     main="FAB Prediction Interval For County 9",xlab="log(radon)")

We can easily compare this the distance-to-average conformal prediction region.

plot(predictionInterval(y_county9,method = "DTA",alpha = .15),
     main="DTA Prediction Interval For County 9",xlab="log(radon)")

The functions include capabilities to obtain and plot standard frequentist and Bayesian prediction intervals based on a normal model. Unlike the FAB and DTA methods, the coverage of these methods rely on, among other things, accuracy of distributional assumptions.

predictionInterval(y_county9,method = "direct",alpha = .15)$bounds
predictionInterval(y_county9,method = "Bayes",alpha = .15,
                   mu=0.5, tau2=1)$bounds

Multiple Group Analysis

The package also features capability to handle multiple groups. Spatial relationships among the groups and covariates can be used to estimate conformal parameters.

params = fayHerriotEB(9,radon$radon,radon$group,W,X=rep(1,nrow(W)))
plot(predictionInterval(y_county9,method = "FAB",alpha = .15,
                        mu = params$mu, tau2 = params$tau2),
     main="FAB Prediction Interval For County 9 Using Indirect Information", xlab="log(radon)")

Demo - Categorical Data

We will demonstrate constructing prediction sets for categorical data on a simulated dataset consisting of 50 categories and 5 groups with varying within-group sample sizes.

N.groups = c(10,50,75,100,150)

set.seed(1)
prob = rdirichlet(50:1)
y = t(sapply(N.groups,function(j)rmultinom(1,j,prob)))

Single Group Analysis

We construct prediction sets for categorical data based on a Multinomial-Dirichlet working model: \[\begin{aligned} y\sim{}& MN_k(\theta,N)\\ \theta\sim{}& Dirichlet_K(\gamma). \end{aligned}\]

If we have prior information regarding the prior concentration parameter for a given group, we can construct a FAB prediction set for that group.

y_group3 = y[3,]
fab.set = predictionSet(y_group3,method = "FAB",
                        gamma = c(50:1))

And, we can plot the categories included in the prediction set:

plot(fab.set, main = "FAB Prediction Set for Group 3",
     cex.axis=.5)

Similarly, we can construct and plot a direct prediction set for the same data:

plot(predictionSet(y_group3,method = "direct"), 
     main = "Direct Prediction Set for Group 3",
     cex.axis=.5)

We can also construct a Bayesian prediction set based on this model, using the same prior as before in the FAB approach. Note, though, that this prediction set does not necessarily maintain the nominal coverage rate.

plot(predictionSet(y_group3,method = "Bayes",gamma = c(50:1)), 
     main = "Bayes Prediction Set for Group 3",
     cex.axis=.5)

Multiple Group Analysis

The package also features capability to handle multiple groups. To construct a prediction set for group 3 in this way, the prior concentration \(\gamma\) can be estimated with data from all groups except 3.

gamma0 = polyaMLE(y[-3,], method="separate")

This prior can be used as the prior information in constructing a FAB prediction set.

plot(predictionSet(y_group3,method = "FAB",gamma = gamma0), 
     main = "FAB Prediction Set for Group 3 using Indirect Information",
     cex.axis=.5)