sd2R is an R package that provides a native, GPU-accelerated Stable Diffusion pipeline by wrapping the C++ implementation from stable-diffusion.cpp and using ggmlR as the tensor backend.

sd2R exposes a high-level R interface for text-to-image and image-to-image generation, while all heavy computation (tokenization, encoders, denoiser, sampler, VAE, model loading) is implemented in C++. It supports the SD 1.x, SD 2.x, SDXL, and Flux model families, and targets local inference on Linux with Vulkan-enabled AMD GPUs (with automatic CPU fallback via ggml), without relying on external Python or web APIs.
Flux without Python:

```
R → sd2R → ggmlR → ggml → Vulkan → GPU
```
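A minimal end-to-end call might look like this (a sketch only: the model filename is a placeholder, and the exact generation arguments such as `prompt`, `width`, `height`, and `sample_steps` are assumed from the options and examples described below):

```r
library(sd2R)

# Build a context from local weights (.safetensors or .gguf);
# vram_gb overrides VRAM auto-detection (see the feature list below).
ctx <- sd_ctx("model.safetensors", vram_gb = 16)

# Optional profiling around the run
sd_profile_start()

# Single entry point: sd2R picks the direct, tiled, or highres-fix
# strategy internally from the resolution and available VRAM.
img <- sd_generate(ctx,
                   prompt = "a cat in space",
                   width = 768, height = 768,
                   sample_steps = 10)

sd_profile_stop()
sd_profile_summary()  # per-stage timings: load, encode, sample, VAE
```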
## Features

- **Full C++ pipeline** (`src/sd/`): tokenizers, text encoders (CLIP, Mistral, Qwen, UMT5), the diffusion UNet/MMDiT denoiser, samplers, the VAE encoder/decoder, and model loading for `.safetensors` and `.gguf` weights.
- **ggmlR backend**: links against ggmlR (via `LinkingTo`) and `libggml.a`, reusing the same GGML/Vulkan stack that also powers llamaR and other ggmlR-based packages.
- **`sd_generate()`**: a single entry point for all generation modes. It automatically selects the optimal strategy (direct, tiled sampling, or highres fix) based on output resolution and available VRAM (the `vram_gb` parameter in `sd_ctx()`). Users don't need to think about tiling at all.
- **Quiet by default**: `verbose = FALSE`, so there is no console output unless explicitly enabled.
- **Cross-platform build system**: `configure`/`configure.win` generate `Makevars` from templates.
- **VRAM override**: set `vram_gb` in `sd_ctx()` to override auto-detection.
- **Multi-process multi-GPU**: `sd_generate_multi_gpu()` distributes prompts across Vulkan GPUs via `callr`, one process per GPU, with progress reporting.
- **Single-process multi-GPU**: the `device_layout` parameter in `sd_ctx()` distributes sub-models across multiple Vulkan GPUs within a single process. Presets: `"mono"` (all on one GPU), `"split_encoders"` (CLIP/T5 on GPU 1, diffusion + VAE on GPU 0), `"split_vae"` (CLIP/T5 + VAE on GPU 1, diffusion on GPU 0), and `"encoders_cpu"` (text encoders on CPU). Manual override via `diffusion_gpu`, `clip_gpu`, and `vae_gpu`.
- **Profiling**: `sd_profile_start()` / `sd_profile_stop()` / `sd_profile_summary()` track the model loading, text encoding (with a CLIP/T5 breakdown), sampling, and VAE decode/encode stages.
- **VAE encoding**: `vae_decode_only = FALSE` in the context.
- **Adaptive VAE tiling**: `vae_mode = "auto"` (the default) queries free GPU memory before VAE decode and enables tiling only when estimated peak usage exceeds available VRAM (with a 50 MB safety reserve). It falls back to a pixel-area threshold (`vae_auto_threshold`) when the Vulkan memory query is unavailable (CPU backend, no GPU). Per-axis relative tile sizing (`vae_tile_rel_x`, `vae_tile_rel_y`) supports non-square aspect ratios.
- **System info**: `sd_system_info()` reports GGML/Vulkan capabilities as detected by ggmlR at build time.
- **Pipelines**: `sd_pipeline()` + `sd_node()` for composable, sequential multi-step workflows (txt2img → upscale → img2img → save). Pipelines are serializable to JSON via `sd_save_pipeline()` /
`sd_load_pipeline()`.

```r
pipe <- sd_pipeline(
  sd_node("txt2img", prompt = "a cat in space", width = 512, height = 512),
  sd_node("upscale", factor = 2),
  sd_node("img2img", strength = 0.3),
  sd_node("save", path = "output.png")
)

# Save / load as JSON
sd_save_pipeline(pipe, "my_pipeline.json")
pipe <- sd_load_pipeline("my_pipeline.json")

# Run
ctx <- sd_ctx("model.safetensors")
sd_run_pipeline(pipe, ctx, upscaler_ctx = upscaler)
```

## Internals

- `src/sd2R_interface.cpp`
defines a thin bridge between R and the C API in `stable-diffusion.h`, returning `XPtr` objects with custom finalizers for correct lifetime management of `sd_ctx_t` and `upscaler_ctx_t`.
- `configure` / `configure.win` generate `Makevars` from `.in` templates, resolving ggmlR paths, OpenMP, and Vulkan at configure time. A per-target `-include r_ggml_compat.h` is applied only to `sd/*.cpp` sources to avoid macro conflicts with system headers.
- `DESCRIPTION` declares Rcpp and ggmlR in `LinkingTo`; `NAMESPACE` is generated via roxygen2 with `useDynLib` and Rcpp imports.
- `.onLoad()` initializes logging and registers constant values that mirror the underlying C++ enums using 0-based indices.
- `verbose = FALSE` by default: no output unless requested.
- Known third-party compiler warnings are suppressed (`-Winconsistent-missing-override`, deprecated `codecvt`).

## Installation

```r
# Install ggmlR first (if not already installed)
remotes::install_github("Zabis13/ggmlR")

# Install sd2R
remotes::install_github("Zabis13/sd2R")
```

During installation, the configure script automatically downloads tokenizer vocabulary files (~128 MB total) from GitHub Releases. This requires `curl` or `wget`.
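Once installed, you can confirm what the build detected (the function name comes from the feature list above; the exact shape of its return value is not documented here):

```r
library(sd2R)

# Reports GGML/Vulkan capabilities as detected by ggmlR at build time
sd_system_info()
```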
If you don't have internet access during installation, download the vocabulary files manually and place them into `src/sd/` before building:

```sh
# Download from https://github.com/Zabis13/sd2R/releases/tag/assets
# Files: vocab.hpp, vocab_mistral.hpp, vocab_qwen.hpp, vocab_umt5.hpp
wget https://github.com/Zabis13/sd2R/releases/download/assets/vocab.hpp -P src/sd/
wget https://github.com/Zabis13/sd2R/releases/download/assets/vocab_mistral.hpp -P src/sd/
wget https://github.com/Zabis13/sd2R/releases/download/assets/vocab_qwen.hpp -P src/sd/
wget https://github.com/Zabis13/sd2R/releases/download/assets/vocab_umt5.hpp -P src/sd/

R CMD INSTALL .
```

Requirements:

- `curl` or `wget` (for downloading vocabulary files during installation)
- `libvulkan-dev` + `glslc` (Linux) or the Vulkan SDK (Windows)

## Benchmarks

Setup: CLIP-L + T5-XXL text encoders, VAE. `sample_steps = 10`.
| Test | AMD RX 9070 (16 GB) | Tesla P100 (16 GB) | 2x Tesla T4 (16 GB) |
|---|---|---|---|
| 1. 768x768 direct | 44.2 s | 94.0 s | 133.1 s |
| 2. 1024x1024 tiled VAE | 163.6 s | 151.4 s | 243.6 s |
| 3. 2048x1024 highres fix | 309.7 s | 312.5 s | 492.2 s |
| 4. img2img 768x768 direct | 29.6 s | 51.0 s | 73.5 s |
| 5. 1024x1024 direct | 163.0 s | 152.2 s | 243.3 s |
| 6. Multi-GPU 4 prompts | – | – | 284.9 s (4 img) |
Setup: CLIP-L + T5-XXL (Q5_K_M) text encoders, VAE. `sample_steps = 25`.
| Test | AMD RX 9070 (16 GB) | 2x Tesla T4 (16 GB) |
|---|---|---|
| 768x768 direct | 110.8 s | – |
| 1024x1024 direct | – | 553.1 s |
| | SD 1.5 | Flux Q4_K_S |
|---|---|---|
| Diffusion params | ~860 MB | ~6.5 GB |
| Text encoders | CLIP ~240 MB | CLIP-L + T5-XXL ~3.9 GB |
| Sampling per step (768x768) | ~0.1–0.3 s | ~3.9 s |
| Architecture | UNet | MMDiT (57 blocks) |
For a live, runnable demo, see the Kaggle notebook: *Stable Diffusion in R (ggmlR + Vulkan GPU)*.

## License

MIT