localLLM provides an easy-to-use interface to run
large language models (LLMs) directly in R. It uses the performant
llama.cpp library as the backend and allows you to generate
text and analyze data with LLMs. Everything runs locally on your own
machine, completely free, with reproducibility by default.
Getting started requires two simple steps: installing the R package and downloading the backend C++ library.
The install_localLLM() function automatically detects
your platform and downloads the appropriate pre-compiled library. GPU
acceleration is selected automatically when a compatible GPU driver is
detected:
| Platform | GPU backend | Detection method |
|---|---|---|
| macOS (Apple Silicon) | Metal | always enabled |
| macOS (Intel) | Metal | always enabled |
| Windows (x86-64) | Vulkan | vulkan-1.dll present in System32 |
| Linux (x86-64) | Vulkan | Vulkan loader + hardware ICD file present |
On Windows and Linux, if no GPU driver is found, the CPU build is installed automatically.
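In practice, the two setup steps look like this (this sketch assumes the package is installed from CRAN; `install_localLLM()` is the backend installer described above):

```r
# Step 1: install the R package (assumes CRAN availability)
install.packages("localLLM")
library(localLLM)

# Step 2: download the pre-compiled llama.cpp backend for your platform
install_localLLM()
```

The backend download only needs to happen once per machine; subsequent sessions just need `library(localLLM)`.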
The simplest way to get started is with quick_llama():

```r
quick_llama("What is the capital of France?")
#> The capital of France is Paris.
```
quick_llama() is a high-level wrapper designed for convenience. On its first run, it automatically downloads and caches the default model (Llama-3.2-3B-Instruct-Q5_K_M.gguf); later calls reuse the cached copy.
A common use case is classifying text. Here’s a sentiment analysis example:

```r
response <- quick_llama(
  'Classify the sentiment of the following tweet into one of two
categories: Positive or Negative.
Tweet: "This paper is amazing! I really like it."'
)
cat(response)
#> The sentiment of this tweet is Positive.
```
quick_llama() can also handle vectors of prompts:

```r
# Process multiple prompts at once
prompts <- c(
  "What is 2 + 2?",
  "Name one planet in our solar system.",
  "What color is the sky?"
)
responses <- quick_llama(prompts)
print(responses)
#> [1] "2 + 2 equals 4."
#> [2] "One planet in our solar system is Mars."
#> [3] "The sky is typically blue during the day."
```
The localLLM backend only supports models in the GGUF format. You can find thousands of GGUF models on Hugging Face; look for files with the .gguf extension. The model_path argument accepts a URL, a local file path, or a name fragment:

```r
# From a Hugging Face URL
response <- quick_llama(
  "Explain quantum physics simply",
  model_path = "https://huggingface.co/unsloth/gemma-3-4b-it-qat-GGUF/resolve/main/gemma-3-4b-it-qat-Q5_K_M.gguf"
)

# From a local file
response <- quick_llama(
  "Explain quantum physics simply",
  model_path = "/path/to/your/model.gguf"
)

# From the cache (matched by a name fragment)
response <- quick_llama(
  "Explain quantum physics simply",
  model_path = "Llama-3.2"
)
```

A name fragment is matched against the models already in the local download cache; in this example the cache contains:

```r
#>                                name size_bytes            modified
#> 1 Llama-3.2-3B-Instruct-Q5_K_M.gguf 2322153920 2025-12-05 20:01:18
#> 2     gemma-3-4b-it-qat-Q5_K_M.gguf 2829698176 2025-12-14 19:21:11
```
Control the output with various parameters:
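As a sketch, typical controls cover sampling randomness, response length, and reproducibility. The parameter names below (temperature, max_tokens, seed) are illustrative assumptions, not confirmed API; check ?quick_llama for the exact argument names in your installed version:

```r
# Parameter names are assumed -- consult ?quick_llama for the real ones
response <- quick_llama(
  "Explain quantum physics simply",
  temperature = 0,    # greedy, deterministic sampling
  max_tokens  = 200,  # cap the length of the reply
  seed        = 42    # fix the random seed for reproducible output
)
```

A temperature of 0 together with a fixed seed is the usual recipe for reproducible generations across runs.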