Getting started with ibger

Overview

ibger provides a tidyverse-friendly interface to the Aggregate Data API (version 3) of the Brazilian Institute of Geography and Statistics (IBGE). This is the same API that powers SIDRA, IBGE's automatic data retrieval system for its surveys and censuses.

Each SIDRA table corresponds to an aggregate in the API. With ibger you can browse aggregates, inspect their metadata, and retrieve tidy data — all from R.

Installation

# install.packages("remotes")
remotes::install_github("StrategicProjects/ibger")
library(ibger)

A typical workflow

Step 1 — Find an aggregate

Use ibge_aggregates() to list every aggregate grouped by survey. Optional filters let you narrow the search:

# All aggregates
ibge_aggregates()
#> ✔ 1420 aggregates found.
#> # A tibble: 1,420 × 4
#>   survey_id survey_name          aggregate_id aggregate_name
#>   <chr>     <chr>                <chr>        <chr>
#> 1 AB        Abate de animais     1705         Animais abatidos …
#> 2 AB        Abate de animais     1706         Peso total das ca…
#> ...

# Monthly aggregates only
ibge_aggregates(periodicity = "P5")

# Aggregates with municipality-level data
ibge_aggregates(level = "N6")
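
Because the result is an ordinary tibble, searching it by name needs nothing package-specific. The sketch below runs on a hand-built data frame that mirrors the columns printed above, so it works offline. `find_aggregates()` is a hypothetical helper written for illustration, not part of ibger.

```r
# Toy stand-in for the tibble returned by ibge_aggregates(); the rows are
# copied from the printed output above, with names cut where it truncates.
aggs <- data.frame(
  survey_id      = c("AB", "AB"),
  survey_name    = c("Abate de animais", "Abate de animais"),
  aggregate_id   = c("1705", "1706"),
  aggregate_name = c("Animais abatidos", "Peso total das ca"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose aggregate name matches a keyword
# (treated as a regular expression), ignoring case.
find_aggregates <- function(aggs, keyword) {
  hits <- grepl(keyword, aggs$aggregate_name, ignore.case = TRUE)
  aggs[hits, , drop = FALSE]
}

find_aggregates(aggs, "peso")$aggregate_id  # "1706"
```

With the real catalog, the same call would presumably be `find_aggregates(ibge_aggregates(), "peso")`.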

Step 2 — Inspect the metadata

Once you have an aggregate ID, ibge_metadata() describes its structure (variables, classifications, territorial levels, and periodicity):

meta <- ibge_metadata(1705)
meta

The print method shows a structured summary:

── Animais abatidos ──
ID: 1705
Survey: Pesquisa Trimestral do Abate de Animais
Periodicity: trimestral (200101 to 202404)
Territorial levels: N1, N2, N3

── Variables (2) ──
  284: Número de informantes (Unidades)
  285: Cabeças abatidas (Cabeças)

── Classifications (1) ──
  12529: Tipo de rebanho bovino (9 categories)
    115236: Total [level 0]
    115237: Bois [level 1]
    115238: Vacas [level 1]
    ...

Each component is accessible directly:

meta$variables
#> # A tibble: 2 × 3
#>   id    name                  unit
#>   <chr> <chr>                 <chr>
#> 1 284   Número de informantes Unidades
#> 2 285   Cabeças abatidas      Cabeças

meta$classifications
#> # A tibble: 1 × 3
#>   id    name                        categories
#>   <chr> <chr>                       <list>
#> 1 12529 Tipo de rebanho bovino      <tibble [9 × 4]>

# Unnest to see every category
tidyr::unnest(meta$classifications, categories)

# Geographic levels
meta$territorial_level
#> $administrative
#> [1] "N1" "N2" "N3"

# Time range
meta$periodicity
#> $frequency
#> [1] "trimestral"
#>
#> $start
#> [1] "200101"
#>
#> $end
#> [1] "202404"

Step 3 — Retrieve data

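ibge_variables() is the main workhorse. It sends a single request and returns a tidy tibble with one row per variable, category, locality, and period. With the default of the last 6 periods, the call below returns 12 rows (2 variables × 6 periods):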

ibge_variables(1705, localities = "BR")
#> ✔ 12 records retrieved.
#> # A tibble: 12 × 9
#>   variable_id variable_name      variable_unit classification_12529
#>   <chr>       <chr>              <chr>         <chr>
#> 1 284         Número de inform…  Unidades      Total
#> 2 285         Cabeças abatidas   Cabeças       Total
#> ...
#>   locality_id locality_name locality_level period value
#>   <chr>       <chr>         <chr>          <chr>  <chr>
#> 1 1           Brasil        Brasil         202303 2584
#> 2 1           Brasil        Brasil         202303 7802044
#> ...

Specifying localities

The localities parameter accepts several convenient formats:

# Country total
ibge_variables(1705, localities = "BR")

# All states
ibge_variables(8884, localities = "N3")

# Specific states (RJ = 33, SP = 35)
ibge_variables(8884, localities = list(N3 = c(33, 35)))

# Mix levels: metropolitan areas + a specific municipality
ibge_variables(7060, localities = list(N7 = c(3501, 3301), N6 = 5208707))

The geographic level codes follow the IBGE convention:

Code  Level              Example
N1    Brazil             "BR" or list(N1 = 1)
N2    Major region       list(N2 = 1) for North
N3    State (UF)         list(N3 = 33) for Rio de Janeiro
N6    Municipality       list(N6 = 3550308) for São Paulo/SP
N7    Metropolitan area  list(N7 = 3501) for RM São Paulo

Tip: Not every aggregate is available at every level. Aggregate 1705 has data for N1, N2, and N3 but not N6. Use ibge_metadata() to check.
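
The check suggested in the tip can be scripted. The sketch below uses a toy list shaped like the `meta$territorial_level` output shown in Step 2, so it runs without network access. `has_levels()` is a hypothetical helper, and the assumption is that every element of `territorial_level` is a character vector of level codes.

```r
# Toy metadata object mirroring the territorial_level output from Step 2;
# with ibger you would start from `meta <- ibge_metadata(1705)` instead.
meta <- list(
  territorial_level = list(administrative = c("N1", "N2", "N3"))
)

# Hypothetical helper: TRUE when every requested level appears in the
# metadata, looking across all groups of `territorial_level`.
has_levels <- function(meta, levels) {
  available <- unlist(meta$territorial_level, use.names = FALSE)
  all(levels %in% available)
}

has_levels(meta, "N3")           # TRUE
has_levels(meta, c("N3", "N6"))  # FALSE
```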

Specifying periods

Periods follow the API convention — negative values mean “last N”:

# Last 6 periods (the default)
ibge_variables(1705, periods = -6, localities = "BR")

# Last 12 periods
ibge_variables(1705, periods = -12, localities = "BR")

# Specific period codes
ibge_variables(8884, periods = c(202301, 202302, 202303), localities = "BR")

# Range (inclusive)
ibge_variables(8884, periods = "202101-202304", localities = "BR")

# Range + extra period
ibge_variables(8884, periods = "202101-202106|202301", localities = "BR")

Note: Negative values cannot be mixed with specific periods. A period code does not carry its own periodicity: 202001 could mean January 2020 (monthly), Q1 2020 (quarterly), or S1 2020 (semi-annual), depending on the aggregate.
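
To make the ambiguity concrete, the sketch below reads the same six-digit code under three periodicities. `describe_period()` and its periodicity labels are hypothetical, chosen for illustration; ibger does not necessarily expose anything like it.

```r
# Hypothetical helper: describe a six-digit period code (YYYYNN) under a
# given periodicity. The periodicity labels are illustrative choices.
describe_period <- function(code,
                            periodicity = c("monthly", "quarterly", "semiannual")) {
  periodicity <- match.arg(periodicity)
  code <- sprintf("%06d", as.integer(code))
  stopifnot(grepl("^[0-9]{6}$", code))
  year <- substr(code, 1, 4)
  n    <- as.integer(substr(code, 5, 6))
  switch(periodicity,
    monthly    = paste(month.name[n], year),
    quarterly  = paste0("Q", n, " ", year),
    semiannual = paste0("S", n, " ", year)
  )
}

describe_period(202001, "monthly")     # "January 2020"
describe_period(202001, "quarterly")   # "Q1 2020"
describe_period(202001, "semiannual")  # "S1 2020"
```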

Filtering with classifications

Many aggregates break their data further by classifications (dimensions). For instance, aggregate 1712 (crop production) has a classification for the type of product (226) and another for the producer condition (218).

# Single category: pineapple (4844) from product classification (226)
ibge_variables(
  aggregate      = 1712,
  localities     = "BR",
  classification = list("226" = 4844)
)

# Multiple categories
ibge_variables(
  aggregate      = 1712,
  localities     = "BR",
  classification = list("226" = c(4844, 96608, 96609))
)

# Multiple classifications
ibge_variables(
  aggregate      = 1712,
  localities     = "BR",
  classification = list("226" = c(4844, 96608), "218" = 4780)
)

# All categories of a classification (can be large!)
ibge_variables(
  aggregate      = 1712,
  periods        = -1,
  localities     = "BR",
  classification = list("226" = "all")
)

When no classification is specified, the API returns only the Total category (ID = 0), which combines all categories.
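
In the underlying IBGE API (v3), classification filters travel in a `classificacao` query parameter written as `id[category,category]`, with `|` separating classifications. The sketch below shows how a named list like the ones above could be turned into that string. `classification_query()` is a hypothetical helper; it illustrates the mapping under that assumption and is not ibger's actual internals.

```r
# Hypothetical helper: turn a named list of classification filters into the
# "id[cat,cat]|id[cat]" form used by the API's `classificacao` parameter.
classification_query <- function(classification) {
  parts <- vapply(names(classification), function(id) {
    cats <- classification[[id]]
    # Avoid scientific notation for large numeric IDs; keep strings like "all".
    if (is.numeric(cats)) cats <- format(cats, scientific = FALSE, trim = TRUE)
    paste0(id, "[", paste(cats, collapse = ","), "]")
  }, character(1))
  paste(parts, collapse = "|")
}

classification_query(list("226" = c(4844, 96608), "218" = 4780))
# "226[4844,96608]|218[4780]"

classification_query(list("226" = "all"))
# "226[all]"
```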

Automatic validation

Before sending any request, ibge_variables() and ibge_localities() validate your parameters against the aggregate’s metadata. If something doesn’t match, you get a clear error with the allowed values:

# N6 (municipalities) is not available for aggregate 1705
ibge_variables(1705, localities = "N6")
#> Error:
#> ! Geographic level(s) "N6" not available for aggregate 1705.
#> ℹ Available levels: "N1", "N2", and "N3".

# Period out of range
ibge_variables(1705, periods = 199901, localities = "BR")
#> Error:
#> ! Period(s) "199901" out of range for aggregate 1705.
#> ℹ Valid range: "200101" to "202404" (quarterly).

# Non-existent variable
ibge_variables(1705, variable = 999, localities = "BR")
#> Error:
#> 284 - Número de informantes (Unidades)
#> 285 - Cabeças abatidas (Cabeças)

Metadata is fetched once per session and cached. To force a refresh:

ibge_clear_cache()

Skip validation entirely with validate = FALSE:

ibge_variables(1705, localities = "BR", validate = FALSE)
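
The "fetch once per session, then reuse" behaviour described above is a standard memoisation pattern. The sketch below illustrates it with an environment as the session cache. All names here (`fetch_metadata()`, `cached_metadata()`, `clear_cache()`) are hypothetical stand-ins, and this is not ibger's actual implementation.

```r
# Session-level cache: an environment that lives as long as the R session.
.cache <- new.env(parent = emptyenv())
fetch_count <- 0L

# Stand-in for a slow network call; counts how often it actually runs.
fetch_metadata <- function(id) {
  fetch_count <<- fetch_count + 1L
  list(id = id)
}

# Return the cached entry when present, otherwise fetch and store it.
cached_metadata <- function(id) {
  key <- as.character(id)
  if (!exists(key, envir = .cache, inherits = FALSE)) {
    assign(key, fetch_metadata(id), envir = .cache)
  }
  get(key, envir = .cache, inherits = FALSE)
}

# Empty the cache so the next call fetches again.
clear_cache <- function() {
  rm(list = ls(.cache, all.names = TRUE), envir = .cache)
}

first  <- cached_metadata(1705)
second <- cached_metadata(1705)
fetch_count  # 1: the second call was served from the cache
```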

Browsing the survey catalog

Beyond aggregate-level data, ibger also provides access to the IBGE Metadata API (v2), which catalogs IBGE’s surveys with institutional and methodological information such as status, category, collection frequency, and thematic classifications.

This is useful when you want to understand what surveys exist and how they are structured before diving into specific aggregates.

# List all 98 IBGE surveys
ibge_surveys()
#> # A tibble: 98 × 8
#>   id    name                                 status category    ...
#>   <chr> <chr>                                <chr>  <chr>
#> 1 AC    Pesquisa Anual da Indústria da Cons… Ativa  Estrutural
#> 2 AA    Pesquisa Nacional de Saúde do Escol… Ativa  Especial
#> ...

# Filter active short-term ("Conjuntural") surveys
library(dplyr)
ibge_surveys(thematic_classifications = FALSE) |>
  filter(status == "Ativa", category == "Conjuntural")

# Check which periods have metadata for the Censo Demográfico
ibge_survey_periods("CD")
#> # A tibble: 9 × 3
#>    year month order
#>   <int> <int> <int>
#> 1  2022    NA     0
#> 2  2010    NA     0
#> ...

# Get full institutional metadata for a specific period
meta <- ibge_survey_metadata("CD", year = 2022)
meta
#> ── CD ──
#> Status: Ativa
#> Category: Estrutural
#> ...
#> ── Metadata occurrences (1) ──
#> Use `meta$occurrences` to explore the full metadata.

# Explore methodology fields
names(meta$occurrences[[1]])

Survey codes are validated before each request. If you use a wrong code, the error suggests similar alternatives:

ibge_survey_periods("PMS")
#> Error: Survey code "PMS" not found in the IBGE catalog.
#> ℹ Did you mean one of these?
#>   * SC - Pesquisa Mensal de Serviços
#>   * MC - Pesquisa Mensal de Comércio
#>   ...

API limits and special values

Each request can return at most 100,000 values, computed as:

categories × periods × localities ≤ 100,000

If exceeded, the API returns HTTP 500. Split your request into smaller chunks when working with many localities or categories.
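
One way to stay under the limit is to split the locality IDs so that each request satisfies the inequality above. The sketch below is a minimal illustration: `chunk_localities()` is a hypothetical helper, and the locality IDs are placeholders rather than real IBGE codes.

```r
# Hypothetical helper: split locality IDs into chunks small enough that
# categories x periods x localities stays within the limit per request.
chunk_localities <- function(ids, n_categories, n_periods, limit = 100000) {
  per_request <- floor(limit / (n_categories * n_periods))
  stopifnot(per_request >= 1)
  unname(split(ids, ceiling(seq_along(ids) / per_request)))
}

# Example: 5,570 placeholder locality IDs, 10 categories, 12 periods.
ids    <- seq_len(5570)
chunks <- chunk_localities(ids, n_categories = 10, n_periods = 12)

length(chunks)                   # 7 requests
max(lengths(chunks)) * 10 * 12   # 99960 values in the largest request
```

Each chunk could then be passed as, for example, `localities = list(N6 = chunk)` and the results combined with `dplyr::bind_rows()`, assuming the aggregate offers that level.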

The value column may contain special characters instead of numbers:

Value  Meaning
-      Numeric zero (not from rounding)
..     Not applicable
...    Data not available
X      Suppressed to avoid identifying individual respondents

These come through as character strings in the value column. Use parse_ibge_value() to convert to numeric in one step:

ibge_variables(7060, localities = "BR") |>
  dplyr::mutate(value = parse_ibge_value(value))
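
For readers who want to see what such a conversion involves, the sketch below does it by hand in base R. It is not the implementation of parse_ibge_value(): `to_numeric_sketch()` is a hypothetical helper, and mapping "-" to 0 and the other symbols to NA is an assumption based on the meanings in the table above.

```r
# Hypothetical helper: convert IBGE value strings to numeric.
# Assumed mapping: "-" becomes 0; "..", "..." and "X" become NA.
to_numeric_sketch <- function(x) {
  out <- rep(NA_real_, length(x))
  out[x %in% "-"] <- 0
  special <- x %in% c("-", "..", "...", "X")
  out[!special] <- as.numeric(x[!special])
  out
}

to_numeric_sketch(c("2584", "7802044", "-", "..", "...", "X"))
# 2584 7802044 0 NA NA NA
```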