High-performance Unicode and Punycode encoding/decoding for internationalized domain names (IDNs) in R.
The punycoder package addresses critical gaps in R’s URL
processing capabilities by providing reliable, fast conversion between
Unicode and ASCII representations of domain names. It implements the
Punycode encoding specified in RFC 3492 and is designed for robust
handling of internationalized domain names in web scraping, data
analysis, and URL processing workflows.
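To make the ASCII form concrete: an encoded label is recognizable by its `xn--` prefix (for example, `xn--caf-dma.com` for `café.com`). The base-R sketch below flags domains that contain such a label. It does not call punycoder; `has_ace_label()` is a hypothetical helper written for illustration, and it inspects only the prefix without checking that the label actually decodes.

```r
# Hypothetical helper (not part of punycoder): TRUE for each domain that
# contains at least one label starting with the "xn--" prefix.
has_ace_label <- function(domains) {
  # Split each domain into its dot-separated labels, ignoring case.
  labels <- strsplit(tolower(domains), ".", fixed = TRUE)
  vapply(labels, function(x) any(startsWith(x, "xn--")), logical(1))
}

has_ace_label(c("xn--caf-dma.com", "example.com"))
```

A prefix check like this is only a quick filter; a label can start with `xn--` and still fail to decode, which is where full validation becomes relevant.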
punycoder has a small dependency footprint:
- R (>= 3.5.0), Rcpp
- libidn2 (detected at compile time)
- pkg-config (used by configure to detect libidn2)
- testthat, knitr, rmarkdown

You can install the development version of punycoder from GitHub with:
# install.packages("remotes")
remotes::install_github("bart-turczynski/punycoder")

punycoder works without extra system libraries. If
libidn2 is available at build time, the package enables a
native backend automatically; otherwise it uses the built-in C++
fallback backend.
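Whether the native backend can be enabled depends on libidn2 being discoverable when the package is built. As a rough sketch, the check below asks pkg-config from R whether libidn2 is visible in the current environment. `libidn2_visible()` is a hypothetical helper, not part of punycoder, and it is only assumed to approximate what the configure script looks for.

```r
# Hypothetical helper (not part of punycoder): TRUE if pkg-config is on the
# PATH and reports that libidn2 is installed, FALSE otherwise.
libidn2_visible <- function() {
  pkg_config <- unname(Sys.which("pkg-config"))
  if (!nzchar(pkg_config)) {
    return(FALSE)  # pkg-config itself was not found
  }
  # `pkg-config --exists <module>` exits with status 0 when the module is found.
  status <- suppressWarnings(
    system2(pkg_config, c("--exists", "libidn2"), stdout = FALSE, stderr = FALSE)
  )
  identical(as.integer(status), 0L)
}

libidn2_visible()
```

A `FALSE` result suggests that a source install from this session would most likely end up on the built-in fallback backend.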
To install the recommended optional dependency:
# macOS (Homebrew)
brew install libidn2 pkg-config
# Debian/Ubuntu
sudo apt-get install libidn2-0-dev pkg-config
# Fedora
sudo dnf install libidn2-devel pkgconf-pkg-config
# Arch Linux
sudo pacman -S libidn2 pkgconf

Verify the library is visible before installing punycoder from source:
system("pkg-config --modversion libidn2")

Then install or reinstall punycoder:
remotes::install_github("bart-turczynski/punycoder")

library(punycoder)
# Basic encoding
puny_encode("café.com")
#> [1] "xn--caf-dma.com"
# Check if domain is punycode
is_punycode("xn--example")
#> [1] TRUE
# Validate domains
validate_domain("test.com")
#> Punycoder Domain Validation Results
#> ==================================
#>
#> Domain: test.com
#> Valid: TRUE

punycoder uses libidn2 when available, with a built-in fallback backend.

Process international websites with Unicode domain names:
international_urls <- c(
"https://café.paris.fr/menu",
"https://москва.рф/news",
"https://北京.中国/info"
)
# Convert for HTTP requests
ascii_urls <- url_encode(international_urls)

Clean and standardize URL datasets:
# Identify international domains
is_idn(c("café.com", "example.com", "москва.рф"))
# Validate domain names
validate_domain(c("valid.com", "invalid..domain"))

punycoder currently provides:
- puny_encode(), puny_decode()
- url_encode(), url_decode(), parse_url()
- is_punycode(), is_idn(), validate_domain()
- libidn2 backend when present, built-in fallback otherwise

punycoder is inspired by urltools and is designed to provide a robust
fix for punycode encode/decode issues that may arise in urltools
workflows.

We welcome contributions. See CONTRIBUTING.md for the current development workflow.
License: MIT