Domain Name Lookup with the IP Package

Besides parsing and manipulating IP addresses, the IP package also provides methods and functions for querying information about hosts, such as host(), fqdn(), whois(), toIdna() and fromIdna(), all of which are used below.

In addition, this vignette demonstrates how to use some of the IP package's built-in tables, such as ipv4.rir() and ipv4.recovered(), as well as how to match addresses and address ranges with the ip.match() method.

Please note that, depending on your platform, support for some functions may still be experimental or missing. In addition, this vignette was precomputed so that the results match the text; the results you get may differ if things have changed in the meantime.

First Example

Let’s start with a domain name (DN) lookup:

library(IP)
rhost     <- host('r-project.org')
rhost
##   r-project.org 
## "137.208.57.37"
class(rhost)
## [1] "host"
## attr(,"package")
## [1] "IP"

In this case, there is only one IPv4 address. In other cases, a host may have one or more IPv4 addresses, one or more IPv6 addresses, or both, which is why the host() method does not return an IP object but a specialized host object. Therefore, we need to use the ipv4(), ipv6() and ip() methods to extract the IP addresses (please refer to the second example below).

Now, let’s perform a reverse DN lookup on the returned IPv4 address:

rhost.hnm <- host(ipv4(rhost))
rhost.hnm 
##        r-project.org 
## "cran.wu-wien.ac.at"

According to this, the server is located in Austria:

fqdn(rhost.hnm)
## [1] "ac.at"

But matching this address against the RIR address space

ipv4.rir()[ip.match(ipv4(rhost), ipv4.rir())]
##          ARIN 
## "137.0.0.0/8"

returns ARIN (really?), which serves North America, and not the RIPE NCC, which serves Europe. Note that in this case ip.match() checks whether each given address falls within one of the RIR ranges. If the second argument (table) holds IP addresses rather than ranges, ip.match() instead looks for the addresses in table that are equal to those in the first argument (x).
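To make the range check concrete, here is a minimal base R sketch of what "falls within a range" means for an IPv4 address and a CIDR block. The helpers ipv4_to_num() and in_cidr() are hypothetical names introduced for illustration only; they are not part of the IP package and say nothing about how ip.match() is actually implemented.

```r
# Hypothetical helpers (not part of the IP package).
# Convert a dotted-quad IPv4 string to its numeric value.
ipv4_to_num <- function(x) {
  octets <- as.numeric(strsplit(x, ".", fixed = TRUE)[[1]])
  sum(octets * 256^(3:0))
}

# Test whether a single address lies inside a CIDR block such as "137.0.0.0/8".
in_cidr <- function(addr, cidr) {
  parts <- strsplit(cidr, "/", fixed = TRUE)[[1]]
  size  <- 2^(32 - as.integer(parts[2]))                # addresses in the block
  first <- floor(ipv4_to_num(parts[1]) / size) * size   # first address of the block
  a     <- ipv4_to_num(addr)
  a >= first && a < first + size
}

in_cidr("137.208.57.37", "137.0.0.0/8")   # TRUE
in_cidr("137.208.57.37", "18.0.0.0/8")    # FALSE
```

The address is compared numerically against the first and last address of the block, which is why 137.208.57.37 matches 137.0.0.0/8 regardless of who currently administers it.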

Now, according to this

ip.match(ipv4(rhost), ipv4.recovered())
## r-project.org 
##            NA

this address is not part of the recovered address space. Now, let’s take a look at the whois data:

rdom.whois   <- whois('r-project.org', output=1)
rdom.whois[['r-project.org']]['Registrant Country']
## Registrant Country 
##               "AT"

Austria, all clear. And

rhost.whois <- whois(ipv4(rhost),verbose = 2, output=1)
## whois: 137.208.57.37 NA 
## refer:' whois.arin.net whois.arin.net '
## refer:'whois.arin.net'
## query:' n +  137.208.57.37 
##  '
rhost.whois[['r-project.org']]['Organization']
##                              Organization 
## "RIPE Network Coordination Centre (RIPE)"

yields “RIPE Network Coordination Centre (RIPE)” as expected.

The results of those queries may look a little confusing at first. The whois queries tell us that the r-project.org site is hosted by the Wirtschaftsuniversität Wien in Austria (as does the top-level domain of the host name returned by the reverse lookup, “.at”) and that its address is accordingly managed by the RIPE NCC. But the RIR lookup on the address of the server tells us that it falls within a range managed by ARIN, which serves North America. What is happening here is that some address ranges were assigned in the 1980s to European organizations such as universities, before the RIPE NCC began its operations in 1992. Those ranges were later transferred to the RIPE NCC, as shown by

rhost.whois[['r-project.org']]['NetType']
##                                        NetType 
## "Early Registrations, Transferred to RIPE NCC"

but this range still belongs to the ARIN address space.

Second Example: Multiple Addresses

As stated before, host() queries may return one or more addresses for a single host:

h <- host(dn <- c("r-project.org", "cloud.r-project.org" ))
h
##                                                                                                                                                                                                                                                                                                                                                  r-project.org 
##                                                                                                                                                                                                                                                                                                                                                "137.208.57.37" 
##                                                                                                                                                                                                                                                                                                                                            cloud.r-project.org 
## "18.244.28.31,18.244.28.115,18.244.28.78,18.244.28.49--2600:9000:262b:2200:6:c2d3:f940:93a1,2600:9000:262b:1c00:6:c2d3:f940:93a1,2600:9000:262b:e00:6:c2d3:f940:93a1,2600:9000:262b:7400:6:c2d3:f940:93a1,2600:9000:262b:c200:6:c2d3:f940:93a1,2600:9000:262b:b800:6:c2d3:f940:93a1,2600:9000:262b:1000:6:c2d3:f940:93a1,2600:9000:262b:8800:6:c2d3:f940:93a1"
length(h)
## [1] 2

But the returned object has the same length as the input vector, so we can use it in a data.frame:

data.frame(dn, h)
##                                      dn
## r-project.org             r-project.org
## cloud.r-project.org cloud.r-project.org
##                                                                                                                                                                                                                                                                                                                                                                                h
## r-project.org                                                                                                                                                                                                                                                                                                                                                      137.208.57.37
## cloud.r-project.org 18.244.28.31,18.244.28.115,18.244.28.78,18.244.28.49--2600:9000:262b:2200:6:c2d3:f940:93a1,2600:9000:262b:1c00:6:c2d3:f940:93a1,2600:9000:262b:e00:6:c2d3:f940:93a1,2600:9000:262b:7400:6:c2d3:f940:93a1,2600:9000:262b:c200:6:c2d3:f940:93a1,2600:9000:262b:b800:6:c2d3:f940:93a1,2600:9000:262b:1000:6:c2d3:f940:93a1,2600:9000:262b:8800:6:c2d3:f940:93a1

Use the following methods to get the actual addresses:

ipv4(h)
##       r-project.org cloud.r-project.org cloud.r-project.org cloud.r-project.org 
##     "137.208.57.37"      "18.244.28.31"     "18.244.28.115"      "18.244.28.78" 
## cloud.r-project.org 
##      "18.244.28.49"
ipv6(h)
##                          r-project.org                    cloud.r-project.org 
##                                     NA "2600:9000:262b:2200:6:c2d3:f940:93a1" 
##                    cloud.r-project.org                    cloud.r-project.org 
## "2600:9000:262b:1c00:6:c2d3:f940:93a1"  "2600:9000:262b:e00:6:c2d3:f940:93a1" 
##                    cloud.r-project.org                    cloud.r-project.org 
## "2600:9000:262b:7400:6:c2d3:f940:93a1" "2600:9000:262b:c200:6:c2d3:f940:93a1" 
##                    cloud.r-project.org                    cloud.r-project.org 
## "2600:9000:262b:b800:6:c2d3:f940:93a1" "2600:9000:262b:1000:6:c2d3:f940:93a1" 
##                    cloud.r-project.org 
## "2600:9000:262b:8800:6:c2d3:f940:93a1"
ip(h)
##                          r-project.org                    cloud.r-project.org 
##                        "137.208.57.37"                         "18.244.28.31" 
##                    cloud.r-project.org                    cloud.r-project.org 
##                        "18.244.28.115"                         "18.244.28.78" 
##                    cloud.r-project.org                          r-project.org 
##                         "18.244.28.49"                                     NA 
##                    cloud.r-project.org                    cloud.r-project.org 
## "2600:9000:262b:2200:6:c2d3:f940:93a1" "2600:9000:262b:1c00:6:c2d3:f940:93a1" 
##                    cloud.r-project.org                    cloud.r-project.org 
##  "2600:9000:262b:e00:6:c2d3:f940:93a1" "2600:9000:262b:7400:6:c2d3:f940:93a1" 
##                    cloud.r-project.org                    cloud.r-project.org 
## "2600:9000:262b:c200:6:c2d3:f940:93a1" "2600:9000:262b:b800:6:c2d3:f940:93a1" 
##                    cloud.r-project.org                    cloud.r-project.org 
## "2600:9000:262b:1000:6:c2d3:f940:93a1" "2600:9000:262b:8800:6:c2d3:f940:93a1"

As we have seen before, the r-project.org host has only one IPv4 address and no IPv6 address. On the other hand, the cloud.r-project.org host has four IPv4 addresses and eight IPv6 addresses.
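These counts can be tallied from the names of the extracted addresses. The sketch below uses a plain named character vector copied from the ipv4(h) output above so that it runs without the IP package or network access; the assumption that names() behaves the same way on the object returned by ipv4(h) is suggested by the printed output but not verified here.

```r
# Stand-in for ipv4(h): a named character vector copied from the output above.
v4 <- c(
  "r-project.org"       = "137.208.57.37",
  "cloud.r-project.org" = "18.244.28.31",
  "cloud.r-project.org" = "18.244.28.115",
  "cloud.r-project.org" = "18.244.28.78",
  "cloud.r-project.org" = "18.244.28.49"
)

# Number of IPv4 addresses per host name.
counts <- table(names(v4))
counts[["cloud.r-project.org"]]   # 4
counts[["r-project.org"]]         # 1
```

With the IP package loaded, the analogous call would presumably be table(names(ipv4(h))).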

RIR lookup returns ARIN again for all addresses:

ipv4.rir()[ip.match(ipv4(h),ipv4.rir())]
##          ARIN          ARIN          ARIN          ARIN          ARIN 
## "137.0.0.0/8"  "18.0.0.0/8"  "18.0.0.0/8"  "18.0.0.0/8"  "18.0.0.0/8"

But this time rightfully so for the cloud.r-project.org domain, which is hosted by Amazon, as shown by this whois query:

w <- whois(ipv6(h)["cloud.r-project.org"][1])
w[[1]]['OrgName']
##            OrgName 
## "Amazon.com, Inc."

Domain Name Internationalization

Per the relevant RFCs, domain names are limited to a subset of US-ASCII code points. This basically means that you cannot directly use code points that represent, say, letters with diacritics or CJK characters in a DN. Since 2003, however, characters outside the authorized range can be used thanks to an encoding called Punycode. Punycode is a reversible transformation that converts a string containing non-ASCII characters into an ASCII-only string in order to produce a legal domain name. This string can then be decoded to retrieve the original DN.
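As an illustration of the ASCII restriction, the following base R sketch checks whether every label of a name uses only letters, digits and hyphens (the conventional "LDH" rule) and is at most 63 characters long. The function name is_ldh() is hypothetical and not part of the IP package, and the check is deliberately simplified: it ignores the overall length limit and trailing dots.

```r
# Hypothetical helper (not part of the IP package): TRUE if every
# dot-separated label is made of ASCII letters, digits and hyphens,
# does not start or end with a hyphen, and is at most 63 characters long.
is_ldh <- function(dn) {
  labels <- strsplit(dn, ".", fixed = TRUE)[[1]]
  ok <- grepl("^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$", labels, perl = TRUE)
  all(ok) && all(nchar(labels) <= 63)
}

is_ldh("r-project.org")       # TRUE
is_ldh("xn--bcher-kva.de")    # TRUE: the encoded form is plain ASCII
is_ldh("b\u00fccher.de")      # FALSE: contains a u with diaeresis
```

The encoded form passes the check precisely because the encoding maps the non-ASCII label onto the permitted character set.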

Let’s see how this works:

dn <- c("bücher.de")
(dni <- toIdna(dn))
##          bücher.de 
## "xn--bcher-kva.de"

“bücher.de” becomes “xn--bcher-kva.de”. And now, back to the original string:

fromIdna(dni)
## xn--bcher-kva.de 
##      "bücher.de"

Unfortunately, this does not always work (believe it or not, this is an actual domain name):

dn <- c("💩.la")
toIdna(dn)
## Warning in toIdna(dn): String preparation failed for '💩.la'
## 💩.la 
##    NA

In that case, we need to use this flag:

toIdna(dn,  "IDNA_ALLOW_UNASSIGNED")
##         💩.la 
## "xn--ls8h.la"

Now, let’s see what happens when trying to get the hosts’ IP addresses:

dn <- c("bücher.de", "💩.la")
flags <- rep(c("IDNA_DEFAULT", "IDNA_ALLOW_UNASSIGNED"), each = length(dn))
dni <- c(dn, toIdna(dn, flags))
host(dni)
##        bücher.de            💩.la xn--bcher-kva.de             <NA> 
##               NA               NA    "45.87.158.7"               NA 
## xn--bcher-kva.de      xn--ls8h.la 
##    "45.87.158.7"  "38.103.165.38"

Note that starting with glibc 2.3.4, the underlying getaddrinfo() function has been extended to allow host names to be transparently converted. In all other cases (glibc < 2.3.4, Windows, and so on), internationalization must currently be done explicitly before calling the host() method.
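One way to make the explicit conversion systematic is to encode only the names that actually contain non-ASCII characters and leave the others untouched. The sketch below is an assumption-laden illustration, not part of the IP package: to_ascii_names() is a hypothetical helper, and the encoder is passed in as an argument (for instance toIdna) so that the function itself does not depend on the platform or on network access.

```r
# Hypothetical wrapper (not part of the IP package).
# `encode` is the function used for the IDNA conversion, e.g. toIdna.
to_ascii_names <- function(dn, encode) {
  # Flag the names that contain at least one code point above 127.
  non_ascii <- vapply(
    dn,
    function(x) any(utf8ToInt(enc2utf8(x)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
  out <- dn
  if (any(non_ascii)) {
    # Only the flagged names are passed to the encoder.
    out[non_ascii] <- unname(encode(dn[non_ascii]))
  }
  out
}

# With the IP package loaded, this might be used as:
# host(to_ascii_names(c("bücher.de", "r-project.org"), toIdna))
```

Names that need a non-default flag, as in the example above, would still have to be handled separately, for instance by passing a closure that sets the flag.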