| Title: | Analyze and Visualize Toponyms |
| Version: | 2.0.1 |
| Description: | A tool to analyze and visualize toponym distributions. This package is intended as an interface to the GeoNames data. A regular expression filters the data, and in a second step a map is created displaying all locations in the filtered data set. The functions make data and plots available for further analysis, either within R or in a chosen directory. Users can select regions within countries, provide coordinates to define regions, or specify a region provided by the package to restrict the data selection to that region or compare regions with the remainder of countries. This package relies on the R packages 'geodata' for map data and 'ggplot2' for plotting purposes. For more information on the study of toponyms, see Wichmann & Chevallier (2025) <doi:10.5195/names.2025.2616>. |
| Depends: | R (≥ 4.1) |
| License: | GPL (≥ 3) |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| LazyData: | true |
| Imports: | geodata, ggplot2, grDevices, sf, spatstat.geom, spatstat.utils, stats, terra, utils |
| URL: | https://github.com/Lennart05/toponym |
| BugReports: | https://github.com/Lennart05/toponym/issues |
| NeedsCompilation: | no |
| Packaged: | 2026-04-07 21:53:33 UTC; Chavallier |
| Author: | Lennart Chevallier |
| Maintainer: | Lennart Chevallier <mail@lchevallier.de> |
| Repository: | CRAN |
| Date/Publication: | 2026-04-13 14:40:02 UTC |
toponym: Analyze and Visualize Toponyms
Description
A package to analyze and visualize toponym distributions.
The main functions are the following:
top() returns selected toponyms.
country() helps in navigating designations of countries and regions used by the package.
createPolygon() lets users create a polygon by point-and-click or directly retrieve polygon data.
mapper() plots data onto a map.
topComp() compares toponym substrings in a polygon and in the remainder of a country (or countries).
topFreq() retrieves most frequent toponym substrings.
topZtest() lets users apply a Z-test on toponym distributions.
toponymOptions() lets users modify settings for managing data.
For more detailed descriptions please read the respective documentation.
Author(s)
Maintainer: Lennart Chevallier mail@lchevallier.de (ORCID)
Authors:
Søren Wichmann wichmannsoeren@gmail.com (ORCID)
See Also
Useful links:
Identifying scripts
Description
This function detects whether a string might be written in a Latin-based (latinate) script or not.
Usage
IS(input)
Arguments
input |
character string |
Value
A character string indicating whether the script is latinate or not.
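The check described above could look roughly like the following sketch. It is not the package's IS() implementation: it assumes that a string counts as latinate when every letter belongs to the Unicode Latin script, and the helper is_latin_script() and its return labels "latinate" and "not latinate" are hypothetical.

```r
# Hypothetical sketch, not toponym's internal IS():
# a string counts as "latinate" if all of its letters are Unicode Latin letters.
is_latin_script <- function(input) {
  # drop everything that is not a letter (digits, spaces, punctuation)
  letters_only <- gsub("[^\\p{L}]", "", input, perl = TRUE)
  ifelse(grepl("^\\p{Latin}*$", letters_only, perl = TRUE),
         "latinate", "not latinate")
}

is_latin_script("Krak\u00f3w")                           # "latinate"
is_latin_script("\u041c\u043e\u0441\u043a\u0432\u0430")  # Moskva in Cyrillic: "not latinate"
```

Unicode escapes are used so the example does not depend on the encoding of the script file.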
Alternatenames filter
Description
Checks the alternatenames column for strings matching the regular expressions.
Usage
altNames(gn, strings)
Arguments
gn |
data frame(s) to be searched. |
strings |
character string vector with regular expressions to filter data. |
Value
A list of two vectors, logical values and matched strings.
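As a rough illustration of this filter, the sketch below assumes that each alternatenames entry is a single comma-separated string (as in the GeoNames dump files) and, like altNames(), returns a list with logical values and matched strings. The function alt_names_filter() is hypothetical, and keeping only the first matching alternate name per row is an assumption.

```r
# Hypothetical sketch of an alternatenames filter (not the package's altNames()).
alt_names_filter <- function(alternatenames, strings) {
  # combine all regular expressions into one alternation
  pattern <- paste(strings, collapse = "|")
  # each GeoNames entry holds several names separated by commas
  split_names <- strsplit(alternatenames, ",", fixed = TRUE)
  matched <- vapply(split_names, function(candidates) {
    hits <- grep(pattern, candidates, value = TRUE, perl = TRUE)
    # keep the first matching alternate name, or NA if none matches
    if (length(hits) > 0) hits[1] else NA_character_
  }, character(1))
  list(matches = !is.na(matched), strings = matched)
}

alt_names_filter(c("Vladimir,Wladimir", "Moskva,Moscow", NA), "^Wlad")
```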
Check path for downloaded data
Description
This function checks the download path for the toponym package.
Usage
checkPath(toponym_path = NULL)
Arguments
toponym_path |
character string. Path name for downloaded data. If not specified, this function will call |
Details
If a path is provided as a function argument, this function checks it for validity. If no path is provided, this function retrieves the path from the .rds file stored in the package directory and checks it for validity.
Value
Character string of the used path for downloaded data.
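The two cases described above can be sketched as follows. This is not the package's checkPath(): resolve_path() and its options_file argument are hypothetical stand-ins for the function and for toponym_options.rds, and treating a non-existent directory as invalid is an assumption about what "validity" means here.

```r
# Hypothetical sketch of the path check (not the package's checkPath()).
# options_file stands in for toponym_options.rds in the package directory.
resolve_path <- function(toponym_path = NULL, options_file) {
  if (is.null(toponym_path)) {
    # no path supplied: use the path saved across sessions
    toponym_path <- readRDS(options_file)
  }
  if (identical(toponym_path, "pkgdir")) {
    # "pkgdir" refers to the extdata folder of the installed package
    toponym_path <- system.file("extdata", package = "toponym")
  }
  if (!nzchar(toponym_path) || !dir.exists(toponym_path)) {
    stop("Invalid path for downloaded data: ", toponym_path)
  }
  toponym_path
}

opts <- tempfile(fileext = ".rds")
saveRDS(tempdir(), opts)
resolve_path(options_file = opts)  # falls back to the saved path, tempdir()
```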
Country designations
Description
This function returns country and region designations used by the toponym package.
Usage
country(query = NULL, ...)
Arguments
query |
character string vector. Enter queries to access information on countries. |
... |
Additional parameter:
|
Details
If you enter an individual country designation, you receive its three designations (ISO2, ISO3, name).
If you enter "ISO2" or "ISO3", you receive a vector of all ISO-codes of the respective length.
If you enter "names", you receive a vector of all country names.
If you enter "country table", you receive a data frame with all three designations for every country.
Region designations are retrieved from the geodata package map data. The list of region designations may be incomplete. For mapping purposes, geodata is used throughout this package.
If regions is set to a value greater than 0, map data from the geodata package must be downloaded.
With toponymOptions(), users can specify the path for toponym and map data downloaded by this package across sessions. See help(toponymOptions).
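The query types listed above correspond to simple lookups in a three-column table such as countryInfo. The sketch below uses a hypothetical three-row excerpt with assumed column names ISO2, ISO3 and name; lookup_country() is not the package's country() and ignores the regions parameter.

```r
# Hypothetical excerpt with the same layout as countryInfo (ISO2, ISO3, name).
country_table <- data.frame(
  ISO2 = c("TH", "NA", "DE"),  # "NA" is Namibia's ISO2 code, not a missing value
  ISO3 = c("THA", "NAM", "DEU"),
  name = c("Thailand", "Namibia", "Germany")
)

lookup_country <- function(query, tbl = country_table) {
  if (query %in% c("ISO2", "ISO3")) return(tbl[[query]])
  if (query == "names") return(tbl$name)
  if (query == "country table") return(tbl)
  # an individual designation: match it against all three columns
  tbl[tbl$ISO2 == query | tbl$ISO3 == query | tbl$name == query, ]
}

lookup_country("ISO3")      # "THA" "NAM" "DEU"
lookup_country("Thailand")  # one row: TH, THA, Thailand
```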
Value
Returns country designations selected from a data frame. If regions is set to 1, returns region designations in a matrix selected from the geodata map data.
Examples
country(query = "ISO3")
## returns a vector of all ISO3 codes
country(query = "Thailand")
## returns a list with a data frame with ISO2 code, ISO3 code and the full name of Thailand
## We recommend setting a persistent path for downloaded data by using toponymOptions()
## Users can always set the path manually when a function is used
## For illustration purposes,
## 1. the path is manually set each time
## 2. and wrapped in donttest because data will be downloaded in the following example:
country(query = "Thailand", regions = 1, toponym_path = tempdir())
## returns a list with a matrix with all region designations of Thailand
Country information
Description
A data frame which lists all countries available in the geodata package and GeoNames data set.
The first column lists the ISO2 codes, the second the ISO3 codes, and the third the full country names.
Usage
countryInfo
Format
An object of class data.frame with 249 rows and 3 columns.
Creates a polygon
Description
This function lets users create a polygon by point-and-click or directly retrieve polygon data.
Usage
createPolygon(countries, ...)
Arguments
countries |
character string vector with country designations (names or ISO-codes). |
... |
Additional parameters:
|
Details
Parameter countries accepts all designations found in country(query = "country table").
region_ID and region_name accept region designations for the selected countries, which can be retrieved by country().
The function prioritizes any region_ID and ignores region_name if users provide both.
The matrix from country() listing all region designations may be incomplete as the geodata map data is incomplete in this regard. For mapping purposes, geodata is used throughout this package.
Parameter toponym_path accepts "pkgdir" for the package directory or a full, alternative path.
With toponymOptions(), users can specify the path for toponym and map data downloaded by this package across sessions. See help(toponymOptions).
In RGui, users exit the point selection by middle-clicking, or by right-clicking and then selecting Stop.
In RStudio, users exit the point selection by pressing ESC or clicking Finish in the top right corner of the plot. Users whose points appear shifted are advised to set the zoom level of both RStudio and their device to 100%:
Tools -> Global Options -> Appearance -> Zoom
This function uses the function spatstatLocator provided by the spatstat.utils package for the point-and-click functionality.
For further details on the point-and-click mechanism, please refer to the help page for spatstatLocator.
Value
A data frame with the coordinates of the polygon.
Examples
## We recommend setting a persistent path for downloaded data by using toponymOptions()
## Users can always set the path manually when a function is used
## For illustration purposes in the following examples,
## 1. the path is manually set each time
## 2. and wrapped in donttest because data will be downloaded
## 3. or wrapped in if(interactive()) because it is interactive:
if(interactive()){
createPolygon("NA", region_ID = "NAM.7_1", toponym_path = tempdir())
# a plot of the region Ohangwena in Namibia appears.
# by point-and-click a polygon can be created
# upon completion, a data frame with the coordinates of the polygon returns
}
Ohangwena_polygon <- createPolygon(
"NA", region_ID = "NAM.7_1", retrieve = TRUE, toponym_path = tempdir())
# no plot appears
# the coordinates of the whole region are stored in the object named `Ohangwena_polygon`
# and can be used by other functions
Coordinates of the Danelaw
Description
A small data frame containing coordinates for the polygon framing the Danelaw. There is one column for the longitudes ('lons') and one for the latitudes ('lats'). The coordinates were retrieved from a personal polygon created with Google My Maps.
Usage
danelaw_polygon
Format
An object of class data.frame with 11 rows and 2 columns.
Coordinates of Flanders
Description
A small data frame containing coordinates for the polygon framing Flanders. There is one column for the longitudes ('lons') and one for the latitudes ('lats'). The coordinates were retrieved from a personal polygon created with Google My Maps.
Usage
flanders_polygon
Format
An object of class data.frame with 22 rows and 2 columns.
Downloads GeoNames data
Description
This function downloads toponym data for the package.
Usage
getData(countries, overwrite = FALSE, toponym_path = NULL)
Arguments
countries |
character string vector with country designations (names or ISO-codes). |
overwrite |
logical. If |
toponym_path |
character string. Path name for downloaded data. If not specified, this function will call |
Details
The data is downloaded from the GeoNames download page and thereby made accessible to readFiles(). The function allows users to update GeoNames data and to set the date of access to that database to the current date.
Parameter countries accepts all designations found in country(query = "country table"). If "all", data from all countries stored in the package folder is selected.
Parameter toponym_path accepts "pkgdir" for the package directory or a full, alternative path.
With toponymOptions(), users can specify the path for toponym data downloaded by getData() across sessions. See help(toponymOptions).
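A download step of this kind can be sketched in base R. The sketch assumes that the per-country archives on the GeoNames dump server follow the pattern <ISO2>.zip and contain a file <ISO2>.txt; geonames_url() and download_country() are hypothetical helpers, not part of toponym, and only the URL construction is run here.

```r
# Hypothetical helpers, not part of toponym.
geonames_url <- function(iso2) {
  paste0("https://download.geonames.org/export/dump/", toupper(iso2), ".zip")
}

download_country <- function(iso2, toponym_path = tempdir()) {
  iso2 <- toupper(iso2)
  zip_file <- file.path(toponym_path, paste0(iso2, ".zip"))
  # mode = "wb" keeps the binary archive intact on Windows
  utils::download.file(geonames_url(iso2), zip_file, mode = "wb")
  # extract only the country table, e.g. NL.txt
  utils::unzip(zip_file, files = paste0(iso2, ".txt"), exdir = toponym_path)
}

geonames_url(c("DK", "de"))
```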
Value
No return value.
See Also
Examples
## We recommend setting a persistent path for downloaded data by using toponymOptions()
## Users can always set the path manually when a function is used
## For illustration purposes,
## 1. the path is manually set each time
## 2. and wrapped in donttest because data will be downloaded in the following examples:
getData(countries = "NL", toponym_path = tempdir())
## downloads and extracts data for NL to the temporary directory
getData(countries = c("DK", "DE"), toponym_path = tempdir())
## downloads and extracts data for DK and DE to the temporary directory
getData(countries = c("DK", "DE"), overwrite = TRUE, toponym_path = tempdir())
## downloads, extracts, and overwrites data for DK and DE in the temporary directory
Plots toponyms onto a map
Description
This function generates a map plotting all locations in a given data frame. This function uses map data from the geodata package.
Usage
map_simple(mapdata)
Arguments
mapdata |
list. A list passed down by |
Details
This is an internal function which is only used by mapper().
Value
A plot.
Plots data onto a map
Description
This function plots a user-specific data frame onto a map.
Usage
mapper(mapdata, ...)
Arguments
mapdata |
data frame. A user-specific data frame with coordinates. |
... |
Additional parameters:
|
Details
This function allows users to plot data frames returned by top(), edited versions of them, or their own data frames.
The data frame must have at least two columns named latitude and longitude.
Data frames output by the function top() consist of, among others, a latitude, longitude, country code and group column.
If the input data frame has a color column, the function assigns each value in that column to the respective coordinates. If the additional parameter color is specified, it is used instead of the color column.
If the input data frame has a column group, the function will group data and display a legend.
If the input data frame has both a color and a group column, the two must be consistent: every group (every unique string in the group column) must be assigned a unique color throughout the data frame.
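The requirement that groups and colors match one-to-one can be checked before calling mapper(). The helper below is a hypothetical sketch, not a function of the package; it assumes the columns are named group and color as described above.

```r
# Hypothetical check: does every group have exactly one color, and vice versa?
groups_colors_consistent <- function(df) {
  pairs <- unique(df[, c("group", "color")])
  # a repeated group means two colors for one group;
  # a repeated color means one color shared by two groups
  !anyDuplicated(pairs$group) && !anyDuplicated(pairs$color)
}

ok <- data.frame(group = c("et", "wa", "et"), color = c("blue", "grey", "blue"))
bad <- data.frame(group = c("et", "wa"), color = c("blue", "blue"))
groups_colors_consistent(ok)   # TRUE
groups_colors_consistent(bad)  # FALSE
```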
If regions is set to a value greater than 0, the data frame must have a country code column.
Parameter frame accepts data frames containing coordinates which define the frame of the plot. The data frame must have a column called latitude and a column called longitude. The latitudinal and longitudinal ranges define the frame of the plot.
Parameter plot_size accepts numeric values greater than -1. The plot's size is scaled by the given value: a value of 0 leaves the size unchanged, a value of .1 extends it by 10%, a value of -.1 reduces it by 10%, and so on.
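To make the scaling concrete, the sketch below widens the latitude and longitude ranges of a frame by plot_size, splitting the extra space evenly between both ends. That splitting rule and the helper frame_limits() are assumptions for illustration; the package may apply the factor differently.

```r
# Hypothetical sketch: scale each coordinate range by plot_size,
# adding (or removing) half of the extra width at each end.
frame_limits <- function(latitude, longitude, plot_size = 0) {
  stopifnot(plot_size > -1)
  widen <- function(x) {
    r <- range(x)
    pad <- diff(r) * plot_size / 2
    c(r[1] - pad, r[2] + pad)
  }
  list(lat = widen(latitude), lon = widen(longitude))
}

frame_limits(c(50, 54), c(6, 14), plot_size = 0.1)
# $lat: 49.8 54.2 (range 4 grows to 4.4); $lon: 5.6 14.4 (range 8 grows to 8.8)
```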
Parameter toponym_path accepts "pkgdir" for the package directory or a full, alternative path.
With toponymOptions(), users can specify the path for toponym and map data downloaded by this package across sessions. See help(toponymOptions).
Value
A plot.
Examples
## We recommend setting a persistent path for downloaded data by using toponymOptions()
## Users can always set the path manually when a function is used
## For illustration purposes,
## 1. the path is manually set each time
## 2. and wrapped in donttest because data will be downloaded in the following examples:
mapper(
top("itz$", "DE", toponym_path = tempdir()),
toponym_path = tempdir())
# returns a plot with all populated places
# in Germany ending in "itz"
UG_data <- top(c("et$", "wa$"), "UG", toponym_path = tempdir())
UG_data$color <- "blue"
UG_data[UG_data$group == "wa", "color"] <- "grey"
mapper(UG_data,
legend_title = "two strings",
title = "Some locations in grey and blue",
toponym_path = tempdir())
# returns a plot with all populated places
# in Uganda ending in "wa" (grey) and "et" (blue)
# the plot is titled "Some locations in grey and blue"
# the legend title is "two strings"
Orthographical symbols
Description
This function retrieves all symbols used in country data.
Usage
ortho(countries, ...)
Arguments
countries |
character string vector with country designations (names or ISO-codes). |
... |
Additional parameter:
|
Details
Parameter countries accepts all designations found in country(query = "country table").
The default column is "alternatenames". Other columns of possible interest are "name" and "asciiname".
It outputs an ordered frequency table of all symbols used in a given column of the GeoNames data for one or more countries specified.
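Such a frequency table can be built in base R by splitting each entry into single characters. The sketch below is not the package's ortho(); symbol_freq() is hypothetical and counts every character, including the commas that separate alternate names.

```r
symbol_freq <- function(x) {
  # split every non-missing entry into single characters (UTF-8 aware)
  symbols <- unlist(strsplit(x[!is.na(x)], split = ""))
  # frequency table, most frequent symbol first
  sort(table(symbols), decreasing = TRUE)
}

symbol_freq(c("aab", "ba"))  # a: 3, b: 2
```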
Parameter toponym_path accepts "pkgdir" for the package directory or a full, alternative path.
With toponymOptions(), users can specify the path for toponym and map data downloaded by this package across sessions. See help(toponymOptions).
The data used is downloaded by getData() and is accessible on the GeoNames download server.
Value
A table with frequencies of all symbols.
Examples
## We recommend setting a persistent path for downloaded data by using toponymOptions()
## Users can always set the path manually when a function is used
## For illustration purposes,
## 1. the path is manually set each time
## 2. and wrapped in donttest because data will be downloaded in the following example:
ortho(countries = "MC", toponym_path = tempdir())
# returns a table with frequencies of all symbols
# in the "alternatenames" column for the Monaco data set
Transforms coordinates into a window
Description
This function transforms polygonal data into an object of class owin for spatstat.geom functions.
The owin() function requires the vertices of a polygon to be listed counterclockwise; if the data points run clockwise, their order is reversed.
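The orientation check can be done with the shoelace formula: the signed area of a polygon is negative when its vertices run clockwise. The sketch below assumes the lons/lats column names used by danelaw_polygon; signed_area() and to_counterclockwise() are hypothetical helpers, and their result could then be passed to spatstat.geom::owin() through its poly argument.

```r
# Shoelace formula: positive for counterclockwise, negative for clockwise vertices.
signed_area <- function(lons, lats) {
  nxt <- c(seq_along(lons)[-1], 1)  # index of the following vertex, wrapping around
  sum(lons * lats[nxt] - lons[nxt] * lats) / 2
}

to_counterclockwise <- function(polygon) {
  if (signed_area(polygon$lons, polygon$lats) < 0) {
    polygon[rev(seq_len(nrow(polygon))), ]  # clockwise: reverse the vertex order
  } else {
    polygon
  }
}

square <- data.frame(lons = c(0, 0, 1, 1), lats = c(0, 1, 1, 0))  # clockwise
signed_area(square$lons, square$lats)  # -1
```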
Usage
poly(polygon)
Arguments
polygon |
data frame, coordinates of the polygon. |
Value
An object of class owin which is a polygonal window.
Reads GeoNames data
Description
This function reads toponym data for the package.
Usage
readFiles(countries, feat.class = "P", toponym_path = NULL)
Arguments
countries |
character string vector with country designations (names or ISO-codes). |
feat.class |
character string vector. Selects data only of those feature classes (check http://download.geonames.org/export/dump/readme.txt for the list of all feature classes). By default, it is |
toponym_path |
character string. Path name for downloaded data. |
Details
This function accesses the data saved by getData(), reads it as a data frame and stores it in the package environment. Further information on the column names is given in the GeoNames readme linked above.
Parameter countries accepts all designations found in country(query = "country table").
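Reading a GeoNames country file can be sketched with utils::read.delim(). The column names below are snake_case renderings of the 19 columns listed in the GeoNames readme and may differ from those the package uses; read_geonames() is hypothetical. Two details matter for this data: quoting must be disabled because names can contain quote characters, and the string "NA" must not be read as missing, since it is Namibia's country code.

```r
# Hypothetical sketch, not the package's readFiles().
geonames_cols <- c(
  "geonameid", "name", "asciiname", "alternatenames", "latitude", "longitude",
  "feature_class", "feature_code", "country_code", "cc2", "admin1_code",
  "admin2_code", "admin3_code", "admin4_code", "population", "elevation",
  "dem", "timezone", "modification_date"
)

read_geonames <- function(file, feat.class = "P") {
  gn <- utils::read.delim(file, header = FALSE, quote = "", comment.char = "",
                          col.names = geonames_cols, colClasses = "character",
                          na.strings = character(0), encoding = "UTF-8")
  # keep only the requested feature classes ("P" = populated places)
  gn <- gn[gn$feature_class %in% feat.class, ]
  gn$latitude <- as.numeric(gn$latitude)
  gn$longitude <- as.numeric(gn$longitude)
  gn
}

# two made-up rows in the tab-separated GeoNames layout
row_p <- c("1", "Sampleville", "Sampleville", "", "-22.5", "17.0", "P", "PPL",
           "NA", "", "", "", "", "", "100", "", "", "Africa/Windhoek", "2024-01-01")
row_l <- replace(row_p, c(2, 3, 7, 8), c("Sample Park", "Sample Park", "L", "PRK"))
tmp <- tempfile(fileext = ".txt")
writeLines(c(paste(row_p, collapse = "\t"), paste(row_l, collapse = "\t")), tmp)
read_geonames(tmp)  # one row: Sampleville, country_code "NA"
```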
Value
A data frame with GeoNames data.
Save path for downloaded data.
Description
This function saves the download path for the toponym package.
Usage
save_path(toponym_path, toponym_options)
Arguments
toponym_path |
character string. Path name for downloaded data. |
toponym_options |
character string. Current path from .rds file. |
Value
A character string indicating the current path for downloaded data.
Selection of Toponyms
Description
This function returns coordinates of selected toponyms (strings).
Usage
top(strings, countries, ...)
Arguments
strings |
character string vector with regular expressions to filter data. |
countries |
character string vector with country designations (names or ISO-codes). |
... |
Additional parameters:
|
Details
This function selects locations that match the regular expressions in strings.
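The selection step can be sketched with grepl(). This is not the package's top(): filter_toponyms() is hypothetical, it assumes a data frame with name, latitude and longitude columns, and deriving group labels by stripping ^ and $ from each expression is an assumption based on the group values used in the mapper() example. The coordinates below are rounded, illustrative values.

```r
filter_toponyms <- function(gn, strings, column = "name") {
  hits <- lapply(strings, function(s) {
    # case-sensitive regular expression match on the chosen column
    rows <- gn[grepl(s, gn[[column]], perl = TRUE), c(column, "latitude", "longitude")]
    # label each match with its expression, minus the anchors ^ and $
    rows$group <- rep(gsub("[$^]", "", s), nrow(rows))
    rows
  })
  do.call(rbind, hits)
}

gn <- data.frame(name = c("Chemnitz", "Gliwice", "Berlin"),
                 latitude = c(50.8, 50.3, 52.5), longitude = c(12.9, 18.7, 13.4))
filter_toponyms(gn, c("itz$", "ice$"))  # Chemnitz (group "itz"), Gliwice (group "ice")
```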
Parameter countries accepts all designations found in country(query = "country table").
Polygons passed through the polygon parameter need to intersect or be within a country specified by the countries parameter.
Parameter toponym_path accepts "pkgdir" for the package directory or a full, alternative path.
With toponymOptions(), users can specify the path for toponym and map data downloaded by this package across sessions. See help(toponymOptions).
The data used is downloaded by getData() and is accessible on the GeoNames download server.
Value
A data frame of selected toponym(s).
Examples
## We recommend setting a persistent path for downloaded data by using toponymOptions()
## Users can always set the path manually when a function is used
## For illustration purposes,
## 1. the path is manually set each time
## 2. and wrapped in donttest because data will be downloaded in the following examples:
itz_data <- top("itz$", "DE", toponym_path = tempdir())
# returns a data frame with all populated places
# in Germany ending in "itz"
vlad_data <- top("^Vlad", "RU", toponym_path = tempdir())
# returns a data frame with all populated places
# in Russia starting with "Vlad" (case sensitive)
itz_ice_data <- top(c("itz$", "ice$"), c("DE", "PL"), toponym_path = tempdir())
# returns a data frame with all populated places
# in Germany and Poland ending in either "itz" or "ice"
maw_data <- top("Maw$", "MM", column = "alternatenames", toponym_path = tempdir())
# returns a data frame with all populated places
# in Myanmar listed in the "alternatenames" column
# and ending in "Maw" (case sensitive)
Compares toponyms in a polygon and the remainder of countries
Description
This function retrieves the most frequent toponym substrings in a given polygon relative to country frequencies.
Usage
topComp(countries, len, rat, polygon, ...)
Arguments
countries |
character string vector with country designations (names or ISO-codes). |
len |
numeric. The length of the substring within toponyms. |
rat |
numeric. The cut-off ratio (a number between 0.0 and 1 for |
polygon |
data frame. Defines the polygon for comparison with the remainder of a country (or countries). |
... |
Additional parameters:
|
Details
This function sorts the toponym substrings in the given countries by frequency. It then tests which ones lie in the given polygon and prints out a data frame with those that match the ratio criterion.
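The ratio criterion can be illustrated with a toy example. The sketch below is not the package's topComp(): comp_ratio() is hypothetical, in_polygon stands for a precomputed point-in-polygon test, the limit parameter and relative frequencies (freq.type) are not modelled, and keeping endings whose ratio is at or above rat is an assumption about the comparison.

```r
comp_ratio <- function(toponyms, in_polygon, len, rat) {
  # last `len` characters of every toponym
  endings <- substring(toponyms, nchar(toponyms) - len + 1)
  total <- table(endings)
  # occurrences inside the polygon, counted over the same set of endings
  inside <- table(factor(endings[in_polygon], levels = names(total)))
  res <- data.frame(ending = names(total),
                    ratio = as.vector(inside) / as.vector(total),
                    frequency = as.vector(total))
  res <- res[res$ratio >= rat, ]
  res[order(res$frequency, decreasing = TRUE), ]
}

places <- c("Kirkby", "Selby", "Derby", "Ashby", "Oxford", "Milford")
in_poly <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)  # made-up polygon membership
comp_ratio(places, in_poly, len = 2, rat = 0.7)  # "by": ratio 0.75, frequency 4
```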
Parameter countries accepts all designations found in country(query = "country table").
Polygons passed through the polygon parameter need to intersect or be within a country specified by the countries parameter.
Parameter toponym_path accepts "pkgdir" for the package directory or a full, alternative path.
With toponymOptions(), users can specify the path for toponym and map data downloaded by this package across sessions. See help(toponymOptions).
The data used is downloaded by getData() and is accessible on the GeoNames download server.
Value
A data frame printed out and saved in the global environment. It shows toponym substrings surpassing the ratio, the ratio and the frequency.
Examples
## We recommend setting a persistent path for downloaded data by using toponymOptions()
## Users can always set the path manually when a function is used
## For illustration purposes,
## 1. the path is manually set each time
## 2. and wrapped in donttest because data will be downloaded in the following examples:
topComp("GB",
limit = 100,
len = 4,
rat = .7,
polygon = toponym::danelaw_polygon,
toponym_path = tempdir()
)
## returns a data frame of those among the top 100 four-character-long endings in the United Kingdom
## of which more than 70% of occurrences lie within the polygon
## corresponding to the Danelaw area.
topComp("GB",
limit = 100,
len = 3,
rat = 1,
polygon = toponym::danelaw_polygon,
freq.type = "rel",
toponym_path = tempdir()
)
## returns a data frame of the top 100 three-character-long endings in the United Kingdom
## if they have greater relative frequencies within Danelaw than outside of Danelaw.
topComp(c("BE", "NL"),
limit = 50,
len = 3,
rat = .8,
polygon = toponym::flanders_polygon,
toponym_path = tempdir()
)
## returns a data frame of those among the top 50 three-character-long endings
## in Belgium and the Netherlands, treated as a unit, of which more than 80% of occurrences lie within the polygon
## corresponding to Flanders.
Retrieves the most frequent toponyms
Description
This function returns the most frequent toponym substrings in countries or a polygon.
Usage
topFreq(countries, len, limit, ...)
Arguments
countries |
character string vector with country designations (names or ISO-codes). |
len |
numeric. The length of the substring within toponyms. |
limit |
numeric. The number of the most frequent toponym substrings. |
... |
Additional parameters:
|
Details
Parameter countries accepts all designations found in country(query = "country table").
Polygons passed through the polygon parameter need to intersect or be within a country specified by the countries parameter.
Parameter toponym_path accepts "pkgdir" for the package directory or a full, alternative path.
With toponymOptions(), users can specify the path for toponym and map data downloaded by this package across sessions. See help(toponymOptions).
The data used is downloaded by getData() and is accessible on the GeoNames download server.
Value
A table with toponym substrings and their frequency.
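Counting substrings of a fixed length at the end of each toponym can be sketched as below. The sketch assumes that the substrings are endings, as in the examples; top_endings() is hypothetical and returns a data frame rather than the table the package returns.

```r
top_endings <- function(toponyms, len, limit) {
  # last `len` characters of every toponym
  endings <- substring(toponyms, nchar(toponyms) - len + 1)
  counts <- sort(table(endings), decreasing = TRUE)
  # keep at most `limit` of the most frequent endings
  counts <- counts[seq_len(min(limit, length(counts)))]
  data.frame(ending = names(counts), freq = as.vector(counts))
}

top_endings(c("Ashby", "Derby", "Selby", "Kirkby", "Oxford", "Milford", "Bath"),
            len = 2, limit = 2)  # "by" (4), "rd" (2)
```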
Examples
## We recommend setting a persistent path for downloaded data by using toponymOptions()
## Users can always set the path manually when a function is used
## For illustration purposes,
## 1. the path is manually set each time
## 2. and wrapped in donttest because data will be downloaded in the following examples:
topFreq(
countries = "Ecuador",
len = 3,
limit = 10,
toponym_path = tempdir())
## returns the top 10 most frequent toponym endings
## of three-character length in Ecuador
topFreq(
countries = "GB",
len = 3,
limit = 10,
polygon = toponym::danelaw_polygon,
toponym_path = tempdir())
## returns the top 10 most frequent toponym endings
## of three-character length within the Danelaw polygon, which lies inside the United Kingdom.
Applies Z-test
Description
This function applies a Z-test.
Usage
topZtest(strings, countries, polygon, ...)
Arguments
strings |
character string with a regular expression to be tested. |
countries |
character string vector with country designations (names or ISO-codes). |
polygon |
data frame. Defines the polygon for comparison with the remainder of a country (or countries). |
... |
Additional parameter:
|
Details
This function lets users apply a Z-test (two-proportion test), comparing the proportion of toponyms matching a given string inside a polygon with the proportion in the remainder of the country (or countries).
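The test amounts to comparing two proportions: the share of toponyms matching the expression inside the polygon and the share in the remainder. The sketch below computes the pooled z statistic by hand and wraps it in an htest object; two_prop_ztest() is hypothetical, takes the counts as inputs, and is not necessarily how the package computes the test. For a two-sided test, the squared z equals the uncorrected chi-squared statistic of stats::prop.test().

```r
two_prop_ztest <- function(x_in, n_in, x_out, n_out) {
  p_in <- x_in / n_in    # share of matches inside the polygon
  p_out <- x_out / n_out # share of matches in the remainder
  p_pool <- (x_in + x_out) / (n_in + n_out)  # pooled proportion under H0
  se <- sqrt(p_pool * (1 - p_pool) * (1 / n_in + 1 / n_out))
  z <- (p_in - p_out) / se
  structure(list(statistic = c(z = z), p.value = 2 * pnorm(-abs(z)),
                 estimate = c(inside = p_in, outside = p_out),
                 method = "Two-proportion z-test (pooled)",
                 data.name = "matches inside vs. outside the polygon"),
            class = "htest")
}

res <- two_prop_ztest(x_in = 30, n_in = 200, x_out = 20, n_out = 400)
res$statistic^2  # equals the X-squared statistic of prop.test(..., correct = FALSE)
```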
Parameter countries accepts all designations found in country(query = "country table").
Polygons passed through the polygon parameter need to intersect or be within a country specified by the countries parameter.
Parameter toponym_path accepts "pkgdir" for the package directory or a full, alternative path.
With toponymOptions(), users can specify the path for toponym and map data downloaded by this package across sessions. See help(toponymOptions).
The data used is downloaded by getData() and is accessible on the GeoNames download server.
Value
An object of class htest containing the results.
Examples
## We recommend setting a persistent path for downloaded data by using toponymOptions()
## Users can always set the path manually when a function is used
## For illustration purposes,
## 1. the path is manually set each time
## 2. and wrapped in donttest because data will be downloaded in the following example:
topZtest("thorpe$",
"GB",
toponym::danelaw_polygon,
toponym_path = tempdir())
## returns an object of class htest containing the results.
Manage Options of toponym
Description
This function allows users to set the download path for the toponym package. Downloaded data includes toponym data and map data.
Usage
toponymOptions(toponym_path = NULL)
Arguments
toponym_path |
character string. Path name for downloaded data. This setting is saved across sessions. |
Details
Most functions require external data, which will be downloaded and stored for later use. This is described in the documentation of the respective functions.
For this reason, after installation, users will be asked to specify the path for downloaded data.
Parameter toponym_path accepts either the character string "pkgdir" or a full, alternative path.
"pkgdir" is interpreted as the extdata folder in the toponym package directory, i.e.:
system.file("extdata", package = "toponym")
Thus, users can set the path to the package directory with this command:
toponymOptions(toponym_path = "pkgdir")
If a path is provided, users are prompted to confirm their choice. This function will write the new path into toponym_options.rds in the package directory; the path is saved across sessions.
To locate toponym_options.rds, enter:
system.file("extdata", package = "toponym")
To check the path that is currently set, enter:
toponymOptions()
Value
A character string indicating the current path for downloaded data.
Examples
if(interactive()){
# Set the path to the temporary directory
# Users are prompted to confirm their choice.
# Upon confirmation, toponym_options.rds will be edited in the package directory
toponymOptions(toponym_path = tempdir())
# Show the current path
toponymOptions()
}