---
title: "Sotkanet API R tools"
output:
  rmarkdown::html_vignette:
    toc: TRUE
vignette: >
  %\VignetteIndexEntry{Sotkanet API R tools}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true")
eval_packages <- requireNamespace(c("kableExtra", "ggplot2"))
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  purl = NOT_CRAN,
  eval = NOT_CRAN && eval_packages
)
```


```{r eval = !eval_packages, echo = FALSE}
message("Packages `kableExtra` and `ggplot2` required for building the vignette. Code chunks will not be evaluated.")
```


This [sotkanet](https://github.com/rOpenGov/sotkanet) R package provides access to data from the [Sotkanet portal](https://sotkanet.fi/sotkanet/en/index). Your [contributions](https://ropengov.org/community/) and [bug reports and other feedback](https://github.com/rOpenGov/sotkanet) are welcome.

## Introduction

The Sotkanet portal provides over 2000 demographic indicators across Finland and Europe. It is maintained by the National Institute for Health and Welfare (THL). For more information, see [Information about Sotkanet](https://sotkanet.fi/sotkanet/en/tietoa-palvelusta) and [API description](https://sotkanet.fi/sotkanet/en/ohje/74). 

The `sotkanet` R package enables access to the Sotkanet API using R facilitating the use of the data from the API. This package is part of [rOpenGov](https://ropengov.org).


## Installation

To install latest release version from CRAN, use:

```{r install, eval = FALSE}
install.packages("sotkanet")
```

To install development version from GitHub, use:

```{r install2, eval = FALSE}
library(remotes)
remotes::install_github("ropengov/sotkanet")
```

Test the installation by loading the package:

```{r load, eval = FALSE}
library(sotkanet)
```


## Usage


### Listing availabe indicators

Load sotkanet and other packages used in the vignette.

```{r libraries, warning = FALSE, message = FALSE}
library(sotkanet)
library(kableExtra)
library(ggplot2)
```



List available Sotkanet indicators using `sotkanet_indicators()`:

```{r sotkanet_indicators, warning = FALSE}
# Using a preset list of indicators to avoid a large download
indicators <- sotkanet_indicators(id = c(4, 5, 6, 127, 10012, 10027),
                                  type = "table", lang = "en")
kable(head(indicators))
```

List geographical regions with available indicators using `sotkanet_regions()`:

```{r sotkanet_regions, warning = FALSE}
# List of the first few regions
regions <- sotkanet_regions(type = "table", lang = "en")
kable(head(regions))
```

### Querying Sotkanet data

To download the data, we need to know the indicator for it. You can look for the right indicator using aforementioned `sotkanet_indicators()` or by browsing the [Sotkanet website](https://sotkanet.fi/sotkanet/en/index). For example, the indicator no. 10012 responds to the (EU) GPD per capita in Purchasing Power Standards (PPS) dataset. The data can be downloaded with `get_sotkanet()` function. If we want, for example, the GPD data from Finland for 2000-2010, the function call is:

```{r get_sotkanet, warning = FALSE}
# Get the indicator data
dat <- get_sotkanet(indicators = 10012, years = 2000:2010,
                    genders = c("total"), lang = "en", regions = "Finland")

# The first few lines of the data
kable(head(dat)) %>%
  kable_styling() %>%
  scroll_box(width = "100%")
```

The data can also be downloaded by using interactive function `sotkanet_interactive()`. It gives user interactive alternative for downloading the dataset. This function can also print dataset citation, code for the `get_sotkanet()` call and fixity checksum. 

Dataset citation can be printed for any indicator using the function `sotkanet_cite()`. The citation for the GPD data is:

```{r, warning = FALSE, message = FALSE}
sotkanet_cite(10012, lang = "en")
```


## Examples

Let's now demonstrate the use of the package with two examples. For the first example we will use the GPD data from Nordic countries (Pohjoismaat) for 2000-2010 and draw a time series of the data comparing the countries.

```{r, fig.width = 10, fig.height = 5, out.width = "100%", warning = FALSE}
# Get indicator data
dat <- get_sotkanet(indicators = 10012, years = 2000:2010,
                    genders = "total", lang = "en", region.category = "POHJOISMAAT")

indicator_name <- as.character(unique(dat$indicator.title))
indicator_source <- as.character(unique(dat$indicator.organization.title))

# Retrive metadata
dat_meta <- sotkanet_indicator_metadata(id = 10012)

# Visualize
library(ggplot2)
p <- ggplot(dat, aes(x = year, y = primary.value,
                     group = region.title, color = region.title)) + 
  geom_line() + ggtitle(paste0(indicator_name, " \n", indicator_source)) +
  labs(x = "Year", y = "Value",caption = paste0(
    "Data source: https://sotkanet.fi/sotkanet",  "\n", "Data date: ", dat_meta$`data-updated`)) +
  scale_x_continuous(breaks = seq(2000,2010, by = 1)) +
  theme(title = element_text(size = 10)) +
  theme(axis.title.x = element_text(size = 15)) +
  theme(axis.title.y = element_text(size = 15)) +
  theme(legend.title = element_text(size = 15))
print(p)
```

For the second example we will plot the population of Finnish municipalities against a measure of educational level.

```{r, fig.width = 10, fig.height = 5, out.width = "100%", warning = FALSE}
# Get the data for the two indicators
dat <- get_sotkanet(indicators = c(127, 180), 
                    years = 2022, lang = "en",
                    genders = c("total"), region.category = c("KUNTA"))
# Pick the fields of interest and remove duplicates
datf <- dat[,c("region.title", "indicator.title", "primary.value")]
datf <- datf[!duplicated(datf),]
dw <- reshape(datf, idvar = "region.title",
              timevar = "indicator.title", direction = "wide")
names(dw) <- c("Municipality", "Population", "Education_level")

# Vizualise
p <- ggplot(dw, aes(x = log(Population), y = Education_level)) + geom_point(size = 3) +
  ggtitle("Education level vs. population size") +
    theme(title = element_text(size = 10)) +
  labs(y = "Education level", caption = "Data source: https://sotkanet.fi/sotkanet") +
  theme(axis.title.x = element_text(size = 15)) +
  theme(axis.title.y = element_text(size = 15)) +
  theme(legend.title = element_text(size = 15))
plot(p)
```


## Licensing and Citations

### Sotkanet data

Cite Sotkanet and link to [https://sotkanet.fi/sotkanet/fi/index](https://sotkanet.fi/sotkanet/). Also mention indicator provider.

* [Full license and terms of use](https://sotkanet.fi/sotkanet/en/tietoa-palvelusta).

Central points:

 * SOTKAnet REST API is meant for non-regular data queries. Avoid regular and repeated downloads.
 * SOTKAnet API can be used as the basis for other systems
 * Metadata for regions and indicators are under [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/) 
 * THL indicators are under [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/) 
 * Indicators provided by third parties can be used only by separate agreement!


### Sotkanet R package

This work can be freely used, modified and distributed under the [Two-clause BSD license](https://en.wikipedia.org/wiki/BSD_licenses).

```{r citation, message=FALSE, eval=TRUE}
citation("sotkanet")
```


## Suggestions and bug reports 

You can check the package [GitHub page](https://github.com/rOpenGov/sotkanet/issues) for known issues. You can can also use it to report new bugs and to make suggestions for improving the package.

## Session info

This vignette was created with 

```{r sessioninfo, warning = FALSE}
sessionInfo()
```

