Downloading and using data from bdl

Krzysztof Kania

2019-09-11

The bdl package is an interface to the Local Data Bank (Bank Danych Lokalnych, BDL) API. It also provides a set of useful tools, such as quick plotting of data from the data bank.

Intro

Working with bdl is based on id codes. Most data-downloading functions require a single unit or variable id, or a vector of several ids, given as strings.

It is recommended to use a private API key, which you can get here. To apply it, use: options(bdl.api_private_key = "your_key")

By default, every function returns data in Polish. To get data in English, add lang = "en" to any function call.
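Here is a minimal session setup, assuming the bdl package is installed from CRAN. The key line is commented out because "your_key" is only a placeholder for your own private key. The two get_levels() calls send the same request, first in the default language and then with lang = "en".

```r
library(bdl)

# Recommended: register your private API key for this session.
# "your_key" is a placeholder; uncomment and replace it before use.
# options(bdl.api_private_key = "your_key")

# Same request, default language (Polish) and English
levels_pl <- get_levels()
levels_en <- get_levels(lang = "en")
print(levels_en)
```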

Metadata (unit levels, aggregates, explanations of NUTS codes, etc.) can be found here.

Searching for unit ids

To find a unit id, we can use one of two methods: direct search by name, or tree listing with get_units().

Units are organized into levels, which get_levels() lists:

get_levels()
#> # A tibble: 8 x 2
#>      id name                
#>   <int> <chr>               
#> 1     0 Poziom Polski       
#> 2     1 Poziom Makroregionów
#> 3     2 Poziom Województw   
#> 4     3 Poziom Regionów     
#> # ... with 4 more rows

The lowest level (localities) has its own separate functions, with the suffix localities. Warning: the localities functions take a different set of arguments. Check the package or API documentation for more information.
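A direct name search can be sketched as follows. This sketch assumes that search_units() takes the search phrase as its first argument and accepts an optional level id from the get_levels() table. The phrase "wroc" is only an example.

```r
library(bdl)

# Units whose names match the phrase, on any level
matches <- search_units("wroc")

# Same search restricted to level 6 (level ids come from get_levels())
matches_l6 <- search_units("wroc", level = 6)
print(matches_l6)
```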

Tree listing

To get all units available in the Local Data Bank, run get_units() without any arguments. Warning: this returns around 4,500 rows and can use up the data limit very quickly.

To narrow the list, add unitParentId. The function then returns all children of the given parent unit, on all levels. Add the level argument to filter the units even further.
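The sketch below narrows the tree listing to a single level. It passes the parent id as the first argument of get_units(), which avoids depending on the exact argument name. It also assumes that "000000000000" is the id of Poland (level 0). Level 2 corresponds to voivodeships in the get_levels() output above.

```r
library(bdl)

# Assumed id of Poland, used as the parent unit
parent_id <- "000000000000"

# Children of the parent, filtered to level 2 (voivodeships)
voivodeships <- get_units(parent_id, level = 2)
print(voivodeships)
```

Filtering by level keeps the result small. Calling get_units(parent_id) without it would list children on every level below Poland, which comes close to the full 4,500-row listing.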

Searching for subject and variable ids

Subjects are themed directories of variables.

We have two searching methods for both subjects and variables: direct search by name and tree listing.

Subjects

To search for a subject directly, just provide a search phrase:
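Here is a minimal direct subject search. It assumes the function is exported as search_subjects() and takes the phrase as its first argument. The phrase "lud" (as in "ludność", population) is only an example.

```r
library(bdl)

# Subjects whose names match the phrase
subjects <- search_subjects("lud")
print(subjects)
```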

Subjects consist of 3 levels: categories (K), groups (G) and subgroups (P). The fourth level, the children of a subgroup, consists of variables.

To list all top level subjects use get_subjects():

To list the sub-subjects of a given category or group, use get_subjects() with the parentId argument:
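The two tree-listing calls are shown below. "K3" is a placeholder category id. In practice, take a real id from the first result.

```r
library(bdl)

# Top-level subjects (categories)
categories <- get_subjects()

# Children of one category; "K3" is a placeholder id
k3_children <- get_subjects(parentId = "K3")
print(k3_children)
```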

Variables

First, you can list the variables of a given subject (subgroup):
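Here is a sketch of listing variables for one subgroup. It assumes the function is exported as get_variables() and takes the subject id as its first argument. "P2425" is a placeholder subgroup id.

```r
library(bdl)

# Variables belonging to one subgroup (placeholder id)
vars <- get_variables("P2425")
print(vars)
```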

Second, you can search variables directly with search_variables(). An empty string as the name lists all variables, but I strongly advise against it: the result has around 40,000 rows and will probably hit the data limit.

You can narrow the search to a given subject (subgroup):
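Both search styles are sketched below. The argument name subjectId is an assumption; check ?search_variables. "P2425" is a placeholder subgroup id. An empty phrase is only reasonable here because the subject filter keeps the result small.

```r
library(bdl)

# Direct search over all variables by phrase ("samochody" = "cars")
cars <- search_variables("samochody")

# Empty phrase, narrowed to one subgroup (assumed argument name subjectId)
in_subgroup <- search_variables("", subjectId = "P2425")
print(in_subgroup)
```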

Downloading data

Once you have picked unit and variable codes, you are ready to download data. You can do this in two ways:
- download data for multiple variables on a single unit with get_data_by_unit();
- download data for a single variable on multiple units with get_data_by_variable().

Single unit, multiple variables

We will use get_data_by_unit(). Specify the single unit as the unitId string argument and the variables as a vector of strings. Optionally, specify the years of interest with year; otherwise all available years are returned.

For more information about the data, set the type argument to "label". This adds a column with variable information.
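Here is a sketch of a single-unit, multi-variable download with and without labels. The variable ids are hypothetical placeholders, and "000000000000" is assumed to be the id of Poland. Substitute ids found with the search functions above.

```r
library(bdl)

unit_id <- "000000000000"        # assumed id of Poland
var_ids <- c("60559", "415")     # placeholder variable ids

# Several variables for one unit, restricted to selected years
df <- get_data_by_unit(unitId = unit_id, varId = var_ids,
                       year = c("2015", "2016"))

# Same request with an extra column describing each variable
df_labeled <- get_data_by_unit(unitId = unit_id, varId = var_ids,
                               year = c("2015", "2016"), type = "label")
```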

Multiple units, single variable

We will use get_data_by_variable(). Specify the single variable as the varId string argument. If no unitParentId is provided, the function returns all available units for the given variable. Setting unitParentId returns all available children units (on all levels); to narrow them to one level, set unitLevel. Optionally, specify the years of interest with year; otherwise all available years are returned.
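Here is a sketch combining unitParentId and unitLevel. "60559" is a placeholder variable id, and "000000000000" is assumed to be the id of Poland. Level 2 (voivodeships) comes from the get_levels() output.

```r
library(bdl)

# One variable for all level-2 children of the parent unit
by_var <- get_data_by_variable(varId = "60559",
                               unitParentId = "000000000000",
                               unitLevel = 2,
                               year = c("2015", "2016"))
print(by_var)
```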

Useful tools

The bdl package provides a couple of additional functions for summarizing and visualizing data.

Summary

Data downloaded via get_data_by_unit() or get_data_by_variable(), or via their locality versions, can easily be summarized with summary():
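Here is a minimal download-then-summarize sketch. The ids are placeholders, as in the earlier examples.

```r
library(bdl)

# Placeholder ids: assumed id of Poland and hypothetical variable ids
df <- get_data_by_unit(unitId = "000000000000", varId = c("60559", "415"))

# Summarize the downloaded data, as described above
summary(df)
```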

Plotting

The plotting functions in this package are interfaces to the data-downloading functions. Some of them require data_type, which selects the downloading method; the remaining arguments are those of the function matching that data_type. Check the documentation for more details.

The scatter plot is a special case: it requires a vector of exactly 2 variables.
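This is a hedged sketch of both plotting styles. It assumes the functions are exported as line_plot() and scatter_plot(). It also assumes that, after data_type, they accept the named arguments of the matching download function: get_data_by_unit() for "unit" and get_data_by_variable() for "variable". Check ?line_plot and ?scatter_plot for the actual signatures. All ids are placeholders.

```r
library(bdl)

# Line plot of two variables for one unit (data via get_data_by_unit())
line_plot(data_type = "unit", unitId = "000000000000",
          varId = c("60559", "415"))

# Scatter plot: exactly two variables, one point per level-2 unit
scatter_plot(data_type = "variable", varId = c("60559", "415"),
             unitLevel = 2)
```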

Map generation

The bdl.maps dataset contains spatial maps for each level of Poland's territorial division, and generate_map() uses them to draw maps filled with bdl data. Use unitLevel to change the type of map. The lower the level, the slower the map is generated, because there is more spatial data to process. This function requires the external map data “bdl.maps” to be loaded into the global environment. You can get the data here:

Map download.

Download the data file and double-click it to load it into the environment.
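As an alternative to double-clicking, you can load the file from R with base load(). The file name below is hypothetical, so use the name of the file you downloaded. The generate_map() call assumes that the function takes a variable id, a year and unitLevel. The variable id is a placeholder.

```r
library(bdl)

# Hypothetical file name; load() restores the saved objects (assumed to
# include bdl.maps) into the global environment
load("bdl.maps.RData", envir = globalenv())

# Map of one variable for one year on voivodeship level (2)
generate_map(varId = "60559", year = "2016", unitLevel = 2)
```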

Multi download

The downloading functions get_data_by_unit() and get_data_by_variable() have an alternative “multi” downloading mode. An argument that normally takes a single value, such as the unit in get_data_by_unit(), can be given a vector instead. The result then gets an additional column of values for each unit provided:
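Here is a sketch of the multi-unit mode. Both unit ids and both variable ids are placeholders.

```r
library(bdl)

# A vector of units instead of one: one value column per unit
multi_unit <- get_data_by_unit(unitId = c("000000000000", "010000000000"),
                               varId = c("60559", "415"))
print(multi_unit)
```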

Or multiple variables for get_data_by_variable():
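The same idea applies to get_data_by_variable(), with a vector of variable ids. The ids are placeholders, and level 2 (voivodeships) comes from get_levels().

```r
library(bdl)

# A vector of variables instead of one: one value column per variable
multi_var <- get_data_by_variable(varId = c("60559", "415"),
                                  unitParentId = "000000000000",
                                  unitLevel = 2)
print(multi_var)
```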

This mode drops some of the columns returned by the original “single” download.

This mode works for the locality versions as well.