lingtypology: easy mapping for Linguistic Typology
The lingtypology package connects R with the Glottolog database (v. 2.7) and provides additional functionality for linguistic typology. The Glottolog database contains a catalogue of the world’s languages. This package helps researchers to make linguistic maps, using the philosophy of the Cross-Linguistic Linked Data project, which aims to create uniform access to linguistic data across publications. This package is based on the leaflet package, so lingtypology is a package for interactive linguistic cartography. I would like to thank Natalya Tyshkevich and Samira Verhees for reading and correcting this vignette.
Since lingtypology is an R package, you need to install R on your computer if you haven’t already done so. To install the stable version of the lingtypology package from CRAN, run the following command in R:
install.packages("lingtypology")
You can also get the development version from GitHub:
install.packages("devtools")
devtools::install_github("agricolamz/lingtypology")
Load package:
library(lingtypology)
This package is based on the Glottolog database (v. 2.7), so lingtypology has several functions for accessing data from that database. In the Glottolog database, the term languoid is used to catalogue languages, dialects and language families alike.
Most of the functions in lingtypology have the same syntax: what you need.what you have. Most of them take languoid names as input.
Some of them help to define a vector of languoids.
The most important functionality of lingtypology is the ability to create interactive maps based on features and sets of languoids (see the third section):
The Glottolog database (v. 2.7) provides lingtypology with languoid names, ISO codes, genealogical affiliations, macro areas, countries and coordinates.
All functions introduced in the previous section are regular functions, so they can take the following objects as input:
iso.lang("Adyghe")
## Adyghe
## "ady"
lang.iso("ady")
## ady
## "Adyghe"
country.lang("Adyghe")
## Adyghe
## "Turkey, United States, Israel, Australia, Egypt, Macedonia, France, Russia, Netherlands, Germany, Syria, Jordan, Iraq"
lang.aff("Abkhaz-Adyge")
## [1] "Adyghe" "Ubykh" "Abkhaz" "Abaza" "Kabardian"
Note that strings in R can be created with either single or double quotes. Since an unescaped single quote inside a string delimited by single quotes ends the string and causes an error, I use double quotes throughout this tutorial. You can use single quotes, but be careful and remember that 'Ma'ya' is an incorrect string in R.
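The following short base R illustration shows the quoting rule above. Nothing in it is specific to lingtypology.

```r
# Equivalent ways to write a string that contains a single quote
s1 <- "Ma'ya"     # double quotes: the single quote needs no escaping
s2 <- 'Ma\'ya'    # single quotes: the inner quote must be escaped with a backslash
s3 <- "Ma\'ya"    # the escape is also allowed (but unnecessary) inside double quotes

identical(s1, s2)  # TRUE
identical(s1, s3)  # TRUE
nchar(s1)          # 5: the backslash is not part of the string
```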
area.lang(c("Adyghe", "Aduge"))
## Adyghe Aduge
## "Eurasia" "Africa"
lang <- c("Adyghe", "Russian")
aff.lang(lang)
## Adyghe
## "Abkhaz-Adyge, Circassian"
## Russian
## "Indo-European, Balto-Slavic, Slavic, East Slavic"
iso.lang(lang.aff("East Slavic"))
## Russian Rusyn Ukrainian Belarusian Old Russian
## "rus" "rue" "ukr" "bel" "orv"
The behavior of most functions is rather predictable, but the function country.lang has an additional feature. By default, this function takes a vector of languages and returns a vector of countries. If you set the argument intersection = TRUE, the function instead returns a vector of countries where all languoids from the query are spoken.
country.lang(c("Udi", "Laz"))
## Udi
## "Russia, Georgia, Azerbaijan, Turkmenistan"
## Laz
## "Turkey, Georgia, France, United States, Germany, Belgium"
country.lang(c("Udi", "Laz"), intersection = TRUE)
## [1] "Georgia"
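The logic of intersection = TRUE can be sketched in base R. Split each comma-separated country string into a vector, then keep only the countries shared by all languoids.

This sketch assumes input shaped like the output above. It is an illustration, not the package's actual implementation.

```r
# Illustration only: hard-coded copies of the country.lang() output above
countries <- c(Udi = "Russia, Georgia, Azerbaijan, Turkmenistan",
               Laz = "Turkey, Georgia, France, United States, Germany, Belgium")

# Split each string into a character vector of countries
country_lists <- strsplit(countries, ", ", fixed = TRUE)

# Keep only the countries that appear for every languoid in the query
shared <- Reduce(intersect, country_lists)
shared  # "Georgia"
```

Because Reduce() folds intersect() over the whole list, the same code works for queries with any number of languoids.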
There are some functions that take country names as input. Unfortunately, some countries have alternative names. In order to save users the trouble of having to figure out the exact name stored in the database (for example Ivory Coast or Cote d’Ivoire), all official country names and standard abbreviations are stored in the database:
lang.country("Cape Verde")
## [1] "Kabuverdianu" "Portuguese"
lang.country("Cabo Verde")
## [1] "Kabuverdianu" "Portuguese"
head(lang.country("UK"))
## [1] "Angloromani" "Welsh" "English"
## [4] "French" "Assyrian Neo-Aramaic" "Northern Kurdish"
All functions that take a vector of languoids come with a kind of spell checker. If a languoid from a query is absent from the database, the function returns a warning message listing the candidates with the minimal Levenshtein distance to the languoid from the query.
aff.lang("Adyge")
## Warning: Languoid Adyge is absent in our database. Did you mean Adyghe,
## Aduge?
## Adyge
## NA
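The suggestion mechanism can be approximated with base R's utils::adist(), which computes Levenshtein (edit) distances.

In this sketch the candidate list is a small hypothetical sample. It is an assumption that the package compares the query against every languoid in its database in a similar way.

```r
# Hypothetical sample of languoid names to compare the query against
candidates <- c("Adyghe", "Aduge", "Kabardian", "Ubykh", "Abaza")
query <- "Adyge"

# adist() returns a 1 x n matrix of edit distances; take its only row
distances <- utils::adist(query, candidates)[1, ]

# Suggest every candidate at the minimal distance
suggestions <- candidates[distances == min(distances)]
suggestions  # "Adyghe" "Aduge"
```

Both suggestions are one edit away from the query. Adyghe needs one insertion ("h") and Aduge needs one substitution ("y" to "u"). This matches the candidates in the warning above.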
Unfortunately, the Glottolog database (v. 2.7) is not perfect for all my tasks, so I changed it a little bit:
After an issue raised by Robert Forkel, I added the argument glottolog.source, so that everybody has access to both the “original” and the “modified” (default) Glottolog versions:
is.glottolog(c("Tabasaran", "Tabassaran"), glottolog.source = "original")
## [1] FALSE TRUE
is.glottolog(c("Tabasaran", "Tabassaran"), glottolog.source = "modified")
## [1] TRUE FALSE
It is common practice in R to abbreviate both argument names and argument values, and this also works with lingtypology functions:
is.glottolog(c("Tabasaran", "Tabassaran"), g = "o")
## [1] FALSE TRUE
is.glottolog(c("Tabasaran", "Tabassaran"), g = "m")
## [1] TRUE FALSE
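The abbreviation relies on two general R mechanisms:

- partial matching of argument names (g matches glottolog.source);
- match.arg(), which accepts any unambiguous prefix of an allowed value.

The function below is hypothetical. It is an assumption that lingtypology handles glottolog.source in a similar way.

```r
# Hypothetical function that reuses the argument name from lingtypology
pick_source <- function(glottolog.source = c("modified", "original")) {
  # match.arg() completes a prefix such as "o" to "original";
  # when the argument is missing it returns the first choice
  match.arg(glottolog.source)
}

pick_source()         # "modified" (the default)
pick_source(g = "o")  # "original": `g` partially matches `glottolog.source`
pick_source(g = "m")  # "modified"
```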
map.feature
The most important part of the lingtypology package is the function map.feature. This function allows a user to produce maps similar to those of well-known projects within the Cross-Linguistic Linked Data philosophy, such as WALS and Glottolog:
map.feature(c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"))
As shown in the picture above, this function generates an interactive Leaflet map with a control box that allows users to toggle the visibility of any group of points on the map. Each point on the map has a pop-up box that appears when its marker is clicked (see section 3.3 for more information about editing pop-up boxes). By default, pop-up boxes contain languoid names linked to the Glottolog website.
If you are new to R, it is worth learning how to import data into R. It is simple to create a .csv, .ods or .xls file containing a list of languages and features and read it into R (.csv is the easiest format).
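Here is a minimal round trip with base R's write.csv() and read.csv(). The temporary file and the column names language and features are only an example; in practice you would prepare the file in a spreadsheet editor.

```r
# Write a small example table to a temporary .csv file
path <- tempfile(fileext = ".csv")
write.csv(data.frame(language = c("Adyghe", "Kabardian", "Polish"),
                     features = c("polysynthetic", "polysynthetic", "fusional")),
          path, row.names = FALSE)

# Read it back; stringsAsFactors = FALSE keeps the columns as plain strings
my_data <- read.csv(path, stringsAsFactors = FALSE)
str(my_data)

# The columns can then be passed to lingtypology functions, e.g.
# map.feature(my_data$language, my_data$features)
```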
If for some reason you are not using RStudio, or you want to create and save many maps automatically, you can save a map to a variable and use the htmlwidgets package to save it to an .html file. I would like to thank Timo Roettger for mentioning this problem.
m <- map.feature(c("Adyghe", "Korean"))
# install.packages("htmlwidgets")
library(htmlwidgets)
# save the map as m.html in the current working directory
saveWidget(m, file = "m.html")
There is an export button in RStudio, but for some reason it is not so easy to save a map as a .png or .jpg file using code. Here is a possible solution.
The goal of this package is to allow typologists (or any other linguists) to map language features. A list of languoids and corresponding features can be stored in a data.frame as follows:
df <- data.frame(language = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
features = c("polysynthetic", "polysynthetic", "fusional", "fusional", "fusional"))
df
## language features
## 1 Adyghe polysynthetic
## 2 Kabardian polysynthetic
## 3 Polish fusional
## 4 Russian fusional
## 5 Bulgarian fusional
Now we can draw a map:
map.feature(languages = df$language, features = df$features)
If you have a lot of features and they appear in the legend in a meaningless order (by default they are ordered alphabetically), you can reorder them using factors (vectors with ordered levels; for more information see ?factor). For example, I want the feature polysynthetic to be listed first, followed by fusional:
df$features <- factor(df$features, levels = c("polysynthetic", "fusional"))
map.feature(languages = df$language, features = df$features)
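The effect of the levels argument can be checked without drawing a map. The sketch below assumes, as the text above states, that the legend follows the order of the factor levels.

```r
features <- c("polysynthetic", "polysynthetic", "fusional", "fusional", "fusional")

# Without explicit levels, factor() sorts the levels alphabetically
levels(factor(features))      # "fusional" "polysynthetic"

# Supplying `levels` fixes the order explicitly
ordered_features <- factor(features, levels = c("polysynthetic", "fusional"))
levels(ordered_features)      # "polysynthetic" "fusional"

# The values themselves are unchanged; only the level order differs
as.character(ordered_features)
```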
Since colors are assigned to the mapped features randomly by default, it is better to call set.seed to get a reproducible map (or to choose the colors yourself, see section 3.5):
set.seed(42)
map.feature(languages = df$language, features = df$features)
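The reason set.seed() helps is that any random draw in R becomes reproducible once the seed is fixed. The function assign_colours() below is a hypothetical stand-in that shuffles a rainbow() palette; it is not lingtypology's own code.

```r
# Hypothetical random colour assignment: shuffle a rainbow() palette
assign_colours <- function(n) sample(rainbow(n))

set.seed(42)
first <- assign_colours(2)

set.seed(42)
second <- assign_colours(2)

identical(first, second)  # TRUE: the same seed gives the same assignment
```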
As with most R functions, it is not necessary to name all arguments, so the same result can be obtained with:
set.seed(42)
map.feature(df$language, df$features)
Sometimes it is a good idea to add extra information (e.g. language affiliation, references or even examples) to the pop-up boxes that appear when points are clicked. To do so, we first need to create an extra vector of strings in our data frame:
df$popup <- aff.lang(df$language)
The function aff.lang() creates a vector of genealogical affiliations that can be easily mapped:
set.seed(42)
map.feature(languages = df$language, features = df$features, popup = df$popup)
As before, it is not necessary to name all arguments, so the same result can be obtained with:
set.seed(42)
map.feature(df$language, df$features, df$popup)
Pop-up strings can contain HTML tags, so it is easy to insert a link, line breaks, a table or even video and sound. Here is how pop-up boxes can show language examples:
# change a df$popup vector
df$popup <- c ("sɐ s-ɐ-k'ʷɐ<br> 1sg 1sg.abs-dyn-go<br>'I go'",
"sɐ s-o-k'ʷɐ<br> 1sg 1sg.abs-dyn-go<br>'I go'",
"id-ę<br> go-1sg.npst<br> 'I go'",
"ya id-u<br> 1sg go-1sg.npst <br> 'I go'",
"id-a<br> go-1sg.prs<br> 'I go'")
# create a map
set.seed(42)
map.feature(df$language, df$features, df$popup)
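Instead of typing each HTML string by hand, the glossing tiers can be kept in separate columns and joined with paste0(). In this sketch the column names text, gloss and translation are only an example.

```r
# Interlinear examples stored tier by tier (Polish, Russian, Bulgarian from above)
examples <- data.frame(
  text        = c("id-ę", "ya id-u", "id-a"),
  gloss       = c("go-1sg.npst", "1sg go-1sg.npst", "go-1sg.prs"),
  translation = c("I go", "I go", "I go"),
  stringsAsFactors = FALSE
)

# Join the tiers with <br> line breaks and put the translation in quotes
examples$popup <- paste0(examples$text, "<br>",
                         examples$gloss, "<br>",
                         "'", examples$translation, "'")

examples$popup[2]  # "ya id-u<br>1sg go-1sg.npst<br>'I go'"
```

The resulting column can be passed as the popup argument in the same way as df$popup above.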
How do you say 'moon' in different sign languages? Here is an example:
# Let's create a data frame with links to videos
sign_df <- data.frame(languages = c("American Sign Language", "Russian Sign Language", "French Sign Language"),
popup = c("https://media.spreadthesign.com/video/mp4/13/48600.mp4", "https://media.spreadthesign.com/video/mp4/12/17639.mp4", "https://media.spreadthesign.com/video/mp4/10/17638.mp4"))
# Change popup to an HTML code
sign_df$popup <- paste("<video width='200' height='150' controls> <source src='",
as.character(sign_df$popup),
"' type='video/mp4'></video>", sep = "")
# create a map
map.feature(languages = sign_df$languages, popup = sign_df$popup)
An alternative way to add some short text to a map is to use the label option.
set.seed(42)
map.feature(df$language, df$features,
label = df$language)
There are some additional arguments for customization: label.fsize sets the font size, label.position controls the label position, and label.hide controls when the labels appear. If TRUE, the labels are displayed on mouse-over (as on the previous map); if FALSE, the labels are always displayed (as on the next map).
set.seed(42)
map.feature(df$language, df$features,
label = df$language,
label.fsize = 20,
label.position = "left",
label.hide = FALSE)
Users can set their own coordinates using the arguments latitude and longitude. I will illustrate this with the circassian dataset built into the lingtypology package. This dataset was collected during several fieldwork expeditions in 2011-2016 and contains a list of Circassian villages:
set.seed(42)
map.feature(languages = circassian$language,
features = circassian$languoid,
popup = circassian$village,
latitude = circassian$latitude,
longitude = circassian$longitude)
By default, the color palette is created by the rainbow() function, but users can set their own colors using the argument color:
df <- data.frame(language = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
features = c("polysynthetic", "polysynthetic", "fusional", "fusional", "fusional"))
map.feature(languages = df$language,
features = df$features,
color = c("yellowgreen", "navy"))
The package can generate a control box that allows users to toggle the visibility of points and features. To enable it, use the argument control of the map.feature function:
set.seed(42)
map.feature(languages = df$language,
features = df$features,
control = TRUE)
The map.feature function has an additional argument, stroke.features, which makes it possible to show two independent sets of features on one map. By default, strokes are colored in shades of grey (black and white for two levels; black, grey and white for three; and so on), but users can set their own colors using the argument stroke.color:
set.seed(42)
map.feature(circassian$language,
features = circassian$languoid,
stroke.features = circassian$language,
latitude = circassian$latitude,
longitude = circassian$longitude)
Note that stroke.features can contain NA values: no stroke is drawn for points whose value is NA. Let’s set the language value to NA for all Baksan villages from the circassian dataset:
# copy the language column, replacing the values for the Baksan villages with NA
newfeature <- replace(circassian$language, circassian$languoid == "Baksan", NA)
# create a map
set.seed(42)
map.feature(circassian$language,
features = circassian$languoid,
latitude = circassian$latitude,
longitude = circassian$longitude,
stroke.features = newfeature)
The radius and opacity of the markers can be set by users with the arguments radius, stroke.radius, opacity and stroke.opacity:
set.seed(42)
map.feature(circassian$language,
features = circassian$languoid,
stroke.features = circassian$language,
latitude = circassian$latitude,
longitude = circassian$longitude,
radius = 7, stroke.radius = 13)
set.seed(42)
map.feature(circassian$language,
features = circassian$languoid,
stroke.features = circassian$language,
latitude = circassian$latitude,
longitude = circassian$longitude,
opacity = 0.7, stroke.opacity = 0.6)
By default, the legend appears in the bottom left corner. If there are stroke features, two legends are generated. The arguments legend and stroke.legend control whether each legend is shown, and title and stroke.title set their titles.
set.seed(42)
map.feature(circassian$language,
features = circassian$languoid,
stroke.features = circassian$language,
latitude = circassian$latitude,
longitude = circassian$longitude,
legend = FALSE, stroke.legend = TRUE)
set.seed(42)
map.feature(circassian$language,
features = circassian$languoid,
stroke.features = circassian$language,
latitude = circassian$latitude,
longitude = circassian$longitude,
title = "Circassian dialects", stroke.title = "Languages")
The arguments legend.position and stroke.legend.position allow users to change a legend’s position using the strings “topright”, “bottomright”, “bottomleft” or “topleft”.
A scale bar is automatically added to a map, but users can control its appearance (set the scale.bar argument to TRUE or FALSE) and its position (use the scale.bar.position argument values “topright”, “bottomright”, “bottomleft” or “topleft”).
set.seed(42)
map.feature(c("Adyghe", "Polish", "Kabardian", "Russian"),
scale.bar = TRUE,
scale.bar.position = "topright")
It is possible to change the map tiles using the tile argument. For the list of available tiles, see the leaflet-providers preview page.
df <- data.frame(lang = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
feature = c("polysynthetic", "polysynthetic", "fusion", "fusion", "fusion"),
popup = c("Adyghe", "Adyghe", "Slavic", "Slavic", "Slavic"))
set.seed(42)
map.feature(df$lang, df$feature, df$popup,
tile = "Thunderforest.OpenCycleMap")
It is possible to combine several map tiles on the same map and switch between them: just pass a vector of tile names.
df <- data.frame(lang = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
feature = c("polysynthetic", "polysynthetic", "fusion", "fusion", "fusion"),
popup = c("Adyghe", "Adyghe", "Slavic", "Slavic", "Slavic"))
set.seed(42)
map.feature(df$lang, df$feature, df$popup,
tile = c("OpenStreetMap.BlackAndWhite", "Thunderforest.OpenCycleMap"))
It is possible to name tiles using the tile.name argument.
df <- data.frame(lang = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
feature = c("polysynthetic", "polysynthetic", "fusion", "fusion", "fusion"),
popup = c("Adyghe", "Adyghe", "Slavic", "Slavic", "Slavic"))
set.seed(42)
map.feature(df$lang, df$feature, df$popup,
tile = c("OpenStreetMap.BlackAndWhite", "Thunderforest.OpenCycleMap"),
tile.name = c("b & w", "colored"))
It is possible to combine the tiles’ control box with the features’ control box.
df <- data.frame(lang = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
feature = c("polysynthetic", "polysynthetic", "fusion", "fusion", "fusion"),
popup = c("Adyghe", "Adyghe", "Slavic", "Slavic", "Slavic"))
set.seed(42)
map.feature(df$lang, df$feature, df$popup,
tile = c("OpenStreetMap.BlackAndWhite", "Thunderforest.OpenCycleMap"),
control = TRUE)
It is possible to add a minimap to a map.
set.seed(42)
map.feature(c("Adyghe", "Polish", "Kabardian", "Russian"),
minimap = TRUE)
Users can control its appearance (by setting the minimap argument to TRUE or FALSE), its position (with the values “topright”, “bottomright”, “bottomleft” or “topleft” of the minimap.position argument) and its height and width (with the arguments minimap.height and minimap.width).
set.seed(42)
map.feature(c("Adyghe", "Polish", "Kabardian", "Russian"),
minimap = TRUE,
minimap.position = "topright",
minimap.height = 100,
minimap.width = 100)
The argument image.url allows users to add their own pictures to a map via a URL. Here I use two histograms of the most numerous nationalities in Moscow and St. Petersburg, based on data from the most recent Russian Census.
Let’s create a dataframe.
df <- data.frame(lang = c("Russian", "Russian"),
lat = c(55.75, 59.95),
long = c(37.616667, 30.3),
# these URLs were shortened with Google's URL shortener
urls = c("https://goo.gl/5OUv1E",
"https://goo.gl/UWmvDw"))
map.feature(languages = df$lang,
latitude = df$lat,
longitude = df$long,
image.url = df$urls)
Users can change the size of the pictures.
df <- data.frame(lang = c("Russian", "Russian"),
lat = c(55.75, 59.95),
long = c(37.616667, 30.3),
# these URLs were shortened with Google's URL shortener
urls = c("https://goo.gl/5OUv1E",
"https://goo.gl/UWmvDw"))
map.feature(languages = df$lang,
latitude = df$lat,
longitude = df$long,
image.url = df$urls,
image.width = 200,
image.height = 200)
Images can also be shifted away from the actual point with image.X.shift and image.Y.shift:
df <- data.frame(lang = c("Russian", "Russian"),
lat = c(55.75, 59.95),
long = c(37.616667, 30.3),
# these URLs were shortened with Google's URL shortener
urls = c("https://goo.gl/5OUv1E",
"https://goo.gl/UWmvDw"))
map.feature(languages = df$lang,
latitude = df$lat,
longitude = df$long,
image.url = df$urls,
image.width = 150,
image.height = 150,
image.X.shift = 10,
image.Y.shift = 0)
Using this argument, users can plot their own markers, any chart tied to a point, or even their own legend. Note that transparent .png files make it possible to add extra legend text to the map.
Citing lingtypology
It is important to cite R and R packages when you use them. For this purpose, use the citation function in R:
citation("lingtypology")
## Warning in citation("lingtypology"): no date field in DESCRIPTION file of
## package 'lingtypology'
## Warning in citation("lingtypology"): could not determine year for
## 'lingtypology' from package DESCRIPTION file
##
## To cite package 'lingtypology' in publications use:
##
## George Moroz (NA). lingtypology: Linguistic Typology and
## Mapping. https://CRAN.R-project.org/package=lingtypology,
## https://github.com/agricolamz/lingtypology/.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {lingtypology: Linguistic Typology and Mapping},
## author = {George Moroz},
## note = {https://CRAN.R-project.org/package=lingtypology, https://github.com/agricolamz/lingtypology/},
## }
##
## ATTENTION: This citation information has been auto-generated from
## the package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.