lingtypology: easy mapping for Linguistic Typology
The lingtypology package connects R with the Glottolog database (v. 2.7) and provides additional functionality for linguistic typology. The Glottolog database contains a catalogue of the world’s languages. This package helps researchers to make linguistic maps, using the philosophy of the Cross-Linguistic Linked Data project, which is creating uniform access to linguistic data across publications. This package is based on the leaflet package, so lingtypology is a package for interactive linguistic mapping. I would like to thank Natalya Tyshkevich and Samira Verhees for reading and correcting this vignette.
Since lingtypology is an R package, you should install R on your PC if you haven’t already done so. To install the lingtypology package, run the following command in your R console to get the stable version from CRAN:
install.packages("lingtypology")
You can also get the development version from GitHub:
install.packages("devtools")
devtools::install_github("ropensci/lingtypology")
Sometimes installation fails because the crosstalk package, or some other dependency, is missing. In that case, install it with the command install.packages("crosstalk").
Load package:
library(lingtypology)
This package is based on the Glottolog database (v. 2.7), so lingtypology has several functions for accessing data from that database.
Most of the functions in lingtypology have the same syntax: what you need.what you have. For example, iso.lang returns the ISO code for a language name, and lang.iso returns the language name for an ISO code. Most of them are based on language name.
Some of them help to define a vector of languages.
Additionally, there are some functions to convert glottocodes to ISO 639-3 codes and vice versa.
The most important functionality of lingtypology is the ability to create interactive maps based on features and sets of languages (see the third section).
The Glottolog database (v. 2.7) provides lingtypology with language names, ISO codes, genealogical affiliation, macro area, countries, coordinates, and much other information. This set of functions does not aim to cover all possible combinations of inputs and outputs. To see what additional information is preserved in the version of the Glottolog database used in lingtypology, check its column names:
names(glottolog.original)
## [1] "language" "iso" "glottocode"
## [4] "longitude" "latitude" "alternate_names"
## [7] "area" "affiliation" "affiliation-HH"
## [10] "country" "dialects" "language_development"
## [13] "language_status" "language_use" "location"
## [16] "other_comments" "population" "population_numeric"
## [19] "timespan" "typology" "writing"
Using R functions for data manipulation you can create your own database for your purpose.
All functions introduced in the previous section are regular functions, so they can take the following objects as input:
iso.lang("Adyghe")
## Adyghe
## "ady"
lang.iso("ady")
## ady
## "Adyghe"
country.lang("Adyghe")
## Adyghe
## "Turkey, United States, Israel, Australia, Egypt, Macedonia, France, Russia, Netherlands, Germany, Syria, Jordan, Iraq"
lang.aff("Abkhaz-Adyge")
## character(0)
I would like to point out that strings in R can be created using single or double quotes. Since an unescaped single quote inside a string created with single quotes causes an error in R, I use double quotes in my tutorial. You can use single quotes, but be careful and remember that 'Ma'ya' is an incorrect string in R.
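The quoting rules above can be checked with a minimal base R sketch. The variable names a and b are only illustrative.

```r
# A single quote inside a double-quoted string needs no escaping.
a <- "Ma'ya"

# Inside a single-quoted string the quote must be escaped with a backslash.
b <- 'Ma\'ya'

# Both spellings produce the same string.
identical(a, b)
nchar(a)
```

Both forms are valid, so the choice is a matter of convenience; double quotes simply avoid the need for escaping in names that contain an apostrophe.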
area.lang(c("Adyghe", "Aduge"))
## Adyghe Aduge
## "Eurasia" "Africa"
lang <- c("Adyghe", "Russian")
aff.lang(lang)
## Adyghe
## "North Caucasian, West Caucasian, Circassian"
## Russian
## "Indo-European, Slavic, East"
iso.lang(lang.aff("East Slavic"))
## Old Russian
## "orv"
If you are new to R, it is important to mention that you can create a table with languages, features and other parameters in any spreadsheet software you are used to working with. Then you can import the created file into R using standard tools.
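As a rough illustration of such an import, the sketch below writes a small CSV file to a temporary location and reads it back with read.csv from base R. The file contents and the names csv_path and my_data are assumptions made for the example; in practice the file would be exported from your spreadsheet program.

```r
# Write a small example table to a temporary CSV file; in practice this file
# would be exported from a spreadsheet program.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("language,features",
             "Adyghe,polysynthetic",
             "Polish,fusional"),
           csv_path)

# Read the file into a data frame, keeping text columns as character vectors.
my_data <- read.csv(csv_path, stringsAsFactors = FALSE)
my_data
```

The resulting data frame has one column per spreadsheet column, so its columns can be used like df$language and df$features in the examples of this vignette.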
The behavior of most functions is rather predictable, but the function country.lang has an additional feature. By default this function takes a vector of languages and returns a vector of countries. But if you set the argument intersection = TRUE, then the function returns a vector of countries where all languages from the query are spoken.
country.lang(c("Udi", "Laz"))
## Udi
## "Russia, Georgia, Azerbaijan, Turkmenistan"
## Laz
## "Turkey, Georgia, France, United States, Germany, Belgium"
country.lang(c("Udi", "Laz"), intersection = TRUE)
## [1] "Georgia"
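The idea behind intersection = TRUE can be imitated in base R. The sketch below is not the package's implementation; it only reuses the two country strings printed above, and the names countries, country_sets and shared are illustrative.

```r
# Country strings copied from the country.lang() output shown above.
countries <- c(Udi = "Russia, Georgia, Azerbaijan, Turkmenistan",
               Laz = "Turkey, Georgia, France, United States, Germany, Belgium")

# Split each string into a vector of country names.
country_sets <- strsplit(countries, ", ", fixed = TRUE)

# Keep only the countries that occur in every set.
shared <- Reduce(intersect, country_sets)
shared
```

Georgia is the only country present in both lists, which agrees with the result returned by country.lang with intersection = TRUE.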
There are some functions that take country names as input. Unfortunately, some countries have alternative names. In order to save users the trouble of having to figure out the exact name stored in the database (for example Ivory Coast or Cote d’Ivoire), all official country names and standard abbreviations are stored in the database:
lang.country("Cape Verde")
## [1] "Kabuverdianu" "Portuguese"
lang.country("Cabo Verde")
## [1] "Kabuverdianu" "Portuguese"
head(lang.country("UK"))
## [1] "Old English (ca. 450-1100)" "French"
## [3] "Parsi" "Somali"
## [5] "Angloromani" "Northern Pashto"
All functions which take a vector of languages are enriched with a kind of spell checker. If a language from a query is absent from the database, the functions return a warning message containing a set of candidates with the minimal Levenshtein distance to the language from the query.
aff.lang("Adyge")
## Warning: Language Adyge is absent in our version of the Glottolog database.
## Did you mean Adyghe, Aduge?
## Adyge
## NA
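The suggestion mechanism can be approximated with utils::adist, which computes Levenshtein-type edit distances in base R. This is only a sketch under stated assumptions: known_languages is a tiny hypothetical stand-in for the database, and suggest_language is an illustrative helper, not a lingtypology function.

```r
# A tiny stand-in for the list of language names in the database (hypothetical).
known_languages <- c("Adyghe", "Aduge", "Russian", "Polish")

# Return the known names with the smallest edit distance to the query.
suggest_language <- function(query, candidates) {
  distances <- utils::adist(query, candidates)[1, ]
  candidates[distances == min(distances)]
}

suggest_language("Adyge", known_languages)
```

Both Adyghe and Aduge are one edit away from the query Adyge, so both are returned, as in the warning message above.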
Unfortunately, the Glottolog database (v. 2.7) is not perfect for all my tasks, so I changed it a little bit.
More detailed information about how our database was created can be found in the GitHub folder.
After Robert Forkel’s issue I decided to add an argument glottolog.source, so that everybody has access to the “original” and “modified” (by default) Glottolog versions:
is.glottolog(c("Abkhaz", "Abkhazian"), glottolog.source = "original")
## [1] FALSE TRUE
is.glottolog(c("Abkhaz", "Abkhazian"), glottolog.source = "modified")
## [1] TRUE FALSE
It is common practice in R to abbreviate both argument names and their values, so this can also be done with the following lingtypology functions.
is.glottolog(c("Abkhaz", "Abkhazian"), g = "o")
## [1] FALSE TRUE
is.glottolog(c("Abkhaz", "Abkhazian"), g = "m")
## [1] TRUE FALSE
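Abbreviation of this kind relies on two base R mechanisms: partial matching of argument names and match.arg for argument values. The sketch below shows them on a hypothetical function pick_source; it is an assumption that lingtypology handles its arguments in a similar way.

```r
# A hypothetical function with a long argument name and a fixed set of values.
pick_source <- function(glottolog.source = c("modified", "original")) {
  # match.arg() expands an unambiguous abbreviation to the full value
  # and falls back to the first choice when nothing is supplied.
  match.arg(glottolog.source)
}

pick_source(g = "o")  # argument name and value are both abbreviated
pick_source()         # default value
```

Abbreviations are convenient in interactive work, but they only succeed while they stay unambiguous, so full names are safer in scripts that are meant to be reused.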
map.feature
The most important part of the lingtypology package is the function map.feature. This function allows a user to produce maps similar to known projects within the Cross-Linguistic Linked Data philosophy, such as WALS and Glottolog:
map.feature(c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"))
As shown in the picture above, this function generates an interactive Leaflet map. All specific points on the map have a pop-up box that appears when markers are clicked (see section 3.3 for more information about editing pop-up boxes). By default, they contain language names linked to the glottolog site.
If for some reason you are not using RStudio or you want to automatically create and save a lot of maps, you can save a map to a variable and use the htmlwidgets package to save the created maps to an .html file. I would like to thank Timo Roettger for mentioning this problem.
m <- map.feature(c("Adyghe", "Korean"))
# install.packages("htmlwidgets")
library(htmlwidgets)
saveWidget(m, file="TYPE_FILE_PATH/m.html")
There is an export button in RStudio, but for some reason it is not so easy to save a map as a .png or .jpg file using code. Here is a possible solution.
The goal of this package is to allow typologists (or any other linguists) to map language features. A list of languages and corresponding features can be stored in a data.frame as follows:
df <- data.frame(language = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
features = c("polysynthetic", "polysynthetic", "fusional", "fusional", "fusional"))
df
## language features
## 1 Adyghe polysynthetic
## 2 Kabardian polysynthetic
## 3 Polish fusional
## 4 Russian fusional
## 5 Bulgarian fusional
Now we can draw a map:
map.feature(languages = df$language,
features = df$features)
If you have a lot of features and they appear in the legend in an unhelpful order (by default they are ordered alphabetically), you can reorder them using factors (a vector with ordered levels, for more information see ?factor). For example, I want the feature polysynthetic to be listed first, followed by fusional:
df$features <- factor(df$features, levels = c("polysynthetic", "fusional"))
map.feature(languages = df$language, features = df$features)
Like in most R functions, it is not necessary to name all arguments, so the same result can be obtained by:
map.feature(df$language, df$features)
There are several types of variables in R and map.feature works differently depending on the variable type. I will use a built-in dataset ejective_and_n_consonants that contains 27 languages from the LAPSyD database. This dataset has two variables: the categorical variable ejectives indicates whether a language has any ejective sounds, and the numeric variable n.cons.lapsyd contains the number of consonants (based on the LAPSyD database). We can create two maps, one with the categorical variable and one with the numeric variable:
map.feature(ejective_and_n_consonants$language,
ejective_and_n_consonants$ejectives) # categorical
map.feature(ejective_and_n_consonants$language,
ejective_and_n_consonants$n.cons.lapsyd) # numeric
The default colors are not perfect for this purpose, but the main point is clear. To create a correct map, you should correctly define the type of the variable.
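Checking and setting the type of a variable takes only base R. The sketch below uses made-up values and illustrative variable names; it shows how to inspect the class of a vector and convert it, not how map.feature works internally.

```r
# Consonant counts that were read in as text (hypothetical values).
n_cons_text <- c("22", "18", "31")
class(n_cons_text)

# Convert to numbers so that the values are treated as a numeric feature.
n_cons <- as.numeric(n_cons_text)
class(n_cons)

# A yes/no feature is categorical; a factor makes that explicit.
ejectives <- factor(c("yes", "no", "yes"))
class(ejectives)
```

Numbers stored as text are a common side effect of importing data from spreadsheets, so it is worth checking class() before mapping a feature.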
This dataset can also be used to show one other parameter of the map.feature function. There are two possible ways to show the world map: with the Atlantic Ocean or with the Pacific Ocean in the middle. If you don’t need the default Pacific view, use the map.orientation parameter (thanks @languageSpaceLabs and @tzakharko for that idea):
map.feature(ejective_and_n_consonants$language,
ejective_and_n_consonants$n.cons.lapsyd,
map.orientation = "Atlantic")
Sometimes it is a good idea to add some additional information (e.g. language affiliation, references or even examples) to pop-up boxes that appear when points are clicked. In order to do so, first of all we need to create an extra vector of strings in our dataframe:
df$popup <- aff.lang(df$language)
The function aff.lang() creates a vector of genealogical affiliations that can be easily mapped:
map.feature(languages = df$language, features = df$features, popup = df$popup)
Like before, it is not necessary to name all arguments, so the same result can be obtained by this:
map.feature(df$language, df$features, df$popup)
Pop-up strings can contain HTML tags, so it is easy to insert a link, a couple of lines, a table or even a video and sound. Here is how pop-up boxes can demonstrate language examples:
# change a df$popup vector
df$popup <- c ("sɐ s-ɐ-k'ʷɐ<br> 1sg 1sg.abs-dyn-go<br>'I go'",
"sɐ s-o-k'ʷɐ<br> 1sg 1sg.abs-dyn-go<br>'I go'",
"id-ę<br> go-1sg.npst<br> 'I go'",
"ya id-u<br> 1sg go-1sg.npst <br> 'I go'",
"id-a<br> go-1sg.prs<br> 'I go'")
# create a map
map.feature(df$language, df$features, df$popup)
How to say moon in Sign Languages? Here is an example:
# Let's create a dataframe with links to videos
sign_df <- data.frame(languages = c("American Sign Language", "Russian Sign Language", "French Sign Language"),
popup = c("https://media.spreadthesign.com/video/mp4/13/48600.mp4", "https://media.spreadthesign.com/video/mp4/12/17639.mp4", "https://media.spreadthesign.com/video/mp4/10/17638.mp4"))
# Change popup to an HTML code
sign_df$popup <- paste("<video width='200' height='150' controls> <source src='",
as.character(sign_df$popup),
"' type='video/mp4'></video>", sep = "")
# create a map
map.feature(languages = sign_df$languages, popup = sign_df$popup)
An alternative way to add some short text to a map is to use the label option.
map.feature(df$language, df$features,
label = df$language)
There are some additional arguments for customization: label.fsize for setting font size, label.position for controlling the label position, and label.hide to control the appearance of the label: if TRUE, the labels are displayed on mouse over (as on the next map), if FALSE, the labels are always displayed (as on the previous map).
map.feature(df$language, df$features,
label = df$language,
label.fsize = 20,
label.position = "left",
label.hide = TRUE)
Users can set their own coordinates using the arguments latitude and longitude. It is important to note that lingtypology works only with decimal degrees (something like this: 0.1), not with degrees, minutes and seconds (something like this: 0° 06′ 0″). I will illustrate this with the dataset circassian built into the lingtypology package. This dataset comes from fieldwork data collected during several expeditions in the period 2011-2016 and contains a list of Circassian villages:
map.feature(languages = circassian$language,
features = circassian$dialect,
popup = circassian$village,
latitude = circassian$latitude,
longitude = circassian$longitude)
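Coordinates recorded in degrees, minutes and seconds have to be converted to decimal degrees before they are passed to the latitude and longitude arguments. A minimal sketch of that conversion follows; dms_to_decimal is a hypothetical helper written for this example, not a lingtypology function.

```r
# Convert degrees, minutes and seconds to decimal degrees.
# `negative = TRUE` marks southern latitudes and western longitudes.
dms_to_decimal <- function(degrees, minutes = 0, seconds = 0, negative = FALSE) {
  value <- abs(degrees) + minutes / 60 + seconds / 3600
  direction <- ifelse(negative, -1, 1)
  direction * value
}

# The example from the text: 0 degrees, 6 minutes, 0 seconds.
dms_to_decimal(0, 6, 0)
```

The hemisphere is passed as a separate flag rather than as a negative degree value, because a value such as 0° 06′ south cannot be expressed by the sign of the degrees alone.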
By default the color palette is created by the rainbow() function, but users can set their own colors using the argument color:
df <- data.frame(language = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
features = c("polysynthetic", "polysynthetic", "fusional", "fusional", "fusional"))
map.feature(languages = df$language,
features = df$features,
color = c("yellowgreen", "navy"))
Palettes from the RColorBrewer and viridis packages can also be used as the color argument:
map.feature(ejective_and_n_consonants$language,
ejective_and_n_consonants$n.cons.lapsyd,
color = "magma")
The package can generate a control box that allows users to toggle the visibility of points and features. To enable it, there is an argument control in the map.feature function:
map.feature(languages = df$language,
features = df$features,
control = TRUE)
The map.feature function has an additional argument stroke.features. Using this argument it becomes possible to show two independent sets of features on one map. By default strokes are colored in shades of grey (so for two levels it will be black and white; for three, black, grey and white; and so on), but users can set their own colors using the argument stroke.color:
map.feature(circassian$language,
features = circassian$dialect,
stroke.features = circassian$language,
latitude = circassian$latitude,
longitude = circassian$longitude)
It is important to note that stroke.features can work with NA values. The function won’t plot anything if there is an NA value. Let’s set a language value to NA in all Baksan villages from the circassian dataset:
# create newfeature variable
newfeature <- circassian[,c(5,6)]
# set language feature of the Baksan villages to NA and reduce newfeature from dataframe to vector
newfeature <- replace(newfeature$language, newfeature$language == "Baksan", NA)
# create a map
map.feature(circassian$language,
features = circassian$dialect,
latitude = circassian$latitude,
longitude = circassian$longitude,
stroke.features = newfeature)
All markers have their own radius and opacity, which can be set by users. Just use the arguments radius, stroke.radius, opacity and stroke.opacity:
map.feature(circassian$language,
features = circassian$dialect,
stroke.features = circassian$language,
latitude = circassian$latitude,
longitude = circassian$longitude,
radius = 7, stroke.radius = 13)
map.feature(circassian$language,
features = circassian$dialect,
stroke.features = circassian$language,
latitude = circassian$latitude,
longitude = circassian$longitude,
opacity = 0.7, stroke.opacity = 0.6)
By default the legend appears in the bottom left corner. If there are stroke features, two legends are generated. There are additional arguments that control the appearance and the title of the legends.
map.feature(circassian$language,
features = circassian$dialect,
stroke.features = circassian$language,
latitude = circassian$latitude,
longitude = circassian$longitude,
legend = FALSE, stroke.legend = TRUE)
map.feature(circassian$language,
features = circassian$dialect,
stroke.features = circassian$language,
latitude = circassian$latitude,
longitude = circassian$longitude,
title = "Circassian dialects", stroke.title = "Languages")
The arguments legend.position and stroke.legend.position allow users to change a legend’s position using the strings “topright”, “bottomright”, “bottomleft” or “topleft”.
A scale bar is automatically added to a map, but users can control its appearance (set the scale.bar argument to TRUE or FALSE) and its position (use the scale.bar.position argument values “topright”, “bottomright”, “bottomleft” or “topleft”).
map.feature(c("Adyghe", "Polish", "Kabardian", "Russian"),
scale.bar = TRUE,
scale.bar.position = "topright")
It is possible to use a different tile set for a map using the tile argument. For more tiles see here.
df <- data.frame(lang = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
feature = c("polysynthetic", "polysynthetic", "fusion", "fusion", "fusion"),
popup = c("Adyghe", "Adyghe", "Slavic", "Slavic", "Slavic"))
map.feature(df$lang, df$feature, df$popup,
tile = "Thunderforest.OpenCycleMap")
It is possible to use different map tiles on the same map. Just add a vector with tiles.
df <- data.frame(lang = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
feature = c("polysynthetic", "polysynthetic", "fusion", "fusion", "fusion"),
popup = c("Adyghe", "Adyghe", "Slavic", "Slavic", "Slavic"))
map.feature(df$lang, df$feature, df$popup,
tile = c("OpenStreetMap.BlackAndWhite", "Thunderforest.OpenCycleMap"))
It is possible to name tiles using the tile.name argument.
df <- data.frame(lang = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
feature = c("polysynthetic", "polysynthetic", "fusion", "fusion", "fusion"),
popup = c("Adyghe", "Adyghe", "Slavic", "Slavic", "Slavic"))
map.feature(df$lang, df$feature, df$popup,
tile = c("OpenStreetMap.BlackAndWhite", "Thunderforest.OpenCycleMap"),
tile.name = c("b & w", "colored"))
It is possible to combine the tiles’ control box with the features’ control box.
df <- data.frame(lang = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
feature = c("polysynthetic", "polysynthetic", "fusion", "fusion", "fusion"),
popup = c("Adyghe", "Adyghe", "Slavic", "Slavic", "Slavic"))
map.feature(df$lang, df$feature, df$popup,
tile = c("OpenStreetMap.BlackAndWhite", "Thunderforest.OpenCycleMap"),
control = TRUE)
It is possible to add a minimap to a map.
map.feature(c("Adyghe", "Polish", "Kabardian", "Russian"),
minimap = TRUE)
Users can control its appearance (by setting the minimap argument to TRUE or FALSE), its position (by using the values “topright”, “bottomright”, “bottomleft” or “topleft” of the minimap.position argument) and its height and width (with the arguments minimap.height and minimap.width).
map.feature(c("Adyghe", "Polish", "Kabardian", "Russian"),
minimap = TRUE,
minimap.position = "topright",
minimap.height = 100,
minimap.width = 100)
The argument image.url allows users to add their own pictures to a map, using a URL. In this part I will use two histograms of the most numerous nationalities in Moscow and St. Petersburg, based on data from the last Russian Census:
Let’s create a dataframe.
df <- data.frame(lang = c("Russian", "Russian"),
lat = c(55.75, 59.95),
long = c(37.616667, 30.3),
# I use here URL shortener by Google
urls = c("https://goo.gl/5OUv1E",
"https://goo.gl/UWmvDw"))
map.feature(languages = df$lang,
latitude = df$lat,
longitude = df$long,
image.url = df$urls)
Users can change the size of the pictures.
df <- data.frame(lang = c("Russian", "Russian"),
lat = c(55.75, 59.95),
long = c(37.616667, 30.3),
# I use here URL shortener by Google
urls = c("https://goo.gl/5OUv1E",
"https://goo.gl/UWmvDw"))
map.feature(languages = df$lang,
latitude = df$lat,
longitude = df$long,
image.url = df$urls,
image.width = 200,
image.height = 200)
The picture can be shifted away from the actual point:
df <- data.frame(lang = c("Russian", "Russian"),
lat = c(55.75, 59.95),
long = c(37.616667, 30.3),
# I use here URL shortener by Google
urls = c("https://goo.gl/5OUv1E",
"https://goo.gl/UWmvDw"))
map.feature(languages = df$lang,
latitude = df$lat,
longitude = df$long,
image.url = df$urls,
image.width = 150,
image.height = 150,
image.X.shift = 10,
image.Y.shift = 0)
Using this argument, users can plot their own markers, any chart connected to a point or even their own legend. It is important to know that by using transparent .png files, the user can plot an additional legend text on the map.
Sometimes it is easier to look at a density contour plot. It can be created using the density.estimation argument:
map.feature(circassian$language,
longitude = circassian$longitude,
latitude = circassian$latitude,
density.estimation = TRUE,
color = "darkgreen")
If the features argument has several levels, a kernel density estimation plot will be created for each level:
map.feature(circassian$language,
features = circassian$language,
longitude = circassian$longitude,
latitude = circassian$latitude,
density.estimation = TRUE,
color = c("darkgreen", "blue"))
It is possible to remove points and display only the kernel density estimation plot, using the "blank" value:
map.feature(circassian$language,
features = circassian$language,
longitude = circassian$longitude,
latitude = circassian$latitude,
density.estimation = "blank",
color = c("darkgreen", "blue"))
It is possible to change the opacity of the kernel density estimation plot using the density.estimation.opacity argument:
map.feature(circassian$language,
features = circassian$language,
longitude = circassian$longitude,
latitude = circassian$latitude,
density.estimation = "blank",
density.estimation.opacity = 0.5,
color = c("darkgreen", "blue"))
Since this type of visualisation is based on kernel density estimation, there are parameters density.longitude.width and density.latitude.width that increase or decrease the area:
map.feature(circassian$language,
features = circassian$language,
longitude = circassian$longitude,
latitude = circassian$latitude,
density.estimation = TRUE,
density.longitude.width = 0.3,
density.latitude.width = 0.3,
color = c("darkgreen", "blue"))
map.feature(circassian$language,
features = circassian$language,
longitude = circassian$longitude,
latitude = circassian$latitude,
density.estimation = TRUE,
density.longitude.width = 0.7,
density.latitude.width = 0.7,
color = c("darkgreen", "blue"))
map.feature(circassian$language,
features = circassian$language,
longitude = circassian$longitude,
latitude = circassian$latitude,
density.estimation = TRUE,
density.longitude.width = 1.3,
density.latitude.width = 0.9,
color = c("darkgreen", "blue"))
It is important to note that this type of visualisation has some shortcomings. Kernel density estimation is calculated without any adjustment, so longitude and latitude values are used as values in a Cartesian coordinate system. To reduce the consequences of that decision it is better to use a different coordinate projection, which avoids treating the Earth as a flat object.
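To see why raw longitude and latitude distort distances, the sketch below projects coordinates to approximate kilometres with a simple equirectangular projection centred on the mean latitude. This is only an illustration of the problem: the text does not say which projection is intended, project_km is a hypothetical helper, and the Earth radius of 6371 km is an assumed mean value.

```r
# Project longitude/latitude (decimal degrees) to approximate kilometres
# with an equirectangular projection centred on the mean latitude.
# This is a rough local approximation, suitable only for small regions.
project_km <- function(longitude, latitude, earth_radius_km = 6371) {
  lat0 <- mean(latitude) * pi / 180
  data.frame(
    x = earth_radius_km * (longitude * pi / 180) * cos(lat0),
    y = earth_radius_km * (latitude * pi / 180)
  )
}

# Two points one degree of longitude apart at 60 degrees north
# are about half as far apart as they would be at the equator.
project_km(longitude = c(30, 31), latitude = c(60, 60))
```

A degree of longitude shrinks with the cosine of the latitude while a degree of latitude stays roughly constant, which is exactly the effect that an unadjusted density estimate ignores.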
dplyr integration
It is possible to use dplyr functions and pipes with lingtypology. dplyr is widely used, so I give some examples of how to use it with the lingtypology package. Using the query “list of languages csv” I found Vincent Garnier’s languages-list repository. Let’s download and map all languages from that set. First, download the data:
new_data <- read.csv("https://goo.gl/GgscBE")
tail(new_data)
## X639.1 X639.2.T X639.2.B Language.name Native.name
## 180 xh xho xho Xhosa isiXhosa
## 181 yi yid yid Yiddish ייִדיש
## 182 yo yor yor Yoruba Yorùbá
## 183 za zha zha Zhuang, Chuang Saɯ cueŋƅ, Saw cuengh
## 184 zh zho chi Chinese 中文 (Zhōngwén), 汉语, 漢語
## 185 zu zul zul Zulu isiZulu
As we see, some values of the Language.name variable contain more than one language name. Some of the languages probably have a different name in our database. Imagine that we want to map all languages from Africa. For the following examples to work, load dplyr with library(dplyr).
new_data %>%
mutate(Language.name = gsub(pattern = " ", replacement = "", Language.name)) %>%
filter(is.glottolog(Language.name) == TRUE) %>%
filter(area.lang(Language.name) == "Africa") %>%
select(Language.name) %>%
map.feature()
We start with a dataframe, here new_data. First we remove the spaces from each string (the pattern matches every space, not only trailing ones). Then we check whether the language names are in the Glottolog database. Then we select only the rows that contain languages of Africa. Then we select the Language.name variable. And the last line maps all selected languages.
By default, values that come from the pipe are treated as the first argument of a function. But when there are some additional arguments, the dot (.) specifies the exact position the value should be piped to. Let’s produce the same map with a minimap.
new_data %>%
mutate(Language.name = gsub(pattern = " ", replacement = "", Language.name)) %>%
filter(is.glottolog(Language.name) == TRUE) %>%
filter(area.lang(Language.name) == "Africa") %>%
select(Language.name) %>%
map.feature(., minimap = TRUE)
lingtypology
It is important to cite R and R packages when you use them. For this purpose use the citation function:
citation("lingtypology")
##
## Moroz G (2017). _lingtypology: easy mapping for Linguistic
## Typology_. <URL: https://CRAN.R-project.org/package=lingtypology>.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {lingtypology: easy mapping for Linguistic Typology},
## author = {George Moroz},
## year = {2017},
## url = {https://CRAN.R-project.org/package=lingtypology},
## }