lingtypology: easy mapping for Linguistic Typology
The lingtypology package connects R with the Glottolog database (v. 2.7) and provides additional functionality for linguistic typology. The Glottolog database contains a catalogue of the world’s languages. This package helps researchers make linguistic maps, following the philosophy of the Cross-Linguistic Linked Data project, which is creating uniform access to linguistic data across publications. Because it is built on the leaflet package, lingtypology is a package for interactive linguistic cartography.
I would like to thank Natalya Tyshkevich and Samira Verhees for reading and correcting this vignette.
Get the stable version from CRAN:
install.packages("lingtypology")
… or get the development version from GitHub:
install.packages("devtools")
devtools::install_github("agricolamz/lingtypology", dependencies = TRUE)
Load package:
library(lingtypology)
This package is based on the Glottolog database (v. 2.7), so lingtypology has several functions for accessing data from that database. In the Glottolog database, the term languoid is used to catalogue languages, dialects and language families alike.
Most of the functions in lingtypology follow the same naming pattern: what you need.what you have. For example, iso.lang() returns ISO codes given languoid names. Most of them take a languoid name as input.
Some of them help to define a vector of languoids.
The most important functionality of lingtypology is the ability to create interactive maps based on features and sets of languoids (see the next section): map.feature()
The Glottolog database (v. 2.7) provides lingtypology with languoid names, ISO codes, genealogical affiliation, macro area, countries and coordinates.
All functions introduced in the previous section are regular functions, so they can take the following objects as input:
iso.lang("Adyghe")
## Adyghe
## "ady"
lang.iso("ady")
## ady
## "Adyghe"
country.lang("Adyghe")
## Adyghe
## "Turkey, United States, Israel, Australia, Egypt, Macedonia, France, Russia, Netherlands, Germany, Syria, Jordan, Iraq"
lang.aff("Abkhaz-Adyge")
## [1] "Ubykh" "Abkhazian" "Abaza" "Adyghe" "Kabardian"
area.lang(c("Adyghe", "Aduge"))
## Adyghe Aduge
## "Eurasia" "Africa"
lang <- c("Adyghe", "Russian")
aff.lang(lang)
## Adyghe
## "Abkhaz-Adyge, Circassian"
## Russian
## "Indo-European, Balto-Slavic, Slavic, East Slavic"
iso.lang(lang.aff("East Slavic"))
## Belarusian Old Russian Russian Rusyn Ukrainian
## "bel" "orv" "rus" "rue" "ukr"
The behavior of most functions is rather predictable, but the function country.lang has an additional feature. By default this function takes a vector of languages and returns a vector of countries. But if you set the argument intersection = TRUE, then the function returns a vector of countries where all languoids from the query are spoken.
country.lang(c("Udi", "Laz"))
## Udi
## "Russia, Georgia, Azerbaijan, Turkmenistan"
## Laz
## "Turkey, Georgia, France, United States, Germany, Belgium"
country.lang(c("Udi", "Laz"), intersection = TRUE)
## [1] "Georgia"
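The idea behind intersection = TRUE can be illustrated in base R. This is only a sketch of the set logic, not necessarily how country.lang is implemented internally; the country strings are copied from the output above, and the variable names are made up for illustration.

```r
# Country strings copied from the country.lang() output above
countries <- c(
  Udi = "Russia, Georgia, Azerbaijan, Turkmenistan",
  Laz = "Turkey, Georgia, France, United States, Germany, Belgium"
)

# Split each comma-separated string into a vector of country names
country_sets <- strsplit(countries, ", ", fixed = TRUE)

# Keep only the countries that occur in every set
shared <- Reduce(intersect, country_sets)
shared
```

With these two strings, the only country present in both sets is Georgia, which agrees with the result returned by the package.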
There are some functions that take country names as input. Unfortunately, some countries have alternative names. In order to save users the trouble of having to figure out the exact name stored in the database (for example Ivory Coast or Cote d’Ivoire), all official country names and standard abbreviations are stored in the database:
lang.country("Cape Verde")
## [1] "Portuguese" "Kabuverdianu"
lang.country("Cabo Verde")
## [1] "Portuguese" "Kabuverdianu"
head(lang.country("UK"))
## [1] "Somali" "'Ta'izzi-Adeni Arabic'"
## [3] "Judeo-Iraqi Arabic" "Maltese"
## [5] "Moroccan Arabic" "Assyrian Neo-Aramaic"
All functions that take a vector of languoids are equipped with a kind of spell checker. If a languoid from a query is absent from the database, the function returns a warning message containing the set of candidates with the minimal Levenshtein distance to the languoid from the query.
aff.lang("Adyge")
## Warning in FUN(X[[i]], ...): Languoid Adyge is absent in our database. Did
## you mean Adyghe, Aduge?
## Adyge
## NA
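The candidate search can be approximated with adist() from base R's utils package, which computes a generalized Levenshtein distance. The sketch below is an assumption about the general approach rather than the package's actual code, and the vector known is a small illustrative subset, not the real database.

```r
query <- "Adyge"

# Small illustrative subset of languoid names (not the real database)
known <- c("Adyghe", "Aduge", "Abaza", "Kabardian")

# Edit distance from the query to every known name
d <- as.vector(adist(query, known))

# Keep all names that share the minimal distance
suggestions <- known[d == min(d)]
suggestions
```

Both "Adyghe" (one insertion) and "Aduge" (one substitution) are at distance 1 from the query, so both are suggested, as in the warning above.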
Unfortunately, the Glottolog database (v. 2.7) is not perfect, so some changes had to be made:
map.feature
The most important part of the lingtypology package is the function map.feature. This function allows a user to produce maps similar to those of well-known projects within the Cross-Linguistic Linked Data philosophy, such as WALS and Glottolog:
map.feature(c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"))
As shown in the picture above, this function generates an interactive Leaflet map with a control box that allows users to toggle the visibility of any group of points on the map. Every point on the map has a pop-up box that appears when its marker is clicked (see below for more on editing pop-up boxes). By default, pop-up boxes contain languoid names linked to the Glottolog site.
The goal of this package is to allow typologists to map language types. A list of languoids and corresponding features can be stored in a data.frame as follows:
df <- data.frame(language = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
features = c("polysynthetic", "polysynthetic", "fusional", "fusional", "fusional"))
df
## language features
## 1 Adyghe polysynthetic
## 2 Kabardian polysynthetic
## 3 Polish fusional
## 4 Russian fusional
## 5 Bulgarian fusional
Now we can draw a map:
map.feature(languages = df$language, features = df$features)
As with most R functions, it is not necessary to name all arguments, so the same result can be obtained by:
map.feature(df$language, df$features)
There is a title argument for adding a title to the legend:
map.feature(df$language, df$features, title = "morphological type")
Sometimes it is a good idea to add some additional information to pop-up boxes, e.g. language affiliation, references or even examples. In order to do so, first of all we need to create an extra vector of strings in our dataframe:
df$popup <- aff.lang(df$language)
The function aff.lang() creates a vector of genealogical affiliations that can be easily mapped:
map.feature(languages = df$language, features = df$features, popup = df$popup)
Like before, it is not necessary to name all arguments, so the same result can be obtained by this:
map.feature(df$language, df$features, df$popup)
# change a df$popup vector
df$popup <- c ("sɐ s-ɐ-k'ʷɐ<br> 1sg 1sg.abs-dyn-go<br>'I go'",
"sɐ s-o-k'ʷɐ<br> 1sg 1sg.abs-dyn-go<br>'I go'",
"id-ę<br> go-1sg.npst<br> 'I go'",
"ya id-u<br> 1sg go-1sg.npst <br> 'I go'",
"id-a<br> go-1sg.prs<br> 'I go'")
# create a map
map.feature(df$language, df$features, df$popup)
Users can set their own coordinates using the arguments latitude and longitude. I will illustrate this with the dataset circassian built into the lingtypology package. This dataset comes from fieldwork conducted during several expeditions in the period 2011-2016 and contains a list of Circassian villages:
map.feature(languages = circassian$language,
features = circassian$languoid,
popup = circassian$village,
latitude = circassian$latitude,
longitude = circassian$longitude)
By default, colors are chosen randomly, but users can set their own colors using the argument color:
df <- data.frame(language = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
features = c("polysynthetic", "polysynthetic", "fusional", "fusional", "fusional"))
map.feature(languages = df$language, features = df$features, color = c("yellowgreen", "navy"))
Since colors are chosen randomly by default, it is better to call the function set.seed first to get a reproducible color palette:
df <- data.frame(language = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
features = c("polysynthetic", "polysynthetic", "fusional", "fusional", "fusional"))
set.seed(48)
map.feature(languages = df$language, features = df$features)
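Why set.seed helps can be shown with base R alone. The sketch below assumes that the palette is drawn with R's random number generator; it uses sample() and colors() purely as a stand-in and does not reproduce how map.feature actually picks its colors.

```r
# Draw two random color names after fixing the seed
set.seed(48)
first_draw <- sample(colors(), 2)

# Resetting the same seed reproduces the same draw
set.seed(48)
second_draw <- sample(colors(), 2)

identical(first_draw, second_draw)
```

Any random choice made after the same set.seed() call is repeated exactly, which is why the map keeps the same colors between runs.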
The automatically generated control box that allows users to toggle the visibility of points and features can become inconvenient when there is a large number of features on the map. To disable it, set the argument control of the map.feature function to FALSE:
map.feature(lang.aff("Sign Language"), control = FALSE)
To disable the automatically generated legend, set the argument legend of the map.feature function to FALSE.
The map.feature function has an additional argument stroke.features. This argument makes it possible to show two independent sets of features on one map. By default, strokes are colored in shades of grey (so two levels are shown in black and white, three levels in black, grey and white, and so on), but users can set their own colors using the argument stroke.color:
map.feature(circassian$language,
features = circassian$languoid,
stroke.features = circassian$language,
latitude = circassian$latitude,
longitude = circassian$longitude)
It is important to note that stroke.features can work with NA values. The function won’t plot a stroke for a point whose value is NA. Let’s set the language value to NA for all Baksan villages in the circassian dataset:
library(dplyr)
# create newfeature variable
newfeature <- circassian
# set language feature of the Baksan villages to NA and reduce newfeature from dataframe to vector
newfeature %>%
mutate(language = replace(language, languoid == "Baksan", NA)) %>%
select(language) %>%
unlist() ->
newfeature
# create a map
map.feature(circassian$language,
features = circassian$languoid,
latitude = circassian$latitude,
longitude = circassian$longitude,
stroke.features = newfeature)
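The same NA replacement can be written without dplyr. The sketch below uses a toy data frame with made-up rows (villages is a hypothetical stand-in for circassian, which has more columns), so that it runs without the package installed.

```r
# Toy stand-in for the circassian dataset (hypothetical rows)
villages <- data.frame(
  language = c("Adyghe", "Kabardian", "Kabardian"),
  languoid = c("Temirgoy", "Baksan", "Besleney"),
  stringsAsFactors = FALSE
)

# Copy the language column and blank out the Baksan rows
newfeature <- villages$language
newfeature[villages$languoid == "Baksan"] <- NA
newfeature
```

The result is a plain vector of the same length as the data, with NA in the Baksan positions, which is the shape that stroke.features expects in the call above.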
All markers have their own radius and opacity, and both can be set by users with the arguments radius, stroke.radius, opacity and stroke.opacity:
map.feature(circassian$language,
features = circassian$languoid,
stroke.features = circassian$language,
latitude = circassian$latitude,
longitude = circassian$longitude,
radius = 7, stroke.radius = 13)
map.feature(circassian$language,
features = circassian$languoid,
stroke.features = circassian$language,
latitude = circassian$latitude,
longitude = circassian$longitude,
opacity = 0.7, stroke.opacity = 0.6)
Since the lingtypology package is based on the leaflet package, it is possible to add some map features from that package using the magrittr pipe operator (%>%). For example, many popular base-maps can be added using the addProviderTiles() function from the leaflet package (here is the complete set of base-maps).
library(leaflet) # makes the %>% operator and addProviderTiles() available
df <- data.frame(lang = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
feature = c("polysynthetic", "polysynthetic", "fusion", "fusion", "fusion"),
popup = c("Adyghe", "Adyghe", "Slavic", "Slavic", "Slavic"))
map.feature(df$lang, df$feature, df$popup) %>%
addProviderTiles("Stamen.Toner")