lingtypology: easy mapping for Lingustic Typology

George Moroz

2016-12-29

What is lingtypology?

The lingtypology package connects R with the Glottolog database (v. 2.7) and provides an additional functionality for linguistic typology. The Glottolog database contains a catalogue of the world’s languages. This package helps researchers to make linguistic maps, using the philosophy of the Cross-Linguistic Linked Data project, which is creating uniform access to linguistic data across publications. This package is based on the leaflet package, so lingtypology is a package for interactive linguistic cartography.

I would like to thank Natalya Tyshkevich and Samira Verhees for reading and correcting this vignette.

1. Installation

Get the stable version from CRAN:

install.packages("lingtypology")

… or get the development version from GitHub:

install.packages("devtools")
devtools::install_github("agricolamz/lingtypology", dependencies = TRUE)

Load package:

library(lingtypology)

2. Glottolog functions

This package is based on the Glottolog database (v. 2.7), so lingtypology has several functions for accessing data from that database. In the Glottolog database, the term languoid is used to catalogue languages, dialects and language families alike.

2.1 Command name’s syntax

Most of the functions in lingtypology have the same syntax: what you need.what you have. Most of them are based on languoid name.

Some of them help to define a vector of languoids.

The most important functionality of ‘lingtypology’ is the ability to create interactive maps based on features and sets of languoids (see the the next section) * map.feature()

Glottolog database (v. 2.7) provides ‘lingtypology’ with languoid names, ISO codes, genealogical affiliation, macro area, countries and coordinates.

2.2 Using base functions

All functions introduced in the previous section are regular functions, so they can take the following objects as input:

iso.lang("Adyghe")
## Adyghe 
##  "ady"
lang.iso("ady")
##      ady 
## "Adyghe"
country.lang("Adyghe")
##                                                                                                                  Adyghe 
## "Turkey, United States, Israel, Australia, Egypt, Macedonia, France, Russia, Netherlands, Germany, Syria, Jordan, Iraq"
lang.aff("Abkhaz-Adyge")
## [1] "Ubykh"     "Abkhazian" "Abaza"     "Adyghe"    "Kabardian"
area.lang(c("Adyghe", "Aduge"))
##    Adyghe     Aduge 
## "Eurasia"  "Africa"
lang <- c("Adyghe", "Russian")
aff.lang(lang)
##                                             Adyghe 
##                         "Abkhaz-Adyge, Circassian" 
##                                            Russian 
## "Indo-European, Balto-Slavic, Slavic, East Slavic"
iso.lang(lang.aff("East Slavic"))
##  Belarusian Old Russian     Russian       Rusyn   Ukrainian 
##       "bel"       "orv"       "rus"       "rue"       "ukr"

The behavior of most functions is rather predictable, but the function country.lang has an additional feature. By default this function takes a vector of languages and returns a vector of countries. But if you set the argument intersection = TRUE, then the function returns a vector of countries where all languoids from the query are spoken.

country.lang(c("Udi", "Laz"))
##                                                        Udi 
##                "Russia, Georgia, Azerbaijan, Turkmenistan" 
##                                                        Laz 
## "Turkey, Georgia, France, United States, Germany, Belgium"
country.lang(c("Udi", "Laz"), intersection = TRUE)
## [1] "Georgia"

2.3 Spell Checker: look carefully at warnings!

There are some functions that take country names as input. Unfortunately, some countries have alternative names. In order to save users the trouble of having to figure out the exact name stored in the database (for example Ivory Coast or Cote d’Ivoire), all official country names and standard abbreviations are stored in the database:

lang.country("Cape Verde")
## [1] "Portuguese"   "Kabuverdianu"
lang.country("Cabo Verde")
## [1] "Portuguese"   "Kabuverdianu"
head(lang.country("UK"))
## [1] "Somali"                 "'Ta'izzi-Adeni Arabic'"
## [3] "Judeo-Iraqi Arabic"     "Maltese"               
## [5] "Moroccan Arabic"        "Assyrian Neo-Aramaic"

All functions which take a vector of languoids are enriched with a kind of a spell checker. If a languoid from a query is absent in the database, functions return a warning message containing a set of candidates with the minimal Levenshtein distance to the languoid from the query.

aff.lang("Adyge")
## Warning in FUN(X[[i]], ...): Languoid Adyge is absent in our database. Did
## you mean Adyghe, Aduge?
## Adyge 
##    NA

2.4 Changes in the glottolog database

Unfortunately, the Glottolog database (v. 2.7) is not perfect, so some changes had to be made:

3. Map features with map.feature

3.1 Base map

The most important part of the lingtypology package is the function map.feature. This function allows a user to produce maps similar to known projects within the Cross-Linguistic Linked Data philosophy, such as WALS and Glottolog:

map.feature(c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"))

As shown in the picture above, this function generates an interactive Leaflet map with a control box that allows users to toggle the visibility of any group of points on the map. All specific points on the map have a pop-up box that appears when markers are clicked (more about editing pop-up boxes see below). By default, they contain languoid names linked to the glottolog site.

3.2 Set features

The goal of this package is to allow typologists to map language types. A list of languoids and correspondent features can be stored in a data.frame as follows:

df <- data.frame(language = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
                 features = c("polysynthetic", "polysynthetic", "fusional", "fusional", "fusional"))
df
##    language      features
## 1    Adyghe polysynthetic
## 2 Kabardian polysynthetic
## 3    Polish      fusional
## 4   Russian      fusional
## 5 Bulgarian      fusional

Now we can draw a map:

map.feature(languages = df$language, features = df$features)

Like in most R functions, it is not necessary to name all arguments, so the same result can be obtained by:

map.feature(df$language, df$features)

As shown in the picture above, all points are grouped by feature, colored and counted. As before, a pop-up box appears when markers are clicked. A control feature allows users to toggle the visibility of points grouped by feature.

There is a title argument for adding a title to the legend:

map.feature(df$language, df$features, title = "morphological type")

3.3 Set pop-up boxes

Sometimes it is a good idea to add some additional information to pop-up boxes, e.g. language affiliation, references or even examples. In order to do so, first of all we need to create an extra vector of strings in our dataframe:

df$popup <- aff.lang(df$language)

The function aff.lang() creates a vector of genealogical affiliations that can be easily mapped:

map.feature(languages = df$language, features = df$features, popup = df$popup)

Like before, it is not necessary to name all arguments, so the same result can be obtained by this:

map.feature(df$language, df$features, df$popup)

Pop-up strings can contain HTML tags, so it is easy to insert a link, a couple of lines or a table. Here is how pop-up boxes can demonstrate language examples:

# change a df$popup vector
df$popup <- c ("sɐ s-ɐ-k'ʷɐ<br> 1sg 1sg.abs-dyn-go<br>'I go'",
               "sɐ s-o-k'ʷɐ<br> 1sg 1sg.abs-dyn-go<br>'I go'",
               "id-ę<br> go-1sg.npst<br> 'I go'",
               "ya id-u<br> 1sg go-1sg.npst <br> 'I go'",
               "id-a<br> go-1sg.prs<br> 'I go'")
# create a map
map.feature(df$language, df$features, df$popup)

3.4 Set coordinates

Users can set their own coordinates using the arguments latitude and longitude. I will illustrate this with the dataset circassian built into the lingtypology package. This dataset comes from fieldwork collected during several expeditions in the period 2011-2016 and contains a list of Circassian villages:

map.feature(languages = circassian$language,
            features = circassian$languoid,
            popup = circassian$village,
            latitude = circassian$latitude,
            longitude = circassian$longitude)

3.5 Set colors

By default colors are chosen randomly, but user can set their own colors using argument color:

df <- data.frame(language = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
                 features = c("polysynthetic", "polysynthetic", "fusional", "fusional", "fusional"))
map.feature(languages = df$language, features = df$features, color = c("yellowgreen", "navy"))

Since colors are chosen randomly by default, it is better to use the function set.seed to get reproducible color palette:

df <- data.frame(language = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
                 features = c("polysynthetic", "polysynthetic", "fusional", "fusional", "fusional"))
set.seed(48)
map.feature(languages = df$language, features = df$features)

3.6 Set control box and legend

The automatically generated control box that allows users to toggle the visibility of points and features can become inconvenient when there is a large amount of features on the map. To disable it there is an argument control in the map.feature function:

map.feature(lang.aff("Sign Language"), control = FALSE)

To disable the automatically generated legend there is an argument legend in the map.feature function that can be set to FALSE.

3.7 Set an additional set of features using strokes

The map.feature function has an additional argument stroke.features. Using this argument it becomes possible to show two independent sets of features on one map. By default strokes are colored in grey (so for two levels it will be black and white, for three — black, grey, white end so on), but users can set their own colors using the argument stroke.color:

map.feature(circassian$language,
            features = circassian$languoid,
            stroke.features = circassian$language,
            latitude = circassian$latitude,
            longitude = circassian$longitude)

It is important to note that stroke.features can work with NA values. The function won’t plot anything if there is an NA value. Let’s set a language value to NA in all Baksan villages from the circassian dataset

library(dplyr)
# create newfeature variable
newfeature <- circassian
# set language feature of the Baksan villages to NA and reduce newfeature from dataframe to vector
newfeature %>% 
  mutate(language = replace(language, languoid == "Baksan", NA)) %>% 
  select(language) %>% 
  unlist() ->
  newfeature
# create a map
map.feature(circassian$language,
            features = circassian$languoid, 
            latitude = circassian$latitude,
            longitude = circassian$longitude,
            stroke.features = newfeature)

3.8 Set radius and opacity

All markers have their own radius and opacity, so it can be set by users. Just use arguments radius, stroke.radius, opacity and stroke.opacity:

map.feature(circassian$language,
            features = circassian$languoid,
            stroke.features = circassian$language,
            latitude = circassian$latitude,
            longitude = circassian$longitude,
            radius = 7, stroke.radius = 13)
map.feature(circassian$language,
            features = circassian$languoid,
            stroke.features = circassian$language,
            latitude = circassian$latitude,
            longitude = circassian$longitude,
            opacity = 0.7, stroke.opacity = 0.6)

3.9 Set layouts and other leaflet features

Since the lingtypology package is based on the leaflet package, it is possible to add some map features from that package using the magrittr pipe operator (%>%). For example many popular base-maps can be added using the addProviderTiles() function from the leaflet package (here is the complete set of base-maps).

library(leaflet) # for correct work of %>% operator
df <- data.frame(lang = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
   feature = c("polysynthetic", "polysynthetic", "fusion", "fusion", "fusion"),
   popup = c("Adyghe", "Adyghe", "Slavic", "Slavic", "Slavic"))
map.feature(df$lang, df$feature, df$popup) %>% 
addProviderTiles("Stamen.Toner")