
metaphonebr


The goal of metaphonebr is to simplify Brazilian names phonetically using a custom metaphoneBR algorithm that preserves ending vowels. It was created to aid in pairing datasets when no unambiguous key is available.

Installation

The stable version of the package can be installed with:

install.packages("metaphonebr")

You can install the development version of metaphonebr from GitHub with:

# install.packages("remotes")
remotes::install_github("ipeadata-lab/metaphonebr")

Example

This is a basic example which shows how to use the main function:

example_names <- c("João da Silva", "Maria", "Marya",
                    "Helena", "Elena", "Philippe", "Filipe", "Xavier", "Chavier")
phonetic_codes <- metaphonebr::metaphonebr(example_names)
print(data.frame(original = example_names, metaphonebr = phonetic_codes))
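Because the codes are meant to help pair datasets that lack an unambiguous key, a natural next step is to join two tables on them. The sketch below does this with base R's `merge()`. The helper `pair_by_phonetic_key()` and its arguments are hypothetical and not part of metaphonebr; the key function is passed in as an argument so the sketch runs without the package installed.

```r
# Hypothetical helper (not part of metaphonebr): join two data frames on a
# phonetic key computed from a name column in each table.
pair_by_phonetic_key <- function(x, y, name_col = "name", key_fun) {
  # Compute the same phonetic key for both tables
  x$phonetic_key <- key_fun(x[[name_col]])
  y$phonetic_key <- key_fun(y[[name_col]])
  # Inner join on the key; shared columns such as the original names
  # receive the suffixes .x and .y
  merge(x, y, by = "phonetic_key", suffixes = c(".x", ".y"))
}
```

With the package installed, the call would look like `pair_by_phonetic_key(a, b, key_fun = metaphonebr::metaphonebr)`. Since different names can share a code, the join may be many-to-many, so the merged rows are best treated as candidate pairs to be checked against other fields rather than as confirmed matches.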

The metaphoneBR phonetic encoding algorithm proceeds as follows:

  1. Initial Cleanup & Preparation:
  2. Silent Letter Removal:
  3. Digraph Simplification (Sound Grouping):
  4. Similar Consonant Simplification:
  5. Terminal Nasal Sound Simplification:
  6. Duplicate Vowel Removal:
  7. Final Cleanup (Duplicate Letters & Spaces):
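The list above only names the stages. To make their order concrete, here is a minimal base-R sketch of a pipeline with the same seven stages. The individual rules (which accents are stripped, which digraphs are mapped, the `Y`→`I`, `W`→`V`, `Z`→`S`, `K`→`C` substitutions and the word-final `M`→`N` rule) are illustrative assumptions, chosen so that the name pairs in the example above (Philippe/Filipe, Xavier/Chavier, Helena/Elena, Maria/Marya) receive the same code in this sketch. They are not metaphonebr's actual rule set, and `metaphone_sketch()` is a hypothetical name; use `metaphonebr::metaphonebr()` for real encoding.

```r
# Hypothetical sketch of a metaphoneBR-style pipeline. The rules are
# illustrative assumptions, NOT the package's actual rule set.
metaphone_sketch <- function(x) {
  # 1. Initial cleanup: strip common Portuguese accents, uppercase,
  #    keep only letters and spaces.
  accented <- paste0(
    "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00ed\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7",
    "\u00c1\u00c0\u00c2\u00c3\u00c9\u00ca\u00cd\u00d3\u00d4\u00d5\u00da\u00dc\u00c7"
  )
  plain <- "aaaaeeiooouucAAAAEEIOOOUUC"
  x <- toupper(chartr(accented, plain, x))
  x <- gsub("[^A-Z ]", "", x)

  # 2. Silent letters (assumed rule): drop a word-initial H (HELENA -> ELENA).
  x <- gsub("\\bH", "", x, perl = TRUE)

  # 3. Digraphs (assumed rules): PHILIPPE -> FILIPPE, CHAVIER -> XAVIER.
  x <- gsub("PH", "F", x, fixed = TRUE)
  x <- gsub("CH", "X", x, fixed = TRUE)
  x <- gsub("LH", "L", x, fixed = TRUE)
  x <- gsub("NH", "N", x, fixed = TRUE)

  # 4. Similar letters (assumed rules): Y -> I, W -> V, Z -> S, K -> C.
  x <- chartr("YWZK", "IVSC", x)

  # 5. Terminal nasal (assumed rule): word-final M becomes N.
  x <- gsub("M\\b", "N", x, perl = TRUE)

  # 6. Duplicate vowels: AA -> A, EE -> E, and so on.
  x <- gsub("([AEIOU])\\1+", "\\1", x, perl = TRUE)

  # 7. Duplicate letters (FILIPPE -> FILIPE) and extra spaces.
  x <- gsub("([A-Z])\\1+", "\\1", x, perl = TRUE)
  trimws(gsub(" +", " ", x))
}
```

Under these assumed rules, `metaphone_sketch(c("Philippe", "Filipe"))` gives `"FILIPE"` for both, while `"Adriano"` and `"Adriana"` keep distinct codes because no step drops a final vowel.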

The resulting code is an attempt to represent the phonetic signature of a name in a simplified, standardized way for a Brazilian Portuguese context. In particular, by construction it preserves ending vowels, since in Brazilian names they generally carry gender information (e.g., ADRIANO and ADRIANA).

Note

metaphonebr is developed by a team of researchers at Instituto de Pesquisa Econômica Aplicada (Ipea).
