fastTextR is an R interface to the fastText library. It can be used for word representation learning (Bojanowski et al., 2016) and supervised text classification (Joulin et al., 2016). A particular advantage of fastText over other software is that it was designed for fairly large data sets.
The following example, based on the examples provided with the fastText library, shows how to use fastTextR for word representations. More information about word representations can be found on the fastText homepage.
library("fastTextR")
Training these models can be quite time-consuming, so pre-trained models are a good option.
model <- ft_load("cc.en.300.bin")
ft_word_vectors(model, c("asparagus", "pidgey", "yellow"))[,1:5]
## [,1] [,2] [,3] [,4] [,5]
## asparagus 0.0292057190 -0.0114405714 -0.003201437 0.03087331 0.127229080
## pidgey 0.0452978685 0.0090015158 0.067562237 0.11123407 -0.008441916
## yellow 0.0007776691 -0.0001886144 0.001824494 0.03869999 0.036413591
ft_sentence_vectors(model, c("Poets have been mysteriously silent on the subject of cheese", "Who did not let the gorilla into the ballet"))[,1:5]
???
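To give some intuition for what a sentence vector is, here is a minimal base-R sketch. It assumes the behaviour of the upstream fastText library for unsupervised models, where a sentence vector is the average of the L2-normalized word vectors; whether `ft_sentence_vectors` follows exactly this path is an assumption. The toy vectors and the helpers `l2_normalize` and `sentence_vector` are hypothetical and not part of fastTextR.

```r
# Hypothetical 3-dimensional word vectors (cc.en.300.bin uses 300 dimensions).
word_vecs <- rbind(
  poets  = c(3, 4, 0),
  cheese = c(0, 0, 2)
)

# Scale each row to unit L2 norm; all-zero rows are dropped to avoid dividing by zero.
l2_normalize <- function(m) {
  norms <- sqrt(rowSums(m^2))
  keep <- norms > 0
  m[keep, , drop = FALSE] / norms[keep]
}

# Sentence vector: mean of the normalized word vectors (assumed fastText behaviour).
sentence_vector <- function(m) colMeans(l2_normalize(m))

sv <- sentence_vector(word_vecs)
print(sv)
```

Normalizing first means every word contributes equally to the average, regardless of the length of its vector.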
ft_nearest_neighbors(model, 'asparagus', k = 5L)
## aspargus broccolini artichokes asparagus. asparagas
## 0.7316202 0.6995656 0.6930545 0.6915916 0.6911229
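The scores above are similarity values. In the upstream fastText library they are cosine similarities, and it is assumed here that fastTextR reports the same quantity. The following base-R sketch illustrates the idea on hypothetical toy embeddings; `nearest_neighbors` is an illustrative helper, not the fastTextR function.

```r
# Hypothetical toy embeddings, one word per row.
emb <- rbind(
  asparagus = c(1.0, 0.9, 0.1),
  broccoli  = c(0.9, 1.0, 0.2),
  yellow    = c(0.1, 0.2, 1.0),
  ballet    = c(-0.5, 0.1, 0.3)
)

# Return the k words most similar to `word` by cosine similarity,
# excluding the query word itself.
nearest_neighbors <- function(emb, word, k = 2L) {
  unit <- emb / sqrt(rowSums(emb^2))    # rows scaled to unit length
  sims <- drop(unit %*% unit[word, ])   # dot products of unit vectors = cosine
  sims <- sims[names(sims) != word]
  head(sort(sims, decreasing = TRUE), k)
}

nn <- nearest_neighbors(emb, "asparagus", k = 2L)
print(nn)
```

With a real model the search runs over the whole vocabulary, which is why misspellings such as "aspargus" rank highly: subword information places them close to the query word.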
ft_analogies(model, c("berlin", "germany", "france"))
## paris france. avignon montpellier paris.
## 0.6831182 0.6408537 0.6288283 0.6138449 0.6059716
## rennes london Paris. toulon montparnasse
## 0.5884554 0.5832924 0.5743204 0.5727922 0.5715630
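The analogy query "berlin is to germany as ? is to france" can be read as vector arithmetic. The sketch below assumes the upstream fastText approach: form berlin - germany + france from unit-length vectors, then rank the remaining words by cosine similarity to that query. The two-dimensional embeddings and the `analogy` helper are hypothetical and only illustrate the mechanism.

```r
# Hypothetical 2-d embeddings: axis 1 roughly encodes the country,
# axis 2 roughly encodes "is a capital city".
emb <- rbind(
  berlin  = c(1, 1),
  germany = c(1, 0),
  paris   = c(-1, 1),
  france  = c(-1, 0),
  london  = c(0, 1)
)

# Rank words by cosine similarity to w1 - w2 + w3, excluding the query words.
analogy <- function(emb, w1, w2, w3, k = 1L) {
  unit <- emb / sqrt(rowSums(emb^2))
  query <- unit[w1, ] - unit[w2, ] + unit[w3, ]
  query <- query / sqrt(sum(query^2))
  sims <- drop(unit %*% query)
  sims <- sims[!names(sims) %in% c(w1, w2, w3)]
  head(sort(sims, decreasing = TRUE), k)
}

res <- analogy(emb, "berlin", "germany", "france", k = 2L)
print(res)
```

Excluding the three query words matters: without it, "france" itself would be among the top candidates in this toy example.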
[1] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information
@article{bojanowski2016enriching,
title={Enriching Word Vectors with Subword Information},
author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
journal={arXiv preprint arXiv:1607.04606},
year={2016}
}
[2] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification
@article{joulin2016bag,
title={Bag of Tricks for Efficient Text Classification},
author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas},
journal={arXiv preprint arXiv:1607.01759},
year={2016}
}