Provides vocabulary data for the wordpiece algorithm, which tokenizes text into somewhat meaningful chunks. The included vocabularies were retrieved from <https://huggingface.co/bert-base-cased/resolve/main/vocab.txt> and <https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt> and parsed into an R-friendly format.
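To illustrate how a vocabulary of this kind is typically consumed, the following is a minimal Python sketch of greedy longest-match-first wordpiece splitting as used by BERT-style tokenizers. It is not this package's code or API: the function name `wordpiece_tokenize`, the `##` continuation prefix default, and the tiny `toy_vocab` are assumptions made for illustration, and the real vocabularies contain roughly 30,000 entries.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]", prefix="##"):
    """Split one word into vocabulary pieces, longest match first.

    Hypothetical helper for illustration; not part of wordpiece.data.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary (an assumption, not taken from the package data).
toy_vocab = {"un", "##aff", "##able", "afford", "[UNK]"}

print(wordpiece_tokenize("unaffable", toy_vocab))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("xyz", toy_vocab))        # ['[UNK]']
```

With an uncased vocabulary, the input would normally be lowercased before this step; the sketch leaves that to the caller.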
Version: | 2.0.0 |
Depends: | R (≥ 3.5.0) |
Suggests: | testthat (≥ 3.0.0) |
Published: | 2022-03-03 |
DOI: | 10.32614/CRAN.package.wordpiece.data |
Author: | Jonathan Bratt [aut], Jon Harmon [aut, cre], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph], Google, Inc [cph] (original BERT vocabularies) |
Maintainer: | Jon Harmon <jonthegeek at gmail.com> |
BugReports: | https://github.com/macmillancontentscience/wordpiece.data/issues |
License: | Apache License (≥ 2) |
URL: | https://github.com/macmillancontentscience/wordpiece.data |
NeedsCompilation: | no |
Materials: | README NEWS |
CRAN checks: | wordpiece.data results |
Reference manual: | wordpiece.data.pdf |
Package source: | wordpiece.data_2.0.0.tar.gz |
Windows binaries: | r-devel: wordpiece.data_2.0.0.zip, r-release: wordpiece.data_2.0.0.zip, r-oldrel: wordpiece.data_2.0.0.zip |
macOS binaries: | r-release (arm64): wordpiece.data_2.0.0.tgz, r-oldrel (arm64): wordpiece.data_2.0.0.tgz, r-release (x86_64): wordpiece.data_2.0.0.tgz, r-oldrel (x86_64): wordpiece.data_2.0.0.tgz |
Old sources: | wordpiece.data archive |
Reverse imports: | wordpiece |
Please use the canonical form https://CRAN.R-project.org/package=wordpiece.data to link to this page.