- The tokenize_tweets() function, which is no longer supported, has been removed.
- Added the tokenize_ptb() function for Penn Treebank tokenizations (@jrnold) (#12).
- Added chunk_text() to split long documents into pieces (#30).
- tokenize_tweets() preserves usernames, hashtags, and URLs (@kbenoit) (#44).
- The stopwords() function has been removed in favor of using the stopwords package (#46).
- … the tif package. (#49)
- tokenize_skip_ngrams has been improved to generate unigrams and bigrams, according to the skip definition (#24).
- … tokenizers supports (@ironholds) (#26).
- tokenize_skip_ngrams now supports stopwords (#31).
- … NA consistently (#33).
- tokenize_words() gains arguments to preserve or strip punctuation and numbers (#48).
- Fixed tokenize_skip_ngrams() and tokenize_ngrams() to return properly marked UTF8 strings on Windows (@patperry) (#58).
- tokenize_tweets() now removes stopwords prior to stripping punctuation, making its behavior more consistent with tokenize_words() (#76).
- Added the tokenize_character_shingles() tokenizer.
- … tokenize_words() and tokenize_word_stems().
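A minimal sketch of the punctuation and number options gained by tokenize_words(). The argument names strip_punct and strip_numeric are taken from the current tokenizers documentation; verify them against your installed version:

```r
library(tokenizers)

text <- "Version 2 shipped in 2017 - hooray!"

# Default behaviour: lowercase, strip punctuation, keep numbers.
tokenize_words(text)

# Keep punctuation, drop numbers instead
# (strip_punct / strip_numeric assumed per the package docs).
tokenize_words(text, strip_punct = FALSE, strip_numeric = TRUE)
```

Like the other tokenizers, tokenize_words() returns a list with one character vector of tokens per input document.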
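The character-shingles tokenizer produces overlapping character n-grams, which are useful for fuzzy matching and language identification. A short sketch, assuming the documented default of n = 3:

```r
library(tokenizers)

# Overlapping character trigrams of a single word:
# an 8-character string yields 6 trigrams.
tokenize_character_shingles("tokenize", n = 3)
# → list of "tok" "oke" "ken" "eni" "niz" "ize"
```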