* Fixed the calculation of `global_idf3` in `bind_tf_idf2`.
* Added a `norm` option to `bind_tf_idf2`. When `norm = TRUE`, cosine normalization is performed on tf_idf values as in the RMeCab package.
* Added `tf = "itf"` and `idf = "df"` options to `bind_tf_idf2`.
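The `bind_tf_idf2` changes above can be illustrated with a minimal sketch. This assumes `bind_tf_idf2` follows a tidytext-style interface (bare column names for term, document, and count, plus the `tf`, `idf`, and `norm` options mentioned above); the toy data frame is invented for illustration, and gibasa must be installed for it to run.

```r
library(gibasa)

# Toy tidy dataset of per-document token counts (invented for illustration)
dat <- data.frame(
  doc_id = c("a", "a", "b", "b"),
  token  = c("x", "y", "x", "z"),
  n      = c(2L, 1L, 1L, 3L)
)

# Bind tf, idf, and tf_idf columns; with norm = TRUE,
# cosine normalization is applied to the tf_idf values
# as in the RMeCab package
res <- bind_tf_idf2(dat, token, doc_id, n, norm = TRUE)
res
```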
* `pack` now preserves the type of `doc_id` when it is a factor.
* The `MECABRC` environment variable or a `~/.mecabrc` file can now be used to set up dictionaries.
* `tokenize` now skips resetting the output encodings to UTF-8 when `split` is `FALSE`.
* Added a `grain_size` argument to `tokenize`.
* Added the `bind_lr` function.
* Now uses `RcppParallel::parallelFor` instead of `tbb::parallel_for`. There are no user-visible changes.
* `tokenize` can now accept a character vector in addition to a data.frame-like object.
* `gbs_tokenize` is now deprecated. Please use the `tokenize` function instead.
* Updated `is_blank`.
* Added a `partial` argument to `gbs_tokenize` and `tokenize`. This argument controls the partial parsing mode, which forces parsing of the given chunks of sentences when activated.
* Added the `posDebugRcpp` function.
* `bind_tf_idf2` can calculate and bind the term frequency, inverse document frequency, and tf-idf of a tidy text dataset.
* `collapse_tokens`, `mute_tokens`, and `lexical_density` can be used for handling a tidy text dataset of tokens.
* To use `tokenize`, MeCab and its dictionaries must still be installed and available.
* `tokenize` now preserves the original order of `docid_field`.
* Added the `bind_tf_idf2` function and the `is_blank` function.
* `prettify` can now extract only the columns specified by `col_select`.
* Added a `NEWS.md` file to track changes to the package.
* `tokenize` now takes a data.frame as its first argument and returns only a data.frame. The former function, which took a character vector and returned a data.frame or a named list, was renamed to `gbs_tokenize`.
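The data.frame interface of `tokenize` described above can be sketched as follows. This assumes the default `doc_id`/`text` column names; the sample texts are invented, and MeCab with a dictionary must be installed and available for this to run.

```r
library(gibasa)

# A corpus as a data.frame, one document per row (invented sample texts)
df <- data.frame(
  doc_id = c(1, 2),
  text = c("こんにちは、世界。", "雨ニモマケズ")
)

# Tokenize each document; the result is a data.frame
# with one row per token. The older character-vector
# interface, gbs_tokenize(), is deprecated.
res <- tokenize(df)
head(res)
```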