Mixed N-Grams and Unigram Sequence Segmentation [R package NUSS version 0.1.0]

Oskar Kosch

NUSS: Mixed N-Grams and Unigram Sequence Segmentation

Segments short text sequences, such as hashtags, into their component words using a dictionary, which may be built from a custom text corpus. A unigram dictionary is used to find the most probable word sequence, and an n-gram approach is used to determine the possible segmentations given the text corpus.
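The unigram step described above can be illustrated with a small dynamic-programming sketch. It is written in Python purely as an illustration of the general idea; it is not the package's implementation and does not use its R API. The toy dictionary `UNIGRAM_COUNTS`, the function name `segment`, and the `max_word_len` parameter are all assumptions made for this example, and the n-gram step is not covered.

```python
import math

# Hypothetical toy unigram counts; a real dictionary would be built from a corpus.
UNIGRAM_COUNTS = {
    "this": 50, "is": 80, "a": 100, "test": 20,
    "hash": 5, "tag": 8, "hashtag": 12,
}


def segment(text, counts=UNIGRAM_COUNTS, max_word_len=20):
    """Return the most probable split of `text` into dictionary words.

    Returns None when no split into dictionary words exists.
    """
    total = sum(counts.values())
    text = text.lower()
    n = len(text)
    # best[i] holds (log-probability, words) for the best split of text[:i],
    # or None if text[:i] cannot be split into dictionary words.
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if best[start] is None or word not in counts:
                continue
            # Unigram model: word probabilities multiply, so log-probabilities add.
            score = best[start][0] + math.log(counts[word] / total)
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[n][1] if best[n] is not None else None


print(segment("thisisatest"))  # ['this', 'is', 'a', 'test']
print(segment("hashtag"))      # ['hashtag']
```

With these toy counts, "hashtag" stays whole because a single moderately frequent word scores higher than the product of the probabilities of "hash" and "tag". Different counts could reverse that outcome, which is why the choice of corpus matters.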

Version:	0.1.0
Depends:	R (≥ 3.5)
Imports:	dplyr, magrittr, Rcpp, stringr, text2vec, textclean, utils
LinkingTo:	BH, Rcpp
Suggests:	testthat (≥ 3.0.0)
Published:	2024-08-19
DOI:	10.32614/CRAN.package.NUSS
Author:	Oskar Kosch [aut, cre]
Maintainer:	Oskar Kosch <contact at oskarkosch.com>
BugReports:	https://github.com/theogrost/NUSS/issues
License:	GPL (≥ 3)
URL:	https://github.com/theogrost/NUSS
NeedsCompilation:	yes
Language:	en
Materials:	README
CRAN checks:	NUSS results

Documentation:

Reference manual:	NUSS.html, NUSS.pdf

Downloads:

Package source:	NUSS_0.1.0.tar.gz
Windows binaries:	r-devel: NUSS_0.1.0.zip, r-release: NUSS_0.1.0.zip, r-oldrel: NUSS_0.1.0.zip
macOS binaries:	r-release (arm64): NUSS_0.1.0.tgz, r-oldrel (arm64): NUSS_0.1.0.tgz, r-release (x86_64): NUSS_0.1.0.tgz, r-oldrel (x86_64): NUSS_0.1.0.tgz

Linking:

Please use the canonical form https://CRAN.R-project.org/package=NUSS to link to this page.