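audubon is a set of Japanese text processing tools for filling iteration marks, converting character classes, segmenting text into phrases, and normalizing text.
Some of the features above are not implemented in ‘ICU’ (i.e., the stringi package), and the goal of the audubon package is to provide these additional features.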
strj_fill_iter_mark repeats the previous character and replaces the iteration marks if the element has more than 5 characters. You can use this feature with strj_normalize or strj_rewrite_as_def.
strj_fill_iter_mark(c(
"あいうゝ〃かき",
"金子みすゞ",
"のたり〳〵かな",
"しろ/″\とした"
))
#> [1] "あいうううかき" "金子みすず" "のたりたりかな" "しろじろとした"
strj_fill_iter_mark("いすゞエルフトラック") |>
strj_normalize()
#> [1] "いすずエルフトラック"
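The core idea, copying the preceding character over an iteration mark, can be sketched in base R. This is only an illustration under simplifying assumptions, not audubon's implementation: the helper name fill_simple_iter_mark is hypothetical, it handles only the unvoiced single-character marks ゝ, ヽ and 〃, and it ignores voiced marks such as ゞ, multi-character marks such as 〳〵, and the length condition mentioned above.

```r
# Hypothetical helper for illustration; not part of audubon.
fill_simple_iter_mark <- function(x) {
  # U+309D (hiragana mark), U+30FD (katakana mark), U+3003 (ditto mark)
  marks <- c("\u309d", "\u30fd", "\u3003")
  vapply(x, function(s) {
    chars <- strsplit(s, "")[[1]]
    # Walk left to right so that consecutive marks keep repeating
    # the most recent ordinary character.
    for (i in seq_along(chars)) {
      if (i > 1L && chars[i] %in% marks) {
        chars[i] <- chars[i - 1L]
      }
    }
    paste(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

fill_simple_iter_mark("あいうゝ〃かき")
```

In a UTF-8 session this should reproduce the first element of the strj_fill_iter_mark output shown above. For real text, prefer strj_fill_iter_mark, which also covers the voiced and multi-character marks.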
Character class conversion uses hakatashi/japanese.js.
strj_hiraganize("あのイーハトーヴォのすきとおった風")
#> [1] "あのいーはとーゔぉのすきとおった風"
strj_katakanize("あのイーハトーヴォのすきとおった風")
#> [1] "アノイーハトーヴォノスキトオッタ風"
strj_romanize("あのイーハトーヴォのすきとおった風")
#> [1] "anoīhatōvonosukitōtta"
strj_tokenize splits Japanese text into phrases using google/budoux, TinySegmenter, or other tokenizers.
strj_normalize normalizes text following rules based on the NEologd style.
strj_rewrite_as_def is an R port of SudachiCharNormalizer that typically normalizes characters following a ‘*.def’ file.
The audubon package contains several ‘*.def’ files, so you can use them or write your own ‘rewrite.def’ file as follows.
# single characters will **never** be normalized.
…
# if two characters are separated with a tab,
# left side forms are always rewritten to right side forms
# before normalized.
斎 斉
齋 斉
齊 斉
# only rewriting a single character to a single character is supported,
# i.e., the following line cannot work.
アッ ア
This feature is more powerful than stringi::stri_trans_* because it allows users to control which characters are normalized. For instance, this function can be used to convert kyuji-tai characters to shinji-tai characters.
stringi::stri_trans_nfkc("Ⅹⅳ")
#> [1] "Xiv"
strj_rewrite_as_def("Ⅹⅳ")
#> [1] "Ⅹⅳ"
strj_rewrite_as_def("惡と假面のルール", read_rewrite_def(system.file("def/kyuji.def", package = "audubon")))
#> [1] "悪と仮面のルール"
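The ‘*.def’ rules described above can also be sketched in base R to make their semantics concrete. This is a rough illustration under stated assumptions, not how audubon or Sudachi implements them: the helpers parse_def_sketch and apply_def_sketch are hypothetical, only tab-separated single-character pairs are honoured, and single-character "never normalize" lines are simply skipped because the sketch performs no further normalization.

```r
# Hypothetical helpers for illustration; not part of audubon.
parse_def_sketch <- function(lines) {
  # Drop comments and blank lines.
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  parts <- strsplit(lines, "\t", fixed = TRUE)
  # Keep only "from<TAB>to" entries where both sides are one character.
  ok <- vapply(parts, function(p) {
    length(p) == 2L && all(nchar(p) == 1L)
  }, logical(1))
  parts <- parts[ok]
  list(
    from = vapply(parts, `[`, character(1), 1L),
    to = vapply(parts, `[`, character(1), 2L)
  )
}

apply_def_sketch <- function(text, def) {
  if (length(def$from) == 0L) return(text)
  # chartr() maps the i-th character of `old` to the i-th of `new`.
  chartr(paste(def$from, collapse = ""), paste(def$to, collapse = ""), text)
}

def <- parse_def_sketch(c(
  "# left side forms are rewritten to right side forms",
  "\u658e\t\u6589",        # 斎 -> 斉
  "\u9f4b\t\u6589",        # 齋 -> 斉
  "\u9f4a\t\u6589",        # 齊 -> 斉
  "\u30a2\u30c3\t\u30a2"   # アッ -> ア: two characters on the left, so ignored
))

apply_def_sketch("\u9f4b\u85e4", def) # 齋藤 becomes 斉藤
```

Unicode escapes are used for the rules so that the sketch does not depend on the session's locale. In practice, read_rewrite_def and strj_rewrite_as_def should be used instead.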
© 2024 Akiru Kato
Licensed under the Apache License, Version 2.0.
Icons made by iconixar from flaticon.