R package to perform sentiment analysis on Italian texts, including four lexicons in tidy format (tibbles): Sentix, MAL, ELIta VAD and basic emotions (Plutchik’s wheel of emotions).
The package provides two main functions to perform sentiment analysis on Italian texts:
- sentix_annotate(): Annotates a text or a dataframe of texts with sentiment scores from the selected lexicon, using udpipe for parsing and dplyr for joining the lexicon.
- sentix_summarize(): Summarizes the sentiment scores from the annotated dataframe, providing overall sentiment metrics per document (or segment) and auxiliary metrics that can be used to evaluate the results, recompute scores, or apply custom aggregation strategies.

It also provides utility functions to manage the included lexicons, so that they can also be used easily within other frameworks such as Tidytext and Quanteda:
- get_sentix(), get_elita() and make_polarity(), to extract the lexicons and use them in custom analyses;
- df_to_valence(), df_to_polar() (plus the wrapper df_to_dict()) to convert tidy lexicons into Quanteda dictionaries.

remotes::install_github("valeriobasile/sentixr", dependencies = TRUE,
                        build_vignettes = TRUE)

While the package is published under a GPL-3.0 license, lexicons are provided under separate terms (CC BY-SA and CC0 1.0 Universal).
Please check the individual lexicon documentation for details.
| Name | Lexicon and source | Entries | Features | License |
|---|---|---|---|---|
| sentix | Sentix 3.1 (Basile and Nissim 2013; Basile et al. 2025) | 68,190 lemmas | Scores (-1, 1), polypathy index | CC BY-SA |
| MAL | MAL 3.1 (Vassallo et al. 2019, 2020) | 295,032 inflected forms | Inherited from Sentix | CC BY-SA |
| elita_basic | ELIta (Di Palma 2024) | 6,905 lemmas + emojis | Plutchik’s emotions (0, 1) | CC0 1.0 Universal |
| elita_VAD | ELIta (Di Palma 2024) | 6,905 lemmas + emojis | VAD (-4 / +4) | CC0 1.0 Universal |
library(sentixr)

The sentiment analysis workflow consists of two main steps: annotation (sentix_annotate) and summarization (sentix_summarize).
# Single text
# annotate with defaults
sentix_annotate("Oggi è una bella giornata. Esco a fare una passeggiata") |>
  # summarize by document - default
  sentix_summarize()

# A tibble: 1 × 4
doc_id score n_tokens n_scored
<chr> <dbl> <int> <int>
1 doc1 0.176 10 9
sentix_annotate() annotates texts with sentiment scores
from the selected lexicon, after parsing them with udpipe. For
large corpora, the user may optionally specify the number of cores to
use, via the argument parallel.cores, which is inherited
from udpipe and passed to udpipe::udpipe().
The model argument allows specifying a custom
udpipe model. If no model is given, the function will
automatically download the default Italian udpipe model. After the first
run, the downloaded model can be reused.
# Using a model in the working directory
sentix_annotate(testi, model = "local")

# Loading a pre-downloaded model
model <- udpipe::udpipe_load_model("italian-isdt-ud-2.5-191206.udpipe")

The function, like udpipe, accepts as input single texts, multiple texts (a character vector, a list or a list of tokens), or dataframes with text and doc_id columns.
The function also lets the user specify custom column names and
manage document identifiers, which are safely passed to
udpipe::udpipe(). Note, however, that any other columns in the
dataframe are ignored.
# Example dataframe with doc_id and text fields
data(recensioni_tv)
# Annotate the dataframe directly
anno_df <- sentix_annotate(
  recensioni_tv,
  # loaded model
  model = model
)
head(anno_df)

  doc_id sentence_id token_id token lemma upos score
1 doc1 1 1 Ottimo ottimo ADJ 1.0000000
2 doc1 1 2 prodotto prodotto NOUN 0.0000000
3 doc1 1 3 , , PUNCT NA
4 doc1 1 4 la il DET NA
5 doc1 1 5 qualità qualità NOUN 0.3631757
6 doc1 1 6-7 dell' <NA> <NA> NA
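
The score column in the output above can be read as the result of looking up each lemma in the selected lexicon. The following is a minimal sketch of that idea using dplyr on toy data. The data frames `tokens` and `toy_lexicon` are hypothetical and built by hand here; the sketch is not the package's actual implementation, which may match on more than the lemma.

```r
library(dplyr)

# Toy tokens shaped like the parser output shown above (hand-written, hypothetical)
tokens <- data.frame(
  doc_id   = "doc1",
  token_id = c("1", "2", "3"),
  lemma    = c("ottimo", "prodotto", ","),
  upos     = c("ADJ", "NOUN", "PUNCT")
)

# Toy lexicon for illustration only; the real lexicons ship with the package
toy_lexicon <- data.frame(
  lemma = c("ottimo", "prodotto"),
  score = c(1, 0)
)

# A left join keeps every token; lemmas absent from the lexicon get NA
annotated <- left_join(tokens, toy_lexicon, by = "lemma")
print(annotated)
```

Using a left join rather than an inner join keeps unscored tokens (such as punctuation) in the result, which is what makes it possible to count total and scored tokens separately later on.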
By default, sentix_annotate() uses the Sentix lexicon.
To use a different lexicon, specify it with the dict
argument.
# Use ELIta lexicon with VAD scores
anno_vad <- sentix_annotate(recensioni_tv, model = model, dict = "elita_VAD")

sentix_summarize() computes overall sentiment scores and auxiliary metrics per document (or other segments, via the argument by) from the annotated dataframe.
sentix_summarize(
  anno_df,
  # summarize by sentence
  by = c("doc_id", "sentence_id")
)

# A tibble: 7 × 5
doc_id sentence_id score n_tokens n_scored
<chr> <int> <dbl> <int> <int>
1 doc1 1 0.274 12 9
2 doc2 1 -0.253 4 3
3 doc2 2 -0.0818 11 6
4 doc3 1 0.244 15 9
5 doc4 1 0.178 4 3
6 doc4 2 0.0965 12 9
7 doc5 1 -0.0187 15 9
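
Because the annotated dataframe keeps one row per token, a custom aggregation can also be computed directly with dplyr. The sketch below uses a hand-made data frame, `anno_toy`, with only the doc_id and score columns, and takes the plain mean of the scored tokens. This is an illustrative assumption, not necessarily the formula used by sentix_summarize(), and the counting rules for `n_tokens` and `n_scored` are likewise assumed.

```r
library(dplyr)

# Hypothetical annotated tokens; NA marks tokens without a lexicon entry
anno_toy <- data.frame(
  doc_id = c("doc1", "doc1", "doc1", "doc2", "doc2"),
  score  = c(1, 0, NA, -0.5, NA)
)

# Custom aggregation: mean over scored tokens only, plus simple counts
custom_summary <- anno_toy |>
  group_by(doc_id) |>
  summarise(
    n_tokens    = n(),                      # all rows in the document
    n_scored    = sum(!is.na(score)),       # rows with a lexicon score
    mean_scored = mean(score, na.rm = TRUE),
    .groups     = "drop"
  )
print(custom_summary)
```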
When using lexicons with multiple features (e.g., ELIta), all features are summarized.
sentix_summarize(anno_vad)

# A tibble: 5 × 6
doc_id valenza attivazione dominanza n_tokens n_scored
<chr> <dbl> <dbl> <dbl> <int> <int>
1 doc1 0.500 -0.0971 0.285 12 6
2 doc2 0.0414 0.387 0.0957 15 7
3 doc3 0.402 0.208 0.125 15 3
4 doc4 0.0714 -0.0296 0.0954 16 7
5 doc5 -0.00667 0.0554 0.0342 15 6
If you use sentixr in your research, please cite it as follows:
Vardanega, A., Basile, V., Vassallo, M., Gabrieli, G. & Di Palma, E. (2026). sentixr (Version 0.1.0). https://github.com/valeriobasile/sentixR
Vardanega, Agnese (Università di Teramo) (maintainer)
Basile, Valerio (Università di Torino)
Vassallo, Marco (CREA-PB)
Gabrieli, Giuliano (CREA-PB)
Di Palma, Eliana (Università di Torino)