sentixr

License: GPL-3 Lifecycle: experimental

R package to perform sentiment analysis on Italian texts, including four lexicons in tidy format (tibbles): Sentix, MAL, ELIta VAD and basic emotions (Plutchik’s wheel of emotions).

Overview

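The package provides two main functions to perform sentiment analysis on Italian texts: sentix_annotate(), which annotates texts with sentiment scores from a lexicon, and sentix_summarize(), which aggregates those scores per document or other segment.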

It also provides utility functions to manage the included lexicons, so that they can also be used easily within other frameworks such as tidytext and quanteda.

Install

remotes::install_github("valeriobasile/sentixr", dependencies = TRUE, 
                         build_vignettes = TRUE)

Lexicons

While the package is published under a GPL-3.0 license, lexicons are provided under separate terms (CC BY-SA and CC0 1.0 Universal).

Please check the individual lexicon documentation for details.

| Name | Lexicon | Entries | Features | Source | License |
|---|---|---|---|---|---|
| sentix | Sentix 3.1 (Basile and Nissim 2013; Basile et al. 2025) | 68,190 lemmas | Scores (-1, 1), polypathy index | GitHub, Zenodo | CC BY-SA |
| MAL | MAL 3.1 (Vassallo et al. 2019, 2020) | 295,032 inflected forms | Inherited from Sentix | Zenodo | CC BY-SA |
| elita_basic | ELIta (Di Palma 2024) | 6,905 lemmas + emojis | Plutchik’s emotions (0, 1) | GitHub | CC0 1.0 Universal |
| elita_VAD | ELIta (Di Palma 2024) | 6,905 lemmas + emojis | VAD (-4 / +4) | GitHub | CC0 1.0 Universal |
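
Since the lexicons are distributed as tidy tables, using one in another framework largely comes down to a join between lemmatized tokens and lexicon entries. The sketch below illustrates that idea in base R. It is only an illustration: toy_lexicon is hypothetical stand-in data (not real Sentix scores), and the column names lemma and score are assumptions borrowed from the annotated output shown in the Usage section, not a confirmed description of the lexicon tables.

```r
# Toy stand-in for a tidy lexicon (hypothetical values, not real Sentix scores)
toy_lexicon <- data.frame(
  lemma = c("bello", "brutto", "giornata"),
  score = c(0.8, -0.7, 0.1),
  stringsAsFactors = FALSE
)

# Lemmatized tokens from two hypothetical documents
tokens <- data.frame(
  doc_id = c("doc1", "doc1", "doc2", "doc2"),
  lemma  = c("bello", "giornata", "brutto", "tempo"),
  stringsAsFactors = FALSE
)

# Left join: keep every token; lemmas missing from the lexicon get NA as score
scored <- merge(tokens, toy_lexicon, by = "lemma", all.x = TRUE, sort = FALSE)
scored <- scored[order(scored$doc_id, scored$lemma), ]
print(scored)
```

The same join could be written with dplyr::left_join() in a tidytext pipeline; base R is used here only to keep the sketch dependency-free.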

Usage

library(sentixr)

Main workflow

The sentiment analysis workflow consists of two main steps: annotation (sentix_annotate) and summarization (sentix_summarize).

# Single text
# annotate with defaults
sentix_annotate("Oggi è una bella giornata. Esco a fare una passeggiata") |>
  # summarize by document - default
  sentix_summarize()
# A tibble: 1 × 4
  doc_id score n_tokens n_scored
  <chr>  <dbl>    <int>    <int>
1 doc1   0.176       10        9

Annotate

sentix_annotate() annotates texts with sentiment scores from the selected lexicon, after parsing them with udpipe. For large corpora, the user may optionally specify the number of cores to use, via the argument parallel.cores, which is inherited from udpipe and passed to udpipe::udpipe().
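
As a rough sketch of how the number of cores might be chosen, the snippet below leaves one core free and falls back to a single core when detection fails. The variable name n_cores is purely illustrative, and the annotation call is left commented out because it needs the package, a udpipe model, and the recensioni_tv example data.

```r
# Use all but one of the available cores, and never fewer than one.
# parallel::detectCores() can return NA, hence na.rm = TRUE.
n_cores <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)

# Pass the value through to udpipe via parallel.cores:
# anno <- sentix_annotate(recensioni_tv, model = model, parallel.cores = n_cores)
```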

Managing the udpipe model

The model argument allows specifying a custom udpipe model. If no model is given, the function will automatically download the default Italian udpipe model. After the first run, the downloaded model can be reused.

# Using a model in the working directory
# (testi is assumed to be a character vector of Italian texts)
sentix_annotate(testi, model = "local")
# Loading a pre-downloaded model, reused in the examples below
model <- udpipe::udpipe_load_model("italian-isdt-ud-2.5-191206.udpipe")
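
If you prefer to manage the model file yourself, one possible pattern is to look for an existing .udpipe file and download one only when none is found. The helpers find_udpipe_model() and get_italian_model() below are hypothetical and not part of sentixr; they rely only on udpipe::udpipe_download_model() and udpipe::udpipe_load_model(), and assume the model file name starts with "italian".

```r
# Hypothetical helper: path of an already downloaded Italian .udpipe file, or NA
find_udpipe_model <- function(dir = getwd()) {
  files <- list.files(dir, pattern = "^italian.*\\.udpipe$", full.names = TRUE)
  if (length(files) == 0) NA_character_ else files[1]
}

# Hypothetical helper: reuse a local model file, downloading only if none exists
get_italian_model <- function(dir = getwd()) {
  path <- find_udpipe_model(dir)
  if (is.na(path)) {
    info <- udpipe::udpipe_download_model(language = "italian-isdt",
                                          model_dir = dir)
    path <- info$file_model
  }
  udpipe::udpipe_load_model(path)
}

# Requires the udpipe package, and network access on the first run:
# model <- get_italian_model()
# sentix_annotate("Oggi è una bella giornata.", model = model)
```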

With multiple texts and dataframe

The function, like udpipe, accepts as input single texts, multiple texts (a character vector, a list or a list of tokens), or dataframes with text and doc_id columns.

The function also lets the user specify custom column names and manage document identifiers, which are safely passed to udpipe::udpipe(). Note, however, that any other columns in the dataframe are ignored.

# Example dataframe with doc_id and text fields
data(recensioni_tv)

# Annotate the dataframe directly
anno_df <- sentix_annotate(
  recensioni_tv,
  # loaded model
  model = model
)

head(anno_df)
  doc_id sentence_id token_id    token    lemma  upos     score
1   doc1           1        1   Ottimo   ottimo   ADJ 1.0000000
2   doc1           1        2 prodotto prodotto  NOUN 0.0000000
3   doc1           1        3        ,        , PUNCT        NA
4   doc1           1        4       la       il   DET        NA
5   doc1           1        5  qualità  qualità  NOUN 0.3631757
6   doc1           1      6-7    dell'     <NA>  <NA>        NA
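
The example above uses a dataframe that already has doc_id and text columns. If your data uses other names, one option that does not depend on the package's column-name arguments is to build a two-column input yourself. The sketch below assumes a hypothetical reviews dataframe; the names id, body and rating are illustrative only, and the annotation call is commented out because it needs the package and a loaded model.

```r
# Hypothetical data with non-standard column names and an extra column
reviews <- data.frame(
  id     = c("r1", "r2"),
  body   = c("Ottimo prodotto.", "Pessima esperienza."),
  rating = c(5, 1),
  stringsAsFactors = FALSE
)

# Keep only what the annotator needs, renamed to doc_id / text
input <- data.frame(
  doc_id = reviews$id,
  text   = reviews$body,
  stringsAsFactors = FALSE
)

# anno <- sentix_annotate(input, model = model)

# Other columns (here, rating) are ignored by the annotation step, so they
# would have to be joined back afterwards using the document identifier.
```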

With a different lexicon

By default, sentix_annotate() uses the Sentix lexicon. To use a different lexicon, specify it with the dict argument.

# Use ELIta lexicon with VAD scores
anno_vad <- sentix_annotate(recensioni_tv, model = model, dict = "elita_VAD")

Summarize

sentix_summarize() computes overall sentiment scores and auxiliary metrics per document (or other segments, via the argument by) from the annotated dataframe.

sentix_summarize(
  anno_df,
  # summarize by sentence
  by = c("doc_id", "sentence_id")
)
# A tibble: 7 × 5
  doc_id sentence_id   score n_tokens n_scored
  <chr>        <int>   <dbl>    <int>    <int>
1 doc1             1  0.274        12        9
2 doc2             1 -0.253         4        3
3 doc2             2 -0.0818       11        6
4 doc3             1  0.244        15        9
5 doc4             1  0.178         4        3
6 doc4             2  0.0965       12        9
7 doc5             1 -0.0187       15        9
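
To make the role of the by argument concrete, here is a conceptual base-R sketch of grouped summarization on toy data. It is not the package's implementation: summarize_by() is a hypothetical function, the scores are invented, and the aggregation (mean of the non-missing token scores in each group) is an assumption that this README does not confirm.

```r
# Toy annotated tokens (invented scores), shaped like sentix_annotate() output
anno <- data.frame(
  doc_id      = c("doc1", "doc1", "doc1", "doc2", "doc2"),
  sentence_id = c(1L, 1L, 2L, 1L, 1L),
  score       = c(1.0, NA, 0.5, -0.5, 0.0),
  stringsAsFactors = FALSE
)

# Hypothetical summarizer. Assumed aggregation: mean of non-missing scores.
summarize_by <- function(anno, by = "doc_id") {
  # One piece per observed combination of the grouping columns
  pieces <- lapply(split(anno, anno[by], drop = TRUE), function(g) {
    data.frame(
      g[1, by, drop = FALSE],
      score    = mean(g$score, na.rm = TRUE),
      n_tokens = nrow(g),
      n_scored = sum(!is.na(g$score)),
      row.names = NULL
    )
  })
  out <- do.call(rbind, unname(pieces))
  out <- out[do.call(order, unname(as.list(out[by]))), , drop = FALSE]
  rownames(out) <- NULL
  out
}

summarize_by(anno)                                   # one row per document
summarize_by(anno, by = c("doc_id", "sentence_id"))  # one row per sentence
```

Changing by only changes how tokens are grouped; the per-group computation stays the same.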

When using lexicons with multiple features (e.g., ELIta), all features are summarized.

sentix_summarize(anno_vad)
# A tibble: 5 × 6
  doc_id  valenza attivazione dominanza n_tokens n_scored
  <chr>     <dbl>       <dbl>     <dbl>    <int>    <int>
1 doc1    0.500       -0.0971    0.285        12        6
2 doc2    0.0414       0.387     0.0957       15        7
3 doc3    0.402        0.208     0.125        15        3
4 doc4    0.0714      -0.0296    0.0954       16        7
5 doc5   -0.00667      0.0554    0.0342       15        6

How to Cite

If you use sentixr in your research, please cite it as follows:

Vardanega, A., Basile, V., Vassallo, M., Gabrieli, G. & Di Palma, E. (2026). sentixr (Version 0.1.0). https://github.com/valeriobasile/sentixR

References

Basile, Valerio, and Malvina Nissim. 2013. “Sentiment Analysis on Italian Tweets.” In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 100–107. https://aclanthology.org/W13-1614/.
Basile, Valerio, Malvina Nissim, Cristina Bosco, Marco Vassallo, and Giuliano Gabrieli. 2025. “Sentix.” https://github.com/valeriobasile/sentix.
Di Palma, Eliana. 2024. “ELIta (Emotion Lexicon for Italian).” http://hdl.handle.net/20.500.11752/OPEN-1036.
Vassallo, Marco, Giuliano Gabrieli, Valerio Basile, and Cristina Bosco. 2019. “The Tenuousness of Lemmatization in Lexicon-Based Sentiment Analysis.” In Proceedings of the Sixth Italian Conference on Computational Linguistics, 2481:1–6. Ceur. https://iris.unito.it/bitstream/2318/1725233/1/paper74.pdf.
———. 2020. “Polarity Imbalance in Lexicon-Based Sentiment Analysis.” In Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-It 2020 : Bologna, Italy, March 1-3, 2021, edited by Felice Dell’Orletta, Johanna Monti, and Fabio Tamburini, 457–63. Collana Dell’associazione Italiana Di Linguistica Computazionale. Accademia University Press. https://doi.org/10.4000/books.aaccademia.8964.
Zanchetta, Eros, and Marco Baroni. 2005. “Morph-It! A Free Corpus-Based Morphological Resource for the Italian Language.” In Proceedings of Corpus Linguistics Conference Series 2005 (ISSN 1747-9398), 1:1–12. University of Birmingham.
