README


sentixr

R package for sentiment analysis of Italian texts. It includes four lexicons in tidy format (tibbles): Sentix, MAL, ELIta VAD, and ELIta basic emotions (based on Plutchik’s wheel of emotions).

Overview

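The package provides two main functions to perform sentiment analysis on Italian texts: sentix_annotate(), which parses texts with udpipe and annotates them with scores from a lexicon, and sentix_summarize(), which aggregates the annotations into sentiment scores per document.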

It also provides utility functions to manage the included lexicons, so that they can easily be used within other frameworks such as tidytext and quanteda.
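As a rough sketch of using the lexicons outside the package's own functions, the tibbles can be loaded and inspected like any other dataset. This assumes they are shipped as package datasets under the object names used in the lexicon table (sentix, MAL, elita_basic, elita_VAD). Their column names are not documented here, so check them with names() before joining them with tidytext or quanteda objects.

```r
library(sentixr)

# Assumption: the four lexicons are package datasets with these names
lexicon_names <- c("sentix", "MAL", "elita_basic", "elita_VAD")
data(list = lexicon_names, package = "sentixr", envir = environment())

# Collect them in a named list, then inspect size and columns
lexicons <- mget(lexicon_names, envir = environment())
sapply(lexicons, nrow)
lapply(lexicons, names)
```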

Install
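A minimal installation sketch follows. It assumes the package is available on CRAN under the name sentixr, and that the GitHub repository cited in How to Cite can be installed as an R package. Neither is confirmed in this README.

```r
# Assumption: sentixr is published on CRAN
install.packages("sentixr", repos = "https://cloud.r-project.org")

# Alternative (assumption): development version from the GitHub repository
# install.packages("remotes")
# remotes::install_github("valeriobasile/sentixR")
```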

Lexicons

Object	Lexicon	Entries	Features	Source and License
`sentix`	Sentix 3.1 (Basile and Nissim 2013; Basile et al. 2025)	68,190 lemmas	Scores (-1, 1), polypathy index	GitHub, Zenodo, CC BY-SA
`MAL`	MAL 3.1 (Vassallo et al. 2019, 2020)	295,032 inflected forms	Inherited from Sentix	Zenodo, CC BY-SA
`elita_basic`	ELIta (Di Palma 2024)	6,905 lemmas + emojis	Plutchik’s emotions (0, 1)	GitHub, CC0 1.0 Universal
`elita_VAD`	ELIta (Di Palma 2024)	6,905 lemmas + emojis	VAD (-4, +4)	GitHub, CC0 1.0 Universal

While the package is published under a GPL-3.0 license, lexicons are provided under separate terms (CC BY-SA and CC0 1.0 Universal).

Usage

Main workflow

The sentiment analysis workflow consists of two main steps: annotation (sentix_annotate) and summarization (sentix_summarize).

library(sentixr)

# Single text
# annotate with defaults
sentix_annotate("Oggi è una bella giornata. Esco a fare una passeggiata") |>
  # summarize by document (the default)
  sentix_summarize()

Annotate

sentix_annotate() parses texts with udpipe and then annotates them with sentiment scores from the selected lexicon. For large corpora, the number of cores can optionally be set with the parallel.cores argument, which is inherited from udpipe and passed to udpipe::udpipe().
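For example, the bundled reviews could be annotated on two cores as follows. This is a sketch. The core count is illustrative, and because no model is passed, the default Italian udpipe model is downloaded automatically (see Managing the udpipe model).

```r
library(sentixr)

# Example corpus shipped with the package (doc_id and text columns)
data(recensioni_tv)

# Parse and annotate on 2 cores; parallel.cores is forwarded to udpipe::udpipe()
anno_par <- sentix_annotate(recensioni_tv, parallel.cores = 2)

head(anno_par)
```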

Managing the udpipe model

The model argument accepts a custom udpipe model. If no model is given, the function automatically downloads the default Italian udpipe model. After the first run, the downloaded model file can be loaded with udpipe::udpipe_load_model() and reused across calls.

# Loading a pre-downloaded model
model <- udpipe::udpipe_load_model("italian-isdt-ud-2.5-191206.udpipe")
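If the model file is not yet on disk, one way to obtain it once and reuse it in later sessions is to download it with udpipe itself. This is a sketch. The models folder name is an arbitrary choice, and the sketch assumes the default Italian model is the udpipe ISDT model whose file name appears in the loading example above.

```r
library(udpipe)

# Folder where the model is kept between sessions (illustrative choice)
model_dir <- "models"
dir.create(model_dir, showWarnings = FALSE)

# Same file name as in the loading example; may differ for newer releases
model_file <- file.path(model_dir, "italian-isdt-ud-2.5-191206.udpipe")

if (!file.exists(model_file)) {
  # Download the Italian ISDT model only if it is not already present
  dl <- udpipe_download_model(language = "italian-isdt", model_dir = model_dir)
  if (dl$download_failed) stop(dl$download_message)
  model_file <- dl$file_model
}

# Load the model; it can then be passed to sentix_annotate(model = model)
model <- udpipe_load_model(file = model_file)
```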

With multiple texts and data frames

Like udpipe, the function accepts a single text, multiple texts (a character vector, a list, or a list of tokens), or a data frame with doc_id and text columns.

The function also lets the user specify custom column names and handles document identifiers, which are passed safely to udpipe::udpipe(). Note, however, that any other columns in the data frame are ignored.
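As a minimal sketch of the character-vector input, two short Italian texts can be annotated and summarized in one pass. The example texts are made up for illustration. It assumes that, as in udpipe, the names of the vector are used as document identifiers, and that the default model is downloaded automatically because none is passed.

```r
library(sentixr)

# Named character vector; names are expected to become doc_id values
texts <- c(
  rec1 = "Il film è stato bellissimo, attori bravissimi.",
  rec2 = "Trama noiosa e finale deludente."
)

anno_vec <- sentix_annotate(texts)

# Summarize by document (the default)
summary_vec <- sentix_summarize(anno_vec)
summary_vec
```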

# Example dataframe with doc_id and text fields
data(recensioni_tv)

# Annotate the data frame directly
anno_df <- sentix_annotate(
  recensioni_tv,
  # loaded model
  model = model
)

head(anno_df)

With a different lexicon

By default, sentix_annotate() uses the Sentix lexicon. To use a different lexicon, specify it with the dict argument.

# Use ELIta lexicon with VAD scores
anno_vad <- sentix_annotate(recensioni_tv, model = model, dict = "elita_VAD")

Summarize

sentix_summarize() computes overall sentiment scores and auxiliary metrics from the annotated data frame, per document by default or per other segments specified with the by argument.

When using lexicons with multiple features (e.g., ELIta), all features are summarized.
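A sketch of summarizing a multi-feature lexicon follows. It assumes dict also accepts "elita_basic", by analogy with "elita_VAD" above, and that the default model is downloaded because none is passed. The exact output columns depend on the lexicon's features.

```r
library(sentixr)
data(recensioni_tv)

# Annotate with the ELIta basic-emotion lexicon (assumed dict value)
anno_emo <- sentix_annotate(recensioni_tv, dict = "elita_basic")

# Summarize per document (the default); all emotion features are included
emo_doc <- sentix_summarize(anno_emo)
head(emo_doc)
```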

How to Cite

Vardanega, A., Basile, V., Vassallo, M., Gabrieli, G. & Di Palma, E. (2026). sentixr (Version 0.1.0) https://github.com/valeriobasile/sentixR

Authors

References

Basile, Valerio, and Malvina Nissim. 2013. “Sentiment Analysis on Italian Tweets.” In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 100–107. https://aclanthology.org/W13-1614/.

Basile, Valerio, Malvina Nissim, Cristina Bosco, Marco Vassallo, and Giuliano Gabrieli. 2025. “Sentix.” https://github.com/valeriobasile/sentix.

Di Palma, Eliana. 2024. “ELIta (Emotion Lexicon for Italian).” http://hdl.handle.net/20.500.11752/OPEN-1036.

Vassallo, Marco, Giuliano Gabrieli, Valerio Basile, and Cristina Bosco. 2019. “The Tenuousness of Lemmatization in Lexicon-Based Sentiment Analysis.” In Proceedings of the Sixth Italian Conference on Computational Linguistics, 2481:1–6. CEUR. https://iris.unito.it/bitstream/2318/1725233/1/paper74.pdf.

Vassallo, Marco, Giuliano Gabrieli, Valerio Basile, and Cristina Bosco. 2020. “Polarity Imbalance in Lexicon-Based Sentiment Analysis.” In Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020: Bologna, Italy, March 1-3, 2021, edited by Felice Dell’Orletta, Johanna Monti, and Fabio Tamburini, 457–63. Collana Dell’associazione Italiana Di Linguistica Computazionale. Accademia University Press. https://doi.org/10.4000/books.aaccademia.8964.

Zanchetta, Eros, and Marco Baroni. 2005. “Morph-It! A Free Corpus-Based Morphological Resource for the Italian Language.” In Proceedings of Corpus Linguistics Conference Series 2005 (ISSN 1747-9398), 1:1–12. University of Birmingham.
