The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

Extended Installation Guide

No one knows everything — and you don’t need to! Many of our users are not computer scientists, so some of the installation steps and technical terms may feel unfamiliar at first. That’s completely normal.

This guide is written to walk you through the process step by step, with explanations along the way. And remember: search engines (like DuckDuckGo) and chatbots (like ChatGPT) can be helpful allies if you ever get stuck or encounter a new concept.

If you run into issues that aren’t covered here, please reach out to us at: rtext.contact@gmail.com, so that we can improve the instructions for everyone.

To get started, follow these steps:

1) Install R and RStudio
Use the links below to download and install:
- R (version 4.0.0 or higher)
- RStudio (recommended)

2) Install and set up the text package
The text package gives you access to HuggingFace Transformers through reticulate (enabling advanced language analysis), which connects R to Python (while remaining in an R environment). It uses Python packages like torch and transformers.

To make this easy, run:
- textrpp_install() – installs a ready-to-use Python/Conda environment
- textrpp_initialize() – activates the environment for use in R

This setup will handle most Python and system dependencies automatically – however, you may be isntructed to install system level dependencis as further described below.

Set up a python environment for the text-package

library(text)
library(reticulate)

# Install text-required Python packages in a Conda environment
text::textrpp_install()

# Show available Conda environments
reticulate::conda_list()

# Initialize the installed Conda environment
text::textrpp_initialize(save_profile = TRUE)

# Test that textEmbed works
textEmbed("hello")

System-level dependencies

Different operating systems require different system-level dependencies, especially when working with Python packages that rely on compiled extensions or specialized libraries. Below is a summary of what’s commonly required and how to install it.

Windows

Microsoft C++ Build Tools

Some Python packages (e.g., hdbscan, flair) require compilation with Visual C++. Windows users may need to manually install Microsoft C++ Build Tools. You must have C++ build tools for Python packages that need compilation.

Install the latest version of Microsoft C++ Build Tools:
1. Download and run the installer from https://visualstudio.microsoft.com/visual-cpp-build-tools/.
2. During installation, check:
- “Desktop development with C++” or “C++ build tools”.
- Ensure “Windows 11 SDK” is also selected on the right menu.
3. Complete installation and restart your computer.

Terms of Service conflict – Annaconda forge channel

Python environments used by the text and talk packages rely on packages from the conda-forge channel.

You don’t need to install Anaconda manually – the textrpp_install() function will install Miniconda and set conda-forge as the default channel.

However, if you’ve previously installed Anaconda or changed channels, you might encounter ToS conflicts (Terms of Service).

If errors mention pkgs/main or tos accept, you can fix it by running the following in Terminal/Command Prompt (not R):

# Run this in your Terminal (not in R)
conda config --add channels conda-forge
conda config --set channel_priority strict
conda config --remove channels defaults

MacOS

On macOS, most system-level dependencies are typically pre-installed. For any missing components, the text package automatically detects them and provides clear instructions to guide the user through installation.

1 Homebrew & libomp

Install Homebrew (if not already installed):

# Run this in your Terminal (not in R)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

2. Install libomp:

# Run this in your Terminal (not in R)
brew install libomp

Ubuntu

On recent Ubuntu distributions (e.g., 22.04+), most core dependencies are available.

If needed, you can install them with:

# Run this in your Terminal (not in R)
sudo apt update
sudo apt install build-essential libomp-dev

build-essential: provides gcc, g++, and make
libomp-dev: for OpenMP support

General troubleshooting

1. Check if you have install permissions

Can you install an R package like dplyr?

install.packages("dplyr")

Can you install system-level tools like Python/miniconda?

library(reticulate)
reticulate::install_miniconda()

If you do not have premissions, please contact your administrator for advice.

2. Remembered to initialize the python environment.

After restarting R, functions like textEmbed() can stop working again.

Solution: Persist the initialization in your R profile:

text::textrpp_initialize(
  condaenv = "textrpp_condaenv",
  refresh_settings = TRUE,
  save_profile = TRUE,
) 

3. Install the development version from GitHub.

# install.packages("devtools")
devtools::install_github("oscarkjell/text")

4. Force reinstallation of the environment

library(text)
text::textrpp_install(
  update_conda = TRUE,
  force_conda = TRUE
)

5. Install the python environment using reticulate

See the article Installing and Managing Python Environments with reticulate, for detailed information.

6. Inspect diagnostic information

If something isn’t working right, it is a good start to examine what is installed and running on your system.

library(text)
log <- text::textDiagnostics()
log 

Because the text package requires some system-level setup, installation is automatically verified on Windows, macOS, and Ubuntu through our GitHub Actions. If you encounter any issues, please review the tests and check the workflow file for details on system-specific installations.

To view the workflow file, select the three-dot menu on the right side of any GitHub Action run and choose View workflow file. This file specifies the operating systems, R versions, and additional libraries being tested.

Virtual environments

It is also possible to use virtual environments (although it is currently only tested on MacOS).

# Create a virtual environment with text required python packages.
# Note that you have to provide a python path.
text::textrpp_install_virtualenv(rpp_version = c("torch==1.7.1", "transformers==4.12.5", "numpy", "nltk"),
                                 python_path = "/usr/local/bin/python3.9",
                                 envname = "textrpp_virtualenv")

# Initialize the virtual environment.
text::textrpp_initialize(virtualenv = "textrpp_virtualenv",
                         condaenv = NULL,
                         save_profile = TRUE)

Solving OMP errors and R/Rstudio crashes

Some macOS users may experience a crash when running functions like textEmbed() from the text package. This is due to a known conflict between multiple OpenMP libraries (e.g., libomp.dylib and libiomp5.dylib) used by Python packages such as torch and transformers.

Workaround (Automatically Applied) To prevent this crash, the package sets the following environment variables when running on macOS:

Sys.setenv(OMP_NUM_THREADS = "1")
Sys.setenv(OMP_MAX_ACTIVE_LEVELS = "1")
Sys.setenv(KMP_DUPLICATE_LIB_OK = "TRUE")

These settings: - Limit the number of OpenMP threads - Avoid nested threading issues - Instruct macOS to ignore duplicate OpenMP libraries

This workaround is safe for most users and enables smooth functionality, but note that: - It may slightly reduce parallel processing performance. - It bypasses a system-level issue rather than solving it permanently.

The text package sets OpenMP-related environment variables for compatibility with PyTorch to avoid crashes due to libomp.dylib conflicts. You can skip this behavior by setting:

Sys.setenv(TEXT_SKIP_OMP_PATCH = "TRUE")
library(text)

The exact way to install these packages may differ across systems. Please see:
Python
torch
transformers

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.