
Getting Started with ragnar

library(ragnar)

Retrieval-Augmented Generation (RAG) is a practical technique for improving large language model (LLM) outputs by grounding them with external, trusted content. The ragnar package provides tools for building RAG workflows in R, with a focus on transparency and control at each step.

This guide walks through building a simple chat tool for Quarto documentation using ragnar. The code examples are simplified for clarity; for a full implementation, see https://github.com/t-kalinowski/quartohelp.

Why RAG? The Hallucination Problem

LLMs can produce remarkable outputs: fluent, confident, plausible responses to a wide range of prompts. But anyone who has spent time with ChatGPT or similar models has observed responses that are confident, plausible, and wrong.

When the generated output is wrong, we call that a hallucination, and hallucinations seem to be an inherent consequence of how LLMs work. LLMs operate on text sequences; they do not seem to possess a concept of “facts” and “truth” like humans do. They generate text with no awareness of whether it is true or false, guided only by similarity to patterns in the text sequences of their training data.

Put simply, in philosopher Harry Frankfurt’s sense of the word, the models generate “bullshit” [1]:

It is impossible for someone to lie unless he thinks he knows the truth. Producing bullshit requires no such conviction. A person who lies is thereby responding to the truth, and he is to that extent respectful of it. When an honest man speaks, he says only what he believes to be true; and for the liar, it is correspondingly indispensable that he considers his statements to be false. For the bullshitter, however, all these bets are off: he is neither on the side of the true nor on the side of the false. His eye is not on the facts at all, as the eyes of the honest man and of the liar are, except insofar as they may be pertinent to his interest in getting away with what he says. He does not care whether the things he says describe reality correctly. He just picks them out, or makes them up, to suit his purpose.

RAG addresses this by retrieving relevant excerpts from a corpus of trusted, vetted sources and asking the LLM to summarize, paraphrase, or answer the user’s question using only that material. This grounds the response in known content and reduces the risk of hallucination. RAG shifts the LLM’s job from open-ended generation to summarizing or quoting from retrieved material.

RAG reduces but does not eliminate hallucinations. For richer texts and tasks, LLMs may still miss nuance or overgeneralize. For this reason, it’s helpful if RAG-based tools present links back to the original material so users can check context and verify details.

Setting up RAG

At a high level, setting up RAG has two stages: preparing the knowledge store (a database of processed content), and establishing the workflow for retrieval and chat.

Creating the Store

First, create a store. The store holds your processed docs and embeddings. When you create the store, you select the embedding provider. This choice is fixed for the store, but you can always create a new store if you want to change it.

store_location <- "quarto.ragnar.duckdb"
store <- ragnar_store_create(
  store_location,
  embed = \(x) ragnar::embed_openai(x, model = "text-embedding-3-small")
)

To generate embeddings, you can use embed_openai(), call an open-source model via embed_ollama(), or supply your own embedding function.
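
For example, an Ollama-backed store might look like this (a hedged sketch: the model name is an assumption, and you would create this store instead of, not in addition to, the OpenAI one above):

store <- ragnar_store_create(
  store_location,
  # "nomic-embed-text" is an assumed example; use any embedding model you have pulled locally
  embed = \(x) ragnar::embed_ollama(x, model = "nomic-embed-text")
)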

Identify Documents for Processing

Gather a list of the documents you want to insert into the store. For local files, this can be as simple as a list.files() call on a directory of documents.
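
For example (a sketch; the directory and file extension are assumptions):

paths <- list.files("docs", pattern = "[.]qmd$", full.names = TRUE, recursive = TRUE)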

If you’re building a store from a website, you can use ragnar_find_links() to collect URLs.

paths <- ragnar_find_links("https://quarto.org/", depth = 3)

For some sites, it may be easier to clone and build the site locally, then reference the files from the local file system. You can also process the sitemap if one is available.

At the end of this step, you should have a character vector of file paths and URLs.

Convert Documents to Markdown

Convert each document to markdown. Markdown is preferred because it’s plain text, easy to inspect, keeps token counts low, and works well for both humans and LLMs.

For this step, ragnar provides read_as_markdown() and ragnar_read(), which accept a wide variety of formats (pdf, docx, pptx, html, zip files, epubs, etc.). In many cases the default conversion works well, but for specialized needs you can take a more tailored approach. See the help in ?read_as_markdown for guidance on alternatives if you’d like to improve on the default conversion. (But only begin optimizing once you have a basic app working.)

ragnar_read() does the same thing as read_as_markdown(), but instead of returning a string, it returns a dataframe that also includes origin and hash columns, and possibly other metadata.
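
As a quick sketch (the URL is only an example):

# read_as_markdown() returns a single markdown string
md <- read_as_markdown("https://quarto.org/docs/get-started/")

# ragnar_read() returns a dataframe with origin, hash, and text columns
doc <- ragnar_read("https://quarto.org/docs/get-started/")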

Chunk and Augment

Next, split the documents into smaller chunks. This is necessary because embedding models have context size limits, and because chunking allows you to return just the most relevant excerpts from a long document.

Chunking is delicate; ideally, each chunk should stand alone without relying on the context of the surrounding document. We aim to split the text at natural points, like headings or paragraphs, and to avoid splitting in the middle of a sentence or word.

Additionally, we can augment chunks with context that describes the chunk’s origin (such as URL, title, headings, and subheadings), both so the LLM can provide links back to the source and so the LLM and embedding models can better situate the chunk’s content.

To help with these tasks, use ragnar_read() and ragnar_chunk().

ragnar_read() can split a document using markdown headings, while also extracting the heading titles as columns in the dataframe. Use the frame_by_tags argument and select the document heading levels you want to segment by. If you’re starting from already converted markdown content, use markdown_frame() or markdown_segment() instead.
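
For instance, starting from markdown text you already have (a hedged sketch; the exact arguments of markdown_frame() are an assumption here):

md <- read_as_markdown("https://quarto.org/")
# One row per section, with heading columns such as h1, h2, h3 alongside the text
frames <- markdown_frame(md)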

ragnar_chunk() then chunks the text further, trying to split at semantic boundaries (like paragraphs or sentences) with progressively finer granularity as needed, using the boundary types and chunk size you specify. The default chunk size is 1600 characters, which is about one page.

To augment chunks, use glue string interpolation and include the metadata extracted by ragnar_read(). Since you’ll be doing this for every document, it’s helpful to wrap this step in a function.

read_and_chunk <- function(path) {
  path |>
    ragnar_read(frame_by_tags = c("h1", "h2", "h3")) |>
    ragnar_chunk(boundaries = c("paragraph", "sentence")) |>
    dplyr::mutate(
      text = glue::glue(
        r"---(
        > Excerpt from: {origin}
        > {h1}
        > {h2}
        > {h3}
        {text}
        )---"
      )
    )
}

Note that an alternative approach for augmenting chunks with context is to use an LLM with instructions to “situate this excerpt from this document,” or, worse, “summarize this document.” This can work but carries significant risk. Remember, the goal is to create a knowledge store: a trusted, factual, vetted source of truth. Giving an LLM an opportunity to corrupt this store with hallucinations may be necessary depending on your needs, but as an initial approximation, I recommend starting with an ingestion pipeline that gives hallucinations no opportunity to enter the store.

Insert into the Store

Take your augmented document chunks and insert them into the store by calling ragnar_store_insert(). This function will automatically generate embeddings using the embed function specified when the store was first created.

ragnar_store_insert(store, chunks)

Tying it Together

Repeat these steps for every document you want to insert into the store. Once you’re done processing the documents, call ragnar_store_build_index() to finalize the store and build the index.

for (path in paths) {
  chunks <- read_and_chunk(path)
  ragnar_store_insert(store, chunks)
}

ragnar_store_build_index(store)

Once the store index is built, the store is ready for retrieval.


Retrieval

To retrieve content from the store, call ragnar_retrieve(). This function combines two retrieval methods: vector similarity search (VSS), which finds chunks whose embeddings are semantically close to the query, and BM25 keyword search, which ranks chunks by term overlap with the query.

To limit the search to one method, use ragnar_retrieve_vss() or ragnar_retrieve_bm25().
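
In a new R session, you can connect to an existing store and query it directly (a sketch; ragnar_store_connect() and its read_only argument are assumptions here, and the query text is only an example):

store <- ragnar_store_connect(store_location, read_only = TRUE)
chunks <- ragnar_retrieve(store, "How do I enable code folding?", top_k = 5)
chunks$text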

You can register ragnar_retrieve() as an LLM tool. This is an effective technique for implementing RAG, as it allows the LLM to rephrase unclear questions, ask follow-up questions, or search more than once if needed. Register ragnar_retrieve() as a tool with ellmer::Chat using ragnar_register_tool_retrieve():

client <- ellmer::chat_openai()
ragnar_register_tool_retrieve(
  client, store, top_k = 10,
  description = "the quarto website"
)

Note that the registered tool is intentionally simple. It asks the LLM to provide one argument: the query string. LLM tool calls are, after all, just text completions like any other LLM output. We minimize the complexity of the tool interface to minimize opportunities for the LLM to make errors.

Rather than exposing detailed search options to the LLM, we can instead set a high top_k value to return more chunks than usually necessary. This provides some slack in the chat app, so we can gracefully handle less-than-perfectly-ranked search results.

Customizing Retrieval

For more context-specific tasks, you may want to define your own retrieval tool and pair it with a system prompt that explains how to use the results.

For example, suppose you want the LLM to perform repeated searches if the first search does not return relevant information, and you also want to ensure repeated searches do not return previously seen chunks. Here’s an example of how you might do this.

First, set up the system prompt:

client <- ellmer::chat_openai(model = "gpt-4.1")
client$set_system_prompt(glue::trim(
  "
  You are an expert in Quarto documentation. You are concise.
  Always perform a search of the Quarto knowledge store for each user request.
  If the initial search does not return relevant documents, you may perform
  up to three additional searches. Each search will return unique, new excerpts.
  If no relevant results are found, inform the user and do not attempt to answer the question.
  If the user request is ambiguous, perform at least one search first, then ask a clarifying question.

  Every response must cite links to official documentation sources.
  Always include a minimal, fully self-contained Quarto document in your answer.
  "
))

Next, define a custom tool:

rag_retrieve_quarto_excerpts <- local({
  retrieved_chunk_ids <- integer()
  function(text) {
    # Search, excluding previously seen chunks
    chunks <- dplyr::tbl(store) |>
      dplyr::filter(!.data$id %in% retrieved_chunk_ids) |>
      ragnar::ragnar_retrieve(text, top_k = 10)

    # Update seen chunks
    retrieved_chunk_ids <<- unique(c(retrieved_chunk_ids, chunks$id))

    # Return formatted excerpts delimited with pseudo-xml tags.
    stringi::stri_c(
      "<excerpt>",
      chunks$text,
      "</excerpt>",
      sep = "\n",
      collapse = "\n"
    )
  }
})

Register the custom tool:

client$register_tool(ellmer::tool(
  rag_retrieve_quarto_excerpts,
  glue::trim(
    "
    Use this tool to retrieve the most relevant excerpts from the Quarto
    knowledge store for a given text input. This function:
    - uses both vector (semantic) similarity and BM25 text search,
    - never returns the same excerpt twice in the same session,
    - returns results as plain text wrapped in <excerpt> tags.
    "
  ),
  text = ellmer::type_string()
))

Troubleshooting and Debugging

Developing a RAG app is an iterative process. There are many places where you can spend effort on improvements, from document conversion and chunking to retrieval settings and the system prompt.

It’s helpful to iterate in the context of an end-to-end application.

You can use ragnar_store_inspect() to interactively see what kinds of results are returned by the store for different queries. This helps confirm that chunking and augmentation preserve semantic meaning and that the embedding model is working as expected.
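
For example (assuming the store object created earlier):

# Launches an interactive browser for trying different queries against the store
ragnar_store_inspect(store)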

If the results shown in the inspector do not seem useful or relevant to you, they likely won’t be useful to an LLM either. Iterate on the store creation pipeline until retrieval returns meaningful excerpts.

Some things you can try: adjusting the chunk size and boundaries, revising the augmentation template, switching embedding models, or increasing top_k to return more excerpts.

Chat interfaces and LLM marketing invite us to think of LLMs as general-purpose agents, able to answer anything. In practice, however, as of 2025, building a reliable, accurate, LLM-powered solution where details and facts matter means carefully scoping what the model is responsible for.

With that in mind, note that the chat app described here does not intend to replace documentation or act as a general-purpose assistant. Its goal is to provide a faster, more contextual way to find the right place in the docs, with enough information for the user to decide if they need to read further. It’s designed to allow users to naturally escalate: if the LLM is not able to provide a useful answer, the user can follow the provided links and transition to reading the source material without friction.

Cost Management

Using LLMs and embeddings incurs costs, regardless of whether you use a commercial provider or an open-source model on your own hardware. The main costs are embedding the corpus at ingestion time, embedding each query at retrieval time, and the chat completions themselves, so it helps to iterate on a small subset of documents before processing the full corpus.

Summary

ragnar provides a practical, transparent way to build RAG workflows in R. By combining semantic and keyword search, clear chunking and augmentation, and focused prompt and tool design, you can create fast, interactive documentation chat tools that help users find answers quickly and reliably.

Building a good RAG system is iterative. Inspect intermediate outputs, tune chunking and retrieval, and keep the user’s workflow in mind. With these guardrails, you can reduce hallucinations and deliver trustworthy, grounded answers, while also giving users a path to the original source.

For more details and a full example, see the quartohelp package.


[1] Harry Frankfurt, On Bullshit. https://press.princeton.edu/books/hardcover/9780691122946/on-bullshit
