---
title: "Auditing an R package you have just received"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Auditing an R package you have just received}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

```{r setup}
library(checkhelper)
```

This vignette is the canonical end-to-end walkthrough: a colleague hands
you an R package and asks "is this CRAN-ready?". The goal is to surface
every CRAN-blocking issue that checkhelper can detect with the
**smallest possible number of `R CMD check` runs**, then apply the safe
automatic fixes.

For audits that have their own pipeline (full CRAN environment,
file-system snapshots), see the companion vignette
`vignette("pre-submission-gates", package = "checkhelper")`.

## TL;DR - the audit script

```{r}
pkg <- "/path/to/the/package"

# 1. Run R CMD check ONCE and reuse it everywhere it's needed.
chk <- rcmdcheck::rcmdcheck(pkg, args = "--as-cran")

# 2. Static audits (no extra check needed).
audit_tags(pkg)            # exported funs without @return / internals without @noRd
audit_ascii(pkg)           # non-ASCII characters in R/, tests/, vignettes/, man/, DESCRIPTION, NAMESPACE
audit_dataset_doc(pkg)     # datasets in data/ without a roxygen block
audit_citation(pkg)        # old-style personList() / citEntry() in inst/CITATION
audit_dontrun(pkg)         # \dontrun{} blocks in man/*.Rd
audit_description(pkg)     # unquoted package names in DESCRIPTION's Description field
audit_downloads(pkg)       # network / download call sites to review for offline-safe guards

# 3. Audits that need the check output - pass `chk` to skip a 2nd run.
audit_globals(pkg, checks = chk)

# 4. Apply the safe fixes.
fix_globals(pkg, checks = chk, write = TRUE)

# Preview before applying: fix_ascii() returns invisibly, so capture
# it to see which files would change.
preview <- fix_ascii(pkg, dry_run = TRUE)
preview[preview$changed, ]
fix_ascii(pkg, dry_run = FALSE)        # then apply

fix_dataset_doc("my_data", pkg = pkg,
                description = "Description of my_data",
                source = "Internal")        # one call per undocumented dataset
```

## Why share the check object?

`audit_globals()` and `fix_globals()` parse the `notes` field of an
`rcmdcheck::rcmdcheck()` result to extract the
`no visible binding for global variable` and
`no visible global function definition` notes. By default each call
runs its own check, which is slow on a real package.

Both functions accept a `checks =` argument. When supplied, they skip
the `rcmdcheck()` call and parse the existing object. This lets you
run the check **once** and reuse the result for the whole audit.

```{r}
chk <- rcmdcheck::rcmdcheck(pkg, args = "--as-cran")

audit_globals(pkg, checks = chk)
fix_globals(pkg,   checks = chk, write = TRUE)
```
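
To show what "parse the `notes` field" involves, here is a minimal base-R sketch that pulls undeclared names out of `no visible binding for global variable` lines. `extract_undefined_globals()` is a hypothetical helper written for this vignette, not checkhelper's implementation. It ignores the `no visible global function definition` notes that `audit_globals()` also handles, and the mock `notes` string stands in for a real check result.

```{r}
# Hypothetical helper (not part of checkhelper): extract the names reported in
# "no visible binding for global variable" lines of rcmdcheck-style notes.
extract_undefined_globals <- function(notes) {
  # Each NOTE is one multi-line string; split it into individual lines.
  lines <- unlist(strsplit(notes, "\n", fixed = TRUE))
  # Names may be wrapped in curly quotes (UTF-8 locales) or in straight
  # quotes, so accept both.
  pattern <- "^.*no visible binding for global variable [\u2018'](.+)[\u2019']\\s*$"
  hits <- grep(pattern, lines, value = TRUE)
  sort(unique(sub(pattern, "\\1", hits)))
}

notes <- paste(
  "checking R code for possible problems ... NOTE",
  "summarise_sales: no visible binding for global variable 'amount'",
  "summarise_sales: no visible binding for global variable 'region'",
  "plot_sales: no visible binding for global variable 'amount'",
  sep = "\n"
)
extract_undefined_globals(notes)
#> [1] "amount" "region"
```

With a real check object the call would be `extract_undefined_globals(chk$notes)`.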

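Only `audit_globals()` can reuse the check object. The static audits
do not need a check at all, and the two pre-submission gates run their
own: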

| Audit                | Needs `R CMD check`?  | Notes                                  |
|----------------------|-----------------------|----------------------------------------|
| `audit_tags()`       | no                    | static via roxygen2                    |
| `audit_ascii()`      | no                    | line-by-line via `stringi::stri_enc_isascii()` |
| `audit_dataset_doc()`| no                    | inspects `data/` and `R/`              |
| `audit_citation()`   | no                    | static parse of `inst/CITATION`        |
| `audit_description()`| no                    | tokenises DESCRIPTION's Description    |
| `audit_dontrun()`    | no                    | line-by-line scan of `man/*.Rd`        |
| `audit_downloads()`  | no                    | AST walk of `R/`, `tests/`, `vignettes/`, `inst/` |
| `audit_globals()`    | **yes** (reusable)    | accepts `checks =`                     |
| `audit_userspace()`  | yes (own pipeline)    | file-system snapshots, cannot reuse `chk` |
| `audit_check()`      | yes (own pipeline)    | is itself the check, run with the CRAN env |

## Per-issue cheatsheet

### Globals (`no visible binding`)

`audit_globals()` returns a 3-element list of the names that
`R CMD check` flagged:

- `globalVariables` - undeclared variables that need a
  `utils::globalVariables()` declaration.
- `functions` - external functions that need an `@importFrom` line.
- `operators` - NSE tokens and data.table / rlang pronouns (`:=`,
  `.SD`, `.N`, `.data`, `!!`, ...) that should be imported with
  `@importFrom` rather than declared in `globalVariables()`.

`fix_globals(write = TRUE)` writes the `globalVariables` set into
`R/globals.R`, merging it with the names that file already declares:
the newly detected names are added to the existing ones and
duplicates are dropped. The operators section is printed to the
console so that you can wire each one into a roxygen `@importFrom`
tag by hand:

```{r}
audit_globals(pkg, checks = chk)
fix_globals(pkg, checks = chk, write = TRUE)
```

When a token is exported by more than one candidate package
(e.g. `:=` is exported by both data.table and rlang), every
candidate is listed and you pick one consciously - no silent
guessing.

Without `write = TRUE`, `fix_globals()` only prints both blocks,
ready to copy and paste.
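
The merge-and-deduplicate step described above can be sketched in base R. `read_declared_globals()` and `write_globals_file()` are hypothetical helpers, and the one-name-per-line layout they write is an assumption made for illustration; the file that `fix_globals()` actually produces may look different. Existing declarations are read statically with `parse()`, without sourcing the file.

```{r}
# Hypothetical helper: collect the names declared by globalVariables() calls
# in an existing R/globals.R, without sourcing the file.
read_declared_globals <- function(path) {
  if (!file.exists(path)) return(character(0))
  declared <- character(0)
  for (expr in parse(path, keep.source = FALSE)) {
    if (is.call(expr) &&
        deparse(expr[[1]]) %in% c("globalVariables", "utils::globalVariables")) {
      # The first argument is assumed to be a literal c("a", "b", ...).
      declared <- c(declared, eval(expr[[2]], baseenv()))
    }
  }
  declared
}

# Hypothetical helper: add new names on top of the existing ones, drop
# duplicates, and rewrite the file as a single globalVariables() call.
write_globals_file <- function(path, new_names) {
  all_names <- sort(unique(c(read_declared_globals(path), new_names)))
  body <- paste0('  "', all_names, '"', collapse = ",\n")
  writeLines(c("utils::globalVariables(c(", body, "))"), path)
  invisible(all_names)
}

globals_path <- tempfile(fileext = ".R")
writeLines('utils::globalVariables(c("region"))', globals_path)
write_globals_file(globals_path, c("amount", "region"))
read_declared_globals(globals_path)
#> [1] "amount" "region"
```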

### Missing roxygen tags

`audit_tags()` flags exported functions without `@return` and
documented internals without `@noRd`. Read-only - no automatic fix
because adding accurate `@return` text needs a human:

```{r}
audit_tags(pkg)
```
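
For intuition, the `@return` half of this audit comes down to a check on each roxygen block. The sketch below is a simplified, line-based approximation, not `audit_tags()`, which goes through roxygen2: the hypothetical `find_export_without_return()` groups consecutive `#'` lines into blocks and flags those that contain `@export` but neither `@return` nor `@returns`.

```{r}
# Hypothetical helper (line-based approximation, not audit_tags()): return the
# first line of each roxygen block that has @export but no @return/@returns.
find_export_without_return <- function(lines) {
  is_rox <- grepl("^\\s*#'", lines)
  # rle() turns the TRUE/FALSE vector into runs; TRUE runs are roxygen blocks.
  runs <- rle(is_rox)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1L
  flagged <- integer(0)
  for (i in which(runs$values)) {
    block <- lines[starts[i]:ends[i]]
    exported <- any(grepl("@export(\\s|$)", block))
    has_return <- any(grepl("@returns?(\\s|$)", block))
    if (exported && !has_return) flagged <- c(flagged, starts[i])
  }
  flagged
}

src <- c(
  "#' Add one",
  "#' @param x A number.",
  "#' @export",
  "add_one <- function(x) x + 1",
  "",
  "#' Double",
  "#' @param x A number.",
  "#' @return `x` times two.",
  "#' @export",
  "double_it <- function(x) x * 2"
)
find_export_without_return(src)
#> [1] 1
```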

### Non-ASCII characters

`audit_ascii()` walks `R/`, `tests/`, `vignettes/`, `man/`,
`DESCRIPTION` and `NAMESPACE` line-by-line and reports every line
containing non-ASCII characters (columns: `file`, `line`, `text`,
`n_tokens`).
`fix_ascii()` then rewrites them. It uses the parser AST so that each
token is rewritten according to its context: string literals become
`\uXXXX` escapes, while comments and roxygen text get `Latin-ASCII`
transliteration. **It dry-runs by default**:

```{r}
audit_ascii(pkg)

# Always preview which files would change. fix_ascii() returns
# invisibly - capture the result to inspect per-file detail
# (path, changed, n_tokens, n_chars).
preview <- fix_ascii(pkg, dry_run = TRUE)
preview[preview$changed, ]

# Apply when you've reviewed the proposed rewrite.
fix_ascii(pkg, dry_run = FALSE)
```

Identifiers with non-ASCII characters are refused by default
(renaming would be a breaking change).
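
The two building blocks of the ASCII audit can be approximated in base R. `has_non_ascii()` and `escape_non_ascii()` are hypothetical helpers: the first flags lines that cannot be converted to ASCII, the second turns each non-ASCII character of a string into an R `\uXXXX` (or `\UXXXXXXXX`) escape. Unlike `fix_ascii()`, this sketch ignores the parser AST, so it would escape characters in comments too, where `fix_ascii()` transliterates instead.

```{r}
# Hypothetical helper: TRUE for each line containing a non-ASCII character.
# iconv() returns NA when a string cannot be converted to ASCII.
has_non_ascii <- function(x) {
  is.na(iconv(x, from = "UTF-8", to = "ASCII"))
}

# Hypothetical helper: replace each non-ASCII character with an R escape,
# \uXXXX up to U+FFFF and \UXXXXXXXX above it.
escape_non_ascii <- function(x) {
  vapply(x, function(s) {
    code_points <- utf8ToInt(enc2utf8(s))
    chars <- intToUtf8(code_points, multiple = TRUE)
    wide <- code_points > 0x7F
    chars[wide] <- ifelse(code_points[wide] > 0xFFFF,
                          sprintf("\\U%08X", code_points[wide]),
                          sprintf("\\u%04X", code_points[wide]))
    paste(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

src_lines <- c('msg <- "Caf\u00e9 ouvert"', "x <- 1")
has_non_ascii(src_lines)
#> [1]  TRUE FALSE
escape_non_ascii("Caf\u00e9")
#> [1] "Caf\\u00E9"
```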

### Undocumented datasets

`audit_dataset_doc()` lists every `data/*.rda` without a matching
roxygen block under `R/`. `fix_dataset_doc()` writes a documentation
skeleton; call it once per dataset, passing the dataset name first:

```{r}
audit_dataset_doc(pkg)

fix_dataset_doc("my_data",
                pkg = pkg,
                description = "Description of my_data",
                source = "Internal")
```

The skeleton is meant to be edited: refine the description and
source, add the column-by-column entries by hand, then re-run
`devtools::document()`.
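
The detection half of `audit_dataset_doc()` can be pictured with the sketch below. `find_undocumented_datasets()` is a hypothetical helper that makes two simplifying assumptions: each `data/*.rda` file holds one object named after the file, and a dataset counts as documented when some line under `R/` consists solely of its quoted name, the usual roxygen2 convention of placing the block above a bare string such as `"my_data"`.

```{r}
# Hypothetical helper, assuming one object per data/*.rda named after the file.
find_undocumented_datasets <- function(pkg) {
  rda_files <- list.files(file.path(pkg, "data"), pattern = "\\.(rda|RData)$")
  datasets <- tools::file_path_sans_ext(rda_files)
  r_files <- list.files(file.path(pkg, "R"), pattern = "\\.[Rr]$",
                        full.names = TRUE)
  code <- trimws(unlist(lapply(r_files, readLines, warn = FALSE)))
  # roxygen2 documents a dataset through a block placed above its quoted name.
  documented <- vapply(datasets, function(d) any(code == paste0('"', d, '"')),
                       logical(1))
  datasets[!documented]
}

tmp_pkg <- tempfile("datasets_demo")
dir.create(file.path(tmp_pkg, "data"), recursive = TRUE)
dir.create(file.path(tmp_pkg, "R"))
sales <- data.frame(amount = 1:3)
regions <- data.frame(region = c("N", "S"))
save(sales, file = file.path(tmp_pkg, "data", "sales.rda"))
save(regions, file = file.path(tmp_pkg, "data", "regions.rda"))
writeLines(c("#' Sales figures", "#' @format A data frame.", '"sales"'),
           file.path(tmp_pkg, "R", "data.R"))
find_undocumented_datasets(tmp_pkg)
#> [1] "regions"
```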

### Old-style `inst/CITATION`

`audit_citation()` parses `inst/CITATION` statically (no `eval()`)
and surfaces every call to `personList()`, `as.personList()` or
`citEntry()` that CRAN rejects on submission with
`Package CITATION file contains call(s) to old-style ...`. It
returns a tibble with `call`, `line` and a one-line `suggestion`
for the modern equivalent (`c()` on `person()` objects;
`bibentry()` instead of `citEntry()`):

```{r}
audit_citation(pkg)
```

Read-only - rewriting a CITATION file usually needs editorial
judgment, so there is no automated `fix_citation()`.
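
For reference, this is what the modern equivalent of an old-style entry looks like: `bibentry()` replaces `citEntry()`, and a plain `c()` of `person()` objects replaces `personList()`. The title, authors and version are placeholders. In `inst/CITATION` itself the `utils::` prefix is usually omitted; it is kept here so that the snippet also runs on its own.

```{r}
# Old style, flagged by audit_citation():
#   citEntry(entry = "Manual",
#            title = "demo: A Demonstration Package",
#            author = personList(as.person("Jane Doe"), as.person("John Roe")),
#            year = "2024")

# Modern equivalent: bibentry() with c() on person() objects.
# All field values are placeholders.
cit <- utils::bibentry(
  bibtype = "Manual",
  title   = "demo: A Demonstration Package",
  author  = c(utils::person("Jane", "Doe"), utils::person("John", "Roe")),
  year    = "2024",
  note    = "R package version 0.1.0"
)
cit
```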

### Unquoted package names in `Description`

The CRAN incoming pretest emits
`Package names should be quoted in the Description field` when a
package name (or any software name) appears in the `Description`
field of `DESCRIPTION` without surrounding single quotes.

`audit_description()` reads the `Description` field, tokenises it,
and surfaces every word that matches an installed package name yet
is not wrapped in single quotes. The package's own name is
intentionally skipped, and so are compound forms like
`dplyr-style` or `httr2-based` (a hyphen on either side disqualifies
the token from being a standalone package reference). It returns a
tibble with `word`, `position` and `suggestion`:

```{r}
audit_description(pkg)
```

Read-only - the fix is editorial (decide whether each hit is a
real package reference or a coincidental word, then wrap with
single quotes).
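
The tokenising rules described above (skip the package's own name, skip words wrapped in single quotes, skip hyphenated compounds) can be sketched in base R. `find_unquoted_pkgs()` is hypothetical and takes the candidate package names as an argument so that the example stays self-contained; against a real installation they could come from `rownames(installed.packages())`.

```{r}
# Hypothetical helper: words of `description` that are package names but are
# neither wrapped in single quotes nor part of a hyphenated compound.
find_unquoted_pkgs <- function(description, pkg_names, self = NULL) {
  # Candidate words: a letter, then letters, digits or dots, ending on a
  # letter or digit (the shape of a package name).
  m <- gregexpr("[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]", description)
  starts <- m[[1]]
  if (starts[1] == -1L) return(character(0))
  words <- regmatches(description, m)[[1]]
  ends <- starts + attr(starts, "match.length") - 1L
  chars <- strsplit(description, "")[[1]]
  # Character immediately before / after each word ("" at the string edges).
  before <- ifelse(starts > 1L, chars[pmax(starts - 1L, 1L)], "")
  after <- ifelse(ends < length(chars), chars[pmin(ends + 1L, length(chars))], "")
  quoted <- before == "'" & after == "'"
  hyphenated <- before == "-" | after == "-"
  is_pkg <- words %in% setdiff(pkg_names, self)
  unique(words[is_pkg & !quoted & !hyphenated])
}

desc <- paste("Helpers built on 'dplyr' and ggplot2, with httr2-based",
              "downloads and a dplyr-style interface.")
find_unquoted_pkgs(desc, pkg_names = c("dplyr", "ggplot2", "httr2", "demo"),
                   self = "demo")
#> [1] "ggplot2"
```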

### Network / download calls

CRAN policy: package code that downloads files or otherwise uses the
network, at install time or at runtime, must fail gracefully when the
network is unavailable (offline build farms, sandboxed CI, locked-down
user environments). Common rejection causes are downloads triggered
from `.onLoad()` or `.onAttach()`, and vignettes, examples or tests
that reach the network without a `tryCatch()` / `skip_if_offline()` /
`\dontrun{}` guard.

`audit_downloads()` walks `R/`, `tests/`, `vignettes/` and `inst/`,
parses each file, and surfaces every call to a known download or
HTTP function: `download.file()`, `httr::GET()`, `httr2::req_perform()`,
`curl::curl_download()`, etc. The call site (file + line) is paired
with a one-line suggestion. Detection is purely static and only call
sites are flagged, so a user-defined function that shadows a
downloader name (`download.file <- function(...) { ... }`) is not
reported at its definition site:

```{r}
audit_downloads(pkg)
```

Read-only - the fix is editorial: decide for each call whether the
right CRAN-safe pattern is `tryCatch()` (continue on offline),
`testthat::skip_if_offline()` (skip the test), or `\dontrun{}` (drop
the example from the test surface).
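
The static detection step can be sketched with the parse data that base R already exposes. `find_download_calls()` is a hypothetical helper with a deliberately short list of downloader names (`audit_downloads()` covers more). Because it only keeps `SYMBOL_FUNCTION_CALL` tokens, the shadowing definition on the first line of the example is ignored and only the two real call sites are reported.

```{r}
# Hypothetical helper: list calls to a few known downloaders in R source text.
find_download_calls <- function(text) {
  parsed <- parse(text = text, keep.source = TRUE)
  pd <- utils::getParseData(parsed)
  downloaders <- c("download.file", "curl_download", "GET", "POST",
                   "req_perform")
  # SYMBOL_FUNCTION_CALL marks a name used as f(...), with or without pkg::.
  hits <- pd[pd$token == "SYMBOL_FUNCTION_CALL" & pd$text %in% downloaders, ]
  hits <- hits[order(hits$line1, hits$col1), ]
  data.frame(call = hits$text, line = hits$line1, stringsAsFactors = FALSE)
}

code <- c(
  'download.file <- function(...) stop("shadowed")',
  'fetch <- function(url) {',
  '  tryCatch(utils::download.file(url, tempfile()),',
  '           error = function(e) NULL)',
  '}',
  'resp <- httr::GET("https://example.org")'
)
find_download_calls(code)
```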

### `\dontrun{}` blocks in examples

CRAN policy is that `\dontrun{}` should only wrap example code
that genuinely cannot be executed (missing API key, missing system
dependency, side effect on the user's filespace). Otherwise prefer
`\donttest{}`: plain `R CMD check` skips it unless `--run-donttest`
is given, but since R 4.0.0 `R CMD check --as-cran` runs it, so the
example is still exercised before submission.

`audit_dontrun()` walks `man/*.Rd` line-by-line and surfaces every
`\dontrun{}` opener (commented-out `% \dontrun{` mentions are
ignored), with the source Rd file, the documented topic, the line
number and a one-line suggestion. Read-only - the output is your
review checklist:

```{r}
audit_dontrun(pkg)
```
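
A line-based scan of this kind can be sketched in a few lines of base R. `find_dontrun_lines()` is hypothetical: it strips Rd comments (everything from an unescaped `%` to the end of the line) before looking for `\dontrun{`, which is why the commented-out mention on line 2 of the example is not reported. A sturdier variant could walk the tree returned by `tools::parse_Rd()` instead.

```{r}
# Hypothetical helper: line numbers of \dontrun{ openers in Rd source lines.
find_dontrun_lines <- function(rd_lines) {
  # In Rd, an unescaped % starts a comment that runs to the end of the line.
  code <- sub("(^|[^\\\\])%.*$", "\\1", rd_lines)
  which(grepl("\\dontrun{", code, fixed = TRUE))
}

rd <- c(
  "\\examples{",
  "% \\dontrun{ an old note, commented out",
  "\\dontrun{",
  "fetch_from_api(key = Sys.getenv(\"API_KEY\"))",
  "}",
  "\\donttest{",
  "fit_slow_model()",
  "}",
  "}"
)
find_dontrun_lines(rd)
#> [1] 3
```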

## Minimal end-to-end on a fake package

`create_example_pkg()` builds a fake package that deliberately trips
each audit. The two `with_*` flags below activate the non-ASCII and
undocumented-dataset fixtures so every audit has something to
surface:

```{r}
pkg <- create_example_pkg(with_nonascii = TRUE,
                          with_undocumented_data = TRUE)

chk <- rcmdcheck::rcmdcheck(pkg, args = "--as-cran")

audit_tags(pkg)              # @return / @noRd issues
audit_ascii(pkg)             # accents in comments / strings
audit_dataset_doc(pkg)       # data/demo_dataset.rda has no doc
audit_citation(pkg)          # old-style personList() / citEntry()
audit_dontrun(pkg)           # \dontrun{} blocks in examples
audit_description(pkg)       # unquoted package names in Description
audit_downloads(pkg)         # network call sites to review for offline-safe guards
audit_globals(pkg, checks = chk)

fix_globals(pkg, checks = chk, write = TRUE)
fix_ascii(pkg, dry_run = FALSE)
fix_dataset_doc("demo_dataset", pkg = pkg,
                description = "A small demo dataset",
                source = "Generated by create_example_pkg()")
```

After applying the fixes, re-run the check (the package state has
changed, so a new `rcmdcheck()` is needed) and confirm 0 errors,
0 warnings and 0 notes.
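
A small guard makes the final confirmation explicit. `check_is_clean()` is a hypothetical helper that relies only on the `errors`, `warnings` and `notes` character vectors of an `rcmdcheck::rcmdcheck()` result; the mock object stands in for a real check so that the snippet runs without building anything.

```{r}
# Hypothetical helper: TRUE when a check result has no ERROR, WARNING or NOTE.
check_is_clean <- function(chk) {
  length(chk$errors) == 0L &&
    length(chk$warnings) == 0L &&
    length(chk$notes) == 0L
}

# Mock with the same three fields as an rcmdcheck result.
mock_chk <- list(errors = character(0), warnings = character(0),
                 notes = character(0))
check_is_clean(mock_chk)
#> [1] TRUE

# On the real package, after the fixes:
# chk_after <- rcmdcheck::rcmdcheck(pkg, args = "--as-cran")
# stopifnot(check_is_clean(chk_after))
```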

## Next step: pre-submission gates

When the dev-time audits above are clean, run the heavier gates that
have their own pipeline and **cannot** reuse `chk`:

- `audit_check()` - `R CMD check` with the full CRAN incoming
  environment.
- `audit_userspace()` - checks that tests / examples / vignettes
  leave no files behind.

Both are documented in
`vignette("pre-submission-gates", package = "checkhelper")`.
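
The idea behind a "no files left behind" check is a before/after comparison of the file system. The sketch below illustrates that comparison in base R and is not `audit_userspace()` itself: the hypothetical `new_files_after()` lists the files under some directories, evaluates a piece of code, and reports every file that appeared in between.

```{r}
# Hypothetical helper: all files currently under `dirs`, including dotfiles.
snapshot_files <- function(dirs) {
  sort(unlist(lapply(dirs, list.files, recursive = TRUE,
                     full.names = TRUE, all.files = TRUE)))
}

# Hypothetical helper: evaluate `code` (lazily, via force()) and return the
# files that exist under `dirs` afterwards but did not exist before.
new_files_after <- function(dirs, code) {
  before <- snapshot_files(dirs)
  force(code)
  setdiff(snapshot_files(dirs), before)
}

scratch <- tempfile("userspace_demo")
dir.create(scratch)
leftovers <- new_files_after(
  scratch,
  writeLines("oops", file.path(scratch, "left_behind.txt"))
)
basename(leftovers)
#> [1] "left_behind.txt"
```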
