If you use covr, you know that 80% coverage means 80% of
your lines ran during tests. What it does not mean is that those tests
would catch a bug.
Here is a concrete example. Consider this small function, a single comparison where a subtle operator bug would be easy to introduce and easy to miss:
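```r
above_threshold <- function(x, threshold) {
  # TRUE where x is strictly greater than the threshold
  x > threshold
}
```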
And this test achieves 100% line coverage:
test_that("above_threshold works", {
result <- above_threshold(c(1, 5, 10), 3)
expect_true(is.logical(result))
expect_length(result, 3)
})The function runs. The test passes. Coverage is 100%. But
> could be replaced with >=,
<, or == and this test would still pass —
because it never checks the actual values, only the type and length.
Coverage measures execution. Mutation testing measures detection.
A mutant is a copy of your source code with one small, deliberate change — an operator swap, a flipped condition, a replaced constant. The idea is to simulate the kind of mistake a developer might actually make.
For the function above, muttest could generate mutants
like:
```r
# mutant 1: > → >=
above_threshold <- function(x, threshold) {
  x >= threshold
}

# mutant 2: > → <
above_threshold <- function(x, threshold) {
  x < threshold
}
```

Your test suite runs against each mutant. If the tests fail, the mutant is killed — your tests noticed the change. If the tests pass, the mutant survived — your tests are blind to that kind of bug.
| Outcome | Meaning |
|---|---|
| Killed | At least one test failed. Your tests caught this mutation. |
| Survived | All tests passed. Your tests did not detect this change. |
| Error | The mutated code caused an unexpected runtime error. |
Survivors are the interesting ones. Each surviving mutant points to a specific gap: a mutation your tests cannot distinguish from the original code. That is a candidate for a stronger test.
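For the example above, one such stronger test pins the exact expected values and includes a value equal to the threshold, so every operator mutant produces a different result:

```r
test_that("above_threshold is strictly greater-than", {
  # 3 equals the threshold, so >, >=, <, and == all disagree on at least one element
  expect_identical(above_threshold(c(1, 3, 5), 3), c(FALSE, FALSE, TRUE))
})
```

Each of the mutants shown earlier now fails this test and is killed.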
Mutation Score = (Killed Mutants / Total Mutants) × 100%
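As a quick illustration with made-up numbers:

```r
killed <- 8    # mutants on which at least one test failed
total  <- 10   # mutants generated in total
killed / total * 100
#> [1] 80
```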
No project needs a perfect score on every file. The goal is to use the score directionally: find the files where survivors cluster, and strengthen those tests first.
Many R programmers reach for LLMs (ChatGPT, Claude, Copilot) to write tests. This can be a useful shortcut — LLMs write syntactically correct tests quickly, and for boilerplate cases they can work well.
They can also produce assertions that are easy to satisfy — tests that pass but don’t deeply verify correctness:
```r
# Typical LLM output for above_threshold():
test_that("above_threshold returns logical vector", {
  expect_true(is.logical(above_threshold(c(1, 5), 3)))
})

test_that("above_threshold handles length", {
  expect_equal(length(above_threshold(1:5, 2)), 5)
})
```

Both tests pass. Both would pass against every mutant of `above_threshold`. These tests document the shape of the output but say nothing about its correctness — a pattern that can appear in LLM-generated tests.
This is not a criticism of LLMs, but it does mean mutation testing is a useful way to check how strong those tests actually are: LLM-generated tests need external validation just as much as human-written tests do.
Mutation testing provides that validation. Run muttest
on any file where the tests were AI-generated. A low score does not mean
the LLM did a bad job — it means you now know exactly where to add
better assertions.
Mutation testing is most valuable when:
These tools answer different questions and complement each other:
| Tool | Question answered |
|---|---|
| `covr` | Which lines does my test suite execute? |
| `muttest` | Which bugs would my test suite detect? |
A practical workflow: use covr to find untested code,
then use muttest on the covered code to find weakly tested
logic. High coverage + high mutation score = genuinely robust tests.
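A minimal sketch of that loop from an R session in a package directory (the muttest step is left as a comment, since its exact interface is best taken from the package documentation):

```r
library(covr)

cov <- package_coverage()   # run the tests, recording which source lines execute
zero_coverage(cov)          # lines that never ran: write tests for these first

# With coverage in place, run muttest on the well-covered files to see which
# of those tests would actually fail when the logic underneath is mutated.
```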