Standard association rules (or implications in Formal Concept Analysis) identify correlations between attributes (\(A \to B\)). However, correlation does not imply causation. A rule \(A \to B\) might be strong simply because both \(A\) and \(B\) are caused by a third confounding variable \(C\).
The fcaR package now supports mining causal association rules, implementing a method to identify likely causal relationships by controlling for confounding variables. The method is based on the “Fair Odds Ratio”, computed on a “Fair Data Set” of matched pairs.
To check whether \(A \to B\) is causal, the algorithm, in outline:

- controls for the remaining attributes, which act as potential confounders;
- builds a “Fair Data Set” of matched pairs of objects that agree on those attributes but differ in \(A\);
- computes the “Fair Odds Ratio” of \(B\) with respect to \(A\) on this matched data, together with a confidence interval;
- reports the rule as (likely) causal only if the confidence interval excludes 1.
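As a purely illustrative sketch (not fcaR's internal implementation), the hypothetical helper below shows the kind of quantity the last two steps rely on: an odds ratio and a Wald-type confidence interval computed from the 2x2 counts of a matched data set.

# Hypothetical helper, for illustration only: odds ratio and Wald confidence
# interval from the 2x2 counts of a matched ("fair") data set, where
# a = A & B, b = A & not B, c = not A & B, d = not A & not B.
fair_odds_ratio <- function(a, b, c, d, conf = 0.95) {
  or <- (a * d) / (b * c)
  se <- sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(or)
  z  <- qnorm(1 - (1 - conf) / 2)
  ci <- exp(log(or) + c(-1, 1) * z * se)
  list(odds_ratio = or, ci_lower = ci[1], ci_upper = ci[2])
}
# The rule is kept as (likely) causal when the interval excludes 1.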
Let’s consider a simple case where Treatment causes Recovery.
library(fcaR)

# 100 Patients
# 50 Treated, 50 Untreated
# Treated: 90% Recovery
# Untreated: 20% Recovery
n <- 100
treated <- c(rep(1, 45), rep(1, 5), rep(0, 10), rep(0, 40))
recovered <- c(rep(1, 45), rep(0, 5), rep(1, 10), rep(0, 40))
I <- matrix(c(treated, recovered), ncol = 2)
colnames(I) <- c("Treatment", "Recovery")
fc <- FormalContext$new(I)

We can mine for causal rules targeting “Recovery”:
rules <- fc$find_causal_rules(
  response_var = "Recovery",
  min_support = 0.1,
  confidence_level = 0.95
)
rules$print()
#> Rules set with 1 Rules.
#> Rule 1: {Treatment} -> {Recovery} [support = 0.5, confidence = 0.9,
#> fair_odds_ratio = 35, ci_lower = 4.8, ci_upper = 255.47]

The algorithm correctly identifies “Treatment” as a cause of “Recovery”: the fair odds ratio is well above 1 and its confidence interval excludes 1.
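As an illustrative aside (not part of the original example), the crude, unadjusted odds ratio from the 2x2 counts is in the same range as the reported fair odds ratio of 35; with a single explanatory attribute there is little room for confounding here.

# Crude odds ratio from the counts 45, 5, 10, 40:
(45 * 40) / (5 * 10)
#> [1] 36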
A classic setting where standard association rules fail is Simpson’s Paradox, where a confounding variable creates a spurious correlation.

Consider a dataset relating Ice Cream consumption and Drowning. The two are highly correlated because both increase during hot weather (the Heat variable). However, naive frequent itemset mining might still report the rule IceCream -> Drowning.

Let’s simulate this:
set.seed(123)
n <- 200
# Heat: 50% Hot, 50% Cold
heat <- c(rep(1, 100), rep(0, 100))
# Ice Cream: Strongly dependent on Heat (80% if Hot, 20% if Cold)
ic <- numeric(200)
ic[1:100] <- rbinom(100, 1, 0.8)
ic[101:200] <- rbinom(100, 1, 0.2)
# Drowning: Strongly dependent on Heat (80% if Hot, 20% if Cold)
drown <- numeric(200)
drown[1:100] <- rbinom(100, 1, 0.8)
drown[101:200] <- rbinom(100, 1, 0.2)
I <- matrix(c(heat, ic, drown), ncol = 3)
colnames(I) <- c("Heat", "IceCream", "Drowning")
fc_spurious <- FormalContext$new(I)

If we just looked at correlations, IceCream and Drowning would be correlated. But find_causal_rules controls for confounders.
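As a quick illustrative check (not part of the original example), the marginal cross-tabulation indeed shows a strong association between IceCream and Drowning when Heat is ignored:

# Marginal 2x2 table, ignoring Heat:
table(IceCream = ic, Drowning = drown)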
When testing IceCream -> Drowning:

- It controls for Heat.
- It compares days with the same Heat (Hot vs Hot, Cold vs Cold) but different Ice Cream consumption.
- Within “Hot” days, Ice Cream consumption is random (with respect to the Drowning causal mechanism) and doesn’t increase drowning risk further.
- The fair odds ratio should therefore be near 1 (a hand-made stratified check is sketched below).
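To see this directly, here is an illustrative aside in plain R (not fcaR's internal computation): a hypothetical helper that computes the crude odds ratio of Drowning versus IceCream, overall and within each Heat stratum. The marginal value is clearly above 1, while the stratum-specific values are close to 1.

# Hypothetical helper: crude odds ratio of y (Drowning) versus x (IceCream).
crude_or <- function(x, y) {
  tab <- table(x, y)  # rows: levels of x (0, 1); columns: levels of y (0, 1)
  (tab[2, 2] * tab[1, 1]) / (tab[2, 1] * tab[1, 2])
}
crude_or(ic, drown)                        # marginal: well above 1 (spurious)
crude_or(ic[heat == 1], drown[heat == 1])  # hot days only: near 1
crude_or(ic[heat == 0], drown[heat == 0])  # cold days only: near 1

find_causal_rules performs this adjustment automatically and attaches a confidence interval to the fair odds ratio: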
causal_rules <- fc_spurious$find_causal_rules(
  response_var = "Drowning",
  min_support = 0.5
)
# Should contain "Heat" but NOT "IceCream"
print(causal_rules)
#> Rules set with 1 Rules.
#> Rule 1: {Heat} -> {Drowning} [support = 0.5, confidence = 0.81, fair_odds_ratio
#> = 27, ci_lower = 3.67, ci_upper = 198.69]

As expected, the algorithm identifies Heat as the true cause and rejects the spurious Ice Cream association.
The find_causal_rules method provides a tool to go beyond simple association and identify rules that are robust to confounding, a step towards causal inference in Formal Concept Analysis. It returns a RuleSet object with quality metrics including Support, Confidence, and the Fair Odds Ratio with its Confidence Interval.