The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

SCM Case Study on Alcohol Consumption

Introduction

Synthetic control provides a powerful approach at establishing a counterfactual for policy changes. One such example of a policy change is in Russian surrounding alcohol consumption. A recent Guardian article1 chronicled the fall of alcohol consumption in Russia based policy changes made by Mr Putin’s governments. These policy changes all occurred roughly around 2006 and included reclassifying what counted as a “foodstuff” and levying of hefty taxes on alcohol (Levintova 2007). Based on recent trends, observers say that these policies are having the desired effect. But how much of this drop in alcohol consumption is due to the policy implementation and not changes in consumer preference?2

This underlines the need for a counterfactual. We would like to know what would have been the alcohol consumption in Russia but for these policy changes. Here is where Synthetic Control (and the SCtools package) can help us answer these questions.

Implementing Synthetic Controls

In order to understand the policy impact on Russian Alcohol Consumption, we will first load the SCtools, Synth, and some of the tidyverse packages.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
library(SCtools)
#> Loading required package: future
library(Synth)
#> ##
#> ## Synth Package: Implements Synthetic Control Methods.
#> ## See https://web.stanford.edu/~jhain/synthpage.html for additional information.
theme_set(theme_minimal())

After those are loaded, we can look at the alcohol data that is inside of the SCtools package. We can build the graphic in the Guardian using these data to see the trend in alcohol consumption.

alcohol %>% 
  filter(country_name == "Russian Federation") %>% 
    ggplot(aes(year, consumption))+
    geom_line()+
    geom_vline(xintercept = 2006, color  = "orange")+
    labs(
        title = "Per Capita Alcohol Consumption in the Russian Federation Since 1990",
        subtitle = "The Russian government began a series of policy changes in 2003\nto reduce alcohol consumption",
        y = "Per Capita Consumption (L/person)",
        x = NULL,
        caption = "Data: World Health Organization"
    )
Trend of Alcohol Consumption in Russia

Trend of Alcohol Consumption in Russia

We can also examine some of the trends in our predictors.

alcohol %>% 
  filter(country_name == "Russian Federation") %>% 
    select(year, consumption,labor_force_participation_rate:manufacturing) %>% 
    tidyr::gather(predictor, value, -year) %>% 
    ggplot(aes(year, value))+
    geom_line()+
        facet_wrap(~predictor, scales = "free")
Trends in Predictors in Russian Federation

Trends in Predictors in Russian Federation

One feature of SCM is selecting donor states. These donor states help us to generate our synthetic control (synthetic version of Russian in the absence of the policy change). There are several key assumptions regarding choosing donor states including

In the absence of evidence of other changes in other states, we can look at the potential donor states graphically:

p1 <- alcohol %>% 
    mutate(my_color = ifelse(country_code == "RUS", "Russia", "Other")) %>% 
    ggplot(aes(year, consumption, group = country_code, color = my_color))+
    geom_line() +
    scale_color_manual(values = c("grey", "black", "white"))+
    ylim(5,15)+
    xlim(1990,2005)
p1
#> Warning: Removed 3579 rows containing missing values or values outside the scale range
#> (`geom_line()`).
Review of Possible Donors

Review of Possible Donors

When we look at the potential donors we see the usual suspects: France, United States, as well as some of the neighbouring Baltic states. This makes some intuitive sense. Baltic states share similar cultural habits. SCM is insensitive to having donors that are not neighbouring, which will allow us to include countries like Great Britain and the United States in the donor pool.3

Experimental Design

Now that we have an idea of the donors, we can build the pool.

comparison_states <- c("USA", "UK", "UKR", "KAZ",
                                             "GBR", "ESP", "DEU", "POL",
                                             "FIN", "FRA", "GRC", "IRL",
                                             "LTU", "ROU", "GEO", "MDA",
                                             "SWE", "BEL", "BLR", "KGZ",
                                             "CZE", "MEX", "SVN")

control_ids <- alcohol %>% 
    select(country_code,country_num) %>% 
    filter(country_code %in% comparison_states) %>% 
    distinct() %>% 
    pull(country_num)

Now we will use the dataprep function from Synth to format our data.

dataprep.out<-dataprep(
        foo = as.data.frame(alcohol),
        predictors = c("labor_force_participation_rate",
                       "inflation",
                       "mobile_cellular_subscriptions",
                       "manufacturing"),
        predictors.op = "mean",
        dependent = "consumption",
        unit.variable = "country_num",
        time.variable = "year",
        treatment.identifier = 142,
        controls.identifier = control_ids,
        time.predictors.prior = c(1991:2005),
        time.optimize.ssr = c(2000:2005),
        special.predictors = list(
      list("consumption", 2000:2005 ,"mean")),
        unit.names.variable = "country_code",
        time.plot = 1992:2015
    )
#> 
#>  Missing data- treated unit; predictor: inflation ; for period: 1991 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: inflation ; for period: 1992 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 1991 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 1992 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 1993 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 1994 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 1995 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 1996 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 1997 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 1998 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 1999 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 2000 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 2001 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data - control unit: 16 ; predictor: inflation ; for period: 1991 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data - control unit: 16 ; predictor: inflation ; for period: 1992 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.

There are some missing values, but that should be ok. Now we will run the synth function to generate our synthetic controls.

out <- synth(dataprep.out, Sigf.ipop = 3)

After running the SCM, we can look at the weights from the donors and see if we see anything that is surprising.

solution <- out$solution.w %>% 
    as.data.frame()

solution$country_num <- rownames(solution)

solution %>% 
    mutate(country_num = as.numeric(country_num)) %>% 
    left_join(alcohol %>% 
    select(country_code,country_num) %>% 
    filter(country_code %in% comparison_states) %>% 
        distinct(), by = "country_num") %>% 
    ggplot(aes(reorder(country_code, w.weight), w.weight))+
    geom_col()+
    coord_flip()+
    labs(
        title = "Donor Weights by Country",
        y = NULL,
        x = "Weight"
    )

Unsurprisingly, we can see that Kazakstan and Lithuania contribute to counterfactual Russia, however, interestingly Great Britain and the Chzech Republic are amongst the top donors to the counterfactual. This highlights one of the powerful features of Synthetic Control - the donors do not need to be geographically connected, Let’s continue with our analysis.

Now we can plot our real Russia vs our Synthetic Russia using the plot.path function from Synth. Here we see that the synthetic control matches 2000-2005 fairly well (which is what we specified to optimize), but there is some opportunity to more closely match actual Russia.

path.plot(synth.res = out, dataprep.res = dataprep.out, 
                    Xlab = "per Capita Alcohol Consumption", )
Synthetic Russia vs Actual Russia for Alcohol Consumption per Capita

Synthetic Russia vs Actual Russia for Alcohol Consumption per Capita

We can also look at the delta between our real and synthetic Russia.

delta_out <- (dataprep.out$Y1plot - (dataprep.out$Y0plot %*% out$solution.w)) %>% 
    as.data.frame()

delta_out$year <- rownames(delta_out)

delta_out%>% 
    knitr::kable(caption = "Difference in Alcohol Consumption Between Synthetic and Actual Russia", digits = 2)
Difference in Alcohol Consumption Between Synthetic and Actual Russia
142 year
1992 -3.28 1992
1993 -1.71 1993
1994 -1.24 1994
1995 1.49 1995
1996 -0.41 1996
1997 -0.60 1997
1998 -0.10 1998
1999 0.77 1999
2000 -0.50 2000
2001 -0.44 2001
2002 -0.06 2002
2003 0.34 2003
2004 0.35 2004
2005 0.53 2005
2006 1.02 2006
2007 1.19 2007
2008 1.27 2008
2009 1.00 2009
2010 0.76 2010
2011 0.86 2011
2012 0.92 2012
2013 0.09 2013
2014 -1.00 2014
2015 -1.46 2015

It appears that the policy took a while to take effect, with a decrease in alcohol consumed per capita beginning in 2014. However, it must be mentioned that this isn’t the “40%” that the Guardian reports and that one may glean from the WHO graph at first glance.

The Next Steps

The next steps in a SCM controls analysis is to permute the data set to understand the sensitivity of our Synthetic controls to assess if the results observed are plausible (e.g. are the results we obtained due from chance, or reflect the true effect). The SCtools provides the generate.placebos function to automate this process. This includes the option to parallelise the operation (which greatly speeds up processing time).

placebo <- generate.placebos(dataprep.out = dataprep.out,
                             synth.out = out, strategy = "multicore")

Now that we have our placebo object, we can represent it graphically with plot_placebos. Here we can see our donors (control unit) and our actual treated group.

plot_placebos(placebo)
Placebo Plot for Control Units

Placebo Plot for Control Units

Equally important we can test the Mean Squared Prediction Error (MSPE). Additionally, because we generated the controls for the placebos, we can see how extreme our values are and thus generate a pseudo p-value to see if our results are significant. Now, it should be mentioned that this test is relatively underpowered as it looks only at the donor states.

test_out <- mspe.test(placebo)
test_out$p.val
#> [1] 0.6521739

Why look at single number summary, when you can look at a plot? With mspe.plot we can visualise the ratios of MSPE for each donor. Here we see very little difference in Russia, which supports our higher p-value.

mspe.plot(tdf = placebo)
Mean Squared Prediction Error Ratios for Russia and Donors

Mean Squared Prediction Error Ratios for Russia and Donors

Conclusion

In the case of this study, it doesn’t look like the policy had an impact. However, this brings into question the model specification. The predictors in the model did not necessarily generate the best pre-treatment MSPE. Different predictors may help us better construct a Synthetic Russia. Additionally, this model could be tuned by including some lagged predictors. We won’t go into details regarding this process, but with SCtools, you now have tools to help you with your model inference and tuning.

Levintova, M. 2007. “Russian Alcohol Policy in the Making.” Alcohol and Alcoholism 42 (5): 500–505. https://doi.org/10.1093/alcalc/agm040.

  1. Available at https://www.theguardian.com/world/2019/oct/01/russian-alcohol-consumption-down-40-since-2003-who↩︎

  2. Yes, we all know as good economists that as goods get more expensive, consumers will buy less of them. However, how much less is another question.↩︎

  3. I would recommend using the plotly package which allows you to hover over the different lines and see the country codes.↩︎

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.