
Introduction to PlotNormTest

Introduction

This vignette shows how to use the PlotNormTest package to assess the normality assumption of a multivariate dataset.

Basic example: The Cork data set

library(PlotNormTest)
cork <- matrix(c(
  72, 66, 76, 77,
  60, 53, 66, 63,
  56, 57, 64, 58,
  41, 29, 36, 38,
  32, 32, 35, 36,
  30, 35, 34, 26,
  39, 39, 31, 27,
  42, 43, 31, 25,
  37, 40, 31, 25,
  33, 29, 27, 36,
  32, 30, 34, 28,
  63, 45, 74, 63,
  54, 46, 60, 52,
  47, 51, 52, 43,
  91, 79, 100, 75,
  56, 68, 47, 50,
  79, 65, 70, 61,
  81, 80, 68, 58,
  78, 55, 67, 60,
  46, 38, 37, 38,
  39, 35, 34, 37,
  32, 30, 30, 32,
  60, 50, 67, 54,
  35, 37, 48, 39,
  39, 36, 39, 31,
  50, 34, 37, 40,
  43, 37, 39, 50,
  48, 54, 57, 43
), nrow = 28, ncol = 4, byrow = TRUE)
colnames(cork) <- c("North", "East", "South", "West")

head(cork)
#>      North East South West
#> [1,]    72   66    76   77
#> [2,]    60   53    66   63
#> [3,]    56   57    64   58
#> [4,]    41   29    36   38
#> [5,]    32   32    35   36
#> [6,]    30   35    34   26

Marginal Univariate Normality Assessment

This section illustrates how to use PlotNormTest to assess the univariate normality assumption. We perform the assessment for each variable (North, East, South, West) of the Cork dataset.

Using Score function

In a score plot, evidence of non-normality is a curve that departs from the \(45^\circ\) line \(y = x\).
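To see why the line \(y = x\) is the reference, here is a minimal base-R sketch. It assumes the score function is the negative derivative of the log-density, \(-\frac{d}{dx}\log f(x)\), which equals \(x\) for \(N(0,1)\). The helper `score_fd` is hypothetical and is not part of PlotNormTest, whose estimator works from data rather than from a known density.

```r
# Hypothetical helper (not from PlotNormTest): negative derivative of a
# log-density, approximated by central finite differences.
score_fd <- function(logdens, x, h = 1e-5) {
  -(logdens(x + h) - logdens(x - h)) / (2 * h)
}

x <- seq(-3, 3, by = 0.5)

# Standard normal: the score coincides with the line y = x
normal_score <- score_fd(function(z) dnorm(z, log = TRUE), x)

# Heavy-tailed Student t with 3 degrees of freedom: the score flattens
# and bends away from y = x in the tails
t3_score <- score_fd(function(z) dt(z, df = 3, log = TRUE), x)

# Largest deviation of the normal score from the line y = x
max(abs(normal_score - x))
```

Plotting `t3_score` against `x` gives a curve that bends away from the line in the tails, which is the kind of departure the score plot is meant to reveal.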

library(ggplot2)
# Score function
lapply(1:4,  FUN = function(mycol) {
  re <- PlotNormTest::cox(matrix(sort(cork[, mycol])), x.dist = 0.0001)
  a <- re$a[, 1]
  p <- ggplot(data.frame(x = re$x, a = a), aes(x = x, y = a)) + 
    geom_point(color = "steelblue3", shape = 19, size = 1.5) + 
    ggtitle(paste("Score plot: ", colnames(cork)[mycol])) +
    coord_fixed() + xlab("y")+ 
    ylab("Score function") + 
    theme_bw() + 
    theme(aspect.ratio = 1/1, panel.grid = element_blank(),
          axis.line = element_line(colour = "black"), 
          axis.text=element_text(size=12),
          axis.title=element_text(size=14,face="bold"), 
          legend.background = element_rect( 
            linewidth = 0.5, linetype = "solid"), 
          legend.text = element_text(size=12))
  p
  
}
)
#> [[1]]

#> 
#> [[2]]

#> 
#> [[3]]

#> 
#> [[4]]

Using T3 plot

In \(T_3\) and \(T_4\) plots, evidence of non-normality is either a curve crossing the \(1 - \alpha = 95\%\) confidence bands or a curve with a steep slope.
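The function names (`dhCGF_plot1D`, `d3hCGF_plot`) and the \(MT_3\) section of this vignette indicate that these plots are built from derivatives of the empirical cumulant generating function. The cumulant generating function of \(N(\mu, \sigma^2)\) is \(\mu t + \sigma^2 t^2 / 2\), so its derivatives of order three and higher vanish. The base-R sketch below illustrates only that idea; it is not the package's computation and omits any scaling and confidence bands. The helpers `ecgf` and `d3_ecgf` are hypothetical names, and the samples are deterministic quantile grids chosen for reproducibility.

```r
# Deterministic, standardized samples: one normal-shaped, one right-skewed
z <- as.numeric(scale(qnorm(ppoints(500))))
w <- as.numeric(scale(qexp(ppoints(500))))

# Hypothetical helper: empirical cumulant generating function,
# the log of the sample mean of exp(t * x)
ecgf <- function(t, x) log(mean(exp(t * x)))

# Hypothetical helper: third derivative of the empirical CGF at t,
# approximated by central finite differences
d3_ecgf <- function(t, x, h = 0.05) {
  (ecgf(t + 2 * h, x) - 2 * ecgf(t + h, x) +
     2 * ecgf(t - h, x) - ecgf(t - 2 * h, x)) / (2 * h^3)
}

d3_ecgf(0, z)  # close to zero for the normal-shaped sample
d3_ecgf(0, w)  # clearly positive for the right-skewed sample
```

A third derivative that stays near zero is consistent with normality, while a value far from zero signals skewness.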

# T3 
lapply(1:4,  FUN = function(mycol) {
  x <- cork[, mycol]
  par(cex.axis = 1.2, cex.lab = 1.2,
               mar = c(4, 4.2, 2,1), cex.main = 1.2)
  PlotNormTest::dhCGF_plot1D(x, method = "T3") 
  namex <- colnames(cork)[mycol]
  title(main = bquote(T[3]~"plot: "~.(namex)), adj = 0)
}
)

#> [[1]]
#> NULL
#> 
#> [[2]]
#> NULL
#> 
#> [[3]]
#> NULL
#> 
#> [[4]]
#> NULL

Using T4 plot

# T4
par(cex.axis = 1.2, cex.lab = 1.2,
    mar = c(4, 4.2, 2,1), cex.main = 1.2)
lapply(1:4,  FUN = function(mycol) {
  x <- cork[, mycol]
  PlotNormTest::dhCGF_plot1D(x, method = "T4") 
  namex <- colnames(cork)[mycol]
  title(main = bquote(T[4]~"plot: "~.(namex)), adj = 0)
}
)

#> [[1]]
#> NULL
#> 
#> [[2]]
#> NULL
#> 
#> [[3]]
#> NULL
#> 
#> [[4]]
#> NULL

Multivariate Normality Assessment

From multivariate normality to univariate normality

Under the assumption that the \(n = 28\) observations of the Cork dataset follow a multivariate normal distribution in dimension \(p = 4\), standardizing them by the sample mean and sample variance yields a sample of size \(\tilde{n} = 28 \times 4 = 112\) that is approximately drawn from \(N(0,1)\). Evidence about multivariate normality can therefore be obtained by assessing the normality of this univariate sample, and any univariate normality testing method can be applied to it.
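As a rough illustration of this reduction, the base-R sketch below standardizes an \(n \times p\) matrix with the symmetric inverse square root of its sample covariance matrix and then flattens the result into one vector. This is an assumption about the kind of transformation involved: `Multi.to.Uni` may use a different but related construction. The simulated matrix `X` stands in for real data.

```r
set.seed(123)
# Simulated 28 x 4 data matrix standing in for a real dataset (assumption)
X <- matrix(rnorm(28 * 4, mean = 50, sd = 10), nrow = 28, ncol = 4)

# Center each column at its sample mean
Xc <- sweep(X, 2, colMeans(X))

# Symmetric inverse square root of the sample covariance matrix,
# obtained from its eigendecomposition
e <- eigen(cov(X), symmetric = TRUE)
S_inv_sqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)

# Standardized data: zero column means and identity sample covariance
Z <- Xc %*% S_inv_sqrt

# Flatten into a single univariate sample of length n * p
z <- as.vector(Z)
length(z)
```

The vector `z` can then be passed to any univariate method, for example `shapiro.test(z)` from the stats package.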

The results below show weak evidence of non-normality: the score plot does not form a straight line, and the \(T_3\) and \(T_4\) plots show curvature in the right tail. However, the approximate normality of the standardized sample holds only for large sample sizes, so with \(n = 28\) these results may not be very convincing. For such small samples, the \(MT_3\) and \(MT_4\) plots below should be used instead.

df <- Multi.to.Uni(cork)
# Cox
score_plot1D(df$x.new, ori.index = df$ind, x.dist = .001)$plot +
  theme(legend.position = "none")+ xlab("y") +
  ggtitle("Score plot")+
  ylab("Score function")


# T3 and T4
par(cex.axis = 1.2, cex.lab = 1.2, mar = c(4, 4.2, 2,1), cex.main = 1.2)
PlotNormTest::dhCGF_plot1D(df$x.new, method = "T3")

par(cex.axis = 1.2, cex.lab = 1.2, mar = c(4, 4.2, 2,1), cex.main = 1.2)
dhCGF_plot1D(df$x.new, method = "T4")

MT3 plot

The multivariate normality assumption of the Cork data set can also be assessed directly via plots of derivatives of the cumulant generating function, shown in the \(MT_3\) and \(MT_4\) plots.

Both the \(MT_3\) and \(MT_4\) plots support the multivariate normality assumption.

par(cex.axis = 1.2, cex.lab = 1.2, mar = c(4, 4.2, 2,1), cex.main = 1.2)
PlotNormTest::d3hCGF_plot(cork)

#> [1] "accept"

MT4 plot

par(cex.axis = 1.2, cex.lab = 1.2, mar = c(4, 4.2, 2,1), cex.main = 1.2)
PlotNormTest::d4hCGF_plot(cork)

#> [1] "accept"
