This package provides two geoms, geom_ridgeline and geom_joy. The former takes height values directly to draw ridgelines, and the latter first estimates data densities and then draws those using ridgelines.
The geom geom_ridgeline can be used to draw lines with a filled area underneath.
library(ggplot2)
library(ggjoy)
data <- data.frame(x = 1:5, y = rep(1, 5), height = c(0, 1, 3, 4, 2))
ggplot(data, aes(x, y, height = height)) + geom_ridgeline()
Negative heights are allowed, but are cut off unless the min_height parameter is set negative as well.
# for side-by-side plotting
library(cowplot); theme_set(theme_gray())
data <- data.frame(x = 1:5, y = rep(1, 5), height = c(0, 1, -1, 3, 2))
plot_base <- ggplot(data, aes(x, y, height = height))
plot_grid(plot_base + geom_ridgeline(),
          plot_base + geom_ridgeline(min_height = -2))
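To see which points the cutoff affects, the following base R sketch flags heights that fall below a threshold. It assumes that the default threshold is 0 and that flagged points are simply not drawn. The helper below_cutoff is hypothetical and not part of ggjoy.

```r
# Heights from the example above; one value is negative.
height <- c(0, 1, -1, 3, 2)

# Hypothetical helper (not part of ggjoy): TRUE where a point falls
# below the cutoff and would therefore not be drawn.
below_cutoff <- function(height, min_height = 0) {
  height < min_height
}

below_cutoff(height)                   # only the third point is flagged
below_cutoff(height, min_height = -2)  # no point is flagged
```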
Multiple ridgelines can be drawn at the same time. They will be ordered such that the ones drawn higher up are in the background. When drawing multiple ridgelines at once, the group aesthetic must be specified so that the geom knows which parts of the data belong to which ridgeline.
d <- data.frame(x = rep(1:5, 3), y = c(rep(0, 5), rep(1, 5), rep(2, 5)),
                height = c(0, 1, 3, 4, 0, 1, 2, 3, 5, 4, 0, 5, 4, 4, 1))
ggplot(d, aes(x, y, height = height, group = y)) + geom_ridgeline(fill = "lightblue")
It is also possible to draw ridgelines with geom_joy if we set stat = "identity". In this case, the heights are automatically scaled such that the highest ridgeline just touches the one above at scale = 1.
ggplot(d, aes(x, y, height = height, group = y)) +
  geom_joy(stat = "identity", scale = 1)
The geom geom_joy calculates density estimates from the provided data and then plots those, using the ridgeline visualization. The height aesthetic does not need to be specified in this case.
ggplot(iris, aes(x = Sepal.Length, y = Species)) + geom_joy()
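Conceptually, this amounts to computing one density curve per group and using the density values as heights. The sketch below does this by hand with density() from base R. The bandwidth and grid settings are R's defaults, which is an assumption and not necessarily what geom_joy uses internally.

```r
# Estimate one density curve per species with base R's density().
curves <- lapply(split(iris$Sepal.Length, iris$Species), density)

# Assemble a long data frame with one row per grid point:
# x position, species (the baseline), and height (the density value).
dens <- do.call(rbind, lapply(names(curves), function(sp) {
  data.frame(x = curves[[sp]]$x, Species = sp, height = curves[[sp]]$y)
}))

head(dens)
```

A data frame of this shape has the x, y, and height columns that geom_ridgeline takes directly, although the heights would likely need rescaling to suit the spacing between baselines.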
There is also geom_joy2, which is identical to geom_joy except it uses closed polygons instead of ridgelines for drawing.
ggplot(iris, aes(x = Sepal.Length, y = Species)) + geom_joy2()
The grouping aesthetic does not need to be provided if a categorical variable is mapped onto the y axis, but it does need to be provided if the variable is numerical.
# modified dataset that represents species as a number
iris_num <- transform(iris, Species_num = as.numeric(Species))
# does not work, causes error (numerical y without a group aesthetic)
# ggplot(iris_num, aes(x = Sepal.Length, y = Species_num)) + geom_joy()
# works
ggplot(iris_num, aes(x = Sepal.Length, y = Species_num, group = Species_num)) + geom_joy()
Trailing tails can be cut off using the rel_min_height parameter. This parameter sets a cutoff as a fraction of the highest point of any of the density curves. A value of 0.01 (that is, 1% of the maximum height) usually works well, but you may have to modify this parameter for different datasets.
ggplot(iris, aes(x = Sepal.Length, y = Species)) + geom_joy(rel_min_height = 0.01)
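The effect of a relative cutoff can be reproduced by hand. The following base R sketch applies a 1% cutoff to a single density curve. It uses density() with default settings, which is an assumption, and it illustrates the idea rather than ggjoy's own code.

```r
# Density estimate for one species, using base R defaults.
dens_setosa <- density(iris$Sepal.Length[iris$Species == "setosa"])

# Keep only points at least 1% as tall as the peak. ggjoy takes the
# peak over all curves; a single curve is used here for brevity.
rel_min_height <- 0.01
keep <- dens_setosa$y >= rel_min_height * max(dens_setosa$y)

# The x range that survives the cutoff is narrower than the full range.
range(dens_setosa$x[keep])
range(dens_setosa$x)
```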
The extent to which the different densities overlap can be controlled with the scale parameter. A setting of scale = 1 means the tallest density curve just touches the baseline of the next higher one. Smaller values create a separation between the curves, and larger values create more overlap.
# scale = 0.9, not quite touching
ggplot(iris, aes(x = Sepal.Length, y = Species)) + geom_joy(scale = 0.9)
# scale = 1, exactly touching
ggplot(iris, aes(x = Sepal.Length, y = Species)) + geom_joy(scale = 1)
# scale = 5, substantial overlap
ggplot(iris, aes(x = Sepal.Length, y = Species)) + geom_joy(scale = 5)
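One way to picture the scale parameter is as a multiplier applied after the heights have been normalized to the spacing between baselines. The sketch below assumes baselines one unit apart. The helper rescale_heights is hypothetical and is not claimed to match ggjoy's internal computation.

```r
# Hypothetical helper: rescale heights so that the tallest value equals
# `scale` times the distance between adjacent baselines.
rescale_heights <- function(height, scale = 1, spacing = 1) {
  height * scale * spacing / max(height)
}

height <- c(0, 1, 3, 4, 0, 1, 2, 3, 5, 4, 0, 5, 4, 4, 1)

max(rescale_heights(height, scale = 0.9))  # stops short of the next baseline
max(rescale_heights(height, scale = 1))    # just reaches the next baseline
max(rescale_heights(height, scale = 5))    # extends well past the next baseline
```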
The scaling is calculated separately per panel, so if we facet-wrap by species, each density curve exactly touches the next higher baseline. (This can be disabled by setting panel_scaling = FALSE.)
ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_joy(scale = 1) + facet_wrap(~Species)
Joyplots tend to require some theme modifications to look good. Most importantly, the y-axis labels should be vertically aligned so that they are flush with the axis ticks rather than vertically centered. The ggjoy package provides a theme, theme_joy, that does this and a few other theme modifications.
ggplot(iris, aes(x = Sepal.Length, y = Species)) + geom_joy() + theme_joy()
However, without any further modifications, there are still a few issues with this plot. First, the ridgeline for the virginica species is slightly cut off at the very top point. Second, the space between the x and y axis labels and the ridgelines is too large. We can fix both issues using the expand option for the axis scales.
ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_joy() + theme_joy() +
  scale_x_continuous(expand = c(0.01, 0)) +
  scale_y_discrete(expand = c(0.01, 0))
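The two numbers passed to expand are a multiplicative and an additive constant. As a rough sketch for a continuous scale, assume the limits are widened on each side by the multiplicative constant times the range, plus the additive constant. The helper widen_range and the limits c(4, 8) are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: pad both ends of a numeric range by
# mult * (range width) + add.
widen_range <- function(limits, mult = 0, add = 0) {
  pad <- mult * diff(limits) + add
  c(limits[1] - pad, limits[2] + pad)
}

widen_range(c(4, 8), mult = 0.01)  # small padding, as in the plot above
widen_range(c(4, 8), mult = 0.05)  # ggplot2's usual default for continuous scales
```

With the smaller constant, far less empty space is left between the axis and the data.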
By default, theme_joy adds a grid, but the grid can be switched off when not needed.
ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_joy() + theme_joy(grid = FALSE) +
  scale_x_continuous(expand = c(0.01, 0)) +
  scale_y_discrete(expand = c(0.01, 0))
Temperatures in Lincoln, Nebraska. Modified from a blog post by Austin Wehrwein.
ggplot(lincoln_weather, aes(x = `Mean Temperature [F]`, y = `Month`)) +
  geom_joy(scale = 3, rel_min_height = 0.01) +
  scale_x_continuous(expand = c(0.01, 0)) +
  scale_y_discrete(expand = c(0.01, 0)) +
  labs(title = 'Temperatures in Lincoln NE',
       subtitle = 'Mean temperatures (Fahrenheit) by month for 2016\nData: Original CSV from the Weather Underground') +
  theme_joy(font_size = 13, grid = TRUE) + theme(axis.title.y = element_blank())
Evolution of movie lengths over time. Data from the IMDB, as provided in the ggplot2movies package.
library(ggplot2movies)
ggplot(movies[movies$year > 1912, ], aes(x = length, y = year, group = year)) +
  geom_joy(scale = 10, size = 0.25, rel_min_height = 0.03) +
  theme_joy() +
  scale_x_continuous(limits = c(1, 200), expand = c(0.01, 0)) +
  scale_y_reverse(breaks = c(2000, 1980, 1960, 1940, 1920, 1900), expand = c(0.01, 0))