BayesMallows: An R Package for Probabilistic Preference Learning with the Mallows Rank Model

The BayesMallows package implements methods for Bayesian preference learning with the Mallows rank model, as originally described in Vitelli et al. (2018), and further developed in Asfaw et al. (2016) and Crispino et al. (2018). This vignette describes the usage of the package, starting from the complete data case, moving on to top-\(k\) rankings and pairwise comparisons, and finally clustering. We refer to the above-mentioned papers, as well as the review by Liu et al. (2018), for a thorough description of the methods. The necessary methods for data preprocessing, tuning of algorithms, and assessment of the posterior distributions are described along the way.
We also need some packages for plotting and data wrangling.
Here is an overview of the most used functions. You can read their documentation with ?function_name, or search for an example in this vignette.
Function Name | Description |
---|---|
compute_mallows | Compute the posterior distribution of the Bayesian Mallows model. This is the main function of the package. Returns an object of class BayesMallows. |
compute_mallows_mixtures | Compute multiple Mallows models with different numbers of mixture components. This is a convenience function for determining the number of mixtures to use. |
sample_mallows | Sample from the Mallows model. |
plot.BayesMallows | Quick plots of the posterior densities of parameters of the Mallows model. |
assess_convergence | Study the convergence of the Markov chain, in order to determine burnin and other algorithm parameters. |
plot_elbow | Create an elbow plot for comparing models with different numbers of clusters. |
plot_top_k | Plot the top-\(k\) rankings. Particularly relevant when the data are in the form of pairwise comparisons. |
assign_cluster | Compute the cluster assignment of assessors. |
compute_consensus | Compute the CP or MAP consensus ranking of the latent ranks. |
compute_posterior_intervals | Compute Bayesian posterior intervals for the parameters. |
generate_initial_ranking | Generate an initial ranking, for the case of missing data or pairwise comparisons. |
generate_transitive_closure | Generate the transitive closure for a set of pairwise comparisons. |
estimate_partition_function | Estimate the partition function of the Mallows model using either importance sampling or an asymptotic approximation. |
Here is an overview of the example datasets in BayesMallows. You can read their documentation with ?dataset_name, or search for an example in this vignette.
Dataset Name | Description |
---|---|
beach_preferences | Stated pairwise preferences between random subsets of 15 images of beaches, by 60 assessors. |
sushi_rankings | Complete rankings of 10 types of sushi by 5000 assessors. |
potato_visual | Complete rankings of 20 potatoes by weight, based on visual inspection, by 12 assessors. |
potato_weighing | Complete rankings of 20 potatoes by weight, where the assessors were allowed to weigh the potatoes in their hands, by 12 assessors. |
potato_true_ranking | Vector of true weight rankings for the 20 potatoes in the example datasets potato_visual and potato_weighing. |
We here give an informal review of the Mallows model (Mallows (1957)). The distribution of a ranking \(r \in \mathcal{P}_{n}\) of \(n\) items is modeled as
\[ P(r | \alpha, \rho) = Z_{n}(\alpha)^{-1} \exp\left\{-\frac{\alpha}{n} d(r, \rho)\right\} 1_{\mathcal{P}_{n}}(r), \]
where \(\mathcal{P}_{n}\) is the set of all permutations of \(1, \dots, n\), \(\alpha\) is a scale parameter, \(\rho \in \mathcal{P}_{n}\) is a latent consensus ranking, \(d(\cdot, \cdot)\) is a distance measure, \(1_{S}(\cdot)\) is the indicator function for the set \(S\), and \(Z_{n}(\alpha)\) is the partition function, or normalizing constant.
Given \(N\) observed rankings, \(R_{1}, \dots, R_{N}\), the likelihood of the model is
\[ P(R_{1}, \dots, R_{N} | \alpha, \rho) = Z_{n}(\alpha)^{-N} \exp\left\{-\frac{\alpha}{n} \sum_{j=1}^{N}d(R_{j}, \rho)\right\} \prod_{j=1}^{N} \left\{1_{\mathcal{P}_{n}}(R_{j}) \right\}. \]
The rankings argument to compute_mallows is assumed to be a matrix of the form \((R_{1}, R_{2}, \dots, R_{N})^{T}\), i.e., each row contains a ranking and each column corresponds to an item.
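To make the notation concrete, here is a small base-R sketch (not the package's implementation) that computes the footrule distance \(d(r, \rho) = \sum_{i} \lvert r_{i} - \rho_{i} \rvert\) and the unnormalized log-likelihood exponent for a matrix of rankings laid out as above:

```r
# Footrule distance between a ranking r and a consensus ranking rho
footrule <- function(r, rho) sum(abs(r - rho))

# Unnormalized log-likelihood exponent -(alpha / n) * sum_j d(R_j, rho),
# i.e., the likelihood above without the -N * log(Z_n(alpha)) term
log_lik_unnormalized <- function(R, alpha, rho) {
  n <- length(rho)
  -(alpha / n) * sum(apply(R, 1, footrule, rho = rho))
}

# Two assessors ranking four items, with consensus 1:4
R <- rbind(c(1, 2, 3, 4), c(2, 1, 3, 4))
log_lik_unnormalized(R, alpha = 3, rho = 1:4)
#> [1] -1.5
```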
For \(\alpha\) we use an exponential prior, with density \(\pi(\alpha | \lambda) = \lambda \exp(-\lambda \alpha)\). The rate parameter \(\lambda\) can be set by the user with the lambda argument to compute_mallows. For \(\rho\) we assume a uniform prior distribution on \(\mathcal{P}_{n}\), with density \(\pi(\rho) = 1_{\mathcal{P}_{n}}(\rho)/n!\).

We use a Metropolis-Hastings algorithm to compute the posterior distributions of \(\alpha\) and \(\rho\). We propose \(\alpha\) from a lognormal distribution \(\log \mathcal{N}(\log(\alpha), \sigma_{\alpha}^{2})\). We propose \(\rho\) with a leap-and-shift algorithm, described in detail in Vitelli et al. (2018). The standard deviation \(\sigma_{\alpha}\) and the leap size can be set by the user with the arguments alpha_prop_sd and leap_size to compute_mallows.
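The leap-and-shift proposal can be illustrated with a simplified base-R sketch; the package's actual implementation, described in Vitelli et al. (2018), differs in details such as the probability distribution over the leap support. An item is drawn at random, its rank is perturbed by at most leap_size, and the items in between are shifted so that the result is again a permutation:

```r
# Simplified leap-and-shift proposal for the consensus ranking rho.
# NOTE: an illustration only, not the package's internal implementation.
leap_and_shift <- function(rho, leap_size = 1) {
  n <- length(rho)
  u <- sample(n, 1)  # item whose rank we perturb
  # Candidate new ranks within leap_size of the current rank
  support <- setdiff(
    seq(max(1, rho[u] - leap_size), min(n, rho[u] + leap_size)), rho[u])
  new_rank <- if (length(support) == 1) support else sample(support, 1)
  rho_prop <- rho
  rho_prop[u] <- new_rank
  # Shift the items whose ranks lie between the old and the new rank
  between <- which(rho >= min(rho[u], new_rank) &
                   rho <= max(rho[u], new_rank) & seq_len(n) != u)
  rho_prop[between] <- rho[between] + sign(rho[u] - new_rank)
  rho_prop
}

set.seed(1)
leap_and_shift(1:10, leap_size = 3)  # still a permutation of 1:10
```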
If each assessor \(j\) has ranked only a subset \(\mathcal{A}_{j}\) of the items, we use data augmentation to fill in the missing ranks. We define the augmented data vectors \(\tilde{R}_{1}, \dots, \tilde{R}_{N}\), and for each assessor we use a uniform prior on the ranks not already used by that assessor's observed ranking. The Metropolis-Hastings algorithm then alternates between sampling \(\tilde{R}_{1}, \dots, \tilde{R}_{N}\) given the current \(\alpha\) and \(\rho\), and sampling \(\alpha\) and \(\rho\) given the current \(\tilde{R}_{1}, \dots, \tilde{R}_{N}\).
When the assessors have stated a set of pairwise comparisons, rather than rankings, we use the same data augmentation ideas as for partial rankings, but the proposal distribution is slightly more complicated in order to ensure that the proposed ranking is compliant with the ordering implied by the pairwise comparisons. In addition, the transitive closure of the stated ordering has to be computed, in order to find all implied orderings. From a user perspective, no new algorithm parameters need to be considered.
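For intuition, the transitive closure of one assessor's preferences can be computed in plain base R with Warshall's algorithm; the package provides generate_transitive_closure for this purpose:

```r
# Transitive closure of one assessor's preferences via Warshall's algorithm.
# M[i, j] is TRUE when item j is preferred to item i (i "bottom", j "top").
transitive_closure <- function(bottom, top, n_items) {
  M <- matrix(FALSE, n_items, n_items)
  M[cbind(bottom, top)] <- TRUE
  for (k in seq_len(n_items)) {
    # i preferred below k and k preferred below j implies i below j
    M <- M | outer(M[, k], M[k, ], "&")
  }
  which(M, arr.ind = TRUE)  # all stated and implied (bottom, top) pairs
}

# Stated preferences 1 < 2, 2 < 3, 3 < 4 imply 1 < 3, 1 < 4, and 2 < 4,
# giving six pairs in total
transitive_closure(bottom = c(1, 2, 3), top = c(2, 3, 4), n_items = 5)
```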
When the assessor pool is heterogeneous, one may assume that there exist several latent consensus rankings, \(\rho_{1}, \dots, \rho_{C}\), one for each cluster of assessors. Letting \(z_{1}, \dots, z_{N} \in \{1, \dots, C\}\) assign each assessor to a cluster, the likelihood of the observed rankings is
\[P(R_{1}, \dots, R_{N} | \{\alpha_{c}, \rho_{c}\}_{c=1,\dots,C}, z_{1},\dots,z_{N}) = \prod_{j=1}^{N}\frac{1_{\mathcal{P}_{n}}(R_{j})}{Z_{n}(\alpha_{z_{j}})}\exp\left\{ -\frac{\alpha_{z_{j}}}{n} d(R_{j}, \rho_{z_{j}})\right\}. \] For the scale parameters \(\alpha_{1}, \dots, \alpha_{C}\) we assume the exponential prior as before, all with the same rate parameter \(\lambda\). We assume that the cluster labels are a priori distributed according to \(P(z_{1}, \dots, z_{N} | \tau_{1}, \dots, \tau_{C}) = \prod_{j=1}^{N} \tau_{z_{j}}\), where \(\tau_{c}\) is the a priori probability that an assessor belongs to cluster \(c\). For \(\tau_{1}, \dots, \tau_{C}\) we assume the Dirichlet prior \(\pi(\tau_{1}, \dots, \tau_{C}) = \Gamma(\psi C)\Gamma(\psi)^{-C}\prod_{c=1}^{C}\tau_{c}^{\psi - 1}\), where \(\Gamma(\cdot)\) is the gamma function. The user can control the value of \(\psi\) with the psi argument to compute_mallows.
Having described the model, we now show some case studies demonstrating the use of the BayesMallows
package. To reduce the computing time, we have typically taken a fairly small number of posterior samples, and rather focused on demonstrating the different methods. In general, we recommend taking a larger number of samples than what we do here, in order to obtain good approximations of the posterior distributions.
BayesMallows
comes with example data described in Liu et al. (2018). A total of 12 assessors were asked to rank 20 potatoes based on their weight. In the first round, the assessors were only allowed to study the potatoes visually, while in the second round, the assessors were also allowed to hold the potatoes in their hands in order to compare them. The data sets are named potato_visual
and potato_weighing, respectively. The true ordering of the potatoes’ weights is stored in the vector potato_true_ranking.
The potato_visual
dataset is shown below. The column names P1, …, P20 represent potatoes, and the row names A1, …, A12 represent assessors. The potato_weighing
dataset has a similar structure.
P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | P15 | P16 | P17 | P18 | P19 | P20 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A1 | 10 | 18 | 19 | 15 | 6 | 16 | 4 | 20 | 3 | 5 | 12 | 1 | 2 | 9 | 17 | 8 | 7 | 14 | 13 | 11 |
A2 | 10 | 18 | 19 | 17 | 11 | 15 | 6 | 20 | 4 | 3 | 13 | 1 | 2 | 7 | 16 | 8 | 5 | 12 | 9 | 14 |
A3 | 12 | 15 | 18 | 16 | 13 | 11 | 7 | 20 | 6 | 3 | 8 | 2 | 1 | 4 | 19 | 5 | 9 | 14 | 10 | 17 |
A4 | 9 | 17 | 19 | 16 | 10 | 15 | 5 | 20 | 3 | 4 | 8 | 1 | 2 | 7 | 18 | 11 | 6 | 13 | 14 | 12 |
A5 | 12 | 17 | 19 | 15 | 7 | 16 | 2 | 20 | 3 | 9 | 13 | 1 | 4 | 5 | 18 | 11 | 6 | 8 | 10 | 14 |
A6 | 10 | 15 | 19 | 16 | 8 | 18 | 6 | 20 | 3 | 7 | 11 | 1 | 2 | 4 | 17 | 9 | 5 | 13 | 12 | 14 |
A7 | 9 | 16 | 19 | 17 | 10 | 15 | 5 | 20 | 3 | 8 | 11 | 1 | 2 | 6 | 18 | 7 | 4 | 14 | 12 | 13 |
A8 | 14 | 18 | 20 | 19 | 11 | 15 | 6 | 17 | 4 | 3 | 10 | 1 | 2 | 7 | 16 | 8 | 5 | 12 | 9 | 13 |
A9 | 8 | 16 | 18 | 19 | 12 | 13 | 6 | 20 | 5 | 3 | 7 | 1 | 4 | 2 | 17 | 10 | 9 | 15 | 14 | 11 |
A10 | 7 | 17 | 19 | 18 | 9 | 15 | 5 | 20 | 3 | 10 | 11 | 1 | 2 | 6 | 16 | 8 | 4 | 13 | 12 | 14 |
A11 | 12 | 16 | 19 | 15 | 13 | 18 | 7 | 20 | 3 | 5 | 11 | 1 | 2 | 6 | 17 | 10 | 4 | 14 | 8 | 9 |
A12 | 14 | 15 | 19 | 16 | 12 | 18 | 8 | 20 | 3 | 4 | 9 | 1 | 2 | 7 | 17 | 6 | 5 | 13 | 10 | 11 |
The compute_mallows
function is the workhorse of BayesMallows
. It runs the Metropolis-Hastings algorithm and returns the posterior distribution of the scale parameter \(\alpha\) and the latent ranks \(\rho\) of the Mallows model. To see all its arguments, please run ?compute_mallows
in the console.
We start by using all the default values of the parameters, so we only need to supply the matrix of ranked items. We use the potato_visual
data printed above.
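The fit can be obtained with a call of the following minimal form, assuming the package has been attached:

```r
library(BayesMallows)
model_fit <- compute_mallows(rankings = potato_visual)
```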
The returned object is a list of class BayesMallows, which contains a large amount of information about the MCMC run.
The function assess_convergence
produces plots for visual convergence assessment. We start by studying \(\alpha\), which is the default. The plot is shown below, and looks good enough, at least to begin with.
Next, we study the convergence of \(\rho\). To avoid too complicated plots, we pick 5 items to plot. Again, you can read more about this function by running ?assess_convergence
in the console.
Based on these plots, the algorithm appears to converge after around 1000 iterations. Discarding the first 1000 iterations as burn-in hence seems like a safe choice. We create a new list element of model_fit called burnin, in which this choice is saved. Having done this, we do not have to provide a burnin argument in subsequent analyses of the posterior distributions.
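Saving the burn-in amounts to a single assignment, assuming the model_fit object from above:

```r
model_fit$burnin <- 1000
```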
Once we are confident that the algorithm parameters are reasonable, we can study the posterior distributions of the model parameters using the generic function plot.BayesMallows
.
With a burnin of 1000, the model_fit object from the previous subsection has 1000 MCMC samples left. The default parameter of plot.BayesMallows is "alpha", so we can study the posterior distribution of \(\alpha\) with the simple statement below.
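With the default parameter = "alpha", the statement is simply:

```r
plot(model_fit)
```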
We can also get the posterior credible intervals for \(\alpha\):
parameter | mean | median | conf_level | hpdi | central_interval |
---|---|---|---|---|---|
alpha | 10.95 | 10.908 | 95 % | [9.759,12.241] | [9.718,12.229] |
Obtaining posterior samples of \(\rho\) is in general harder than for \(\alpha\), since some items tend to be very sticky. We start by plotting the model_fit object from above. We now have to tell plot.BayesMallows that we want a plot of parameter = "rho" for all the items. This gives us the posterior density of each item.
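Assuming the model_fit object from above, such a call might look like this, where the items argument selects which items to plot:

```r
plot(model_fit, parameter = "rho", items = 1:20)
```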
We can also find the posterior intervals of the latent ranks:
item | parameter | mean | median | conf_level | hpdi | central_interval |
---|---|---|---|---|---|---|
P1 | rho | 10 | 10 | 95 % | [9,11] | [9,11] |
P2 | rho | 17 | 17 | 95 % | [16,18] | [16,18] |
P3 | rho | 19 | 19 | 95 % | [19] | [19] |
P4 | rho | 16 | 16 | 95 % | [16,18] | [16,18] |
P5 | rho | 10 | 10 | 95 % | [9,12] | [9,12] |
P6 | rho | 15 | 15 | 95 % | [15] | [15] |
P7 | rho | 6 | 6 | 95 % | [5,7] | [5,7] |
P8 | rho | 20 | 20 | 95 % | [20] | [20] |
P9 | rho | 3 | 3 | 95 % | [3] | [3] |
P10 | rho | 4 | 4 | 95 % | [4] | [4] |
P11 | rho | 11 | 11 | 95 % | [9,11] | [9,11] |
P12 | rho | 1 | 1 | 95 % | [1] | [1] |
P13 | rho | 2 | 2 | 95 % | [2] | [2] |
P14 | rho | 7 | 7 | 95 % | [6,7] | [6,7] |
P15 | rho | 18 | 18 | 95 % | [17,18] | [17,18] |
P16 | rho | 8 | 8 | 95 % | [8] | [8] |
P17 | rho | 5 | 5 | 95 % | [5,6] | [5,6] |
P18 | rho | 14 | 14 | 95 % | [13,14] | [13,14] |
P19 | rho | 12 | 12 | 95 % | [10],[12] | [10,12] |
P20 | rho | 13 | 13 | 95 % | [13,14] | [13,14] |
Updating \(\alpha\) in every step may not be necessary. With the alpha_jump argument, we can tell the MCMC algorithm to update \(\alpha\) only every alpha_jump-th iteration.
We should assess convergence again:
Saving a large number of iterations of \(\rho\) can get quite expensive, so compute_mallows has a rho_thinning parameter, which specifies that only every rho_thinning-th iteration of \(\rho\) should be saved to memory. We double the number of iterations while setting rho_thinning = 2, which gives us the same number of posterior samples.
Please be careful with thinning. In this small data example it is definitely wasteful! Running the same number of iterations without thinning always gives a better approximation of the posterior distribution. Thinning might be useful when you need to run a large number of iterations to explore the space of latent ranks, and the latent ranks from all iterations do not fit in memory. (See, e.g., Gelman et al. (2004) for a discussion of thinning).
We can try to use the Kendall distance instead of the footrule distance.
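The distance is controlled by the metric argument to compute_mallows; a sketch of such a call:

```r
model_fit <- compute_mallows(rankings = potato_visual, metric = "kendall")
```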
It is also worth pointing out that compute_mallows checks that the input data are indeed ranks. Let us try this by manipulating the first row of potato_visual, giving rank 1 to the first two items:
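A sketch of such a manipulation; the call to compute_mallows should then stop with an error, since the first row is no longer a proper ranking:

```r
potato_wrong <- potato_visual
potato_wrong[1, 1:2] <- 1  # rank 1 given to the first two items
# compute_mallows(rankings = potato_wrong) now fails the input check
```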
Now imagine that the assessors in the potato experiment were asked to rank only the top-five heaviest potatoes. We generate these data by retaining only ranks 1 through 5 in potato_visual, setting all other ranks to NA.
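One way to generate this truncated dataset in base R:

```r
potato_top <- potato_visual
potato_top[potato_top > 5] <- NA  # keep only ranks 1 through 5
```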
In Vitelli et al. (2018) it is shown that the unranked items do not affect the MAP estimates of the ranked items in this top-\(k\) setting. In this case, 8 potatoes have been ranked by at least one assessor, so the unranked potatoes should have uniform posterior distributions over ranks 9 to 20. However, arriving at these uniform posteriors requires a large number of MCMC iterations, so we instead remove the unranked items:
item_ranked <- apply(potato_top, 2, function(x) !all(is.na(x)))
potato_top <- potato_top[, item_ranked, drop = FALSE]
We are now left with this 12 by 8 matrix:
P7 | P9 | P10 | P12 | P13 | P14 | P16 | P17 | |
---|---|---|---|---|---|---|---|---|
A1 | 4 | 3 | 5 | 1 | 2 | NA | NA | NA |
A2 | NA | 4 | 3 | 1 | 2 | NA | NA | 5 |
A3 | NA | NA | 3 | 2 | 1 | 4 | 5 | NA |
A4 | 5 | 3 | 4 | 1 | 2 | NA | NA | NA |
A5 | 2 | 3 | NA | 1 | 4 | 5 | NA | NA |
A6 | NA | 3 | NA | 1 | 2 | 4 | NA | 5 |
A7 | 5 | 3 | NA | 1 | 2 | NA | NA | 4 |
A8 | NA | 4 | 3 | 1 | 2 | NA | NA | 5 |
A9 | NA | 5 | 3 | 1 | 4 | 2 | NA | NA |
A10 | 5 | 3 | NA | 1 | 2 | NA | NA | 4 |
A11 | NA | 3 | 5 | 1 | 2 | NA | NA | 4 |
A12 | NA | 3 | 4 | 1 | 2 | NA | NA | 5 |
The compute_mallows
function automatically recognizes the NA
values as missing ranks, and augments the data, as described in Section 4.1 of Vitelli et al. (2018). Let us try:
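A minimal call, assuming the truncated dataset from above:

```r
model_fit <- compute_mallows(rankings = potato_top, save_aug = TRUE)
```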
Looking at the returned object, we see that any_missing is TRUE, so compute_mallows has correctly detected that there are missing values. We set save_aug = TRUE so that we can get a diagnostic plot for the convergence of the data augmentation. Once you are confident that the algorithm converges, you may want to use the default save_aug = FALSE, because saving the augmented data takes memory.
model_fit
also has assessor-wise acceptance rates for the augmentation.
assessor | acceptance_rate |
---|---|
1 | 0.56 |
2 | 0.56 |
3 | 0.90 |
4 | 0.58 |
5 | 0.59 |
6 | 0.59 |
7 | 0.58 |
8 | 0.54 |
9 | 0.56 |
10 | 0.56 |
11 | 0.51 |
12 | 0.55 |
The tuning of \(\alpha\) and \(\rho\) can be done exactly as described above for the full potato data. Let us now study the convergence of the data augmentation.
Looking back, we see that assessor 1 has not ranked potatoes 14, 16, and 17. Hence, we expect their augmented ranks to jump between 6, 7, and 8, since the 5 ranked items are fixed to ranks 1 through 5. Below we have plotted potato 14, which fluctuates between 6 and 8. You can confirm that the same happens for potatoes 16 and 17.
Assessor 1 has ranked potatoes 7, 9, 10, 12, and 13, and these ranks are fixed during the data augmentation. You can confirm that they are fixed by running the following command.
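A command of roughly this form, using the assess_convergence arguments introduced in this vignette, shows flat traces for the observed items; the items indices refer to the columns of the reduced matrix:

```r
assess_convergence(model_fit, parameter = "Rtilde", assessors = 1, items = 1:5)
```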
We can study the posterior distributions of \(\alpha\) and \(\rho\) just like we did above.
As with complete rankings, we can vary the distance measure used in the Mallows model. Here are commands for running with the Cayley, Kendall, and Spearman distances.
If the ranks are missing at random, we cannot remove the unranked items as we did for top-\(k\) rankings above. Let us assume that 10 % of the data in potato_visual
have disappeared due to a disk failure. We generate these in the code chunk below:
missing_indicator <- if_else(
runif(nrow(potato_visual) * ncol(potato_visual)) < 0.1,
NA_real_, 1)
potato_missing <- potato_visual * missing_indicator
The data now look like the following:
P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | P15 | P16 | P17 | P18 | P19 | P20 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A1 | 10 | NA | 19 | NA | 6 | 16 | 4 | 20 | 3 | 5 | 12 | 1 | 2 | 9 | NA | 8 | 7 | 14 | 13 | 11 |
A2 | 10 | NA | 19 | 17 | 11 | NA | 6 | 20 | 4 | 3 | 13 | 1 | 2 | 7 | 16 | 8 | 5 | 12 | 9 | 14 |
A3 | 12 | 15 | 18 | 16 | 13 | 11 | 7 | 20 | 6 | 3 | 8 | 2 | 1 | 4 | 19 | 5 | 9 | 14 | 10 | 17 |
A4 | 9 | 17 | 19 | 16 | 10 | 15 | 5 | 20 | 3 | 4 | 8 | 1 | 2 | 7 | 18 | 11 | 6 | 13 | NA | 12 |
A5 | 12 | 17 | 19 | NA | 7 | 16 | 2 | 20 | 3 | 9 | 13 | 1 | 4 | 5 | 18 | 11 | 6 | 8 | 10 | 14 |
A6 | NA | 15 | 19 | 16 | 8 | 18 | 6 | 20 | 3 | 7 | 11 | 1 | 2 | 4 | 17 | 9 | 5 | 13 | 12 | 14 |
A7 | 9 | 16 | 19 | 17 | 10 | 15 | 5 | 20 | 3 | 8 | 11 | NA | 2 | 6 | 18 | 7 | 4 | 14 | 12 | NA |
A8 | 14 | 18 | 20 | 19 | 11 | 15 | 6 | 17 | 4 | 3 | 10 | 1 | 2 | NA | 16 | 8 | 5 | 12 | 9 | 13 |
A9 | 8 | NA | 18 | NA | 12 | 13 | 6 | 20 | 5 | 3 | NA | 1 | 4 | NA | 17 | 10 | 9 | 15 | 14 | 11 |
A10 | 7 | 17 | 19 | 18 | NA | 15 | NA | 20 | 3 | 10 | 11 | 1 | 2 | 6 | 16 | 8 | NA | 13 | 12 | 14 |
A11 | NA | 16 | 19 | 15 | NA | 18 | 7 | 20 | 3 | NA | 11 | NA | 2 | 6 | 17 | 10 | 4 | NA | 8 | 9 |
A12 | 14 | 15 | 19 | 16 | 12 | 18 | 8 | 20 | 3 | 4 | 9 | 1 | 2 | 7 | 17 | 6 | 5 | NA | 10 | NA |
We supply potato_missing
to compute_mallows
as before:
We check the convergence with the following commands. The plots (not shown) indicate good convergence before 1,000 iterations.
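The commands are the same as before; the items shown here are an arbitrary subset:

```r
assess_convergence(model_fit)
assess_convergence(model_fit, parameter = "rho", items = 1:5)
```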
Again, we can look at the acceptance rates for the augmented data. Now let us compare it to the number of missing ranks per assessor.
bind_cols(model_fit$aug_acceptance,
n_missing = apply(potato_missing, 1, function(x) sum(is.na(x))))
assessor | acceptance_rate | n_missing |
---|---|---|
1 | 0.56 | 3 |
2 | 0.68 | 2 |
3 | 1.00 | 0 |
4 | 1.00 | 1 |
5 | 1.00 | 1 |
6 | 1.00 | 1 |
7 | 0.53 | 2 |
8 | 1.00 | 1 |
9 | 0.16 | 4 |
10 | 0.41 | 3 |
11 | 0.08 | 5 |
12 | 0.98 | 2 |
We see that for assessors who have either 0 or 1 missing ranks, their augmentations are always accepted. In the case of 0 missing ranks, there is in fact no augmentation going on at all, but we use the convention that the assessor’s own complete data is accepted.
The posterior distributions can be studied as shown above.
Handling of pairwise preferences in the Mallows rank model is described in Section 4.2 of Vitelli et al. (2018).
Let us start by considering a toy example with two assessors and five items. Assessor 1 has stated a set of preferences \[ \mathcal{B}_{1} = \left\{A_{1} \prec A_{2}, A_{2} \prec A_{5}, A_{4} \prec A_{5} \right\} \] and assessor 2 has the set of preferences \[ \mathcal{B}_{2} = \left\{ A_{1} \prec A_{2}, A_{2} \prec A_{3}, A_{3} \prec A_{4} \right\}. \]
Each time an assessor is asked to compare two objects, a measurement is made. Therefore, in order to keep the data tidy (Wickham (2014)), we define a dataframe in which each row corresponds to a pairwise comparison. The columns (variables) are the assessor, the bottom item, and the top item.
In the code snippet below, we define such a dataframe for the toy example presented above:
pair_comp <- tribble(
~assessor, ~bottom_item, ~top_item,
1, 1, 2,
1, 2, 5,
1, 4, 5,
2, 1, 2,
2, 2, 3,
2, 3, 4
)
assessor | bottom_item | top_item |
---|---|---|
1 | 1 | 2 |
1 | 2 | 5 |
1 | 4 | 5 |
2 | 1 | 2 |
2 | 2 | 3 |
2 | 3 | 4 |
Next, we need to find the transitive closure for the set of pairwise comparisons given by each user. BayesMallows
comes with a function generate_transitive_closure
to do just this.
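Computing the closure is a one-liner:

```r
pair_comp_tc <- generate_transitive_closure(pair_comp)
```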
As we can see, pair_comp_tc has an additional row containing the relation \(A_{1} \prec A_{5}\) for assessor 1, since this is implied by \(A_{1} \prec A_{2}\) and \(A_{2} \prec A_{5}\). For assessor 2, \[\text{tc}(\mathcal{B}_{2}) = \mathcal{B}_{2} \cup \left\{ A_{1} \prec A_{3}, A_{1} \prec A_{4}, A_{2} \prec A_{4}\right\},\] so three new rows have been added.
assessor | bottom_item | top_item |
---|---|---|
1 | 1 | 2 |
1 | 1 | 5 |
1 | 2 | 5 |
1 | 4 | 5 |
2 | 1 | 2 |
2 | 1 | 3 |
2 | 2 | 3 |
2 | 1 | 4 |
2 | 2 | 4 |
2 | 3 | 4 |
The dataframe returned by generate_transitive_closure inherits from tibble, but has the subclass BayesMallowsTC. The compute_mallows function uses this information to verify that the object provided has been through the generate_transitive_closure function. If it has not, compute_mallows will compute the transitive closure for us, but this may lead to additional computing time when performing several diagnostic runs and trying out different parameters, since the transitive closure will then be recomputed each time.
We can also generate an initial ranking, consistent with the pairwise comparisons. Again, compute_mallows will do it for us, but we may save time by computing it once and for all before we start running the algorithms.
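A sketch of such a call; the name pair_comp_init is ours:

```r
pair_comp_init <- generate_initial_ranking(pair_comp_tc)
```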
Rather than digging deeper into this toy example, we go on with a real application.
The beach preference dataset is described in Section 6.2 of Vitelli et al. (2018), and is available in the dataframe beach_preferences in BayesMallows. In short, \(60\) assessors were each asked to perform a random set of pairwise comparisons between pictures of \(15\) beaches. The first few rows are listed below.
assessor | bottom_item | top_item |
---|---|---|
1 | 2 | 15 |
1 | 5 | 3 |
1 | 13 | 3 |
1 | 4 | 7 |
1 | 5 | 15 |
1 | 12 | 6 |
We start by generating the transitive closure of the preferences.
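As in the toy example above:

```r
beach_tc <- generate_transitive_closure(beach_preferences)
```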
We can compare the dataframes before and after. We see that the number of rows has approximately doubled, and that beach_tc has the subclass BayesMallowsTC, as it should.
str(beach_preferences)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 1442 obs. of 3 variables:
#> $ assessor : num 1 1 1 1 1 1 1 1 1 1 ...
#> $ bottom_item: num 2 5 13 4 5 12 14 7 8 14 ...
#> $ top_item : num 15 3 3 7 15 6 6 9 10 9 ...
str(beach_tc)
#> Classes 'BayesMallowsTC', 'tbl_df', 'tbl' and 'data.frame': 2921 obs. of 3 variables:
#> $ assessor : num 1 1 1 1 1 1 1 1 1 1 ...
#> $ bottom_item: int 4 5 7 4 5 7 8 9 13 14 ...
#> $ top_item : int 1 1 1 3 3 3 3 3 3 3 ...
Next, we generate an initial ranking.
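The beach_init_rank object used below can be generated with:

```r
beach_init_rank <- generate_initial_ranking(beach_tc)
```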
We can also take a look at the first 6 rows in it.
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7 | 15 | 3 | 14 | 13 | 2 | 10 | 12 | 6 | 4 | 1 | 5 | 8 | 11 | 9 |
5 | 15 | 14 | 6 | 2 | 13 | 12 | 11 | 1 | 7 | 4 | 10 | 9 | 8 | 3 |
3 | 9 | 7 | 15 | 14 | 2 | 6 | 13 | 1 | 8 | 12 | 11 | 10 | 5 | 4 |
5 | 15 | 2 | 14 | 11 | 8 | 9 | 10 | 1 | 4 | 3 | 13 | 7 | 12 | 6 |
9 | 15 | 7 | 5 | 6 | 1 | 14 | 13 | 4 | 3 | 2 | 12 | 11 | 8 | 10 |
9 | 13 | 5 | 12 | 15 | 3 | 7 | 11 | 1 | 6 | 4 | 14 | 8 | 10 | 2 |
Let us add column names to the initial ranking. This will make the plots more informative. But be aware that if the names are too long, some plots will be filled with text. We therefore name them B1, B2, …, B15.
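The renaming itself is plain base R:

```r
colnames(beach_init_rank) <- paste0("B", 1:15)
```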
We can now check the convergence, using the same tools as before.
test_run <- compute_mallows(
rankings = beach_init_rank,
preferences = beach_tc,
save_aug = TRUE
)
#> Using exact partition function.
The trace plots for \(\alpha\) and \(\rho\) show good convergence.
When using assess_convergence, we can set parameter = "Rtilde" and choose assessors and items. We now illustrate this by looking more closely at some assessors. Referring to the mathematical notation at the top of the vignette, Rtilde corresponds to the augmented rankings \(\tilde{R}_{1}, \dots, \tilde{R}_{N}\).
Let us look at Beach 2 for assessor 1.
assessor | bottom_item | top_item |
---|---|---|
1 | 2 | 6 |
1 | 2 | 15 |
It is implied by the preferences of assessor 1 that \(\{B_{2} \prec B_{6}\}\) and \(\{B_{2} \prec B_{15}\}\).
This seems correct.
Next, no ordering is implied between beach 2 and 4 for assessor 1.
beach_tc %>%
filter(assessor == 1, bottom_item %in% c(2, 4), top_item %in% c(2, 4)) %>%
nrow()
#> [1] 0
The traces of item 2 and 4 do indeed cross.
No ordering is implied between beach 5 and beach 8 either:
beach_tc %>%
filter(assessor == 1, bottom_item %in% c(5, 8), top_item %in% c(5, 8)) %>%
nrow()
#> [1] 0
The traces of beaches 5 and 8 do indeed cross.
We go on with assessor 2. Let us take a look at, say, beach 10.
assessor | bottom_item | top_item |
---|---|---|
2 | 10 | 5 |
2 | 7 | 10 |
Beach 10 is preferred to beach 7, but beach 5 is preferred to beach 10. The trace plot does indeed show this:
No ordering is implied between beach 1 and beach 15 for assessor 2.
beach_tc %>%
filter(assessor == 2, bottom_item %in% c(1, 15), top_item %in% c(1, 15)) %>%
nrow()
#> [1] 0
Their traces do indeed cross.
We delete the test run before going on.
Based on the convergence diagnostics, and being fairly conservative, we set burnin = 1000 and take an additional 5,000 samples. We still save the augmented data, because we are going to call the function plot_top_k below.
model_fit <- compute_mallows(
rankings = beach_init_rank,
preferences = beach_tc,
nmc = 6000,
save_aug = TRUE
)
#> Using exact partition function.
model_fit$burnin <- 1000
The posterior densities of \(\alpha\) and \(\rho\) can be studied as shown above.
We can rank the beaches according to the cumulative probability (CP) consensus. This functionality is provided by compute_consensus
, which returns a dataframe.
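The table below can be produced with a call of this form; the type argument is our assumption of the interface:

```r
compute_consensus(model_fit, type = "CP")
```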
ranking | item | cumprob |
---|---|---|
1 | B9 | 0.87 |
2 | B6 | 1.00 |
3 | B3 | 0.57 |
4 | B11 | 0.99 |
5 | B15 | 1.00 |
6 | B10 | 0.94 |
7 | B1 | 1.00 |
8 | B5 | 0.76 |
9 | B7 | 0.63 |
10 | B13 | 1.00 |
11 | B8 | 0.66 |
12 | B4 | 0.60 |
13 | B12 | 0.87 |
14 | B14 | 0.95 |
15 | B2 | 1.00 |
The maximum a posteriori (MAP) consensus ranking is the a posteriori most likely value of \(\rho\).
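Analogously to the CP consensus, a call of this form (type argument assumed) produces the table below:

```r
compute_consensus(model_fit, type = "MAP")
```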
probability | item | map_ranking |
---|---|---|
0.12 | B9 | 1 |
0.12 | B6 | 2 |
0.12 | B3 | 3 |
0.12 | B11 | 4 |
0.12 | B15 | 5 |
0.12 | B10 | 6 |
0.12 | B1 | 7 |
0.12 | B5 | 8 |
0.12 | B13 | 9 |
0.12 | B7 | 10 |
0.12 | B8 | 11 |
0.12 | B12 | 12 |
0.12 | B4 | 13 |
0.12 | B14 | 14 |
0.12 | B2 | 15 |
We can compute posterior intervals for several parameters. Here we do it for \(\rho\).
item | parameter | mean | median | conf_level | hpdi | central_interval |
---|---|---|---|---|---|---|
B1 | rho | 7 | 7 | 95 % | [6,7] | [6,7] |
B2 | rho | 15 | 15 | 95 % | [14,15] | [14,15] |
B3 | rho | 3 | 3 | 95 % | [3,4] | [3,4] |
B4 | rho | 12 | 12 | 95 % | [11,13] | [11,13] |
B5 | rho | 8 | 8 | 95 % | [8,10] | [8,10] |
B6 | rho | 2 | 2 | 95 % | [1,2] | [1,2] |
B7 | rho | 9 | 9 | 95 % | [8,10] | [8,10] |
B8 | rho | 11 | 11 | 95 % | [11,13] | [11,13] |
B9 | rho | 1 | 1 | 95 % | [1,2] | [1,2] |
B10 | rho | 6 | 6 | 95 % | [6,7] | [6,7] |
B11 | rho | 4 | 4 | 95 % | [3,4] | [3,4] |
B12 | rho | 13 | 13 | 95 % | [12,14] | [12,15] |
B13 | rho | 9 | 10 | 95 % | [8,10] | [8,10] |
B14 | rho | 14 | 14 | 95 % | [13,15] | [13,15] |
B15 | rho | 5 | 5 | 95 % | [5] | [4,5] |
We can also find the posterior probability for each beach of being ranked top-\(k\), both in \(\rho\) and among the assessors.
We can create a top-\(k\) plot with plot_top_k. By default, k = 3. It may be necessary to experiment with the rel_widths argument to get a good-looking plot.
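A sketch of such a call; the value of rel_widths is an arbitrary starting point for experimentation:

```r
plot_top_k(model_fit, rel_widths = c(1, 5))
```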
In many situations, it is interesting to divide the assessors into clusters, such that the assessors within each cluster have similar preferences. In Section 4.3 of Vitelli et al. (2018), it is shown how the Bayesian Mallows model can be extended to do exactly this, using a mixture approach. BayesMallows supports this through the optional argument n_clusters to compute_mallows.
To illustrate how to perform clustering, let us create some clusters in the potato_visual dataset. This dataset has 12 assessors. We leave the first 6 assessors as they are, but for the last 6, we reverse the rankings.
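One way to construct this dataset, assuming potato_visual from the package:

```r
potato_manipulated <- potato_visual
# With 20 items, reversing a ranking maps rank r to 21 - r
potato_manipulated[7:12, ] <- 21 - potato_manipulated[7:12, ]
```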
P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | P13 | P14 | P15 | P16 | P17 | P18 | P19 | P20 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A1 | 10 | 18 | 19 | 15 | 6 | 16 | 4 | 20 | 3 | 5 | 12 | 1 | 2 | 9 | 17 | 8 | 7 | 14 | 13 | 11 |
A2 | 10 | 18 | 19 | 17 | 11 | 15 | 6 | 20 | 4 | 3 | 13 | 1 | 2 | 7 | 16 | 8 | 5 | 12 | 9 | 14 |
A3 | 12 | 15 | 18 | 16 | 13 | 11 | 7 | 20 | 6 | 3 | 8 | 2 | 1 | 4 | 19 | 5 | 9 | 14 | 10 | 17 |
A4 | 9 | 17 | 19 | 16 | 10 | 15 | 5 | 20 | 3 | 4 | 8 | 1 | 2 | 7 | 18 | 11 | 6 | 13 | 14 | 12 |
A5 | 12 | 17 | 19 | 15 | 7 | 16 | 2 | 20 | 3 | 9 | 13 | 1 | 4 | 5 | 18 | 11 | 6 | 8 | 10 | 14 |
A6 | 10 | 15 | 19 | 16 | 8 | 18 | 6 | 20 | 3 | 7 | 11 | 1 | 2 | 4 | 17 | 9 | 5 | 13 | 12 | 14 |
A7 | 12 | 5 | 2 | 4 | 11 | 6 | 16 | 1 | 18 | 13 | 10 | 20 | 19 | 15 | 3 | 14 | 17 | 7 | 9 | 8 |
A8 | 7 | 3 | 1 | 2 | 10 | 6 | 15 | 4 | 17 | 18 | 11 | 20 | 19 | 14 | 5 | 13 | 16 | 9 | 12 | 8 |
A9 | 13 | 5 | 3 | 2 | 9 | 8 | 15 | 1 | 16 | 18 | 14 | 20 | 17 | 19 | 4 | 11 | 12 | 6 | 7 | 10 |
A10 | 14 | 4 | 2 | 3 | 12 | 6 | 16 | 1 | 18 | 11 | 10 | 20 | 19 | 15 | 5 | 13 | 17 | 8 | 9 | 7 |
A11 | 9 | 5 | 2 | 6 | 8 | 3 | 14 | 1 | 18 | 16 | 10 | 20 | 19 | 15 | 4 | 11 | 17 | 7 | 13 | 12 |
A12 | 7 | 6 | 2 | 5 | 9 | 3 | 13 | 1 | 18 | 17 | 12 | 20 | 19 | 14 | 4 | 15 | 16 | 8 | 11 | 10 |
Let us try first without clustering.
Looking at the convergence of \(\alpha\), we see that this model does not settle on a consensus ranking.
Let us instead introduce clustering:
model_fit <- compute_mallows(
rankings = potato_manipulated,
n_clusters = 2
)
#> Using exact partition function.
We can assess convergence in the usual way. Let us start with alpha
. The assess_convergence
method now shows one trace per cluster.
We plot the potato that is most often ranked heaviest (P8), and the potato that is most often ranked lightest (P12).
We also see that the cluster probabilities \(\tau_{1}\) and \(\tau_{2}\) fluctuate around 0.5, which is just as expected, since we reversed half of the rankings.
We now go on to show clustering in a real data example.
The BayesMallows
package comes with a set of sushi preference data, in which 5000 assessors each have ranked a set of 10 types of sushi (Kamishima (2003)). Here are the first few rows of the dataset.
shrimp | sea eel | tuna | squid | sea urchin | salmon roe | egg | fatty tuna | tuna roll | cucumber roll |
---|---|---|---|---|---|---|---|---|---|
2 | 8 | 10 | 3 | 4 | 1 | 5 | 9 | 7 | 6 |
1 | 8 | 6 | 4 | 10 | 9 | 3 | 5 | 7 | 2 |
2 | 8 | 3 | 4 | 6 | 7 | 10 | 1 | 5 | 9 |
4 | 7 | 5 | 6 | 1 | 2 | 8 | 3 | 9 | 10 |
4 | 10 | 7 | 5 | 9 | 3 | 2 | 8 | 1 | 6 |
4 | 6 | 2 | 10 | 7 | 5 | 1 | 9 | 8 | 3 |
With clustering, we can see if there are subsets of assessors with similar preferences. We set `nmc = 1000` here to speed up the vignette building. The argument `include_wcd` specifies whether to compute the within-cluster distances during MCMC. When `n_clusters > 1` it defaults to `TRUE`, otherwise to `FALSE`. Hence, in order to get the within-cluster distances in the one-cluster case, we set `include_wcd = TRUE` when calling `compute_mallows`. We also set `save_clus = TRUE` to save the cluster assignments.
model_fit1 <- compute_mallows(sushi_rankings, nmc = 1000, include_wcd = TRUE, save_clus = TRUE)
#> Using exact partition function.
model_fit2 <- compute_mallows(sushi_rankings, n_clusters = 2, nmc = 1000, save_clus = TRUE)
#> Using exact partition function.
It is useful to look at the trace plots of \(\alpha_{1}, \dots, \alpha_{C}\).
We can also look at the trace plot of the cluster probabilities, \(\tau_{1}\) and \(\tau_{2}\):
It seems like \(\alpha\) is mixing rapidly. Let us set `burnin = 400`, and compute the CP consensus for the case with two clusters.
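Setting the burnin and computing the consensus can be sketched as follows; the use of the `burnin` element follows the convention described for `plot_elbow` below, and is an assumption here:

```r
# Discard the first 400 iterations as burnin
model_fit2$burnin <- 400

# Cumulative probability (CP) consensus, computed per cluster
compute_consensus(model_fit2, type = "CP")
```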
We can find the CP consensus for each of the two clusters.
We can now look at each cluster:
cluster | ranking | item | cumprob |
---|---|---|---|
Cluster 1 | 1 | fatty tuna | 1 |
Cluster 1 | 2 | tuna | 1 |
Cluster 1 | 3 | shrimp | 1 |
Cluster 1 | 4 | squid | 1 |
Cluster 1 | 5 | tuna roll | 1 |
Cluster 1 | 6 | sea eel | 1 |
Cluster 1 | 7 | egg | 1 |
Cluster 1 | 8 | salmon roe | 1 |
Cluster 1 | 9 | cucumber roll | 1 |
Cluster 1 | 10 | sea urchin | 1 |
cluster | ranking | item | cumprob |
---|---|---|---|
Cluster 2 | 1 | fatty tuna | 1 |
Cluster 2 | 2 | sea urchin | 1 |
Cluster 2 | 3 | salmon roe | 1 |
Cluster 2 | 4 | sea eel | 1 |
Cluster 2 | 5 | tuna | 1 |
Cluster 2 | 6 | shrimp | 1 |
Cluster 2 | 7 | tuna roll | 1 |
Cluster 2 | 8 | squid | 1 |
Cluster 2 | 9 | egg | 1 |
Cluster 2 | 10 | cucumber roll | 1 |
We can also create an elbow plot, using `plot_elbow`. Before doing that, we should look at its arguments: `plot_elbow` requires the model fits for different numbers of clusters to be provided as the first arguments, either comma separated, or as a list. The other argument is `burnin`, which must be named. If each model provided has its `burnin` element set, then this is taken as the default value of `burnin`.
In order to systematically investigate how many clusters to include, we can use the function `compute_mallows_mixtures`, which takes a vector specifying the numbers of clusters, and calls `compute_mallows` once for each number.
We only need the within-cluster distances to create the elbow plot, so we can thin out all the other parameters heavily. This reduces the amount of memory used.
n_clusters <- seq(from = 1, to = 10)
nmc <- 2000
models <- compute_mallows_mixtures(n_clusters = n_clusters,
rankings = sushi_rankings,
nmc = nmc,
rho_thinning = nmc - 1,
aug_thinning = nmc - 1,
include_wcd = TRUE
)
Even though the result of `compute_mallows_mixtures` is a list of fitted models, we can specify `burnin` as an element in this case as well, and it is then used by `plot_elbow`.
We can now provide the `models` list to `plot_elbow`.
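A sketch of the call, assuming the `models` list fitted above; the `burnin` value is an assumption for illustration:

```r
# Elbow plot over 1 to 10 clusters; the within-cluster distances
# were stored because include_wcd = TRUE above
plot_elbow(models, burnin = 1000)
```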
Let us choose 5 clusters. We then compute this model, now without the extreme thinning, and also saving cluster assignments.
model <- compute_mallows(rankings = sushi_rankings, nmc = nmc, n_clusters = 5, save_clus = TRUE)
#> Using exact partition function.
We can also use the function `assign_cluster` to obtain the a posteriori cluster assignment for each assessor.
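A sketch of the call, assuming the five-cluster `model` fitted above; the `soft` argument and the `burnin` value are assumptions for illustration:

```r
# MAP cluster assignment per assessor, with its posterior probability
cluster_assignment <- assign_cluster(model, burnin = 1000, soft = FALSE)
head(cluster_assignment)
```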
assessor | probability | map_cluster |
---|---|---|
1 | 0.844 | Cluster 5 |
2 | 0.817 | Cluster 5 |
3 | 0.629 | Cluster 4 |
4 | 0.617 | Cluster 1 |
5 | 0.500 | Cluster 4 |
6 | 0.499 | Cluster 5 |
Asfaw, D., V. Vitelli, O. Sorensen, E. Arjas, and A. Frigessi. 2016. “Time‐varying Rankings with the Bayesian Mallows Model.” Stat 6 (1): 14–30. https://onlinelibrary.wiley.com/doi/abs/10.1002/sta4.132.
Crispino, M., E. Arjas, V. Vitelli, N. Barrett, and A. Frigessi. 2018. “A Bayesian Mallows approach to non-transitive pair comparison data: how human are sounds?” Accepted for Publication in Annals of Applied Statistics.
Gelman, Andrew, John B. Carlin, Hal S. Stern, and Donald B. Rubin. 2004. Bayesian Data Analysis. 2nd ed. Chapman & Hall/CRC.
Kamishima, T. 2003. “Nantonac Collaborative Filtering: Recommendation Based on Order Responses.” In Proceedings of the Ninth Acm Sigkdd International Conference on Knowledge Discovery and Data Mining, 583–88. New York, NY, USA: ACM.
Liu, Q., M. Crispino, I. Scheel, V. Vitelli, and A. Frigessi. 2018. “Model-based learning from preference data.” Manuscript.
Mallows, C. L. 1957. “Non-Null Ranking Models. I.” Biometrika 44 (1/2): 114–30.
Vitelli, V., O. Sorensen, M. Crispino, E. Arjas, and A. Frigessi. 2018. “Probabilistic Preference Learning with the Mallows Rank Model.” Journal of Machine Learning Research 18 (1): 1–49. http://jmlr.org/papers/v18/15-481.html.
Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software, Articles 59 (10): 1–23. https://doi.org/10.18637/jss.v059.i10.