---
title: "Protein Design Explorer"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Protein Design Explorer}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Introduction

This vignette adapts the Mosaic
[Protein Design Explorer](https://idl.uw.edu/mosaic/examples/protein-design.html)
example for rMosaic. It explores synthesized protein minibinders generated via
RFDiffusion and combines:

- Menus for process parameters
- Marginal histograms for pLDDT and pAE
- A raster scatterplot for dense protein design metrics
- A linked table for inspecting selected designs

For pAE, lower values are better. For pLDDT, higher values are better.

**Note:** This example uses a remote Parquet file and may take a few seconds to
load.

## Example: Protein Design Explorer

```{r protein_design_explorer}
library(rMosaic)

protein_data_url <- "https://idl.uw.edu/mosaic/data/protein-design.parquet"

protein_spec <- list(
  meta = list(
    title = "Protein Design Explorer",
    description = paste(
      "Explore synthesized proteins generated via RFDiffusion.",
      "Minibinders are small proteins that bind to a specific protein target.",
      "The dashboard links parameter menus, marginal histograms, a pLDDT vs. pAE raster plot, and a table of selected designs.",
      sep = "\n\n"
    ),
    credit = paste(
      "Adapted from a UW CSE 512 project by Christina Savvides,",
      "Alexander Shida, Riti Biswas, and Nora McNamara-Bordewick.",
      "Data from the UW Institute for Protein Design."
    )
  ),
  data = list(
    proteins = list(file = protein_data_url)
  ),
  params = list(
    query = list(select = "crossfilter"),
    point = list(select = "intersect", empty = TRUE),
    plddt_domain = c(67, 94.5),
    pae_domain = c(5, 29),
    scheme = "observable10"
  ),
  vconcat = list(
    # Parameter menus filter all downstream views.
    list(
      hconcat = list(
        list(
          input = "menu",
          from = "proteins",
          column = "partial_t",
          label = "Partial t",
          as = "$query"
        ),
        list(
          input = "menu",
          from = "proteins",
          column = "noise",
          label = "Noise",
          as = "$query"
        ),
        list(
          input = "menu",
          from = "proteins",
          column = "gradient_decay_function",
          label = "Gradient Decay",
          as = "$query"
        ),
        list(
          input = "menu",
          from = "proteins",
          column = "gradient_scale",
          label = "Gradient Scale",
          as = "$query"
        )
      )
    ),
    list(vspace = "1.5em"),
    # Top marginal histogram for pLDDT.
    list(
      hconcat = list(
        list(
          plot = list(
            list(
              mark = "rectY",
              data = list(from = "proteins", filterBy = "$query"),
              x = list(bin = "plddt_total", steps = 60),
              y = list(count = NULL),
              z = "version",
              fill = "version",
              order = "z",
              reverse = TRUE,
              insetLeft = 0.5,
              insetRight = 0.5
            )
          ),
          width = 600,
          height = 55,
          xAxis = NULL,
          yAxis = NULL,
          xDomain = "$plddt_domain",
          colorDomain = "Fixed",
          colorScheme = "$scheme",
          marginLeft = 40,
          marginRight = 0,
          marginTop = 0,
          marginBottom = 0
        ),
        list(hspace = 5),
        list(
          legend = "color",
          `for` = "scatter",
          columns = 1,
          as = "$query"
        )
      )
    ),
    # Main raster scatterplot plus right marginal histogram for pAE.
    list(
      hconcat = list(
        list(
          name = "scatter",
          plot = list(
            list(mark = "frame", stroke = "#ccc"),
            list(
              mark = "raster",
              data = list(from = "proteins", filterBy = "$query"),
              x = "plddt_total",
              y = "pae_interaction",
              fill = "version",
              pad = 0
            ),
            list(
              select = "intervalXY",
              as = "$query",
              brush = list(stroke = "currentColor", fill = "transparent")
            ),
            list(
              mark = "dot",
              data = list(from = "proteins", filterBy = "$point"),
              x = "plddt_total",
              y = "pae_interaction",
              fill = "version",
              stroke = "currentColor",
              strokeWidth = 0.5
            )
          ),
          opacityDomain = c(0, 2),
          opacityClamp = TRUE,
          colorDomain = "Fixed",
          colorScheme = "$scheme",
          xDomain = "$plddt_domain",
          yDomain = "$pae_domain",
          xLabelAnchor = "center",
          yLabelAnchor = "center",
          marginTop = 0,
          marginLeft = 40,
          marginRight = 0,
          width = 600,
          height = 450
        ),
        list(
          plot = list(
            list(
              mark = "rectX",
              data = list(from = "proteins", filterBy = "$query"),
              x = list(count = NULL),
              y = list(bin = "pae_interaction", steps = 60),
              z = "version",
              fill = "version",
              order = "z",
              reverse = TRUE,
              insetTop = 0.5,
              insetBottom = 0.5
            )
          ),
          width = 55,
          height = 450,
          xAxis = NULL,
          yAxis = NULL,
          marginTop = 0,
          marginLeft = 0,
          marginRight = 0,
          yDomain = "$pae_domain",
          colorDomain = "Fixed",
          colorScheme = "$scheme"
        )
      )
    ),
    list(vspace = "1em"),
    list(
      input = "table",
      as = "$point",
      filterBy = "$query",
      from = "proteins",
      columns = c(
        "version",
        "pae_interaction",
        "plddt_total",
        "noise",
        "gradient_decay_function",
        "gradient_scale",
        "movement"
      ),
      width = 680,
      height = 215
    )
  )
)

runMosaicApp(
  spec = protein_spec,
  specType = "yaml",
  data = NULL,
  title = "Protein Design Explorer",
  backend = "wasm",
  height = "900px"
)
```

## Key Features

### Crossfilter Menus and Brushing

The `query` parameter uses `select = "crossfilter"`, so menus and the scatterplot
brush all contribute to a single linked filter.

### Dense Metric View

The central `raster` mark aggregates tens of thousands of protein designs by
`plddt_total` and `pae_interaction`, colored by design `version`.

### Marginal Distributions

The top and right histograms use the same `query` filter as the central plot,
making it easier to compare pLDDT and pAE distributions after filtering by
process parameters.

### Linked Table

The table is filtered by the current query and writes to a separate `point`
selection. Hovering or selecting table rows highlights corresponding records in
the scatterplot.

## Try It Yourself

1. Select values from the parameter menus to compare design settings.
2. Brush the lower-right region of the scatterplot to focus on low pAE and high pLDDT designs.
3. Inspect the linked table to see parameter values for promising designs.
