
distops


The goal of distops is to provide a set of functions to compute distances between observations in a sample and to perform operations on distance matrices.

Installation

You can install the development version of distops from GitHub with:

# install.packages("devtools")
devtools::install_github("LMJL-Alea/distops")

Features

library(distops)

Package development

We provide two functions that help package developers define efficient implementations of dist functions for custom distances.

Subset operator

Let us compute the Euclidean distance matrix for the iris dataset:

D <- dist(iris[, 1:4], method = "euclidean")

We can subset this matrix using the [ operator. If we provide the same indices for rows and columns, it returns another object of class dist:

D[1:3, 1:3]
#>           1         2
#> 2 0.5385165          
#> 3 0.5099020 0.3000000

Or we can provide different indices for rows and columns, in which case it returns a dense matrix:

D[2:3, 7:12]
#>           7         8         9        10        11        12
#> 2 0.5099020 0.4242641 0.5099020 0.1732051 0.8660254 0.4582576
#> 3 0.2645751 0.4123106 0.4358899 0.3162278 0.8831761 0.3741657

The subsetting operation is fully parallelized using the RcppParallel package. It is also memory efficient as it does not copy the original distance matrix.
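
To illustrate why a block can be extracted without expanding the full matrix, here is a minimal base-R sketch of the index arithmetic. A dist object stores only the lower triangle, column by column, and ?dist documents that for i < j the distance between observations i and j sits at position n*(i-1) - i*(i-1)/2 + j - i. The helper dist_subset() is hypothetical and is not part of distops; it is a sequential illustration, not the package's parallel implementation.

```r
# Hypothetical helper, not part of distops: extract a rectangular block from a
# "dist" object without converting it to a full n x n matrix.
dist_subset <- function(d, rows, cols) {
  n <- attr(d, "Size")
  out <- matrix(0, nrow = length(rows), ncol = length(cols),
                dimnames = list(rows, cols))
  for (a in seq_along(rows)) {
    for (b in seq_along(cols)) {
      i <- min(rows[a], cols[b])
      j <- max(rows[a], cols[b])
      if (i < j) {
        # Position of d(i, j) in the column-wise lower triangle (see ?dist).
        k <- n * (i - 1) - i * (i - 1) / 2 + j - i
        # .subset() bypasses any "[" method registered for class "dist".
        out[a, b] <- .subset(d, k)
      }
      # When i == j the distance is zero, which is the initial value of `out`.
    }
  }
  out
}

D <- dist(iris[, 1:4], method = "euclidean")
block <- dist_subset(D, 2:3, 7:12)
```

Each requested entry costs one lookup in the stored vector, so the work scales with the size of the block rather than with the size of the full matrix.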

Medoid computation

The medoid of a sample is the observation that minimizes the sum of distances to all other observations. The find_medoids() function computes the medoid of a sample for a given distance. It takes advantage of the RcppParallel package to compute the medoid in parallel.

find_medoids(D)
#> [1] 62
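
As a cross-check of the definition, the medoid can also be computed in plain base R. The sketch below assumes nothing about how find_medoids() is implemented; medoid_reference() is a hypothetical name, and the function expands the dist object into a dense matrix, so it is only practical for small samples. Up to ties, it should agree with the result of find_medoids(D).

```r
# Hypothetical reference implementation, not the distops code path: expands
# the "dist" object to a dense matrix, so it only suits small samples.
medoid_reference <- function(d) {
  m <- as.matrix(d)
  # The diagonal is zero, so each row sum is the total distance from one
  # observation to all the others.
  unname(which.min(rowSums(m)))
}

D <- dist(iris[, 1:4], method = "euclidean")
medoid <- medoid_reference(D)
```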

If the memberships argument is provided, it returns the medoid for each cluster.

find_medoids(D, memberships = as.factor(rep(1:3, each = 50L)))
#>   1   2   3 
#>   8  97 113
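
The per-cluster variant can be sketched the same way in base R. This assumes that the medoid of a cluster is the member minimizing the sum of distances to the other members of that same cluster; the source does not spell this out. The name medoids_by_cluster() is hypothetical and not part of distops.

```r
# Hypothetical reference implementation, not the distops code path.
# Assumption: distances are summed within each cluster only.
medoids_by_cluster <- function(d, memberships) {
  m <- as.matrix(d)
  # Group the observation indices by cluster label.
  idx_by_cluster <- split(seq_len(nrow(m)), memberships)
  vapply(idx_by_cluster, function(idx) {
    # Restrict to within-cluster distances, then map the winning position
    # back to the index of the observation in the full sample.
    idx[which.min(rowSums(m[idx, idx, drop = FALSE]))]
  }, integer(1))
}

D <- dist(iris[, 1:4], method = "euclidean")
memberships <- as.factor(rep(1:3, each = 50L))
cluster_medoids <- medoids_by_cluster(D, memberships)
```

The result is an integer vector named by cluster label, with each entry indexing an observation of the original sample.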

Future work
