The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
#Partitioned Distances
Given a matrix X with m observations and another matrix Y with n
observations, Partitioned Distances computes the m by n distance matrix.
A rectangular distance matrix can be more appropriate than a square
matrix in many applications; for example, in bipartite graphs we might
be concerned with the distance between objects in Graph A with objects
in Graph B, but we may not care about the distance between objects
within Graph A or Graph B. Currently, R only has a dist
function which returns square distance matrices.
##Performance pdist
is a slightly optimized version of
the native dist
function; distances are not computed
between objects that are both in X or both in Y. Using native functions,
we could stack X and Y on top of each other using rbind
,
and call dist
on the result, but this would compute the
(m+n) by (m+n) distance matrix, yielding m^2 + mn + n^2 unnecessary
distance computations. If the matrices have p columns, and the distance
metric is the Euclidean metric, then p(m^2 + mn + n^2) unnecessary flops
are made. More complex metrics, such as dynamic time warping, can run in
O(p^3), which means a naive dist function would make O(p3(m2
+ mn + n^2)) unnecessary flops!
##Timing Using a matrix X that is 1000 by 100, it took 0.543 seconds
to compute the distance matrix based on the Euclidean metric using
dist
. Using pdist, the timing was the same. If we are
interested in the subset A taken by the first 100 rows of X, and subset
B taken by the next 100 rows of X, we can compute a smaller distance
matrix in only 0.006 seconds!
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.