Type:

Package

Title:

Quality Measurements for Dimensionality Reduction

Version:

0.2.1

Date:

2023-10-10

Maintainer:

Michael Thrun <m.thrun@gmx.net>

Description:

Several quality measurements for investigating the performance of dimensionality reduction methods are provided here. In addition, a new quality measurement called Gabriel classification error is made accessible, which was published in Thrun, M. C., Märte, J., & Stier, Q.: "Analyzing Quality Measurements for Dimensionality Reduction" (2023), Machine Learning and Knowledge Extraction (MAKE), <doi:10.3390/make5030056>.

License:

GPL-3

Imports:

DatabionicSwarm

Suggests:

plotly, geometry, deldir, FCPS, ProjectionBasedClustering (≥ 1.2.1), DataVisualizations, FastKNN, ggplot2, pcaPP, pracma, spdep, grid, igraph, cccd, sf

Encoding:

UTF-8

NeedsCompilation:

Packaged:

2023-10-12 07:30:23 UTC; MCT

Author:

Quirin Stier [aut], Florian Lerch [ctb], Julian Märte [aut], Hermann Tafo [ctb], Laukert Schlichting [ctb], Michael Thrun [aut, cph, cre]

Repository:

CRAN

Date/Publication:

2023-10-12 09:40:15 UTC

Classification Error (rate)

Description

Compares projected points to a given prior classification using a k-nearest-neighbor (knn) classifier.

Usage

ClassificationError(OutputDistances,Cls,k=5)

Arguments

OutputDistances

[1:n,1:n] numeric matrix with distance matrix of projected data.

Cls

[1:n] Numeric vector containing class information.

k

Number of nearest neighbors. It is set to 5 in [Venna 2010], which is also the default here.

Details

Projected points are evaluated by k-nearest neighbor classification accuracy (with k = 5 by default): each sample in the visualization is classified by majority vote of its k nearest neighbors in the visualization, and the resulting classification is compared to the ground truth label [Venna 2010].
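
The majority vote described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name knn_error_sketch, the toy data, and the tie-breaking rule (the smallest label wins) are assumptions made for this example.

```r
# Hypothetical helper for illustration; not part of the package.
knn_error_sketch <- function(OutputDistances, Cls, k = 5) {
  n <- nrow(OutputDistances)
  KNNCls <- vapply(seq_len(n), function(i) {
    d <- OutputDistances[i, ]
    d[i] <- Inf                                 # a point never votes for itself
    neighbors <- order(d)[seq_len(k)]           # indices of the k nearest neighbors
    votes <- table(Cls[neighbors])              # count class labels among them
    as.numeric(names(votes)[which.max(votes)])  # majority vote; ties go to the smallest label
  }, numeric(1))
  Accuracy <- mean(KNNCls == Cls)
  list(Error = 1 - Accuracy, Accuracy = Accuracy, KNNCls = KNNCls)
}

# Toy data (assumed): two well-separated groups of 20 points each.
set.seed(1)
toy <- rbind(matrix(rnorm(40, mean = 0, sd = 0.1), ncol = 2),
             matrix(rnorm(40, mean = 5, sd = 0.1), ncol = 2))
toy_cls <- rep(c(1, 2), each = 20)
knn_result <- knn_error_sketch(as.matrix(dist(toy)), toy_cls, k = 5)
knn_result$Error
```

Because the two toy groups are far apart relative to their spread, every point's five nearest neighbors share its label and the error is zero.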

Value

List with three entries:

Error

Classification Error: 1-Accuracy[1]

Accuracy

Accuracy

KNNCls

[1:n] Numeric vector with the classification assigned by the knn classifier

Note

Here, the OutputDistances of the projected points are used.

Author(s)

Michael Thrun

References

Venna, J., Peltonen, J., Nybo, K., Aidos, H., and Kaski, S. Information retrieval perspective to nonlinear dimensionality reduction for data visualization. The Journal of Machine Learning Research, 11, 451-490. (2010)

Gracia, A., Gonzalez, S., Robles, V., and Menasalvas, E. A methodology to compare Dimensionality Reduction algorithms in terms of loss of quality. Information Sciences, 270, 1-27. (2014)

Examples


if(requireNamespace("FCPS")){
data(Hepta,package="FCPS")
projection=cmdscale(dist(Hepta$Data), k=2)
ClassificationError(as.matrix(dist(projection)),Hepta$Cls)
}

C-Measure subtypes

Description

Calculate the C-Measure subtypes of minimal path length and minimal wiring

Usage

Cmeasure(Data, Projection, k = 1)

Arguments

Data

[1:n,1:d] numerical matrix of points in input space.

Projection

[1:n,1:2] numerical matrix of points in output space.

k

Number of nearest neighbors. Both measures always set it to k=1.

Value

[1:2] Numerical vector of MinimalPathlength and MinimalWiring values.
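
As a rough illustration of the two subtypes, the sketch below follows the usual C-measure reading: minimal path length sums input-space distances over pairs that are nearest neighbors in the output space, and minimal wiring sums output-space distances over pairs that are nearest neighbors in the input space. This reading, the helper name cmeasure_sketch, and the toy data are assumptions. The package's Cmeasure() may define neighborhoods or normalize differently.

```r
# Hypothetical helper for illustration; not the package's Cmeasure().
cmeasure_sketch <- function(Data, Projection) {
  input_d  <- as.matrix(dist(Data))
  output_d <- as.matrix(dist(Projection))
  # Nearest neighbor of each point (k = 1), excluding the point itself.
  nearest <- function(d) {
    diag(d) <- Inf
    apply(d, 1, which.min)
  }
  nn_input  <- nearest(input_d)
  nn_output <- nearest(output_d)
  idx <- seq_len(nrow(input_d))
  c(MinimalPathlength = sum(input_d[cbind(idx, nn_output)]),  # input distances of output neighbors
    MinimalWiring     = sum(output_d[cbind(idx, nn_input)]))  # output distances of input neighbors
}

# Toy check (assumed data): if the projection equals the data, both sums coincide.
set.seed(2)
cm_pts <- matrix(rnorm(30), ncol = 3)
cm_same <- cmeasure_sketch(cm_pts, cm_pts)
cm_same
```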

Author(s)

Michael Thrun

Examples



if(requireNamespace("FCPS")){
data(Hepta,package="FCPS")
projection=cmdscale(dist(Hepta$Data), k=2)
Cmeasure(Hepta$Data,projection)
}

Gabriel Classification Error (GCE)

Description

GCE searches for the k-nearest neighbors of the first Gabriel neighbors weighted by the Euclidean distances of the input space [Thrun et al, 2023]. GCE evaluates these neighbors in the output space. A low value indicates a better two-dimensional projection of the high-dimensional input space.

Usage

GabrielClassificationError(Data,ProjectedPoints,Cls,LC,
PlotIt=FALSE,Plotter = "native", Colors = NULL,LineColor= 'grey',
main = "Name of Projection", mainSize = 24,xlab = "X", ylab = "Y", xlim, ylim,
pch,lwd,Margin=list(t=50,r=0,l=0,b=0))

Arguments

Data

[1:n,1:d] Numeric matrix with n cases and d variables

ProjectedPoints

[1:n,1:2] Numeric matrix with 2D points in cartesian coordinates

Cls

[1:n] Numeric vector with class labels

LC

Optional, Numeric vector of two values determining grid size of the underlying projection

PlotIt

Optional, Boolean: TRUE/FALSE => Plot/Do not plot (Default: FALSE)

Plotter

Optional, Character with plot technique (native or plotly)

Colors

Optional, Character vector of class colors for points

LineColor

Optional, Character of line color used for edges of graph

main

Optional, Character plot title

mainSize

Optional, Numeric size of plot title

xlab

Optional, Character name of x-axis

ylab

Optional, Character name of y-axis

xlim

Optional, Numeric vector with two values defining the x-axis range

ylim

Optional, Numeric vector with two values defining the y-axis range

pch

Optional, Numeric of point size (graphic parameter)

lwd

Optional, Numeric of linewidth (graphic parameter)

Margin

Optional, Margin of plotly plot

Details

Gabriel Classification Error (GCE) makes an unbiased evaluation of distance- and density-based structures which might even be non-linearly separable. First, GCE utilizes the information provided by a prior classification to assess projected structures. Second, GCE applies insights drawn from graph theory. Details are described in [Thrun et al, 2023].
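
For readers unfamiliar with the graph-theoretic part, the sketch below shows only the standard Gabriel neighborhood condition: two points are Gabriel neighbors if no third point lies inside the disc whose diameter is the segment between them. It does not compute the GCE itself. The helper name gabriel_adjacency_sketch and the toy points are assumptions, and the brute-force loop is meant for small examples only.

```r
# Hypothetical helper for illustration; brute force, suitable for small n only.
gabriel_adjacency_sketch <- function(Points) {
  d2 <- as.matrix(dist(Points))^2              # squared Euclidean distances
  n <- nrow(d2)
  adj <- matrix(FALSE, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      others <- setdiff(seq_len(n), c(i, j))
      # i and j are neighbors if every other point lies outside the disc
      # with diameter ij, i.e. d(i,r)^2 + d(j,r)^2 > d(i,j)^2 for all r.
      adj[i, j] <- adj[j, i] <- all(d2[i, others] + d2[j, others] > d2[i, j])
    }
  }
  adj
}

# Three collinear points: the outer two are separated by the middle one.
line_pts <- cbind(c(0, 1, 2), c(0, 0, 0))
gab <- gabriel_adjacency_sketch(line_pts)
gab
```

In this toy case points 1 and 2 as well as 2 and 3 are connected, while 1 and 3 are not, because point 2 lies inside the disc spanned by them.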

Value

List of several entries: first the GCE itself as the main result, followed by further entries containing potentially important information

GCE

Numeric: the 'Gabriel Classification Error'

GCEperPoint

[1:n] unnormalized GCE of each point: GCE = mean(GCEperPoint)

nn

the number of points in a relevant neighborhood: 0.5 * 85percentile(AnzNN)

AnzNN

[1:n] the number of points with a Gabriel graph neighborhood

NNdists

[1:n,1:nn] the distances within the relevant neighborhood, 1 for inter cluster distances and 0 for inner cluster distances

HD

[1:nn] HD = HarmonicDecay(nn), i.e., the weight function for the NNdists: GCEperPoint = HD*NNdists

IsInterDistance

Distances to the nn closest neighbors.

GabrielDists

Distance matrix implied by the high-dimensional distances and the underlying Gabriel graph

ProjectionGraphError

Plotly object, in case plotly is chosen as Plotter.

Author(s)

Michael Thrun, Quirin Stier, Julian Märte

References

[Thrun et al, 2023] Thrun, M. C., Märte, J., Stier, Q.: Analyzing Quality Measurements for Dimensionality Reduction, Machine Learning and Knowledge Extraction (MAKE), Vol. 5, 2023, <doi:10.3390/make5030056>.

Examples


if(requireNamespace("FCPS")){
data(Hepta,package="FCPS")
projection=cmdscale(dist(Hepta$Data), k=2)
GabrielClassificationError(Hepta$Data,projection,Hepta$Cls)$GCE
}


Statistical correlation by Kendall

Description

Calculates the statistical correlation by Kendall. Basically a wrapper to pcaPP::cor.fk.

Usage

KendallsTau(InputDists, OutputDists)

Arguments

InputDists

Matrix containing the distances of the first dataset.

OutputDists

Matrix containing the distances of the second dataset.

Value

Equivalent to cor.fk
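
A comparable value can be sketched with base R's cor(), which also offers Kendall's tau. Whether KendallsTau() passes the full matrices or only one triangle to pcaPP::cor.fk is not stated here, so using the lower triangle is an assumption of this sketch, as are the helper name kendall_sketch and the toy data.

```r
# Hypothetical helper for illustration; base R only, no pcaPP required.
kendall_sketch <- function(InputDists, OutputDists) {
  lower <- lower.tri(InputDists)   # each pair of points once, diagonal excluded
  cor(InputDists[lower], OutputDists[lower], method = "kendall")
}

# Toy check (assumed data): rescaling all distances keeps their order.
set.seed(3)
kt_pts <- matrix(rnorm(40), ncol = 4)
kt_input <- as.matrix(dist(kt_pts))
kt_same <- kendall_sketch(kt_input, 2 * kt_input)
kt_same
```

Since doubling every distance leaves the ordering of all pairs unchanged, the correlation in this toy case is 1.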

Author(s)

Michael Thrun

Examples


if(requireNamespace("FCPS")){
data(Hepta,package="FCPS")
InputDist=dist(Hepta$Data)
projection=cmdscale(InputDist, k=2)
KendallsTau(as.matrix(InputDist),as.matrix(dist(projection)))
}

Trustworthiness and Discontinuity.

Description

In a trustworthy projection the visualized proximities hold in the original data as well, whereas a continuous projection visualizes all proximities of the original data.

Usage

MeasureTandD(Data, pData, NeighborhoodSize)

Arguments

Data

[1:n,1:d] points in input space with d attributes

pData

[1:n,1:2] projected points in output space, with index,x,y or index,line,column

NeighborhoodSize

Integer - sets the maximum number of neighbors to calculate trustworthiness and continuity for.

Value

Numeric matrix [1:NeighborhoodSize,1:2] containing the trustworthiness values in the first column and the discontinuity values in the second column.
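
For a single neighborhood size k, the rank-based definitions of Venna and Kaski can be sketched in base R as follows. The sketch returns trustworthiness and continuity. How MeasureTandD() derives its discontinuity column from these quantities is not confirmed here, and the helper name trust_cont_sketch and the toy data are assumptions. The normalization assumes k < n/2.

```r
# Hypothetical helper for illustration; not the package's MeasureTandD().
trust_cont_sketch <- function(Data, pData, k) {
  n <- nrow(Data)
  # rank_matrix()[i, j] is the rank of point j among the neighbors of point i
  # (1 = nearest); the point itself gets rank 0.
  rank_matrix <- function(X) {
    d <- as.matrix(dist(X))
    diag(d) <- -1
    t(apply(d, 1, function(row) rank(row, ties.method = "first") - 1))
  }
  r_in  <- rank_matrix(Data)
  r_out <- rank_matrix(pData)
  in_nb  <- r_in  >= 1 & r_in  <= k    # k-neighborhoods in the input space
  out_nb <- r_out >= 1 & r_out <= k    # k-neighborhoods in the output space
  norm <- 2 / (n * k * (2 * n - 3 * k - 1))
  # Penalize points that enter (trustworthiness) or leave (continuity) a neighborhood.
  c(Trustworthiness = 1 - norm * sum((r_in  - k)[out_nb & !in_nb]),
    Continuity      = 1 - norm * sum((r_out - k)[in_nb & !out_nb]))
}

# Toy check (assumed data): an unchanged "projection" preserves all neighborhoods.
set.seed(5)
td_pts <- matrix(rnorm(30), ncol = 2)
td_same <- trust_cont_sketch(td_pts, td_pts, k = 3)
td_same
```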

Author(s)

Julian Märte

References

Venna, J., & Kaski, S. (2005, September). Local multidimensional scaling with controlled tradeoff between trustworthiness and continuity. In Proceedings of 5th Workshop on Self-Organizing Maps (pp. 695-702).

Kaski, S., Nikkilä, J., Oja, M., Venna, J., Törönen, P., & Castrén, E. (2003). Trustworthiness and metrics in visualizing similarity of gene expression. BMC bioinformatics, 4(1), 1-13.

Examples


if(requireNamespace("FCPS")){
data(Hepta,package="FCPS")
projection=cmdscale(dist(Hepta$Data), k=2)
MeasureTandD(Hepta$Data,projection, 2)
}

Precision and Recall.

Description

Trade-off between missing similar points versus retrieving dissimilar points.

Usage

PrecisionAndRecall(Data, pData, NeighborhoodSize = 20)

Arguments

Data

[1:n,1:d] points in input space with d attributes

pData

[1:n,1:2] projected points in output space, with index,x,y or index,line,column

NeighborhoodSize

Sets the 'effective number of neighbors' used to control the width of the Gaussian. The NeRV paper (page 463) sets the default to 20.

Value

Numeric matrix [1:NeighborhoodSize, 1:2] containing the precision values in the first column and the recall values in the second column of the matrix.

Author(s)

Felix Pape

Examples


if(requireNamespace("FCPS")){
data(Hepta,package="FCPS")
projection=cmdscale(dist(Hepta$Data), k=2)
PrecisionAndRecall(Hepta$Data,projection)
}

Rescaled average agreement rate

Description

Rescaled average agreement rate deduced by the co-ranking matrix from LCMC.

Usage

RAAR(Data, ProjectedPoints, kmax = nrow(Data) - 2, PlotIt = TRUE)

Arguments

Data

Matrix containing n cases in rows, d variables in columns or a distance matrix which in this case has to be symmetric

ProjectedPoints

n by OutputDimension matrix containing coordinates of the Projection

kmax

Upper bound of the interval 1:kmax of k nearest neighbors

PlotIt

Optional: Should the output be plotted? Default: TRUE

Value

A list containing:

Raar

Rescaled average agreement rate

Aar

Average agreement rate
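
The two quantities can be sketched in base R for each neighborhood size k. Here the average agreement rate is taken as the mean overlap of the k-neighborhoods in input and output space divided by k, and the rescaled rate as ((n - 1) * Aar - k) / (n - 1 - k), following Lee et al. The helper name raar_sketch and the toy data are assumptions, and the shape of the values returned by RAAR() itself (for example, whether they are curves or summarized scalars) is not confirmed here.

```r
# Hypothetical helper for illustration; not the package's RAAR().
raar_sketch <- function(Data, ProjectedPoints, kmax = nrow(Data) - 2) {
  n <- nrow(Data)
  # Row i lists the other points ordered by increasing distance to point i.
  neighbor_order <- function(X) {
    d <- as.matrix(dist(X))
    diag(d) <- -1                       # the point itself sorts first and is dropped
    t(apply(d, 1, order))[, -1, drop = FALSE]
  }
  o_in  <- neighbor_order(Data)
  o_out <- neighbor_order(ProjectedPoints)
  ks <- seq_len(kmax)
  aar <- vapply(ks, function(k) {
    overlap <- vapply(seq_len(n), function(i) {
      length(intersect(o_in[i, seq_len(k)], o_out[i, seq_len(k)]))
    }, integer(1))
    mean(overlap) / k                   # average agreement of k-neighborhoods
  }, numeric(1))
  raar <- ((n - 1) * aar - ks) / (n - 1 - ks)  # rescaled so that random agreement maps to 0
  list(Raar = raar, Aar = aar)
}

# Toy check (assumed data): identical neighborhoods give a rate of 1 for every k.
set.seed(6)
ra_pts <- matrix(rnorm(24), ncol = 2)
ra_same <- raar_sketch(ra_pts, ra_pts)
ra_same$Raar
```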

Author(s)

Michael Thrun

References

Lee, J. A., Peluffo-Ordonez, D. H., & Verleysen, M. Multiscale stochastic neighbor embedding: Towards parameter-free dimensionality reduction. Paper presented at the Proceedings of 22nd European Symposium on Artificial Neural Networks, Computational Intelligence And Machine Learning (ESANN) (2014).

Examples


if(requireNamespace("FCPS")){
data(Hepta,package="FCPS")
projection=cmdscale(dist(Hepta$Data), k=2)
RAAR(Hepta$Data,projection,kmax=nrow(Hepta$Data)-2,PlotIt=TRUE)
}

Calculates the error of a projection with Spearman's rank correlation coefficient.

Description

Calculates the error of a projection with Spearman's rank correlation coefficient.

Arguments

VectorOfInputDists(1:n2)

dissimilarities in Input Space between the n data points in vector form as produced by squareform(Dists(1:n,1:n))

VectorOfOutputDists(1:n2)

dissimilarities in Output Space between the n data points in vector form as produced by squareform(Dists(1:n,1:n))

Value

rho rank correlation coefficient

Author(s)

Florian Lerch

Examples


if(requireNamespace("FCPS")){
data(Hepta,package="FCPS")
projection=cmdscale(dist(Hepta$Data), k=2)
SpearmanError(as.matrix(dist(Hepta$Data)),as.matrix(dist(projection)))
}

Calculates the error of a projection with Spearman's rank correlation coefficient

Description

Calculates the error of a projection with Spearman's rank correlation coefficient

Usage

SpearmansRho(InputDists, OutputDists)

Arguments

InputDists

[1:d,1:d] numeric matrix with input distances

OutputDists

[1:d,1:d] numeric matrix with output distances

Value

rho
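
Spearman's rho is the Pearson correlation of the ranks, which can be sketched in base R as follows. Comparing only the lower triangles of the two distance matrices is an assumption of this sketch, as are the helper name spearman_sketch and the toy data. How SpearmansRho() vectorizes its inputs is not confirmed here.

```r
# Hypothetical helper for illustration; not the package's SpearmansRho().
spearman_sketch <- function(InputDists, OutputDists) {
  lower <- lower.tri(InputDists)   # each pair of points once, diagonal excluded
  cor(rank(InputDists[lower]), rank(OutputDists[lower]))  # Pearson correlation of the ranks
}

# Toy data (assumed): "project" 3D points by dropping the third coordinate.
set.seed(4)
sp_pts <- matrix(rnorm(60), ncol = 3)
sp_input  <- as.matrix(dist(sp_pts))
sp_output <- as.matrix(dist(sp_pts[, 1:2]))
sp_rho <- spearman_sketch(sp_input, sp_output)

# Base R's built-in Spearman option gives the same value.
sp_builtin <- cor(sp_input[lower.tri(sp_input)],
                  sp_output[lower.tri(sp_output)], method = "spearman")
c(sketch = sp_rho, builtin = sp_builtin)
```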

Author(s)

Julian Märte

Examples


if(requireNamespace("FCPS")){
data(Hepta,package="FCPS")
projection=cmdscale(dist(Hepta$Data), k=2)
SpearmansRho(as.matrix(dist(Hepta$Data)),as.matrix(dist(projection)))
}

Topological Correlation

Description

Calculates the Topological Correlation

Usage

TopologicalCorrelation(Data,ProjectedPoints,type='norm',method,Kn=0)

Arguments

Data

[1:n, 1:d] a numeric matrix of the given n-dim. points: the rows represent the points and the columns represent the coordinates in the d-dim. space.

ProjectedPoints

[1:n, 1:2] numeric matrix of Projected Points, if missing, method should be set!

method

Determines whether the selected projection method for a given set of d-dim. points is a good choice. A result of 1 means that the selected projection method is good, and a result of 0 means that the visualization of the given Data in the two-dim. space does not fit the problem.

type

How the paths in the adjacency matrix should be weighted. 'norm' represents path lengths of 1 and euclidean represents the distance in the Euclidean metric.

Kn

k nearest neighbours in the graph. Only needed if method is isomap or LocallyLinearEmbedding

Value

TC value

Author(s)

Hermann Tafo, Laukert Schlichting 07/2015

Examples

#requires DatabionicSwarm v2.2.1


if(requireNamespace("FCPS")){
#data(Hepta,package="FCPS")
#projection=cmdscale(dist(Hepta$Data), k=2)
#TopologicalCorrelation(Hepta$Data,projection)
}

ZrehenMeasure4All

Description

A generalized version of the Zrehen-measure which defines the neighbourhood with a Gabriel Graph and is therefore not restricted to grid-based projections.

Usage

ZrehenMeasure4All(Data, Projection, width, height, isToroid = FALSE,
isGrid = TRUE, plotGabriel = FALSE)

Arguments

Data

[1:n,1:d] points in input space with d attributes

Projection

[1:n,1:2] projected points in output space, with index,x,y or index,line,column

width

Numeric: only necessary if toroid

height

Numeric: only necessary if toroid

isToroid

Boolean: are the points toroid?

isGrid

Boolean: is the grid a toroid?

plotGabriel

Boolean: plot the generated GabrielGraph (TRUE) or not (FALSE). Default: plotGabriel=FALSE.

Value

List with

V$zrehen

the raw zrehen measure

V$normedzrehen

the zrehen measure normed by the number of neighbours

V$neighbourcounter

the number of possible neighbours by which the zrehen measure is normed

Author(s)

Florian Lerch 07/2015

Examples


if(requireNamespace("FCPS")){
data(Hepta,package="FCPS")
projection=cmdscale(dist(Hepta$Data), k=2)
ZrehenMeasure4All(Hepta$Data,projection)$zrehen
}

Computes Rescaled Average Agreement Rate

Description

Rescaled average agreement rate deduced by the co-ranking matrix from LCMC for various sizes of the neighborhood.

Usage

plotMeasureRAAR(Raar, label = 'ProjectionMethod',
gPlotList = list(RAARplot = ggplot2::ggplot()), LineType="solid", Shape = 16,
PointsPerE = 10, fancy = FALSE)

Arguments

Raar

Output of RAAR() applied for a projection method.

label

Title of plot.

gPlotList

Settings for ggplot.

LineType

Character - graphic parameter: Line type of ggplot.

Shape

Integer: type of point

PointsPerE

Numeric graphic parameter: Distance between markers on plot line

fancy

Boolean graphic parameter: Some automatic settings for a more appealing plot.

Value

ggplot object

Author(s)

Michael Thrun

Computes rank-based smoothed precision and recall

Description

Compares the projection in pData with the original data in Data and calculates trustworthiness and continuity of the projection for neighborhood sizes ranging from 1 to the size of the neighborhood.

Usage

plotMeasureTandD(TDmatrix, label = 'ProjectionMethod',
gPlotList = list(TW = ggplot2::ggplot(), DC = ggplot2::ggplot()), LineType = "solid",
Shape = 16, PointsPerE = 16)

Arguments

TDmatrix

Output of MeasureTandD() applied for a projection method.

label

Title of plot.

gPlotList

Settings for ggplot.

LineType

Character - graphic parameter: Line type of ggplot.

Shape

Integer: type of point

PointsPerE

Numeric graphic parameter: Distance between markers on plot line

Value

ggplot object

Author(s)

Michael Thrun