eimpute: Efficiently IMPUTE Large Scale Incomplete Matrix


Introduction

Matrix completion is a procedure for imputing the missing elements in matrices by using the information of observed elements. This procedure can be visualized as:

Matrix completion has attracted a lot of attention and is widely applied in:

tabular data imputation: recover the missing elements in a data table;
recommender systems: estimate users’ potential preferences for items not yet purchased;
image inpainting: inpaint the missing elements in digital images.

eimpute is a computationally efficient R package for matrix completion. In eimpute, the matrix completion problem is solved by iteratively performing low-rank approximation and data calibration, an approach with two admirable advantages:

unbiased low-rank approximation for incomplete matrices;
lower time cost via truncated SVD.
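The iterative scheme described above can be sketched in a few lines of base R. This is a simplified illustration, not eimpute's implementation: the function name lowrank_impute_sketch(), the zero initialisation, the relative-change stopping rule, and the reading of "data calibration" as putting the observed values back after each low-rank step are all assumptions. It also calls the full svd() and keeps only the leading r triplets, whereas eimpute relies on a truncated SVD.

```r
# Hypothetical helper (not part of eimpute): alternate a rank-r SVD
# approximation with data calibration until the iterates stop changing.
lowrank_impute_sketch <- function(x, r, maxit = 100, tol = 1e-5) {
  obs <- !is.na(x)                    # TRUE where x is observed
  z <- x
  z[!obs] <- 0                        # assumed start: missing entries set to 0
  for (it in seq_len(maxit)) {
    # Low-rank approximation: keep the leading r singular triplets.
    s <- svd(z, nu = r, nv = r)
    z_low <- s$u %*% (s$d[seq_len(r)] * t(s$v))
    # Calibration (assumed form): restore the observed values.
    z_new <- z_low
    z_new[obs] <- x[obs]
    # Relative squared change between successive iterates.
    delta <- sum((z_new - z)^2) / max(sum(z^2), .Machine$double.eps)
    z <- z_new
    if (delta < tol) break
  }
  z
}

# Usage on a small rank-2 matrix with 8 of its 30 entries hidden.
set.seed(1)
x <- matrix(rnorm(6 * 2), 6, 2) %*% matrix(rnorm(2 * 5), 2, 5)
x_na <- x
x_na[sample(length(x), 8)] <- NA
x_hat <- lowrank_impute_sketch(x_na, r = 2)
```

Because the calibration step runs last in every iteration, the returned matrix always agrees with the input on the observed entries and has no NA left.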

We compare eimpute and softImpute on synthetic datasets \(X_{m \times m}\) with a proportion \(p\) of missing observations. The square matrix \(X_{m \times m}\) is generated by \(X = UV + \epsilon\), where \(U\) and \(V\) are \(m \times r\) and \(r \times m\) matrices whose entries are \(i.i.d.\) samples from the standard normal distribution, and \(\epsilon \sim N(0, r/3)\).

\(m\) is chosen as 1000, 2000, 3000, 4000
\(p\) is chosen as 0.1, 0.5, 0.9.
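The simulation design above can be reproduced in base R roughly as follows. This is a sketch under stated assumptions: simulate_incomplete() is a hypothetical name, \(N(0, r/3)\) is read as a normal distribution with variance \(r/3\) (so the standard deviation is \(\sqrt{r/3}\)), entries are removed uniformly at random, and the rank r = 5 and the small size m = 100 in the usage lines are illustrative choices, since the benchmark's value of r is not stated here.

```r
# Hypothetical generator for the benchmark design X = UV + eps with a
# proportion p of entries set to NA (missing completely at random, assumed).
simulate_incomplete <- function(m, r, p) {
  u <- matrix(rnorm(m * r), m, r)                   # m x r, i.i.d. N(0, 1)
  v <- matrix(rnorm(r * m), r, m)                   # r x m, i.i.d. N(0, 1)
  eps <- matrix(rnorm(m * m, sd = sqrt(r / 3)), m, m)  # assumed variance r/3
  x <- u %*% v + eps
  x_na <- x
  miss <- sample(m * m, size = round(p * m * m))    # distinct cells to hide
  x_na[miss] <- NA
  list(x = x, x_na = x_na)
}

set.seed(2024)
sim <- simulate_incomplete(m = 100, r = 5, p = 0.5)
mean(is.na(sim$x_na))   # → 0.5
```

Keeping the complete matrix sim$x alongside sim$x_na makes it possible to score any imputation on the hidden cells afterwards.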

In the high-dimensional case, the als method in softImpute is slightly faster than eimpute when the proportion of missing observations is low. As the proportion of missing observations increases, the rsvd method in eimpute outperforms softImpute in both time cost and test error. Between the two methods in eimpute, rsvd is faster than tsvd.
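The speed advantage of rsvd comes from randomization: rather than decomposing the whole matrix, a randomized SVD first finds a thin orthonormal basis for the dominant column space and then takes an exact SVD of a much smaller matrix. The base R sketch below follows the basic randomized range-finder scheme popularized by Halko, Martinsson and Tropp; it is not eimpute's rsvd code, and the name rsvd_sketch() and the defaults for oversampling (10) and power iterations (2) are assumptions.

```r
# Hypothetical randomized SVD sketch (not eimpute's internal code).
# Returns approximate leading k singular values/vectors of matrix a.
rsvd_sketch <- function(a, k, oversample = 10, n_power = 2) {
  l <- min(k + oversample, nrow(a), ncol(a))        # sketch width
  omega <- matrix(rnorm(ncol(a) * l), ncol(a), l)   # Gaussian test matrix
  q <- qr.Q(qr(a %*% omega))                        # basis for range of a
  for (i in seq_len(n_power)) {
    # Power iterations with re-orthonormalization sharpen the basis.
    q <- qr.Q(qr(crossprod(a, q)))                  # t(a) %*% q
    q <- qr.Q(qr(a %*% q))
  }
  b <- crossprod(q, a)                              # small l x n matrix
  s <- svd(b, nu = k, nv = k)                       # exact SVD of b
  list(d = s$d[seq_len(k)], u = q %*% s$u, v = s$v)
}

# Usage: a 200 x 100 matrix of exact rank 5.
set.seed(7)
a <- matrix(rnorm(200 * 5), 200, 5) %*% matrix(rnorm(5 * 100), 5, 100)
fit <- rsvd_sketch(a, k = 5)
exact <- svd(a, nu = 0, nv = 0)$d[1:5]
```

The expensive steps are products of the m × n matrix with thin matrices of l = k + oversample columns plus an SVD of an l × n matrix, so the cost scales with the target rank rather than with min(m, n). The price is an approximation whose accuracy depends on how fast the singular values decay, which the power iterations help with.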

Installation

Install the stable version from CRAN:

install.packages("eimpute")

Install the development version from GitHub:

library(devtools)
install_github("Mamba413/eimpute", build_vignettes = TRUE)

Quick Example

We start with a toy example. Let us generate a small matrix with some values missing via the incomplete.generator function.

library(eimpute)
m <- 6
n <- 5
r <- 3
x_na <- incomplete.generator(m, n, r)
x_na
#>            [,1]       [,2]       [,3]      [,4]       [,5]
#> [1,] -0.8269428  1.2228586         NA        NA         NA
#> [2,] -2.2410010  4.5095165         NA        NA         NA
#> [3,]  0.4499102         NA -0.2818085 0.7718102 -0.8364048
#> [4,]         NA  1.7167365  0.9480745        NA  3.5680208
#> [5,]         NA  0.7240437         NA        NA  0.2633712
#> [6,]         NA -2.8879249         NA 1.2027552         NA

Use the eimpute function to impute the missing values. The completed matrix is stored in the x.imp element of the returned object.

x_impute <- eimpute(x_na, r)
x_impute[["x.imp"]]
#>            [,1]       [,2]        [,3]      [,4]       [,5]
#> [1,] -0.8269428  1.2228586  0.19035820 0.9514541  0.2994880
#> [2,] -2.2410010  4.5095165  0.39560039 0.7295574  0.4911418
#> [3,]  0.4499102 -1.2083884 -0.28180850 0.7718102 -0.8364048
#> [4,] -0.3408353  1.7167365  0.94807452 0.1835412  3.5680208
#> [5,] -0.3669454  0.7240437  0.11988844 0.3294654  0.2633712
#> [6,]  1.3875965 -2.8879249  0.01871091 1.2027552  0.4512052
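In the printed output above, the observed entries of x_na reappear unchanged in x.imp and every NA has been filled in. The short check below verifies both properties programmatically. It reuses only the calls shown in this section; set.seed() is added so the run is reproducible, which means its numbers will differ from the printout, and treating "observed entries are kept" as a general property is an assumption based on this example rather than a documented guarantee.

```r
library(eimpute)

set.seed(123)                          # added for reproducibility
x_na <- incomplete.generator(6, 5, 3)
x_impute <- eimpute(x_na, 3)
x_imp <- x_impute[["x.imp"]]

obs <- !is.na(x_na)                    # positions that were observed
# 1. Every missing entry has been filled in.
anyNA(x_imp)
# 2. Observed entries are carried over (up to numerical tolerance).
isTRUE(all.equal(x_imp[obs], x_na[obs]))
```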
