The goal of `ucimlrepo` is to download and import data sets directly into R from the UCI Machine Learning Repository.
> [!IMPORTANT]
> This package is an unofficial port of the Python `ucimlrepo` package.
> [!NOTE]
> Want data sets that come with help documentation entries? Check out the `{ucidata}` R package! It provides a small selection of data sets from the UC Irvine Machine Learning Repository alongside help entries.
You can install the development version of `ucimlrepo` from GitHub with:

```r
# install.packages("remotes")
remotes::install_github("coatless-rpkg/ucimlrepo")
```
To use `ucimlrepo`, load the package using:

```r
library(ucimlrepo)
```
With the package now loaded, we can download a dataset using the `fetch_ucirepo()` function or use the `list_available_datasets()` function to view a list of available datasets.

For example, to download the `iris` dataset, we can use:
```r
# Fetch a dataset by name
iris_by_name <- fetch_ucirepo(name = "iris")
names(iris_by_name)
#> [1] "data"      "metadata"  "variables"
```
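Besides `data`, the returned object carries `metadata` and `variables` components. A minimal sketch for peeking at them with base R's `str()` follows; it relies only on the three top-level names printed above and makes no assumption about what each component contains. Like every example here, it needs an internet connection to reach the UCI ML Repository.

```r
library(ucimlrepo)

# Download iris from the UCI ML Repository (requires internet access)
iris_by_name <- fetch_ucirepo(name = "iris")

# Confirm the three top-level components printed by names() above
stopifnot(all(c("data", "metadata", "variables") %in% names(iris_by_name)))

# Show only the first level of each component to keep the output short
str(iris_by_name$metadata, max.level = 1)
str(iris_by_name$variables, max.level = 1)
```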
The returned object is nested several levels deep. For example, the original data frame containing the `iris` dataset sits under `data$original`:
```r
iris_uci <- iris_by_name$data$original
head(iris_uci)
#>   sepal length sepal width petal length petal width       class
#> 1          5.1         3.5          1.4         0.2 Iris-setosa
#> 2          4.9         3.0          1.4         0.2 Iris-setosa
#> 3          4.7         3.2          1.3         0.2 Iris-setosa
#> 4          4.6         3.1          1.5         0.2 Iris-setosa
#> 5          5.0         3.6          1.4         0.2 Iris-setosa
#> 6          5.4         3.9          1.7         0.4 Iris-setosa
```
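The column names in the output above contain spaces (for example `sepal length`), so a bare `$` accessor will not reach them. The sketch below uses `[[ ]]` instead (backticks with `$` also work) and tabulates the `class` column. It assumes the measurement columns are numeric, as the printed values suggest.

```r
library(ucimlrepo)

# Download iris and pull out the original data frame (requires internet access)
iris_by_name <- fetch_ucirepo(name = "iris")
iris_uci <- iris_by_name$data$original

# "sepal length" is not a syntactic R name, so index it with [[ ]]
summary(iris_uci[["sepal length"]])

# The equivalent backtick form with $
head(iris_uci$`sepal length`)

# Count the observations in each class
table(iris_uci$class)
```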
Alternatively, we could retrieve two data frames, one for the features and one for the targets:
```r
iris_features <- iris_by_name$data$features
iris_targets <- iris_by_name$data$targets
```
We can then view the first few rows of each data frame:
```r
head(iris_features)
#>   sepal length sepal width petal length petal width
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3.0          1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> 5          5.0         3.6          1.4         0.2
#> 6          5.4         3.9          1.7         0.4
```
```r
head(iris_targets)
#>         class
#> 1 Iris-setosa
#> 2 Iris-setosa
#> 3 Iris-setosa
#> 4 Iris-setosa
#> 5 Iris-setosa
#> 6 Iris-setosa
```
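If a modelling function expects a single data frame, the features and targets can be put back side by side with base R's `cbind()`. This minimal sketch assumes, as the outputs above suggest, that both pieces are data frames with rows in the same order.

```r
library(ucimlrepo)

# Download iris (requires internet access)
iris_by_name <- fetch_ucirepo(name = "iris")

# The two pieces returned by the package
iris_features <- iris_by_name$data$features
iris_targets  <- iris_by_name$data$targets

# Both pieces must describe the same rows before they can be joined
stopifnot(nrow(iris_features) == nrow(iris_targets))

# Bind the columns back together into a single data frame
iris_combined <- cbind(iris_features, iris_targets)
dim(iris_combined)
```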
Alternatively, you can query directly by an ID, found either with `list_available_datasets()` or by looking up the dataset on the UCI ML Repository website:
```r
# Fetch a dataset by id
iris_by_id <- fetch_ucirepo(id = 53)
```
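As a quick sanity check that ID 53 and the name `"iris"` point at the same data set, you can compare the two downloads with base R's `all.equal()`. This is only a sketch; it assumes both calls return the same original data frame and downloads the data twice.

```r
library(ucimlrepo)

# Fetch the same data set twice: once by name, once by its UCI ID
iris_by_name <- fetch_ucirepo(name = "iris")
iris_by_id   <- fetch_ucirepo(id = 53)

# TRUE when both calls returned the same original data frame
same_data <- isTRUE(all.equal(iris_by_name$data$original,
                              iris_by_id$data$original))
same_data
```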
We can also view a list of datasets available for download using the `list_available_datasets()` function:

```r
# List available datasets
list_available_datasets()
```
> [!NOTE]
> Not all 600+ datasets on the UCI ML Repository can be downloaded with this package. The current list of available datasets can be viewed here.
> If you would like a specific dataset added, please comment on the relevant issue in the upstream repository.