Title: | Process and Analyse Data from m-Path Sense |
Version: | 1.2.3 |
Description: | Overcomes one of the major challenges in mobile (passive) sensing, namely being able to pre-process the raw data that comes from a mobile sensing app, specifically 'm-Path Sense' https://m-path.io. The main task of 'mpathsenser' is therefore to read 'm-Path Sense' JSON files into a database and provide several convenience functions to aid in data processing. |
License: | GPL (≥ 3) |
URL: | https://gitlab.kuleuven.be/ppw-okpiv/researchers/u0134047/mpathsenser/, https://ppw-okpiv.pages.gitlab.kuleuven.be/researchers/u0134047/mpathsenser/ |
BugReports: | https://gitlab.kuleuven.be/ppw-okpiv/researchers/u0134047/mpathsenser/-/issues/ |
Depends: | R (≥ 4.0.0) |
Imports: | DBI, dbplyr, dplyr, furrr, jsonlite, lifecycle, lubridate, purrr, rlang, RSQLite, stats, tibble, tidyr |
Suggests: | cli, curl, ggplot2, httr, kableExtra, knitr, lintr, progressr, rmarkdown, rvest, sodium, spelling, testthat (≥ 3.0.0), vroom |
VignetteBuilder: | knitr |
Config/Needs/website: | tidyverse/tidytemplate |
Config/testthat/edition: | 3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.1 |
Language: | en-GB |
NeedsCompilation: | no |
Packaged: | 2024-03-29 15:47:37 UTC; u0134047 |
Author: | Koen Niemeijer |
Maintainer: | Koen Niemeijer <koen.niemeijer@kuleuven.be> |
Repository: | CRAN |
Date/Publication: | 2024-03-29 16:00:02 UTC |
mpathsenser: Process and Analyse Data from m-Path Sense
Description
Overcomes one of the major challenges in mobile (passive) sensing, namely being able to pre-process the raw data that comes from a mobile sensing app, specifically "m-Path Sense" https://m-path.io. The main task of 'mpathsenser' is therefore to read "m-Path Sense" JSON files into a database and provide several convenience functions to aid in data processing.
Author(s)
Maintainer: Koen Niemeijer koen.niemeijer@kuleuven.be (ORCID)
Other contributors:
Kristof Meers kristof.meers@kuleuven.be (ORCID) [contributor]
KU Leuven [copyright holder, funder]
See Also
Useful links:
- https://gitlab.kuleuven.be/ppw-okpiv/researchers/u0134047/mpathsenser/
- https://ppw-okpiv.pages.gitlab.kuleuven.be/researchers/u0134047/mpathsenser/
Report bugs at https://gitlab.kuleuven.be/ppw-okpiv/researchers/u0134047/mpathsenser/-/issues/
Get a summary of physical activity (recognition)
Description
Usage
activity_duration(
data = NULL,
db = NULL,
participant_id = NULL,
confidence = 70,
direction = "forward",
start_date = NULL,
end_date = NULL,
by = c("Total", "Day", "Hour")
)
Arguments
data |
A data frame containing the activity data. See get_data for retrieving activity data from an mpathsenser database. |
db |
A database connection to an m-Path Sense database. |
participant_id |
A character string identifying a single participant. Use get_participants to retrieve all participants from the database. Leave empty to get data for all participants. |
confidence |
The minimum confidence (0-100) that should be assigned to an observation by Activity Recognition. |
direction |
The directionality of the duration calculation: "forward" (the default) or "backward". |
start_date |
Optional search window specifying date where to begin search. Must be convertible to date using as.Date. Use first_date to find the date of the first entry for a participant. |
end_date |
Optional search window specifying date where to end search. Must be convertible to date using as.Date. Use last_date to find the date of the last entry for a participant. |
by |
Either 'Total', 'Hour', or 'Day' indicating how to summarise the results. |
Value
A tibble containing a column 'activity' and a column 'duration' with the activity duration, summarised according to by ('Total', 'Day', or 'Hour').
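Examples
## Not run:
# A minimal usage sketch, assuming an open database connection `db` and a
# participant with ID "12345":
activity_duration(
  db = db,
  participant_id = "12345",
  confidence = 70,
  by = "Hour"
)
## End(Not run)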
Add gap periods to sensor data
Description
Since there may be many gaps in mobile sensing data, it is pivotal to pay attention to them in the analysis. This function adds known gaps to data as "measurements", thereby allowing easier calculations for, for example, finding the duration. For instance, suppose a participant spent 30 minutes walking. If it is known that there is a gap of 15 minutes in this interval, we should somehow account for it. add_gaps accounts for this by adding the gap data to the sensor data, splitting intervals where gaps occur.
Usage
add_gaps(data, gaps, by = NULL, continue = FALSE, fill = NULL)
Arguments
data |
A data frame containing the data. See get_data for retrieving data from an mpathsenser database. |
gaps |
A data frame (extension) containing the gap data. See identify_gaps for finding gaps in the data. |
by |
A character vector indicating the variable(s) to match by, typically the participant IDs. If NULL, the default, a natural join is performed, using all variables in common across data and gaps. |
continue |
Whether to continue the measurement(s) prior to the gap once the gap ends. |
fill |
A named list of the columns to fill with default values for the extra measurements that are added because of the gaps. |
Details
In the example of 30 minutes walking where a 15-minute gap occurred (say, after 5 minutes), add_gaps() adds two rows: one 5 minutes after the start of the interval indicating the start of the gap (if needed, containing values from fill), and one 20 minutes after the start of the interval signalling the continuation of the walking activity. Then, when calculating time differences between subsequent measurements, the gap period is appropriately accounted for. Note that if multiple measurements occurred before the gap, they will all be continued after the gap.
Value
A tibble containing the data and the added gaps.
Warning
Depending on the sensor that is used to identify the gaps (though this is typically the highest-frequency sensor, such as the accelerometer or gyroscope), there may be a small delay between the detected start of the gap and the actual start of the gap. For example, if the accelerometer samples every 5 seconds, the app may have been killed 4.99 seconds after the last accelerometer measurement (so just before the next measurement). Within that time, other measurements may still have taken place, thereby technically occurring "within" the gap. This is especially important if you want to use these gaps in add_gaps, since this issue may lead to erroneous results.
An easy way to solve this problem is by taking into account all the sensors of interest when identifying the gaps, thereby ensuring there are no measurements of these sensors within the gap. Another way to account for this is to search for gaps 5 seconds longer than you want and then afterwards increase the start time of the gaps by 5 seconds, as shown in the sketch below.
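For instance, a minimal sketch of this workaround (assuming an open database connection db, a participant "12345", an accelerometer sampling interval of 5 seconds, and an interest in gaps of at least 60 seconds):
gaps <- identify_gaps(db, "12345", min_gap = 65)
# Convert the character timestamps to POSIXct and shift the start of each gap
# forward by the 5-second sampling interval
gaps$from <- as.POSIXct(gaps$from, tz = "UTC") + 5
gaps$to <- as.POSIXct(gaps$to, tz = "UTC")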
See Also
identify_gaps() for finding gaps in the sampling; link_gaps() for linking gaps to ESM data, analogous to link().
Examples
# Define some data
dat <- data.frame(
participant_id = "12345",
time = as.POSIXct(c("2022-05-10 10:00:00", "2022-05-10 10:30:00", "2022-05-10 11:30:00")),
type = c("WALKING", "STILL", "RUNNING"),
confidence = c(80, 100, 20)
)
# Get the gaps from identify_gaps, but in this example define them ourselves
gaps <- data.frame(
participant_id = "12345",
from = as.POSIXct(c("2022-05-10 10:05:00", "2022-05-10 10:50:00")),
to = as.POSIXct(c("2022-05-10 10:20:00", "2022-05-10 11:10:00"))
)
# Now add the gaps to the data
add_gaps(
data = dat,
gaps = gaps,
by = "participant_id"
)
# You can use fill if you want to get rid of those pesky NA's
add_gaps(
data = dat,
gaps = gaps,
by = "participant_id",
fill = list(type = "GAP", confidence = 100)
)
Alias column names of a data frame containing sensor data
Description
This function is used to remap the column names of a data frame containing sensor data to the column names used in the database schema of mpathsenser. As the column names have changed several times over the years, these functions make it easy to add new aliases for the existing columns. The function is generic and dispatches to the specific method for the class of data.
Usage
alias_column_names(data, ...)
Arguments
data |
A data frame for which the column names are remapped according to the specific method for this data frame. As such, data must be of a class for which a method is implemented. |
... |
These dots are for future extensions and must be empty. |
Details
Aliasing of the column names depends on the specific method, but column names are only changed if they have an alias available. Otherwise, they are kept unchanged.
Value
A data frame with the column names remapped according to the specific method for this data frame.
Examples
x <- data.frame(
id = 1:3,
timestamp = as.POSIXct(c("2021-01-01 00:00:00", "2021-01-01 00:00:01", "2021-01-01 00:00:02")),
xm = c(1, 2, 3),
ym = c(4, 5, 6),
zm = c(7, 8, 9)
)
class(x) <- c("accelerometer", class(x))
mpathsenser:::alias_column_names(x)
Find the category of an app on the Google Play Store
Description
This function scrapes the Google Play Store using name as the search term. From there, it selects the first result in the list and extracts its corresponding category and package name.
Usage
app_category(name, num = 1, rate_limit = 5, exact = TRUE)
Arguments
name |
The name of the app to search for. |
num |
Which result should be selected in the list of search results. Defaults to one. |
rate_limit |
The time interval to keep between queries, in seconds. If the rate limit is too low, the Google Play Store may reject further requests or even ban you entirely. |
exact |
In m-Path Sense, the app names of the AppUsage sensor are the last part of the app's package names. When exact is TRUE, the match is made on the package name rather than the app's display name. |
Value
A list containing the following fields:
package | the package name that was selected from the Google Play search |
genre | the corresponding genre of this package |
Warning
Do not abuse this function or you will be banned by the Google Play Store. The minimum delay between requests seems to be around 5 seconds, but this is untested. Also make sure not to do batch lookups, as many subsequent requests will get you blocked as well.
Examples
app_category("whatsapp")
# Example of a generic app name where we can't find a specific app
app_category("weather") # Weather forecast channel
# Get OnePlus weather
app_category("net.oneplus.weather")
Get app usage per hour
Description
This function extracts app usage per hour for either one or multiple participants. If multiple days are selected, the app usage time is averaged.
Usage
app_usage(
db,
participant_id = NULL,
start_date = NULL,
end_date = NULL,
by = c("Total", "Day", "Hour")
)
Arguments
db |
A database connection to an m-Path Sense database. |
participant_id |
A character string identifying a single participant. Use get_participants to retrieve all participants from the database. Leave empty to get data for all participants. |
start_date |
Optional search window specifying date where to begin search. Must be convertible to date using as.Date. Use first_date to find the date of the first entry for a participant. |
end_date |
Optional search window specifying date where to end search. Must be convertible to date using as.Date. Use last_date to find the date of the last entry for a participant. |
by |
Either 'Total', 'Hour', or 'Day' indicating how to summarise the results. |
Value
A data frame containing a column 'app' and a column 'usage' for the hourly app usage.
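Examples
## Not run:
# A minimal usage sketch, assuming an open database connection `db` and a
# participant with ID "12345":
app_usage(
  db = db,
  participant_id = "12345",
  by = "Hour"
)
## End(Not run)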
Create bins in variable time series
Description
In time series with variable measurements, an often recurring task is calculating the total time spent (i.e. the duration) in fixed bins, for example per hour or day. However, this may be difficult when two subsequent measurements are in different bins or span over multiple bins.
Usage
bin_data(
data,
start_time,
end_time,
by = c("sec", "min", "hour", "day"),
fixed = TRUE
)
Arguments
data |
A data frame or tibble containing the time series. |
start_time |
The column name of the start time of the interval, a POSIXt. |
end_time |
The column name of the end time of the interval, a POSIXt. |
by |
A binning specification: either "sec", "min", "hour", or "day", or an integer giving the bin size in seconds (only when fixed = FALSE). |
fixed |
Whether to create fixed bins. If TRUE, bins are aligned to the clock (e.g. whole hours); if FALSE, bins start at the earliest time in the group. |
Value
A tibble containing the group columns (if any), date, hour (if by = "hour"), and the duration in seconds.
See Also
link_gaps()
for linking gaps to data.
Examples
library(dplyr)
data <- tibble(
participant_id = 1,
datetime = c(
"2022-06-21 15:00:00", "2022-06-21 15:55:00",
"2022-06-21 17:05:00", "2022-06-21 17:10:00"
),
confidence = 100,
type = "WALKING"
)
# get bins per hour, even if the interval is longer than one hour
data |>
mutate(datetime = as.POSIXct(datetime)) |>
mutate(lead = lead(datetime)) |>
bin_data(
start_time = datetime,
end_time = lead,
by = "hour"
)
# Alternatively, you can give an integer value to by to create custom-sized
# bins, but only if fixed = FALSE. Note that these bins are not rounded to,
# as in this example, 30 minutes, but rather depend on the earliest time
# in the group.
data |>
mutate(datetime = as.POSIXct(datetime)) |>
mutate(lead = lead(datetime)) |>
bin_data(
start_time = datetime,
end_time = lead,
by = 1800L,
fixed = FALSE
)
# More complicated data for showcasing grouping:
data <- tibble(
participant_id = 1,
datetime = c(
"2022-06-21 15:00:00", "2022-06-21 15:55:00",
"2022-06-21 17:05:00", "2022-06-21 17:10:00"
),
confidence = 100,
type = c("STILL", "WALKING", "STILL", "WALKING")
)
# bin_data also takes into account the prior grouping structure
out <- data |>
mutate(datetime = as.POSIXct(datetime)) |>
group_by(participant_id) |>
mutate(lead = lead(datetime)) |>
group_by(participant_id, type) |>
bin_data(
start_time = datetime,
end_time = lead,
by = "hour"
)
print(out)
# To get the duration for each bin (adjust the variable names in sum() to your data):
purrr::map_dbl(
out$bin_data,
~ sum(as.double(.x$lead) - as.double(.x$datetime),
na.rm = TRUE
)
)
# Or:
out |>
tidyr::unnest(bin_data, keep_empty = TRUE) |>
mutate(duration = .data$lead - .data$datetime) |>
group_by(bin, .add = TRUE) |>
summarise(duration = sum(.data$duration, na.rm = TRUE), .groups = "drop")
Copy mpathsenser zip files to a new location
Description
Copy zip files from a source location to a target location where they do not yet exist. That is, it only updates the target folder from the source folder.
Usage
ccopy(from, to, recursive = TRUE)
Arguments
from |
A path to copy files from. |
to |
A path to copy files to. |
recursive |
Should files from subdirectories be copied? |
Value
A message indicating how many files were copied.
Examples
## Not run:
ccopy("K:/data/myproject/", "~/myproject")
## End(Not run)
Close a database connection
Description
This is a convenience function that is simply a wrapper around DBI::dbDisconnect()
.
Usage
close_db(db)
Arguments
db |
A database connection to an m-Path Sense database. |
Value
Returns invisibly regardless of whether the database is active, valid, or even exists.
See Also
open_db()
for opening an mpathsenser database.
Examples
# First create a database in a temporary directory
db <- create_db(tempdir(), "mydb.db")
# Then close it
close_db(db)
# You can even try to close a database that is already closed. This will not trigger an error.
close_db(db)
# Cleanup
file.remove(file.path(tempdir(), "mydb.db"))
Copy (a subset of) a database to another database
Description
Usage
copy_db(source_db, target_db, sensor = "All")
Arguments
source_db |
An mpathsenser database connection from which the data will be transferred. |
target_db |
An mpathsenser database connection to which the data will be transferred. |
sensor |
A character vector containing one or multiple sensors. See sensors for a list of available sensors, or use "All" for all sensors. |
Value
Returns TRUE invisibly, called for side effects.
Examples
# First create two databases in a temporary directory
db1 <- create_db(tempdir(), "mydb1.db")
db2 <- create_db(tempdir(), "mydb2.db")
# Populate the first database with some data
DBI::dbExecute(db1, "INSERT INTO Study VALUES ('study_1', 'default')")
DBI::dbExecute(db1, "INSERT INTO Participant VALUES ('1', 'study_1')")
DBI::dbExecute(db1, "INSERT INTO Activity VALUES(
'123', '1', '2024-01-01', '08:00:00', '100', 'WALKING')")
# Then copy the first database to the second database
copy_db(db1, db2)
# Check that the second database has the same data as the first database
get_data(db2, "Activity")
# Cleanup
close_db(db1)
close_db(db2)
file.remove(file.path(tempdir(), "mydb1.db"))
file.remove(file.path(tempdir(), "mydb2.db"))
Create a coverage chart of the sampling rate
Description
Only applicable to non-reactive sensors with 'continuous' sampling.
Usage
coverage(
db,
participant_id,
sensor = NULL,
frequency = mpathsenser::freq,
relative = TRUE,
offset = "None",
start_date = NULL,
end_date = NULL,
plot = deprecated()
)
Arguments
db |
A valid database connection. The schema must be that created by open_db. |
participant_id |
A character string of one participant ID. |
sensor |
A character vector containing one or multiple sensors. See sensors for a list of available sensors. Use NULL for all available sensors. |
frequency |
A named numeric vector with sensors as names and the number of expected samples per hour |
relative |
Logical value: show the number of measurements relative to the expected number (TRUE) or the absolute number (FALSE)? |
offset |
Currently not used. |
start_date |
A date (or convertible to a date using as.Date) indicating the start of the search window. |
end_date |
A date (or convertible to a date using as.Date) indicating the end of the search window. |
plot |
Deprecated: whether to plot the results. Call plot() on the output of coverage() instead. |
Value
A ggplot of the coverage results if plot is TRUE, or a tibble containing the hour, type of measure (i.e. sensor), and (relative) coverage.
Examples
## Not run:
freq <- c(
Accelerometer = 720, # Once per 5 seconds. Can have multiple measurements.
AirQuality = 1,
AppUsage = 2, # Once every 30 minutes
Bluetooth = 60, # Once per minute. Can have multiple measurements.
Gyroscope = 720, # Once per 5 seconds. Can have multiple measurements.
Light = 360, # Once per 10 seconds
Location = 60, # Once per 60 seconds
Memory = 60, # Once per minute
Noise = 120,
Pedometer = 1,
Weather = 1,
Wifi = 60 # once per minute
)
coverage(
db = db,
participant_id = "12345",
sensor = c("Accelerometer", "Gyroscope"),
frequency = mpathsenser::freq,
start_date = "2021-01-01",
end_date = "2021-05-01"
)
## End(Not run)
Create a new mpathsenser database
Description
Usage
create_db(path = getwd(), db_name = "sense.db", overwrite = FALSE)
Arguments
path |
The path to the database. |
db_name |
The name of the database. |
overwrite |
In case a database with db_name already exists at path, should it be overwritten? |
Value
A database connection using prepared database schemas.
Examples
# Create a new database in a temporary directory
db <- create_db(tempdir(), "mydb.db")
# You can also create an in-memory database
db2 <- create_db(path = NULL, ":memory:")
# Cleanup
close_db(db)
close_db(db2)
file.remove(file.path(tempdir(), "mydb.db"))
Decrypt GPS data from a curve25519 public key
Description
By default, the latitude and longitude of the GPS data collected by m-Path Sense are encrypted using an asymmetric curve25519 key to provide extra protection for these highly sensitive data. This function takes a character vector (e.g. the longitude or latitude column) and decrypts it using the provided key.
Usage
decrypt_gps(data, key, ignore = ":")
Arguments
data |
A character vector containing hexadecimal (i.e. encrypted) data. |
key |
A curve25519 private key. |
ignore |
A string with characters to ignore from data when converting from hexadecimal. |
Value
A vector of doubles of the decrypted GPS coordinates.
Parallel
This function supports parallel processing in the sense that it is able to distribute its computation load among multiple workers. To make use of this functionality, run future::plan("multisession") before calling this function.
Examples
library(dplyr)
library(sodium)
# Create some GPS coordinates.
data <- data.frame(
participant_id = "12345",
time = as.POSIXct(c(
"2022-12-02 12:00:00",
"2022-12-02 12:00:01",
"2022-12-02 12:00:02"
)),
longitude = c("50.12345", "50.23456", "50.34567"),
latitude = c("4.12345", "4.23456", "4.345678")
)
# Generate keypair
key <- sodium::keygen()
pub <- sodium::pubkey(key)
# Encrypt coordinates with pubkey
# You do not need to do this for m-Path Sense
# as this is already encrypted
encrypt <- function(data, pub) {
data <- lapply(data, charToRaw)
data <- lapply(data, function(x) sodium::simple_encrypt(x, pub))
data <- lapply(data, sodium::bin2hex)
data <- unlist(data)
data
}
data$longitude <- encrypt(data$longitude, pub)
data$latitude <- encrypt(data$latitude, pub)
# Once the data has been collected, decrypt it using decrypt_gps().
data |>
mutate(longitude = decrypt_gps(longitude, key)) |>
mutate(latitude = decrypt_gps(latitude, key))
Get the device info for one or more participants
Description
Usage
device_info(db, participant_id = NULL)
Arguments
db |
A database connection to an m-Path Sense database. |
participant_id |
A character string identifying a single participant. Use
|
Value
A tibble containing device info for each participant.
Examples
## Not run:
# Open the database
db <- open_db("path/to/db")
# Get device info for all participants
device_info(db)
# Get device info for a specific participant
device_info(db, participant_id = 1)
## End(Not run)
Extract the date of the first entry
Description
A helper function for extracting the date of the first entry (of one or all participants) of one sensor. Note that this function is specific to the first date of a sensor. After all, it would not make sense to extract the first date for a participant from the accelerometer data, while the first device measurement occurred a day later.
Usage
first_date(db, sensor, participant_id = NULL)
Arguments
db |
A database connection to an m-Path Sense database. |
sensor |
The name of a sensor. See sensors for a list of available sensors. |
participant_id |
A character string identifying a single participant. Use get_participants to retrieve all participants from the database. Leave empty to get data for all participants. |
Value
A string in the format 'YYYY-mm-dd' of the first entry date.
Examples
## Not run:
db <- open_db()
first_date(db, "Accelerometer", "12345")
## End(Not run)
Fix the end of JSON files
Description
When copying data directly coming from m-Path Sense, JSON files are sometimes corrupted due to the app not properly closing them. This function attempts to fix the most common problems associated with improper file closure by m-Path Sense.
Usage
fix_jsons(path = getwd(), files = NULL, recursive = TRUE)
Arguments
path |
The path name of the JSON files. |
files |
Alternatively, a character vector of the input files. |
recursive |
Should the listing recurse into directories? |
Details
There are two distinct problems this function tries to tackle. First of all, there are often bad file endings (e.g. a missing ]) because the app was closed before it could properly close the file. There are several cases that may be wrong (possibly several at once), so it is unclear what the precise problems are. As this function is experimental, it may even make things worse by accidentally inserting an incorrect file ending.
Secondly, in rare scenarios there are illegal ASCII characters in the JSON files. This does not happen often, and it is likely caused by an OS failure (such as a flush error), a disk failure, or corrupted data during transmission. Nevertheless, these illegal characters make the file completely unreadable. Fortunately, they are detected correctly by test_jsons, but they cannot be imported by import. This function attempts to surgically remove lines with illegal characters, removing that specific line as well as the next line, as this is often a comma. It may therefore be too liberal in its approach, cutting away more data than necessary, or not liberal enough when the corruption has spread throughout multiple lines. Nevertheless, it is a first step in removing some straightforward corruption from files so that only a small number may still need to be fixed by hand.
Value
A message indicating how many files were fixed, and the number of fixed files invisibly.
Parallel
This function supports parallel processing in the sense that it is able to distribute its computation load among multiple workers. To make use of this functionality, run future::plan("multisession") before calling this function.
Progress
You can be updated of the progress of this function by using the progressr package. See progressr's vignette on how to subscribe to these updates.
Examples
## Not run:
future::plan("multisession")
files <- test_jsons()
fix_jsons(files = files)
## End(Not run)
Measurement frequencies per sensor
Description
A numeric vector containing example measurement frequencies per sensor. Such input is needed for coverage().
Usage
freq
Format
An object of class numeric of length 11.
Value
This vector contains the following information:
Sensor | Frequency (per hour) | Full text |
Accelerometer | 720 | Once per 5 seconds. Can have multiple instances. |
AirQuality | 1 | Once per hour. |
AppUsage | 2 | Once every 30 minutes. Can have multiple instances. |
Bluetooth | 12 | Once every 5 minutes. Can have multiple instances. |
Gyroscope | 720 | Once per 5 seconds. Can have multiple instances. |
Light | 360 | Once per 10 seconds. |
Location | 60 | Once every 60 seconds. |
Memory | 60 | Once per minute |
Noise | 120 | Once every 30 seconds. Microphone cannot be used in the background in Android 11. |
Weather | 1 | Once per hour. |
Wifi | 60 | Once per minute. |
Examples
freq
Reverse geocoding with latitude and longitude
Description
This function allows you to extract information about a place based on the latitude and longitude, using the OpenStreetMap Nominatim API.
Usage
geocode_rev(lat, lon, zoom = 18, email = "", rate_limit = 1, format = "jsonv2")
Arguments
lat |
The latitude of the location (in degrees) |
lon |
The longitude of the location (in degrees) |
zoom |
The desired zoom level, from 1 to 18. The highest level, 18, is building level. |
email |
If you are making large numbers of request please include an appropriate email address to identify your requests. See Nominatim's Usage Policy for more details. |
rate_limit |
The time interval to keep between queries, in seconds. If the rate limit is too low, OpenStreetMap may reject further requests or even ban you entirely. |
format |
The format of the response: either "jsonv2", "geojson", or "geocodejson". See Nominatim's documentation for more details. |
Value
A list of information about the location. See Nominatim's documentation for more details. The response may also be an error message in case of API errors, or NA if the client or API is offline.
Warning
Do not abuse this function or you will be banned by OpenStreetMap. The maximum number of requests is around 1 per second. Also make sure not to do too many batch lookups, as many subsequent requests will get you blocked as well.
Examples
# Frankfurt Airport
geocode_rev(50.037936, 8.5599631)
Extract data from an m-Path Sense database
Description
This is a convenience function to help extract data from an m-Path Sense database.
Usage
get_data(db, sensor, participant_id = NULL, start_date = NULL, end_date = NULL)
Arguments
db |
A database connection to an m-Path Sense database. |
sensor |
The name of a sensor. See sensors for a list of available sensors. |
participant_id |
A character string identifying a single participant. Use get_participants to retrieve all participants from the database. Leave empty to get data for all participants. |
start_date |
Optional search window specifying date where to begin search. Must be convertible to date using as.Date. Use first_date to find the date of the first entry for a participant. |
end_date |
Optional search window specifying date where to end search. Must be convertible to date using as.Date. Use last_date to find the date of the last entry for a participant. |
Details
Note that this function returns a lazy (also called remote) tibble. This means that the data is not actually in R until you call a function that pulls the data from the database. This is useful for various functions in this package that work with a lazy tibble, for example identify_gaps(). You may want to manually modify this lazy tibble using dplyr functions such as dplyr::filter() or dplyr::mutate() before pulling the data into R. These functions will be executed in-database and will therefore be much faster than first pulling all data into R and then possibly removing a large part of it. Importantly, data can be pulled into R using dplyr::collect().
Value
A lazy tbl containing the requested data.
Examples
## Not run:
# Open a database
db <- open_db()
# Retrieve some data
get_data(db, "Accelerometer", "12345")
# Or within a specific window
get_data(db, "Accelerometer", "12345", "2021-01-01", "2021-01-05")
## End(Not run)
Get the number of rows per sensor in a mpathsenser database
Description
Usage
get_nrows(
db,
sensor = "All",
participant_id = NULL,
start_date = NULL,
end_date = NULL
)
Arguments
db |
A database connection, as created by create_db(). |
sensor |
A character vector of one or multiple sensors. Use "All" for all available sensors. |
participant_id |
A character string identifying a single participant. Use get_participants to retrieve all participants from the database. Leave empty to get data for all participants. |
start_date |
Optional search window specifying date where to begin search. Must be convertible to date using as.Date. |
end_date |
Optional search window specifying date where to end search. Must be convertible to date using as.Date. |
Value
A named vector containing the number of rows for each sensor.
Examples
## Not run:
# Open a database connection
db <- open_db("path/to/db")
# Get the number of rows for all sensors
get_nrows(db, sensor = NULL)
# Get the number of rows for the Accelerometer and Gyroscope sensors
get_nrows(db, c("Accelerometer", "Gyroscope"))
# Remember to close the connection
close_db(db)
## End(Not run)
Get all participants
Description
Usage
get_participants(db, lazy = FALSE)
Arguments
db |
A database connection, as created by create_db(). |
lazy |
Whether to evaluate lazily using dbplyr. |
Value
A data frame containing all participant_id and study_id values.
Examples
# Create a database
db <- create_db(tempdir(), "mydb.db")
# Add some participants
DBI::dbExecute(db, "INSERT INTO Study VALUES('study1', 'data_format1')")
DBI::dbExecute(db, "INSERT INTO Participant VALUES('participant1', 'study1')")
# Get the participants
get_participants(db)
# Cleanup
close_db(db)
file.remove(file.path(tempdir(), "mydb.db"))
Get all processed files from a database
Description
Usage
get_processed_files(db)
Arguments
db |
A database connection, as created by create_db(). |
Value
A data frame containing the file_name, participant_id, and study_id of the processed files.
Examples
# Create a database
db <- create_db(tempdir(), "mydb.db")
# Add some processed files
DBI::dbExecute(db, "INSERT INTO Study VALUES('study1', 'data_format1')")
DBI::dbExecute(db, "INSERT INTO Participant VALUES('participant1', 'study1')")
DBI::dbExecute(db, "INSERT INTO ProcessedFiles VALUES('file1', 'participant1', 'study1')")
# Get the processed files
get_processed_files(db)
# Cleanup
close_db(db)
file.remove(file.path(tempdir(), "mydb.db"))
Get all studies
Description
Usage
get_studies(db, lazy = FALSE)
Arguments
db |
A database connection, as created by create_db(). |
lazy |
Whether to evaluate lazily using dbplyr. |
Value
A data frame containing all studies.
Examples
# Create a database
db <- create_db(tempdir(), "mydb.db")
# Add some studies
DBI::dbExecute(db, "INSERT INTO Study VALUES('study1', 'data_format1')")
# Get the studies
get_studies(db)
# Cleanup
close_db(db)
file.remove(file.path(tempdir(), "mydb.db"))
Calculate the Great-Circle Distance between two points in kilometers
Description
Calculate the great-circle distance between two points using the haversine formula.
Usage
haversine(lat1, lon1, lat2, lon2, r = 6371)
Arguments
lat1 |
The latitude of point 1 in degrees. |
lon1 |
The longitude of point 1 in degrees. |
lat2 |
The latitude of point 2 in degrees. |
lon2 |
The longitude of point 2 in degrees. |
r |
The average radius of the Earth, in kilometers. Defaults to 6371. |
Value
A numeric value of the distance between point 1 and 2 in kilometers.
Examples
fra <- c(50.03333, 8.570556) # Frankfurt Airport
ord <- c(41.97861, -87.90472) # Chicago O'Hare International Airport
haversine(fra[1], fra[2], ord[1], ord[2]) # 6971.059 km
Identify gaps in mpathsenser mobile sensing data
Description
Oftentimes in mobile sensing, gaps appear in the data as a result of the participant accidentally closing the app or the operating system killing the app to save power. This can lead to issues later on during data analysis, when it becomes unclear whether there are no measurements because no events occurred or because the app quit in that period. For example, if no screen on/off events occur in a 6-hour period, it can either mean the participant did not turn on their phone in that period or that the app simply quit and potential events were missed. In the latter case, the 6-hour missing period has to be compensated for by either removing this interval altogether or by subtracting the gap from the interval itself (see examples).
Usage
identify_gaps(
db,
participant_id = NULL,
min_gap = 60,
sensor = "Accelerometer"
)
Arguments
db |
A database connection to an m-Path Sense database. |
participant_id |
A character string identifying a single participant. Use get_participants to retrieve all participants from the database. Leave empty to get data for all participants. |
min_gap |
The minimum time (in seconds) passed between two subsequent measurements for it to be considered a gap. |
sensor |
One or multiple sensors. See sensors for a list of available sensors. |
Details
While any sensor can be used for identifying gaps, it is best to choose a sensor with a very high, near-continuous sample rate, such as the accelerometer or gyroscope. This function then computes the time between two subsequent measurements and returns the periods in which this time was larger than min_gap.
Note that the from and to columns in the output are character vectors in UTC time.
Value
A tibble containing the time period of the gaps. The structure of this tibble is as follows:
participant_id | the participant_id of where the gap occurred |
from | the time of the last measurement before the gap |
to | the time of the first measurement after the gap |
gap | the time passed between from and to, in seconds |
Warning
Depending on the sensor that is used to identify the gaps (though this is typically the highest-frequency sensor, such as the accelerometer or gyroscope), there may be a small delay between the detected start of the gap and the actual start of the gap. For example, if the accelerometer samples every 5 seconds, the app may have been killed 4.99 seconds after the last accelerometer measurement (so just before the next measurement). Within that time, other measurements may still have taken place, thereby technically occurring "within" the gap. This is especially important if you want to use these gaps in add_gaps, since this issue may lead to erroneous results.
An easy way to solve this problem is by taking into account all the sensors of interest when identifying the gaps, thereby ensuring there are no measurements of these sensors within the gap. Another way to account for this is to search for gaps 5 seconds longer than you want and then afterwards increase the start time of the gaps by 5 seconds.
Examples
## Not run:
# Find the gaps for a participant and convert to datetime
gaps <- identify_gaps(db, "12345", min_gap = 60) |>
mutate(across(c(to, from), ymd_hms)) |>
mutate(across(c(to, from), with_tz, "Europe/Brussels"))
# Get some sensor data and calculate a statistic, e.g. the time spent walking
# You can also do this with larger intervals, e.g. the time spent walking per hour
walking_time <- get_data(db, "Activity", "12345") |>
collect() |>
mutate(datetime = ymd_hms(paste(date, time))) |>
mutate(datetime = with_tz(datetime, "Europe/Brussels")) |>
arrange(datetime) |>
mutate(prev_time = lag(datetime)) |>
mutate(duration = datetime - prev_time) |>
filter(type == "WALKING")
# Find out if a gap occurs in the time intervals
walking_time |>
rowwise() |>
mutate(gap = any(gaps$from >= prev_time & gaps$to <= datetime))
## End(Not run)
Import m-Path Sense files into a database
Description
Import JSON files from m-Path Sense into a structured database. This function is the bread and butter of this package, as it populates the database with data that most of the other functions in this package use. It is recommended to first run test_jsons() and, if necessary, fix_jsons() to repair JSON files with problematic syntax.
Usage
import(
path = getwd(),
db,
sensors = NULL,
batch_size = 24,
backend = "RSQLite",
recursive = TRUE
)
Arguments
path |
The path to the file directory. |
db |
Valid database connection, typically created by create_db(). |
sensors |
Select one or multiple sensors, as in sensors. The default, NULL, imports all sensors. |
batch_size |
The number of files that are to be processed in a single batch. |
backend |
Name of the database backend that is used. Currently, only RSQLite is supported. |
recursive |
Should the listing recurse into directories? |
Details
import allows you to specify which sensors to import (even though there may be more in the files) and also allows batching for a speedier writing process. If parallel processing is active, it is recommended that batch_size be a scalar multiple of the number of CPU cores the parallel cluster can use. If a single JSON file in the batch causes an error, the batch is terminated (but not the function) and it is up to the user to fix the file. This means that if batch_size is large, many files will not be processed. Set batch_size to 1 for sequential (one-by-one) file processing.
Currently, only SQLite is supported as a backend. Due to its concurrency restriction, parallel processing works for cleaning the raw data, but not for importing it into the database. This is because SQLite does not allow multiple processes to write to the same database at the same time. This is a limitation of SQLite and not of this package. However, while files are processed individually (and in parallel if specified), writing to the database happens for the entire batch specified by batch_size at once. This means that if a single file in the batch causes an error, the entire batch is skipped. This is to ensure that the database is not left in an inconsistent state.
Value
A message indicating how many files were imported. If all files were imported successfully, this function returns an empty string invisibly. Otherwise, the file names of the files that were not imported are returned visibly.
Parallel
This function supports parallel processing in the sense that it is able to distribute its computation load among multiple workers. To make use of this functionality, run future::plan("multisession") before calling this function.
Progress
You can be updated of the progress of this function by using the progressr package. See progressr's vignette on how to subscribe to these updates.
See Also
create_db() for creating a database for import() to use; close_db() for closing this database; index_db() to create indices on the database for faster future processing; and vacuum_db() to shrink the database to its minimal size.
Examples
## Not run:
path <- "some/path"
# Create a database
db <- create_db(path = path, db_name = "my_db")
# Import all JSON files in the current directory
import(path = path, db = db)
# Import all JSON files in the current directory, but do so sequentially
import(path = path, db = db, batch_size = 1)
# Import all JSON files in the current directory, but only the accelerometer data
import(path = path, db = db, sensors = "accelerometer")
# Import all JSON files in the current directory, but only the accelerometer and gyroscope data
import(path = path, db = db, sensors = c("accelerometer", "gyroscope"))
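# When processing in parallel (assuming future::plan("multisession") was set
# beforehand with 4 workers), batch_size is best a scalar multiple of the
# number of workers:
import(path = path, db = db, batch_size = 16)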
# Remember to close the database
close_db(db)
## End(Not run)
Create indexes for an mpathsenser database
Description
Create indexes for an mpathsenser database on participant_id, date, and a combination of these variables, for all tables in the database. This will speed up queries that use these variables in the WHERE clause.
Usage
index_db(db)
Arguments
db |
A database connection to an m-Path Sense database. |
Value
Returns TRUE invisibly, called for side effects.
Examples
## Not run:
# First create a database in a temporary directory
db <- create_db(tempdir(), "mydb.db")
# Import some files
import(path = "path/to/jsons", db = db)
# Then index it to speed up the database
index_db(db)
## End(Not run)
Get installed apps
Description
Extract installed apps for one or all participants. Contrary to other get_* functions in this package, start and end dates are not used, since installed apps are assumed to be fixed throughout the study.
Usage
installed_apps(db, participant_id = NULL)
Arguments
db |
A database connection to an mpathsenser database. |
participant_id |
A character string identifying a single participant. Use get_participants to retrieve all participants from the database. Leave empty to get data for all participants. |
Value
A tibble containing app names.
Examples
## Not run:
db <- open_db()
# Get installed apps for all participants
installed_apps(db)
# Get installed apps for a single participant
installed_apps(db, "12345")
## End(Not run)
Extract the date of the last entry
Description
A helper function for extracting the date of the last entry (of one or all participants) of one sensor. Note that this function is specific to the last date of a sensor. After all, it would not make sense to extract the last date for a participant from the device info, while the last accelerometer measurement occurred a day later.
Usage
last_date(db, sensor, participant_id = NULL)
Arguments
db |
A database connection to an m-Path Sense database. |
sensor |
The name of a sensor. See sensors for a list of available sensors. |
participant_id |
A character string identifying a single participant. Use get_participants to retrieve all participants from the database. Leave empty to get data for all participants. |
Value
A string in the format 'YYYY-mm-dd' of the last entry date.
Examples
## Not run:
db <- open_db()
last_date(db, "Accelerometer", "12345")
## End(Not run)
Link y to the time scale of x
Description
One of the key tasks in analysing mobile sensing data is being able to link it to other data. For example, when analysing physical activity data, it could be of interest to know how much time a participant spent exercising before or after an ESM beep to evaluate their stress level. link() allows you to map two data frames to each other that are on different time scales, based on a pre-specified offset before and/or after. This function assumes that both x and y have a column called time containing DateTimeClasses.
Usage
link(
x,
y,
by = NULL,
time,
end_time = NULL,
y_time,
offset_before = 0,
offset_after = 0,
add_before = FALSE,
add_after = FALSE,
name = "data",
split = by
)
Arguments
x , y |
A pair of data frames or data frame extensions (e.g. a tibble). Both x and y must have a time column. |
by |
A character vector indicating the variable(s) to match by, typically the participant IDs. If NULL, the default, a natural join is performed, using all variables in common across x and y. To join by different variables on x and y, use a named vector. For example, by = c("a" = "b") will match x$a to y$b. To join by multiple variables, use a vector with length > 1. For example, by = c("a", "b") will match x$a to y$a and x$b to y$b. To perform a cross-join (when x and y have no variables in common), use by = character(). |
time |
The name of the column containing the timestamps in x. |
end_time |
Optionally, the name of the column containing the end time in x. If specified, y is linked to the interval between time and end_time rather than to a single time point. |
y_time |
The name of the column containing the timestamps in y. |
offset_before |
The time before each measurement in x that denotes the start of the interval. Can be a lubridate period (e.g. minutes(30)), a character string (e.g. "30 minutes"), or a numeric value in seconds. |
offset_after |
The time after each measurement in x that denotes the end of the interval. Can be a lubridate period, a character string, or a numeric value in seconds. |
add_before |
Logical value. Do you want to add the last measurement before the start of each interval? |
add_after |
Logical value. Do you want to add the first measurement after the end of each interval? |
name |
The name of the column containing the nested data of y in the output. |
split |
An optional grouping variable to split the computation by. When working with large data sets, the computation can grow so large it no longer fits in your computer's working memory (after which it will probably fall back on the swap file, which is very slow). Splitting the computation trades some computational efficiency for a large decrease in RAM usage. This argument defaults to by. |
Details
y is matched to the time scale of x by means of time windows. These time windows are defined as the period between x - offset_before and x + offset_after. Note that either offset_before or offset_after can be 0, but not both. The "interval" of the measurements is therefore the associated time window for each measurement of x and the data of y that also falls within this period. For example, an offset_before of minutes(30) means to match all data of y that occurred within 30 minutes before each measurement in x. An offset_after of 900 (i.e. 15 minutes) means to match all data of y that occurred within 15 minutes after each measurement in x. When both offset_before and offset_after are specified, all data of y is matched in an interval of 30 minutes before and 15 minutes after each measurement of x, thus combining the two arguments.
The arguments add_before and add_after let you decide whether you want to add the last measurement before the interval and/or the first measurement after the interval, respectively. This could be useful when you want to know which type of event occurred right before or after the interval of the measurement. For example, at offset_before = "30 minutes", the data may indicate that a participant was running 20 minutes before a measurement in x. However, with just that information there is no way of knowing what the participant was doing in the first 10 minutes of the interval. The same principle applies to after the interval. When add_before is set to TRUE, the last measurement of y occurring before the interval of x is added to the output data as the first row, having the time of x - offset_before (i.e. the start of the interval). When add_after is set to TRUE, the first measurement of y occurring after the interval of x is added to the output data as the last row, having the time of x + offset_after (i.e. the end of the interval). This way, it is easier to calculate the difference to other measurements of y later (within the same interval). Additionally, an extra column (original_time) is added in the nested data column, which is the original time of the y measurement and NULL for every other observation. This may be useful to check that the added measurement is not too distant (in time) from the others. Note that multiple rows may be added if there were multiple measurements in y at exactly the same time. Also, if there already is a row with a timestamp exactly equal to the start of the interval (for add_before = TRUE) or to the end of the interval (for add_after = TRUE), no extra row is added.
Value
A tibble with the data of x and a new column data containing the matched data of y according to offset_before and offset_after.
Warning
Note that setting add_before and add_after each adds one row to each nested tibble of the data column. Thus, if you are only interested in the total count (e.g. the number of total screen changes), remember to set these arguments to FALSE, or make sure to exclude the added rows (those with an original_time). Simply subtracting 1 or 2 does not work, as not all measurements in x may have a measurement in y before or after (and thus no row is added).
Examples
# Define some data
x <- data.frame(
time = rep(seq.POSIXt(as.POSIXct("2021-11-14 13:00:00"), by = "1 hour", length.out = 3), 2),
participant_id = c(rep("12345", 3), rep("23456", 3)),
item_one = rep(c(40, 50, 60), 2)
)
# Define some data that we want to link to x
y <- data.frame(
time = rep(seq.POSIXt(as.POSIXct("2021-11-14 12:50:00"), by = "5 min", length.out = 30), 2),
participant_id = c(rep("12345", 30), rep("23456", 30)),
x = rep(1:30, 2)
)
# Now link y within 30 minutes before each row in x
# until the measurement itself:
link(
x = x,
y = y,
by = "participant_id",
time = time,
y_time = time,
offset_before = "30 minutes"
)
# We can also link y to a period both before and after
# each measurement in x.
# Also note that time, end_time and y_time accept both
# quoted names as well as character names.
link(
x = x,
y = y,
by = "participant_id",
time = "time",
y_time = "time",
offset_before = "15 minutes",
offset_after = "15 minutes"
)
# It can be important to also know the measurements
# just preceding the interval or just after the interval.
# This adds an extra column called 'original_time' in the
# nested data, containing the original time stamp. The
# actual timestamp is set to the start time of the interval.
link(
x = x,
y = y,
by = "participant_id",
time = time,
y_time = time,
offset_before = "15 minutes",
offset_after = "15 minutes",
add_before = TRUE,
add_after = TRUE
)
# If participant_id is not important to you
# (i.e. the measurements are interchangeable),
# you can ignore them by leaving by empty.
# However, in this case we'll receive a warning
# since x and y have no other columns in common
# (except time, of course). Thus, we can perform
# a cross-join:
link(
x = x,
y = y,
by = character(),
time = time,
y_time = time,
offset_before = "30 minutes"
)
# Alternatively, we can specify custom intervals.
# That is, we can create variable intervals
# without using fixed offsets.
x <- data.frame(
start_time = rep(
x = as.POSIXct(c(
"2021-11-14 12:40:00",
"2021-11-14 13:30:00",
"2021-11-14 15:00:00"
)),
times = 2
),
end_time = rep(
x = as.POSIXct(c(
"2021-11-14 13:20:00",
"2021-11-14 14:10:00",
"2021-11-14 15:30:00"
)),
times = 2
),
participant_id = c(rep("12345", 3), rep("23456", 3)),
item_one = rep(c(40, 50, 60), 2)
)
link(
x = x,
y = y,
by = "participant_id",
time = start_time,
end_time = end_time,
y_time = time,
add_before = TRUE,
add_after = TRUE
)
Link two sensors OR one sensor and an external data frame using an mpathsenser database
Description
This function is specific to mpathsenser databases. It is a wrapper around link() but extracts the data from the database for you. It is now soft deprecated, as I feel this function's use is limited in comparison to link().
Usage
link_db(
db,
sensor_one,
sensor_two = NULL,
external = NULL,
external_time = "time",
offset_before = 0,
offset_after = 0,
add_before = FALSE,
add_after = FALSE,
participant_id = NULL,
start_date = NULL,
end_date = NULL,
reverse = FALSE,
ignore_large = FALSE
)
Arguments
db |
A database connection to an m-Path Sense database. |
sensor_one |
The name of a primary sensor. See sensors for a list of available sensors. |
sensor_two |
The name of a secondary sensor. See sensors for a list of available sensors. Cannot be used together with external. |
external |
Optionally, specify an external data frame. Cannot be used at the same time as a second sensor. This data frame must have a column called time. |
external_time |
The name of the column containing the timestamps in external. |
offset_before |
The time before each measurement in sensor_one that denotes the start of the interval. |
offset_after |
The time after each measurement in sensor_one that denotes the end of the interval. |
add_before |
Logical value. Do you want to add the last measurement before the start of each interval? |
add_after |
Logical value. Do you want to add the first measurement after the end of each interval? |
participant_id |
A character string identifying a single participant. Use get_participants to retrieve all participants from the database. Leave empty to get data for all participants. |
start_date |
Optional search window specifying date where to begin search. Must be convertible to date using as.Date. Use first_date to find the date of the first entry for a participant. |
end_date |
Optional search window specifying date where to end search. Must be convertible to date using as.Date. Use last_date to find the date of the last entry for a participant. |
reverse |
Switch the linking direction, i.e. return sensor_two (or external) with the matched data of sensor_one. |
ignore_large |
Safety override to prevent long wait times. Set to TRUE to allow linking large amounts of data anyway. |
Value
A tibble with the data of sensor_one and a new column data containing the matched data of either sensor_two or external according to offset_before and offset_after. The other way around when reverse = TRUE.
See Also
link(), on which this function is based.
Examples
## Not run:
# Open a database
db <- open_db("path/to/db")
# Link two sensors
link_db(db, "accelerometer", "gyroscope", offset_before = 300, offset_after = 300)
# Link a sensor with an external data frame
link_db(db, "accelerometer",
external = my_external_data,
external_time = "time", offset_before = 300, offset_after = 300
)
## End(Not run)
Link gaps to (ESM) data
Description
Gaps in mobile sensing data typically occur when the app is stopped by the operating system or the user. While small gaps may not pose problems with analyses, greater gaps may cause bias or skew your data. As a result, gap data should be considered in order to inspect and limit their influence. This function, analogous to link(), allows you to connect gaps to other data (usually ESM/EMA data) within a user-specified time range.
Usage
link_gaps(
data,
gaps,
by = NULL,
offset_before = 0,
offset_after = 0,
raw_data = FALSE
)
Arguments
data |
A data frame or an extension to a data frame (e.g. a tibble). While gap data can be linked to any other type of data, ESM data is most commonly used. |
gaps |
A data frame (extension) containing the gap data. See identify_gaps for finding gaps in an mpathsenser database. |
by |
A character vector indicating the variable(s) to match by, typically the participant IDs. If NULL, the default, a natural join is performed, using all variables in common across data and gaps. To join by different variables on data and gaps, use a named vector. For example, by = c("a" = "b") will match data$a to gaps$b. To join by multiple variables, use a vector with length > 1. To perform a cross-join (when data and gaps have no variables in common), use by = character(). |
offset_before |
The time before each measurement in data that denotes the start of the interval. |
offset_after |
The time after each measurement in data that denotes the end of the interval. |
raw_data |
Whether to include the raw data (i.e. the matched gap data) to the output as gap_data. |
Value
The original data with an extra column duration indicating the gap duration within the interval in seconds, or, if raw_data is TRUE, an extra column called gap_data containing the gaps within the interval. The function ensures all durations and gap time stamps are within the range of the interval.
See Also
bin_data() for linking two sets of intervals to each other; identify_gaps() for finding gaps in the sampling; add_gaps() for adding gaps to sensor data.
Examples
# Create some data
x <- data.frame(
time = rep(seq.POSIXt(as.POSIXct("2021-11-14 13:00:00"), by = "1 hour", length.out = 3), 2),
participant_id = c(rep("12345", 3), rep("23456", 3)),
item_one = rep(c(40, 50, 60), 2)
)
# Create some gaps
gaps <- data.frame(
from = as.POSIXct(c("2021-11-14 13:00:00", "2021-11-14 14:00:00")),
to = as.POSIXct(c("2021-11-14 13:30:00", "2021-11-14 14:30:00")),
participant_id = c("12345", "23456")
)
# Link the gaps to the data
link_gaps(x, gaps, by = "participant_id", offset_before = 0, offset_after = 1800)
# Link the gaps to the data and include the raw data
link_gaps(
x,
gaps,
by = "participant_id",
offset_before = 0,
offset_after = 1800,
raw_data = TRUE
)
Moving average for values in an mpathsenser database
Description
Usage
moving_average(
db,
sensor,
cols,
n,
participant_id = NULL,
start_date = NULL,
end_date = NULL
)
Arguments
db |
A database connection to an m-Path Sense database. |
sensor |
The name of a sensor. See sensors for a list of available sensors. |
cols |
A character vector of the columns in the sensor table to calculate moving averages over. |
n |
The number of seconds to average over. The index of the result will be centered compared to the rolling window of observations. |
participant_id |
A character vector identifying one or multiple participants. |
start_date |
Optional search window specifying date where to begin search. Must be convertible to date using as.Date. Use first_date to find the date of the first entry for a participant. |
end_date |
Optional search window specifying date where to end search. Must be convertible to date using as.Date. Use last_date to find the date of the last entry for a participant. |
Value
A tibble with the same columns as the input, modified to be a moving average.
Examples
## Not run:
path <- system.file("testdata", "test.db", package = "mpathsenser")
db <- open_db(NULL, path)
moving_average(
db = db,
sensor = "Light",
cols = c("mean_lux", "max_lux"),
n = 5, # seconds
participant_id = "12345"
)
close_db(db)
## End(Not run)
Get number of times screen turned on
Description
Usage
n_screen_on(
db,
participant_id,
start_date = NULL,
end_date = NULL,
by = c("Total", "Hour", "Day")
)
Arguments
db |
A database connection to an m-Path Sense database. |
participant_id |
A character string identifying a single participant. Use get_participants to retrieve all participants from the database. |
start_date |
Optional search window specifying date where to begin search. Must be convertible to date using as.Date. Use first_date to find the date of the first entry for a participant. |
end_date |
Optional search window specifying date where to end search. Must be convertible to date using as.Date. Use last_date to find the date of the last entry for a participant. |
by |
Either 'Total', 'Hour', or 'Day' indicating how to summarise the results. Defaults to total. |
Value
In case grouping is by the total amount, returns a single numeric value. For day and hour grouping, returns a tibble with columns 'date' or 'hour' and a column 'n' containing the number of times the screen turned on.
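Examples
## Not run:
# A minimal usage sketch, assuming an open database connection `db` and a
# participant with ID "12345":
n_screen_on(db, participant_id = "12345", by = "Day")
## End(Not run)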
Get number of screen unlocks
Description
Usage
n_screen_unlocks(
db,
participant_id,
start_date = NULL,
end_date = NULL,
by = c("Total", "Hour", "Day")
)
Arguments
db |
A database connection to an m-Path Sense database. |
participant_id |
A character string identifying a single participant. Use get_participants to retrieve all participants from the database. |
start_date |
Optional search window specifying date where to begin search. Must be convertible to date using as.Date. Use first_date to find the date of the first entry for a participant. |
end_date |
Optional search window specifying date where to end search. Must be convertible to date using as.Date. Use last_date to find the date of the last entry for a participant. |
by |
Either 'Total', 'Hour', or 'Day' indicating how to summarise the results. Defaults to total. |
Value
In case grouping is by the total amount, returns a single numeric value. For day and hour grouping, returns a tibble with columns 'date' or 'hour' and a column 'n' containing the number of screen unlocks.
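Examples
## Not run:
# A minimal usage sketch, assuming an open database connection `db` and a
# participant with ID "12345":
n_screen_unlocks(db, participant_id = "12345", by = "Day")
## End(Not run)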
Open an mpathsenser database.
Description
Usage
open_db(path = getwd(), db_name = "sense.db")
Arguments
path |
The path to the database. Use NULL to use the full path name in db_name. |
db_name |
The name of the database. |
Value
A connection to an mpathsenser database.
See Also
close_db()
for closing a database; copy_db()
for copying (part of) a database;
index_db()
for indexing a database; get_data()
for extracting data from a database.
Examples
# First create a database in a temporary directory
db <- create_db(tempdir(), "mydb.db")
close_db(db)
DBI::dbIsValid(db) # db is closed
# Then re-open it
db2 <- open_db(tempdir(), "mydb.db")
DBI::dbIsValid(db2) # db2 is open
# Cleanup
close_db(db2)
file.remove(file.path(tempdir(), "mydb.db"))
Plot a coverage overview
Description
Plot a coverage overview
Usage
## S3 method for class 'coverage'
plot(x, ...)
Arguments
x |
A tibble with the coverage data coming from coverage(). |
... |
Other arguments passed on to methods. Not currently used. |
Value
A ggplot2::ggplot object.
See Also
coverage() for computing the coverage data.
Examples
## Not run:
freq <- c(
Accelerometer = 720, # Once per 5 seconds. Can have multiple measurements.
AirQuality = 1,
AppUsage = 2, # Once every 30 minutes
Bluetooth = 60, # Once per minute. Can have multiple measurements.
Gyroscope = 720, # Once per 5 seconds. Can have multiple measurements.
Light = 360, # Once per 10 seconds
Location = 60, # Once per 60 seconds
Memory = 60, # Once per minute
Noise = 120,
Pedometer = 1,
Weather = 1,
Wifi = 60 # once per minute
)
data <- coverage(
db = db,
participant_id = "12345",
sensor = c("Accelerometer", "Gyroscope"),
frequency = freq, # use the frequencies defined above
start_date = "2021-01-01",
end_date = "2021-05-01"
)
plot(data)
## End(Not run)
Screen duration by hour or day
Description
Calculate the duration for which the screen was unlocked (i.e. not merely switched on).
Usage
screen_duration(
db,
participant_id,
start_date = NULL,
end_date = NULL,
by = c("Hour", "Day")
)
Arguments
db |
A database connection to an m-Path Sense database. |
participant_id |
A character string identifying a single participant. Use get_participants to retrieve all participants from the database. |
start_date |
Optional search window specifying date where to begin search. Must be convertible to date using as.Date. Use first_date to find the date of the first entry for a participant. |
end_date |
Optional search window specifying date where to end search. Must be convertible to date using as.Date. Use last_date to find the date of the last entry for a participant. |
by |
Either 'Hour' or 'Day' indicating how to summarise the results. Leave empty to get raw screen duration per measurement. |
Value
A tibble with either 'hour' and 'duration' columns or 'date' and 'duration' columns
depending on the by
argument. Alternatively, if no by
is specified, a remote
tibble is returned with the date, time, and duration since the previous measurement.
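Examples
A minimal usage sketch (not run); the database path and participant ID below are illustrative:
## Not run:
db <- open_db("path/to/db", "sense.db")
# Screen duration summarised per day
screen_duration(db, participant_id = "12345", by = "Day")
close_db(db)
## End(Not run)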
Available Sensors
Description
A character vector containing all sensors available in this package. This variable was created to make it easier to use sensor names in your own functions, e.g. to loop over sensors (see the sketch under Examples below).
Usage
sensors
Format
An object of class character of length 27.
Value
A character vector containing all sensor names supported by mpathsenser.
Examples
sensors
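A hedged sketch (not run) of the looping use case mentioned in the description, assuming an open database connection db and an illustrative participant ID:
## Not run:
# Retrieve the data of every available sensor for one participant
all_data <- lapply(sensors, function(s) {
  get_data(db, sensor = s, participant_id = "12345")
})
names(all_data) <- sensors
## End(Not run)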
Get step count
Description
Extracts the number of steps per hour as sensed by the underlying operating system.
Usage
step_count(db, participant_id = NULL, start_date = NULL, end_date = NULL)
Arguments
db |
A database connection to an m-Path Sense database. |
participant_id |
A character string identifying a single participant. Use get_participants to retrieve all participants from the database. |
start_date |
Optional search window specifying date where to begin search. Must be convertible to date using as.Date. Use first_date to find the date of the first entry for a participant. |
end_date |
Optional search window specifying date where to end search. Must be convertible to date using as.Date. Use last_date to find the date of the last entry for a participant. |
Value
A tibble with the 'date', 'hour', and the number of 'steps'.
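Examples
A minimal usage sketch (not run); the database path and participant ID below are illustrative:
## Not run:
db <- open_db("path/to/db", "sense.db")
# Hourly step counts for a single participant
step_count(db, participant_id = "12345")
close_db(db)
## End(Not run)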
Test whether JSON files are in the correct format
Description
Usage
test_jsons(path = getwd(), files = NULL, db = NULL, recursive = TRUE)
Arguments
path |
The path name of the JSON files. |
files |
Alternatively, a character list of the input files. |
db |
An mpathsenser database connection (optional). If provided, it is used to determine which files are already in the database so that only the remaining JSON files are checked. |
recursive |
Should the listing recurse into directories? |
Value
A message indicating whether there were any issues and a character vector of the file names that need to be fixed. If there were no issues, an invisible empty string is returned.
Parallel
This function supports parallel processing in the sense that it is able to distribute its computation load among multiple workers. To make use of this functionality, run future::plan("multisession") before calling this function.
Progress
You can be updated on the progress of this function by using the progressr package. See progressr's vignette on how to subscribe to these updates. A short sketch is included at the end of the Examples below.
Examples
## Not run:
# Test all files in a directory
test_jsons(path = "path/to/jsons", recursive = FALSE)
# Test all files in a directory and its subdirectories
test_jsons(path = "path/to/jsons", recursive = TRUE)
# Test specific files
test_jsons(files = c("file1.json", "file2.json"))
# Test files in a directory, but skip those that are already in the database
test_jsons(path = "path/to/jsons", db = db)
## End(Not run)
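A minimal sketch (not run) of subscribing to progress updates while using parallel workers, as mentioned in the Parallel and Progress sections above; the path is illustrative:
## Not run:
future::plan("multisession")
progressr::with_progress({
  test_jsons(path = "path/to/jsons")
})
## End(Not run)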
Unpack raw sensor data
Description
This function takes raw sensor data coming from import()
and unpacks it into tidy data frames
so that it can be written to the database. Note that this function is internal and should not
be used for other purposes.
Usage
unpack_sensor_data(data, ...)
Arguments
data |
A data frame containing the raw sensor data. |
... |
These dots are for future extensions and must be empty. |
Value
A data frame with the sensor data unpacked.
Examples
x <- tibble::tibble(
study_id = "test-study",
participant_id = "12345",
data_format = "cams 1.0.0",
start_time = "2021-11-14 16:40:00.123456",
end_time = NULL,
sensor = "Activity",
data = list(list(
confidence = 80,
type = "WALKING"
))
)
class(x) <- c("activity", class(x))
mpathsenser:::unpack_sensor_data(x)
Unzip m-Path Sense output
Description
Similar to unzip, but makes it easier to unzip all files in a given path with one function call.
Usage
unzip_data(path = getwd(), to = NULL, overwrite = FALSE, recursive = TRUE)
Arguments
path |
The path to the directory containing the zip files. |
to |
The output path. |
overwrite |
Logical value indicating whether to overwrite files that have already been unzipped. |
recursive |
Logical value indicating whether to unzip files in subdirectories as well. These files will then be unzipped in their respective subdirectory. |
Value
A message indicating how many files were unzipped.
Parallel
This function supports parallel processing in the sense that it is able to distribute its computation load among multiple workers. To make use of this functionality, run future::plan("multisession") before calling this function.
Progress
You can be updated on the progress of this function by using the progressr package. See progressr's vignette on how to subscribe to these updates.
Examples
## Not run:
# Unzip all files in a directory
unzip_data(path = "path/to/zipfiles", to = "path/to/unzipped", recursive = FALSE)
# Unzip all files in a directory and its subdirectories
unzip_data(path = "path/to/zipfiles", to = "path/to/unzipped", recursive = TRUE)
# Unzip files in a directory, but skip those that are already unzipped
unzip_data(path = "path/to/zipfiles", to = "path/to/unzipped", overwrite = FALSE)
## End(Not run)
Vacuum a database
Description
This is a convenience function that calls the VACUUM
command on a database. This command will
rebuild the database file, repacking it into a minimal amount of disk space.
Usage
vacuum_db(db)
Arguments
db |
A database connection to an m-Path Sense database. |
Value
A scalar numeric that specifies the number of rows affected by the vacuum.
Examples
# Create a database in a temporary directory
db <- create_db(tempdir(), "mydb.db")
# Assuming that we have imported some data into the database, we can vacuum it
vacuum_db(db)
# Cleanup
close_db(db)
file.remove(file.path(tempdir(), "mydb.db"))