Pocket is a well-known web service for storing things you find on the internet in case you want to access them later: a news article, a video on YouTube, an interesting tweet, or any other URL that might prove useful. In short, Pocket is a tool that helps you organise yourself. Pocket also offers an API through which you can perform essentially all the actions available when using Pocket in your web browser. This is where pocketapi comes into play. We have created an R wrapper for Pocket’s API that allows you to organise, retrieve and change the items stored in your Pocket account through some convenient R functions. In the following, we explain how to connect to the API via pocketapi, how the key functions work and what the workflow looks like.
You need to create a Pocket application in Pocket’s developer portal to access your Pocket data. Don’t worry: this app will only be visible to you and only serves the purpose of acquiring the credentials for pocketapi.
Permissions needed by the pocketapi functions:

pocketapi function | what it does | needed permission |
---|---|---|
pocket_get | get data frame of all your pockets | Retrieve |
all other pocket_* functions | add new Pocket entries | Modify |
Pocketapi uses the OAuth2 flow provided by the Pocket Authentication API to get an access token for your app. Because Pocket does not closely follow the OAuth standard, we are not able to provide as smooth an experience as other packages do (e.g. googlesheets4). Instead, you need to complete the following steps once to obtain an access token:
Request a request token: req_token <- get_request_token(consumer_key)
Authorize your app by opening the URL created by create_authorize_url in your browser.
This step is critical: Even if you have authorized your app before and you want to get a new access token, you need to do the authorization in your browser again. Otherwise, the request token will not be authorized to generate an access token!
Important: Never make your consumer_key and access_token publicly available; otherwise anyone will be able to access your Pocket!
It is common practice to set API keys in your R environment file so that every time you start R the key is loaded.
All pocketapi functions access your consumer_key and access_token automatically by executing Sys.getenv("POCKET_CONSUMER_KEY") and Sys.getenv("POCKET_ACCESS_TOKEN"), respectively. Alternatively, you can pass your consumer_key and access_token explicitly with each function call.
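The lookup order described above (explicit argument first, environment variable as fallback) can be sketched in base R. The helper resolve_credential() and the variable POCKET_DEMO_KEY are hypothetical names used only for illustration; this is not how pocketapi is implemented internally.

```r
# Hypothetical helper (not part of pocketapi): use an explicitly supplied
# value if there is one, otherwise fall back to an environment variable.
resolve_credential <- function(value = NULL, env_var) {
  if (!is.null(value) && nzchar(value)) {
    return(value)
  }
  from_env <- Sys.getenv(env_var, unset = "")
  if (!nzchar(from_env)) {
    stop("No value supplied and environment variable ", env_var, " is not set.")
  }
  from_env
}

# Demonstration with a dummy variable so that real credentials stay untouched.
Sys.setenv(POCKET_DEMO_KEY = "dummy-key")
key_from_env <- resolve_credential(env_var = "POCKET_DEMO_KEY")
key_explicit <- resolve_credential("explicit-key", env_var = "POCKET_DEMO_KEY")
```

Failing early with a clear message when neither source provides a value tends to be easier to debug than an authentication error returned by the API.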
To add your keys to your environment file, you can use the function edit_r_environ() from the usethis package.
This will open your .Renviron file in the RStudio editor. Now, you can add the following lines to it:
POCKET_CONSUMER_KEY="yourkeygoeshere"
POCKET_ACCESS_TOKEN="youraccesstokengoeshere"
Save the file and restart R for the changes to take effect.
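If restarting is inconvenient, base R's readRenviron() can re-read an environment file in a running session. The sketch below uses a temporary file and dummy variable names ending in _DEMO (both assumptions for illustration) so that it does not touch your real .Renviron or credentials.

```r
# Write a throwaway file that uses the same KEY="value" syntax as .Renviron.
demo_file <- tempfile(fileext = ".Renviron")
writeLines(
  c('POCKET_CONSUMER_KEY_DEMO="yourkeygoeshere"',
    'POCKET_ACCESS_TOKEN_DEMO="youraccesstokengoeshere"'),
  demo_file
)

# Load the file into the current session; returns TRUE (invisibly) on success.
loaded <- readRenviron(demo_file)

# Check that both variables are now set and non-empty.
demo_vars <- Sys.getenv(c("POCKET_CONSUMER_KEY_DEMO", "POCKET_ACCESS_TOKEN_DEMO"))
all_set <- all(nzchar(demo_vars))

unlink(demo_file)
```

For your real file, the equivalent call would be readRenviron("~/.Renviron"), assuming the file lives in your home directory.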
If your .Renviron lives in a non-standard location, you can also edit it manually using RStudio or your favorite text editor.
If you don’t want to clutter your .Renviron file, you can also use an .env file in your project directory together with the dotenv package. In this case, make sure to never share your .env file.
Pocketapi offers seven functions to interact with items in your Pocket account. They can be grouped into three general buckets:
- Add new items: pocket_add()
- Retrieve items: pocket_get()
- Modify existing items: pocket_archive(), pocket_delete(), pocket_favorite(), pocket_unfavorite(), pocket_tag()
If you create an API token for a new Pocket app, you need to decide which permissions this app should have. The Pocket website offers three dimensions of permissions, which match the bucket structure mentioned above. If you do not grant a specific permission to your app, it will not be possible to execute the corresponding functions in R. For instance, if you do not grant your app the “modify” permission, you will not be able to archive, delete, or favorite/unfavorite items, or to modify the tags associated with your items.
If you want to use the “modify” functions, you should also grant the app the permission to “retrieve” items: All “modify” functions are based on Pocket’s internal item IDs, which you can only know when you retrieve the data first.
The main function to access Pocket data from R is pocket_get(). You can assign the output of the function to a new object to obtain a data frame where every row represents an item saved in Pocket. The default settings for pocket_get() are very broad, but they can be adjusted through the function’s arguments.
The pocket_get() function

The main arguments of pocket_get() are also explained in the function’s help file, but below you will find a quick overview of the main arguments for adjusting which content is retrieved from Pocket.
- favorite: filters on favorited items (favorite = TRUE) or on unfavorited items (favorite = FALSE). Using the default value (favorite = NULL) means that items are retrieved regardless of whether they are favorited or not.
- item_type: filters on the type of item. The default (item_type = NULL) retrieves all items regardless of type.
- state: filters on unread items (state = "unread") or on items that are already read and archived (state = "archive"). By default, the function returns all items (state = "all"), regardless of whether they are read or unread. Depending on your goals, you might prefer another option besides this default.

The function arguments consumer_key and access_token are the mandatory account credentials for the API. Please see the previous section for an explanation of how to set up your account.
By default, the Pocket API has a limit of 5000 items per API call. This means that the output of pocket_get() is a data frame with up to 5000 rows, even if there might actually be more items in your Pocket. If you have more items saved in Pocket, you may use some tricks to combine multiple API calls to get a more complete picture (e.g., using the “state” parameter or playing around with favorited/tagged items), but it may not always be possible to retrieve all items if you have an extremely high number of items saved in your Pocket account.
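Combining several calls usually means stacking the resulting data frames and dropping items that appear more than once. The following base R sketch uses two small mock data frames in place of real pocket_get() results; the values are made up, and only the item_id and status columns are modelled.

```r
# Mock results standing in for two separate pocket_get() calls,
# e.g. one filtered on unread items and one filtered on favorited items.
unread_items <- data.frame(
  item_id = c("101", "102", "103"),
  status = c(0, 0, 0),
  stringsAsFactors = FALSE
)
favorite_items <- data.frame(
  item_id = c("103", "104"),
  status = c(0, 1),
  stringsAsFactors = FALSE
)

# Stack both results, then keep only the first occurrence of each item_id.
combined <- rbind(unread_items, favorite_items)
combined <- combined[!duplicated(combined$item_id), ]

n_unique_items <- nrow(combined)
```

Item "103" is returned by both mock calls and is kept only once, so n_unique_items is 4.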
Like many other APIs, Pocket allows only a certain number of API calls per hour (see Pocket’s API documentation for details), but this limit should usually not be an issue for pocket_get(). Note that all API calls, including those that add or modify Pocket items, count towards this hourly limit.
pocket_get() returns a data frame where each row is an item that has been saved in Pocket. For each item, the following information is saved:
- item_id: Pocket’s ID for the item. If you want to modify an item (e.g., via pocket_archive() or pocket_delete()), you need to specify the item_id as an input for that function.
- Favorite flag: TRUE if an item is favorited.
- status: 0 if the item is unread, 1 if the item is archived, 2 if the item is about to be deleted.
- Article flag: TRUE if Pocket has detected the item to be an article (and not an image or a video). The classification is provided by Pocket and is not always 100 percent precise.
- Image/video indicator: 0 if the item does not contain any image/video, 1 if the item contains an image/video, 2 if the item is an image/video.

If you need a more technical description of the data that is returned by the Pocket API, the overview in the developer documentation might help.
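Numeric codes such as the reading status are easier to work with once they carry readable labels. A minimal base R sketch of such a lookup on a mock data frame follows; the names items_demo and status_labels and the label wording are illustrative assumptions.

```r
# Lookup table from status code to a human-readable label.
status_labels <- c("0" = "unread", "1" = "archived", "2" = "to be deleted")

# Mock data standing in for the output of pocket_get().
items_demo <- data.frame(
  item_id = c("101", "102", "103", "104"),
  status = c(0, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Convert the codes to character so they can index the named lookup vector.
items_demo$status_label <- unname(status_labels[as.character(items_demo$status)])
```

Converting with as.character() first means the lookup works whether the status column is stored as numbers or as strings.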
Getting data with pocket_get() and using it

The following applied example illustrates how to extract data from Pocket via pocket_get() and use it thereafter in R. Please note that the example assumes that you have already created a new Pocket application in your developer settings, obtained the necessary credentials and saved them in your .Renviron file. All these steps are explained in the previous sections of this vignette.
In our first step, we call pocket_get(state = "all"), keeping all other options at their defaults, and assign its result to a new object, pocket_items. Technically, state = "all" is also the default, but we specify it to make it easier to understand that we get data for both read and unread items.
The resulting object pocket_items is now part of our R environment and can be used just like any standard data frame from other sources. It is a data.frame / tibble, with the latter allowing it to be integrated nicely into workflows based on the tidyverse packages (for more information about the tibble class, see the tibble package documentation). We can also see that we have 53 rows in our data frame, one for each item saved in Pocket. Tabulating the “status” variable shows us the reading status: there are 20 items that are unread (status == 0) and 33 items that are archived (status == 1). (Note: The data comes from an account that was set up specifically for the development of this package. It is very likely that “real” users have many more items in their Pocket accounts.)
class(pocket_items)
#> [1] "tbl_df" "tbl" "data.frame"
nrow(pocket_items)
#> [1] 53
table(pocket_items$status) # reading status
#>
#> 0 1
#> 20 33
The data can be used nicely as input for other R functions. In the code example below, we first show how the data is used as input for tidyverse packages: we use the dplyr package to calculate the mean and the standard deviation of the number of words per item, separately for unread and archived items. Then, we demonstrate how the data is used in a base R function, in this case a t-test, which shows that there is no significant difference in the length of unread vs. archived articles.
library(dplyr)
# quick re-code: new "reading_status" variable with labels that are intuitive to understand
pocket_items$reading_status <- recode(pocket_items$status, `0` = "unread", `1` = "archived")
# using dplyr functions on the data: grouping the data and then summarizing it
pocket_items %>%
  group_by(reading_status) %>%
  summarise(word_count_avg = mean(word_count),
            word_count_sd = sd(word_count),
            base_group = n())
#> # A tibble: 2 x 4
#> reading_status word_count_avg word_count_sd base_group
#> <chr> <dbl> <dbl> <int>
#> 1 archived 967. 626. 33
#> 2 unread 1167. 834. 20
# using base-R functions on the data: t-test
t.test(word_count ~ reading_status, data = pocket_items)
#>
#> Welch Two Sample t-test
#>
#> data: word_count by reading_status
#> t = -0.92555, df = 31.96, p-value = 0.3616
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -639.9889 240.1071
#> sample estimates:
#> mean in group archived mean in group unread
#> 966.9091 1166.8500
Of course, the data can also serve as input for plotting functions. Below, we use the ggplot2 package to show the distribution of the number of words per item for both unread and archived articles. Consistent with the t-test, the means of the two groups are similar and the distributions largely overlap, but the distribution of the unread articles seems to have slightly wider tails.
library(ggplot2)
ggplot(data = pocket_items) +
geom_density(aes(x = word_count, group = reading_status, fill = reading_status), alpha = 0.25) +
labs(title = "Article Length of Read / Unread Items in Pocket",
x = "Word Count",
y = "Density",
fill = "Reading Status",
caption = "Data: Articles from an example account.
\nn=20 unread and n=33 read/archived articles") +
theme_minimal() +
theme(legend.position = "top")
With the function pocket_add(), {pocketapi} provides a convenient way of adding new links/items to your Pocket account. The function takes a character vector of URLs as compulsory input; it will add these URLs to your account. Additionally, you can specify a variety of arguments. There are two main arguments: tags allows you to add tags to the items you want to add. Attention: The tags specified in tags will be added to all elements in the vector! The argument success (default: TRUE) outputs success or failure messages for each element (in order to use this argument, your app needs the rights for the GET endpoint, because it uses pocket_get() in the background).
Here is an example of how this works:
new_urls <- c("https://www.correlaid.org/blog", "https://correlaid.org/about")
pocketapi::pocket_add(add_urls = new_urls, success = FALSE, tags = c("data4good", "correlaid"))
Before adding items using pocket_add(), be aware of two things, since the Pocket API sometimes behaves unpredictably, especially when it comes to adding items:
- The API only accepts URLs that start with http:// or https://; those starting with only www. will most likely not be added.
- In some cases, a URL that has been successfully added is changed from https to http internally. If you use the success argument of pocket_add(), this might lead to false failure messages (the URL is added successfully, but the function cannot detect it because of the change in the URL).
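Both caveats can be checked locally before and after adding items. The helpers has_scheme() and strip_scheme() below are hypothetical base R functions written for this sketch; they are not part of pocketapi.

```r
# TRUE for URLs that start with http:// or https://, FALSE otherwise.
has_scheme <- function(urls) {
  grepl("^https?://", urls)
}

# Remove the scheme so that http and https variants of a URL compare as equal.
strip_scheme <- function(urls) {
  sub("^https?://", "", urls)
}

candidate_urls <- c("https://www.correlaid.org/blog", "www.correlaid.org/about")

# Keep only the URLs that the API is likely to accept.
valid_urls <- candidate_urls[has_scheme(candidate_urls)]

# A URL stored as http still matches the https URL that was submitted.
same_target <- strip_scheme("https://correlaid.org/about") ==
  strip_scheme("http://correlaid.org/about")
```

Comparing scheme-stripped URLs is one way to confirm by hand that an item was added even when the success message reports a failure.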
For further information on pocket_add(), see the function’s documentation.
The pocketapi package also provides a range of functions for manipulating the items in your Pocket. These are:
pocket_archive()
pocket_delete()
pocket_favorite()
pocket_unfavorite()
pocket_tag()
They usually do not need many arguments, the most important one being item_ids. Pocket assigns a unique ID to each item in your Pocket. You can retrieve the IDs by using pocket_get(). The functions pocket_archive(), pocket_delete(), pocket_unfavorite() and pocket_favorite() only need the item_ids argument (which can be a single ID or a vector of IDs). They perform the respective action on these items (e.g., delete or unfavorite them).
# First perform pocket_get() to receive a data frame containing your items, including the items' IDs
pocketapi::pocket_delete(item_ids = c("242234", "694834"))
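In practice, the IDs are usually selected from the data frame returned by pocket_get() rather than typed by hand. The sketch below uses a small mock data frame with made-up values; the column names item_id, status and word_count follow the ones used elsewhere in this vignette.

```r
# Mock data standing in for the output of pocket_get().
pocket_items_demo <- data.frame(
  item_id = c("242234", "694834", "700001"),
  status = c(1, 1, 0),
  word_count = c(350, 2400, 900),
  stringsAsFactors = FALSE
)

# Select the IDs of archived items with more than 1000 words.
is_long_archived <- pocket_items_demo$status == 1 &
  pocket_items_demo$word_count > 1000
ids_to_delete <- pocket_items_demo$item_id[is_long_archived]

# The selected IDs could then be passed on. The call is left commented out
# because it would delete items from a real account:
# pocketapi::pocket_delete(item_ids = ids_to_delete)
```

Inspecting ids_to_delete before running the modifying call is a cheap safeguard, since deletions cannot be undone from R.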
pocket_tag() is slightly more complex since it can perform a range of actions on the specified items: tags_add, tags_remove, tags_replace, tags_clear, tag_rename, and tag_delete. In addition to the item_ids argument, this function needs one of these six actions, passed via the action_name argument. You can check out Pocket’s API documentation for more info.
Two actions, tag_delete and tag_rename, do not need item_ids since they affect the given tag across your whole Pocket list (deleting or renaming it, respectively). They need the tags argument instead, where you either specify the tags to be deleted or the (one!) tag to be renamed (specifying the old and the new name).
The action tags_clear, in turn, does not need tags, since it removes all tags from the items given in item_ids.
tags_add and tags_remove need both the tags and the item_ids arguments. They add the tags specified in tags to the respective items, or remove them from those items.
tags_replace also needs tags and item_ids. However, it is used to replace all tags of the item(s) with the newly specified tags.
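The argument requirements of the six actions can be summarised as a small lookup table. The following base R sketch, with the hypothetical names tag_action_requirements and check_tag_call(), mirrors the rules described above; it is not the validation that pocket_tag() performs internally.

```r
# Which arguments each action needs, according to the description above.
tag_action_requirements <- list(
  tags_add     = c(item_ids = TRUE,  tags = TRUE),
  tags_remove  = c(item_ids = TRUE,  tags = TRUE),
  tags_replace = c(item_ids = TRUE,  tags = TRUE),
  tags_clear   = c(item_ids = TRUE,  tags = FALSE),
  tag_rename   = c(item_ids = FALSE, tags = TRUE),
  tag_delete   = c(item_ids = FALSE, tags = TRUE)
)

# Hypothetical pre-flight check: stops with a message if an argument is missing.
check_tag_call <- function(action_name, item_ids = NULL, tags = NULL) {
  needs <- tag_action_requirements[[action_name]]
  if (is.null(needs)) {
    stop("Unknown action: ", action_name)
  }
  if (needs[["item_ids"]] && is.null(item_ids)) {
    stop(action_name, " needs item_ids.")
  }
  if (needs[["tags"]] && is.null(tags)) {
    stop(action_name, " needs tags.")
  }
  if (action_name == "tag_rename" && length(tags) != 2) {
    stop("tag_rename needs exactly two tags: the old and the new name.")
  }
  invisible(TRUE)
}

ok_clear  <- check_tag_call("tags_clear", item_ids = c("242234", "694834"))
ok_rename <- check_tag_call("tag_rename", tags = c("newspaper", "politics"))
```

Running such a check before the real call catches a missing argument locally instead of spending one of the hourly API calls on a request that cannot succeed.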
Here are a few examples of how to manipulate your items in Pocket:
# Adds four new tags to two items
pocketapi::pocket_tag(action_name = "tags_add",
                      item_ids = c("242234", "694834"),
                      tags = c("boring", "done_reading", "newspaper", "german"))
# Note: No item_ids needed, affects all items with tag "german"
pocketapi::pocket_tag(action_name = "tag_delete",
                      tags = "german")
# Renames the tag "newspaper" into "politics" for all items
pocketapi::pocket_tag(action_name = "tag_rename",
                      tags = c("newspaper", "politics"))
# Removes the tag "boring" from the two items
pocketapi::pocket_tag(action_name = "tags_remove",
                      item_ids = c("242234", "694834"),
                      tags = "boring")
# Replaces all existing tags with these three new ones
pocketapi::pocket_tag(action_name = "tags_replace",
                      item_ids = c("242234", "694834"),
                      tags = c("interesting", "economics", "longread"))
# Clears the two items from all tags we have assigned previously
pocketapi::pocket_tag(action_name = "tags_clear",
                      item_ids = c("242234", "694834"))
If you have any questions remaining regarding Pocket’s API, refer to its documentation.