httpcache provides the HTTP verb functions GET(), PUT(), PATCH(), POST(), and DELETE(), which are drop-in replacements for those in the httr package. GET() responses are added to the local query cache; PUT(), PATCH(), POST(), and DELETE() requests trigger cache invalidation on the associated resources. For APIs where a POST request is used to send a command that returns content and doesn't modify state (and hence is semantically more of a GET), you can use cachedPOST(), which writes to the cache and doesn't invalidate.
To take advantage of these cache-aware functions, all you need to do is load httpcache instead of httr, or in package development, import from httpcache.
library(httpcache)
system.time(a <- GET("https://httpbin.org/get"))
## user system elapsed
## 0.100 0.020 0.571
system.time(b <- GET("https://httpbin.org/get"))
## user system elapsed
## 0.000 0.000 0.001
Notice how the second request returns instantly. It's reading from cache—there's no communication with a remote server, so no network latency and no server processing time. Remember: the fastest API request is the one you don't have to make.
Reading from the query cache yields exactly the same response as if we had contacted the server. We can confirm:
identical(a, b)
## [1] TRUE
How do we know, other than by the faster response, that we're hitting cache and not making server requests?
When designing API clients in R, logging is an invaluable tool for understanding and improving request patterns. As you build layers of abstraction on top of the direct HTTP requests, it can be easy to make inefficient or repetitive requests that degrade performance for your users and impose unnecessary load on your servers. You can't improve what you can't measure, and the logging tools included in httpcache can help you measure.
Let's clear the cache and repeat that exercise, this time with logging enabled. Use startLog() to enable the request log. startLog takes a file or connection argument, which it passes to cat for log writing. The default, same as for cat, prints to the standard output (your display, in an interactive session). See ?cat for details.
clearCache()
startLog()
a <- GET("http://httpbin.org/get")
## 2021-01-10T13:16:59.923 HTTP GET http://httpbin.org/get 200 366 0 0.001 0.123 0.123 0.234 0.234
## 2021-01-10T13:16:59.924 CACHE SET http://httpbin.org/get
b <- GET("http://httpbin.org/get")
## 2021-01-10T13:16:59.925 CACHE HIT http://httpbin.org/get
Notice how the first request results in an “HTTP GET” and a “CACHE SET”, while the second one gets a “CACHE HIT” and does not touch “HTTP”. From this log output, we can conclude that the query cache is working.
You can also pass a file name to startLog. This makes it easier to read the log output back in as a data.frame and analyze it quantitatively.
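As a rough sketch of that kind of analysis, the following uses only base R to split the three log lines shown above into fields and tabulate them. The column names (timestamp, scope, action, url) are labels chosen here for illustration, not names defined by httpcache, and the sketch assumes the first four space-separated fields mean what the sample output suggests.

```r
# Sample log lines copied from the output above; in practice these would
# come from readLines() on the file name passed to startLog().
log_lines <- c(
  "2021-01-10T13:16:59.923 HTTP GET http://httpbin.org/get 200 366 0 0.001 0.123 0.123 0.234 0.234",
  "2021-01-10T13:16:59.924 CACHE SET http://httpbin.org/get",
  "2021-01-10T13:16:59.925 CACHE HIT http://httpbin.org/get"
)

# Split each line on single spaces and keep the first four fields.
fields <- strsplit(log_lines, " ", fixed = TRUE)
nth <- function(i) vapply(fields, `[`, character(1), i)

log_df <- data.frame(
  timestamp = nth(1),
  scope = nth(2),    # "HTTP" or "CACHE" (column name is an assumption)
  action = nth(3),   # HTTP verb or cache event (column name is an assumption)
  url = nth(4),
  stringsAsFactors = FALSE
)

# Share of cache events that were hits rather than sets.
cache_events <- log_df[log_df$scope == "CACHE", ]
hit_share <- mean(cache_events$action == "HIT")

# Number of requests that actually went over the network.
n_http <- sum(log_df$scope == "HTTP")
```

On a longer log, the same data.frame could be grouped by url to find the resources that are requested most often without being served from cache.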
You may want to send other events to the log, interspersed with your HTTP requests, whether for their own sake or to see how work done in R outside of the HTTP layer maps onto your server traffic. The function logMessage() writes to the connection specified by startLog, and it is available for general use. For error logging, the halt() function wraps stop and sends a message to the log (it also makes the awkwardly named call. argument to stop default to FALSE for cleaner error messaging).
As the saying (or joke, depending on the version) goes, cache invalidation is one of the two hard problems in computer science. The trouble with caching what the server serves is that the server is the source of truth, and if the state of data on the server changes, our local copy of the data is stale. In some applications and with some APIs, we have no idea when the server state changes, but in many cases, the changes on the server come from actions that we initiate ourselves. In these cases, a local query cache is more feasible, and cache invalidation more tractable.
httpcache provides some functions to direct cache invalidation. We've seen one already, clearCache(), which wipes the entire cache. Other functions give more surgical control. dropOnly() invalidates cache only for the specified URL. dropPattern() uses regular expression matching to invalidate cache. dropCache() is a convenience wrapper around dropPattern that invalidates cache for any resources that start with the given URL.
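To make the three matching modes concrete, here is a small base R illustration that operates on a plain character vector of URLs rather than on the real cache. It is not how httpcache is implemented; the cached_urls vector and the variable names are hypothetical, and real cache keys may carry more than the bare URL.

```r
# Hypothetical set of URLs standing in for cached resources.
cached_urls <- c(
  "http://api.example/projects/",
  "http://api.example/projects/old_id/",
  "http://api.example/projects/new_id/",
  "http://api.example/projects/new_id/users/"
)

target <- "http://api.example/projects/new_id/"

# Exact match only, in the spirit of dropOnly(target).
exact_match <- cached_urls[cached_urls == target]

# Regular expression match, in the spirit of dropPattern("/users/$").
regex_match <- cached_urls[grepl("/users/$", cached_urls)]

# Prefix match, in the spirit of dropCache(target).
prefix_match <- cached_urls[startsWith(cached_urls, target)]
```

Here exact_match holds one URL, regex_match holds only the users subresource, and prefix_match holds both the new_id resource and its users subresource.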
Depending on the API with which you're communicating, you may not need to use those cache-invalidation functions directly, or you may need them only infrequently. httpcache was designed with RESTful APIs in mind, particularly those that expose resources that contain collections of entities that can be created, replaced, updated, and deleted (“CRUD”) with POST, PUT, PATCH, and DELETE, respectively. Consequently, these four HTTP verb functions are built with default cache invalidation actions: POST invalidates cache only for the request URL (dropOnly), for the case where POST creates a new entity appearing as a subresource; while PUT, PATCH, and DELETE drop cache for the request URL and everything “below” it (dropCache).
For example, if GET http://api.example/projects/ returns a catalog of project entities, and POST to http://api.example/projects/ creates a resource at http://api.example/projects/new_id/, we need to bust cache for the project catalog on POST, but our cached responses for resources such as http://api.example/projects/old_id/ should still be valid. But, if we modify http://api.example/projects/new_id/, we should invalidate cache for that resource and for other resources appearing as subresources of it, such as http://api.example/projects/new_id/users/.
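The default behavior described for this example can be simulated in a few lines of base R. This is a sketch of the rule only, under the assumption that cache entries are keyed by bare URL; surviving_keys() is a hypothetical helper written for this illustration and is not part of httpcache.

```r
# Hypothetical helper: given cached URLs, an HTTP verb, and a request URL,
# return the URLs that would still be cached under the default rules.
surviving_keys <- function(keys, verb, url) {
  dropped <- switch(
    verb,
    POST = keys == url,              # request URL only, like dropOnly
    PUT = ,
    PATCH = ,
    DELETE = startsWith(keys, url),  # URL and everything below, like dropCache
    rep(FALSE, length(keys))         # other verbs leave the cache alone
  )
  keys[!dropped]
}

keys <- c(
  "http://api.example/projects/",
  "http://api.example/projects/old_id/",
  "http://api.example/projects/new_id/",
  "http://api.example/projects/new_id/users/"
)

# POST to the catalog busts only the catalog itself.
after_post <- surviving_keys(keys, "POST", "http://api.example/projects/")

# PATCH on new_id busts new_id and its subresources.
after_patch <- surviving_keys(keys, "PATCH", "http://api.example/projects/new_id/")
```

After the POST, the entries for old_id, new_id, and new_id/users remain; after the PATCH, only the catalog and old_id remain.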
These verb functions in httpcache (POST(), PUT(), PATCH(), and DELETE()) all take a drop argument, which defaults as described above. To override the default, you can supply a call other than dropCache(url) or dropOnly(url), or you can pass drop = NULL and call the cache-invalidation functions directly outside of the request functions. Depending on your API and your usage of it, however, httpcache's cache management may just work for you with no additional effort.
The query cache you build up in one R session doesn't have to end with it. Use saveCache() to write out the contents of your cache to a .rds file. Restore it later with loadCache().