Last updated 10 February 2024
Follow the instructions here for the GUI method or here for the command line method. See also the GCS concept cheatsheet for an overview of recommended environment variables.
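Before calling any of the functions below, it can help to confirm that the environment variables are actually visible to your R session. The following is a minimal sketch in base R; the variable names (`GCS_AUTH_FILE`, `GCS_DEFAULT_BUCKET`, `DAI_PROCESSOR_ID`) are assumptions based on a typical setup and are not listed on this page, so adjust them to whatever the cheatsheet recommends. `missing_env_vars()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: return the names of environment variables that are unset.
# The default variable names are assumptions, not taken from this page.
missing_env_vars <- function(vars = c("GCS_AUTH_FILE",
                                      "GCS_DEFAULT_BUCKET",
                                      "DAI_PROCESSOR_ID")) {
  values <- Sys.getenv(vars, unset = "")
  vars[values == ""]
}

unset <- missing_env_vars()
if (length(unset) > 0) {
  message("Not set: ", paste(unset, collapse = ", "))
}
```

Variables are usually set in your `.Renviron` file, so restart R after editing it and run the check again.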
Pass a single-page pdf or image file to Document AI and get the output immediately:
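A minimal sketch of what this might look like, assuming the package's `dai_sync()` accepts a local file path and that `get_text()` (used later on this page) can extract the text from its response. The wrapper `ocr_single_page()` and the file name `sample.pdf` are hypothetical and only serve as illustration.

```r
# Hypothetical wrapper; requires the daiR package and working
# Google Cloud credentials when actually called.
ocr_single_page <- function(path) {
  stopifnot(file.exists(path))
  response <- daiR::dai_sync(path)  # send the file for immediate processing
  daiR::get_text(response)          # extract the recognised text
}

## Not run:
# text <- ocr_single_page("sample.pdf")
# cat(text)
```

The call blocks until Document AI responds, which is why this route suits single pages rather than batches.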
Asynchronous (batch) processing requires configuration of googleCloudStorageR. Send larger batches for offline processing in three steps:
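The first two steps (getting the files into your bucket and submitting them) are not shown on this page. A minimal sketch, assuming `gcs_upload()` from googleCloudStorageR for the upload and daiR's `dai_async()` for the submission, with a default bucket already configured. The wrapper `submit_batch()` and the file names are hypothetical.

```r
# Hypothetical wrapper; requires googleCloudStorageR, daiR and working
# Google Cloud credentials when actually called.
submit_batch <- function(files) {
  stopifnot(all(file.exists(files)))
  # Step 1: upload each file to the default bucket under its base name
  for (f in files) {
    googleCloudStorageR::gcs_upload(f, name = basename(f))
  }
  # Step 2: ask Document AI to process the uploaded files
  daiR::dai_async(basename(files))
}

## Not run:
# submit_batch(c("sample1.pdf", "sample2.pdf"))
```

Processing happens offline, so allow some time before looking for the results in the third step.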
The output will be delivered to the same bucket as JSON files.
## Not run:
library(daiR)
library(googleCloudStorageR)
library(purrr)
# Get a dataframe with the bucket contents
contents <- gcs_list_objects()
# Get the names of the JSON output files (anchored regex, not a glob)
jsons <- grep("\\.json$", contents$name, value = TRUE)
# Download them
map(jsons, ~ gcs_get_object(.x, saveToDisk = basename(.x)))
# Extract the text from the JSON files and save it as .txt files
local_jsons <- basename(jsons)
map(local_jsons, ~ get_text(.x, type = "async", save_to_file = TRUE))
Assuming your pdfs were named sample1.pdf and sample2.pdf, there will now be two files named sample1-0.txt and sample2-0.txt in your working directory.
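To work with the extracted text in R, the saved files can be read back in. A minimal sketch in base R: `read_txt_outputs()` is a hypothetical helper that collects every `.txt` file in a directory into a named character vector, one element per file. It assumes that the directory holds only the text output you want.

```r
# Hypothetical helper: read all .txt files in `dir` into a named character vector.
read_txt_outputs <- function(dir = ".") {
  paths <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  texts <- vapply(
    paths,
    function(p) paste(readLines(p, warn = FALSE), collapse = "\n"),
    character(1)
  )
  names(texts) <- basename(paths)
  texts
}

## Not run:
# texts <- read_txt_outputs()
# names(texts)
```

Each element is named after its file, so the text can be matched back to the original document by name.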