image-annotation

Maximilian Weber

Ollama also supports multimodal models, which can interact with (but not create) images.

We start by loading the package:

library(rollama)

After loading the package, we need to pull a model that can handle images, for example the llava model. Calling pull_model("llava") downloads the model, or simply loads it if it has already been downloaded.

pull_model("llava")
#> ✔ model llava pulled succesfully

We can use textual and visual input together. For instance, we can ask a question and provide a link to a picture or a local file path, such as images = "/home/user/Pictures/IMG_4561.jpg".
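Because an image can be given either as a URL or as a local path, it can help to check the inputs before sending them to the model. The following is a minimal sketch in base R; `check_images()` is a hypothetical helper and not part of rollama, and it assumes that anything starting with `http://` or `https://` is a reachable URL.

```r
# Hypothetical helper (not part of rollama): report which image inputs look usable
check_images <- function(images) {
  # Treat anything starting with http:// or https:// as a URL
  is_url <- grepl("^https?://", images)
  # URLs are assumed to be reachable; local paths must exist on disk
  ok <- is_url | file.exists(images)
  data.frame(image = images, is_url = is_url, ok = ok)
}

imgs <- check_images(c(
  "https://raw.githubusercontent.com/JBGruber/rollama/master/man/figures/logo.png",
  "/home/user/Pictures/IMG_4561.jpg"
))
imgs
```

An entry with `ok = TRUE` can then be passed on as the `images` argument of `query()`.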

In the first example, we ask the model to describe the logo of this package:

query("Excitedly describe this logo", model = "llava",
      images = "https://raw.githubusercontent.com/JBGruber/rollama/master/man/figures/logo.png")
#> 
#> ── Answer from llava ─────────────────────────────────────────────────
#> The image you've shared is a vibrant and playful logo. At the center
#> of the design, there's an animated character that appears to be a
#> white, cat-like creature with blue eyes and ears. This character
#> seems to be in a relaxed state, laying on its stomach with its head
#> resting comfortably on one arm while the other arm is stretched out,
#> adding to the overall whimsical feel of the logo.
#> 
#> Above this character, there's a blue circular element with some sort
#> of design or text, but it's not clear enough for me to describe.
#> Below the character, the word "ROLLAM" is prominently displayed in
#> bold black letters, suggesting that this could be the name of the
#> entity represented by the logo.
#> 
#> The background of the logo features a light blue color, providing a
#> soft contrast to the central character and text elements. The overall
#> design of the logo suggests it might be for a gaming or
#> entertainment-related company or product, given the animated
#> character and playful aesthetic.

The second example asks a classification question:

query("Which animal is in this image: a llama, dog, or walrus?",
      model = "llava",
      images = "https://raw.githubusercontent.com/JBGruber/rollama/master/man/figures/logo.png")
#> 
#> ── Answer from llava ─────────────────────────────────────────────────
#> The image features a character that appears to be a llama wearing a
#> blue helmet, lying on grass.
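
The model answers the classification question in a full sentence, so for annotation the answer usually has to be reduced to one of the allowed labels. The following is a minimal sketch in base R; `extract_label()` is a hypothetical helper and not part of rollama, and it assumes the labels are plain words without regular-expression special characters.

```r
# Hypothetical helper (not part of rollama): map a free-text answer to one label
extract_label <- function(answer, labels) {
  # Check which labels occur as whole words, ignoring case
  hits <- vapply(
    labels,
    function(l) grepl(paste0("\\b", l, "\\b"), answer, ignore.case = TRUE),
    logical(1)
  )
  # Return the label only if exactly one matches; otherwise NA
  if (sum(hits) == 1) labels[hits] else NA_character_
}

# The answer text from the example above
answer <- paste(
  "The image features a character that appears to be a llama wearing a",
  "blue helmet, lying on grass."
)
extract_label(answer, c("llama", "dog", "walrus"))
```

Returning `NA` when zero or several labels match keeps ambiguous answers visible, so they can be checked by hand instead of being silently assigned to a category.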
