Ollama also supports multimodal models, which can interact with (but not create) images.
We start by loading the package:
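For rollama, that is simply:

```r
library(rollama)
```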
After loading the package, we need to pull a model that can handle images, for example llava. Using pull_model("llava") will download the model, or simply load it if it has already been downloaded before.
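This step looks like the following (it assumes a local Ollama server is running and reachable):

```r
# Download the multimodal llava model, or load it if it is already
# present on the Ollama server
pull_model("llava")
```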
We can use textual and visual input together. For instance, we can ask a question and provide either a link to a picture or a local file path, such as images = "/home/user/Pictures/IMG_4561.jpg".
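A sketch of a query with a local image might look like this (the file path and the question are placeholders, not taken from the original examples):

```r
# Ask llava about a picture stored on disk; the path is illustrative
query("What is shown in this picture?", model = "llava",
      images = "/home/user/Pictures/IMG_4561.jpg")
```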
In the first example, we ask the model to describe the logo of this package:
query("Excitedly describe this logo", model = "llava",
      images = "https://raw.githubusercontent.com/JBGruber/rollama/master/man/figures/logo.png")
#>
#> ── Answer ────────────────────────────────────────────────────────
#> The logo features an anthropomorphic teddy bear lying down in a
#> grassy field, wearing a blue hat. It appears to be an animal
#> character with a playful and comforting vibe.
#> The background consists of a green hue with some patches of blue,
#> which give the scene a lush, natural ambiance. This unique design
#> effectively conveys a sense of relaxation and fun associated with
#> the brand.
The second example asks a classification question:
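As an illustration of what such a query could look like (the prompt and categories here are invented for this sketch, not the vignette's actual example):

```r
# Hypothetical classification prompt: ask llava to pick exactly one label
query("Which of these categories fits the image best: photo, drawing,
       or logo? Answer with a single word.",
      model = "llava",
      images = "https://raw.githubusercontent.com/JBGruber/rollama/master/man/figures/logo.png")
```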