Tagging Models

Models Overview

Tag-AI supports two AI models for generating image tags: local processing with LLaVA and cloud processing with Google's Gemini API.

Each model has distinct advantages and ideal use cases. You can switch between them based on your needs.

Local Tagging (LLaVA)

Overview

LLaVA (Large Language and Vision Assistant) is a local, privacy-focused multimodal model that runs on your own computer.

Technical Details

Typical Tag Output

LLaVA typically generates 15-20 tags per image, focusing on broad subjects, prominent scene elements, and overall mood:

Example Tags (Mountain Landscape)

mountain, landscape, snow, trees, forest, sky, cloudy, nature, outdoor, scenery, wilderness, tranquil, valley, peak, rocky, winter, evergreen, alpine, daylight

Cloud Tagging (Gemini)

Overview

Google's Gemini API is a cloud-based vision model offering high-quality image analysis.

Technical Details

Typical Tag Output

Gemini typically generates 30-50 tags per image, with more specific identification of subjects, materials, activities, and contextual details:

Example Tags (Mountain Landscape)

mountain, alpine, snow-capped, evergreen trees, coniferous forest, valley, clouds, overcast, dramatic landscape, wilderness, nature, outdoors, hiking destination, mountain range, rocky terrain, alpine meadow, mountain trail, scenic vista, photography, tranquil scene, panoramic view, natural beauty, backpacking, mountaineering, pristine, conservation, national park, ecological diversity, winter scene, environmental photography

Model Comparison

Feature           | Local (LLaVA)                | Cloud (Gemini)
Privacy           | Excellent (fully local)      | Limited (cloud processing)
Tag Quality       | Good                         | Excellent
Tag Quantity      | 15-20 tags per image         | 30-50 tags per image
Processing Speed  | Depends on hardware          | Depends on internet connection
Resource Usage    | High (local CPU/GPU)         | Low (cloud-based)
Internet Required | Only for setup               | Always
Cost              | Free (after Tag-AI purchase) | Free tier or paid API subscription
Rate Limits       | None (hardware-dependent)    | Yes (API quota)

Switching Models

To switch between tagging models:

  1. Open the Configuration Editor (Actions → Edit Config)
  2. Scroll to the [script_to_run] section
  3. Set the llm_script value:
    • local - Use LLaVA local processing
    • gemini - Use Google's Gemini API
  4. Click Save

If selecting Gemini, ensure you've configured your API key in the [tagger_gemini] section.
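For example, a configuration set up for Gemini might look like the following sketch (the section names and the llm_script key appear in this guide; the api_key key name is an assumption and your config may use a different key):

```ini
[script_to_run]
; "local" = LLaVA via Ollama, "gemini" = Google's Gemini API
llm_script = gemini

[tagger_gemini]
; key name assumed for illustration - check your actual config file
api_key = YOUR_GEMINI_API_KEY
```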

Tagging Process Internals

Image Preparation

Before sending images to either model, Tag-AI:

  1. Resizes the image to reduce memory usage
  2. Converts it to an appropriate format for the model
  3. Optimizes the image for efficient processing
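The resize step above amounts to an aspect-preserving fit within a maximum dimension. A minimal sketch (fit_within is a hypothetical helper; Tag-AI does not document its exact resize logic or size limit):

```python
def fit_within(width, height, max_side=1024):
    """Return new (width, height) so the longest side is at most max_side,
    preserving the original aspect ratio. Images already small enough are
    returned unchanged."""
    if max(width, height) <= max_side:
        return width, height
    scale = max_side / max(width, height)
    return int(width * scale), int(height * scale)
```

Shrinking the image this way keeps memory usage predictable regardless of the source resolution, while leaving enough detail for the model to tag.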

Tagging Prompt

Tag-AI uses specific prompts for each model:

LLaVA Prompt

Please analyze the provided image and follow these steps:
1. Look at the image.
2. List a minimum of 15 distinct tags that capture specific attributes of the image.
3. Return the tags in a single line separated by a comma and a space.
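As an illustration, a local request could package this prompt together with a base64-encoded image for Ollama's /api/generate endpoint. This is a sketch, not Tag-AI's actual request code; the model name "llava" is an assumption:

```python
import base64

LLAVA_PROMPT = (
    "Please analyze the provided image and follow these steps:\n"
    "1. Look at the image.\n"
    "2. List a minimum of 15 distinct tags that capture specific "
    "attributes of the image.\n"
    "3. Return the tags in a single line separated by a comma and a space."
)

def build_llava_request(image_bytes, prompt=LLAVA_PROMPT, model="llava"):
    """Build a JSON-serializable payload for Ollama's /api/generate endpoint.
    Ollama expects images as base64 strings; stream=False returns one response."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }
```

The payload would then be POSTed to the local Ollama server (by default http://localhost:11434/api/generate) and the comma-separated tag line read from the response.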

Gemini Prompt

Generate comma-separated detailed tags for this image, describing subject, context, and visual details.

Post-Processing

After receiving tags from either model, Tag-AI:

  1. Validates the tag format
  2. Applies confidence thresholds (for Gemini)
  3. Limits to the maximum number of tags
  4. Stores them in the database
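Steps 1 and 3 above can be sketched as a small cleanup pass over the model's comma-separated output (postprocess_tags and the max_tags default are illustrative; Tag-AI's actual limit and validation rules are not documented here):

```python
def postprocess_tags(raw, max_tags=25):
    """Parse a comma-separated tag line into a clean, de-duplicated list,
    capped at max_tags entries. Empty fragments and repeats are dropped."""
    tags = []
    seen = set()
    for tag in raw.split(","):
        tag = tag.strip().lower()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags[:max_tags]
```

The cleaned list is what would then be written to the database alongside the image record.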

Custom Models

Using Different Ollama Models

Advanced users can modify the configuration to use different local models:

  1. Ensure you have pulled the alternative model with Ollama:
    ollama pull bakllava or ollama pull llava-v1.6-34b
  2. Open the Configuration Editor
  3. In the [tagger_ollama] section, change model_name to your preferred model
  4. Save the changes
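After pulling an alternative model, the relevant config change might look like this (the [tagger_ollama] section and model_name key are named in the steps above; the exact file layout is an assumption):

```ini
[tagger_ollama]
; must be a multimodal vision-language model pulled via "ollama pull"
model_name = bakllava
```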

Only multimodal vision-language models will work. Pure language models without vision capabilities will fail.

Recommended Alternative Models

Other models can be found on the Ollama model library.