Tag-AI supports two AI models for generating image tags: local processing with LLaVA and cloud processing with Google's Gemini API.
Each model has distinct advantages and ideal use cases. You can switch between them based on your needs.
LLaVA (Large Language and Vision Assistant) is a local, privacy-focused multimodal model that runs on your own computer.
LLaVA typically generates 15-20 tags per image, focusing on:
mountain, landscape, snow, trees, forest, sky, cloudy, nature, outdoor, scenery, wilderness, tranquil, valley, peak, rocky, winter, evergreen, alpine, daylight
Google's Gemini API is a cloud-based vision model offering high-quality image analysis.
Gemini typically generates 30-50 tags per image, with more specific identification of:
mountain, alpine, snow-capped, evergreen trees, coniferous forest, valley, clouds, overcast, dramatic landscape, wilderness, nature, outdoors, hiking destination, mountain range, rocky terrain, alpine meadow, mountain trail, scenic vista, photography, tranquil scene, panoramic view, natural beauty, backpacking, mountaineering, pristine, conservation, national park, ecological diversity, winter scene, environmental photography
| Feature | Local (LLaVA) | Cloud (Gemini) |
|---|---|---|
| Privacy | Excellent (fully local) | Limited (cloud processing) |
| Tag Quality | Good | Excellent |
| Tag Quantity | 15-20 tags per image | 30-50 tags per image |
| Processing Speed | Depends on hardware | Depends on internet connection |
| Resource Usage | High (local CPU/GPU) | Low (cloud-based) |
| Internet Required | Only for setup | Always |
| Cost | Free (after Tag-AI purchase) | Free tier or paid API subscription |
| Rate Limits | None (hardware-dependent) | Yes (API quota) |
To switch between tagging models, select one of the following options:

- `local`: use LLaVA local processing
- `gemini`: use Google's Gemini API

If selecting Gemini, ensure you've configured your API key in the `[tagger_gemini]` section.
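A minimal configuration sketch follows. Apart from the `[tagger_gemini]` section named above, the section and key names here are illustrative assumptions, not confirmed Tag-AI settings:

```ini
[tagger]
; hypothetical key name; check the actual Tag-AI configuration reference
model = gemini

[tagger_gemini]
; required when the Gemini model is selected
api_key = YOUR_GEMINI_API_KEY
```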
Before sending images to either model, Tag-AI:
Tag-AI uses a specific prompt for each model.

LLaVA prompt:

```
Please analyze the provided image and follow these steps:
1. Look at the image.
2. List a minimum of 15 distinct tags that capture specific attributes of the image.
3. Return the tags in a single line separated by a comma and a space.
```

Gemini prompt:

```
Generate comma-separated detailed tags for this image, describing subject, context, and visual details.
```
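As a rough sketch of how a prompt like this could be sent to a local LLaVA model, the payload below targets Ollama's `/api/generate` HTTP endpoint (a documented Ollama interface; this is not Tag-AI's actual internal code):

```python
import base64

LLAVA_PROMPT = (
    "Please analyze the provided image and follow these steps:\n"
    "1. Look at the image.\n"
    "2. List a minimum of 15 distinct tags that capture specific attributes of the image.\n"
    "3. Return the tags in a single line separated by a comma and a space."
)

def build_ollama_request(image_bytes: bytes, model: str = "llava") -> dict:
    """Build the JSON payload for a POST to http://localhost:11434/api/generate."""
    return {
        "model": model,
        "prompt": LLAVA_PROMPT,
        # Ollama expects images as base64-encoded strings
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # return one complete response instead of a token stream
        "stream": False,
    }
```

Posting this payload (with real image bytes) to a running Ollama instance returns the model's tag string in the response's `response` field.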
After receiving tags from either model, Tag-AI:
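Both models return a single comma-separated string, so post-processing amounts to splitting and normalizing it. A minimal sketch (the exact cleanup Tag-AI performs is not specified here) might look like:

```python
def clean_tags(raw: str) -> list[str]:
    """Split a comma-separated tag string, then lowercase, trim, and
    de-duplicate the tags while preserving their original order."""
    seen = set()
    tags = []
    for part in raw.split(","):
        tag = part.strip().lower()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags

print(clean_tags("Mountain, snow,  Trees, mountain, "))
# → ['mountain', 'snow', 'trees']
```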
Advanced users can modify the configuration to use a different local model, for example:

```
ollama pull bakllava
ollama pull llava-v1.6-34b
```
Only multimodal vision-language models will work. Pure language models without vision capabilities will fail.
Other models can be found in the Ollama model library.
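After pulling an alternative model, point the configuration at it. The section and key names below are assumptions for illustration only; consult the actual Tag-AI configuration reference for the real option names:

```ini
[tagger_local]
; hypothetical section/key; must name a multimodal model already pulled via ollama
model = bakllava
```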