Troubleshooting#
Containers have started but the application is not working#
You can try resetting the volume storage by deleting the previously created volumes. Note that this step does not apply when you are setting up the application for the first time.

```bash
source setup.sh --clean-data
```
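If you want to see which volumes would be deleted before running the cleanup, you can list the volumes Docker has created for the application first (the exact names depend on your Compose project):

```bash
# List all Docker volumes; the application's volumes typically carry a
# Compose project prefix (exact names depend on your setup)
docker volume ls
```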
Search returns no results after changing embedding model#
Problem: The UI displays “No videos found matching your search query. Try using different keywords or check if videos have been uploaded.” even though videos were ingested in `--search` or `--all` mode.
Cause: Either no videos have been processed yet, or the embedding model was switched to one with a different embedding dimension. Previously indexed vectors stay in the database, and their dimensions must match the active model. A mismatch prevents similarity lookups from returning any results.
Solution:
1. Verify that at least one video has been uploaded or that a summary run has completed after the model change.
2. If you recently changed `EMBEDDING_MODEL_NAME`, re-run ingestion so embeddings are recreated with the new dimensions. You can clean existing data with `source setup.sh --clean-data` and then re-run your desired mode, as shown in the sketch after this list.
3. Review the supported embedding models and their dimensions in `microservices/multimodal-embedding-serving/docs/user-guide/supported-models.md` before switching models.
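Putting these steps together, one possible sequence after a model change looks like this (the model name below is only a placeholder; pick one from the supported-models list):

```bash
# Stop the application
source setup.sh --down

# Wipe previously indexed embeddings (their dimensions no longer match)
source setup.sh --clean-data

# Point the stack at the new embedding model (placeholder name)
export EMBEDDING_MODEL_NAME="<supported-model-name>"

# Re-run your desired mode so videos are re-ingested and re-embedded
source setup.sh --search
```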
VLM Microservice Model Loading Issues#
Problem: The VLM microservice fails to load or save models due to permission errors, or you see errors related to model access in the logs.
Cause: This issue occurs when the ov-models Docker volume was created with incorrect ownership (root user) in previous versions of the application. The VLM microservice runs as a non-root user and requires proper permissions to read/write models.
Symptoms:

- VLM microservice container fails to start or crashes during model loading
- Permission denied errors in the VLM service logs
- Model conversion or caching failures
- Error messages mentioning `/home/appuser/.cache/huggingface` or `/app/ov-model` access issues
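To confirm that volume ownership is the culprit, you can inspect the owner of the volume contents from a throwaway container. This is a diagnostic sketch; it assumes the volume names used by this application, and note that mounting a named volume creates it if it does not already exist:

```bash
# Show the numeric owner of the volume contents. An owner UID of 0 (root)
# while the VLM service runs as a non-root user indicates this issue.
docker run --rm -v ov-models:/vol alpine ls -ldn /vol
docker run --rm -v docker_ov-models:/vol alpine ls -ldn /vol
```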
Solution:
1. Stop the running application:

   ```bash
   source setup.sh --down
   ```

2. Remove the existing `ov-models` (old volume name) and `docker_ov-models` (updated volume name) Docker volumes:

   ```bash
   docker volume rm ov-models docker_ov-models
   ```

3. Restart the application (the volume will be recreated with correct permissions):

   ```bash
   # For Video Summarization
   source setup.sh --summary

   # Or for Video Search
   source setup.sh --search
   ```
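If `docker volume rm` reports that one of the names does not exist, that volume was never created on your system and the error can be ignored; if it reports that a volume is in use, make sure the application is fully stopped first. To double-check that both volumes are gone before restarting (`name=` performs a substring match, so this covers both names):

```bash
# No output means both volumes were removed successfully
docker volume ls --filter name=ov-models
```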
Note: Removing the `ov-models` or `docker_ov-models` volume will delete any previously cached or converted models. The VLM service will automatically re-download and convert models on the next startup, which may take additional time depending on your internet connection and the model size.
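To watch the re-download and conversion progress on the next startup, you can follow the VLM service logs. The container name depends on your Compose project, so the `vlm` name filter below is an assumption; adjust it for your setup:

```bash
# Find the VLM container name (the "vlm" filter is an assumption)
docker ps --filter "name=vlm" --format "{{.Names}}"

# Follow its logs while models are downloaded and converted
docker logs -f <vlm-container-name>
```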
Prevention: This issue has been fixed in the current version of the VLM microservice Dockerfile. New installations will automatically create the volume with correct permissions.
VLM Final Summary Hallucination Issues#
Problem: The final summary generated by the VLM microservice contains hallucinated or inaccurate information that doesn’t reflect the actual video content.
Cause: This issue can occur when using smaller VLM models that may not have sufficient capacity to accurately process and summarize complex video content, leading to generation of plausible but incorrect information.
Symptoms:

- The final summary contains information not present in the video
- The summary describes events, objects, or activities that don’t actually occur in the video
- Inconsistent or contradictory information in the generated summary
- Poor overall summary quality despite the chunk-wise summaries being accurate
Solution:
Try using a larger, more capable VLM model by updating the `VLM_MODEL_NAME` environment variable:

1. Stop the running application:

   ```bash
   source setup.sh --down
   ```

2. Set a larger VLM model (e.g., upgrade from 3B to 7B parameters):

   ```bash
   export VLM_MODEL_NAME="Qwen/Qwen2.5-VL-7B-Instruct"
   ```

3. Restart the application:

   ```bash
   source setup.sh --summary
   ```
Alternative Models to Try:

- For CPU: `Qwen/Qwen2.5-VL-7B-Instruct` (larger version)
- For GPU: Consider other supported VLM models with higher parameter counts
Note: Larger models will require more system resources (RAM or VRAM) and may have longer inference times, but typically provide more accurate and coherent summaries.
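Before switching to a larger model, it can help to confirm that the host has enough free memory and disk space for the bigger weights. A quick check on Linux (`/var/lib/docker` is Docker's default data root; adjust if yours differs):

```bash
# Available system memory (larger VLMs need significantly more RAM)
free -h

# Free disk space where Docker stores volumes and cached models
df -h /var/lib/docker
```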