Troubleshooting#

Containers have started but the application is not working#

You can try resetting the volume storage by deleting the previously created volumes:

Note that this step does not apply when you are setting up the application for the first time.

source setup.sh --clean-data
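If resetting the volumes does not help, it is often quicker to find out which container is actually failing before wiping anything else. A minimal check (container names vary by deployment, so the name below is a placeholder):

```shell
# List all containers, including stopped ones, to spot crashed services
docker ps -a

# Inspect the most recent log lines of a suspect container
# (substitute the actual container name)
docker logs <container-name> 2>&1 | tail -50
```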

OpenGL/Mesa Library Dependencies (Certain Kernel Versions)#

On some Linux systems with certain kernel versions, you may encounter OpenCV-related errors due to missing OpenGL/Mesa libraries. If you experience issues with the summary stack or video processing, try installing the following dependencies:

sudo apt update
sudo apt install libgl1-mesa-dri libgl1-mesa-dev

After installing these dependencies:

  1. Remove the ov_models/ directory (if it exists)

  2. Redeploy the VSS stack using the latest tagged images

  3. Rerun your tests

This should resolve OpenCV-related dependency issues and allow the summary stack to work as expected.
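Putting the whole recovery together, the sequence can be sketched as follows, assuming `ov_models/` sits in your working directory and `setup.sh --summary` is the redeploy command used elsewhere in this guide:

```shell
# Install the Mesa OpenGL runtime and development packages
sudo apt update
sudo apt install libgl1-mesa-dri libgl1-mesa-dev

# Optionally confirm libGL is now visible to the dynamic linker
ldconfig -p | grep libGL

# Remove cached models so they are regenerated against the new libraries
rm -rf ov_models/

# Redeploy the stack with the latest tagged images and rerun your tests
source setup.sh --summary
```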

Search returns no results after changing embedding model#

Problem: The UI displays “No videos found matching your search query. Try using different keywords or check if videos have been uploaded.” even though videos were ingested in --search or --all mode.

Cause: Either no videos have been processed yet, or the embedding model was switched to one with a different embedding dimension. Previously indexed vectors stay in the database, and their dimensions must match the active model. A mismatch prevents similarity lookups from returning any results.

Solution:

  1. Verify that at least one video has been uploaded, or that a summary run has completed, after the model change.

  2. If you recently changed EMBEDDING_MODEL_NAME, re-run ingestion so embeddings are recreated with the new dimensions. You can clean existing data with source setup.sh --clean-data and then re-run your desired mode.

  3. Review the supported embedding models and their dimensions in Supported Models for Multimodal Embedding Serving before switching models.
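The recovery in steps 1–2 can be sketched as the following sequence; the model name is a placeholder, not a recommendation, so substitute one of the supported embedding models:

```shell
# Stop the stack and clear previously indexed vectors
source setup.sh --down
source setup.sh --clean-data

# Point the stack at the new embedding model
export EMBEDDING_MODEL_NAME="<new-embedding-model>"

# Re-run ingestion in the mode you use so vectors are rebuilt
# with the new embedding dimensions
source setup.sh --search
```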

VLM Microservice Model Loading Issues#

Problem: VLM microservice fails to load or save models with permission errors, or you see errors related to model access in the logs.

Cause: This issue occurs when the ov-models Docker volume was created with incorrect ownership (root user) in previous versions of the application. The VLM microservice runs as a non-root user and requires proper permissions to read/write models.

Symptoms:

  • VLM microservice container fails to start or crashes during model loading

  • Permission denied errors in VLM service logs

  • Model conversion or caching failures

  • Error messages mentioning /home/appuser/.cache/huggingface or /app/ov-model access issues

Solution:

  1. Stop the running application:

    source setup.sh --down
    
  2. Remove the existing ov-models (old volume name) and docker_ov-models (updated volume name) Docker volumes:

    docker volume rm ov-models docker_ov-models
    
  3. Restart the application (the volume will be recreated with correct permissions):

    # For Video Summarization
    source setup.sh --summary
    
    # Or for Video Search
    source setup.sh --search
    

Note: Removing the ov-models or docker_ov-models volume will delete any previously cached or converted models. The VLM service will automatically re-download and convert models on the next startup, which may take additional time depending on your internet connection and the model size.
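After step 2, you can confirm that both volumes are actually gone before restarting; the command below prints nothing when the removal succeeded:

```shell
# An empty result means neither volume exists anymore
docker volume ls --format '{{.Name}}' | grep -E '^(ov-models|docker_ov-models)$'
```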

Prevention: This issue has been fixed in the current version of the VLM microservice Dockerfile. New installations will automatically create the volume with correct permissions.

VLM Final Summary Hallucination Issues#

Problem: The final summary generated by the VLM microservice contains hallucinated or inaccurate information that doesn’t reflect the actual video content.

Cause: This issue can occur when using smaller VLM models that may not have sufficient capacity to accurately process and summarize complex video content, causing them to generate plausible but incorrect information.

Symptoms:

  • The final summary contains information not present in the video

  • The summary describes events, objects, or activities that don’t actually occur in the video

  • Inconsistent or contradictory information in the generated summary

  • The summary quality is poor despite the chunk-wise summaries being accurate

Solution: Try using a larger, more capable VLM model by updating the VLM_MODEL_NAME environment variable:

  1. Stop the running application:

    source setup.sh --down
    
  2. Set a larger VLM model (e.g., upgrade from 3B to 7B parameters):

    export VLM_MODEL_NAME="Qwen/Qwen2.5-VL-7B-Instruct"
    
  3. Restart the application:

    source setup.sh --summary
    

Alternative Models to Try:

  • For CPU: Qwen/Qwen2.5-VL-7B-Instruct (larger version)

  • For GPU: Consider other supported VLM models with higher parameter counts

Note: Larger models will require more system resources (RAM or VRAM) and may have longer inference times, but typically provide more accurate and coherent summaries.
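Before upgrading, it can help to check whether the host has the headroom; as a rough rule of thumb, a 7B-parameter model in FP16 needs about 14 GB for the weights alone (quantized variants need considerably less):

```shell
# Available system RAM (relevant for CPU inference)
free -h

# For Intel GPUs, xpu-smi reports device memory usage if it is installed
# xpu-smi stats -d 0
```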

Final Summary Stuck or OVMS Container Stopped#

Problem: The final video summary remains in a “Ready” or “In Progress” state indefinitely and never completes.

Cause: The OVMS (OpenVINO Model Server) container may have crashed or the LLM request may have been rejected because the prompt size plus the requested max_completion_tokens exceeds the model’s maximum context length. For example, if a model supports a 4096-token context window and the application requests 4000 completion tokens, even a modest prompt (~300 tokens) will exceed the limit.
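The budget is simple arithmetic: prompt tokens plus max_completion_tokens must not exceed the model’s context length. For the example above:

```shell
CONTEXT_LENGTH=4096   # model's maximum context window
PROMPT_TOKENS=300     # approximate prompt size
MAX_COMPLETION=4000   # requested completion tokens

# Total request size exceeds the window: 4300 > 4096
echo "total: $((PROMPT_TOKENS + MAX_COMPLETION)) / limit: $CONTEXT_LENGTH"

# Largest completion budget that still fits this prompt: 3796
echo "safe max_completion_tokens: $((CONTEXT_LENGTH - PROMPT_TOKENS))"
```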

Symptoms:

  • Final summary status stays at “Ready” or “In Progress” and never progresses

  • OVMS container has exited (shows as stopped in docker ps -a)

  • OVMS logs contain errors like: Number of prompt tokens: <N> + max tokens value: <M> exceeds model max length: <L>

  • OVMS logs contain CL_OUT_OF_RESOURCES or similar GPU memory errors

Diagnosis:

  1. Check if the OVMS container is still running:

    docker ps -a | grep ovms
    
  2. If the container has stopped or is in an exited state, check its logs:

    docker logs <ovms-container-name> 2>&1 | tail -50
    
  3. Look for errors related to token limits or resource exhaustion in the log output.

Solution:

  • If the logs show a token limit exceeded error, either reduce SUMMARIZATION_MAX_COMPLETION_TOKENS in your environment configuration, or switch to a model with a larger context window.

  • If the logs show GPU resource errors, see the section below on GPU memory issues.

  • After fixing the configuration, restart the application:

    source setup.sh --down
    source setup.sh --summary
    

Smaller Models May Block Final Summary Due to Limited Context Window#

Problem: The final video summary fails or hangs when using a smaller VLM/LLM model.

Cause: Smaller models often have a limited context window (e.g., 4096 tokens). When the combined prompt tokens and requested max_completion_tokens exceed this limit, the inference backend rejects the request and the final summary never completes.

Symptoms:

  • Final summary status stays at “Ready” or “In Progress” indefinitely

  • OVMS logs show errors such as: Number of prompt tokens: <N> + max tokens value: <M> exceeds model max length: <L>

  • The chunk-wise summaries complete successfully but the final summary does not

Solution:

Reduce PM_SUMMARIZATION_MAX_COMPLETION_TOKENS to a value below the default of 4000 so that the prompt plus completion tokens fit within the model’s context window:

export PM_SUMMARIZATION_MAX_COMPLETION_TOKENS=2000
source setup.sh --summary

Alternatively, switch to a model with a larger context window.

VLM Workload Fails on NPU#

Problem: The VLM model fails to load or run when VLM_TARGET_DEVICE or LLM_TARGET_DEVICE is set to NPU.

Cause: Not all VLM/LLM models are compatible with NPU execution. NPU support depends on the model architecture and the OpenVINO version installed.

Symptoms:

  • OVMS container crashes or fails to start when targeting NPU

  • Inference errors or unsupported-operation messages in OVMS logs

  • Model conversion succeeds but inference produces errors

Solution:

  1. Verify that your model is listed on the OpenVINO Supported Models page for NPU execution.

  2. If the model is not supported on NPU, switch to a supported model or fall back to CPU/GPU (set LLM_TARGET_DEVICE instead if the LLM is the model targeting NPU):

    export VLM_TARGET_DEVICE="CPU"
    source setup.sh --summary
    

GPU Out-of-Resources When Loading Multiple Models#

Problem: OVMS crashes or fails inference when multiple models (e.g., VLM + LLM) are loaded on the same GPU.

Cause: Loading multiple large models on a single GPU can exceed the available device memory. When the GPU runs out of resources during inference, the OpenCL runtime returns CL_OUT_OF_RESOURCES and OVMS terminates the request or crashes.

Symptoms:

  • OVMS container exits unexpectedly or restarts repeatedly

  • OVMS logs contain errors like:

    onednn_verbose,v1,primitive,error,ocl,errcode -5,CL_OUT_OF_RESOURCES
    Exception from src/plugins/intel_gpu/src/graph/impls/onednn/primitive_onednn_base.h
    Error occurred in LLM executor
    
  • Inference requests hang and then fail

  • Only one model works at a time but loading both causes failures

Solution:

  1. Distribute models across devices: run the VLM on GPU and the LLM on CPU (or vice versa) to avoid competing for GPU memory. Adjust the device settings in your environment configuration accordingly.

  2. Use smaller model variants: switch to quantized or lower-parameter models that consume less GPU memory.

  3. Increase GPU resources: if available, use a GPU with more memory.

  4. After making changes, restart the application:

    source setup.sh --down
    source setup.sh --summary
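The device split in step 1 can be sketched with the target-device variables used elsewhere in this guide; which model gets the GPU is a judgment call for your workload:

```shell
# Keep the VLM on GPU and move the LLM to CPU so they no longer share VRAM
export VLM_TARGET_DEVICE="GPU"
export LLM_TARGET_DEVICE="CPU"

# Restart the stack with the new device assignment
source setup.sh --down
source setup.sh --summary
```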