📁 Directory Watcher Service Guide

Overview

The Directory Watcher service is an automated video ingestion system that monitors a specified directory for new video files and automatically processes them for search indexing. This service is designed to work exclusively with the Video Search mode of the application.

How It Works

The Directory Watcher service provides the following functionality:

🔍 File Monitoring

  • Watches for MP4 files: Monitors a specified directory for new .mp4 video files

  • File size filtering: Only processes files larger than 512KB (524,288 bytes) to avoid incomplete or corrupted files

  • Real-time detection: Automatically detects when files are created or modified in the watched directory

  • Recursive monitoring: Optionally monitors subdirectories when VS_WATCH_DIRECTORY_RECURSIVE=true is set
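To preview which files these rules would admit, the filter can be approximated from the host shell. This is a sketch rather than the service's own code: `list_eligible_videos` is a hypothetical helper, it assumes the `.mp4` match is case-sensitive, and it relies on the `-maxdepth` option found in GNU and BSD `find`.

```shell
# Hypothetical helper: list files that pass the documented filter
# (.mp4 extension and strictly larger than 524,288 bytes).
# Usage: list_eligible_videos DIR [true|false]   (second argument: recurse?)
list_eligible_videos() {
  local dir=$1 recursive=${2:-false}
  if [ "$recursive" = "true" ]; then
    # No depth limit: mirrors VS_WATCH_DIRECTORY_RECURSIVE=true
    find "$dir" -type f -name '*.mp4' -size +524288c
  else
    # Top level only: mirrors the default, non-recursive behavior
    find "$dir" -maxdepth 1 -type f -name '*.mp4' -size +524288c
  fi
}
```

For example, `list_eligible_videos /home/user/videos true` prints every qualifying file under the directory, including subdirectories. A file that is missing from the output is either too small or does not end in `.mp4`.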

⏱️ Debounced Processing

  • Debounce mechanism: Groups file events together to avoid processing files multiple times during rapid file operations

  • Configurable delay: Waits for a configurable debounce time (DEBOUNCE_TIME, default: 10 seconds) before processing detected files

  • Batch processing: Processes multiple files together for efficiency
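The debounce idea can be illustrated with a small bash function. This is only a sketch of the general technique, not the service's implementation: `debounce_paths` is a hypothetical name, it assumes the timer restarts on every new event (the guide does not say whether it does), and fractional timeouts need bash 4 or later.

```shell
# Hypothetical debounce sketch: read one path per line from stdin and
# print a batch only after no new line has arrived for WINDOW seconds.
# Usage: some_event_source | debounce_paths 10
debounce_paths() {
  local window=${1:-10} line status
  local -a pending=()
  while true; do
    if IFS= read -r -t "$window" line; then
      pending+=("$line")              # new event: keep collecting
    else
      status=$?                       # >128 means timeout, otherwise end of input
      if (( ${#pending[@]} > 0 )); then
        # Quiet period elapsed (or input ended): emit each path once.
        printf '%s\n' "${pending[@]}" | sort -u | paste -sd' ' -
        pending=()
      fi
      (( status > 128 )) || break
    fi
  done
}
```

Repeated events for the same file within one window collapse into a single entry, which is why a longer window reduces duplicate processing during slow copies.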

🚀 Automatic Upload and Indexing

  • Two-step process:

    1. Uploads video files to the data preparation service

    2. Generates search embeddings for the uploaded videos

  • Retry mechanism: Implements exponential backoff retry logic (up to 3 attempts) for failed uploads

  • Status tracking: Maintains upload status with total, completed, and pending file counts
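The retry behavior can be pictured with a generic exponential backoff wrapper. The guide states only the limit of 3 attempts, so the base delay, the doubling factor, and the name `retry_with_backoff` are assumptions made for illustration.

```shell
# Hypothetical sketch of retry with exponential backoff.
# Usage: retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS command [args...]
# BASE_DELAY_SECONDS must be an integer; the delay doubles after each failure.
retry_with_backoff() {
  local max_attempts=$1 base_delay=$2
  shift 2
  local attempt=1 delay=$base_delay
  while true; do
    "$@" && return 0                  # success: stop retrying
    if (( attempt >= max_attempts )); then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))            # 1s, 2s, 4s, ... for a base delay of 1
    attempt=$(( attempt + 1 ))
  done
}
```

With `retry_with_backoff 3 1 some_upload_command`, a command that keeps failing is run three times, with pauses of 1 and 2 seconds between runs, before the wrapper returns a non-zero status.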

🗂️ Initial Directory Processing

  • Bulk upload: Optionally processes all existing MP4 files in the watched directory on startup

  • Batch processing: Processes existing files in batches of 10 for optimal performance

  • Thread-based: Uses separate threads for non-blocking batch uploads
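To see how a directory's existing files would split into groups of 10, the batching can be imitated with `find` and `xargs`. This is a sketch under assumptions: `print_batches` is a hypothetical helper, it ignores the file size filter, and the order of files within a batch is whatever `find` returns, which may differ from the service's order.

```shell
# Hypothetical helper: show how the .mp4 files in DIR split into batches.
# Usage: print_batches DIR [BATCH_SIZE]   (batch size defaults to 10)
print_batches() {
  local dir=$1 size=${2:-10}
  # -print0 / -0 keep filenames with spaces intact;
  # -r skips the command entirely when no files are found.
  find "$dir" -maxdepth 1 -type f -name '*.mp4' -print0 |
    xargs -0 -r -n "$size" sh -c 'echo "batch of $#: $*"' _
}
```

A directory with 23 videos yields three lines: two batches of 10 and one batch of 3.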

Configuration

The Directory Watcher service is configured through environment variables. The service is automatically enabled.

Optional Configuration (Override Default Values)

The following environment variables have default values and only need to be set if you want to override them:

# Path to the directory to watch on the host system (enables directory watcher)
# Default: "edge-ai-libraries/sample-applications/video-search-and-summarization/data"
export VS_WATCHER_DIR="/path/to/your/video/directory"

# Time to wait before processing detected files (in seconds)
# Default: 10
export DEBOUNCE_TIME=10

# Process all existing files in the `VS_WATCHER_DIR` directory on startup
# Default: false
export VS_INITIAL_DUMP=true

# Delete processed files from `VS_WATCHER_DIR` after successful upload
# Default: false
export DELETE_PROCESSED_FILES=true

# Enable recursive monitoring of subdirectories
# Default: false
export VS_WATCH_DIRECTORY_RECURSIVE=true

Note: Export these variables only if you want to change the default behavior. The service runs with the default values when only VS_WATCHER_DIR is set.
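Before starting the application, it can help to print the values that would take effect in the current shell. The function below is a convenience sketch, not part of the application: `show_watcher_config` is a hypothetical name, the fallbacks are the defaults listed in this section, and an empty variable is treated as unset, which may differ from how the service reads it.

```shell
# Hypothetical helper: print each watcher setting, falling back to the
# documented default when the variable is unset or empty.
show_watcher_config() {
  echo "VS_WATCHER_DIR=${VS_WATCHER_DIR:-edge-ai-libraries/sample-applications/video-search-and-summarization/data}"
  echo "DEBOUNCE_TIME=${DEBOUNCE_TIME:-10}"
  echo "VS_INITIAL_DUMP=${VS_INITIAL_DUMP:-false}"
  echo "DELETE_PROCESSED_FILES=${DELETE_PROCESSED_FILES:-false}"
  echo "VS_WATCH_DIRECTORY_RECURSIVE=${VS_WATCH_DIRECTORY_RECURSIVE:-false}"
}
```

Running `show_watcher_config` after the `export` lines confirms that the overrides are visible in the shell that will launch the application.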

Usage Instructions

Setting Up Directory Watching

  1. Optional: Create a directory on your host system to store video files:

    mkdir -p /home/user/videos

    Note: Create the directory as a normal user, not as the root user.

  2. Optional: Point the watcher at that directory (skip this step to use the default path):

    export VS_WATCHER_DIR="/home/user/videos"
    
  3. Optional: Override default settings if needed:

    # Only set these if you want to change default behavior
    export VS_INITIAL_DUMP=true              # Process existing files on startup
    export DELETE_PROCESSED_FILES=true       # Remove files after processing
    export DEBOUNCE_TIME=15                  # Wait 15 seconds instead of default 10
    export VS_WATCH_DIRECTORY_RECURSIVE=true    # Monitor subdirectories recursively
    
  4. Start the application with directory watching enabled:

    source setup.sh --search
    

Adding Videos for Processing

Once the service is running, simply copy or move MP4 files to your watched directory:

# Copy videos to the watched directory
cp /path/to/your/videos/*.mp4 /home/user/videos/

# Or move videos
mv /path/to/source/*.mp4 /home/user/videos/

# If recursive monitoring is enabled, you can also organize videos in subdirectories
mkdir -p /home/user/videos/category1 /home/user/videos/category2
cp /path/to/category1/*.mp4 /home/user/videos/category1/
cp /path/to/category2/*.mp4 /home/user/videos/category2/

The service will automatically:

  1. Detect the new files

  2. Wait for the debounce period to ensure file operations are complete

  3. Upload and process the videos for search indexing

  4. Make them available for search queries through the application UI

Monitoring Upload Status

The Directory Watcher provides upload status information that can be monitored through the application logs:

  • Total files: Total number of files detected for processing

  • Completed files: Number of successfully processed files

  • Pending files: Number of files waiting to be processed

  • Last updated timestamp: When the last file processing operation completed

File Processing Flow

graph TD
    A[New MP4 file detected] --> B{File size > 512KB?}
    B -->|No| C[Ignore file]
    B -->|Yes| D[Add to processing queue]
    D --> E[Wait for debounce period]
    E --> F[Start batch processing]
    F --> G[Upload video to dataprep service]
    G --> H{Upload successful?}
    H -->|No| I[Retry with exponential backoff]
    I --> H
    H -->|Yes| J[Generate search embeddings]
    J --> K{Embeddings successful?}
    K -->|No| I
    K -->|Yes| L[Mark as completed]
    L --> M{DELETE_PROCESSED_FILES enabled?}
    M -->|Yes| N[Delete original file]
    M -->|No| O[Keep original file]
    N --> P[File ready for search]
    O --> P

Best Practices

File Management

  • Use descriptive filenames: This helps with identification and debugging

  • Ensure sufficient disk space: Both for incoming files and potential processing overhead

  • Consider file cleanup: Enable DELETE_PROCESSED_FILES=true if you don't need to keep original files

Performance Optimization

  • Adjust debounce time: Increase DEBOUNCE_TIME if you frequently add multiple files simultaneously

  • Monitor system resources: Large video files require significant processing power and memory

  • Batch file additions: Add multiple files at once rather than one-by-one for better efficiency

Monitoring and Troubleshooting

  • Check application logs for upload status and error messages

  • Verify directory permissions: Ensure the application has read access to the watched directory

  • Confirm network connectivity: Ensure the video upload endpoint is accessible

  • Monitor disk space: Processing requires temporary storage for video analysis

Limitations

  • MP4 files only: Currently supports only MP4 video format

  • Search mode only: Directory watching is not available in summary or combined modes

  • File size minimum: Files must be larger than 512KB to be processed

  • Local directory only: Watches local filesystem directories, not remote or cloud storage

  • Subdirectory monitoring: By default, only monitors the specified directory. Enable recursive monitoring with VS_WATCH_DIRECTORY_RECURSIVE=true to include subdirectories

Troubleshooting

Common Issues

Files not being processed:

  • Verify VS_WATCHER_DIR is set and points to an accessible directory

  • Check that files are MP4 format and larger than 512KB

  • Ensure the watched directory path is correct and accessible

  • Review application logs for error messages

Upload failures:

  • Check network connectivity and proxy settings

  • Review retry attempts in the logs

  • Ensure the data preparation service is running

Permission errors:

  • Verify read permissions on the watched directory

  • Check that the container has appropriate file system access

  • Ensure the directory exists and is mounted correctly
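These checks can be run quickly from the host shell. The function below is a rough sketch: `check_watch_dir` is a hypothetical helper, and it tests access for the current host user only, so a container that runs as a different user may still be denied access.

```shell
# Hypothetical helper: basic sanity checks on the watched directory.
# Usage: check_watch_dir [DIR]   (defaults to $VS_WATCHER_DIR)
check_watch_dir() {
  local dir=${1:-${VS_WATCHER_DIR:-}}
  if [ -z "$dir" ]; then
    echo "no directory given and VS_WATCHER_DIR is not set" >&2
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  # Listing files needs both read and search (execute) permission.
  if [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
    echo "current user cannot read or enter: $dir" >&2
    return 1
  fi
  # -O is true when the directory is owned by the current user.
  if [ ! -O "$dir" ]; then
    echo "warning: $dir is owned by another user (created as root?)" >&2
  fi
  echo "ok: $dir"
}
```

A passing check on the host does not prove that the directory is mounted into the container, so the application logs remain the final authority.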

High resource usage:

  • Reduce the number of simultaneous file additions

  • Increase DEBOUNCE_TIME to reduce processing frequency

  • Monitor system memory and CPU usage during processing

For additional troubleshooting, refer to the application logs and the main troubleshooting section of the getting started guide.