2-Tier Hybrid Search Implementation#

Overview#

This document describes the implementation of 2-tier hybrid search for Re-ID (Re-Identification) in the Scene Controller, as specified in ADR-0010.

Architecture: TIER 1 (metadata filtering) + TIER 2 (vector similarity)

VDMS Query Flow:

  sscape_object with semantic metadata (age, gender, color, etc.)
    ↓
  Extract semantic attributes via _extractSemanticMetadata()
    ↓
  sendSimilarityQuery() calls findMatches() with constraints
    ↓
  TIER 1: VDMS applies metadata constraints (exact-match filtering)
    "Find entries where type='Person' AND gender='Female' AND age='22'"
    ↓
  TIER 2: VDMS performs vector similarity on filtered candidates
    "Compute L2 distance between query vector and filtered candidates"
    ↓
  Return top-k matches with metadata
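The flow above can be illustrated with a small in-memory sketch. This is not the VDMS implementation: the `two_tier_search` function, the entry layout (`vector` plus `metadata`), and the sample data are all hypothetical and only mirror the two steps, exact-match filtering followed by L2 ranking of the survivors.

```python
import math

def two_tier_search(query_vec, constraints, entries, k=3):
    """Hypothetical sketch: TIER 1 exact-match filter, then TIER 2 L2 ranking."""
    # TIER 1: keep only entries whose metadata equals every constraint value.
    candidates = [
        e for e in entries
        if all(e["metadata"].get(key) == value for key, value in constraints.items())
    ]
    # TIER 2: rank the filtered candidates by Euclidean (L2) distance, closest first.
    scored = [(math.dist(query_vec, e["vector"]), e) for e in candidates]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]

# Illustrative data only; real entries live in VDMS.
entries = [
    {"id": "a", "vector": [0.0, 0.0], "metadata": {"type": "Person", "gender": "Female", "age": "22"}},
    {"id": "b", "vector": [1.0, 1.0], "metadata": {"type": "Person", "gender": "Female", "age": "22"}},
    {"id": "c", "vector": [0.1, 0.1], "metadata": {"type": "Person", "gender": "Male", "age": "22"}},
]
constraints = {"type": "Person", "gender": "Female", "age": "22"}
matches = two_tier_search([0.9, 0.9], constraints, entries, k=2)
match_ids = [entry["id"] for _, entry in matches]
print(match_ids)
```

Entry "c" never reaches TIER 2 because it fails the gender constraint, even though its vector could otherwise be ranked.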

Key Concepts#

Confidence-Based Constraint Filtering (AND-Only)#

The 2-tier implementation uses metadata confidence scores to determine which constraints are applied in TIER 1 filtering. Only high-confidence (≥ 0.8) constraints are used for strict AND filtering. Low-confidence constraints are skipped in TIER 1, allowing TIER 2 vector similarity to handle flexible matching:

High Confidence (≥ 0.8)        Low Confidence (< 0.8)
        ↓                                ↓
    AND Constraint          IGNORED (rely on TIER 2)
        ↓                                ↓
   age = 22                       Skip
   AND gender = Female            ↓
        ↓                    Vector similarity
   TIER 1: Strict            finds matches
   metadata filter           based on embeddings

Why AND for high confidence only (≥ 0.8)?

  • Age and gender come from the same model (age-gender-recognition-retail-0013), and both typically score about 0.85-0.95 confidence

  • Combining multiple high-confidence attributes is very reliable (significantly fewer false positives)

  • Query: “Find Person where age=22 AND gender=Female” is specific and highly accurate

  • Reduces false matches by requiring ALL high-confidence attributes to align

Why ignore low confidence (< 0.8)?

  • VDMS limitations: OR constraints across multiple properties are not well-supported

  • Simplified design: Skip low-confidence filtering in TIER 1 entirely

  • TIER 2 vector similarity provides flexible matching instead

  • Query: “Find similar Persons” via vector embedding (ignores low-confidence metadata)

  • Better approach: Rely on embedding distance rather than unreliable metadata

Example:

Query: Person with age=25 (conf 0.92), gender=Male (conf 0.90), eyewear=glasses (conf 0.55)

TIER 1 Filtering: age=25 AND gender=Male (high confidence applied)
                  eyewear=glasses IGNORED (low confidence - below 0.8 threshold)

TIER 2 Matching: Vector similarity finds closest matches among TIER 1 filtered candidates
                 The embedding distance handles eyewear and other low-confidence attributes

Result: "Find strong age-gender matches, refined by vector similarity"
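The example above can be sketched as a small helper that splits attributes by confidence. The function name `build_tier1_constraints` and the `name -> (value, confidence)` input shape are assumptions made for illustration, and the `["==", value]` constraint shape is assumed to match the VDMS equality-constraint syntax rather than taken from the Scene Controller code.

```python
def build_tier1_constraints(attributes, threshold=0.8):
    """Hypothetical helper: split attributes into TIER 1 constraints and ignored ones.

    `attributes` maps name -> (value, confidence); this shape is an assumption.
    """
    applied, ignored = {}, []
    for name, (value, confidence) in attributes.items():
        if confidence >= threshold:
            # Assumed VDMS-style equality constraint: {"prop": ["==", value]}
            applied[name] = ["==", value]
        else:
            # Below the threshold: leave matching to TIER 2 vector similarity.
            ignored.append(name)
    return applied, ignored

attributes = {
    "age": (25, 0.92),
    "gender": ("Male", 0.90),
    "eyewear": ("glasses", 0.55),
}
applied, ignored = build_tier1_constraints(attributes)
print(applied)
print(ignored)
```

All entries in `applied` are combined with AND semantics; nothing in `ignored` is turned into an OR constraint.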

Backward Compatibility#

  • ✅ Objects without metadata continue to work (missing fields handled gracefully)

  • ✅ Old records (without metadata) can coexist with new records (with metadata)

  • ✅ No database migration needed when new metadata fields are added

  • ✅ Queries with partial constraints work (omitted fields skip that filtering)

Phase Evolution#

Phase 1 (Current): Initial Semantic Metadata#

  • Person: age, gender, person-attributes

  • Vehicle: color, make, model

  • Automatic extraction via _extractSemanticMetadata()

  • 2-tier queries with metadata filtering

Phase 2: Confidence Scores & Versioning#

  • Store confidence dicts: {"color": 0.95, "make": 0.88}

  • Add model name and versioning metadata: {"model_name": "age_gender", "model_version": "v2.1", "timestamp": "..."}

  • Application-level filtering on complex data types

Phase 3: Spatio-Temporal Tracking#

  • Add position/orientation: {"x": 123.45, "y": 456.78, "orientation": 45.0}

  • Add timestamp: {"timestamp": "2026-02-06T11:37:26.093Z"}

  • Spatial radius queries via application-level post-processing
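As a rough illustration of what application-level post-processing for spatial radius queries might look like, the sketch below filters already-returned matches by planar distance. The `within_radius` function and the match layout are hypothetical; Phase 3 is not implemented yet, so this only shows the general idea.

```python
import math

def within_radius(matches, center, radius):
    """Hypothetical post-filter: keep matches whose (x, y) lies within `radius` of `center`."""
    cx, cy = center
    return [m for m in matches if math.hypot(m["x"] - cx, m["y"] - cy) <= radius]

# Illustrative matches, as if returned by a similarity query with position metadata.
matches = [
    {"id": "a", "x": 123.45, "y": 456.78},
    {"id": "b", "x": 200.0, "y": 456.78},
]
nearby = within_radius(matches, center=(120.0, 456.78), radius=10.0)
nearby_ids = [m["id"] for m in nearby]
print(nearby_ids)
```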

Environment variables:

  • VDMS_HOSTNAME: VDMS server hostname (default: vdms.scenescape.intel.com)

  • REID_DATABASE: Vector database backend (default: VDMS)

  • VDMS_CONFIDENCE_THRESHOLD: Minimum confidence for applying constraints in TIER 1 (default: 0.8)

    • Values ≥ threshold: Included in AND constraints (strict metadata filtering)

    • Values < threshold: Ignored (rely on TIER 2 vector similarity for flexible matching)

    • Valid range: 0.0 to 1.0

    • Example: Set to 0.7 to apply more metadata filters in TIER 1, or to 0.9 to apply only the highest-confidence filters

Configuring Confidence Threshold#

The confidence threshold determines which metadata constraints are applied in TIER 1 filtering. Only constraints meeting or exceeding the threshold are used. Constraints below the threshold are skipped, allowing vector similarity in TIER 2 to handle the matching:

# In the controller service environment in docker-compose.yml or .env file
VDMS_CONFIDENCE_THRESHOLD=0.85

# Launch controller with custom threshold
docker compose up -d

Example Threshold Selection Guide:

  • 0.7: More metadata constraints applied, higher specificity in TIER 1 (may miss matches due to strict filtering)

  • 0.8: Default balanced approach (recommended for most use cases)

  • 0.9: Only highest-confidence metadata filters applied, rely more on TIER 2 vector similarity (highest recall)
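One way a service could read and validate this setting is sketched below. The function name `read_confidence_threshold` and the fallback to the default on invalid input are assumptions; the actual controller may handle bad values differently.

```python
import os

DEFAULT_THRESHOLD = 0.8  # documented default

def read_confidence_threshold(env=None):
    """Hypothetical reader for VDMS_CONFIDENCE_THRESHOLD with range validation."""
    env = os.environ if env is None else env
    raw = env.get("VDMS_CONFIDENCE_THRESHOLD")
    if raw is None:
        return DEFAULT_THRESHOLD
    try:
        value = float(raw)
    except ValueError:
        # Assumption: unparsable values fall back to the default.
        return DEFAULT_THRESHOLD
    if not 0.0 <= value <= 1.0:
        # Assumption: out-of-range values (and NaN) fall back to the default.
        return DEFAULT_THRESHOLD
    return value

print(read_confidence_threshold({"VDMS_CONFIDENCE_THRESHOLD": "0.85"}))
```

Passing a plain dict instead of `os.environ` keeps the function easy to test without touching the process environment.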

REID Configuration File#

The Scene Controller now supports a dedicated reid-config.json configuration file for managing Re-ID specific settings. This file provides separation of concerns between tracker configuration (motion models, timing parameters) and Re-ID behavior (feature accumulation, database flushing, similarity thresholds).

Configuration File Location#

Place reid-config.json in the controller config directory:

controller/config/reid-config.json

Sample Configuration#

{
  "stale_feature_timeout_secs": 5.0,
  "stale_feature_check_interval_secs": 1.0,
  "feature_accumulation_threshold": 12,
  "feature_slice_size": 10,
  "similarity_threshold": 60
}

Configuration Parameters#

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| stale_feature_timeout_secs | float | 5.0 | How long (seconds) to accumulate features in memory before flushing to VDMS. Features older than this threshold are persisted to the database for long-term storage. |
| stale_feature_check_interval_secs | float | 1.0 | How frequently (seconds) the background timer checks for stale features and flushes them to VDMS. More frequent checks ensure timely database updates. |
| feature_accumulation_threshold | int | 12 | Minimum number of quality features required before initiating a similarity query against the database. More features = higher statistical confidence in matching. |
| feature_slice_size | int | 10 | When persisting features to VDMS, sample every Nth feature vector from the accumulated set to reduce database bloat. Example: slice_size=10 stores every 10th vector. |
| similarity_threshold | int | 60 | Minimum similarity score (0-100) for a match to be considered valid. Higher values = stricter matching. |
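A minimal sketch of loading these parameters with their documented defaults follows. The `load_reid_config` function is hypothetical, and rejecting unknown keys is an assumption made here to catch typos; the Scene Controller's own loader may behave differently.

```python
import json

# Defaults taken from the parameter descriptions above.
REID_DEFAULTS = {
    "stale_feature_timeout_secs": 5.0,
    "stale_feature_check_interval_secs": 1.0,
    "feature_accumulation_threshold": 12,
    "feature_slice_size": 10,
    "similarity_threshold": 60,
}

def load_reid_config(text):
    """Hypothetical loader: parse reid-config JSON text and fill in missing defaults."""
    loaded = json.loads(text)
    unknown = set(loaded) - set(REID_DEFAULTS)
    if unknown:
        # Assumption: unknown keys are treated as errors rather than ignored.
        raise ValueError(f"unknown reid-config keys: {sorted(unknown)}")
    return {**REID_DEFAULTS, **loaded}

config = load_reid_config('{"similarity_threshold": 75, "feature_slice_size": 5}')
print(config)
```

Keys omitted from the file keep their default values, so a partial reid-config.json is enough to override a single setting.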

Using the Configuration File#

Pass the reid-config file path to the Scene Controller:

python scene_controller.py \
  --tracker_config_file controller/config/tracker_config.json \
  --reid_config_file controller/config/reid-config.json \
  --broker mqtt.example.com \
  --resturl http://rest.example.com

Current Implementation Note:

  • stale_feature_timeout_secs, stale_feature_check_interval_secs, feature_accumulation_threshold, feature_slice_size, and similarity_threshold are fully implemented

  • All semantic metadata attributes are currently used for TIER 1 filtering. Selective metadata filtering is planned for Phase 2.
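The runtime parameters listed above can be read as three small checks. The sketch below is illustrative only: the function names, the `(timestamp, vector)` feature layout, and the choice of strict versus inclusive comparisons are assumptions, not the controller's actual code.

```python
def split_stale(features, now, timeout_secs=5.0):
    """Partition (timestamp, vector) pairs into stale (to flush) and fresh lists."""
    # Assumption: a feature is stale once its age strictly exceeds the timeout.
    stale = [f for f in features if now - f[0] > timeout_secs]
    fresh = [f for f in features if now - f[0] <= timeout_secs]
    return stale, fresh

def ready_to_query(feature_count, accumulation_threshold=12):
    """True once enough features have accumulated to run a similarity query."""
    return feature_count >= accumulation_threshold

def is_valid_match(score, similarity_threshold=60):
    """True if a 0-100 similarity score meets the minimum threshold."""
    return score >= similarity_threshold

# Illustrative timestamps in seconds; vectors are placeholders.
features = [(100.0, [0.1, 0.2]), (103.0, [0.3, 0.4]), (106.5, [0.5, 0.6])]
stale, fresh = split_stale(features, now=107.0, timeout_secs=5.0)
print(len(stale), len(fresh))
```

In this reading, a background timer would call something like `split_stale` once every stale_feature_check_interval_secs and persist the stale list.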

Tuning Recommendations#

For Higher Recall (more matches found):

  • Decrease stale_feature_timeout_secs: 3.0 (flush features sooner, capture recent appearances)

  • Decrease stale_feature_check_interval_secs: 0.5 (check for stale features more frequently)

  • Decrease feature_accumulation_threshold: 8 (query sooner with fewer features)

  • Decrease similarity_threshold: 50 (accept less-perfect matches)

  • Increase feature_slice_size: 20 (store every 20th vector, so stored samples are spaced further apart and more diverse)

For Higher Precision (only confident matches):

  • Increase stale_feature_timeout_secs: 8.0 (accumulate more features before persisting)

  • Increase stale_feature_check_interval_secs: 2.0 (check less frequently, reduce overhead)

  • Increase feature_accumulation_threshold: 16 (require more samples for statistical confidence)

  • Increase similarity_threshold: 75 (stricter matching)

  • Decrease feature_slice_size: 5 (store every 5th vector, so more samples are kept for better coverage)
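To make the effect of feature_slice_size concrete, the sketch below samples every Nth item from a list of accumulated features. The `sample_for_storage` helper is hypothetical, and starting the sample at the first feature is an assumption about how "every Nth" is applied.

```python
def sample_for_storage(features, slice_size):
    """Hypothetical sampler: keep every Nth feature vector, starting with the first."""
    if slice_size < 1:
        raise ValueError("slice_size must be >= 1")
    return features[::slice_size]

features = list(range(100))  # stand-ins for 100 accumulated feature vectors
stored_5 = sample_for_storage(features, 5)
stored_10 = sample_for_storage(features, 10)
stored_20 = sample_for_storage(features, 20)
print(len(stored_5), len(stored_10), len(stored_20))
```

A larger slice size stores fewer vectors that are spaced further apart, while a smaller one stores more vectors at the cost of a larger database.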

Future Extensibility#

The reid-config.json design is extensible for future REID enhancements:

  • Phase 2: Confidence score thresholds per attribute type

  • Phase 3: Model-specific configuration (reid model name, version)

  • Phase 4: Spatio-temporal constraints (spatial radius, time window)

  • Phase 5: Custom feature aggregation strategies

Testing#

Tests should verify:

  1. ✅ Metadata extraction correctly identifies semantic vs generic properties

  2. ✅ TIER 1 filtering works (constraints properly applied)

  3. ✅ TIER 2 similarity works on filtered candidates

  4. ✅ Backward compatibility (queries work with/without metadata)

  5. ✅ Schema flexibility (new metadata fields accepted without code changes)

  6. ✅ Storage and retrieval of metadata with reid vectors

  7. ✅ Stale feature flushing respects configured timeout

  8. ✅ Configuration file loading and parameter application

References#