2-Tier Hybrid Search Implementation#
Overview#
This document describes the implementation of 2-tier hybrid search for Re-ID (Re-Identification) in the Scene Controller, as specified in ADR-0010.
Architecture: TIER 1 (metadata filtering) + TIER 2 (vector similarity)
VDMS Query Flow:
sscape_object with semantic metadata (age, gender, color, etc.)
↓
Extract semantic attributes via _extractSemanticMetadata()
↓
sendSimilarityQuery() calls findMatches() with constraints
↓
TIER 1: VDMS applies metadata constraints (exact-match filtering)
"Find entries where type='Person' AND gender='Female' AND age='22'"
↓
TIER 2: VDMS performs vector similarity on filtered candidates
"Compute L2 distance between query vector and filtered candidates"
↓
Return top-k matches with metadata
Key Concepts#
Confidence-Based Constraint Filtering (AND-Only)#
The 2-tier implementation uses metadata confidence scores to determine which constraints are applied in TIER 1 filtering. Only high-confidence (≥ 0.8) constraints are used for strict AND filtering. Low-confidence constraints are skipped in TIER 1, allowing TIER 2 vector similarity to handle flexible matching:
High Confidence (≥ 0.8) Low Confidence (< 0.8)
↓ ↓
AND Constraint IGNORED (rely on TIER 2)
↓ ↓
age = 22 Skip
AND gender = Female ↓
↓ Vector similarity
TIER 1: Strict finds matches
metadata filter based on embeddings
Why AND for high confidence only (≥ 0.8)?
Age + gender from same model (age-gender-recognition-retail-0013) typically both ~0.85-0.95 confidence
Combining multiple high-confidence attributes = very reliable (significantly fewer false positives)
Query: “Find Person where age=22 AND gender=Female” is specific and highly accurate
Reduces false matches by requiring ALL high-confidence attributes to align
Why ignore low confidence (< 0.8)?
VDMS limitations: OR constraints across multiple properties are not well-supported
Simplified design: Skip low-confidence filtering in TIER 1 entirely
TIER 2 vector similarity provides flexible matching instead
Query: “Find similar Persons” via vector embedding (ignores low-confidence metadata)
Better approach: Rely on embedding distance rather than unreliable metadata
Example:
Query: Person with age=25 (conf 0.92), gender=Male (conf 0.90), eyewear=glasses (conf 0.55)
TIER 1 Filtering: age=25 AND gender=Male (high confidence applied)
eyewear=glasses IGNORED (low confidence - below 0.8 threshold)
TIER 2 Matching: Vector similarity finds closest matches among TIER 1 filtered candidates
The embedding distance handles eyewear and other low-confidence attributes
Result: "Find strong age-gender matches, refined by vector similarity"
Backward Compatibility#
✅ Objects without metadata continue to work (missing fields handled gracefully)
✅ Old records (without metadata) can coexist with new records (with metadata)
✅ No database migration needed when new metadata fields added
✅ Queries with partial constraints work (omitted fields skip that filtering)
Phase Evolution#
Phase 1 (Current): Initial Semantic Metadata#
Person: age, gender, person-attributes
Vehicle: color, make, model
Automatic extraction via _extractSemanticMetadata()
2-tier queries with metadata filtering
Phase 2: Confidence Scores & Versioning#
Store confidence dicts:
{"color": 0.95, "make": 0.88}Add model name and versioning metadata:
{"model_name": "age_gender", "model_version": "v2.1", "timestamp": "..."}Application-level filtering on complex data types
Phase 3: Spatio-Temporal Tracking#
Add position/orientation:
{"x": 123.45, "y": 456.78, "orientation": 45.0}Add timestamp:
{"timestamp": "2026-02-06T11:37:26.093Z"}Spatial radius queries via application-level post-processing
Environment variables:
VDMS_HOSTNAME: VDMS server hostname (default:vdms.scenescape.intel.com)REID_DATABASE: Vector database backend (default:VDMS)VDMS_CONFIDENCE_THRESHOLD: Minimum confidence for applying constraints in TIER 1 (default:0.8)Values ≥ threshold: Included in AND constraints (strict metadata filtering)
Values < threshold: Ignored (rely on TIER 2 vector similarity for flexible matching)
Valid range: 0.0 to 1.0
Example: Set to
0.7to include more metadata filters,0.9for stricter filtering
Configuring Confidence Threshold#
The confidence threshold determines which metadata constraints are applied in TIER 1 filtering. Only constraints meeting or exceeding the threshold are used. Constraints below the threshold are skipped, allowing vector similarity in TIER 2 to handle the matching:
# In the controller service environment in docker-compose.yml or .env file
VDMS_CONFIDENCE_THRESHOLD=0.85
# Launch controller with custom threshold
docker compose up -d
Example Threshold Selection Guide:
0.7: More metadata constraints applied, higher specificity in TIER 1 (may miss matches due to strict filtering)0.8: Default balanced approach (recommended for most use cases)0.9: Only highest-confidence metadata filters applied, rely more on TIER 2 vector similarity (highest recall)
REID Configuration File#
The Scene Controller now supports a dedicated reid-config.json configuration file for managing Re-ID specific settings. This file provides separation of concerns between tracker configuration (motion models, timing parameters) and Re-ID behavior (feature accumulation, database flushing, similarity thresholds).
Configuration File Location#
Place reid-config.json in the controller config directory:
controller/config/reid-config.json
Sample Configuration#
{
"stale_feature_timeout_secs": 5.0,
"stale_feature_check_interval_secs": 1.0,
"feature_accumulation_threshold": 12,
"feature_slice_size": 10,
"similarity_threshold": 60
}
Configuration Parameters#
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
float |
5.0 |
How long (seconds) to accumulate features in memory before flushing to VDMS. Features older than this threshold are persisted to the database for long-term storage. |
|
float |
1.0 |
How frequently (seconds) the background timer checks for stale features and flushes them to VDMS. More frequent checks ensure timely database updates. |
|
int |
12 |
Minimum number of quality features required before initiating a similarity query against the database. More features = higher statistical confidence in matching. |
|
int |
10 |
When persisting features to VDMS, sample every Nth feature vector from the accumulated set to reduce database bloat. Example: slice_size=10 stores every 10th vector. |
|
int |
60 |
Minimum similarity score (0-100) for a match to be considered valid. Higher values = stricter matching. |
Using the Configuration File#
Pass the reid-config file path to the Scene Controller:
python scene_controller.py \
--tracker_config_file controller/config/tracker_config.json \
--reid_config_file controller/config/reid-config.json \
--broker mqtt.example.com \
--resturl http://rest.example.com
Current Implementation Note:
stale_feature_timeout_secs,stale_feature_check_interval_secs,feature_accumulation_threshold,feature_slice_size, andsimilarity_thresholdare fully implementedAll semantic metadata attributes are currently used for TIER 1 filtering. Selective metadata filtering is planned for Phase 2.
Tuning Recommendations#
For Higher Recall (more matches found):
Decrease
stale_feature_timeout_secs: 3.0 (flush features sooner, capture recent appearances)Decrease
stale_feature_check_interval_secs: 0.5 (check for stale features more frequently)Decrease
feature_accumulation_threshold: 8 (query sooner with fewer features)Decrease
similarity_threshold: 50 (accept less-perfect matches)Increase
feature_slice_size: 20 (store more diverse samples)
For Higher Precision (only confident matches):
Increase
stale_feature_timeout_secs: 8.0 (accumulate more features before persisting)Increase
stale_feature_check_interval_secs: 2.0 (check less frequently, reduce overhead)Increase
feature_accumulation_threshold: 16 (require more samples for statistical confidence)Increase
similarity_threshold: 75 (stricter matching)Decrease
feature_slice_size: 5 (store every 5th feature for better coverage)
Future Extensibility#
The reid-config.json design is extensible for future REID enhancements:
Phase 2: Confidence score thresholds per attribute type
Phase 3: Model-specific configuration (reid model name, version)
Phase 4: Spatio-temporal constraints (spatial radius, time window)
Phase 5: Custom feature aggregation strategies
Testing#
Tests should verify:
✅ Metadata extraction correctly identifies semantic vs generic properties
✅ TIER 1 filtering works (constraints properly applied)
✅ TIER 2 similarity works on filtered candidates
✅ Backward compatibility (queries work with/without metadata)
✅ Schema flexibility (new metadata fields accepted without code changes)
✅ Storage and retrieval of metadata with reid vectors
✅ Stale feature flushing respects configured timeout
✅ Configuration file loading and parameter application