# Markerless Camera Calibration Internals
The markerless calibration path uses a Hierarchical Localization (HLoc) workflow with two stages:
1. **Global retrieval** with **NetVLAD** to find candidate database images.
2. **Local matching** (sparse or dense) followed by geometric pose solving.
## How NetVLAD is used
- During scene registration, the service extracts global descriptors for dataset images and stores them in an HDF5 file (for example, `global-feats-netvlad.h5`).
- During camera localization, the service extracts a NetVLAD descriptor for the query frame and uses `pairs_from_retrieval` to retrieve top-$K$ candidates (`number_of_localizations`, default `50`) from the registered descriptor database.
- The retrieved image pairs define the shortlist for local feature matching and pose estimation.
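Conceptually, retrieval is a nearest-neighbor search over L2-normalized global descriptors: the query's NetVLAD vector is compared against every database vector, and the top-$K$ most similar images form the shortlist. A minimal NumPy sketch of that step (illustrative only, not the actual `pairs_from_retrieval` implementation; the function name is hypothetical):

```python
import numpy as np

def topk_retrieval(query_desc: np.ndarray, db_descs: np.ndarray, k: int = 50):
    """Return indices of the k database images most similar to the query.

    query_desc: (D,) global descriptor for the query frame.
    db_descs:   (N, D) global descriptors for registered database images.
    With L2-normalized descriptors, the dot product equals cosine similarity.
    """
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q                        # (N,) cosine similarities
    order = np.argsort(-sims)[: min(k, len(sims))]  # highest first
    return order, sims[order]

# Toy example: 4 database descriptors; the query is a lightly
# perturbed copy of descriptor 2, so index 2 should rank first.
rng = np.random.default_rng(0)
db = rng.normal(size=(4, 8))
query = db[2] + 0.01 * rng.normal(size=8)
idx, scores = topk_retrieval(query, db, k=2)
```

In the real pipeline the descriptors come from the HDF5 files written at registration time, and `k` corresponds to `number_of_localizations`.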
## How quadtree attention is used
- SceneScape integrates a custom HLoc matcher based on **QTA-LoFTR** (`qta_loftr.py`) that loads the QuadTreeAttention implementation.
- In this matcher, LoFTR coarse matching is configured with `BLOCK_TYPE = "quadtree"` (with `ATTN_TYPE = "B"` and tuned `TOPKS`) to reduce attention cost while preserving long-range correspondences.
- Dense matching is selected when the scene's `local_feature` entry is `"-"`; otherwise, the service runs sparse feature extraction and matching.
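The sparse/dense decision above can be sketched as a small helper (illustrative only; the service's actual function and argument names may differ):

```python
def select_matching_mode(local_feature: str) -> str:
    """Choose the local matching pipeline for a scene.

    By the configuration convention described above, a local_feature
    entry of "-" means no sparse extractor is configured, so dense
    matching (QTA-LoFTR via match_dense) is used; any other value
    triggers sparse extraction followed by match_features.
    """
    return "dense" if local_feature == "-" else "sparse"
```

For example, `select_matching_mode("-")` yields `"dense"`, while a named extractor such as `"sift"` yields `"sparse"`.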
## How HLoc ties the pipeline together
- Registration and localization are orchestrated from the markerless calibration module.
- HLoc modules used include `extract_features`, `pairs_from_retrieval`, `match_features` / `match_dense`, and `localize_scenescape`.
- `localize_scenescape.pose_from_cluster` back-projects matched keypoints to 3D using scene depth or mesh, then runs PnP (`pycolmap.absolute_pose_estimation`) to estimate camera pose.
- The service validates results with two quality gates before returning success:
- `minimum_number_of_matches` (default `20`)
- `inlier_threshold` (default `0.5`), applied to the inlier ratio $\frac{n_{inliers}}{n_{matches}}$
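The two gates compose into a single pass/fail check on the PnP result. A minimal sketch, assuming inclusive comparisons against both thresholds (the exact comparison operators in the service are an assumption):

```python
def passes_quality_gates(n_matches: int, n_inliers: int,
                         minimum_number_of_matches: int = 20,
                         inlier_threshold: float = 0.5) -> bool:
    """Apply the two post-PnP quality gates.

    Gate 1: enough 2D-3D matches overall (minimum_number_of_matches).
    Gate 2: the inlier ratio n_inliers / n_matches meets inlier_threshold.
    Only when both pass is the estimated pose returned as a success.
    """
    if n_matches < minimum_number_of_matches:
        return False
    return (n_inliers / n_matches) >= inlier_threshold
```

For instance, 40 matches with 30 RANSAC inliers (ratio 0.75) passes, while 10 matches fails gate 1 regardless of ratio, and 40 matches with 10 inliers (ratio 0.25) fails gate 2.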
## Flow Diagram: Registration and Localization
```mermaid
flowchart TD
A[Polycam zip uploaded] --> B[Preprocess dataset and transform to SceneScape layout]
B --> C[Registration start]
C --> D[Extract NetVLAD descriptors for DB images]
D --> E[Save DB global descriptors<br/>global-feats-netvlad.h5]
E --> F[Calibration request with query frame]
F --> G[Extract query NetVLAD descriptor]
G --> H[pairs_from_retrieval selects top-K DB images]
H --> I{Local matching mode}
I -->|Sparse| J[Extract local features<br/>example: SIFT]
J --> K[match_features<br/>example: NN-ratio]
I -->|Dense| L[match_dense with QTA-LoFTR<br/>coarse block type: quadtree]
K --> M[localize_scenescape pose_from_cluster]
L --> M
M --> N[Back-project DB matches to 3D using depth or mesh]
N --> O[pycolmap PnP with RANSAC]
O --> P{Quality gates pass?}
P -->|No| Q[Return weak or insufficient matches]
P -->|Yes| R[Return quaternion and translation]
```
The parameters referenced throughout this document are scene-level configuration inputs from the service model: `global_feature`, `local_feature`, `matcher`, `number_of_localizations`, `minimum_number_of_matches`, and `inlier_threshold`.
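A representative scene configuration using the defaults cited above. The `global_feature`, `local_feature`, and `matcher` values shown are illustrative examples consistent with this document, not confirmed service defaults:

```python
# Hypothetical scene-level calibration configuration.
scene_calibration_config = {
    "global_feature": "netvlad",      # global retrieval descriptor
    "local_feature": "-",             # "-" selects dense matching
    "matcher": "qta_loftr",           # illustrative dense matcher name
    "number_of_localizations": 50,    # top-K retrieval candidates
    "minimum_number_of_matches": 20,  # quality gate 1
    "inlier_threshold": 0.5,          # quality gate 2 (inlier ratio)
}
```

With `local_feature` set to `"-"`, this configuration would route queries through the dense QTA-LoFTR path rather than sparse extraction.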