Markerless Camera Calibration Internals

The markerless calibration path uses a Hierarchical Localization (HLoc) workflow with two stages:

  1. Global retrieval with NetVLAD to find candidate database images.

  2. Local matching (sparse or dense) followed by geometric pose solving.
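The two stages can be sketched as a simple control flow. All helper names below are illustrative stand-ins for the HLoc modules described in the following sections, with trivial bodies so the sketch runs end to end:

```python
def localize_frame(query_frame, db_images, k=50):
    """Illustrative two-stage flow; helpers are stand-ins, not the real API."""
    # Stage 1: global retrieval shortlists k candidate database images.
    shortlist = retrieve_top_k(query_frame, db_images, k)
    # Stage 2: local matching against the shortlist, then geometric pose solving.
    matches = match_local_features(query_frame, shortlist)
    return solve_pose(matches)

# Minimal stand-ins so the sketch is self-contained and runnable.
def retrieve_top_k(query_frame, db_images, k):
    return db_images[:k]

def match_local_features(query_frame, shortlist):
    return [(name, "correspondences") for name in shortlist]

def solve_pose(matches):
    return {"n_candidates": len(matches), "pose": "R, t from PnP"}
```

The point of the split is that retrieval narrows the search from the whole database to a short list, so the expensive local matching and pose solving run against only k images.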

How NetVLAD is used

  • During scene registration, the service extracts global descriptors for dataset images and stores them in an HDF5 file (for example, global-feats-netvlad.h5).

  • During camera localization, the service extracts a NetVLAD descriptor for the query frame and uses pairs_from_retrieval to retrieve the top-$K$ candidates from the registered descriptor database, where $K$ is number_of_localizations (default 50).

  • The retrieved image pairs define the shortlist for local feature matching and pose estimation.
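The retrieval step amounts to ranking database descriptors by similarity to the query descriptor and keeping the top $K$. A minimal pure-Python sketch of that ranking (the service does this via pairs_from_retrieval over the HDF5 descriptor file; NetVLAD descriptors are L2-normalized, so a dot product acts as cosine similarity):

```python
import math

def l2_normalize(v):
    """L2-normalize a descriptor vector (NetVLAD outputs are unit-norm)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def topk_retrieval(query_desc, db_descs, k=50):
    """Rank database images by descriptor similarity to the query.

    query_desc: L2-normalized global descriptor of the query frame.
    db_descs: dict mapping image name -> L2-normalized descriptor.
    Returns the k best-matching image names, i.e. the shortlist
    that defines the pairs for local feature matching.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    ranked = sorted(db_descs, key=lambda name: dot(query_desc, db_descs[name]),
                    reverse=True)
    return ranked[:k]
```

For example, a query descriptor pointing mostly along one axis retrieves the database image whose descriptor points the same way before any orthogonal one.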

How quadtree attention is used

  • SceneScape integrates a custom HLoc matcher based on QTA-LoFTR (qta_loftr.py) that loads the QuadTreeAttention implementation.

  • In this matcher, LoFTR coarse matching is configured with BLOCK_TYPE = "quadtree" (with ATTN_TYPE = "B" and tuned TOPKS) to reduce attention cost while preserving long-range correspondences.

  • Dense matching is selected when the local feature entry is "-" (no separate keypoint extractor is configured); otherwise, the service runs sparse extraction and matching.
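The sparse-vs-dense selection and the quadtree coarse-attention settings can be sketched as a small config fragment plus a selector. The key names mirror the LoFTR-style settings described above, but the exact schema and the TOPKS values are illustrative, not the service's verbatim configuration:

```python
# Illustrative coarse-matching config; names mirror the LoFTR-style settings
# described above (BLOCK_TYPE, ATTN_TYPE, TOPKS) but the values for topks
# are example placeholders, not the service's tuned numbers.
QTA_LOFTR_COARSE = {
    "block_type": "quadtree",  # quadtree attention in LoFTR coarse matching
    "attn_type": "B",
    "topks": [16, 8, 8],       # per-level top-k regions to attend to (example)
}

def select_matching_mode(local_feature: str) -> str:
    """Dense matching when the local feature entry is "-"; sparse otherwise."""
    return "dense" if local_feature == "-" else "sparse"
```

Quadtree attention restricts each coarse query to the top-k most relevant regions at every pyramid level, which is what keeps attention cost low while still allowing long-range correspondences.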

How HLoc ties the pipeline together

  • Registration and localization are orchestrated from the markerless calibration module.

  • HLoc modules used include extract_features, pairs_from_retrieval, match_features / match_dense, and localize_scenescape.

  • localize_scenescape.pose_from_cluster back-projects matched keypoints to 3D using scene depth or mesh, then runs PnP (pycolmap.absolute_pose_estimation) to estimate camera pose.

  • The service validates results with two quality gates before returning success:

    • minimum_number_of_matches (default 20)

    • inlier_threshold (default 0.5): the inlier ratio $\frac{n_{inliers}}{n_{matches}}$ must meet or exceed this value
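The back-projection and the quality gates can be sketched as follows. This uses a simplified pinhole model with hypothetical intrinsics (fx, fy, cx, cy); the real path goes through pose_from_cluster and pycolmap.absolute_pose_estimation, and the gate defaults (20 matches, 0.5 inlier ratio) are the ones documented above:

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift a matched keypoint (u, v) with known depth to a 3D point in the
    camera frame via the pinhole model - the same idea pose_from_cluster
    applies with scene depth or mesh before running PnP.
    fx, fy, cx, cy are illustrative pinhole intrinsics."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def passes_quality_gates(n_matches, n_inliers,
                         minimum_number_of_matches=20,
                         inlier_threshold=0.5):
    """Both gates must pass before the service reports success:
    enough matches overall, and a sufficient RANSAC inlier ratio."""
    if n_matches < minimum_number_of_matches:
        return False
    return (n_inliers / n_matches) >= inlier_threshold
```

For instance, a keypoint at the principal point back-projects straight along the optical axis, and 40 inliers out of 100 matches (ratio 0.4) fails the default 0.5 gate even though the match count passes.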

Flow Diagram: Registration and Localization

```mermaid
flowchart TD
    A[Polycam zip uploaded] --> B[Preprocess dataset and transform to SceneScape layout]
    B --> C[Registration start]
    C --> D[Extract NetVLAD descriptors for DB images]
    D --> E[Save DB global descriptors<br/>global-feats-netvlad.h5]

    E --> F[Calibration request with query frame]
    F --> G[Extract query NetVLAD descriptor]
    G --> H[pairs_from_retrieval selects top-K DB images]

    H --> I{Local matching mode}
    I -->|Sparse| J[Extract local features<br/>example: SIFT]
    J --> K[match_features<br/>example: NN-ratio]
    I -->|Dense| L[match_dense with QTA-LoFTR<br/>coarse block type: quadtree]

    K --> M[localize_scenescape pose_from_cluster]
    L --> M
    M --> N[Back-project DB matches to 3D using depth or mesh]
    N --> O[pycolmap PnP with RANSAC]
    O --> P{Quality gates pass?}
    P -->|No| Q[Return weak or insufficient matches]
    P -->|Yes| R[Return quaternion and translation]
```

These values are scene-level configuration inputs from the service model: global_feature, local_feature, matcher, number_of_localizations, minimum_number_of_matches, and inlier_threshold.
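Putting those scene-level inputs together, a configuration might look like the following. The defaults are the ones documented above; the feature and matcher names are illustrative examples, not a fixed schema:

```python
# Illustrative scene-level calibration configuration. Defaults match the
# values documented above; feature/matcher names are example choices.
scene_calibration_config = {
    "global_feature": "netvlad",
    "local_feature": "-",             # "-" selects dense matching
    "matcher": "qta_loftr",           # illustrative matcher name
    "number_of_localizations": 50,    # top-K retrieval candidates
    "minimum_number_of_matches": 20,  # quality gate 1
    "inlier_threshold": 0.5,          # quality gate 2 (minimum inlier ratio)
}
```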