Improving Object Pose Estimation with Line Features in Mixed Reality

ETHz Mixed Reality 2024

Deqing Song
Moyang Li
Yifan Jiang
Yuqiao Huang

Project Overview

Object pose estimation plays a critical role in augmented reality (AR) applications, particularly in environments with low-texture surfaces, such as SBB train doors. This work investigates the challenges of using line-based methods for pose estimation under domain shifts, demonstrating their limitations. We introduce a robust, generalizable approach leveraging the dense feature matcher GIM and a pipeline incorporating YOLOv8 bounding-box predictions and LIMAP-based line feature extraction. Our method enhances feature matching under domain shift and provides a real-time implementation on the Microsoft HoloLens, demonstrating its potential for practical AR applications. Results show the effectiveness of GIM for robust feature matching and pose estimation.

Pipeline

LIMAP is a line-based feature matching framework that uses line segments extracted from images to estimate object poses. It first detects line segments in the reference and query images, then matches them based on their geometric properties. By computing the transformation between the two images, LIMAP localizes the object in the query image.
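The geometric matching step described above can be sketched as follows. This is an illustrative simplification, not LIMAP's actual matching cost: the angle and midpoint-distance thresholds are assumptions chosen for the example.

```python
import numpy as np

def match_lines(ref_lines, qry_lines, max_angle_deg=10.0, max_mid_dist=50.0):
    """Greedily match 2D line segments between a reference and a query image
    by comparing orientation and midpoint distance.
    Each line is a tuple (x1, y1, x2, y2)."""
    def angle(l):
        # Orientation of the segment, folded into [0, 180) degrees.
        return np.degrees(np.arctan2(l[3] - l[1], l[2] - l[0])) % 180.0

    def midpoint(l):
        return np.array([(l[0] + l[2]) / 2.0, (l[1] + l[3]) / 2.0])

    matches = []
    used = set()
    for i, r in enumerate(ref_lines):
        best_j, best_d = None, max_mid_dist
        for j, q in enumerate(qry_lines):
            if j in used:
                continue
            da = abs(angle(r) - angle(q))
            da = min(da, 180.0 - da)  # angles wrap around at 180 degrees
            if da > max_angle_deg:
                continue
            d = np.linalg.norm(midpoint(r) - midpoint(q))
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            matches.append((i, best_j))
            used.add(best_j)
    return matches
```

Each reference segment is paired with the closest query segment of similar orientation; a real system would also use appearance descriptors and epipolar constraints.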

GIM (Generalizable Image Matcher) is a self-supervised framework designed to learn robust image matching models with strong generalization capabilities from internet videos. By extracting candidate correspondences between adjacent frames and propagating them to wider baselines, GIM enhances the training process for improved performance in unseen scenarios. In our project, we use GIM to localize SBB train doors, exploiting its ability to extract reliable feature correspondences.
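The propagation idea can be illustrated by chaining per-frame matches: if a keypoint tracked from frame 0 survives through every adjacent-frame match, its start and end points form a wide-baseline training pair. A minimal sketch, where keypoint ids and dict-based matches are assumptions for illustration only:

```python
def propagate_matches(adjacent_matches):
    """Chain correspondences across adjacent frames into wider-baseline pairs.
    adjacent_matches[k] maps a keypoint id in frame k to its match in
    frame k+1; the result maps frame-0 keypoints to their propagated
    match in the last frame, dropping any chain that breaks."""
    chained = {p: p for p in adjacent_matches[0]}  # start in frame 0
    for frame_matches in adjacent_matches:
        chained = {
            start: frame_matches[cur]
            for start, cur in chained.items()
            if cur in frame_matches  # chain broken: discard this keypoint
        }
    return chained
```

The surviving pairs span a wider baseline than any single adjacent-frame match, which is what makes them useful as harder training supervision.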

In this project, we compare the performance of LIMAP and GIM in feature matching accuracy, evaluating their robustness for downstream localization of SBB train doors.

LIMAP

Figure 1. The top part shows the pipeline of GIM-based feature matching. The GIM-based method first performs dense feature matching between the reference image and the query images, then uses the matched features for image warping and pose estimation. The bottom part presents the pipeline of the LIMAP-based method, which first extracts both line and point features, then uses the YOLO bounding box to remove outliers, and finally localizes the object using the filtered features.
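The image-warping step in the GIM branch can be sketched as a homography estimated from the dense matches. Below is a minimal direct linear transform (DLT) sketch in pure NumPy; it assumes a planar target (reasonable for a door) and exact, outlier-free matches, whereas a real pipeline would use a RANSAC-based estimator:

```python
import numpy as np

def estimate_homography(src_pts, dst_pts):
    """Estimate the 3x3 homography H with dst ~ H @ src from >= 4 matched
    2D points using the direct linear transform (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        # Each correspondence contributes two linear constraints on H.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)   # null-space vector = homography up to scale
    return H / H[2, 2]

def warp_point(H, pt):
    """Apply a homography to a single 2D point."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]
```

With the homography in hand, the reference image can be warped onto the query view, and the same correspondences feed the subsequent pose estimation.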

YOLO Object Detection

To improve the quality of feature matching, the pipeline removes noisy line features outside the target object in the query image. This is achieved by leveraging a bounding box proposed by YOLO. To enable bounding box generation, we train a YOLOv8 network. The YOLO pipeline structure is illustrated in Figure 2.
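The outlier-removal step can be sketched as keeping only segments whose endpoints both fall inside the YOLO box. The optional margin parameter is an assumption for illustration, not necessarily part of the project's pipeline:

```python
import numpy as np

def filter_lines_by_box(lines, box, margin=0.0):
    """Keep only line segments whose both endpoints lie inside the YOLO
    bounding box (x_min, y_min, x_max, y_max), optionally padded by margin.
    lines is an (N, 4) array of (x1, y1, x2, y2) segments."""
    lines = np.asarray(lines, dtype=float)
    x_min, y_min, x_max, y_max = box
    x_min -= margin; y_min -= margin
    x_max += margin; y_max += margin
    p1_in = (lines[:, 0] >= x_min) & (lines[:, 0] <= x_max) & \
            (lines[:, 1] >= y_min) & (lines[:, 1] <= y_max)
    p2_in = (lines[:, 2] >= x_min) & (lines[:, 2] <= x_max) & \
            (lines[:, 3] >= y_min) & (lines[:, 3] <= y_max)
    return lines[p1_in & p2_in]
```

A stricter variant could clip partially-inside segments to the box instead of discarding them.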

YOLO

Figure 2. YOLOv8 Bounding Box Pipeline. The figure illustrates the image inference process and the YOLOv8 model training pipeline. The query image is preprocessed and then fed into the trained YOLOv8 model. The training dataset of annotated images is used to fine-tune the model through model training, validation, and model saving. Once fine-tuned, the model processes the query image through preprocessing, model prediction, and post-processing to identify and bound relevant features. The resulting inference highlights SBB door areas within the query image, bounds regions around detected objects, and removes regions of no interest.

LIMAP Results

Our experiments demonstrate that LIMAP is effective in reconstructing SBB train doors.

Our experiments reveal that LIMAP, even with bounding boxes, is insufficient for localizing SBB train doors with high precision. While the algorithm successfully reconstructs the line features of the SBB train doors, it struggles to transfer features learned from the synthetic dataset to real-world query images, regardless of whether bounding boxes are applied.

SBB train door reconstruction using LIMAP

LIMAP 2D-3D Line Matching Results

2D-3D Line Matching Without Bounding Box

2D-3D Line Matching Without Bounding Box in Synthetic Image

2D-3D Line Matching With Bounding Box

GIM Results

The Generalizable Image Matcher (GIM) employs internet video to enhance robustness and generalizability in challenging scenarios.

GIM Feature Matching Results

Our experiments show that GIM performs well in feature matching accuracy. The GIM algorithm is robust to occlusion and cluttered backgrounds, providing a more reliable basis for downstream object pose estimation in mixed reality applications.

GIM vs LIMAP

We compare the performance of GIM and LIMAP, evaluating their robustness for downstream localization of SBB train doors.

Quantitative evaluation results

Table 1. Quantitative Comparison of Localization on Synthetic Dataset. The LIMAP-based method achieves better performance in translation and rotation due to the introduction of line features.

Method             Translation (m)   Rotation (deg)
Point-based        0.082             0.396
LIMAP-based [13]   0.032             0.145

Table 2. Comparison of Localization on Real-world Dataset. "x" denotes that the method fails on that sequence. The GIM-based method achieves a smaller absolute trajectory error (m) on all 3 sequences due to its generalizability on datasets with domain shift.

Method             seq1            seq2            seq3
LIMAP-based [13]   x               1.368 ± 0.499   0.958 ± 0.417
GIM-based [22]     1.005 ± 0.473   0.659 ± 0.378   0.564 ± 0.346

Table 1 demonstrates the superiority of the LIMAP-based method on low-texture images such as the SBB door: the introduction of line features improves the robustness of localization. We also evaluate the LIMAP-based method on real-world datasets. Table 2 shows that it degrades severely on the real-world dataset due to the large domain shift between the reference image and the query images; many line features are mismatched even with the help of the YOLO bounding box. The GIM-DKM model, however, achieves more accurate feature matching and stable localization on datasets with large domain shift, as shown in Table 2.
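The absolute trajectory error reported in Table 2 can be computed as the per-frame distance between estimated and ground-truth camera positions, summarized as mean ± std. A minimal sketch, assuming the two trajectories are already expressed in the same frame (the usual Sim(3)/SE(3) alignment step is omitted):

```python
import numpy as np

def absolute_trajectory_error(est_positions, gt_positions):
    """Per-frame translation error (in meters) between estimated and
    ground-truth camera positions, returned as (mean, std)."""
    est = np.asarray(est_positions, dtype=float)
    gt = np.asarray(gt_positions, dtype=float)
    errors = np.linalg.norm(est - gt, axis=1)  # Euclidean error per frame
    return errors.mean(), errors.std()
```

Standard evaluation tools additionally align the trajectories before computing the error; the sketch above only covers the final aggregation.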

Qualitative Comparisons of Trajectory Estimation

GIM vs LIMAP

Given a reference image from the synthetic dataset and a sequence of real-world images, the GIM-based method achieves stable feature matching and more accurate pose estimation. The LIMAP-based method fails to achieve accurate point and line feature matching due to the large domain shift, leading to failure or substantially larger errors across the sequences.

HoloLens Implementation

We implemented the GIM algorithm on the HoloLens. By leveraging the HoloLens's capabilities, we accurately localize SBB train doors in mixed reality environments, providing a reliable solution for applications in the transportation industry.
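The write-up does not detail how poses reach the HoloLens; a common pattern is to run matching on a companion machine and stream poses to the device. The sketch below serializes a 4x4 pose into a JSON message using only the standard library; the message schema is purely illustrative, not the project's actual protocol.

```python
import json
import numpy as np

def pose_to_message(T, timestamp):
    """Serialize a 4x4 camera-to-world pose into a JSON message that a
    HoloLens client could consume. The field names here are assumptions
    for illustration, not a documented interface."""
    T = np.asarray(T, dtype=float)
    return json.dumps({
        "timestamp": timestamp,
        "rotation": T[:3, :3].ravel().tolist(),   # row-major 3x3 block
        "translation": T[:3, 3].tolist(),         # position in meters
    })
```

On the device side, the rotation and translation would be unpacked into a Unity transform (with attention to the left-handed coordinate convention).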

HoloLens real-time implementation
