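Introduction
MASt3R-SLAM is a deep-learning-based, real-time monocular SLAM (Simultaneous Localization and Mapping) system. It combines classical SLAM techniques with deep neural networks to achieve accurate camera localization and 3D map reconstruction. Compared with the earlier DROID-SLAM, it further improves the efficiency and accuracy of the optimization. This article focuses on the front-end optimization.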
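Pose Optimization
The pose optimization routine in MASt3R-SLAM is gauss_newton_ray_cuda, which refines camera poses by ray alignment. The core idea is to minimize the error between observed points and map points.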
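Sim(3) Group Representation
A pose consists of a translation $t\in \mathbb{R}^3$, a rotation stored as a quaternion $q=[q_x,q_y,q_z,q_w]$, and a scale $s\in \mathbb{R}^+$. Its Lie algebra parameterization is: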
$$
\zeta=[\tau,\omega,\sigma]^T \in \mathbb{R}^7
$$
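where $\tau \in \mathbb{R}^3$ is the translation component, $\omega \in \mathbb{R}^3$ the rotation component, and $\sigma \in \mathbb{R}$ the logarithm of the scale.
Relative Pose Computation
Given two camera poses $T_i,T_j \in \mathrm{Sim}(3)$, the relative pose $T_{ij}$ is obtained by composing the inverse of $T_i$ with $T_j$: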
$$
\begin{aligned}
T_{ij} &= T_i^{-1} \circ T_j\\
s_{ij}&=s_i^{-1}\cdot s_j\\
q_{ij}&=q_i^{-1}\otimes q_j\\
t_{ij}&=s_i^{-1}\cdot R_i^{-1} (t_j - t_i)
\end{aligned}
$$
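The component formulas can be checked numerically. The following is a minimal sketch, assuming poses are stored as hypothetical `(s, R, t)` tuples with a rotation matrix instead of a quaternion and acting on points as $sRX+t$; the helper names are illustrative and not taken from the MASt3R-SLAM code.

```python
import math

def rot_z(angle):
    """Rotation matrix about the z axis (only used to build example poses)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def act(T, X):
    """Apply a similarity T = (s, R, t) to a point: s * R @ X + t."""
    s, R, t = T
    RX = matvec(R, X)
    return [s * RX[k] + t[k] for k in range(3)]

def relative_pose(Ti, Tj):
    """Return T_ij = T_i^{-1} o T_j as (s_ij, R_ij, t_ij)."""
    si, Ri, ti = Ti
    sj, Rj, tj = Tj
    Ri_inv = transpose(Ri)  # the inverse of a rotation is its transpose
    s_ij = sj / si
    R_ij = matmul(Ri_inv, Rj)
    diff = [tj[k] - ti[k] for k in range(3)]
    t_ij = [c / si for c in matvec(Ri_inv, diff)]
    return s_ij, R_ij, t_ij

Ti = (2.0, rot_z(0.3), [1.0, 0.0, -1.0])
Tj = (0.5, rot_z(-0.7), [0.2, 0.4, 0.6])
Tij = relative_pose(Ti, Tj)

# Consistency check: mapping X through T_ij and then T_i must equal T_j X.
X = [0.3, -1.2, 2.0]
lhs = act(Ti, act(Tij, X))
rhs = act(Tj, X)
max_err = max(abs(a - b) for a, b in zip(lhs, rhs))
```

Here `max_err` should be at the level of floating-point round-off, which confirms that the scale, rotation, and translation components belong to the composition $T_i^{-1}\circ T_j$.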
Residual Definitions
Ray Direction Residual
Each point is normalized to a unit ray:
$$
\mathbf{r_i}=\frac{X_i}{|X_i|}
$$
Transform $X_j$ into the coordinate frame of camera $i$:
$$
X_{j,C_i}=s_{ij} R_{ij} X_j+t_{ij}
$$
Normalize the transformed point:
$$
\mathbf{r_{j,C_i}}=\frac{X_{j,C_i}}{|X_{j,C_i}|}
$$
The ray direction residual is defined as:
$$
\mathbf{e_{ray}}=\mathbf{r_i}-\mathbf{r_{j,C_i}}
$$
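A short sketch of the ray residual, assuming the relative pose is given as a hypothetical `(s_ij, R_ij, t_ij)` triple with a rotation matrix. It illustrates that the residual only measures direction: two points on the same viewing ray give a zero residual even when their depths differ.

```python
import math

def normalize(X):
    """Unit ray X / |X|."""
    n = math.sqrt(sum(c * c for c in X))
    return [c / n for c in X]

def act(s, R, t, X):
    """Similarity action s * R @ X + t."""
    RX = [sum(R[i][k] * X[k] for k in range(3)) for i in range(3)]
    return [s * RX[i] + t[i] for i in range(3)]

def ray_residual(X_i, X_j, s_ij, R_ij, t_ij):
    """e_ray = r_i - r_{j,C_i}, following the sign convention of the text."""
    r_i = normalize(X_i)
    r_j = normalize(act(s_ij, R_ij, t_ij, X_j))
    return [a - b for a, b in zip(r_i, r_j)]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
X_i = [0.5, -0.2, 2.0]

# Same viewing ray, twice the depth: the ray residual ignores the mismatch.
X_j = [2.0 * c for c in X_i]
e_same_ray = ray_residual(X_i, X_j, 1.0, identity, [0.0, 0.0, 0.0])

# A sideways translation changes the direction, so the residual is nonzero.
e_shifted = ray_residual(X_i, X_j, 1.0, identity, [0.5, 0.0, 0.0])
```

Because the ray term is blind to depth, a separate distance term is needed to constrain it.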
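Distance Residual
The distance residual compares the distances of the two points from the camera center:
$$
e_{dist}=|\mathbf{X_i}|-|\mathbf{X_{j,C_i}}|
$$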
Jacobian Computation
Goal: compute $J_i=\frac{\partial e}{\partial \zeta_i}$ and $J_j=\frac{\partial e}{\partial \zeta_j}$.
Chain rule decomposition:
$$
\frac{\partial e}{\partial \zeta_j} = \frac{\partial e}{\partial \mathbf{r_{j,C_i}}}\cdot \frac{\partial \mathbf{r_{j,C_i}}}{\partial X_{j,C_i}} \cdot \frac{\partial X_{j,C_i}}{\partial \zeta_{ij}}\cdot \frac{\partial \zeta_{ij}}{\partial \zeta_j}
$$
Normalization Jacobian
$$
\mathbf{r}=\frac{X}{|X|}=\frac{X}{\sqrt{X_x^2+X_y^2+X_z^2}}
$$
Let $n=|X|=\sqrt{X^TX}$:
$$
\begin{aligned}
\frac{\partial n}{\partial X}&=\frac{X^T}{n}\\
\frac{\partial \mathbf{r}}{\partial X}&=\frac{\partial }{\partial X}\left(\frac{X}{n}\right)\\
&=\frac{1}{n}I - \frac{X}{n^2}\frac{\partial n}{\partial X}\\
&=\frac{1}{n}\left(I-\frac{XX^T}{n^2}\right)\\
&=\frac{1}{|X|}(I - \mathbf{r}\mathbf{r}^T)\\
&=\frac{1}{|X|}\left(I-\frac{XX^T}{|X|^2}\right)
\end{aligned}
$$
$$
\begin{aligned}
\frac{\partial r_x}{\partial X_x}&=\frac{1}{|X|}-\frac{X_x^2}{|X|^3}\\
\frac{\partial r_x}{\partial X_y}&=-\frac{X_x X_y}{|X|^3}\\
\frac{\partial r_x}{\partial X_z}&=-\frac{X_x X_z}{|X|^3}
\end{aligned}
$$
The partial derivatives of $r_y$ and $r_z$ with respect to $X$ follow in the same way.
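The closed form $\frac{1}{|X|}(I-\mathbf{r}\mathbf{r}^T)$ can be compared against central finite differences. This is an illustrative sketch with hypothetical helper names, not code from the project.

```python
import math

def normalize(X):
    n = math.sqrt(sum(c * c for c in X))
    return [c / n for c in X]

def normalize_jacobian(X):
    """Analytic 3x3 Jacobian (I - r r^T) / |X| of r = X / |X|."""
    n = math.sqrt(sum(c * c for c in X))
    r = [c / n for c in X]
    return [[((1.0 if a == b else 0.0) - r[a] * r[b]) / n for b in range(3)]
            for a in range(3)]

def numeric_jacobian(X, eps=1e-6):
    """Central finite differences; column b holds d r / d X_b."""
    J = [[0.0] * 3 for _ in range(3)]
    for b in range(3):
        Xp, Xm = list(X), list(X)
        Xp[b] += eps
        Xm[b] -= eps
        rp, rm = normalize(Xp), normalize(Xm)
        for a in range(3):
            J[a][b] = (rp[a] - rm[a]) / (2.0 * eps)
    return J

X = [0.4, -1.1, 2.5]
J_analytic = normalize_jacobian(X)
J_numeric = numeric_jacobian(X)
max_diff = max(abs(J_analytic[a][b] - J_numeric[a][b])
               for a in range(3) for b in range(3))

# The Jacobian annihilates X itself: moving a point along its own ray
# does not change the ray direction.
JX = [sum(J_analytic[a][b] * X[b] for b in range(3)) for a in range(3)]
```

The vanishing product `JX` is the property that later makes the ray residual insensitive to scale.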
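Jacobian of the Sim(3) Transformation
$$
X'=sRX+t
$$
Lie algebra perturbation:
$$
\delta \zeta=[\delta \tau,\delta \omega,\delta \sigma]^T
$$
Applying the perturbation on the left of the transformed point gives, to first order:
$$
\exp(\delta \boldsymbol{\zeta})\cdot X' \approx X'+\left.\frac{\partial X'}{\partial \zeta}\right|_{\delta\zeta=0}\delta \zeta
$$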
The Jacobian splits into translation, rotation, and scale blocks:
$$
\frac{\partial X'}{\partial \zeta}=\begin{bmatrix}
\frac{\partial X'}{\partial \tau} & \frac{\partial X'}{\partial \omega} & \frac{\partial X'}{\partial \sigma}
\end{bmatrix}
$$
Translation:
$$
\frac{\partial X'}{\partial \tau} = I_{3\times 3}
$$
Rotation:
$$
\begin{aligned}
\delta R&\approx I+[\delta \omega]_\times\\
\delta R\,X'&=(I+[\delta \omega]_\times)X'=X'+[\delta \omega]_\times X'=X'-[X']_\times\,\delta \omega
\end{aligned}
$$
$$
\frac{\partial X'}{\partial \omega} = -[X']_\times
=\begin{bmatrix}
0 & x'_z & -x'_y \\
-x'_z & 0 & x'_x \\
x'_y & -x'_x & 0
\end{bmatrix}
$$
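Scale:
$$
X'=sRX+t
$$
$$
\frac{\partial X'}{\partial s}=RX
$$
With $s=e^\sigma$:
$$
\frac{\partial X'}{\partial \sigma}=\frac{\partial X'}{\partial s}\cdot \frac{\partial s}{\partial \sigma}=sRX=X'-t\approx X'
$$
Under a left perturbation the scale acts on the already transformed point, $e^{\delta\sigma}X'\approx X'+\delta\sigma\,X'$, so in that case the scale column is exactly $X'$.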
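Since a ray direction does not change when the point is scaled, the ray has zero derivative with respect to $\sigma$:
$$
\begin{aligned}
\frac{\partial r}{\partial \sigma}&=\frac{\partial}{\partial \sigma}\left(\frac{X'}{|X'|}\right)=\frac{1}{|X'|}\left(I-\frac{X'X'^T}{|X'|^2}\right)\cdot \frac{\partial X'}{\partial \sigma}\\
&=\frac{1}{|X'|}\left(I-\frac{X'X'^T}{|X'|^2}\right)X'=0
\end{aligned}
$$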
The norm of the transformed point, which enters the distance residual, does depend on the scale:
$$
\begin{aligned}
\frac{\partial |X'|}{\partial \sigma}&=\frac{\partial |X'|}{\partial X'}\cdot \frac{\partial X'}{\partial \sigma}=\frac{X'^T}{|X'|}\cdot X'=|X'|
\end{aligned}
$$
The complete local Jacobian for a point $X'=sRX+t$ is:
$$
J_{local}=\begin{bmatrix}
I_{3\times 3} & -[X']_\times & X'
\end{bmatrix}
$$
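The three blocks can be validated with finite differences. The sketch below assumes the tangent ordering $(\tau,\omega,\sigma)$ and uses the hypothetical retraction $e^{\sigma}\,\mathrm{Rot}(\omega)\,X'+\tau$, which agrees with $\exp(\delta\zeta)\cdot X'$ to first order and is therefore enough to test the derivative at zero.

```python
import math

def skew(v):
    """Cross-product matrix [v]x."""
    x, y, z = v
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def local_jacobian(Xp):
    """3x7 block [I, -[X']x, X'] for the ordering (tau, omega, sigma)."""
    S = skew(Xp)
    return [[1.0 if a == b else 0.0 for b in range(3)]
            + [-S[a][b] for b in range(3)]
            + [Xp[a]] for a in range(3)]

def rodrigues(w, X):
    """Rotate X by the axis-angle vector w (exact rotation)."""
    theta = math.sqrt(sum(c * c for c in w))
    if theta < 1e-12:
        return list(X)
    k = [c / theta for c in w]
    kxX = [k[1] * X[2] - k[2] * X[1],
           k[2] * X[0] - k[0] * X[2],
           k[0] * X[1] - k[1] * X[0]]
    kdX = sum(a * b for a, b in zip(k, X))
    c, s = math.cos(theta), math.sin(theta)
    return [X[i] * c + kxX[i] * s + k[i] * kdX * (1.0 - c) for i in range(3)]

def perturb(delta, Xp):
    """First-order-equivalent stand-in for exp(delta) * X'."""
    tau, omega, sigma = delta[0:3], delta[3:6], delta[6]
    RX = rodrigues(omega, Xp)
    return [math.exp(sigma) * RX[i] + tau[i] for i in range(3)]

Xp = [0.7, -0.3, 1.9]
J = local_jacobian(Xp)
eps = 1e-6
max_diff = 0.0
for b in range(7):
    dp = [0.0] * 7
    dm = [0.0] * 7
    dp[b], dm[b] = eps, -eps
    fp, fm = perturb(dp, Xp), perturb(dm, Xp)
    for a in range(3):
        numeric = (fp[a] - fm[a]) / (2.0 * eps)
        max_diff = max(max_diff, abs(numeric - J[a][b]))
```

A small `max_diff` indicates that the identity, $-[X']_\times$, and $X'$ columns match the numerical derivatives.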
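Ray direction error, where $a$ indexes the three ray components and $b$ the seven components of $\zeta$:
$$
\begin{aligned}
J_{ray}^{local}&=\frac{\partial r}{\partial X'}\cdot J_{local}\\
\left(J_{ray}^{local}\right)_{ab}&=\sum_{k=1}^3 \frac{\partial r_a}{\partial X'_k} \cdot \frac{\partial X'_k}{\partial \zeta_b}
\end{aligned}
$$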
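Putting the pieces together, the sketch below stacks the three ray rows and one row for the norm $|X'|$ into a hypothetical $4\times 7$ matrix. It is the Jacobian of the transformed ray and norm themselves; the sign convention of the residual is applied on top of it. The scale column of the ray rows should vanish, while the norm row picks up $|X'|$.

```python
import math

def skew(v):
    """Cross-product matrix [v]x."""
    x, y, z = v
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def ray_and_distance_jacobian(Xp):
    """4x7 Jacobian of (r, |X'|) w.r.t. a local perturbation (tau, omega, sigma)."""
    n = math.sqrt(sum(c * c for c in Xp))
    r = [c / n for c in Xp]
    S = skew(Xp)
    # J_local = [I, -[X']x, X'] (3x7)
    J_local = [[1.0 if a == b else 0.0 for b in range(3)]
               + [-S[a][b] for b in range(3)]
               + [Xp[a]] for a in range(3)]
    # d r / d X' = (I - r r^T) / |X'|
    dr_dX = [[((1.0 if a == b else 0.0) - r[a] * r[b]) / n for b in range(3)]
             for a in range(3)]
    J_ray = [[sum(dr_dX[a][k] * J_local[k][b] for k in range(3))
              for b in range(7)] for a in range(3)]
    # d |X'| / d X' = r^T
    J_dist = [sum(r[k] * J_local[k][b] for k in range(3)) for b in range(7)]
    return J_ray + [J_dist], n

Xp = [0.7, -0.3, 1.9]
J, norm = ray_and_distance_jacobian(Xp)
scale_column_ray = [J[a][6] for a in range(3)]  # expected to vanish
scale_entry_dist = J[3][6]                      # expected to equal |X'|
rotation_entries_dist = J[3][3:6]               # rotating about the origin keeps |X'|
```

The zero rotation entries in the last row reflect that rotating a point about the camera center does not change its distance.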
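Adjoint Transformation
The optimization variables are the global poses $\zeta_i,\zeta_j$, but the Jacobian above is computed with respect to a perturbation of the relative pose $T_{ij}$, so it has to be mapped with the adjoint. With $T_{ij}=T_i^{-1}\circ T_j$ and left perturbations of the global poses:
$$
T_i^{-1}\exp(\delta\zeta_j)\,T_j=\exp\left(\mathbf{Ad}_{T_i}^{-1}\,\delta\zeta_j\right)T_{ij}
$$
Each Jacobian row, viewed as a 7-vector, is therefore multiplied by the inverse transposed adjoint:
$$
J_{global}^T=\mathbf{Ad}_{T_i}^{-T}J_{local}^T
\quad\Longleftrightarrow\quad
J_{global}=J_{local}\,\mathbf{Ad}_{T_i}^{-1}
$$
Sim(3) adjoint for $T=(s,R,t)$ with the ordering $[\tau,\omega,\sigma]$:
$$
\mathbf{Ad}_T=\begin{bmatrix}
sR & [t]_\times R & -t \\
0 & R & 0 \\
0 & 0 & 1
\end{bmatrix}
$$
Inverse adjoint:
$$
\mathbf{Ad}_T^{-1}=\begin{bmatrix}
s^{-1}R^T & -s^{-1}R^T[t]_\times & s^{-1}R^Tt \\
0 & R^T & 0 \\
0 & 0 & 1
\end{bmatrix}
$$
Jacobian with respect to camera $j$:
$$
\begin{aligned}
J_j&=\frac{\partial e}{\partial \zeta_j} \\
&=\frac{\partial e}{\partial X_{j,C_i}} \cdot \frac{\partial X_{j,C_i}}{\partial \zeta_{ij}} \cdot \mathbf{Ad}_{T_i}^{-1}
\end{aligned}
$$
Jacobian with respect to camera $i$. Perturbing $T_i$ gives $(\exp(\delta\zeta_i)T_i)^{-1}T_j=\exp(-\mathbf{Ad}_{T_i}^{-1}\delta\zeta_i)\,T_{ij}$, which only flips the sign:
$$
\begin{aligned}
T_{ij}&=T_i^{-1} \circ T_j\\
J_i&=\frac{\partial e}{\partial \zeta_i} \\
&=-\frac{\partial e}{\partial X_{j,C_i}} \cdot \frac{\partial X_{j,C_i}}{\partial \zeta_{ij}} \cdot \mathbf{Ad}_{T_i}^{-1} \\
&=-J_j
\end{aligned}
$$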
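An adjoint formula is easy to get wrong, so a numerical check is useful. The sketch below assumes the matrix form $T=\begin{bmatrix} sR & t\\ 0 & 1\end{bmatrix}$ and the tangent ordering $(\tau,\omega,\sigma)$, under which the adjoint acts as $\tau'=sR\tau+t\times(R\omega)-\sigma t$, $\omega'=R\omega$, $\sigma'=\sigma$. It verifies the defining identity $T\,\hat{\xi}\,T^{-1}=\widehat{\mathbf{Ad}_T\,\xi}$ on $4\times 4$ matrices; all function names are hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def hat(xi):
    """4x4 matrix of a sim(3) tangent vector xi = (tau, omega, sigma)."""
    tx, ty, tz, wx, wy, wz, sg = xi
    return [[sg, -wz, wy, tx],
            [wz, sg, -wx, ty],
            [-wy, wx, sg, tz],
            [0.0, 0.0, 0.0, 0.0]]

def to_matrix(s, R, t):
    """4x4 matrix [[s R, t], [0, 1]]."""
    return ([[s * R[i][j] for j in range(3)] + [t[i]] for i in range(3)]
            + [[0.0, 0.0, 0.0, 1.0]])

def inverse(s, R, t):
    """Inverse similarity: scale 1/s, rotation R^T, translation -R^T t / s."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) / s for i in range(3)]
    return 1.0 / s, Rt, t_inv

def adjoint_apply(s, R, t, xi):
    """Ad_T xi under the conventions stated in the lead-in."""
    tau, omega, sigma = xi[0:3], xi[3:6], xi[6]
    Rtau = [sum(R[i][k] * tau[k] for k in range(3)) for i in range(3)]
    Rom = [sum(R[i][k] * omega[k] for k in range(3)) for i in range(3)]
    txRom = cross(t, Rom)
    tau_new = [s * Rtau[i] + txRom[i] - sigma * t[i] for i in range(3)]
    return tau_new + Rom + [sigma]

c, sn = math.cos(0.4), math.sin(0.4)
s, R, t = 1.5, [[c, -sn, 0.0], [sn, c, 0.0], [0.0, 0.0, 1.0]], [0.3, -0.8, 1.1]
xi = [0.1, -0.2, 0.3, 0.05, 0.4, -0.6, 0.25]

T = to_matrix(s, R, t)
T_inv = to_matrix(*inverse(s, R, t))
lhs = matmul(matmul(T, hat(xi)), T_inv)   # T xi^ T^{-1}
rhs = hat(adjoint_apply(s, R, t, xi))     # (Ad_T xi)^
max_err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(4) for j in range(4))
```

If `max_err` stays at round-off level for several random poses and tangent vectors, the adjoint is consistent with the chosen matrix convention.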
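Gauss-Newton Optimization
Hessian matrix:
For each edge (camera pair $i,j$), the Hessian blocks are accumulated over the point matches $k$ of that edge, with $W_k$ the weight of match $k$: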
$$
\begin{aligned}
H_{ii}&=\sum_k J_i^{kT}W_kJ_i^k\\
H_{ij}&=\sum_k J_i^{kT}W_kJ_j^k\\
H_{jj}&=\sum_k J_j^{kT}W_kJ_j^k
\end{aligned}
$$
Gradient:
$$
\begin{aligned}
g_i&=\sum_k J_i^{kT}W_k e_k\\
g_j&=\sum_k J_j^{kT}W_k e_k
\end{aligned}
$$
Assembling the blocks of all edges gives the normal equations for the update $\Delta\zeta$:
$$
H\cdot \Delta \zeta = -g
$$
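The accumulate-then-solve pattern can be illustrated on a deliberately small problem. The sketch below is not the pose optimizer: it fits a line $y=ax+b$ with scalar residuals, builds $H=\sum_k J_k^TW_kJ_k$ and $g=\sum_k J_k^TW_ke_k$, and solves $H\,\Delta=-g$ with a plain Gaussian elimination. All names are hypothetical.

```python
def solve(H, b):
    """Solve H x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(H)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def gauss_newton_step(jacobians, residuals, weights):
    """Accumulate H and g from scalar residuals, then solve H dx = -g."""
    n = len(jacobians[0])
    H = [[0.0] * n for _ in range(n)]
    g = [0.0] * n
    for J, e, w in zip(jacobians, residuals, weights):
        for a in range(n):
            g[a] += J[a] * w * e
            for b in range(n):
                H[a][b] += J[a] * w * J[b]
    return solve(H, [-v for v in g])

# Toy problem: recover (a, b) = (2, 1) starting from (0, 0).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x + 1.0 for x in xs]
params = [0.0, 0.0]
residuals = [params[0] * x + params[1] - y for x, y in zip(xs, ys)]
jacobians = [[x, 1.0] for x in xs]  # d e / d (a, b)
delta = gauss_newton_step(jacobians, residuals, [1.0] * len(xs))
params = [p + d for p, d in zip(params, delta)]
```

Because this toy residual is linear in the parameters, a single step lands on the solution. The pose problem is nonlinear, so the same step is repeated after re-linearizing at the updated poses.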