Introduction
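MASt3R-SLAM is a deep-learning-based real-time monocular SLAM (Simultaneous Localization and Mapping) system. It combines the strengths of classical SLAM methods and deep neural networks to achieve accurate camera localization and 3D map reconstruction. Compared with the earlier DROID-SLAM, it further improves the efficiency and accuracy of optimization. This article focuses on the front-end optimization.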
Pose Optimization
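In MASt3R-SLAM, pose optimization is implemented by gauss_newton_ray_cuda, which refines camera poses by ray alignment. The core idea is to minimize the error between observed points and map points. This error is measured as a ray-direction residual and a distance residual, both defined below.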
Sim3 Group Representation
Let the translation be $t\in\mathbb{R}^3$, the rotation be the unit quaternion $q=[q_x,q_y,q_z,q_w]$, and the scale be $s\in\mathbb{R}^+$. The Lie algebra parameterization is then:
$$\zeta=[\tau,\omega,\sigma]^T\in\mathbb{R}^7$$
Here $\tau\in\mathbb{R}^3$ is the translation, $\omega\in\mathbb{R}^3$ is the rotation, and $\sigma\in\mathbb{R}$ is the log scale ($s=e^\sigma$).
Relative Pose Computation
Given two camera poses $T_i,T_j\in\mathrm{Sim}(3)$, the relative pose $T_{ij}$ maps points from camera $j$ into camera $i$. It is obtained by group composition, where $R_i$ is the rotation matrix of $q_i$:
$$\begin{aligned}
T_{ij}&=T_i^{-1}\circ T_j\\
s_{ij}&=s_i^{-1}\cdot s_j\\
q_{ij}&=q_i^{-1}\otimes q_j\\
t_{ij}&=s_i^{-1}\cdot R_i^{-1}(t_j-t_i)
\end{aligned}$$
These component formulas correspond to poses that map camera coordinates to the world frame, $X_w=s_iR_iX+t_i$.
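The component formulas can be checked numerically. Applying $T_{ij}$ built from $(s_{ij},q_{ij},t_{ij})$ to a point must give the same result as applying $T_j$ first and then $T_i^{-1}$.

The pure-Python sketch below makes these assumptions:
- a pose is stored as a tuple (s, q, t);
- q uses the $[q_x,q_y,q_z,q_w]$ order from above;
- a pose maps camera coordinates to world.

The helper names (quat_mul, act, act_inv, relative, and so on) are hypothetical.

```python
import math

# Sketch only: a Sim3 pose is a tuple (s, q, t) with scale s, unit quaternion
# q = [qx, qy, qz, qw] and translation t; it maps X to s * R(q) X + t.

def quat_mul(a, b):
    """Hamilton product a ⊗ b for quaternions stored as [x, y, z, w]."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return [aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw,
            aw * bw - ax * bx - ay * by - az * bz]

def quat_inv(q):
    """The inverse of a unit quaternion is its conjugate."""
    return [-q[0], -q[1], -q[2], q[3]]

def quat_rotate(q, v):
    """Rotate v by q via q ⊗ [v, 0] ⊗ q^{-1}."""
    return quat_mul(quat_mul(q, [v[0], v[1], v[2], 0.0]), quat_inv(q))[:3]

def axis_angle_quat(axis, angle):
    n = math.sqrt(sum(c * c for c in axis))
    h = 0.5 * angle
    return [axis[0] / n * math.sin(h), axis[1] / n * math.sin(h),
            axis[2] / n * math.sin(h), math.cos(h)]

def act(T, X):
    """X -> s R X + t."""
    s, q, t = T
    return [s * c + t[k] for k, c in enumerate(quat_rotate(q, X))]

def act_inv(T, Y):
    """Y -> s^{-1} R^{-1} (Y - t)."""
    s, q, t = T
    return [c / s for c in quat_rotate(quat_inv(q), [Y[k] - t[k] for k in range(3)])]

def relative(Ti, Tj):
    """T_ij = T_i^{-1} ∘ T_j from the component formulas."""
    si, qi, ti = Ti
    sj, qj, tj = Tj
    s_ij = sj / si
    q_ij = quat_mul(quat_inv(qi), qj)
    t_ij = [c / si for c in quat_rotate(quat_inv(qi), [tj[k] - ti[k] for k in range(3)])]
    return (s_ij, q_ij, t_ij)

Ti = (1.5, axis_angle_quat([0, 0, 1], 0.3), [0.2, -0.1, 1.0])
Tj = (0.8, axis_angle_quat([1, 1, 0], -0.7), [1.0, 0.5, -0.3])
Tij = relative(Ti, Tj)
X = [0.4, -1.2, 2.5]
direct = act(Tij, X)               # T_ij applied in one step
chained = act_inv(Ti, act(Tj, X))  # T_j first, then T_i^{-1}
```

The two results agree up to rounding, which confirms that the formulas describe $T_i^{-1}\circ T_j$.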
Residual Definitions
Ray Direction Residual
Normalize the point into a unit ray:
$$\mathbf{r}_i=\frac{X_i}{\|X_i\|}$$
Transform $X_j$ into the coordinate frame of camera $i$:
$$X_{j,C_i}=s_{ij}R_{ij}X_j+t_{ij}$$
Normalize the transformed point:
$$\mathbf{r}_{j,C_i}=\frac{X_{j,C_i}}{\|X_{j,C_i}\|}$$
The ray direction residual is defined as:
$$\mathbf{e}_{ray}=\mathbf{r}_i-\mathbf{r}_{j,C_i}$$
Distance Residual
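The ray residual does not depend on how far a point lies along its ray. The distance residual supplies this information by comparing the distances of the two points to the center of camera $i$:
$$e_{dist}=\|X_i\|-\|X_{j,C_i}\|$$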
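A minimal sketch of both residuals for a single correspondence follows. It takes $T_{ij}$ as (s, R, t), with R a 3x3 nested list. It uses the distance residual in the form $\|X_i\|-\|X_{j,C_i}\|$, the difference of distances to the center of camera $i$. The function names are hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def transform(s, R, t, X):
    """X' = s R X + t, with R a 3x3 nested list."""
    return [s * sum(R[a][k] * X[k] for k in range(3)) + t[a] for a in range(3)]

def residuals(X_i, X_j, s_ij, R_ij, t_ij):
    """Ray residual r_i - r_{j,C_i} and distance residual ||X_i|| - ||X_{j,C_i}||."""
    X_jCi = transform(s_ij, R_ij, t_ij, X_j)
    n_i, n_j = norm(X_i), norm(X_jCi)
    e_ray = [X_i[a] / n_i - X_jCi[a] / n_j for a in range(3)]
    e_dist = n_i - n_j
    return e_ray, e_dist

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Same ray, different depth: only the distance residual is non-zero.
e_ray, e_dist = residuals([0.0, 0.0, 3.0], [0.0, 0.0, 1.0], 2.0, I3, [0.0, 0.0, 0.0])
# e_ray == [0.0, 0.0, 0.0], e_dist == 1.0
```

The example shows why the two residuals are complementary. Both points lie on the optical axis, so their rays coincide, and only the distance term detects the depth mismatch.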
Jacobian Computation
Goal: $J_i=\frac{\partial e}{\partial \zeta_i},\quad J_j=\frac{\partial e}{\partial \zeta_j}$
Chain-rule decomposition:
$$\frac{\partial e}{\partial \zeta_j}=\frac{\partial e}{\partial \mathbf{r}_{j,C_i}}\cdot\frac{\partial \mathbf{r}_{j,C_i}}{\partial X_{j,C_i}}\cdot\frac{\partial X_{j,C_i}}{\partial \zeta_{ij}}\cdot\frac{\partial \zeta_{ij}}{\partial \zeta_j}$$
For the ray residual, the first factor is $\partial\mathbf{e}_{ray}/\partial\mathbf{r}_{j,C_i}=-I$.
Normalization Jacobian
$$\mathbf{r}=\frac{X}{\|X\|}=\frac{X}{\sqrt{X_x^2+X_y^2+X_z^2}}$$
Let $n=\|X\|=\sqrt{X^TX}$. Then:
$$\begin{aligned}
\frac{\partial n}{\partial X}&=\frac{X^T}{n}\\
\frac{\partial \mathbf{r}}{\partial X}&=\frac{\partial}{\partial X}\Big(\frac{X}{n}\Big)\\
&=\frac{1}{n}I-\frac{X}{n^2}\frac{\partial n}{\partial X}\\
&=\frac{1}{n}\Big(I-\frac{XX^T}{n^2}\Big)\\
&=\frac{1}{\|X\|}(I-\mathbf{r}\mathbf{r}^T)\\
&=\frac{1}{\|X\|}\Big(I-\frac{XX^T}{\|X\|^2}\Big)
\end{aligned}$$
$$\begin{aligned}
\frac{\partial r_x}{\partial X_x}&=\frac{1}{\|X\|}-\frac{X_x^2}{\|X\|^3}\\
\frac{\partial r_x}{\partial X_y}&=-\frac{X_xX_y}{\|X\|^3}\\
\frac{\partial r_x}{\partial X_z}&=-\frac{X_xX_z}{\|X\|^3}
\end{aligned}$$
The partial derivatives of $r_y$ and $r_z$ with respect to $X$ follow in the same way.
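The closed form $\frac{1}{\|X\|}(I-\mathbf{r}\mathbf{r}^T)$ can be checked against central finite differences. The sketch also shows that $X$ itself lies in the null space of this Jacobian: moving a point along its own ray does not change the ray. The function names are hypothetical.

```python
import math

def normalize(X):
    n = math.sqrt(sum(c * c for c in X))
    return [c / n for c in X]

def d_normalize(X):
    """Analytic Jacobian d r / d X = (I - r r^T) / ||X||."""
    n = math.sqrt(sum(c * c for c in X))
    r = [c / n for c in X]
    return [[((1.0 if a == b else 0.0) - r[a] * r[b]) / n for b in range(3)] for a in range(3)]

def d_normalize_numeric(X, eps=1e-6):
    """Central finite differences; column b is the derivative with respect to X_b."""
    J = [[0.0] * 3 for _ in range(3)]
    for b in range(3):
        Xp, Xm = list(X), list(X)
        Xp[b] += eps
        Xm[b] -= eps
        rp, rm = normalize(Xp), normalize(Xm)
        for a in range(3):
            J[a][b] = (rp[a] - rm[a]) / (2 * eps)
    return J

X = [0.7, -1.3, 2.2]
J = d_normalize(X)
J_fd = d_normalize_numeric(X)
# Moving along the ray itself leaves r unchanged, so J X ≈ 0.
radial = [sum(J[a][b] * X[b] for b in range(3)) for a in range(3)]
```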
Sim3 Transform Jacobian
$$X'=sRX+t$$
Lie algebra perturbation:
$$\delta\zeta=[\delta\tau,\delta\omega,\delta\sigma]^T$$
$$X'=\exp(\delta\zeta)\cdot X\approx X+\left.\frac{\partial X'}{\partial\zeta}\right|_{\zeta=0}\delta\zeta$$
The Jacobian splits into one block per parameter group:
$$\frac{\partial X'}{\partial\zeta}=\begin{bmatrix}\frac{\partial X'}{\partial\tau} & \frac{\partial X'}{\partial\omega} & \frac{\partial X'}{\partial\sigma}\end{bmatrix}$$
Translation:
$$\frac{\partial X'}{\partial\tau}=I_{3\times3}$$
Rotation: for $X'=RX$, a small rotation is $\delta R=I+[\delta\omega]_\times$. Applying it to the point gives:
$$\begin{aligned}
\delta R\,X'&=(I+[\delta\omega]_\times)X'\\
&=X'+[\delta\omega]_\times X'\\
&=X'-[X']_\times\delta\omega
\end{aligned}$$
The last step uses $\delta\omega\times X'=-X'\times\delta\omega$.
$$\frac{\partial X'}{\partial\omega}=-[X']_\times=\begin{bmatrix}
0 & x'_z & -x'_y\\
-x'_z & 0 & x'_x\\
x'_y & -x'_x & 0
\end{bmatrix}$$
Scale:
$$X'=sRX+t,\qquad\frac{\partial X'}{\partial s}=RX$$
With $s=e^\sigma$:
$$\frac{\partial X'}{\partial\sigma}=\frac{\partial X'}{\partial s}\cdot\frac{\partial s}{\partial\sigma}=sRX=X'-t$$
In the left perturbation $\exp(\delta\zeta)\cdot X'$, the scale multiplies the whole point being perturbed, with no separate translation term. The scale block is therefore exactly $X'$; the $-t$ appears only when the $s$ inside $T$ is perturbed directly:
$$\frac{\partial X'}{\partial\sigma}=X'$$
$$\begin{aligned}
\frac{\partial\mathbf{r}}{\partial\sigma}&=\frac{\partial}{\partial\sigma}\Big(\frac{X'}{\|X'\|}\Big)=\frac{1}{\|X'\|}\Big(I-\frac{X'X'^T}{\|X'\|^2}\Big)\cdot\frac{\partial X'}{\partial\sigma}\\
&=\frac{1}{\|X'\|}\Big(I-\frac{X'X'^T}{\|X'\|^2}\Big)X'=0
\end{aligned}$$
The ray direction therefore carries no information about scale.
For the distance term $\|X'\|$, the scale derivative is:
$$\frac{\partial\|X'\|}{\partial\sigma}=\frac{\partial\|X'\|}{\partial X'}\cdot\frac{\partial X'}{\partial\sigma}=\frac{X'^T}{\|X'\|}\cdot X'=\|X'\|$$
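A quick numeric confirmation of the two scale results follows. Scaling a point by $e^\sigma$ leaves its ray unchanged and multiplies its length, so the derivatives at $\sigma=0$ are $0$ and $\|X'\|$. Only the scale component of the perturbation is exercised here, and the helper names are hypothetical.

```python
import math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def d_dsigma(fn, Xp, eps=1e-6):
    """Central difference of fn(exp(sigma) * X') at sigma = 0 (scale perturbation only)."""
    hi = fn([math.exp(eps) * c for c in Xp])
    lo = fn([math.exp(-eps) * c for c in Xp])
    return [(h - l) / (2 * eps) for h, l in zip(hi, lo)]

Xp = [0.3, -0.8, 2.1]
d_ray = d_dsigma(lambda X: [c / norm(X) for c in X], Xp)  # expected ≈ [0, 0, 0]
d_len = d_dsigma(lambda X: [norm(X)], Xp)                 # expected ≈ [||X'||]
```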
The complete local Jacobian for the point $X'=sRX+t$ is:
$$J_{local}=\begin{bmatrix}I_{3\times3} & -[X']_\times & X'\end{bmatrix}$$
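To check $J_{local}$, the sketch perturbs $X'$ on the left with a simple retraction: it scales by $e^{\delta\sigma}$, rotates using Rodrigues' formula, and then translates by $\delta\tau$. This retraction is an assumption of the sketch, not the exact Sim3 exponential, whose translation part differs at second order. It agrees with the exponential to first order at $\delta\zeta=0$, which is all a derivative check needs. The function names are hypothetical.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def rodrigues_rotate(w, v):
    """Rotate v by the rotation vector w (Rodrigues' formula)."""
    th = math.sqrt(sum(c * c for c in w))
    if th < 1e-12:
        return list(v)
    k = [c / th for c in w]
    kv = cross(k, v)
    kd = sum(k[a] * v[a] for a in range(3))
    return [v[a] * math.cos(th) + kv[a] * math.sin(th) + k[a] * kd * (1 - math.cos(th))
            for a in range(3)]

def perturb(d, Xp):
    """Left perturbation of X' by d = [tau, omega, sigma]; a first-order stand-in for exp(d)."""
    Rv = rodrigues_rotate(d[3:6], Xp)
    return [math.exp(d[6]) * Rv[a] + d[a] for a in range(3)]

def J_local(Xp):
    """[ I | -[X']x | X' ] as a 3x7 nested list."""
    x, y, z = Xp
    neg_skew = [[0.0, z, -y], [-z, 0.0, x], [y, -x, 0.0]]
    return [[1.0 if a == b else 0.0 for b in range(3)] + neg_skew[a] + [Xp[a]] for a in range(3)]

def J_numeric(Xp, eps=1e-6):
    """Central differences of perturb() around d = 0, one column per parameter."""
    J = [[0.0] * 7 for _ in range(3)]
    for b in range(7):
        dp, dm = [0.0] * 7, [0.0] * 7
        dp[b], dm[b] = eps, -eps
        P, M = perturb(dp, Xp), perturb(dm, Xp)
        for a in range(3):
            J[a][b] = (P[a] - M[a]) / (2 * eps)
    return J

Xp = [0.5, -1.5, 2.0]
err = max(abs(u - v) for ra, rb in zip(J_local(Xp), J_numeric(Xp)) for u, v in zip(ra, rb))
```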
Ray direction term:
$$\begin{aligned}
J_{ray}^{local}&=\frac{\partial\mathbf{r}}{\partial X'}\cdot J_{local}\\
(J_{ray}^{local})_{ab}&=\sum_{k=1}^{3}\frac{\partial r_a}{\partial X'_k}\cdot\frac{\partial X'_k}{\partial\zeta_b},\qquad a\in\{1,2,3\},\ b\in\{1,\dots,7\}
\end{aligned}$$
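Composing the normalization Jacobian with $J_{local}$ gives the ray Jacobian. Composing $\partial\|X'\|/\partial X'=\mathbf{r}^T$ with $J_{local}$ gives the Jacobian of the distance term.

The sketch reproduces the structure derived above:
- the scale column of $J_{ray}^{local}$ vanishes;
- the distance term has a zero rotation block and a scale entry equal to $\|X'\|$.

For $\mathbf{e}_{ray}=\mathbf{r}_i-\mathbf{r}_{j,C_i}$, the residual Jacobian is the negative of $J_{ray}^{local}$. The function name is hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(A[a][k] * B[k][b] for k in range(len(B))) for b in range(len(B[0]))]
            for a in range(len(A))]

def local_jacobians(Xp):
    """Return (J_ray, J_len): d r / d zeta (3x7) and d||X'|| / d zeta (7 entries)."""
    n = math.sqrt(sum(c * c for c in Xp))
    r = [c / n for c in Xp]
    x, y, z = Xp
    neg_skew = [[0.0, z, -y], [-z, 0.0, x], [y, -x, 0.0]]
    J_loc = [[1.0 if a == b else 0.0 for b in range(3)] + neg_skew[a] + [Xp[a]] for a in range(3)]
    dr_dX = [[((1.0 if a == b else 0.0) - r[a] * r[b]) / n for b in range(3)] for a in range(3)]
    J_ray = matmul(dr_dX, J_loc)   # (d r / d X') * J_local
    J_len = matmul([r], J_loc)[0]  # d||X'|| / dX' = r^T, times J_local
    return J_ray, J_len

Xp = [0.5, -1.5, 2.0]
J_ray, J_len = local_jacobians(Xp)
scale_col = [row[6] for row in J_ray]  # ≈ [0, 0, 0]: rays ignore scale
```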
Adjoint Transform
The optimization variables are the global poses $\zeta_i,\zeta_j$. However, $J_{local}$ describes a perturbation of the point already expressed in camera $i$, so it must be mapped to the global parameters through the adjoint. Consistent with the component formulas for $s_{ij},q_{ij},t_{ij}$ above, $T_i$ maps camera-$i$ coordinates to the world frame. We also assume that a global pose is updated by left multiplication, $T_j\leftarrow\exp(\delta\zeta_j)\,T_j$. Using $T\exp(\delta\zeta)T^{-1}=\exp(\mathbf{Ad}_T\,\delta\zeta)$:
$$T_i^{-1}\exp(\delta\zeta_j)\,T_j=\exp\big(\mathbf{Ad}_{T_i^{-1}}\,\delta\zeta_j\big)\,T_{ij},\qquad\mathbf{Ad}_{T_i^{-1}}=\mathbf{Ad}_{T_i}^{-1}$$
The global Jacobian is therefore obtained by right-multiplying the local one by the inverse adjoint of $T_i$. Equivalently, the transposed Jacobian is multiplied by $\mathbf{Ad}_{T_i}^{-T}$:
$$J_{global}=J_{local}\,\mathbf{Ad}_{T_i}^{-1},\qquad J_{global}^T=\mathbf{Ad}_{T_i}^{-T}J_{local}^T$$
Sim3 adjoint, for the parameter order $[\tau,\omega,\sigma]$ and $T=(s,R,t)$:
$$\mathbf{Ad}_T=\begin{bmatrix}
sR & [t]_\times R & -t\\
0 & R & 0\\
0 & 0 & 1
\end{bmatrix}$$
Inverse adjoint, $\mathbf{Ad}_T^{-1}=\mathbf{Ad}_{T^{-1}}$:
$$\mathbf{Ad}_T^{-1}=\begin{bmatrix}
s^{-1}R^T & -s^{-1}R^T[t]_\times & s^{-1}R^Tt\\
0 & R^T & 0\\
0 & 0 & 1
\end{bmatrix}$$
Jacobian with respect to camera $j$:
$$\begin{aligned}
J_j&=\frac{\partial e}{\partial\zeta_j}\\
&=\frac{\partial e}{\partial X_{j,C_i}}\cdot J_{local}\cdot\mathbf{Ad}_{T_i}^{-1}
\end{aligned}$$
Jacobian with respect to camera $i$: with $T_{ij}=T_i^{-1}\circ T_j$ and $T_i\leftarrow\exp(\delta\zeta_i)\,T_i$,
$$(\exp(\delta\zeta_i)\,T_i)^{-1}T_j=T_i^{-1}\exp(-\delta\zeta_i)\,T_j=\exp\big(-\mathbf{Ad}_{T_i^{-1}}\,\delta\zeta_i\big)\,T_{ij}$$
Hence:
$$\begin{aligned}
J_i&=\frac{\partial e}{\partial\zeta_i}\\
&=-\frac{\partial e}{\partial X_{j,C_i}}\cdot J_{local}\cdot\mathbf{Ad}_{T_i}^{-1}\\
&=-J_j
\end{aligned}$$
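The adjoint blocks are easy to get wrong, so a small numeric check helps. The sketch builds $\mathbf{Ad}_T$ and $\mathbf{Ad}_T^{-1}$ for the $[\tau,\omega,\sigma]$ ordering and checks that their product is the identity.

It also checks the defining property on a point. Perturbing $Y$ locally and then mapping through $T$ must equal perturbing $X=TY$ with $\mathbf{Ad}_T\xi$, that is, $sR\,J_{local}(Y)\,\xi=J_{local}(X)\,\mathbf{Ad}_T\,\xi$.

All helper names are hypothetical. This is an illustration of the formulas, not the MASt3R-SLAM CUDA code.

```python
import math

def skew(v):
    return [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]

def matmul(A, B):
    return [[sum(A[a][k] * B[k][b] for k in range(len(B))) for b in range(len(B[0]))]
            for a in range(len(A))]

def matvec(A, v):
    return [sum(A[a][k] * v[k] for k in range(len(v))) for a in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def zeros(r, c):
    return [[0.0] * c for _ in range(r)]

def rot(axis, angle):
    """Rodrigues: R = I + sin(a) K + (1 - cos(a)) K^2 with K = [k]x, k the unit axis."""
    n = math.sqrt(sum(c * c for c in axis))
    K = skew([c / n for c in axis])
    K2 = matmul(K, K)
    return [[(1.0 if a == b else 0.0) + math.sin(angle) * K[a][b] + (1 - math.cos(angle)) * K2[a][b]
             for b in range(3)] for a in range(3)]

def blocks7(B):
    """Assemble a 7x7 matrix from a 3x3 grid of blocks with row/column sizes (3, 3, 1)."""
    sizes = (3, 3, 1)
    return [sum((B[bi][bj][r] for bj in range(3)), []) for bi in range(3) for r in range(sizes[bi])]

def adjoint(s, R, t):
    """Ad_T for T = (s, R, t), parameter order [tau, omega, sigma]."""
    sR = [[s * v for v in row] for row in R]
    return blocks7([[sR, matmul(skew(t), R), [[-c] for c in t]],
                    [zeros(3, 3), R, zeros(3, 1)],
                    [zeros(1, 3), zeros(1, 3), [[1.0]]]])

def adjoint_inv(s, R, t):
    """Ad_T^{-1} = Ad_{T^{-1}}."""
    Rt = transpose(R)
    return blocks7([[[[v / s for v in row] for row in Rt],
                     [[-v / s for v in row] for row in matmul(Rt, skew(t))],
                     [[v / s] for v in matvec(Rt, t)]],
                    [zeros(3, 3), Rt, zeros(3, 1)],
                    [zeros(1, 3), zeros(1, 3), [[1.0]]]])

def apply_J_local(X, xi):
    """J_local(X) xi = tau - [X]x omega + sigma X = tau + omega x X + sigma X."""
    tau, w, sig = xi[0:3], xi[3:6], xi[6]
    wX = matvec(skew(w), X)
    return [tau[a] + wX[a] + sig * X[a] for a in range(3)]

s, R, t = 1.7, rot([1.0, -2.0, 0.5], 0.9), [0.4, -1.1, 2.0]
P = matmul(adjoint(s, R, t), adjoint_inv(s, R, t))  # should be the 7x7 identity

# Defining property on a point: s R J_local(Y) xi == J_local(T Y) Ad_T xi.
xi = [0.3, -0.2, 0.5, 0.1, 0.7, -0.4, 0.25]
Y = [0.6, 1.3, -0.9]
X = [s * v + t[a] for a, v in enumerate(matvec(R, Y))]
lhs = [s * v for v in matvec(R, apply_J_local(Y, xi))]
rhs = apply_J_local(X, matvec(adjoint(s, R, t), xi))
```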
Gauss-Newton Optimization
Hessian:
For each edge (camera pair $i,j$), summing over its correspondences $k$ with weights $W_k$, the Hessian blocks are:
$$\begin{aligned}
H_{ii}&=\sum_k J_i^{k\,T}W_kJ_i^k\\
H_{ij}&=\sum_k J_i^{k\,T}W_kJ_j^k\\
H_{jj}&=\sum_k J_j^{k\,T}W_kJ_j^k
\end{aligned}$$
Gradient:
$$\begin{aligned}
g_i&=\sum_k J_i^{k\,T}W_ke_k\\
g_j&=\sum_k J_j^{k\,T}W_ke_k
\end{aligned}$$
Accumulating these blocks over all edges gives the normal equations that are solved for the pose update:
$$H\cdot\Delta\zeta=-g$$
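The sketch below shows the per-edge accumulation followed by one Gauss-Newton step. It makes two choices of its own, which are not necessarily what gauss_newton_ray_cuda does:
- each scalar residual row has its own scalar weight, i.e. a diagonal $W_k$;
- pose $i$ is held fixed and only $H_{jj}\Delta\zeta_j=-g_j$ is solved, to remove the gauge freedom of a single edge.

Because $J_i=-J_j$, the accumulated blocks satisfy $H_{ii}=H_{jj}=-H_{ij}$ and $g_i=-g_j$. The Jacobian rows are synthetic random numbers, and the function names are hypothetical.

```python
import random

def accumulate_edge(Ji_rows, Jj_rows, errs, weights):
    """Per-edge blocks H_ii, H_ij, H_jj and gradients g_i, g_j.

    Row k holds the Jacobian of scalar residual k w.r.t. zeta_i / zeta_j;
    weights[k] is its scalar weight (a diagonal W, assumed by this sketch)."""
    d = len(Ji_rows[0])
    Hii = [[0.0] * d for _ in range(d)]
    Hij = [[0.0] * d for _ in range(d)]
    Hjj = [[0.0] * d for _ in range(d)]
    gi, gj = [0.0] * d, [0.0] * d
    for Ji, Jj, e, w in zip(Ji_rows, Jj_rows, errs, weights):
        for a in range(d):
            gi[a] += Ji[a] * w * e
            gj[a] += Jj[a] * w * e
            for b in range(d):
                Hii[a][b] += Ji[a] * w * Ji[b]
                Hij[a][b] += Ji[a] * w * Jj[b]
                Hjj[a][b] += Jj[a] * w * Jj[b]
    return Hii, Hij, Hjj, gi, gj

def solve(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    n = len(b)
    M = [list(A[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

rng = random.Random(0)
Jj_rows = [[rng.uniform(-1, 1) for _ in range(7)] for _ in range(12)]  # synthetic rows
Ji_rows = [[-v for v in row] for row in Jj_rows]                      # J_i = -J_j
errs = [rng.uniform(-0.1, 0.1) for _ in range(12)]
weights = [1.0 + 0.1 * k for k in range(12)]

Hii, Hij, Hjj, gi, gj = accumulate_edge(Ji_rows, Jj_rows, errs, weights)
# Pose i held fixed (a gauge choice of this sketch): solve H_jj * dzeta_j = -g_j.
dzeta_j = solve(Hjj, [-v for v in gj])
```

With every pose free, the stacked system for one edge is singular, because a common motion of both poses leaves every residual unchanged. Fixing one pose, or adding damping to the diagonal, makes it solvable.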