Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection

source link: https://blog.aimoon.top/laneatt/

2021-05-08

Title: Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection

Year: October 2020

GB/T 7714: [1] Tabelini L, Berriel R, Paixão T M, et al. Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection[J]. 2020.

This paper proposes LaneATT, an anchor-based deep lane detection model that, like other mainstream deep object detectors, uses anchors for its feature pooling step. Since lanes follow regular patterns and are highly correlated, global information can be crucial for inferring their positions, especially under occlusion, missing lane markings, and similar conditions. This work therefore proposes a novel anchor-based attention mechanism that aggregates global information.

Main contributions

  • A state-of-the-art lane detection method;
  • A model with faster training convergence;
  • A novel anchor-based attention mechanism for lane detection.

LaneATT is an anchor-based single-stage lane detection model (in the spirit of YOLOv3 or SSD); the framework is shown in Figure 1. The input is an RGB image $I \in \mathbb{R}^{3 \times H_I \times W_I}$ captured by a front-facing camera, and the output is the lane boundary lines. The backbone CNN produces a feature map, from which features are then pooled for each anchor. The pooled features are combined with global attention features, which lets the model exploit information from other lanes to cope with occlusion. Finally, the combined features are passed to fully connected layers that predict the final lanes.

https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509135412454.png Figure 1: Overview of the method

The backbone generates a feature map from the input image, and each anchor is projected onto it. This projection is used to pool features, which are concatenated with a second set of features created by the attention module. Finally, using this combined feature set, two layers, one for classification and the other for regression, make the final predictions.

Lane and anchor representation

Lane points are defined as $(X, Y)$:

$Y = \{y_i\}_{i=0}^{N_{pts}-1}$, where $y_i = i \cdot \frac{H_I}{N_{pts}-1}$;

$X = \{x_i\}_{i=0}^{N_{pts}-1}$

Since most lanes do not vertically cross the entire image, a start index $s$ and an end index $e$ define the valid contiguous subsequence of $X$.

Anchor definition: lines are used instead of boxes, and each predicted lane has an anchor (a line) as its reference. An anchor is a "virtual" line in the image plane, defined by:

Origin point: $O = (x_{orig}, y_{orig})$, with $y_{orig} \in Y$, located on one of the image borders (except the top one);

Direction: $\theta$. The anchor set follows that of Line-CNN.
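
For concreteness, here is a minimal NumPy sketch of this representation (toy values; all variable names are mine, not from the paper):

```python
import numpy as np

N_PTS, H_IMG = 72, 360
# Fixed, equally spaced y-coordinates: y_i = i * H_I / (N_pts - 1)
ys = np.arange(N_PTS) * H_IMG / (N_PTS - 1)

# A lane is its x-coordinates plus a valid contiguous range [s, e]
lane_x = np.zeros(N_PTS)
s, e = 10, 40

# An anchor is a "virtual" line: an origin on a border (not the top) and an angle
anchor = {"x_orig": 0.0, "y_orig": 359.0, "theta": np.deg2rad(45)}
```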

Backbone

Feature extraction comes first; any CNN can serve as the backbone.

The backbone outputs a feature map $F_{back} \in \mathbb{R}^{C'_F \times H_F \times W_F}$.

A $1 \times 1$ convolution is then applied to $F_{back}$ for dimensionality reduction, yielding $F \in \mathbb{R}^{C_F \times H_F \times W_F}$.
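
A minimal PyTorch sketch of this stage, using ResNet-34 as an example backbone (the truncation and the channel sizes here are illustrative assumptions):

```python
import torch
import torch.nn as nn
from torchvision import models

# Any CNN works; keep ResNet-34 up to its last conv stage (drop avgpool/fc)
resnet = models.resnet34(weights=None)
backbone = nn.Sequential(*list(resnet.children())[:-2])  # -> (N, 512, H/32, W/32)

reduce_dim = nn.Conv2d(512, 64, kernel_size=1)           # 1x1 conv: C'_F -> C_F

x = torch.randn(1, 3, 360, 640)                          # I in R^{3 x H_I x W_I}
f_back = backbone(x)                                     # F_back
f = reduce_dim(f_back)                                   # F in R^{C_F x H_F x W_F}
```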

Anchor-based feature pooling

An anchor determines which points of $F$ are used. Since anchors are modeled as lines, the points of interest for a given anchor are those its virtual line intercepts (with the line rasterized at the feature map's resolution). For every $y_j = 0, 1, \dots, H_F - 1$ there is a corresponding

$$x_j = \left\lfloor \frac{1}{\tan\theta}\left(y_j - y_{orig}/\delta_{back}\right) + x_{orig}/\delta_{back} \right\rfloor$$

where $(x_{orig}, y_{orig})$ and $\theta$ are the origin and slope of the anchor line, and $\delta_{back}$ is the backbone's global stride. Each anchor $i$ gets a corresponding feature vector $a^{loc}_i \in \mathbb{R}^{C_F \cdot H_F}$ (column-vector notation) pooled from $F$, carrying local feature information. Where part of an anchor falls outside the boundaries of $F$, $a^{loc}_i$ is zero-padded.
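
A NumPy sketch of the coordinate computation above; `anchor_pool_coords` is a hypothetical helper, not the paper's code:

```python
import numpy as np

def anchor_pool_coords(x_orig, y_orig, theta, h_feat, stride):
    """x_j = floor((y_j - y_orig/stride) / tan(theta) + x_orig/stride)
    for each feature-map row y_j = 0 .. H_F - 1."""
    y = np.arange(h_feat)
    x = np.floor((y - y_orig / stride) / np.tan(theta) + x_orig / stride)
    return x.astype(int)  # rows whose x falls outside F get zero-padded features

# one column index per feature-map row, for a 45-degree anchor
cols = anchor_pool_coords(x_orig=0, y_orig=360, theta=np.pi / 4,
                          h_feat=12, stride=32)
```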

Attention mechanism

The attention mechanism acts on the local features $a^{loc}_\cdot$ to produce auxiliary features $a^{glob}_\cdot$ that aggregate global information. It consists of a fully connected layer $L_{att}$ that processes the local feature vector $a^{loc}_i$ and outputs a probability (weight) $w_{i,j}$ for every other anchor $j$, $j \neq i$:

https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509163911177.png (Eq. 2: the attention weights $w_{i,j}$)

These weights are then combined with the local features to produce a global feature vector of the same dimension:

$$a_i^{glob} = \sum_{j} w_{i,j} \, a_j^{loc}$$

Since the same procedure runs for every anchor, the whole process can be implemented efficiently as a matrix multiplication. Let $N_{anc}$ be the number of anchors, $A^{loc} = [a^{loc}_0, \dots, a^{loc}_{N_{anc}-1}]^T$ the matrix of local feature vectors, and $W = [w_{i,j}]_{N_{anc} \times N_{anc}}$ the weight matrix defined by Eq. (2). The global features can then be computed as

$$A^{glob} = W A^{loc}$$

where $A^{glob}$ has the same dimensions as $A^{loc}$: $A^{glob} \in \mathbb{R}^{N_{anc} \times C_F \cdot H_F}$.
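
A PyTorch sketch of the whole attention step; I assume the weights come from a softmax over $L_{att}$'s outputs with $j = i$ masked out, and the layer names and sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_anc, feat_dim = 1000, 64 * 12        # toy N_anc and C_F * H_F
a_loc = torch.randn(n_anc, feat_dim)   # A^loc: one pooled vector per anchor

l_att = nn.Linear(feat_dim, n_anc)     # L_att scores every anchor from a_i^loc
scores = l_att(a_loc)                  # (N_anc, N_anc)
mask = torch.eye(n_anc, dtype=torch.bool)
scores = scores.masked_fill(mask, float("-inf"))  # exclude j == i
w = F.softmax(scores, dim=1)           # weights w_{i,j}

a_glob = w @ a_loc                     # A^glob = W A^loc
assert a_glob.shape == a_loc.shape     # same N_anc x (C_F * H_F) shape
```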

Proposal prediction

Each anchor's lane proposal consists of three main components:

  • $K+1$ probabilities ($K$ lane types and one class for "background" or invalid proposals);
  • $N_{pts}$ offsets (the horizontal distance between the prediction and the anchor's line);
  • the proposal length $l$ (the number of valid offsets). The start index $s$ is directly determined by the y-coordinate of the anchor's origin, and the end index is $e = s + [l] - 1$.

The local and global features $a^{loc}_i$ and $a^{glob}_i$ are concatenated into an augmented feature vector $a^{aug}_i \in \mathbb{R}^{2 \cdot C_F \cdot H_F}$, which is fed to two parallel fully connected layers: a classification layer ($L_{cls}$) and a regression layer ($L_{reg}$). $L_{cls}$ predicts $p_i = \{p_0, \dots, p_K\}$, while $L_{reg}$ predicts $r_i = (l, \{x_0, \dots, x_{N_{pts}-1}\})$.
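
A sketch of the two parallel heads over the augmented vector (a hypothetical module; sizes follow the toy values used above):

```python
import torch
import torch.nn as nn

class ProposalHead(nn.Module):
    """L_cls and L_reg applied to the concatenated local + global features."""
    def __init__(self, feat_dim, k_lanes, n_pts):
        super().__init__()
        self.cls = nn.Linear(2 * feat_dim, k_lanes + 1)  # K lane types + background
        self.reg = nn.Linear(2 * feat_dim, n_pts + 1)    # length l + N_pts offsets

    def forward(self, a_loc, a_glob):
        a_aug = torch.cat([a_loc, a_glob], dim=1)        # (N_anc, 2 * C_F * H_F)
        return self.cls(a_aug), self.reg(a_aug)

head = ProposalHead(feat_dim=64 * 12, k_lanes=1, n_pts=72)
p, r = head(torch.randn(1000, 768), torch.randn(1000, 768))
```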

Non-maximum Suppression (NMS)

The distance between two lanes $X_a = \{x_i^a\}_{i=1}^{N_{pts}}$ and $X_b = \{x_i^b\}_{i=1}^{N_{pts}}$ is computed over their common valid indices (or $y$-coordinates). Let $s' = \max(s_a, s_b)$ and $e' = \min(e_a, e_b)$ define that range. The lane distance metric is then defined as

$$D(X_a, X_b) = \begin{cases} \dfrac{1}{e' - s' + 1} \displaystyle\sum_{i=s'}^{e'} \left| x_i^a - x_i^b \right|, & e' \geq s' \\ +\infty, & \text{otherwise} \end{cases}$$
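
This metric translates directly to code; a small sketch (the function name is mine):

```python
import numpy as np

def lane_distance(x_a, s_a, e_a, x_b, s_b, e_b):
    """Mean |x_a - x_b| over the shared valid range, +inf when disjoint."""
    s, e = max(s_a, s_b), min(e_a, e_b)   # s' and e'
    if e < s:                             # no common valid y-coordinates
        return float("inf")
    return float(np.mean(np.abs(x_a[s:e + 1] - x_b[s:e + 1])))
```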
Model training

https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509173054232.png LaneATT qualitative results on TuSimple (top row), CULane (middle row), and LLAMAS (bottom row). Blue lines are ground-truth, while green and red lines are true-positives and false-positives, respectively.

During training, the distance metric $D(X_a, X_b)$ is used to define positive and negative anchors. First, the distance between each anchor (those not filtered out by NMS) and each ground-truth lane is measured. Anchors with a distance below a threshold $\tau_p$ are considered positives, while those with a distance above $\tau_n$ are considered negatives. Anchors whose distances fall between the two thresholds (along with their associated proposals) are ignored. The remaining $N_{p\&n}$ anchors are used in the multi-task loss defined as

$$\mathcal{L}\left(\{p_i, r_i\}_{i=0}^{N_{p\&n}-1}\right) = \lambda \sum_i \mathcal{L}_{cls}(p_i, p_i^*) + \sum_i \mathcal{L}_{reg}(r_i, r_i^*)$$
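
A hedged sketch of this multi-task loss, assuming cross-entropy for $\mathcal{L}_{cls}$, Smooth L1 for $\mathcal{L}_{reg}$, and an arbitrary $\lambda$ (the paper's exact loss choices and weighting may differ):

```python
import torch
import torch.nn.functional as F

def multitask_loss(p, r, p_star, r_star, lam=10.0):
    """Sum of classification and regression losses over the N_{p&n} anchors.

    p: (N, K+1) class logits;          p_star: (N,) target labels
    r: (N, N_pts+1) length + offsets;  r_star: matching regression targets
    lam: weight on the classification term (assumed value)
    """
    l_cls = F.cross_entropy(p, p_star, reduction="sum")
    l_reg = F.smooth_l1_loss(r, r_star, reduction="sum")
    return lam * l_cls + l_reg
```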

Experimental results and analysis

  • Datasets: TuSimple, CULane, LLAMAS
https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509172348751.png Overview of the datasets used in this work
  • Input size: $H_I \times W_I = 360 \times 640$

  • Epochs: 15 for CULane; 100 for TuSimple

  • Hardware: Intel i9-9900KS CPU, RTX 2080 Ti GPU

  • $N_{pts} = 72$, $N_{anc} = 1000$, $\tau_p = 15$, $\tau_n = 20$, $K = 1$

Results

  • TuSimple
https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509173204379.png Model latency vs. F1 of state-of-the-art methods on CULane and TuSimple

https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509173255841.png State-of-the-art results on TuSimple. For a fairer comparison, the FPS of the fastest method ([20]) was measured on the same machine and conditions as our method. Additionally, all metrics for this method were computed using the official source code, since only the accuracy was available in the paper. The best and second-best results across methods with source code available are in bold and underlined, respectively.
  • CULane
https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509173417745.png State-of-the-art results on CULane. Since the images in the "Cross" category have no lanes, the reported number is the amount of false-positives. For a fairer comparison, we measured the FPS of the fastest method ([20]) under the same machine and conditions as ours. The best and second-best results across methods with source code available are in bold and underlined, respectively.
  • LLAMAS
https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509173529336.png State-of-the-art results on LLAMAS
  • Efficiency trade-offs
https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509173558198.png Efficiency trade-offs on CULane using the ResNet-34 backbone. "TT" stands for training time in hours.
  • Ablation study
https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509173726868.png Ablation study results on CULane
