Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection

source link: https://blog.aimoon.top/laneatt/

2021-05-08

Title: Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection

Year: October 2020

GB/T 7714: [1] Tabelini L, Berriel R, Paixão T M, et al. Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection[J]. 2020.

This paper proposes LaneATT, an anchor-based deep lane detection model that, like other mainstream deep object detectors, uses anchors for its feature pooling step. Since lanes follow regular patterns and are highly correlated, global information can be crucial for inferring their positions, especially under occlusion, missing lane markings, and similar conditions. This work therefore proposes a novel anchor-based attention mechanism that aggregates global information.

Main contributions

  • A state-of-the-art lane detection method;
  • A model with faster training convergence;
  • A novel anchor-based attention mechanism for lane detection.

LaneATT is an anchor-based single-stage lane detection model (in the spirit of YOLOv3 or SSD); the framework is shown in Figure 1. The input is an RGB image $I \in \mathbb{R}^{3 \times H_I \times W_I}$ captured by a front-facing camera, and the output is the lane boundary lines. The backbone CNN produces a feature map, from which features are then pooled for each anchor. The pooled features are combined with global attention features, which lets the model exploit information from other lanes to cope with occlusion. Finally, the combined features are passed to fully connected layers that predict the final lanes.

https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509135412454.png Figure 1: Overview of the method

The backbone generates a feature map from the input image, and each anchor is projected onto it. This projection is used to pool features, which are concatenated with a second set of features created by the attention module. Finally, using this combined feature set, two layers, one for classification and the other for regression, make the final predictions.

Lane and anchor representation

Lane points are defined as $(X, Y)$:

$Y = \{y_i\}_{i=0}^{N_{pts}-1}$, where $y_i = i \cdot \frac{H_I}{N_{pts}-1}$;

$X = \{x_i\}_{i=0}^{N_{pts}-1}$

Since most lanes do not vertically cross the entire image, a start index $s$ and an end index $e$ define the valid contiguous subsequence of $X$.

Anchor definition: lines are used instead of boxes, and each predicted lane has an anchor (a line) as its reference. An anchor is a "virtual" line in the image plane, defined by:

Origin point: $O = (x_{orig}, y_{orig})$, with $y_{orig} \in Y$, located on one of the image borders (except the top one);

Direction: $\theta$. The anchor set follows that of Line-CNN.
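
For concreteness, here is a minimal NumPy sketch of this representation (toy values; all variable names are mine, not from the paper):

```python
import numpy as np

N_PTS, H_IMG = 72, 360
# Fixed, equally spaced y-coordinates: y_i = i * H_I / (N_pts - 1)
ys = np.arange(N_PTS) * H_IMG / (N_PTS - 1)

# A lane is its x-coordinates plus a valid contiguous range [s, e]
lane_x = np.zeros(N_PTS)
s, e = 10, 40

# An anchor is a "virtual" line: an origin on a border (not the top) and an angle
anchor = {"x_orig": 0.0, "y_orig": 359.0, "theta": np.deg2rad(45)}
```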

Backbone

Feature extraction comes first; any CNN can serve as the backbone.

The backbone outputs a feature map $F_{back} \in \mathbb{R}^{C'_F \times H_F \times W_F}$.

A $1 \times 1$ convolution is then applied to $F_{back}$ for dimensionality reduction, yielding $F \in \mathbb{R}^{C_F \times H_F \times W_F}$.
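
A minimal PyTorch sketch of this stage, using ResNet-34 as an example backbone (the truncation and the channel sizes here are illustrative assumptions):

```python
import torch
import torch.nn as nn
from torchvision import models

# Any CNN works; keep ResNet-34 up to its last conv stage (drop avgpool/fc)
resnet = models.resnet34(weights=None)
backbone = nn.Sequential(*list(resnet.children())[:-2])  # -> (N, 512, H/32, W/32)

reduce_dim = nn.Conv2d(512, 64, kernel_size=1)           # 1x1 conv: C'_F -> C_F

x = torch.randn(1, 3, 360, 640)                          # I in R^{3 x H_I x W_I}
f_back = backbone(x)                                     # F_back
f = reduce_dim(f_back)                                   # F in R^{C_F x H_F x W_F}
```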

Anchor-based feature pooling

An anchor determines which points of $F$ are used. Since anchors are modeled as lines, the points of interest for a given anchor are those its virtual line intercepts (with the line rasterized at the feature map's resolution). For every $y_j = 0, 1, \dots, H_F - 1$ there is a corresponding

$$x_j = \left\lfloor \frac{1}{\tan\theta}\left(y_j - y_{orig}/\delta_{back}\right) + x_{orig}/\delta_{back} \right\rfloor$$

where $(x_{orig}, y_{orig})$ and $\theta$ are the origin and slope of the anchor line, and $\delta_{back}$ is the backbone's global stride. Each anchor $i$ gets a corresponding feature vector $a^{loc}_i \in \mathbb{R}^{C_F \cdot H_F}$ (column-vector notation) pooled from $F$, carrying local feature information. Where part of an anchor falls outside the boundaries of $F$, $a^{loc}_i$ is zero-padded.
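
A NumPy sketch of the coordinate computation above; `anchor_pool_coords` is a hypothetical helper, not the paper's code:

```python
import numpy as np

def anchor_pool_coords(x_orig, y_orig, theta, h_feat, stride):
    """x_j = floor((y_j - y_orig/stride) / tan(theta) + x_orig/stride)
    for each feature-map row y_j = 0 .. H_F - 1."""
    y = np.arange(h_feat)
    x = np.floor((y - y_orig / stride) / np.tan(theta) + x_orig / stride)
    return x.astype(int)  # rows whose x falls outside F get zero-padded features

# one column index per feature-map row, for a 45-degree anchor
cols = anchor_pool_coords(x_orig=0, y_orig=360, theta=np.pi / 4,
                          h_feat=12, stride=32)
```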

Attention mechanism

The attention mechanism acts on the local features $a^{loc}_\cdot$ to produce auxiliary features $a^{glob}_\cdot$ that aggregate global information. It consists of a fully connected layer $L_{att}$ that processes the local feature vector $a^{loc}_i$ and outputs a probability (weight) $w_{i,j}$ for every other anchor $j$, $j \neq i$:

https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509163911177.png (Eq. 2: the attention weights $w_{i,j}$)

These weights are then combined with the local features to produce a global feature vector of the same dimension:

$$a_i^{glob} = \sum_{j} w_{i,j} \, a_j^{loc}$$

Since the same procedure runs for every anchor, the whole process can be implemented efficiently as a matrix multiplication. Let $N_{anc}$ be the number of anchors, $A^{loc} = [a^{loc}_0, \dots, a^{loc}_{N_{anc}-1}]^T$ the matrix of local feature vectors, and $W = [w_{i,j}]_{N_{anc} \times N_{anc}}$ the weight matrix defined by Eq. (2). The global features can then be computed as

$$A^{glob} = W A^{loc}$$

where $A^{glob}$ has the same dimensions as $A^{loc}$: $A^{glob} \in \mathbb{R}^{N_{anc} \times C_F \cdot H_F}$.
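
A PyTorch sketch of the whole attention step; I assume the weights come from a softmax over $L_{att}$'s outputs with $j = i$ masked out, and the layer names and sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_anc, feat_dim = 1000, 64 * 12        # toy N_anc and C_F * H_F
a_loc = torch.randn(n_anc, feat_dim)   # A^loc: one pooled vector per anchor

l_att = nn.Linear(feat_dim, n_anc)     # L_att scores every anchor from a_i^loc
scores = l_att(a_loc)                  # (N_anc, N_anc)
mask = torch.eye(n_anc, dtype=torch.bool)
scores = scores.masked_fill(mask, float("-inf"))  # exclude j == i
w = F.softmax(scores, dim=1)           # weights w_{i,j}

a_glob = w @ a_loc                     # A^glob = W A^loc
assert a_glob.shape == a_loc.shape     # same N_anc x (C_F * H_F) shape
```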

Proposal prediction

Each anchor's lane proposal consists of three main components:

  • $K+1$ probabilities ($K$ lane types and one class for "background" or invalid proposals);
  • $N_{pts}$ offsets (the horizontal distance between the prediction and the anchor's line);
  • the proposal length $l$ (the number of valid offsets). The start index $s$ is directly determined by the y-coordinate of the anchor's origin, and the end index is $e = s + [l] - 1$.

The local and global features $a^{loc}_i$ and $a^{glob}_i$ are concatenated into an augmented feature vector $a^{aug}_i \in \mathbb{R}^{2 \cdot C_F \cdot H_F}$, which is fed to two parallel fully connected layers: a classification layer ($L_{cls}$) and a regression layer ($L_{reg}$). $L_{cls}$ predicts $p_i = \{p_0, \dots, p_K\}$, while $L_{reg}$ predicts $r_i = (l, \{x_0, \dots, x_{N_{pts}-1}\})$.
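
A sketch of the two parallel heads over the augmented vector (a hypothetical module; sizes follow the toy values used above):

```python
import torch
import torch.nn as nn

class ProposalHead(nn.Module):
    """L_cls and L_reg applied to the concatenated local + global features."""
    def __init__(self, feat_dim, k_lanes, n_pts):
        super().__init__()
        self.cls = nn.Linear(2 * feat_dim, k_lanes + 1)  # K lane types + background
        self.reg = nn.Linear(2 * feat_dim, n_pts + 1)    # length l + N_pts offsets

    def forward(self, a_loc, a_glob):
        a_aug = torch.cat([a_loc, a_glob], dim=1)        # (N_anc, 2 * C_F * H_F)
        return self.cls(a_aug), self.reg(a_aug)

head = ProposalHead(feat_dim=64 * 12, k_lanes=1, n_pts=72)
p, r = head(torch.randn(1000, 768), torch.randn(1000, 768))
```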

Non-maximum Suppression (NMS)

The distance between two lanes $X_a = \{x_i^a\}_{i=1}^{N_{pts}}$ and $X_b = \{x_i^b\}_{i=1}^{N_{pts}}$ is computed over their common valid indices (or $y$-coordinates). Let $s' = \max(s_a, s_b)$ and $e' = \min(e_a, e_b)$ define that range. The lane distance metric is then defined as

$$D(X_a, X_b) = \begin{cases} \dfrac{1}{e' - s' + 1} \displaystyle\sum_{i=s'}^{e'} \left| x_i^a - x_i^b \right|, & e' \geq s' \\ +\infty, & \text{otherwise} \end{cases}$$
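
This metric translates directly to code; a small sketch (the function name is mine):

```python
import numpy as np

def lane_distance(x_a, s_a, e_a, x_b, s_b, e_b):
    """Mean |x_a - x_b| over the shared valid range, +inf when disjoint."""
    s, e = max(s_a, s_b), min(e_a, e_b)   # s' and e'
    if e < s:                             # no common valid y-coordinates
        return float("inf")
    return float(np.mean(np.abs(x_a[s:e + 1] - x_b[s:e + 1])))
```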
Model training

https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509173054232.png LaneATT qualitative results on TuSimple (top row), CULane (middle row), and LLAMAS (bottom row). Blue lines are ground-truth, while green and red lines are true-positives and false-positives, respectively.

During training, the distance metric $D(X_a, X_b)$ is used to define positive and negative anchors. First, the distance between each anchor (those not filtered out by NMS) and each ground-truth lane is measured. Anchors with a distance below a threshold $\tau_p$ are considered positives, while those with a distance above $\tau_n$ are considered negatives. Anchors whose distances fall between the two thresholds (along with their associated proposals) are ignored. The remaining $N_{p\&n}$ anchors are used in the multi-task loss defined as

$$\mathcal{L}\left(\{p_i, r_i\}_{i=0}^{N_{p\&n}-1}\right) = \lambda \sum_i \mathcal{L}_{cls}(p_i, p_i^*) + \sum_i \mathcal{L}_{reg}(r_i, r_i^*)$$
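
A hedged sketch of this multi-task loss, assuming cross-entropy for $\mathcal{L}_{cls}$, Smooth L1 for $\mathcal{L}_{reg}$, and an arbitrary $\lambda$ (the paper's exact loss choices and weighting may differ):

```python
import torch
import torch.nn.functional as F

def multitask_loss(p, r, p_star, r_star, lam=10.0):
    """Sum of classification and regression losses over the N_{p&n} anchors.

    p: (N, K+1) class logits;          p_star: (N,) target labels
    r: (N, N_pts+1) length + offsets;  r_star: matching regression targets
    lam: weight on the classification term (assumed value)
    """
    l_cls = F.cross_entropy(p, p_star, reduction="sum")
    l_reg = F.smooth_l1_loss(r, r_star, reduction="sum")
    return lam * l_cls + l_reg
```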

Experimental results and analysis

  • Datasets: TuSimple, CULane, LLAMAS
https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509172348751.png Overview of the datasets used in this work
  • Input size: $H_I \times W_I = 360 \times 640$

  • Epochs: 15 for CULane; 100 for TuSimple

  • Hardware: Intel i9-9900KS CPU, RTX 2080 Ti GPU

  • $N_{pts} = 72$, $N_{anc} = 1000$, $\tau_p = 15$, $\tau_n = 20$, $K = 1$

Results

  • TuSimple
https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509173204379.png Model latency vs. F1 of state-of-the-art methods on CULane and TuSimple

https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509173255841.png State-of-the-art results on TuSimple. For a fairer comparison, the FPS of the fastest method ([20]) was measured on the same machine and conditions as our method. Additionally, all metrics for this method were computed using the official source code, since only the accuracy was available in the paper. The best and second-best results across methods with source code available are in bold and underlined, respectively.
  • CULane
https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509173417745.png State-of-the-art results on CULane. Since the images in the "Cross" category have no lanes, the reported number is the amount of false-positives. For a fairer comparison, we measured the FPS of the fastest method ([20]) under the same machine and conditions as ours. The best and second-best results across methods with source code available are in bold and underlined, respectively.
  • LLAMAS
https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509173529336.png State-of-the-art results on LLAMAS
  • Efficiency trade-offs
https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509173558198.png Efficiency trade-offs on CULane using the ResNet-34 backbone. "TT" stands for training time in hours.
  • Ablation study
https://gitee.com/xiaomoon/image/raw/master/Img/image-20210509173726868.png Ablation study results on CULane
