Scene text detection based on multi-layer fusion and dual-modal perception
First published: 2023-02-21
Abstract: Convolutional neural networks are now widely used in scene text detection and have substantially improved detection performance. However, the scattered distribution of text and the large variation in text scale still pose challenges, and in complex scenes pixels in cluttered background regions are easily misclassified as text. To handle the large scale variation between text instances, this paper proposes a multi-layer weight fusion module: the deepest feature map of the backbone is used to generate feature maps and weights with different receptive fields, so that the fused output covers a richer range of receptive fields and captures text of various scales. In addition, a dual-modal perception module is proposed to perceive text-region information from both local and global views of the feature map, which effectively alleviates the problem of scattered text distribution. To address background misjudgment, a foreground-background enhancement branch is introduced, which strengthens supervision on text regions while suppressing noisy false responses in background regions. Experimental results on three public datasets, ICDAR2015, Total-Text, and MSRA-TD500, verify the effectiveness of the proposed method.
Keywords: deep learning; scene text detection; convolutional neural network; complex scenes; receptive field; background misjudgment
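The abstract describes the multi-layer weight fusion module only at a high level: the deepest backbone feature is expanded into several feature maps with different receptive fields, each paired with a learned weight, and the weighted combination forms the fused output. The following is a minimal sketch of that idea; the branch count, the use of dilated convolutions, and the softmax spatial weighting are illustrative assumptions, not the paper's exact design.

```python
# Sketch of a multi-layer weight fusion block (assumed design, for illustration).
import torch
import torch.nn as nn


class MultiLayerWeightFusion(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        # Parallel 3x3 convolutions with increasing dilation rates give
        # branches with increasingly large receptive fields.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        # Predict one spatial weight map per branch from the input feature,
        # normalized across branches with softmax.
        self.weight_head = nn.Conv2d(channels, len(dilations), kernel_size=1)

    def forward(self, c5: torch.Tensor) -> torch.Tensor:
        feats = [branch(c5) for branch in self.branches]       # N x [B, C, H, W]
        weights = torch.softmax(self.weight_head(c5), dim=1)   # [B, N, H, W]
        fused = sum(w.unsqueeze(1) * f                          # weighted sum of branches
                    for w, f in zip(weights.unbind(dim=1), feats))
        return fused


if __name__ == "__main__":
    x = torch.randn(1, 256, 20, 20)                 # deepest backbone feature (e.g. C5)
    print(MultiLayerWeightFusion(256)(x).shape)     # torch.Size([1, 256, 20, 20])
```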
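Similarly, the dual-modal perception module is described only as perceiving text-region information from local and global views of the feature map. A minimal sketch of one plausible realization is given below, assuming a per-pixel spatial gate for the local view and a pooled channel gate for the global view; the paper's concrete attention forms may differ.

```python
# Sketch of a dual-modal (local + global) perception block (assumed design).
import torch
import torch.nn as nn


class DualModalPerception(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Local view: a 3x3 convolution produces a per-pixel spatial gate.
        self.local_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        # Global view: global average pooling plus a small bottleneck
        # produces a per-channel gate from whole-map statistics.
        self.global_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local = x * self.local_gate(x)      # emphasise locally text-like pixels
        global_ = x * self.global_gate(x)   # emphasise globally informative channels
        return local + global_              # fuse the two perceptions


if __name__ == "__main__":
    x = torch.randn(1, 256, 40, 40)
    print(DualModalPerception(256)(x).shape)    # torch.Size([1, 256, 40, 40])
```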