Papers
arxiv:2402.13643

Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition

Published on Feb 21, 2024
Authors:
,
,
,

Abstract

Scene text recognition is a rapidly developing field that faces numerous challenges due to the complexity and diversity of scene text, including complex backgrounds, diverse fonts, flexible arrangements, and accidental occlusions. In this paper, we propose a novel approach called Class-Aware Mask-guided feature refinement (CAM) to address these challenges. Our approach introduces canonical class-aware glyph masks generated from a standard font to effectively suppress background and text style noise, thereby enhancing feature discrimination. Additionally, we design a feature alignment and fusion module to incorporate the canonical mask guidance for further feature refinement for text recognition. By enhancing the alignment between the canonical mask feature and the text feature, the module ensures more effective fusion, ultimately leading to improved recognition performance. We first evaluate CAM on six standard text recognition benchmarks to demonstrate its effectiveness. Furthermore, CAM exhibits superiority over the state-of-the-art method by an average performance gain of 4.1% across six more challenging datasets, despite utilizing a smaller model size. Our study highlights the importance of incorporating canonical mask guidance and aligned feature refinement techniques for robust scene text recognition. The code is available at https://github.com/MelosY/CAM.

Community

Sign up or log in to comment

Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2402.13643 in a dataset README.md to link it from this page.

Spaces citing this paper 1

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.