Research on Small Target Detection Algorithm of DETR Network Based on Improved SWIN Transformer
Abstract
A small target is an object whose bounding box is very small. Two definitions are in common use: (1) relative definition: the ratio of the bounding box's width and height to the width and height of the original image is less than 10%, or the ratio of the bounding-box area to the total image area is less than 3%; (2) absolute definition: the bounding box is smaller than 32×32 pixels. Small target detection has important applications in remote sensing imagery, medical imaging, industrial quality inspection, and autonomous driving. Although large-target detection has achieved remarkable results, small-target detection still faces challenges such as low image resolution, small object size, strong background interference, and insufficient samples, which lead to low detection accuracy and slow detection speed. In particular, the detection accuracy of the Swin Transformer-based DETR small target detection algorithm on the Tiny Person and Wider Face datasets still needs improvement. To address this low detection rate, this paper proposes an improved Swin Transformer-based DETR small target detection model that adopts the following optimization strategies. First, BiFPN (Bidirectional Feature Pyramid Network) is introduced to enrich multi-scale features, so that small-target features flow more fully between shallow and deep layers, improving detection accuracy. Second, the window size is adjusted dynamically and the attention weight of small-target regions is increased, compensating for the feature loss caused by the local self-attention mechanism. Third, DETR's Hungarian matching algorithm is improved by introducing the SimOTA (simplified Optimal Transport Assignment) strategy, which matches predicted boxes to small targets more efficiently, improving the recall and detection accuracy for small targets and reducing detection time.
Finally, multi-scale training and data augmentation strategies (random cropping, scaling, random resampling of small targets, etc.) are used to increase the number of small-target samples, and progressive learning-rate decay and data perturbation are used to strengthen the model's perceptual ability. On the Tiny Person and Wider Face datasets, the improved model reaches a small-target detection mAP of 50.8% and 70.3%, respectively. The results show that the improved Swin Transformer-based DETR algorithm offers excellent generalization ability, detection accuracy, and robustness across different datasets and scenarios.
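As an illustration of the relative and absolute definitions of a small target given at the start of the abstract, the following minimal Python sketch checks each criterion for a single bounding box. The `Box` type, the function names, and the exact reading of the definitions (both side ratios below 10% for the side criterion, and box area below 32×32 pixels for the absolute criterion) are assumptions made for this example; they are not taken from the paper's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding-box size in pixels (hypothetical helper type)."""
    width: float
    height: float


def small_by_side_ratio(box: Box, img_w: float, img_h: float, limit: float = 0.10) -> bool:
    # Assumed reading: box width / image width AND box height / image height
    # must both be below the 10% limit.
    return box.width / img_w < limit and box.height / img_h < limit


def small_by_area_ratio(box: Box, img_w: float, img_h: float, limit: float = 0.03) -> bool:
    # Box area divided by image area must be below the 3% limit.
    return (box.width * box.height) / (img_w * img_h) < limit


def small_by_absolute_size(box: Box, side: int = 32) -> bool:
    # Assumed reading: box area below side * side (32 x 32 = 1024) square pixels.
    return box.width * box.height < side * side


if __name__ == "__main__":
    tiny = Box(width=20, height=40)
    large = Box(width=100, height=100)
    for name, box in (("tiny", tiny), ("large", large)):
        print(
            name,
            small_by_side_ratio(box, 640, 480),
            small_by_area_ratio(box, 640, 480),
            small_by_absolute_size(box),
        )
```

Under these assumptions, a 20×40 box in a 640×480 image satisfies all three checks, while a 100×100 box in the same image satisfies none of them (its area ratio is about 3.3%, just above the 3% limit).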