4 September 2024
End-to-end multitasking network for smart container product positioning and segmentation
Wenzhong Shen, Xuejian Cai
Abstract

Current commodity identification systems for smart coolers first locate the item being purchased and then perform feature extraction and matching. However, because the detection box also contains background, this approach is prone to missed detections and misidentifications. To address these issues, we propose an end-to-end You Only Look Once (YOLO) algorithm for joint detection and segmentation. In the backbone network, we combine deformable convolution with a channel-to-pixel (C2f) module to enhance the model’s feature extraction capability. In the neck network, we introduce an optimized feature fusion structure based on the weighted bi-directional feature pyramid. To further enhance the model’s understanding of both global and local context, a triple feature encoding module fuses multi-scale features. The convolutional block attention module is connected to the improved C2f module to strengthen the network’s attention to the channel and spatial information of commodity images. A supplementary segmentation branch in the head of the network allows it not only to detect targets within the image but also to generate a segmentation mask for each detected object, enabling multi-task operation. Compared with YOLOv8, for box and mask, respectively, precision increases by 3% and 4.7%, recall by 2.8% and 4.7%, and mean average precision (mAP) by 4.9% and 14%. The model runs at 119 frames per second, which meets the demand for real-time detection. Comparative and ablation studies confirm the accuracy and performance of the proposed algorithm, providing a foundation for fine-grained commodity identification.
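As an illustration of the weighted feature fusion mentioned in the abstract, the following is a minimal sketch of the fast normalized fusion rule used in the standard weighted bi-directional feature pyramid (BiFPN). It is an assumption that the paper's optimized neck builds on this rule; the actual structure may differ. The function name `fast_normalized_fusion`, its parameters, and the use of flat Python lists in place of feature maps are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-length feature vectors with non-negative, normalized weights.

    Illustrative only: real networks apply this per element to resized
    feature maps, and the weights are learned during training.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per input feature")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all input features must have the same length")

    # ReLU keeps every weight non-negative, so the normalization is stable.
    w = [max(0.0, wi) for wi in weights]
    # eps prevents division by zero when all weights are clipped to zero.
    norm = eps + sum(w)

    # Weighted sum of the inputs, divided by the sum of the weights.
    return [
        sum(wi * f[k] for wi, f in zip(w, features)) / norm
        for k in range(length)
    ]


if __name__ == "__main__":
    shallow = [1.0, 2.0]   # stand-in for a high-resolution feature
    deep = [3.0, 4.0]      # stand-in for an upsampled low-resolution feature
    print(fast_normalized_fusion([shallow, deep], [1.0, 1.0]))
```

With equal weights, the result is close to the element-wise mean of the inputs; a weight that is negative is clipped to zero, so the corresponding input stops contributing to the fused feature.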

© 2024 SPIE and IS&T
Wenzhong Shen and Xuejian Cai "End-to-end multitasking network for smart container product positioning and segmentation," Journal of Electronic Imaging 33(5), 053009 (4 September 2024). https://doi.org/10.1117/1.JEI.33.5.053009
Received: 14 May 2024; Accepted: 21 August 2024; Published: 4 September 2024
KEYWORDS
Image segmentation

Feature fusion

Neck

Convolution

Head

Target detection

Education and training
