End-to-end multitasking network for smart container product positioning and segmentation

Wenzhong Shen; Xuejian Cai

doi:10.1117/1.JEI.33.5.053009



Journal of Electronic Imaging, Vol. 33, Issue 5, 053009 (September 2024). https://doi.org/10.1117/1.JEI.33.5.053009

Abstract

Commodity identification systems in current smart coolers first locate the item being purchased and then perform feature extraction and matching. However, this approach is often inaccurate because the detection box includes background, which leads to missed detections and misidentifications. To address these issues, we propose an end-to-end detection and segmentation algorithm based on You Only Look Once (YOLO). In the backbone network, we combine deformable convolution with a channel-to-pixel (C2f) module to enhance the model’s feature extraction capability. In the neck network, we introduce an optimized feature fusion structure based on the weighted bi-directional feature pyramid. To further enhance the model’s understanding of both global and local context, a triple feature encoding module fuses multi-scale features. The convolutional block attention module is connected to the improved C2f module to strengthen the network’s attention to the channel and spatial information of commodity images. A supplementary segmentation branch is incorporated into the head of the network, so that it not only detects targets within the image but also generates a segmentation mask for each detected object, which gives the network its multi-task capability. Compared with YOLOv8, for box and mask, respectively, precision increases by 3% and 4.7%, recall increases by 2.8% and 4.7%, and mean average precision (mAP) increases by 4.9% and 14%. The model runs at 119 frames per second, which meets the demand for real-time detection. Comparative and ablation studies confirm the accuracy and performance of the proposed algorithm, which provides a foundation for fine-grained commodity identification.
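The abstract does not spell out the optimized fusion structure used in the neck, so the following is only a minimal sketch of the weighted fusion rule commonly associated with the weighted bi-directional feature pyramid (fast normalized fusion), not the authors' implementation. The function name `fast_normalized_fusion`, the `eps` value, and the use of flat Python lists in place of feature maps are assumptions made for illustration.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-length feature vectors with non-negative, normalized weights.

    Hypothetical helper: `features` stands in for feature maps that have
    already been resized to a common resolution and flattened to lists.
    """
    if len(features) != len(weights):
        raise ValueError("one weight is required per input feature")
    if any(len(f) != len(features[0]) for f in features):
        raise ValueError("all features must have the same length")

    # ReLU keeps each (normally learnable) weight non-negative.
    clipped = [max(0.0, w) for w in weights]
    # eps avoids division by zero when every weight is clipped to zero.
    total = sum(clipped) + eps

    fused = [0.0] * len(features[0])
    for feat, w in zip(features, clipped):
        for i, value in enumerate(feat):
            fused[i] += (w / total) * value
    return fused


# Two toy "feature maps" at the same resolution, with equal weights.
p_high = [1.0, 2.0, 3.0]
p_low = [3.0, 2.0, 1.0]
fused = fast_normalized_fusion([p_high, p_low], [1.0, 1.0])
```

With equal weights the result is close to the element-wise mean of the two inputs, and a negative weight is clipped to zero so that input contributes nothing. In a trained network the weights would be learned parameters rather than fixed constants.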

Citation

Wenzhong Shen and Xuejian Cai "End-to-end multitasking network for smart container product positioning and segmentation," Journal of Electronic Imaging 33(5), 053009 (4 September 2024). https://doi.org/10.1117/1.JEI.33.5.053009

Received: 14 May 2024; Accepted: 21 August 2024; Published: 4 September 2024


JOURNAL ARTICLE
16 PAGES


KEYWORDS

Image segmentation

Feature fusion

Neck

Convolution

Head

Target detection

Education and training
