ROD-YOLO: improved YOLOv8 semantic segmentation of obstacles in complex road scenes based on Swin Transformer

Baoxiang Jiang; Jingbo Xia; Tairui Meng; Yusong Hu; Kai Zhang; Daoqin Lei

doi:10.1117/12.3034934

11 July 2024

Proceedings Volume 13210, Third International Symposium on Computer Applications and Information Systems (ISCAIS 2024); 1321022 (2024) https://doi.org/10.1117/12.3034934
Event: Third International Symposium on Computer Applications and Information Systems (ISCAIS 2024), 2024, Wuhan, China

Abstract

Deep learning is widely applied in autonomous driving. In complex road environments, however, diverse obstacles such as irregularly shaped objects, children's toys, animals, and other unconventional entities pose significant challenges. Convolutional Neural Network (CNN)-based road detectors struggle to meet real-time demands because they must cope with multi-scale objects and intricate backgrounds. To address road obstacle detection for autonomous driving, we propose ROD-YOLO (Road Obstacle Detection), a YOLOv8-based method with better multi-scale adaptability that segments obstacles on the road. Compared with the original network, ROD-YOLO adds a detection head and integrates a Transformer with the GAM attention mechanism into the C2f module. To help the model adapt to multi-scale obstacles, we add a new small-scale segmentation head and a dedicated feature fusion part. Specifically, we propose a new GlobalCSP C2FGAM module together with a C2STR module that incorporates the Transformer idea, to obtain faster segmentation and better accuracy across different obstacles. The algorithm performs well in real-time object segmentation tasks while maintaining high accuracy. It improves mAP by 1.9% over the original YOLOv8 network and significantly improves the segmentation of small objects. These results are significant for improving the safety and efficiency of self-driving vehicles.
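The abstract reports its headline result as a gain in mAP (mean average precision) but does not say how the metric is computed. As a rough illustration only, the sketch below computes all-point interpolated average precision per class and averages it across classes. The function names, the input layout, and the choice of interpolation are assumptions made for this example. The abstract does not state the IoU threshold, whether the metric is over boxes or masks, or the evaluation protocol the authors used.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: one (confidence, is_true_positive) pair per prediction.
# Matching predictions to ground truth (e.g. by mask IoU) is assumed to be done already.
ScoredHits = List[Tuple[float, bool]]


def average_precision(scored_hits: ScoredHits, num_ground_truth: int) -> float:
    """All-point interpolated AP for a single class (illustrative sketch)."""
    if num_ground_truth == 0:
        return 0.0
    # Rank predictions from most to least confident.
    ranked = sorted(scored_hits, key=lambda pair: pair[0], reverse=True)

    precisions: List[float] = []
    recalls: List[float] = []
    true_positives = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        true_positives += int(is_tp)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)

    # Precision envelope: each precision becomes the maximum precision
    # achieved at that recall level or any higher one.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap = 0.0
    previous_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[ScoredHits, int]]) -> float:
    """Mean of per-class AP values; per_class maps class name to (hits, num_ground_truth)."""
    aps = [average_precision(hits, n_gt) for hits, n_gt in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Toy data, not taken from the paper.
    toy = {
        "animal": ([(0.9, True), (0.8, False), (0.7, True)], 2),
        "toy": ([(0.9, True), (0.8, True)], 2),
    }
    print(round(mean_average_precision(toy), 4))
```

Under this definition, a model that ranks its correct detections above its false positives scores higher even when the total number of correct detections is unchanged, which is why mAP is sensitive to confidence calibration on small objects.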

(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.

Citation

Baoxiang Jiang, Jingbo Xia, Tairui Meng, Yusong Hu, Kai Zhang, and Daoqin Lei "ROD-YOLO: improved YOLOv8 semantic segmentation of obstacles in complex road scenes based on Swin Transformer", Proc. SPIE 13210, Third International Symposium on Computer Applications and Information Systems (ISCAIS 2024), 1321022 (11 July 2024); https://doi.org/10.1117/12.3034934

PROCEEDINGS
6 PAGES


KEYWORDS

Object detection

Roads

Transformers

Detection and tracking algorithms

Data modeling

Semantics

Target detection
