Editor Affiliations
Jianguo Liu,1 Zhong Chen,2 Changxin Gao,2 Yang Xiao,2 Sheng Zhong,2 Hanyu Hong,3 Xiaofeng Yue1
1Huazhong Univ. of Science and Technology (China)
2Huazhong University of Science and Technology (China)
3Wuhan Institute of Technology (China)
This PDF file contains the front matter associated with SPIE Proceedings Volume 13085, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Selecting an appropriate matching area is one of the crucial tasks in scene matching systems and serves as the basis for trajectory planning. Currently, there are two main approaches to adaptation zone selection: traditional statistical feature-based methods and deep learning-based methods. The former relies on multi-threshold screening, resulting in a complex and less accurate process. The latter employs model training, improving accuracy but tending to yield less stable results. Therefore, this paper uses the Key.Net network to extract keypoints and their feature descriptors within the scene region as the initial feature set, employs support vector regression as the matching probability prediction model, and applies threshold segmentation to the predicted probabilities to determine the adaptability of the region. Experimental results demonstrate that the proposed adaptation zone selection method offers higher accuracy and stability.
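As an illustration only (not the authors' code), the prediction step described above can be sketched with scikit-learn's SVR: regress a matching probability from per-region feature statistics, then threshold it. The feature vectors, training targets, and the 0.5 cut-off below are all synthetic assumptions.

```python
# Sketch: SVR as a matching-probability predictor, followed by
# threshold segmentation on the predicted probabilities.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_train = rng.random((100, 8))      # stand-ins for Key.Net-derived region features
y_train = X_train.mean(axis=1)      # toy "matching probability" targets in [0, 1]

model = SVR(kernel="rbf", C=1.0)
model.fit(X_train, y_train)

X_test = rng.random((5, 8))
prob = model.predict(X_test)        # predicted matching probability per region

THRESHOLD = 0.5                     # assumed cut-off, not from the paper
adaptable = prob >= THRESHOLD       # boolean adaptability label per region
```

Regions whose predicted probability clears the threshold would be labeled adaptable; everything about the feature design and threshold choice here is hypothetical.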
In long-term target tracking, target occlusion is an important factor that degrades tracker performance, and it is also an unavoidable problem. Corresponding strategies are therefore needed to improve the tracker's robustness in such scenarios. To address target loss in complex-background scenarios such as occlusion in current filtering-based tracking algorithms, this paper studies masking strategies and adaptive response fusion methods to improve the accuracy and robustness of tracking algorithms. The experimental results demonstrate the effectiveness of the individual strategies in resisting occlusion, and the results for the overall algorithm indicate that it can effectively handle occlusion while balancing accuracy, speed, and robustness.
Recently, object segmentation of remote sensing images has achieved great progress in many fields, such as transportation, natural resources, and ecology. Most prior works perform object segmentation in a fully-supervised mode. However, training models in this mode usually requires crafting large-scale annotations, which is expensive and time-consuming. In this paper, a novel semi-supervised network for object segmentation of remote sensing images is proposed, which is fed with only a small amount of labeled data and relatively more unlabeled data. Rather than using the same architecture as previous semi-supervised works, we exploit two networks with different architectures, i.e., a CNN and a Transformer, as the cross-supervised models. Moreover, three types of loss functions, namely a fully-supervised loss, a cross-supervised loss, and a consistency loss, are introduced to enhance the model's robustness. The effectiveness of our proposed method is evaluated on two annotated remote sensing datasets, where it outperforms several state-of-the-art semi-supervised approaches.
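The three losses named above can be sketched as follows. The hard-pseudo-label form of the cross-supervision, the MSE consistency term, and the loss weights are assumptions made for illustration, not the paper's exact definitions.

```python
# Sketch of combining a fully-supervised loss, a cross-supervised loss
# (each branch trains on the other's pseudo-labels), and a consistency
# loss between the CNN and Transformer probability maps.
import numpy as np

def cross_entropy(pred, target, eps=1e-8):
    """Mean cross-entropy for softmax probability maps (last axis = classes)."""
    return -np.mean(np.sum(target * np.log(pred + eps), axis=-1))

def total_loss(p_cnn, p_trans, labels, w_cross=1.0, w_cons=0.5):
    # Fully-supervised loss on labeled samples, for both branches.
    l_sup = cross_entropy(p_cnn, labels) + cross_entropy(p_trans, labels)
    # Cross-supervision: each branch's hard pseudo-label supervises the other.
    pl_cnn = np.eye(p_cnn.shape[-1])[p_cnn.argmax(-1)]
    pl_trans = np.eye(p_trans.shape[-1])[p_trans.argmax(-1)]
    l_cross = cross_entropy(p_cnn, pl_trans) + cross_entropy(p_trans, pl_cnn)
    # Consistency: keep the two branches' probability maps close.
    l_cons = np.mean((p_cnn - p_trans) ** 2)
    return l_sup + w_cross * l_cross + w_cons * l_cons
```

In practice the unlabeled data would only contribute to the cross-supervised and consistency terms; this sketch omits that bookkeeping for brevity.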
The growing number of satellites in orbit has resulted in a rise in defunct satellites and space debris, posing a significant risk to valuable spacecraft such as operational satellites and space stations. Therefore, the removal of defunct satellites and space debris has become increasingly crucial. This article presents a segmentation method for satellite images captured in the visible light spectrum in space. Firstly, due to the lack of real space satellite images, we used optical simulation and Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation (U-GAT-IT) to generate realistic space satellite images in the visible light spectrum and constructed a dataset. Secondly, we proposed an Attention Supervision Transformer Full-Resolution Residual Network (ASTransFRRN), which integrates a transformer, an attention mechanism, and deep supervision, to segment satellite bodies, solar panels, and the cosmic background. Finally, we evaluated the proposed method on the U-GAT-IT simulated dataset and compared its performance with state-of-the-art methods. The proposed method achieved a segmentation accuracy of 90.77%±7.04% for satellite bodies, 90.61%±16.48% for satellite solar panels, and 97.66%±1.94% for the cosmic background. The overall pixel segmentation accuracy was 97.22%±2.78%, outperforming the compared methods. The proposed ASTransFRRN thus demonstrates a significant improvement in the segmentation accuracy of the main components of space satellites.
To address the low detection accuracy and slow detection speed of existing helmet detection models in complex environments, an improved YOLOv8n helmet-wearing detection algorithm is proposed in this paper. First, a CBAM attention mechanism is added to the backbone network to strengthen its feature extraction capability. Then, the SimSPPF module replaces the SPPF module in the backbone network to improve detection speed. Finally, DIoU-NMS is used instead of NMS to enhance the detection of occluded targets. The experimental results show that the improved YOLOv8n algorithm achieves an average detection accuracy of 94.85%, which is 1.41% higher than that of the original YOLOv8n, at a detection speed of 109.11 FPS.
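A minimal sketch of the DIoU-NMS idea mentioned above: suppression uses Distance-IoU, which subtracts a normalized center-distance penalty from plain IoU, so boxes that overlap but are clearly offset (e.g. occluded neighbors) are less likely to be suppressed. The threshold value and box format below are assumptions.

```python
# Sketch of DIoU-NMS: greedy NMS where the suppression criterion is
# Distance-IoU rather than plain IoU.
import numpy as np

def diou(a, b):
    """DIoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter)
    # Squared distance between box centers.
    d2 = ((a[0] + a[2]) - (b[0] + b[2])) ** 2 / 4 + ((a[1] + a[3]) - (b[1] + b[3])) ** 2 / 4
    # Squared diagonal of the smallest enclosing box.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - d2 / c2

def diou_nms(boxes, scores, threshold=0.5):
    """Keep the highest-scoring boxes, suppressing neighbors by DIoU."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order):
        i = order[0]
        keep.append(i)
        order = order[1:][[diou(boxes[i], boxes[j]) <= threshold for j in order[1:]]]
    return keep
```

Because the center-distance penalty lowers the DIoU of offset boxes, two nearby people wearing helmets are less likely to be merged into one detection than with plain IoU-based NMS.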
In environments with insufficient light, RGB images cannot provide clear pedestrian information, whereas multispectral thermal imaging can provide accurate positioning information; multispectral object detection is therefore more reliable and robust in the open world. Most existing fusion methods for multispectral object detection are static: after the network is trained, all modalities of each piece of data are fed into the network for static inference. However, when the light is good enough, or when it is very dark, multispectral input brings unnecessary noise interference and computational redundancy. Therefore, a dynamic fusion network (EDFNet) is proposed to selectively fuse RGB and thermal data, so that the network can efficiently perform multispectral fusion and improve the accuracy of object detection. A gating function included in the fusion network performs modality-level decisions based on multimodal features, enabling dynamic inference on the data. Extensive experiments on multiple datasets demonstrate that the proposed fusion method reduces computation costs while obtaining comparable detection performance.
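The modality-level gating decision can be illustrated with a toy rule-based gate. The real EDFNet gate is learned from multimodal features; the brightness thresholds and the rule below are purely illustrative assumptions.

```python
# Toy gate: decide per sample which modality branches to run.
# Under bright light the thermal branch is skipped; in darkness
# the RGB branch is skipped; otherwise both are fused.
import numpy as np

def gate(rgb, thermal, low=40.0, high=180.0):
    """Return the list of modalities to fuse for this sample.
    This toy version only inspects RGB brightness; `thermal` is kept
    for interface symmetry with a learned gate."""
    brightness = float(np.mean(rgb))
    if brightness > high:            # bright enough: RGB alone suffices
        return ["rgb"]
    if brightness < low:             # too dark: rely on thermal alone
        return ["thermal"]
    return ["rgb", "thermal"]        # ambiguous lighting: fuse both
```

Skipping a branch at inference time is where the computation savings come from; a learned gate would make this decision from features rather than a fixed brightness rule.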
A visual integrated position estimation method using simple Optical Flow (OF) is presented for precise navigation of a visual robot. The position estimation consists of a relative and an absolute position estimation. In the relative position estimation, an OF method inspired by honeybee vision is improved to obtain the robot's relative position; it measures the robot's translation precisely regardless of the robot's rotation, although its position errors accumulate unavoidably. In the absolute position estimation, an OF-based Kalman Filter (KF) iteratively corrects the robot's position errors according to the actual and the predicted OF, so the errors accumulated in the relative position estimation are eliminated. Because the OF method and the OF-based KF adopt the same method to measure the OF information, the relative and absolute position estimations can share the input signal and some processing stages, saving time and cost. Experiments with a visual mobile robot in a workshop demonstrate the proposal's efficiency.
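A minimal one-dimensional sketch of the OF-based KF correction described above: the filter's innovation is the gap between the actual and the predicted optical flow, and the update pulls the accumulated position estimate back while shrinking its variance. All noise parameters here are assumed for illustration.

```python
# One scalar Kalman update where the measurement residual is the
# difference between measured and predicted optical flow.
def kf_correct(x, P, flow_meas, flow_pred, H=1.0, R=0.04):
    """Correct state x (accumulated position) with variance P.
    H maps state error to flow error; R is measurement noise (assumed)."""
    innovation = flow_meas - flow_pred     # actual OF minus predicted OF
    S = H * P * H + R                      # innovation covariance
    K = P * H / S                          # Kalman gain
    x_new = x + K * innovation             # corrected position
    P_new = (1.0 - K * H) * P              # reduced uncertainty
    return x_new, P_new

x, P = 10.0, 1.0                           # accumulated position and its variance
x, P = kf_correct(x, P, flow_meas=0.8, flow_pred=0.5)
```

Run iteratively per frame, this keeps the accumulated drift of the relative (dead-reckoned) estimate bounded, which is the role the abstract assigns to the OF-based KF.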
In a military scenario, targets release decoy projectiles for cover and then escape, evading the attack envelope of the guidance system. Therefore, recognizing target behaviors such as releasing decoys, escaping, and maneuvering, and localizing the spatial position and behavior of the target, can provide richer information for precise targeting by the guidance system. Based on the behaviors of both the target and the decoy, a behavior-aware anti-interference algorithm is proposed. The core idea of this method is to link event recognition with the spatio-temporal action localization task. It extracts spatio-temporal features of consecutive frames through a 3D backbone network and utilizes a 2D backbone network with a feature pyramid to extract spatial features at different levels of key frames for behavior detection at different scales. Additionally, a context extraction network is introduced on the key frames to achieve global context awareness, improving the accuracy of event recognition. Taking into account the correlation between spatial, temporal, and contextual information, this paper introduces channel attention mechanisms to fuse the various feature components. Simultaneously, a non-local attention module is incorporated into the 3D backbone network to enhance the understanding of global content and the connections between different frames. Experimental results demonstrate that the proposed algorithm outperforms other algorithms, achieving a frame mean Average Precision (mAP) of 92.69% on our anti-interference dataset and enabling better resistance to interference.
To address the over-segmentation, excessive noise, and suboptimal segmentation results commonly encountered in existing image segmentation algorithms based on Markov Random Fields (MRF), this paper proposes an enhanced image segmentation algorithm that integrates MRF with the Sobel operator. The algorithm begins by performing an initial segmentation of the image using an MRF-based method. Subsequently, an enhanced Sobel operator is employed to eliminate noise points and extract fine edge details from the image. Finally, the segmentation result is refined through pixel-wise operations with the edge detection result, producing the final segmentation output. Segmentation performance is evaluated using the Dice coefficient and Mean Hausdorff Distance as assessment metrics. Experimental analysis shows that the proposed method improves the segmentation quality of the traditional MRF algorithm and offers better performance and higher adaptivity.
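The refinement step can be sketched as follows. The 3x3 Sobel kernels are standard, but the edge threshold and the specific pixel-wise combination rule below are assumptions, not the paper's exact operations.

```python
# Sketch: Sobel edge map combined pixel-wise with an MRF segmentation mask.
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_edges(img, thresh=1.0):
    """Gradient magnitude via 3x3 Sobel kernels, thresholded to an edge mask."""
    h, w = img.shape
    mag = np.zeros_like(img, dtype=float)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            gx = np.sum(patch * SOBEL_X)
            gy = np.sum(patch * SOBEL_Y)
            mag[y, x] = np.hypot(gx, gy)
    return mag > thresh

def refine(mrf_mask, img):
    # Assumed combination rule: carve detected edge pixels out of the
    # MRF foreground so region boundaries follow the fine edges.
    return mrf_mask & ~sobel_edges(img)

img = np.zeros((5, 5))
img[:, 3:] = 10.0                 # a vertical step edge between columns 2 and 3
edge_map = sobel_edges(img)
```

A production version would use a vectorized convolution rather than the explicit loops; they are kept here to make the kernel application visible.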
The advancement of computer computing power has led to rapid development in the field of artificial intelligence, and intelligent information technology has garnered attention and promotion in the manufacturing industry. However, existing research gives little consideration to the problems of the manufacturing field, mainly because datasets for rare scenarios are difficult to acquire. In steel plate sorting, corner point detection plays a crucial role in production efficiency, particularly for the problem of steel plate adhesion caused by laser cutting. Considering the scarcity of seam-cut steel plate image data and the powerful generalization ability of the cross-modal model GLIP, this study adopts a cross-domain application approach for cross-modal large models. First, we established a steel plate dataset with corner point information, and the GLIP model was fine-tuned using weakly supervised learning. Then, the inference results of the large teacher model are used as inputs to the lightweight student model YOLOv8, forming a framework for lightweight deployment in industry. In our experiments, we first compared the effects of different amounts of data on the GLIP model and demonstrated that the 20-shot model performs comparably to the full-shot model. In addition, YOLOv8 can recognize corner points that were not manually annotated or labeled by the GLIP model, demonstrating excellent generalization performance. Comparative verification showed the advantages of GLIP in terms of time consumption, manually labeled data volume, and deployment scale. This study fully utilizes a sparsely labeled dataset and cross-modal large models, integrating them with a lightweight object detection model to reduce labeling costs and improve production efficiency. Finally, we propose directions for future work.
Infrared small target detection (IRSTD) is a technique developed to detect small, faint targets in cluttered backgrounds, with extensive use in medical diagnosis and aircraft navigation. However, due to limited feature information and a low signal-to-noise ratio, accurately detecting infrared targets while avoiding false alarms is challenging. Our study presents a Coarse-to-Fine Feature Extraction Network (CFFENet) for accurately classifying targets and backgrounds in blurred images. This network consists of a Two-stream Region Proposal Network (2s-RPN) and a Region Refinement Attention Network (RRAN), specifically designed to tackle the problem of false alarms. Specifically, 2s-RPN adds edge features to the RPN so that the network can extract discriminative information between targets and false alarms and provide reliable coarse candidate boxes. In RRAN, an attention-based encoder first extracts attention-aware features in the candidate boxes, and a predicted mask map is output through a multi-layer perceptron (MLP). Subsequently, the refined feature map is further processed by the attention encoder, and an MLP is utilized to generate the final detection results. The effectiveness of our approach is validated through comparative experiments, which show its superiority over representative state-of-the-art IRSTD methods.
Designing deep neural networks for point cloud processing is challenging due to the irregularity and lack of order of point clouds. Meanwhile, the self-attention network at the core of the Transformer has had a very significant effect on natural language processing and has also made significant contributions to image analysis tasks such as image classification and object detection. The Transformer shows great potential in image processing and also has inherent permutation invariance when processing a series of points. However, it falls short in extracting effective local features from input points, which plays an important part in deep learning. To address this disregard for local information, existing methods extract local features from the point cloud by first focusing solely on the characteristics of points within a local neighborhood and then gradually adding various relationships between points. Although this enhances segmentation to varying extents, it also incurs a considerable number of computations and neglects some local contextual information. Given this, we propose to enhance the local feature input by incorporating the local geometric information of points, obtaining the relevant local features directly through mathematical calculation rather than complex spatial transformation. The proposed local geometric feature enhancement network, along with other popular approaches, was evaluated on the large public database S3DIS. The experimental results demonstrate that the proposed model achieves 77.4% mAcc and 70.0% mIoU, outperforming competing approaches by a significant margin.
The automatic inspection of photovoltaic panels based on infrared images is one of the important tasks in the daily maintenance of photovoltaic power plants. In this paper, a defect detection method for infrared thermal images of photovoltaic panels based on morphological segmentation is proposed. First, according to the infrared characteristics of the photovoltaic power station and the morphological characteristics of the photovoltaic panels, the precise region of each photovoltaic panel is determined in the inspection image. Then, a statistical method is used for feature analysis to determine whether a panel is defective; the defective photovoltaic panels are then segmented, and the different defects are classified according to their morphological characteristics, realizing defect detection and classification for photovoltaic panels. Experiments on actual inspection images of a photovoltaic power station verify that the proposed method achieves high accuracy in defect detection and recognition.
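The statistical feature-analysis step can be illustrated with a simple outlier test on per-panel mean infrared intensities; the k-sigma rule and the value of k below are assumptions for illustration, not the paper's actual criterion.

```python
# Toy statistical defect check: a panel whose mean infrared intensity
# deviates from the population mean by more than k standard deviations
# is flagged as potentially defective (e.g. a hot spot).
import numpy as np

def flag_defects(panel_means, k=3.0):
    """Return a boolean flag per panel given its mean intensity."""
    mu = np.mean(panel_means)
    sigma = np.std(panel_means)
    return [abs(m - mu) > k * sigma for m in panel_means]

flags = flag_defects([30.0] * 20 + [80.0])   # one anomalously hot panel
```

After flagging, the paper's pipeline segments the defective panels and classifies the defects by their morphological characteristics, which this sketch does not cover.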
Image-based photovoltaic panel inspection has become one of the important tasks in photovoltaic power generation. Because photovoltaic string numbers have repeated patterns and varying specifications, traditional template matching or model-training methods are prone to false detections. To solve this problem, this paper proposes a photovoltaic panel string detection method based on prior knowledge and feature learning. First, the infrared inspection image is initially segmented using the difference in radiation characteristics between the photovoltaic string and the environmental background. Subsequently, according to the local features of the border around the PV string template image, feature matching is performed in the segmented foreground area, and the exact position of the string is then determined from the closure and integrity of the matching area. Experiments show that this method has high detection accuracy and adapts well to complex and diverse string specifications.
Multiple Object Tracking (MOT), an important research topic in computer vision, plays an important role in fields such as automatic driving and area monitoring. In dense scenes, occlusion occurs among a large number of objects, and the many only-partially-visible objects degrade the tracking performance of multi-object tracking algorithms. In this paper, we propose a method that combines high-order modeling, future-location feature revision, and conceptual features to address the missing-object re-matching problem. Higher-order modeling enables more accurate approximations of the actual functions. The method modifies the current prediction with one or more features of future locations. The overall feature of an object is composed of multiple local conceptual features, and the object can be expressed by a combination of several concept features when their similarity exceeds a certain threshold. Experimental results show that the three optimization mechanisms above can effectively alleviate the problems of multi-object tracking algorithms in dense scenes, and the optimized algorithm achieves significantly improved accuracy in multiple tracking scenarios.
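The concept-feature idea, that a partially occluded object can still be re-matched through the subset of its local concept features that remain visible, can be sketched with cosine similarities. The similarity threshold and the visibility-ratio rule below are assumptions, not the paper's exact formulation.

```python
# Toy re-matching test: compare each local concept feature of a candidate
# against the stored template; accept the match if enough concept
# features are still similar (i.e. still visible despite occlusion).
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def matches(candidate_feats, template_feats, tau=0.7, min_visible=0.5):
    """candidate_feats / template_feats: aligned lists of concept vectors."""
    sims = [cosine(candidate_feats[k], template_feats[k])
            for k in range(len(template_feats))]
    visible = [s >= tau for s in sims]          # concept still visible?
    return sum(visible) / len(visible) >= min_visible
```

With `min_visible=0.5`, a candidate whose lower half is occluded but whose upper-body concept features still match the template would be re-associated with the lost track.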
The static world assumption is common in most Simultaneous Localization and Mapping (SLAM) algorithms. However, this assumption introduces errors in real-world environments because the real world is not static. Furthermore, explicit motion information about the surroundings helps with decision making and scene understanding. In this paper, we present a robust dynamic SLAM system for RGB-D cameras that is capable of tracking rigid objects in a scene and generating their 3D bounding box proposals without any prior knowledge, and we incorporate this information into the SLAM formulation. As a result, it improves the accuracy of SLAM trajectories in dynamic environments. To achieve this, our system combines instance segmentation and dense optical flow to detect and track dynamic objects. We evaluate our algorithm on the TUM and KITTI datasets. The results show that the absolute trajectory accuracy of our system is significantly improved compared with ORB-SLAM2. We also compare our algorithm with DynaSLAM and VDO-SLAM, which are also designed for dynamic environments, and achieve significant improvements over these counterparts.
As a crucial auxiliary technique in inertial navigation, scene matching has been widely applied in aircraft navigation and guidance. To enhance the adaptability and efficiency of scene matching algorithms in complex environments, existing approaches have gradually shifted towards heterogeneous scene matching, such as optical/infrared scene matching based on deep learning features, with a focus on point features. However, due to the complex and diverse nonlinear distortions between optical and infrared images, the matching accuracy often fails to meet the practical requirements of navigation and positioning. To address this problem, an EPM-Net algorithm based on edge features is proposed, improving matching performance by extracting more stable and discriminative edge point descriptors. Experimental results on infrared datasets demonstrate that this method achieves an average matching accuracy improvement of more than four times compared to traditional and existing deep learning methods.