This PDF file contains the front matter associated with SPIE Proceedings Volume 13257, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks. You are receiving this notice because your organization may not have SPIE eBooks access. Shibboleth/OpenAthens users: please sign in to access your institution's subscriptions. To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
Image Recognition and Processing Technology Applications
High-resolution optical images are susceptible to atmospheric influences during their formation, and thin clouds are the most significant influencing factor; feature information loss due to thin-cloud coverage is a common problem. To solve this problem, a thin cloud removal method based on multi-scale feature fusion for high-resolution optical remote sensing images is proposed; this method can capture more image details. A multi-scale feature fusion module is introduced into the improved structure of the generative network model. By fusing information from multiple image scales, the model enhances its ability to extract features such as shape, texture and colour. Experimental results on a high-resolution remote sensing dataset show that the method achieves a peak signal-to-noise ratio (PSNR) of 29.4758 and a structural similarity (SSIM) of 0.8902. Comparisons with existing methods show that the proposed method removes thin clouds from remote sensing imagery more accurately.
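The reported figures are peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). As a generic illustration of the PSNR metric only (not the authors' code), PSNR can be computed directly from the mean squared error between a reference and a restored image:

```python
import numpy as np

def psnr(reference: np.ndarray, restored: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between a reference and a restored image."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of exactly 1 grey level gives MSE = 1, so PSNR = 20*log10(255) dB.
ref = np.full((8, 8), 100, dtype=np.uint8)
out = ref + 1
assert abs(psnr(ref, out) - 20 * np.log10(255.0)) < 1e-9
```

Higher is better; a PSNR of 29.4758 dB as reported above corresponds to an MSE of roughly 73 on 8-bit data.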
The integration of deep learning methodologies within the realm of digital watermarking has assumed a pivotal role in safeguarding the intellectual property rights associated with images within contemporary contexts. By leveraging an end-to-end architecture comprising noise layers and codec components, the resilience of watermarks across diverse environmental conditions is upheld. To augment the fidelity and perceptibility of watermark representations, a dual encryption paradigm is embraced, integrating a dual attention framework alongside a Least Significant Bit (LSB) processing module. This encoding scheme amalgamates both spatial and channel attention mechanisms, thereby fortifying the resilience of the model. The channel attention mechanism enables precise embedding of watermarks within critical image channels, while the spatial attention mechanism facilitates seamless integration within intricate textural regions. Furthermore, the LSB method is employed for secondary encryption of image data, ensuring both concealment and robustness of the watermarking technique.
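The LSB stage is classical and easy to illustrate in isolation. The sketch below is my own, assuming a grayscale uint8 cover image; the paper's actual scheme couples LSB with the dual attention encoder. It embeds and recovers a bit string through the least significant bit plane:

```python
import numpy as np

def embed_lsb(cover: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Embed a flat bit array into the least significant bits of a uint8 cover image."""
    flat = cover.flatten()                          # flatten() returns a copy
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    return flat.reshape(cover.shape)

def extract_lsb(stego: np.ndarray, n_bits: int) -> np.ndarray:
    """Read back the first n_bits least significant bits."""
    return stego.flatten()[:n_bits] & 1

cover = np.random.default_rng(0).integers(0, 256, size=(16, 16), dtype=np.uint8)
bits = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
stego = embed_lsb(cover, bits)
assert np.array_equal(extract_lsb(stego, 8), bits)
# Each pixel changes by at most one grey level, so the embedding is imperceptible.
assert int(np.max(np.abs(stego.astype(int) - cover.astype(int)))) <= 1
```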
The continuous advancement in the field of deep learning has increasingly drawn attention to underwater image enhancement as a pivotal area of research in underwater robotics. Deep learning techniques have demonstrated significant progress in this domain, furnishing robust tools to tackle image quality challenges in underwater environments. This review presents a comprehensive overview of various deep learning applications in underwater image enhancement, elucidating the scope of each model's utilization while also delineating current challenges and proposing future research directions within the field. The primary objective of this review is to consolidate the latest research advancements in underwater image enhancement through deep learning methodologies, providing researchers with an up-to-date understanding and reference framework to stimulate further progress in this domain.
Remote sensing images exhibit the characteristics of large intra-class variances and small inter-class variances, making remote sensing image segmentation a distinctive semantic segmentation task. Currently, the mainstream approach for remote sensing image segmentation is to combine CNN and transformer to enhance the network's feature extraction capability. However, simply concatenating CNN and transformer modules does not effectively leverage the information extraction advantages of each module. In this paper, we propose a U-shaped architecture that utilizes a multi-scale transformer with position-aware attention for remote sensing image segmentation (MTPANet). We specifically design a novel dual-branch self-attention module to extract features. The key feature of this module is the separate incorporation of position-aware attention and multi-scale components into both the CNN and transformer, followed by their fusion. This approach enhances the network's global attention capability and local perception ability, effectively addressing the inter-class and intra-class variability issues. Lastly, we validate the effectiveness and robustness of MTPANet on publicly available datasets, achieving precise segmentation of remote sensing images.
In this paper, we introduce a novel color constancy method designed specifically for color adjustment in scenarios involving multiple light sources. By employing a Bayesian estimator, our method calculates the probability that an image accurately reflects the actual light source in a multi-light environment. We achieve this accuracy by identifying the real light source via linear weighting. The estimated real light source is then refined by integrating a loss function and a discriminator, aiming to restore the image's original color accurately. Experimental results validate that our approach surpasses traditional methods in effectively recovering the original color of images under various lighting conditions.
Previous blind image super-resolution (BISR) approaches based on iterative methods tend to be time-consuming and computationally expensive. To address this problem, we divide BISR into two stages: first, pre-kernels are estimated and pre-SR images are restored from the LR images; then, based on features at different depths of the pre-SR network, we obtain more accurate blur kernels, which help restore better SR images. We also improve the SR network, which originally fuses degradation information in a single way, by designing expert modules (EM) to handle multiple degradations and a spatial-channel fusion method to deepen the fusion of degradation information and features.
This study aims to improve the performance of graph data anomaly detection. To address the limitations of traditional methods in terms of accuracy and robustness, this paper proposes a new method based on random configuration networks and graph data enhancement techniques. The method uses random configuration networks for feature learning on graph data, combined with graph data augmentation to enhance the characterization of the data and improve anomaly detection performance. Experimental validation on several publicly available datasets shows that the method outperforms traditional approaches in both anomaly detection accuracy and robustness. The contribution of this research is a novel and effective approach that brings randomized configuration networks into graph data anomaly detection.
In response to the problem of striped noise in side-scan sonar images caused by oceanic and electronic noise interference, as well as the high computational complexity, insufficient universality, and blurry imaging effects of traditional methods for processing such noise, this paper proposes a fringe noise suppression method for side-scan sonar images based on an improved Criminisi algorithm. Considering that side-scan sonar images consist mainly of background and repetitive textures, the algorithm replaces the global image search with a local search within an appropriately chosen radius, improving computational efficiency. In experiments on simulated and real striped-noise images, and in comparisons with four classic methods including the Fourier transform, the proposed method significantly improves denoising: on simulated noise image 1, the PSNR and SSIM values increase by 2.2% and 1.79% respectively, and on simulated noise image 2 by 5.21% and 5.87%.
Aiming at the deficiencies of the Salp Swarm Algorithm in terms of solution accuracy, convergence speed and global search capability, an improved Salp Swarm Algorithm is proposed. First, an improved Tent chaotic sequence is used for population initialization; second, an adaptive weight factor and a Levy flight strategy are used to update the leader position; meanwhile, a non-uniform Gaussian mutation operator is used to update the follower position. The experimental results show that the improved algorithm has higher accuracy and faster convergence; in application, selection of the optimal average fitness value (the image segmentation threshold) is faster and more accurate, and the resulting segmentation threshold is more appropriate.
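Tent-map initialization is a standard trick that can be sketched independently of the paper. A minimal version follows; the seed value and map parameter here are illustrative assumptions, not the authors' settings:

```python
import numpy as np

def tent_init(pop_size: int, dim: int, lb: float, ub: float,
              x0: float = 0.37, a: float = 0.499) -> np.ndarray:
    """Generate an initial population by iterating the Tent chaotic map
    x <- x/a if x < a else (1-x)/(1-a), then scaling the sequence to [lb, ub]."""
    seq = np.empty(pop_size * dim)
    x = x0
    for i in range(seq.size):
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        seq[i] = x
    return lb + seq.reshape(pop_size, dim) * (ub - lb)

pop = tent_init(pop_size=30, dim=10, lb=-5.0, ub=5.0)
assert pop.shape == (30, 10)
assert pop.min() >= -5.0 and pop.max() <= 5.0
```

Compared with uniform random initialization, the chaotic sequence spreads the population more evenly over the search space, which is the motivation for using it in swarm algorithms.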
Three dimensional geographic information system (GIS) technology provides accurate three-dimensional representation of terrain and building structures by integrating geographic data, software, and advanced analytical tools. This technology not only provides high-precision and high-resolution data capture capabilities, but also supports real-time data processing, making it particularly important in construction surveying. In construction surveying, the application scope of 3D GIS technology is extensive, covering the entire process from the initial stage of project planning to construction monitoring, and then to facility management in the later stage. This article comprehensively explores the various applications of 3D GIS technology in construction surveying, including determining the location of infrastructure, terrain analysis of construction sites, 3D monitoring of construction progress, accurate calculation of engineering quantity and material requirements, and facility management for later operation and maintenance.
Within the domain of ferrite ring production, the vast majority of available data consists of repetitive, defect-free samples, while data containing defects are considerably rare. To tackle this issue, a deep learning-based defect detection strategy tailored for small sample scenarios has been introduced. The approach begins with the precise extraction of defective areas using image masking technology, followed by the processing of these defect images through regional enhancement transformation techniques to create new instances of defects. These newly generated defect images are employed to build an offline defect sample library. During the model training phase, defective samples are randomly selected from this library and merged with the foreground of normal samples to generate new defect images for training. The deep learning network trained using this methodology demonstrates enhanced capability in distinguishing between defective and defect-free data, with experimental results showing an increase of 3.87% in the Area Under the Curve (AUC) metric. This approach not only addresses the challenge of scarce defect data but also significantly improves the performance of defect detection in small sample environments.
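The offline compositing step described above (masked defect regions pasted onto defect-free foregrounds) can be sketched as follows; the array shapes and intensity values are hypothetical, and the paper's regional enhancement transformations are not reproduced here:

```python
import numpy as np

def paste_defect(normal: np.ndarray, defect_patch: np.ndarray,
                 mask: np.ndarray, top: int, left: int) -> np.ndarray:
    """Composite a masked defect patch onto a defect-free sample at (top, left)."""
    out = normal.copy()
    h, w = defect_patch.shape
    region = out[top:top + h, left:left + w]   # a view into `out`
    region[mask > 0] = defect_patch[mask > 0]  # copy only the masked defect pixels
    return out

normal = np.full((32, 32), 200, dtype=np.uint8)      # bright, defect-free sample
patch = np.full((4, 4), 30, dtype=np.uint8)          # dark defect patch
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1                                   # only the patch centre is defect
aug = paste_defect(normal, patch, mask, top=10, left=12)
assert (aug[11:13, 13:15] == 30).all() and aug[0, 0] == 200
```

Sampling `(top, left)` randomly at training time yields a stream of new defect images from a small defect library, which is the mechanism the abstract describes.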
The narrow energy bins of PCD-CT (Photon Counting Detector based Computed Tomography) make it hard to collect sufficient incident photons. Moreover, because relatively high X-ray doses raise concerns about radiation risk, CT is performed at routine dose when the dose constraint is met. Both factors can result in low-SNR (Signal-to-Noise Ratio) projection data, leading to noisy results. Besides, spectral-feature anisotropy (SFA) exists among the images corresponding to different energy channels; it reflects the pixel saliency and spatial consistency of the images in each channel. The compatibility between the mechanism of the guided filter and the SFA prompted us to propose a spectral CT reconstruction approach that synergizes an ADGF (Anisotropic Diffusion Guided Filter) with a DAI (Dynamic Average Image) to reconstruct satisfactory spectral CT images. The ADGF, realized by fusing the guided filter with an anisotropic filter, is incorporated into the iterative process as a regularizer, making it more powerful at noise suppression while maintaining features. To make full use of the redundant information among energy channels, the DAI is used to guide channel-wise reconstruction. Experiments show that, compared with other methods, ours achieves decent performance in both visualization and quantitative indexes.
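The ADGF builds on the plain guided filter, whose mechanism (a per-window linear model of the output in the guide image) can be sketched as below. This is the standard guided filter, not the paper's ADGF variant; the radius and eps values are illustrative:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide: np.ndarray, src: np.ndarray,
                  radius: int = 4, eps: float = 1e-2) -> np.ndarray:
    """Classic guided filter: fits a local linear model of the output in the guide,
    smoothing flat regions while preserving the guide's edges."""
    def box_mean(x):
        return uniform_filter(x, size=2 * radius + 1, mode="nearest")
    mean_g, mean_s = box_mean(guide), box_mean(src)
    cov_gs = box_mean(guide * src) - mean_g * mean_s
    var_g = box_mean(guide * guide) - mean_g * mean_g
    a = cov_gs / (var_g + eps)      # per-pixel slope of the local linear model
    b = mean_s - a * mean_g         # per-pixel intercept
    return box_mean(a) * guide + box_mean(b)

rng = np.random.default_rng(1)
clean = np.zeros((64, 64)); clean[:, 32:] = 1.0        # a step edge
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
out = guided_filter(noisy, noisy, radius=4, eps=0.04)  # self-guided smoothing
assert np.std(out[:, :20]) < np.std(noisy[:, :20])     # noise suppressed on the flat region
```

In channel-wise spectral CT reconstruction, a cleaner image (such as the dynamic average over channels) would serve as the guide, which is how redundant cross-channel information is exploited.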
Transmission lines are an important guarantee of the continuous power supply of power grids. To ensure their safe operation and to detect potential heating anomalies in a timely and effective manner, this paper proposes an infrared image recognition algorithm for transmission equipment based on an improved Cascade R-CNN. Deformable convolutional networks and a classifier enhancement algorithm improve target recognition capability on infrared images of power transmission equipment, while contrast enhancement and GAN networks improve image quality and alleviate sample imbalance. Experiments show that the algorithm achieves an accuracy of over 87% for various component recognition and over 98% for insulator recognition, greatly improving detection quality, providing strong support for the diagnosis of heating defects, and offering important application value for safe power grid maintenance.
Intelligent recognition technology, as an important technical tool, has played a significant role in many fields. For the problem of oracle bone image processing, we first conducted image pre-processing (size adjustment, normalization, and data augmentation) to improve image quality and highlight the oracle bone information. Then, the YOLOv5 model is utilized for oracle bone image segmentation, achieving high recognition accuracy after 50 epochs of training. Finally, the model was employed to automatically segment individual characters from the original oracle bone topography images, with an average processing time of only 810.8 ms, demonstrating efficient processing speed. In total, 1447 oracle bones were automatically detected and segmented with a high level of accuracy.
Digital watermarking systems based on Third-party Authentication Authorities (TPA) provide exceptional usability and efficacy in copyright protection. Yet, they encounter security and trust issues. A digital blind watermarking technique for color photos integrating Schur decomposition and Visual Cryptography (VC) is presented to improve watermark robustness and TPA's security. Watermark data is first preprocessed using (2, 2) VC to create two shared images. Using this method, two shared images must be superposed to recover the original watermark. The numerical stability of Schur decomposition strengthens the watermark's resistance against geometric attacks. Tests validate the system's enhanced security and resistance to common threats.
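(2, 2) visual cryptography itself is classical and easy to illustrate: each secret pixel is expanded into a subpixel pair so that stacking (OR-ing) the two shares makes black pixels fully black and white pixels half black. A minimal sketch, independent of the paper's Schur-decomposition stage:

```python
import numpy as np

def vc_shares(secret: np.ndarray, rng=None):
    """(2, 2) visual cryptography with 1x2 pixel expansion.
    1 = black (ink), 0 = white. OR-ing the two shares recovers the secret."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = secret.shape
    s1 = np.zeros((h, 2 * w), dtype=np.uint8)
    s2 = np.zeros((h, 2 * w), dtype=np.uint8)
    for i in range(h):
        for j in range(w):
            pat = np.array([1, 0]) if rng.random() < 0.5 else np.array([0, 1])
            s1[i, 2 * j:2 * j + 2] = pat
            # white secret pixel: identical patterns (stack = half black);
            # black secret pixel: complementary patterns (stack = all black)
            s2[i, 2 * j:2 * j + 2] = pat if secret[i, j] == 0 else 1 - pat
    return s1, s2

secret = np.array([[0, 1], [1, 0]], dtype=np.uint8)
s1, s2 = vc_shares(secret, np.random.default_rng(42))
stacked = s1 | s2
assert all(stacked[i, 2 * j:2 * j + 2].sum() == (2 if secret[i, j] else 1)
           for i in range(2) for j in range(2))
```

Each share on its own is a uniformly random pattern, which is what makes a single intercepted share useless to an attacker.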
Current semantic segmentation neural networks have limitations in recognizing unseen classes. To address this problem, recent studies have drawn attention to zero-shot semantic segmentation. General zero-shot learning works usually focus on image recognition, which requires extracting transferable features across classes from images. However, zero-shot semantic segmentation needs to transfer knowledge at the pixel level. As semantic classes are defined for whole objects, it is intuitive that image-level features are more transferable across classes than pixel-level features. In this work, we propose the Class2Seg approach for zero-shot semantic segmentation based on this intuition. The core idea of Class2Seg is to learn transferable image-level features to guide zero-shot semantic segmentation. Our approach contains two branches: an image-level classification branch and a semantic segmentation branch. A cross-task correlation layer is designed to fuse the transferable image-level features into the semantic segmentation branch, promoting information transfer from source classes to target classes at the pixel level. Extensive experiments on the Pascal-VOC dataset clearly support the effectiveness of our approach.
The technology for non-contact vibration measurement using video recorded by cameras has developed rapidly in recent years; however, existing methods are unable to handle unstable video sources effectively. This paper proposes a phase-based vibration measurement algorithm built on digital image stabilization, which employs the Kanade–Lucas–Tomasi feature tracking method to track feature points in the input video and derive affine transformation matrices between adjacent frames. It also utilizes the random sample consensus algorithm to eliminate outliers and smooth motion trajectories, applies the Hilbert transform to extract the global phase information of the stabilized video, weights the phase information by the amplitude information, and designs an adaptive filter to remove noise. Numerical simulations and experiments demonstrate the effectiveness and accuracy of the proposed method.
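The Hilbert-transform step (extracting instantaneous phase and amplitude from a vibration signal) can be illustrated on a synthetic 1-D signal; this is a generic sketch, not the paper's 2-D video pipeline:

```python
import numpy as np
from scipy.signal import hilbert

fs = 1000.0                                      # sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
sig = np.cos(2 * np.pi * 12.0 * t)               # a 12 Hz "vibration"
analytic = hilbert(sig)                          # analytic signal via the Hilbert transform
phase = np.unwrap(np.angle(analytic))            # instantaneous phase (radians)
amplitude = np.abs(analytic)                     # envelope, usable as a phase weight
inst_freq = np.diff(phase) * fs / (2 * np.pi)    # instantaneous frequency (Hz)
# Away from the record boundaries the estimate recovers the true 12 Hz.
assert abs(np.median(inst_freq[100:-100]) - 12.0) < 0.1
```

Weighting the phase by the amplitude, as the abstract describes, down-weights pixels (or samples) where the envelope is small and the phase estimate is unreliable.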
Large optical instruments contain many special devices and materials; among them, certain chemicals must, owing to their chemical properties, be kept in low-temperature spherical metal containers. In daily use it is impossible to determine the amount of material remaining in a container by direct measurement, so the container is filled before each use, and each filling causes material loss. This paper therefore presents a computer-vision method that detects the liquid level in images produced by an X-ray camera and calculates the remaining capacity.
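One simple way to locate a liquid surface in an X-ray image, sketched here under the assumption that the liquid column images darker than the gas space (this is an illustration, not necessarily the paper's exact algorithm), is to find the largest jump in the row-wise mean intensity:

```python
import numpy as np

def liquid_level_row(xray: np.ndarray) -> int:
    """Estimate the liquid surface as the row with the largest jump in mean row intensity."""
    profile = xray.mean(axis=1)                     # one intensity value per row
    return int(np.argmax(np.abs(np.diff(profile)))) + 1

# Synthetic X-ray: bright gas space above, dark liquid below, surface at row 40.
img = np.full((100, 60), 180.0)
img[40:, :] = 90.0
img += np.random.default_rng(3).normal(0, 3, img.shape)  # sensor noise
level = liquid_level_row(img)
assert abs(level - 40) <= 1
```

Given the detected fill height h in a sphere of inner radius R, the remaining volume follows from the spherical-cap formula V = πh²(3R − h)/3.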
When ships navigate, they often encounter rain, which can reduce the resolution of both onboard intelligent perception systems and shore-based monitoring systems. This reduction compromises ship target detection accuracy, posing a significant threat to maritime safety. To address this issue and ensure safe maritime navigation, this study introduces an enhanced maritime image deraining model based on MPRNet. By improving image quality during rainy conditions, the model supports subsequent ship target detection tasks, ensuring safe vessel navigation. In this study, MPRNet is used as the baseline model. A Pixel Convolution (PConv) structure is added after each stage's channel attention module, creating the Pixel and Channel Attention Block (PCAB) to effectively reduce residual raindrops in images under heavy rain. Additionally, a Total Variation Regularizer Term is included in the loss function to smooth derained images, addressing coarse details. Extensive comparative experiments show that the proposed model significantly outperforms existing models. Compared to the baseline, the proposed model increases PSNR by 2.24 and improves SSIM by 0.045.
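The Total Variation regularizer mentioned above penalizes the sum of absolute image gradients, discouraging the residual speckle that raindrops leave behind. A minimal anisotropic-TV sketch (the exact weighting of this term in the paper's loss is not specified here):

```python
import numpy as np

def total_variation(img: np.ndarray) -> float:
    """Anisotropic total variation: sum of absolute horizontal and vertical differences."""
    dh = np.abs(np.diff(img, axis=1)).sum()
    dv = np.abs(np.diff(img, axis=0)).sum()
    return float(dh + dv)

smooth = np.tile(np.linspace(0, 1, 8), (8, 1))   # a gentle intensity ramp
noisy = smooth + 0.3 * np.random.default_rng(0).standard_normal(smooth.shape)
# Noise raises TV sharply, so minimizing a TV term pushes the network toward smooth outputs.
assert total_variation(noisy) > total_variation(smooth)
```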
In order to overcome the strong local perception and weak global perception characteristics of traditional convolutional neural networks (CNNs), this paper proposes a multi-spectral guided CNN-Transformer hybrid road extraction model (M2SA-MSNet) which introduces a multi-scale multi-head attention mechanism so as to capture long-range dependencies in the encoding stage. Experimental results suggest that, compared to the current state-of-the-art road extraction methods, M2SA-MSNet achieves better recognition performance and exhibits good accuracy and completeness in road extraction.
To extract target information from massive temperature self-recording paper archive images and improve the utilization of these archives, two problems must be addressed: traditional Canny operators yield poor continuity of target edge feature points and struggle to form closed regions after segmentation, while noise, uneven lighting, manually recorded handwriting, pencil scratches, and ink contamination marks cause loss of target image detail; the Bernsen algorithm addresses the latter. Based on the characteristics of self-recording paper archive images, this paper proposes a segmentation method that combines a Gaussian-filtered Canny operator with the Bernsen algorithm, achieving effective segmentation of temperature self-recording paper archives. The experimental results show that the proposed method accurately segments the coordinate axes and temperature curves in these images, avoids the influence of edge features such as handwriting, pencil scratches, and ink contamination marks, and achieves good segmentation results.
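Bernsen thresholding, one half of the proposed combination, thresholds each pixel at the midpoint of its local minimum and maximum, with a fallback for low-contrast windows. A hedged sketch follows; the window size, contrast limit, and global-mean fallback are my assumptions, not the paper's settings:

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def bernsen(img: np.ndarray, win: int = 15, contrast_min: int = 30) -> np.ndarray:
    """Bernsen local thresholding: threshold at the midpoint of the local min/max;
    low-contrast windows fall back to a global-mean comparison."""
    lo = minimum_filter(img, size=win).astype(np.int32)
    hi = maximum_filter(img, size=win).astype(np.int32)
    mid = (lo + hi) // 2
    out = (img > mid).astype(np.uint8)              # 1 = paper, 0 = ink
    low_contrast = (hi - lo) < contrast_min
    out[low_contrast] = (img[low_contrast] > img.mean()).astype(np.uint8)
    return out

# A dark curve on unevenly lit paper: a left-to-right illumination gradient.
page = np.tile(np.linspace(120, 220, 64), (64, 1)).astype(np.uint8)
page[30:34, :] = 40                                 # the "temperature curve"
binary = bernsen(page, win=15, contrast_min=30)
assert binary[32, 50] == 0 and binary[10, 50] == 1  # curve marked as ink, paper kept
```

The local min/max window is what lets the threshold adapt to uneven lighting, which a single global threshold cannot handle on scanned archive paper.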
Intelligent Target Sensing and Detection Technology
Two central problems are often faced when shooting from UAVs at high altitude: many small, dense targets, and complex background noise interference. In YOLOv5, after multiple downsampling steps, the feature representation of small targets becomes weak and may even be masked by the background. The Feature Pyramid Network (FPN) diminishes small-target detection accuracy because its basic feature concatenation underutilizes multiscale information and introduces extraneous contextual details. To solve these problems, we propose a simple and effective improved model called multiscale channel interactive spatial perception YOLOv5 (MCISP-YOLOv5). First, we design a multiscale channel interaction spatial perception (MCISP) module, which recalibrates the channel features at each scale through interaction with information from other scales, facilitates information flow between shallow geometric features and deeper semantic features, and uses adaptive spatial learning to realize spatial perception, allowing the model to focus better on foreground objects. Second, we replace the traditional up-sampling module with the Content-Aware ReAssembly of FEatures (CARAFE) operator, which enhances feature characterization after repeated down-sampling and better recovers detail information. Finally, we add an additional, shallower feature map as a detection layer in YOLOv5; this supplementary feature map improves detection of small objects without adversely affecting detection of targets of other sizes. Extensive experiments on the publicly available VisDrone2019 dataset show that the introduced model achieves substantial performance enhancements.
Accurate detection of compaction information is essential to ensuring a stable construction outcome. This paper therefore studies a real-time digital compaction detection method for airport earthwork engineering. After effective digital image information is collected around the target measurement point, a grayscale morphology method is used to enhance the image grayscale, and the result is fed into a minimum binomial fitting function to determine the grayscale center. Within the maximum radius range of the grayscale parameter, the compaction parameter is calculated from the pixel value corresponding to one pixel step. In the test results, the error between the detected dry density at different measurement points and the actual value remains within 0.03 g/cm³, and the error of the compaction detection results remains within 0.5%.
Infrared dim and small target detection based on the Vision Transformer (ViT) is a pioneering task in deep learning. Existing ViT methods embedded in a U-Net-style global network apply a single attention mechanism at each layer to direct the network's focus toward regions containing dim and small targets. However, these methods neglect the correlation between the encoding and decoding paths of the U-Net and therefore fail to fully exploit ViT's powerful feature extraction capability, so the false negative and false positive rates of dim and small target detection keep rising. This paper improves the U-Net-type architecture with a novel multi-level ViT dim and small target detection network, HVUNet. Specifically, we design low-level feature extraction residual blocks to extract low-level features at each level of the image. We further introduce three types of multi-head attention modules in the encoding, decoding, and concatenation paths, respectively, to capture the long-range dependencies of the three paths. This overcomes the challenge posed by the large difference between background and target distributions in infrared dim and small target images. Experimental results on public datasets demonstrate that HVUNet significantly reduces the false negative and false positive rates of small target detection, thereby improving detection probability.
The proliferation of malware variants, fuelled by sophisticated packing, polymorphism and emulation techniques, has escalated the threat to Internet security. These evolving variants often evade traditional detection methods and render them ineffective. Visualisation techniques present complex data in an intuitive manner and have therefore become a promising tool for malware analysis. However, current deep learning-based visualisation techniques tend to suffer from texture feature variations during the pre-processing phase, limiting their effectiveness on complex malware samples. To address this challenge, we propose a novel visualisation-based approach for lightweight and fast malware classification on the Windows platform. The approach uses pixel-filling techniques to mitigate texture feature variation during preprocessing and incorporates modular design principles to improve the saliency of key features. Experimental results demonstrate the superiority of our approach, which achieves 99.14% accuracy on the widely used Malimg dataset, outperforming existing methods.
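The first step of any such visualisation pipeline, mapping raw bytes to a fixed-width grayscale image, can be sketched as follows. Zero-padding the final row is the conventional baseline; the paper's pixel-filling technique is a refinement of exactly this padding step, so this snippet shows the baseline only.

```python
import numpy as np

def bytes_to_image(payload: bytes, width: int = 16) -> np.ndarray:
    """Map a raw binary to a fixed-width grayscale image (one byte per
    pixel), zero-padding the last row — the common baseline that the
    paper's pixel-filling strategy refines."""
    buf = np.frombuffer(payload, dtype=np.uint8)
    rows = -(-len(buf) // width)                 # ceil division
    img = np.zeros(rows * width, dtype=np.uint8)
    img[:len(buf)] = buf
    return img.reshape(rows, width)
```

The resulting 2D array can be fed to any image classifier; texture artifacts arise precisely where the padding and row width interact with section boundaries in the binary.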
Power grid enterprises have accumulated large volumes of user behavior data in their operations, and effective monitoring and analysis of user operation behavior in complex systems is crucial to the safe, high-quality operation of power grids. To address this need, this paper proposes a user abnormal behavior detection technique based on graph matching. The technique transforms complex user behaviors into graph structures by constructing business process operation graphs and user temporal behavior links, and then converts these graphs into analyzable numerical vectors. A Support Vector Machine (SVM) is trained on these vectors to detect abnormal user behavior automatically. Finally, in-depth analyses and weighted calculations refine the detection results for abnormal business behaviors across different systems, improving detection accuracy and efficiency. The results show that the method effectively identifies and analyzes abnormal behaviors of grid users, which is of great significance for enhancing the security and stability of the grid system.
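One way to see the graph-to-vector step is as computing fixed-length structural statistics of a behaviour graph so that a standard classifier can consume it. The out-degree histogram below is an illustrative choice of encoding, not the paper's actual one; the SVM would then be trained on vectors of this kind.

```python
def graph_to_vector(edges, num_nodes, max_degree=4):
    """Flatten a user-behaviour graph (directed edge list) into a
    fixed-length numeric vector: a histogram of node out-degrees,
    capped at `max_degree`. Purely illustrative encoding."""
    out_deg = [0] * num_nodes
    for src, _dst in edges:
        out_deg[src] += 1
    hist = [0] * (max_degree + 1)
    for d in out_deg:
        hist[min(d, max_degree)] += 1
    return hist
```

Because the vector length is independent of graph size, behaviour graphs of very different shapes become directly comparable inputs to the downstream model.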
With advances in technology and medicine, X-ray CT is widely used in the diagnosis, treatment, and monitoring of diseases. This article aims to further optimize the performance of an X-ray fluorescence CT (XFCT) system by studying how the contrast-to-noise ratio depends on the concentration and size of the region of interest (ROI). Using Geant4 XFCT simulation modeling, the study analyzes the impact of ROI concentration and size on the quality of XFCT reconstructed images. To assess the influence of different ROI concentrations on imaging performance, the simulation model was adjusted for different ROI sizes and twenty experimental groups were run. The results indicate that ROI concentration and size both significantly affect imaging quality: under specific combinations of concentration and size, optimal XFCT imaging can be achieved. Because the two factors interact, optimizing imaging quality requires adjusting both parameters together rather than considering either one in isolation.
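The study's figure of merit has a standard closed form. Assuming the conventional definition (the abstract does not spell out its exact estimator), contrast-to-noise ratio compares the ROI and background means against the background noise level:

```python
import numpy as np

def contrast_to_noise_ratio(roi, background):
    """CNR = |mean(ROI) - mean(background)| / std(background) — the
    conventional definition, used here as a stand-in for the study's
    image-quality metric."""
    return abs(roi.mean() - background.mean()) / background.std()
```

Raising the contrast-agent concentration raises the ROI mean (numerator), while a smaller ROI tends to raise the variance of its estimate, which is one intuition for why the two parameters interact.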
Autonomous landing technology is an important means of ensuring UAV safety, yet on-board point clouds are often sparse and inhomogeneous. This paper proposes a neighbourhood geometric feature-enhanced point cloud semantic segmentation network to address the difficulty of extracting local geometric features from such point clouds. The algorithm introduces FPFH feature information into the neighbourhood feature encoding of the PointNet++ semantic segmentation network, strengthening the network's description of the geometric relationships between points in a local neighbourhood. At the same time, the farthest point sampling used in the network's original sampling stage is replaced with random sampling to offset the time spent encoding geometric features. The proposed network and several others are trained and tested on the outdoor public dataset Semantic3D. The final results show an overall accuracy of 87.9% and a mean Intersection over Union of 61.2%, meeting the requirements of UAV autonomous landing applications.
A dynamic feature removal method combining the deep learning network YOLOv5 (You Only Look Once version 5) with geometric constraints is proposed to address the poor positioning accuracy and robustness of the current ORB-SLAM3 system in dynamic scenarios, where it cannot meet the positioning needs of mobile robots. A dynamic feature point detection and removal thread is embedded in the tracking thread to eliminate the impact of dynamic feature points on camera pose estimation, thereby improving the accuracy and robustness of the SLAM system. The improved algorithm was validated on the publicly available TUM datasets, and the experimental results show significant improvements in localization accuracy and robustness.
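The removal thread's core operation, discarding feature points that fall inside detector-reported dynamic regions, can be sketched as a simple box test. The paper additionally applies geometric (e.g. epipolar) constraints before deletion, which this illustrative snippet omits.

```python
def filter_dynamic_points(keypoints, boxes):
    """Drop feature points (x, y) that fall inside any detector-reported
    dynamic bounding box (x1, y1, x2, y2). A minimal sketch of the mask
    step only; the real pipeline also checks geometric constraints."""
    def inside(pt, box):
        x, y = pt
        x1, y1, x2, y2 = box
        return x1 <= x <= x2 and y1 <= y <= y2
    return [p for p in keypoints if not any(inside(p, b) for b in boxes)]
```

Only the surviving points are passed to pose estimation, so a moving person or vehicle no longer contaminates the reprojection residuals.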
To address the challenges that recent 3D convolutional neural networks face in action recognition, including excessive parameter counts, disregard for temporal dynamics, and long processing times, we propose a lightweight action recognition model based on depthwise-separable convolution with LSTM (Long Short-Term Memory). First, human behaviour and action regions are extracted from video sequences during data pre-processing, followed by normalization. A depthwise-separable convolutional model then extracts spatial features of these regions, yielding a rich representation of activity features. Next, the extracted feature sequences are fed into an LSTM network to capture the temporal dynamics of the actions; the LSTM's memory cells and gating mechanisms allow efficient modeling of time series data. Finally, a softmax classifier at the output layer performs action classification. By integrating spatial and temporal features, the model captures subtle changes in actions, and the combination of depthwise-separable convolution and LSTM gives it strong expressive and generalization abilities, achieving better performance in action recognition. Experimental results show that, compared with traditional action recognition algorithms, the proposed algorithm achieves a recognition rate of up to 95.41% on the UCF101 dataset.
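The "lightweight" claim rests on the standard parameter factorisation of depthwise-separable convolution: a k×k depthwise filter per input channel plus a 1×1 pointwise mixing layer, instead of a full k×k filter per input-output channel pair. A small counting sketch makes the saving concrete:

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Weight count of the factorised form: depthwise k x k per input
    channel, then 1 x 1 pointwise mixing across channels."""
    return c_in * k * k + c_in * c_out
```

For a 3×3 layer mapping 32 to 64 channels this gives 18432 versus 2336 weights, roughly an 8x reduction, which is where the model's speed and size advantages come from.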
To address fuzzy boundaries, missed small polyps, and image fragmentation in colon polyp segmentation, an enhanced TransFuse model is proposed. By refining the BiFusion and Up modules and effectively combining deep and shallow feature information, the improved TransFuse model achieves more accurate colon polyp segmentation. Experimental results on the Kvasir-SEG dataset show gains of 0.7%, 2.3%, and 1.1% in the Acc, IoU, and Dice metrics, respectively. Compared to the original model, this approach detects and segments colon polyps more accurately.
The rapid development of brain-computer interface technology has led to applications in health monitoring and rehabilitation, but precise identification of EEG signals remains a pivotal challenge. To address the low identification rate of visually stimulated EEG signals, this paper designs a multimodal visual stimulation EEG signal identification method based on a Convolutional Spiking Neural Network (C-SNN). Images in four directions are used to stimulate the brain to produce EEG signals. The identification process is as follows: first, the dataset is constructed by acquiring and pre-processing EEG signals from multimodal visual stimulation with MI and SSVEP; second, the C-SNN structure is designed and the network model parameters are optimized. The experimental results show that the designed C-SNN model effectively identifies the fused MI and SSVEP EEG signals, with an identification accuracy of 95%. The advancement of brain-inspired intelligent technology thus facilitates the development of brain-computer interaction technology.
To address the low efficiency of manual inspection and the poor performance of traditional algorithms, an improved YOLOv8-based detection algorithm for surface defects of rolled steel is proposed. An EMA attention module is appended to the final layer of the backbone network, and the original CIoU loss is replaced with the WIoU loss function. Given the poor quality of some samples in the dataset, Wise-IoU v1 is selected, which largely avoids the original loss function's excessive penalization of low-quality samples. The average accuracy is 80.4% and the recall is 74.7%; compared with the original network, average accuracy improves by 4%. Finally, the paper explores extreme lightweighting of the model, reducing its size with only a slight loss of mAP and making deployment on low-performance devices feasible.
Deep learning, with its data-driven advantages, achieves robustness beyond that of traditional algorithms, and its integration with visual-inertial odometry (VIO) has been a prominent research topic; however, no mature integration solution has yet emerged. In this paper, we propose SPL-VINS, which combines the deep learning-based feature point detection algorithm SuperPoint with VINS-Mono. Additionally, we add line features to VINS-Mono and propose a non-maximum suppression (NMS) method for line features. The line-feature residual is modeled as a point-to-line distance. Experimental results on the public EuRoC dataset demonstrate a significant reduction in absolute translation and rotation error compared with VINS-Mono.
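The point-to-line residual has a closed form: the perpendicular distance from a reprojected point to the line through a detected segment's endpoints. A minimal 2D image-plane sketch follows; the paper's exact line parameterisation inside the optimizer may differ.

```python
import math

def point_line_residual(px, py, x1, y1, x2, y2):
    """Perpendicular distance from point (px, py) to the infinite line
    through (x1, y1) and (x2, y2) — the geometric form in which a
    line-feature residual enters a least-squares optimisation."""
    dx, dy = x2 - x1, y2 - y1
    # Cross product of (segment direction) and (point offset), normalised
    return abs(dy * (px - x1) - dx * (py - y1)) / math.hypot(dx, dy)
```

Minimising this residual over camera poses pulls the projected line features onto their observed segments, complementing the point reprojection terms.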
Accurate extraction of bare soil is crucial for land management, soil conservation, and natural disaster risk assessment. With the continuous development of remote sensing technology, remote sensing imagery has become an important means of acquiring information about the Earth's surface: its multi-source data and global coverage provide efficient methods for bare soil extraction. Although deep convolutional networks have made significant progress in image semantic segmentation, they typically require large numbers of densely labeled images for training. We therefore propose a segmentation network, PAN-Net, which combines unsupervised learning with few-shot segmentation to address the shortage of training samples for an encoder that can effectively extract bare soil features. The encoder of PAN-Net is trained on a large number of unlabeled images using unsupervised learning and performs well at extracting bare soil features; we then apply it to downstream few-shot segmentation tasks, improving bare soil feature extraction in the few-shot setting. To evaluate model performance, we validate it on our self-made dataset of over 1000 samples, where the model achieves an IoU of 75% and an F1 score of 80% on the test set, surpassing most existing methods.
3D-HEVC is the latest 3D video coding standard; it encodes the texture and depth maps of a small number of viewpoints in the multi-view plus depth (MVD) format. The depth map is essential for synthesizing virtual viewpoints for display. To preserve the quality of depth map coding, depth modeling modes (DMMs) are introduced and added directly to the candidate list for RD cost calculation of all PUs, protecting the edge information of the depth map; this significantly increases the computational burden. This paper proposes a depth mode prediction CNN, DMP-CNN, that decides whether DMM modes should be added to the candidate list, skipping unnecessary mode decision calculations. Experimental results show that the proposed method saves 14.64% of coding time compared with HTM-16.0, with a 1.41% PSNR degradation of synthesized viewpoints at the same bitrate.
To address the shortcomings of current mainstream target detection methods in detecting small helmet-wearing targets, reduce the miss rate, and improve detection accuracy, this work introduces a new target detection technique based on an improved YOLOv5s. First, to increase detection accuracy for small helmet targets, a 160×160 small target detection layer is added to the original model. Second, the AFPN network structure is added to the neck of YOLOv5s to perform asymptotic multi-scale feature fusion, reducing information loss or degradation during multi-level transmission. Finally, the EIoU loss function is improved with the Inner-IoU idea, replacing the original function with Inner-EIoU to improve sample learning under complex backgrounds. Experimental findings on the helmet dataset SHWD show that the improved model reaches 95.4% mAP@0.5 and 62% mAP@0.5:0.95, exceeding YOLOv5s by 1.7% and 1%, respectively, with better detection performance.
In its judicial interpretation, the Supreme People's Court clarified, by way of enumeration, that algorithmic information can be protected as a trade secret, reflecting the increased importance of protecting algorithmic trade secrets. Deep models carry high intellectual property and commercial value, and have even become the core competitiveness of some individuals and small businesses, making them targets for malicious competitors and lawless actors. Digital watermarking has been extended to deep neural networks, but existing methods only protect the intellectual property of classification models. This article therefore studies intellectual property protection for semantic segmentation models. It abandons the traditional trigger-set selection used in classification model protection and instead proposes an adversarial generation method to design trigger sets independently, embedding specially marked patterns or symbols into images. A backdoor mechanism then embeds the trigger-set digital watermark into the semantic segmentation model, converting the backdoor weakness of the model into an advantage and making the segmentation model more discriminative during verification.
This article proposes an optimization method for bearing fault diagnosis based on the Swin Transformer. Features are extracted from the bearing dataset via continuous wavelet transform; the resulting features are then fed into a Swin Transformer model, optimized by the HHO optimization algorithm, for learning and classification, outputting the classification results, accuracy, and loss values. Compared with CNN, LSTM, ResNet, and GRU models, the Swin Transformer improves accuracy by 21.01%, 8.19%, 5.63%, and 2.71%, respectively. Optimized by the HHO algorithm, the Swin Transformer achieves an accuracy of 99.97%, greatly improving the accuracy, robustness, and generalization of bearing fault diagnosis.
Crack identification is a core activity in a wide range of industrial applications, and great progress has been made in both crack identification and crack segmentation. Although deep learning has achieved good accuracy and fine granularity, it hits a bottleneck in complex pixel-level crack detection: for pixel-level semantic segmentation, existing methods struggle to segment fine-grained information and have low prediction accuracy. To address these challenges, we propose LSGAU-Net (Snake Global Attention U-Net), an improved model based on the U-Net framework. The model includes an encoder structure sensitive to linear topology and a feature fusion module based on global attention, which enhance the learning of the sinuous features of cracks and fuse semantic information at different scales. Compared with existing methods, our model not only maintains prediction accuracy but also greatly improves the continuity and robustness of the predictions. Experiments on multiple crack datasets show that our model outperforms existing models, and a series of ablation experiments verifies the effectiveness of the modifications.
Vehicle re-identification is the task of recognizing the same vehicle across non-overlapping camera views and is important in areas such as intelligent security and smart transportation. Vehicle images present the challenges of high inter-class variation and low intra-class variation. Existing methods often employ additional cues and auxiliary inputs to address these challenges, but they suffer from high data requirements and computational complexity. To overcome these challenges, this paper introduces a vehicle re-identification method based on the CSWin Transformer and linear attention to effectively extract the global contextual information suited to vehicle images. By replacing the original attention mechanism in the CSWin Transformer with linear attention, we achieve strong modeling capability while limiting computational cost. Specifically, the linear attention employs a simple yet effective mapping function and a rank restoration module to focus on specific local regions and enhance the interaction of local features. Experimental results demonstrate that the method achieves a better balance between computational efficiency and model performance.
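The linear attention idea can be illustrated with a minimal NumPy sketch: a non-negative feature map phi replaces the softmax, so attention factors as phi(Q)(phi(K)ᵀV) and the cost becomes linear in sequence length. The ELU+1 map below is one common choice from the linear-attention literature, not necessarily the paper's mapping function, and the rank restoration module is omitted.

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Linear-complexity attention: softmax(QK^T)V is replaced by
    phi(Q) (phi(K)^T V) with row-wise normalisation, where phi is a
    non-negative feature map (ELU + 1 here, an illustrative choice)."""
    def phi(x):
        return np.where(x > 0, x + 1.0, np.exp(x))  # ELU(x) + 1 > 0
    q, k = phi(q), phi(k)
    kv = k.T @ v                      # (d, d_v): computed once, O(N d^2)
    z = q @ k.sum(axis=0)             # (N,) per-row normaliser
    return (q @ kv) / (z[:, None] + eps)
```

Because `kv` and the normaliser are shared across queries, the N×N attention matrix is never materialised, which is what keeps the computational cost bounded for high-resolution vehicle images.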
Deep learning's success in medical image analysis often hinges on large annotated datasets, which are costly and time-consuming to create. This paper proposes One-shot Landmark Detection (OLD) to address this challenge in landmark detection using only a single annotated image. OLD employs a two-stage approach: contrastive learning and pseudo-label supervised training. The former leverages multi-scale feature representations to capture consistent anatomical information, generating predictions for the training set. These predictions then serve as pseudo-labels to train a new landmark detector in the latter stage, further refining performance. Evaluated on a public cephalometric landmark detection dataset, OLD achieves a competitive accuracy of 88.65% within 4.0mm, rivaling state-of-the-art supervised methods trained on significantly larger datasets.
The detection of anatomical landmarks in medical images is a widely studied topic, with many novel deep neural networks emerging in medical imaging. However, because most deep neural networks focus excessively on training with a single dataset, they fall short in generalization capability. U-Net has become a popular backbone for image segmentation, but its generalization may be limited on complex medical images. In this work, to overcome some of U-Net's shortcomings, we propose an improved deep neural network, Position Based Attention-Unet++ (PBA-Unet++), which integrates learnable positional encodings based on CBAM into Unet++ to better capture image features at different scales. The model addresses the insufficient generalization of existing methods and accurately detects target landmarks in medical images. We conducted extensive experiments and comparisons on the ISBI 2015 dataset of head X-ray images, and the results demonstrate that our model performs excellently.
At present, the eligibility classification of TMR is mainly performed manually, which is time-consuming and labor-intensive. This paper adopts deep learning-based image classification for online detection of TMR classes. A TMR dataset is first constructed, and three classic network models, GoogLeNet, ResNet, and SENet, are used for preliminary experiments. Based on the results, the best model is selected and improved in terms of accuracy and speed. The improved model reaches an accuracy of 96.82%, and its speed basically meets practical needs, providing a theoretical basis for subsequent research and development of intelligent TMR batching systems and equipment.
Marine ranching, a novel technology for ecological balance and sustainable development, relies heavily on cage farming, which typically takes place in deep, complex sea environments that challenge existing fish detection algorithms. This paper therefore presents an automatic, cost-effective, high-precision fish detection system. Integrating the CBAM attention mechanism enhances the network's feature perception, and the MPDIoU loss function improves small target detection. Trained on fish datasets, the refined YOLOv8s algorithm achieves an mAP of 85.12%, a precision of 80.66%, and a recall of 90.11%, outperforming standard YOLOv8s by 0.88% in mAP and 1.24% in recall, making it a strong candidate for marine fish object detection.
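CBAM's channel branch, which this abstract integrates into YOLOv8s, pools average and max descriptors and gates the channels through a shared transform. Below is a minimal NumPy sketch in which a single weight matrix `w` stands in for CBAM's shared MLP; it illustrates the gating pattern only, not the exact module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_channel_attention(feat, w):
    """CBAM-style channel branch for a (C, H, W) map: average- and
    max-pooled descriptors pass through a shared weight `w` (stand-in
    for the shared MLP), are summed, and gate the channels."""
    avg = feat.mean(axis=(1, 2))          # (C,) average-pooled descriptor
    mx = feat.max(axis=(1, 2))            # (C,) max-pooled descriptor
    gate = sigmoid(w @ avg + w @ mx)      # (C,) channel gate in (0, 1)
    return feat * gate[:, None, None]
```

In the full CBAM, a spatial attention branch follows this channel gate; together they sharpen the feature perception that the abstract credits for the mAP and recall gains.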