This PDF file contains the front matter associated with SPIE Proceedings Volume 13399, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Large pre-trained models such as CLIP, Llama, and GPT have been trained on vast datasets and possess tens of millions to billions of parameters. Compared with traditional models, they mine the rich knowledge embedded in data far more deeply and demonstrate exceptional generality and generalization. For instance, the CLIP model was trained on 400 million image-text pairs, acquiring substantial multimodal abstract knowledge and robust representational abilities [1]. In specific downstream tasks, however, traditional models can suffer severely from complex task scenarios, limited computational resources, and poor data quality, among other issues. To address these challenges, we propose a method named CLIP-ZSWAI. It requires no additional training of the model; instead, it achieves zero-shot transfer of CLIP to specific tasks by optimizing the prompt information [2]. This approach leverages CLIP's multimodal knowledge to mitigate the performance deficiencies that smaller models show under data scarcity. CLIP-ZSWAI inherits the rich abstract knowledge learned by CLIP during training and adjusts classifier weights by optimizing text prompts, thereby achieving efficient transfer on downstream tasks. We validate the effectiveness of our method on the Animal90 and Animal19 datasets and on our own collected AnimalQH dataset, demonstrating its superiority over other methods.
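A minimal sketch of the standard CLIP zero-shot pipeline that CLIP-ZSWAI builds on, in which the classifier weights are simply the embeddings of class-specific text prompts, may help make the idea concrete; the checkpoint, class names, and prompt template below are illustrative assumptions, not the paper's optimized prompts.

    # Minimal sketch of CLIP zero-shot classification via text prompts.
    # The prompt template and class names are illustrative; CLIP-ZSWAI's
    # actual prompt-optimization strategy is described in the paper.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    classes = ["brown bear", "snow leopard", "Tibetan antelope"]  # hypothetical
    prompts = [f"a photo of a {c} in the wild" for c in classes]

    image = Image.open("query.jpg")  # hypothetical input image
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image: image-text similarity scores acting as classifier logits
    probs = outputs.logits_per_image.softmax(dim=-1)
    print(dict(zip(classes, probs[0].tolist())))

Swapping in a different prompt template changes the classifier weights without touching the model, which is what makes prompt optimization an inexpensive transfer mechanism.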
To respond rapidly and accurately to brown bear intrusions in the Qinghai region, we are developing an early warning system. To keep deployment costs low, front-end devices collect data and transmit it back to a server for processing. Because the computational resources of front-end devices are typically limited, this paper optimizes and improves the You Only Look Once (YOLO) series of algorithms: the traditional convolution is replaced with AKConv (Alterable Kernel Convolution) as the main structure, and the BiFormer (Bilateral Transformer) attention mechanism is introduced into the convolutional part of the Detect head to enhance target feature extraction. To further strengthen feature extraction, SAM (Segment Anything Model) is concatenated before YOLOv8 to assist it in feature extraction. Compared with the original YOLOv8, the improved model increases precision by 1.7 points.
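As a rough illustration of the cascade described above, the sketch below uses SAM-generated masks to suppress background before handing the frame to a stock YOLOv8 detector; the AKConv and BiFormer modifications are internal to the network and are not reproduced, and the checkpoint paths and masking rule are assumptions.

    # Hedged sketch: SAM-generated masks used to suppress background before
    # YOLOv8 detection. The paper's AKConv/BiFormer changes live inside the
    # network definition and are not reproduced here.
    import cv2
    import numpy as np
    from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
    from ultralytics import YOLO

    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # assumed path
    mask_gen = SamAutomaticMaskGenerator(sam)
    detector = YOLO("yolov8n.pt")

    img = cv2.imread("camera_frame.jpg")  # assumed front-end capture
    masks = mask_gen.generate(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

    # Keep pixels covered by any sufficiently large segment (one possible
    # way to "assist" feature extraction; the paper may fuse differently).
    keep = np.zeros(img.shape[:2], dtype=bool)
    for m in masks:
        if m["area"] > 500:
            keep |= m["segmentation"]
    masked = np.where(keep[..., None], img, 0).astype(img.dtype)

    results = detector(masked)
    results[0].show()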
This paper presents a new method for identifying individuals using two-dimensional joint-coordinate data of periodic motion estimated with OpenPose. The proposed approach includes a normalization technique for depth invariance and a lag-invariant dissimilarity function based on complex Fourier coefficients. To demonstrate the effectiveness of the proposed method, experiments were conducted using videos taken simultaneously at different depths.
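The abstract's lag-invariant dissimilarity built on complex Fourier coefficients admits a natural reading: a circular time shift changes only the phase of each coefficient, so a measure based on coefficient magnitudes is unaffected. The sketch below illustrates that idea on a single joint-coordinate sequence; the paper's exact normalization and dissimilarity function may differ.

    # Sketch: a time-lag-invariant dissimilarity between two periodic
    # joint-coordinate sequences, using Fourier coefficient magnitudes
    # (a circular shift only changes the phase of each coefficient).
    import numpy as np

    def normalize(seq):
        # crude scale normalization: zero-mean, unit-RMS (an assumption;
        # the paper defines its own depth-invariant normalization)
        seq = seq - seq.mean()
        return seq / np.sqrt((seq ** 2).mean())

    def lag_invariant_dissimilarity(a, b, n_coeffs=10):
        A = np.fft.rfft(normalize(a))[:n_coeffs]
        B = np.fft.rfft(normalize(b))[:n_coeffs]
        return np.linalg.norm(np.abs(A) - np.abs(B))

    t = np.linspace(0, 4 * np.pi, 200)
    walk1 = np.sin(t) + 0.3 * np.sin(2 * t)
    walk2 = np.roll(walk1, 37) * 2.0      # same gait, time-shifted, rescaled
    print(lag_invariant_dissimilarity(walk1, walk2))  # ~0: lag/scale invariant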
Finger vein recognition systems are increasingly used for personal identification. However, unimodal biometric systems suffer from several limitations, making them less reliable and effective than multimodal biometric systems. This study implements a multi-instance biometric system that uses multiple fingers as the input source to enhance the robustness of finger vein recognition. The work first employs several preprocessing techniques to improve image quality, including watershed segmentation, morphological operations, histogram equalization, and resizing. Local Binary Pattern (LBP) features are then extracted from each finger vein image. Two fusion methods, feature fusion and Local Feature Aggregation (LFA), are used to combine the results from the multiple finger sources. An experimental setup is implemented to evaluate the performance of both fusion methods. The results indicate that the proposed system with LFA achieves the higher performance, with the lowest Equal Error Rates (EER) of 0.22% and 0.25% on the UTFVP and SDUMLA-HMT datasets, respectively. This underscores the ability of LFA to enhance the robustness of finger vein recognition systems and contributes to future research.
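As a concrete reference point for the pipeline above, the sketch below extracts a uniform-LBP histogram per finger with scikit-image and fuses them by concatenation, one simple form of feature fusion; the LFA variant is not reproduced, and the file names are hypothetical.

    # Sketch: LBP feature extraction per finger and simple feature-level
    # fusion by histogram concatenation. The paper's Local Feature
    # Aggregation (LFA) variant is not reproduced here.
    import numpy as np
    from skimage import io
    from skimage.feature import local_binary_pattern

    def lbp_histogram(path, P=8, R=1):
        img = io.imread(path, as_gray=True)
        codes = local_binary_pattern(img, P, R, method="uniform")
        # uniform LBP with P neighbors yields P + 2 distinct codes
        hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
        return hist

    # Hypothetical file names for three fingers of one subject.
    fingers = ["index.png", "middle.png", "ring.png"]
    fused = np.concatenate([lbp_histogram(f) for f in fingers])  # feature fusion
    print(fused.shape)  # one fused template per subject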
Numerous studies have demonstrated significant correlations between segmented pathological objects in various medical imaging modalities and disease-related pathology. While previous investigations have employed handcrafted features for disease prediction, those approaches neglect the vast potential of leveraging latent features from deep learning (DL) models, which could potentially enhance the overall accuracy of differential diagnosis. Recently, the Segment Anything Model (SAM) has demonstrated remarkable zero-shot segmentation capabilities for natural images and garnered significant attention for its potential applications in medical image segmentation. However, to the best of our knowledge, no studies have explored leveraging the latent features extracted through SAM’s encoder for medical image classification. In this paper, we propose the novel SAMLF Diag method, which harnesses the latent features generated by MedSAM for benign and malignant breast cancer classification. Our proposed model leverages the encoded features from MedSAM’s Vision Transformer (ViT) to maximize the attribute-related information contained within the image features. By exploiting the powerful segmentation capabilities of SAM, our approach aims to extract and utilize the most informative and discriminative features for breast cancer classification. Experiments on a public ultrasound breast cancer dataset were conducted to validate the effectiveness of SAMLF Diag, demonstrating its ability to outperform baseline deep learning models for breast cancer classification. Our work highlights the potential of leveraging state-of-the-art foundation segmentation models for enhancing disease diagnosis through latent feature extraction and zero-shot learning.
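A minimal sketch of the core idea, using the SAM/MedSAM ViT image encoder as a frozen feature extractor with a small classification head on top, is shown below; the checkpoint path and head design are illustrative assumptions rather than the SAMLF Diag architecture.

    # Sketch: using the (Med)SAM ViT image encoder as a frozen feature
    # extractor and training a small classifier head on its latent features.
    # Checkpoint path and head design are illustrative assumptions.
    import torch
    import torch.nn as nn
    from segment_anything import sam_model_registry

    sam = sam_model_registry["vit_b"](checkpoint="medsam_vit_b.pth")  # assumed
    encoder = sam.image_encoder.eval()
    for p in encoder.parameters():
        p.requires_grad = False  # zero-shot feature extraction, no fine-tuning

    head = nn.Sequential(
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(256, 2),  # benign vs. malignant
    )

    x = torch.randn(1, 3, 1024, 1024)      # SAM expects 1024x1024 input
    with torch.no_grad():
        feats = encoder(x)                 # (1, 256, 64, 64) latent features
    logits = head(feats)
    print(logits.shape)                    # torch.Size([1, 2])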
Convolutional neural networks (CNNs) such as U-Net have shown success in image segmentation tasks. However, they demand substantial compute and storage, and the standard convolutional blocks often produce redundant feature maps, increasing cost and potentially hurting model efficiency and performance. To tackle these challenges, this study incorporates the SCConv module into the U-Net architecture. Two fusion approaches are introduced: SCU-Net, a lightweight version, and SCU-Net+, a more precise alternative, suitable for different scenarios. The SCConv module, which comprises a Spatial Reconstruction Unit (SRU) and a Channel Reconstruction Unit (CRU), aims to reduce computation and enhance feature representation. SCU-Net effectively reduces model size while maintaining or improving segmentation accuracy, which is critical when deploying deep learning models in resource-limited settings. SCU-Net+, on the other hand, minimizes redundant feature extraction to improve segmentation accuracy without increasing the model's size. Experimental results demonstrate that both models perform well across datasets. This approach provides insights into model compression and optimization, potentially leading to efficient network structures with lower resource demands in future studies. The code implementation for these models is publicly available on GitHub at https://github.com/zyxhhnkh/SCU-Net.git.
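To make the integration point concrete, the sketch below shows a U-Net encoder stage whose convolutional block is pluggable, with a stand-in stub where the real SCConv (SRU + CRU) module from the authors' repository would go; the stub is a placeholder, not the actual SCConv.

    # Schematic sketch of swapping U-Net's conv block for an SCConv block.
    # SCConvStub is a placeholder; the real SRU + CRU module is in the
    # authors' repository linked above.
    import torch
    import torch.nn as nn

    class SCConvStub(nn.Module):
        # Stand-in for SCConv (SRU + CRU); not the real module.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=1),
                nn.GroupNorm(4, out_ch),
                nn.ReLU(inplace=True),
            )
        def forward(self, x):
            return self.body(x)

    class EncoderStage(nn.Module):
        # One U-Net encoder stage with the conv block made pluggable.
        def __init__(self, in_ch, out_ch, block=SCConvStub):
            super().__init__()
            self.block = block(in_ch, out_ch)
            self.down = nn.MaxPool2d(2)
        def forward(self, x):
            skip = self.block(x)
            return self.down(skip), skip

    stage = EncoderStage(3, 32)
    y, skip = stage(torch.randn(1, 3, 128, 128))
    print(y.shape, skip.shape)  # (1,32,64,64) (1,32,128,128)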
To perform binocular ranging calculations with the linear array cameras in a tunnel ranging system, a binocular camera ranging device was first built and a black-and-white striped calibration ruler was designed. The binocular linear array cameras captured images of the calibration stripes, edges were extracted from the stripe images with the Canny edge detection algorithm, and subpixel edge positions were calculated. Finally, a BP neural network optimized by a genetic algorithm (GA-BP) was used to predict distance from the binocular linear array camera images. Comparing the GA-BP neural network with the traditional BP neural network, the average ranging errors of the two methods are 0.359 mm and 0.563 mm, respectively. The results indicate that the GA-BP neural network can improve the ranging accuracy of binocular linear array cameras.
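A compact sketch of the GA-BP idea follows: a genetic algorithm searches for good weights of a small BP (multilayer perceptron) regressor, which gradient-based backpropagation would then fine-tune. The data, network size, and GA hyperparameters below are synthetic stand-ins, not the paper's setup.

    # Sketch of a GA-BP scheme: a genetic algorithm searches for good
    # initial weights of a small BP (MLP) regressor. Data is synthetic;
    # the paper trains on subpixel edge features from the calibration ruler.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, (200, 4))                  # stand-in edge features
    y = X @ np.array([0.5, -1.2, 0.8, 0.3]) + 0.1     # stand-in distances

    H = 8  # hidden units; one genome packs both weight layers
    def unpack(g):
        return g[: 4 * H].reshape(4, H), g[4 * H :].reshape(H, 1)

    def mse(g):
        W1, W2 = unpack(g)
        pred = np.tanh(X @ W1) @ W2
        return np.mean((pred.ravel() - y) ** 2)

    pop = rng.normal(0, 1, (40, 4 * H + H))
    for gen in range(100):
        fit = np.array([mse(g) for g in pop])
        elite = pop[np.argsort(fit)[:10]]              # selection
        parents = elite[rng.integers(0, 10, (40, 2))]
        mask = rng.random(pop.shape) < 0.5             # uniform crossover
        pop = np.where(mask, parents[:, 0], parents[:, 1])
        pop += rng.normal(0, 0.05, pop.shape)          # mutation
        pop[0] = elite[0]                              # elitism

    print("best initial-weight MSE:", mse(pop[0]))
    # Backpropagation fine-tuning would start from pop[0]'s weights.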
This paper discusses innovative applications of artificial intelligence in computer vision, focusing on deep learning for the recognition of unsafe behavior. In recent years, with the development of deep learning, behavior recognition methods have made great progress in this area. Unsafe behavior recognition based on deep learning is a complex and important research field: it relies on deep learning techniques, specifically multi-layer neural networks and related algorithms, to identify and predict unsafe human behavior. The author systematically summarizes relevant research in China and abroad from recent years and elaborates the common models and research methods for behavior recognition in detail. Drawing on existing research, the advantages, limitations, and application scenarios of the various methods are comprehensively analyzed and compared, and the difficulties in applying deep learning to behavior recognition in recent years are summarized. On this basis, future research directions for behavior recognition are outlined, providing a reference for work in this field.
Reconstructing soft tissues in endoscopic videos requires advanced 3D reconstruction technology, which plays a key role in robot-assisted minimally invasive surgery. In this paper, we focus on reconstructing dynamic soft tissues as realistically as possible with the surgical instruments removed. Due to the limited viewing angle and complex soft tissue deformation, current methods suffer from reconstruction blur and long training times. Inspired by the powerful rendering ability of Gaussian Splatting, we propose a dynamic endoscopic scene reconstruction method that yields better rendering quality at real-time rendering speed. We first use depth information to project the images into a point cloud, giving the 3D Gaussian points initial positions and colors. The properties of the Gaussian points at a given time stamp are predicted by an MLP deformation module. Our network then renders the 3D Gaussian points into 2D images using a tile-based rasterizer. We design reconstruction losses and a color variation loss to optimize the rendering results. Extensive experiments show that our network generates high-fidelity reconstructed endoscopic videos at real-time rendering speed.
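To illustrate the deformation module, the sketch below shows a time-conditioned MLP that maps canonical Gaussian centers and a timestamp to position offsets; the frequency encoding, network width, and restriction to position offsets (rotation and scale would be handled analogously) are illustrative assumptions.

    # Sketch of a time-conditioned MLP deformation module for dynamic
    # Gaussian Splatting: given canonical Gaussian centers and a timestamp,
    # predict position offsets. Width and encoding are illustrative choices.
    import torch
    import torch.nn as nn

    def freq_encode(x, n_freqs=6):
        # standard sinusoidal encoding of positions/time
        feats = [x]
        for k in range(n_freqs):
            feats += [torch.sin((2 ** k) * x), torch.cos((2 ** k) * x)]
        return torch.cat(feats, dim=-1)

    class DeformMLP(nn.Module):
        def __init__(self, n_freqs=6, width=128):
            super().__init__()
            in_dim = 4 * (1 + 2 * n_freqs)   # (x, y, z, t) encoded
            self.net = nn.Sequential(
                nn.Linear(in_dim, width), nn.ReLU(),
                nn.Linear(width, width), nn.ReLU(),
                nn.Linear(width, 3),          # delta position
            )
        def forward(self, xyz, t):
            t = t.expand(xyz.shape[0], 1)
            return self.net(freq_encode(torch.cat([xyz, t], dim=-1)))

    xyz = torch.randn(10000, 3)               # canonical Gaussian centers
    delta = DeformMLP()(xyz, torch.tensor([[0.3]]))
    deformed = xyz + delta                     # positions at this timestamp
    print(deformed.shape)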
The digital twin is the only way to digitize the power grid. This paper analyzes the development and application of digital twins in the power industry. To address the problems of traditional power distribution rooms, such as their scattered locations and the inconvenience of operation and management, a three-dimensional modeling method for power distribution rooms based on the CAD drawings of the power distribution design is proposed. The total building load is calculated from the CAD drawings, and the various types of distribution cabinets are then reasonably allocated by an automatic planning algorithm. The 3D model of the distribution room is constructed with Blender 3D modeling to quickly and accurately reproduce the scene, the equipment, and its internal structure as they appear in the real distribution room, realizing visualization of the distribution room.
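As a hint of how the Blender step might look, the sketch below places cabinet-sized boxes at positions a planner could produce, using Blender's Python API; the dimensions and positions are illustrative, and in the described pipeline they would come from the CAD drawings and the automatic planning algorithm.

    # Sketch (Blender Python API): placing distribution-cabinet boxes at
    # positions produced by a planning step. Dimensions and positions are
    # illustrative assumptions. Run inside Blender.
    import bpy

    # (x, y) floor positions in meters, e.g. output of the planner
    cabinet_positions = [(0.0, 0.0), (1.2, 0.0), (2.4, 0.0)]
    cabinet_size = (0.8, 0.6, 2.0)  # width, depth, height (assumed)

    for i, (x, y) in enumerate(cabinet_positions):
        # lift each cube by half its height so it rests on the floor
        bpy.ops.mesh.primitive_cube_add(size=1.0,
                                        location=(x, y, cabinet_size[2] / 2))
        cab = bpy.context.active_object
        cab.name = f"DistributionCabinet_{i}"
        cab.scale = cabinet_size  # unit cube scaled to cabinet dimensions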