This PDF file contains the front matter associated with SPIE Proceedings Volume 13138, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks. You are receiving this notice because your organization may not have SPIE eBooks access.* *Shibboleth/OpenAthens users: please sign in to access your institution's subscriptions. To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
We use stimulated Raman scattering microscopy with deuterium oxide probing (DO-SRS) to acquire hyperspectral images of mouse hippocampal tissue. DO-SRS identifies carbon-deuterium (C-D) bonds that are indicative of biomolecule synthesis, providing metabolic information about the tissue. Through k-means clustering of the Raman C-H stretching and C-D bands, we can distinguish the mouse hippocampal regions based on the vibrational modes of deuterated proteins and lipids.
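The region-separation step can be sketched with a minimal k-means implementation (numpy only; the two-band intensity features and cluster count below are invented for illustration, not taken from the paper's data):

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Minimal Lloyd's k-means. X is (n_pixels, n_bands)."""
    # deterministic farthest-point initialization
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # assign each spectrum to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its cluster
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# synthetic pixels described by two band intensities (C-D, C-H)
rng = np.random.default_rng(1)
cd_rich = rng.normal([0.8, 0.2], 0.05, size=(100, 2))  # deuterated region
ch_rich = rng.normal([0.2, 0.9], 0.05, size=(100, 2))  # C-H-dominated region
X = np.vstack([cd_rich, ch_rich])
labels, _ = kmeans(X, k=2)
# pixels from the same synthetic region share one cluster label
```

In practice each pixel would carry the full hyperspectral vector rather than two summary intensities; the clustering step is otherwise the same.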
Explainable machine learning (XML) approaches are crucial for medical information processing, particularly for multi-omics data analytics: an explainable system not only performs well but also makes its findings more interpretable. Here, we propose an end-to-end explainable system for analyzing high-dimensional RNA-seq data using an unsupervised gene selection approach together with supervised methods, including a Deep Neural Network (DNN), a Deep Convolutional Neural Network (DCNN), a Support Vector Machine (SVM), and a Random Forest (RF). The proposed approaches are evaluated on publicly available datasets for classifying five different cancers and Kawasaki disease (KD). The deep learning-based approaches yield 99.62% and 99.25% average testing accuracy on the cancer and KD classification tasks, respectively. Additionally, we introduce an explainable component that selects cancer- and disease-specific gene sets, which could be used in further analyses to uncover the underlying biology of these cancers and KD.
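The abstract does not specify which unsupervised gene selection method is used; a common baseline of this kind is variance filtering, sketched here on a toy expression matrix (all data invented for illustration):

```python
import numpy as np

def select_genes_by_variance(expr, k):
    """Unsupervised gene selection: keep the k genes with the highest
    variance across samples. expr has shape (n_samples, n_genes)."""
    variances = expr.var(axis=0)
    top = np.argsort(variances)[::-1][:k]
    return np.sort(top)  # selected gene indices, in original order

# toy expression matrix: genes 0 and 3 vary strongly, the rest are flat
rng = np.random.default_rng(0)
expr = rng.normal(0.0, 0.01, size=(50, 5))
expr[:, 0] += rng.normal(0.0, 2.0, size=50)
expr[:, 3] += rng.normal(0.0, 2.0, size=50)
selected = select_genes_by_variance(expr, k=2)
# → the two high-variance genes are selected
```

The selected submatrix would then feed the supervised classifiers (DNN, DCNN, SVM, RF) listed above.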
Lung segmentation supports essential functionality in computer-aided detection and diagnosis using chest radiographs. In this research, we present a novel bifurcation approach that segments the lungs by separating them along the spinal column and training separate networks for the right and left lungs. Results from the right-lung and left-lung networks are then merged to form the overall lung mask. We use the DeepLabV3+ network with a ResNet50 backbone for both networks. Results are presented for publicly available datasets, the Shenzhen dataset and the Japanese Society of Radiological Technology (JSRT) dataset. Our bifurcation approach achieved an overall accuracy of 98.8% and an IoU (Intersection over Union) of 0.977 on a set of 100 cases from the Shenzhen dataset. We conducted an additional robustness study by training and testing on independent datasets using a hold-out methodology: training on a private dataset and testing on an independent JSRT dataset of 140 cases, our algorithm achieved an overall IoU of 0.945, demonstrating its efficacy against whole-lung models and setting a benchmark for future work.
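The merge-and-score step is simple to state precisely; here is a minimal sketch of combining left/right network outputs and computing IoU (toy 8×8 masks, not the paper's data):

```python
import numpy as np

def iou(pred, gt):
    """Intersection over Union for boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

# toy 8x8 "radiograph": outputs of the left-lung and right-lung networks
left = np.zeros((8, 8), bool);  left[2:6, 1:3] = True
right = np.zeros((8, 8), bool); right[2:6, 5:7] = True
merged = np.logical_or(left, right)          # overall lung mask

gt = np.zeros((8, 8), bool)
gt[2:6, 1:3] = True
gt[2:6, 5:7] = True
score = iou(merged, gt)
```

Real predictions would of course disagree with ground truth in places; the metric computation is unchanged.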
Keratoconus is a chronic degenerative disease that results in progressive corneal thinning and steepening, leading to irregular astigmatism and decreased visual acuity; in severe cases it may cause debilitating visual impairment. In recent years, machine learning methods, especially Convolutional Neural Networks (CNNs), have been applied to classify corneal maps according to the presence or absence of the disease. This study develops a novel CNN architecture to classify axial curvature maps of the anterior corneal surface into five grades of disease (i: normal eye; ii: suspect eye; iii: subclinical keratoconus; iv: keratoconus; and v: severe keratoconus). The dataset comprises 3,832 axial curvature maps represented on a relative scale and labeled by ophthalmologists. The images were split into three distinct subsets: training (2,297 images ≈ 60%), validation (771 images ≈ 20%), and test (764 images ≈ 20%). The model achieved an overall accuracy of 78.53%, a macro-average sensitivity of 74.53% (87.50% for normal eyes, 46.56% for suspect eyes, 65.41% for subclinical keratoconus, 93.42% for keratoconus, and 79.25% for severe keratoconus) and a macro-average specificity of 94.42% (92.14% for normal eyes, 95.30% for suspect eyes, 93.82% for subclinical keratoconus, 91.24% for keratoconus, and 99.58% for severe keratoconus). Additionally, the model achieved AUC scores of 0.97, 0.92, 0.90, 0.98, and 0.94 for normal eye, suspect eye, subclinical keratoconus, keratoconus, and severe keratoconus, respectively. These results suggest that the CNN distinguishes well between normal eyes and the various stages of keratoconus, offering potential for enhanced diagnostic accuracy in ocular health assessment.
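Macro-averaged sensitivity and specificity, as reported above, are per-class one-vs-rest rates averaged over the classes; a small sketch with an invented 3-class confusion matrix:

```python
import numpy as np

def macro_sensitivity_specificity(cm):
    """cm[i, j] = number of class-i samples predicted as class j."""
    n = cm.shape[0]
    total = cm.sum()
    sens, spec = [], []
    for c in range(n):
        tp = cm[c, c]
        fn = cm[c].sum() - tp          # class-c samples missed
        fp = cm[:, c].sum() - tp       # other classes predicted as c
        tn = total - tp - fn - fp
        sens.append(tp / (tp + fn))    # one-vs-rest sensitivity
        spec.append(tn / (tn + fp))    # one-vs-rest specificity
    return float(np.mean(sens)), float(np.mean(spec))

# toy 3-class confusion matrix (rows = true class, cols = predicted)
cm = np.array([[8, 1, 1],
               [2, 6, 2],
               [0, 0, 10]])
sens, spec = macro_sensitivity_specificity(cm)  # → 0.8, 0.9
```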
Worldwide, a considerable share of female cancer cases is attributed to breast cancer, making it a prevalent and serious problem. As diagnoses surge, the traditional approach of manual histological assessment is becoming increasingly inefficient, so researchers are turning to automated alternatives that expedite diagnosis and reduce the need for specialised expertise. Polarization-sensitive optical coherence tomography (PS-OCT) emerges as a promising tool, offering a rapid alternative to traditional histology. It stands out by exploiting the polarization of reflected light to boost image contrast: by evaluating polarized backscattered light, a PS-OCT system can detect birefringence in cancerous tissue, indicative of the collagen changes associated with cancer. The main focus of this study is the development of an automated full-field PS-OCT (FF-PS-OCT) system for the diagnosis of breast cancer. The system recorded 220 sample images from which phase information was extracted, and birefringence and degree-of-polarization-uniformity maps were calculated from the recorded phase images. Features extracted from these maps were used to train an ensemble model, validated by the technique for order preference by similarity to ideal solution (TOPSIS), to distinguish between normal and malignant breast tissue. The multi-layer ensemble model demonstrates strong recall and precision on the testing dataset: 92.3% precision, 90% recall, 91.1% F-score, and a 79.7% Matthews correlation coefficient. These preliminary results underscore the potential of FF-PS-OCT as a rapid, non-contact, label-free imaging tool whose insights could support medical professionals in making informed decisions during interventions.
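The reported metrics can all be computed from the binary confusion counts; a sketch with invented counts (not the study's data):

```python
import math

def precision_recall_f1(tp, fp, fn):
    """Standard binary precision, recall, and F-score."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for binary classification."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# toy counts for a malignant-vs-normal test set
p, r, f1 = precision_recall_f1(tp=45, fp=5, fn=5)   # → 0.9, 0.9, 0.9
m = mcc(tp=45, tn=45, fp=5, fn=5)                   # → 0.8
```

MCC is reported alongside F-score because it stays informative on imbalanced test sets, where precision and recall alone can be misleading.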
The Arctic is currently experiencing unprecedented changes, including rapid reductions in sea ice that must be characterized quickly to ensure safe Arctic navigability. While the need for ice classification is well-suited for satellite synthetic aperture radar (SAR) and machine learning (ML) solutions, there is a lack of labeled datasets at the spatial and temporal resolutions needed for pairing with fine-resolution SAR imagery. Previously, we developed an approach to obtain fine resolution ice labels by exploiting polarimetric relationships in single look complex-format (SLC) Sentinel-1 (S1) SAR imagery. The probabilistic nature of this novel approach allows for uncertainty measurements (soft labels) in addition to binary water versus ice labels (hard labels). To determine the effectiveness of these labels, we use them to train ML models with S1 GRD intensity products as input. We consider Naïve Bayes, Random Forest, and XGBoost classifiers, examining the trade-off in model complexity versus recall of ice when trained on hard labels versus soft labels. In addition to assessing if training on soft labels prevents overfitting, we also test the impact of probability calibration on output label probabilities. We use S1 acquisitions that overlap with the AI4Arctic data set for training and testing. We find that training on soft probabilities is beneficial as model complexity increases, emphasizing the value added by our probabilistic approach to sea ice classification. We additionally find that in the absence of soft training labels, probability calibration is important for obtaining representative label probabilities. Moving forward, we will extend this assessment to deep learning models where such effects may be even more substantial.
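Probability calibration is typically assessed with a reliability (calibration) curve: bin the predicted ice probabilities and compare each bin's mean prediction to the observed ice fraction. A minimal sketch with invented, perfectly calibrated toy labels:

```python
import numpy as np

def reliability(probs, labels, n_bins=5):
    """Bin predicted probabilities and compare each bin's mean
    prediction to the empirical positive rate (calibration curve)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # last bin is closed on the right so prob == 1.0 is included
        m = (probs >= lo) & ((probs < hi) if hi < 1.0 else (probs <= hi))
        if m.any():
            rows.append((probs[m].mean(), labels[m].mean(), int(m.sum())))
    return rows  # (mean predicted, observed frequency, count) per bin

# toy example where predictions match the true ice rate exactly
probs = np.array([0.1] * 10 + [0.9] * 10)
labels = np.array([1] + [0] * 9 + [1] * 9 + [0])
rows = reliability(probs, labels, n_bins=5)
# only two bins are populated; each has mean prediction == observed rate
```

A well-calibrated classifier traces the diagonal of this curve; the gap from the diagonal is what calibration methods aim to close.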
Automatic identification of wildfires has attracted great interest over the past decade. Early detection of fire can help minimize disasters and assist decision makers in planning mitigation. In this paper, we annotate and utilize a drone imagery dataset with each pixel marked as (a) Burning, (b) Burned, or (c) Unburnt. The dataset comprises 22 videos (138,390 frames), of which a subset of 481 frames (~20 frames from each video) is marked for segmentation. In addition, every frame is categorized as either “Smoke” or “No-Smoke”. We implement the DeepLab-v3+ architecture to accurately segment affected regions as “Burned”, “Burning”, and “Unburnt”, and adopt a transfer learning-based architecture using an established Xception network to detect smoke within each frame, identifying regions that can affect the performance of the proposed segmentation approach. Our segmentation algorithm achieves a mean accuracy of 97% and a mean Jaccard index of 0.93 on three test videos comprising 24,666 frames across all categories, and our classification algorithm achieves 92% accuracy in identifying smoke in those test frames.
This conference presentation was prepared for Optical Engineering + Applications, 2024.
The overfitting of deep learning (DL) models is one of the problems in artificial intelligence (AI) systems: it occurs when a DL model learns not only the patterns inherent to the data but also the noise characteristics and random fluctuations within it. This learning behavior, and the uncertainty caused by noisy observations, can negatively affect the decisions and the explainers of an AI system. The decision-makers are hidden in the semantic meanings of the feature maps; hence, locating the feature maps that support or oppose an AI's decisions under uncertainty is challenging. Bayesian search theory (BST), widely used for target tracking under noisy conditions, offers a solution to this research problem. This paper studies and proposes a BST-based approach that assumes prior knowledge of the feature maps of the noise-free observations and generates posterior probabilities to find correlated feature maps of the noisy observations. The posterior probabilities, built on a Gaussian likelihood, are used to extract the semantic meanings that support or oppose the decisions hidden in the feature maps. This provides post hoc explainers while an AI system makes its predictions from the feature maps of the final convolutional layer. In our simulations, we used a pretrained VGG16 model whose final convolutional layer consists of 512 channels (feature maps), each providing 196 (i.e., 14×14) semantic meanings. We used two bird images (Blue Jay and Indigo Bunting) and added varying uncertainty using Gaussian noise with distinct noise factors (0, 4, …, 20). Simulations show that we can precisely locate the feature maps and their semantic meanings that support or oppose the decision (i.e., the prediction) of an AI system under uncertainty.
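The core update can be sketched as a uniform prior over channels combined with a Gaussian likelihood of the noisy feature maps given their noise-free references (shapes and data invented; this is an illustration of the idea, not the authors' code):

```python
import numpy as np

def posterior_over_feature_maps(noisy, clean_ref, sigma=1.0):
    """Score each channel by how well its noisy feature map matches the
    noise-free reference under a Gaussian likelihood, then normalize
    into a posterior (uniform prior over channels assumed)."""
    # squared error per channel, summed over spatial positions
    se = ((noisy - clean_ref) ** 2).reshape(len(noisy), -1).sum(axis=1)
    log_like = -se / (2.0 * sigma ** 2)
    log_like -= log_like.max()          # numerical stability
    post = np.exp(log_like)
    return post / post.sum()

rng = np.random.default_rng(0)
clean = rng.normal(size=(4, 14, 14))        # 4 reference feature maps
noisy = clean.copy()
noisy[2] += rng.normal(0, 3.0, size=(14, 14))   # heavily perturb channel 2
post = posterior_over_feature_maps(noisy, clean, sigma=1.0)
# channel 2, least correlated with its reference, gets the least mass
```

In the paper's setting there would be 512 channels of 14×14 maps rather than the 4 toy channels used here.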
Robustness to image quality degradations is critical for developing Convolutional Neural Networks (CNNs) for real-world image classification. This paper advances previous analysis of how optical aberrations and optical scatter degrade classification performance by exploring how they cause classification errors to manifest within CNN layers.
The growth of artificial intelligence has led to the widespread use of convolutional neural networks (CNNs) for computer vision applications, traditionally for binary and categorical classification tasks. However, there remains untapped potential for advancing computer vision through deep learning in regression tasks. Design engineers across many disciplines use computer-aided design software to model their designs, and these computer-integrated designs often require machinery for construction or fabrication. For many engineering designs, precision and tolerancing are essential for proper function and performance. The engineering process typically involves manual testing and parameter measurements to ensure the design functions properly before it is marketed. Training a neural network to automate these tests and provide accurate numeric estimates of system parameters without manual intervention can significantly increase efficiency and decrease time to market for many products. This shift from manual to automated testing allows for a heightened focus on innovation and project development while minimizing the time and resources dedicated to validation. This article outlines the implementation of CNN models designed to enhance the efficiency of manually validating engineered projects. Our approach uses computer-aided design simulation image captures as training data for our pipeline. We integrate a real-time color-filtering and fiducial rotation scaling normalization process on each fabricated design image; through these pre-processing methods, our algorithm can perceive such images consistently with the simulation images used for model training.
Our current model, trained on only 1,020 simulation images, achieves a 1.99% average training prediction error on this dataset, down from a 10.51% average error in our initial implementation and 3.63% in our second. On our test set, consisting of six captured and preprocessed images of fabricated designs, the model achieves a 3.40% average prediction error. The performance of a regression neural network of this nature depends largely on the amount and range of data and simulation scenarios considered; as we continue to expand our training dataset through an optimized pipeline, we anticipate a significant improvement in model performance.
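The error figures above are percentage errors on predicted design parameters; for concreteness, a mean-absolute-percentage-error helper with invented parameter values:

```python
import numpy as np

def mean_percent_error(pred, true):
    """Mean absolute percentage error between predicted and
    measured design parameters, in percent."""
    return float(np.mean(np.abs((pred - true) / true)) * 100.0)

# hypothetical parameter values (e.g., dimensions in mm)
true = np.array([10.0, 20.0, 40.0])
pred = np.array([10.1, 20.2, 40.4])
err = mean_percent_error(pred, true)   # → 1.0 (percent)
```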
This numerical study uses machine learning techniques to enhance the resolution of local near-field probing measurements when the probe is larger than the examined device. The research shows that machine learning can achieve a spatial resolution of λ/10 with a few wavelength-wide probes while keeping the relative error below 3%. It also finds that fully connected neural networks outperform linear regression with limited training data, but linear regression is both sufficient and efficient for larger data sets. These results suggest that similar machine learning methods can improve the resolution of various experimental measurements.
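A toy version of this inverse problem shows why linear regression can suffice when the fine-scale field has structure the model can exploit: here, profiles confined to a smooth 3-mode basis are recovered exactly from coarse averaging "probe" measurements (all data synthetic; a deliberately simplified stand-in for the near-field setup, not the study's simulation):

```python
import numpy as np

rng = np.random.default_rng(0)
n_fine, n_coarse, n_train = 20, 10, 500

# coarse "probe": each reading averages two adjacent fine samples,
# i.e., the probe is wider than the features of interest
A = np.zeros((n_coarse, n_fine))
for i in range(n_coarse):
    A[i, 2 * i: 2 * i + 2] = 0.5

# fine profiles live on a smooth 3-mode basis -- the prior structure
# that makes sub-probe resolution recoverable
t = np.linspace(0.0, 1.0, n_fine)
B = np.stack([np.ones_like(t), np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
X_fine = rng.normal(size=(n_train, 3)) @ B      # training profiles
Y_coarse = X_fine @ A.T                          # probe measurements

# linear regression from coarse readings back to the fine profile
W, *_ = np.linalg.lstsq(Y_coarse, X_fine, rcond=None)

# held-out evaluation on new profiles from the same smooth family
X_test = rng.normal(size=(50, 3)) @ B
rel_err = np.abs((X_test @ A.T) @ W - X_test).mean() / np.abs(X_test).mean()
# rel_err is tiny: the regression inverts the averaging on this subspace
```

With noisy measurements or a richer family of profiles the recovery is no longer exact, which is where the study's comparison between linear regression and fully connected networks becomes relevant.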
We characterized manufacturing-induced defects in 316L stainless steels fabricated by direct metal laser sintering (DMLS) and investigated their roles in the fatigue behavior of steel parts. The primary defects targeted are porosities, inner cracks, and edge cracks. We used Convolutional Neural Networks (CNNs) to detect and classify these defects, moving toward a machine-vision-based metrology technique for metal additive manufacturing (AM). The Moore cyclic loading method was applied to characterize the fatigue behavior of the 316L samples. The results indicate a strong correlation between additive manufacturing quality, defect levels, and the fatigue properties of the steel samples: samples with lower defect levels exhibited significantly higher load endurance and longer life cycles. To further explore the influence of defects on mechanical behavior, we applied image processing techniques to measure the density, size, morphology, and location of defects in the steels. Quantifying AM defect features paves the way for a deeper understanding of microstructure-to-macro-behavior relations and for enhanced fatigue prediction models in additively manufactured steels.
Industry, New Methods, and Science Applications II
I will discuss the integration of programmable diffraction with digital neural networks. Diffractive optical networks are designed by deep learning to all-optically implement various complex functions as the input light diffracts through spatially engineered surfaces. These diffractive processors integrated with digital neural networks have various applications, e.g., image analysis, feature detection, object classification, computational imaging and seeing through diffusers, also enabling task-specific camera designs and new optical components for spatial, spectral and temporal beam shaping and spatially-controlled wavelength division multiplexing. These deep learning-designed diffractive systems can broadly impact (1) optical statistical inference engines, (2) computational camera and microscope designs and (3) inverse design of optical systems that are task-specific. In this talk, I will give examples of each group, enabling transformative capabilities for various applications of interest in e.g., autonomous systems, defense/security, telecommunications as well as biomedical imaging and sensing.
Defects and damage due to aging, along with transient events, are important contributors to pipeline accidents, and monitoring them together is challenging. In this work, we demonstrate an intelligent fiber-optic acoustic sensor system for pipeline monitoring that enables real-time recognition and classification of defects and transient threats by analyzing combined acoustic NDE data from ultrasonic guided-wave and acoustic emission methods. A 6" carbon-steel pipeline (16 ft long, SCH40) with multiple structural defects (weld and corrosion) is instrumented with multiplexed optical fiber sensors serving as acoustic receivers: ultrasonic guided-wave monitoring identifies the structural defects, while the spontaneous acoustic emission method detects transient events (intrusion and impact). Finally, we discuss our strategy for applying a convolutional neural network (CNN) model to the acoustic NDE data obtained by the two methods to realize an accurate, automated pipeline health monitoring solution.
In the domain of printed circuit board (PCB) X-ray inspection, the effectiveness of deep learning models greatly depends on the availability and quality of annotated data. Data augmentation, particularly with synthetic data, has emerged as a promising strategy to improve model performance and alleviate the burden of manual annotation. However, a significant question remains unanswered: what is the optimal amount of synthetic data required to effectively augment the dataset and enhance model performance? This study introduces the Synthetic Data Tuner, a comprehensive framework developed to address this question and optimize the performance of deep learning models for PCB X-ray inspection tasks. By combining cutting-edge deep learning architectures with advanced data augmentation techniques, such as generative adversarial networks (GANs) and variational autoencoders (VAEs), the Synthetic Data Tuner systematically assesses the impact of different levels of synthetic data integration on model accuracy, robustness, and generalization. Through extensive experimentation and rigorous evaluation, our results illustrate the intricate relationship between the quantity of synthetic data and model performance. We elucidate a phenomenon of diminishing returns, where model performance saturates beyond a specific threshold of synthetic data augmentation. Moreover, we determine the optimal balance between synthetic and real data, maximizing performance improvements while mitigating the risk of overfitting. Our findings also emphasize the significance of data diversity and quality in synthetic data generation, highlighting the importance of domain-specific knowledge and context-aware augmentation techniques.
By providing insights into the complex interplay between synthetic data augmentation and deep learning model performance, the Synthetic Data Tuner not only advances the current state-of-the-art in PCB X-ray inspection but also offers valuable insights and methodologies applicable to various computer vision and industrial inspection domains.
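The sweep protocol behind a study like this can be sketched with a toy stand-in: a nearest-centroid classifier on synthetic 2-D features, trained on a few "real" samples plus increasing amounts of slightly biased "generator" output, scored on a held-out real test set. The fractions, the generator bias, and all numbers below are invented for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, shift=0.0, spread=1.0):
    """Two-class 2-D data; shift/spread model an imperfect generator."""
    X0 = rng.normal([0.0 + shift, 0.0 + shift], spread, size=(n, 2))
    X1 = rng.normal([3.0 + shift, 3.0 + shift], spread, size=(n, 2))
    return np.vstack([X0, X1]), np.repeat([0, 1], n)

def nearest_centroid_acc(Xtr, ytr, Xte, yte):
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    pred = (np.linalg.norm(Xte - c1, axis=1)
            < np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return float((pred == yte).mean())

X_real, y_real = sample(10)              # scarce annotated "real" data
X_test, y_test = sample(500)             # held-out real test set

accs = []
for n_syn in (0, 10, 40, 160):           # synthetic samples per class
    if n_syn:
        Xs, ys = sample(n_syn, shift=0.3, spread=1.2)   # biased generator
        Xtr, ytr = np.vstack([X_real, Xs]), np.concatenate([y_real, ys])
    else:
        Xtr, ytr = X_real, y_real
    accs.append(nearest_centroid_acc(Xtr, ytr, X_test, y_test))
# accs traces test accuracy as the synthetic fraction grows
```

The real study runs this loop with deep inspection models and GAN/VAE-generated X-ray images; the toy only illustrates the experimental design, not its conclusions.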
This study aims to investigate the potential of enhancing the resilience of computer vision systems in the context of intelligent Printed Circuit Board (PCB) inspection through the integration of techniques that filter out adversarial examples. PCBs, which are crucial components of electronic devices, require reliable inspection methods. However, current computer vision models are vulnerable to adversarial attacks that can compromise their accuracy. Our research introduces an evolving approach that combines advanced deep learning architectures with adversarial training methods. The initial steps involve training a robust PCB inspection model using a diverse dataset and generating adversarial examples through carefully designed perturbations. Subsequently, the model is exposed to these adversarial examples during a dedicated training phase, enabling it to adapt to variations introduced by potential adversaries. To counter the impact of adversarial examples on classification decisions during real-time inspections, a filtration mechanism is implemented to identify and discard them. Preliminary experimentation and ongoing evaluations demonstrate promising progress in enhancing the resilience of PCB inspection models against adversarial attacks. Although the filtration mechanism is still in its early stages, it shows potential in identifying and neutralizing potential threats, contributing to efforts aimed at strengthening the reliability and trustworthiness of inspection outcomes. Moreover, the adaptability of the proposed methodology to various PCB designs, including different components, orientations, and lighting conditions, indicates the potential for transformative advancements in computer vision systems in critical domains. 
This research underscores the need for continued investigation into the evolving landscape of adversarial example filtration, presenting a potential avenue for fortifying intelligent inspection systems against adversarial threats in PCB inspection and beyond.
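One standard way to generate the "carefully designed perturbations" referred to above is the Fast Gradient Sign Method (FGSM). A minimal sketch against a toy logistic classifier standing in for the inspection model, where the input gradient is available in closed form (weights and inputs invented):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """FGSM attack on a logistic classifier. The gradient of the
    cross-entropy loss w.r.t. the input x is (sigmoid(w.x + b) - y) * w."""
    grad = (sigmoid(w @ x + b) - y) * w
    return x + eps * np.sign(grad)

# toy classifier separating along the first feature
w = np.array([2.0, 0.0])
b = 0.0
x = np.array([0.5, 0.0])
y = 1.0                                # true label: positive class

p_clean = sigmoid(w @ x + b)           # correctly classified (p > 0.5)
x_adv = fgsm(x, y, w, b, eps=0.6)
p_adv = sigmoid(w @ x_adv + b)         # perturbation flips the decision
```

For a CNN the same perturbation is computed by backpropagating the loss to the input; adversarial training then mixes such examples into the training batches, as described above.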
Industry, New Methods, and Science Applications III
Segmentation of printed circuit board (PCB) components from X-ray images holds paramount significance, as it constitutes a crucial step in design extraction and reverse engineering. Conventional pretrained deep learning segmentation models demand considerable resources, produce less-than-optimal outcomes, and often overfit due to the scarcity of labeled PCB X-ray data. The Segment Anything Model (SAM), known for its versatility in semantic segmentation tasks, effectively segments a wide array of objects in natural images. Nonetheless, it struggles with the complex designs of PCB X-ray images, making it difficult to accurately segment the components on a circuit board. Adapting this foundation model to the unique challenges posed by PCB X-ray images, such as intricate component structures and variations in X-ray artifacts, requires careful modification and optimization. In this study, we propose a customized approach for segmenting components from X-ray images of PCBs that uses a modified SAM model with parameter-efficient fine-tuning and few-shot generalization strategies. We introduce modifications that enhance the model's ability to capture intricate spatial relationships and segment individual components. Our methodology focuses on efficiently adapting the foundation model to the unique characteristics of PCB X-ray images, including complex component structures and varying noise conditions. Leveraging few-shot learning techniques, we address the challenge of limited annotated data in the PCB X-ray domain, aiming to enable the model to generalize effectively with minimal fine-tuning. Our work paves the way for implementing deep learning on limited datasets by leveraging the capabilities of a foundation model.
Identifying long term user interests, i.e., “evergreen missions”, in retail and e-commerce is a challenging yet important problem. In this work, we propose a machine learning system that is able to identify a user’s long term arbitrary interests by leveraging their site interaction history. Our contribution is a system composed of three components: (1) projecting our listing inventory to an embedding space with a combination of supervised/unsupervised modeling, (2) inferring personalized interests from the embedding space for a user base with attributed interactions, and (3) estimating the repeat interaction rate with inventory through a rigorous statistical approach. Additionally, we provide novel insights by leveraging the supervised neural network model to produce a clustering approach for interest discovery. The approach has been implemented, validated, and rigorously A/B tested, and is currently in production at Etsy, Inc., powering several of its modules.
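The abstract does not state which estimator underlies the "rigorous statistical approach" for repeat interaction rates; one standard choice for a rate from count data is a Wilson score interval, sketched here as an illustrative assumption:

```python
import math

# Hedged sketch of a repeat-interaction-rate estimate: a Wilson score interval
# for k repeat interactions out of n sessions. (The paper does not specify its
# exact estimator; this is a standard textbook choice, not Etsy's method.)
def wilson_interval(k, n, z=1.96):
    """95% (for z=1.96) confidence interval on the underlying repeat rate."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

lo, hi = wilson_interval(30, 100)
print(f"repeat rate in [{lo:.3f}, {hi:.3f}] at 95% confidence")
```

Unlike the naive normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly for users with few observed sessions.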
In model compression, filter pruning stands out as a pivotal technique. Its significance becomes particularly crucial as present deep learning models grow into larger and more complicated architectures, with massive parameter counts and high floating-point operations per second (FLOPs). These advanced model structures bring high computational demands. In this work, we introduce two novel automatic filter pruning methods aimed at addressing these challenges: one based on a semi-supervised multi-task learning (SSMTL) hypernetwork and one based on a partial-weight-training hypernetwork. Both methods effectively train the hypernetwork and enhance the precision of neural architecture search with reinforcement learning. Compared to other filter pruning methods, our approach achieves higher model accuracy at similar pruning ratios.
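For readers unfamiliar with filter pruning, the basic criterion that hypernetwork- and search-based methods build on can be sketched in a few lines: score each convolutional filter by magnitude and drop the weakest. This is only the elementary baseline, not the paper's SSMTL or partial-weight-training method:

```python
import numpy as np

# Magnitude-based filter pruning sketch: score each conv filter by its L1 norm
# and keep only the highest-scoring fraction. (Illustrative baseline only; the
# paper's methods use hypernetworks and reinforcement-learning search.)
def prune_filters(weights, prune_ratio):
    """weights: (n_filters, in_ch, kH, kW); returns sorted indices of kept filters."""
    scores = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_keep = weights.shape[0] - int(weights.shape[0] * prune_ratio)
    kept = np.sort(np.argsort(scores)[::-1][:n_keep])  # keep largest-norm filters
    return kept

rng = np.random.default_rng(1)
w = rng.standard_normal((16, 3, 3, 3))   # 16 filters, 3 input channels, 3x3 kernels
kept = prune_filters(w, prune_ratio=0.5)
print(len(kept))  # 8 filters survive at a 50% pruning ratio
```

The interesting part of the paper is precisely what this sketch omits: choosing per-layer pruning ratios automatically rather than fixing one global ratio.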
Industry, New Methods, and Science Applications IV
Generative Artificial Intelligence (Generative AI) models, such as Generative Adversarial Networks (GANs), use noise vectors to access subspaces of a latent space for generating diverse, high-quality output. In current GAN models, noise vectors are generally drawn from Gaussian distributions to achieve meaningful diversity and image quality. Since the focus of Generative AI is mainly on generating diversity and quality, it is important to understand the contributions of noise vectors drawn from different statistical distributions to accessing quality latent subspaces. This is currently a gap in Generative AI research. This paper presents a learning-based modeling and simulation framework that can help improve our understanding of the effect of different statistical distributions on the randomness of the noise vectors, for enhancing the diversity and quality of Generative AI models. The proposed framework currently supports eight statistical distributions, including Gaussian, Cauchy, Beta, Laplace, and Gamma distributions, to draw different types of noise vectors and uses them to generate bird images for studying their contributions to diversity and quality. A pretrained BigGAN is used as the Generative AI model. A diversity-quality index (DQI) that utilizes k-means clustering and mean squared error (MSE) is used to measure the diversity and quality of 200 BigGAN-generated images. Simulations show that BigGAN can achieve significant diversity with the help of multiple statistical distributions. Simulations also show that the shape of a distribution, with Gaussian as the reference distribution, influences the randomness of the noise vectors associated with the latent space of images.
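The abstract names k-means clustering and MSE as the ingredients of the DQI but not its exact formula; the sketch below shows one plausible construction under that assumption, measuring diversity as how evenly generated samples spread across k-means clusters and quality as MSE against reference samples:

```python
import numpy as np

# Illustrative diversity-quality measurement (the DQI formula below is an
# assumption, not the paper's exact definition): diversity = balance of
# k-means cluster occupancy, quality = MSE against reference images.
def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's algorithm; returns a cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

rng = np.random.default_rng(2)
# Stand-ins for flattened generated and reference images.
generated = np.vstack([rng.normal(m, 0.1, (50, 8)) for m in (0.0, 1.0, 2.0)])
reference = rng.normal(1.0, 0.1, (150, 8))

labels = kmeans(generated, k=3)
cluster_shares = np.bincount(labels, minlength=3) / len(labels)
diversity = 1.0 - np.abs(cluster_shares - 1 / 3).sum() / 2  # 1.0 = perfectly balanced
quality_mse = ((generated - reference) ** 2).mean()
print(f"diversity={diversity:.2f}, mse={quality_mse:.2f}")
```

A collapsed generator would concentrate its samples in one cluster, pulling the diversity term toward its minimum, while the MSE term penalizes departures from the reference set.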
When laser light propagates from one end of a multi-mode optical fiber to the other and is then output onto a screen, irregular patterns called speckle patterns can be observed in the output light spot. The authors previously reported the rotation of such speckle patterns when the optical fiber was placed in a loop on a support plate and the support plate was tilted. In this paper, a method for estimating the tilt angle of the support plate was investigated through classification of speckle-pattern images by a ResNet-18 trained with transfer learning. As a result, a model with a classification accuracy of approximately 99.9% over a measurement range of -10 to +10 degrees of tilt was realized.
Situational awareness is vital for safe autonomous driving. With recent developments in deep neural networks, detection of vehicles, pedestrians, and traffic signs has become a popular topic with high performance, but the detection of an unusual object not encountered before in the scene, also called a corner case, is not well studied. Although there are some studies on corner case detection in the visible domain, detectors developed for the visible domain are susceptible to light and weather conditions. Such models may therefore fail to detect corner cases that occur in poor lighting conditions, which also happen in the real world, putting lives at risk through missed early detection. Infrared cameras, on the other hand, perform well in poor light and foggy weather. However, corner cases in infrared images are not included in existing datasets, and this issue has not been studied before. Therefore, in this paper, we introduce a synthetically generated, high-quality infrared dataset created with Stable Diffusion for corner case detection in infrared images. This dataset addresses situations that may pose a hazard to autonomous vehicles in poor visibility by generating those situations in the infrared domain. As another contribution of this study, we present a detection model trained on the corner cases in infrared images and establish a baseline performance for the model. We believe this work will create a foundation for studies on corner cases in infrared images.
Ensuring public safety is a critical concern in our modern society, and as technology advances, so do the methods for enhancing security. While some people develop sophisticated security systems, others seek ways to bypass them. This dynamic necessitates the continuous development of innovative technologies for detecting concealed weapons. In this paper, we compare state-of-the-art methods for automatic weapon detection using computer vision techniques, specifically focusing on hand pose classification. We propose a novel approach that leverages hand pose analysis to enhance the accuracy and reliability of weapon detection through camera systems.
Globally, brain tumors are a pressing health concern due to their substantial contribution to cancer-related deaths. With brain cancer survival rates lingering at a low 35.7%, there is a critical demand for advancements in diagnostic methods and treatment strategies. Magnetic resonance imaging (MRI) plays a pivotal role in the detection and analysis of brain tumors, yet traditional manual interpretation of MRI scans is challenged by the complex morphological changes tumors introduce to brain structures. This study evaluates the efficacy of automated detection methods using advanced convolutional neural network (CNN) architectures. We specifically compare the performance of two CNN models, ResNet50 and MobileNetV2, on a dataset comprising 20,670 MRI images across four categories, including healthy brain scans. Our findings reveal that ResNet50 significantly surpasses MobileNetV2, achieving a validation accuracy of 99.31% and a test accuracy of 92.06%, compared to MobileNetV2’s validation accuracy of 80.7% and test accuracy of 82.0%. These results underscore ResNet50’s superior diagnostic capabilities, suggesting that the efficiency trade-offs associated with less complex models like MobileNetV2 are not justified in this scenario.
The UN reports that almost 700 million people cannot afford food, while 1.3 billion tons of food is wasted yearly. According to the Food and Agriculture Organization, the wasted food could feed four times the hungry population. Waste occurs throughout the food supply chain, and many factors contribute to it. The solution presented in this study utilizes a multimodal deep learning network to predict the decay of agricultural produce several days in advance, enabling timely interventions to reduce spoilage and enhance food accessibility and affordability.
A new dataset was collected to analyze the decay patterns of the selected agricultural produce. Raw tomatoes and strawberries were used for the study and were placed at multiple locations to capture their decay patterns. Temperature, humidity, and images were collected at regular intervals until the produce was considered decayed. This dataset was used to train detection AI models to isolate produce in large batches and multimodal regression networks to forecast decay. The model with the least decay-prediction error was selected through experimentation with different models. The selected models were then deployed to a Raspberry Pi system with a camera, temperature sensors, and humidity sensors to prototype how well the trained model would perform in field deployment scenarios.
Glaucoma (GC), diabetic retinopathy (DR), and cataracts are the leading causes of vision loss globally. However, these diseases can be prevented from further progression when detected in the early stages. Conventional manual diagnosis is prolonged and requires the experience of a trained ophthalmologist. Therefore, automated techniques using artificial intelligence algorithms are evaluated for their ability to identify these eye diseases. In this study, transfer learning and featurization techniques are employed using pre-trained deep-learning (DL) neural networks (ResNet50, MobileNetV2, and VGG16) and statistical machine learning (ML) algorithms (MLP Classifier, KNN, and Random Forest Classifier). These architectures were trained, validated, and tested using a public dataset that included retinal images for diseased (GC, DR, and cataracts) and normal eyes. The ResNet50 neural network architecture had the highest testing accuracy of 91.51% among the deep learning methods. Due to its high performance, features were extracted from this model (featurization) and used for the statistical ML classifiers, creating hybrid models. The MLP Classifier hybrid model achieved the highest accuracy at 92.04%. The knowledge from this study has the potential to aid, hasten, and improve the accuracy of eye disease diagnosis.
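The featurization pattern the study describes can be sketched compactly: a frozen backbone maps images to feature vectors, and a simple statistical classifier operates on those features. In the sketch below a fixed random projection stands in for ResNet50's penultimate layer (an assumption for self-containment), paired with a k-nearest-neighbour rule as the statistical classifier:

```python
import numpy as np

# Featurization sketch: a frozen "backbone" (a fixed random projection standing
# in for a CNN's penultimate layer) produces feature vectors, and a simple KNN
# rule classifies them. Toy stand-in, not the paper's actual models.
rng = np.random.default_rng(3)
backbone = rng.standard_normal((64, 16))        # frozen projection: 64-dim -> 16-dim

def featurize(images):
    return images @ backbone                    # stand-in for CNN feature extraction

def knn_predict(train_feats, train_labels, query_feats, k=3):
    preds = []
    for q in query_feats:
        idx = np.argsort(((train_feats - q) ** 2).sum(axis=1))[:k]  # k nearest
        preds.append(int(np.argmax(np.bincount(train_labels[idx]))))  # majority vote
    return np.array(preds)

# Two synthetic "classes" of flattened images with different mean intensity.
train = np.vstack([rng.normal(0, 1, (20, 64)), rng.normal(3, 1, (20, 64))])
labels = np.array([0] * 20 + [1] * 20)
test = np.vstack([rng.normal(0, 1, (5, 64)), rng.normal(3, 1, (5, 64))])

preds = knn_predict(featurize(train), labels, featurize(test))
print(preds)
```

The design point the hybrid models exploit is that the expensive backbone runs once per image, while the cheap downstream classifier can be retrained or swapped freely.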
Car crashes cause approximately 1.19 million fatalities a year worldwide. Hazardous road infrastructure and damaged roads are a leading cause of a large share of these deaths. Traditional road inspection approaches typically involve maintenance and repair activities scheduled over predetermined periods. This paper addresses the problem of identifying road damage using automated techniques. We utilized footage taken by ourselves and a public dataset of 6359 images to automate defect detection using YOLOv8 and YOLOv9 object detection models focused on road damage. The models, validated with mAP50 (mean average precision at an IoU threshold of 0.5), show their effectiveness and ability to run in real-world scenarios. By incorporating new pothole and crack categories, we enhanced the models’ ability to find specific road defects. This study not only introduces an original and refined dataset but also demonstrates the potential of YOLOv8- and YOLOv9-based object detection in streamlining road damage detection.
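The mAP50 metric used for validation counts a detection as a true positive when its intersection-over-union (IoU) with a ground-truth box reaches 0.5; the core computation is:

```python
# IoU between two axis-aligned boxes given as (x1, y1, x2, y2).
# At mAP50, a prediction is a true positive only when IoU >= 0.5.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])   # intersection bottom-right
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

gt = (0, 0, 10, 10)
pred = (5, 0, 15, 10)       # box shifted right by half its width
print(iou(pred, gt))        # 50/150 = 0.333..., so not a true positive at 0.5
```

mAP50 then averages the precision over recall levels (and over classes) with this 0.5 threshold fixed, which is why it rewards localization that is roughly, but not pixel-perfectly, correct.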
Direct imaging of exoplanets is a challenging task that involves distinguishing faint planetary signals from the overpowering glare of their host stars, often obscured by time-varying stellar noise known as “speckles”. The predominant algorithms for speckle noise subtraction employ principal component-based point spread function (PSF) fitting techniques to discern planetary signals from stellar speckle noise. We introduce torchKLIP, a benchmark package developed within the machine learning (ML) framework PyTorch. This work enables ML techniques to utilize extensive PSF libraries to enhance direct-imaging post-processing. Such advancements promise to improve the post-processing of high-contrast images from leading-edge astronomical instruments like the James Webb Space Telescope and extreme adaptive optics systems.
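The principal component-based PSF subtraction idea can be sketched in numpy: build a principal-component basis from a reference PSF library, project the science frame onto it, and subtract that speckle model so the residual planetary signal survives. This is a minimal sketch in the spirit of KLIP, not torchKLIP's actual API:

```python
import numpy as np

# Minimal PCA-based PSF subtraction sketch: model the stellar speckle pattern
# from a reference library, project the science frame onto the first K
# principal components, and subtract that model. Illustrative only.
def psf_subtract(science, references, K=5):
    """science: (n_pix,) frame; references: (n_ref, n_pix) library. Returns residual."""
    mean_ref = references.mean(axis=0)
    R = references - mean_ref
    _, _, Vt = np.linalg.svd(R, full_matrices=False)  # principal components of library
    basis = Vt[:K]                                    # (K, n_pix), orthonormal rows
    centered = science - mean_ref
    model = basis.T @ (basis @ centered)              # projection onto speckle basis
    return centered - model

rng = np.random.default_rng(4)
speckles = rng.standard_normal((20, 100))             # toy reference speckle frames
planet = np.zeros(100)
planet[42] = 5.0                                      # faint point source at pixel 42
science = 0.9 * speckles[0] + 0.1 * speckles[1] + planet
residual = psf_subtract(science, speckles, K=10)
print(residual[42])                                   # planet signal largely survives
```

The key property is that the speckle pattern, being well represented by the reference library, is mostly removed by the projection, while the planet signal, absent from the library, is only weakly attenuated.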
A variety of complex channel effects can occur during radio frequency (RF) transmissions. Generative machine learning techniques have the potential to model these complex processes directly from labelled data. To this end, a vector-quantized variational autoencoder (VQ-VAE) was trained on synthetic radio frequency data, produced by RadioML, with eight digital modulation schemes at ten signal-to-noise ratios (SNRs). A PixelSNAIL model was used to learn the latent space of the trained VQ-VAE. After training, the PixelSNAIL network was paired with the VQ-VAE decoder and used to generate new RF signals from a class label. The conditional generative model produced outputs that qualitatively match the training data classes. This is a first step toward training a model that can qualitatively and quantitatively reproduce the transformation between the transmitted RF data and the received RF signal. Ultimately, this approach can be applied to a dataset recorded in real-world conditions.
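The step that gives the VQ-VAE its name is vector quantization: each encoder output vector is snapped to its nearest entry in a learned codebook, and PixelSNAIL then models the distribution over those discrete codes. The lookup itself, with a toy fixed codebook, is:

```python
import numpy as np

# The vector-quantization step at the heart of a VQ-VAE: replace each encoder
# output vector with its nearest codebook entry. The codebook here is fixed for
# illustration; in training, codebook and encoder are learned jointly.
def quantize(z, codebook):
    """z: (n, d) encoder outputs; codebook: (k, d). Returns codes and quantized z."""
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (n, k) sq. distances
    codes = np.argmin(d2, axis=1)                               # index of nearest entry
    return codes, codebook[codes]

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
z = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]])
codes, zq = quantize(z, codebook)
print(codes)  # [0 1 2]
```

The discrete codes are what makes an autoregressive prior like PixelSNAIL applicable: it predicts a sequence of codebook indices, which the VQ-VAE decoder then maps back to a continuous RF waveform.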
In fallowed fields, the presence of broadleaf and grassy weeds poses a significant threat to crop yield and quality if left uncontrolled. Broadleaf weeds, characterized by their wide leaves, and grassy weeds, with their narrow blades, compete vigorously with crops for essential resources such as sunlight, water, and nutrients. Identifying and managing these weed species effectively is paramount for agricultural success. Traditional weed control methods often rely on the use of broad-spectrum herbicides applied across entire fields, regardless of the specific weed composition. This method not only contributes to environmental damage but also incurs unnecessary costs for farmers. In recent years, Vision Transformers (ViT) have revolutionized the field of Computer Vision, offering unprecedented capabilities in image understanding and analysis. This technique can be applied as a powerful tool to automatically detect and classify both broadleaf and grassy weeds in pre-planting herbicide spraying (known as green-on-brown application). This study aims to develop a system to detect and classify broadleaf and grassy weeds in fallowed fields using a Transformer-based algorithm, YOLOS (You Only Look One Sequence). The dataset comprises 15,542 images collected from a real fallowed field. Images were split into three distinct subsets: training (10,879 images, ≈70%), validation (2,798 images, ≈18%), and test (1,865 images, ≈12%) sets. The model achieved an overall precision of 90.7% (88.3% for broadleaf weeds and 93.0% for grassy weeds) and an average recall of 86.3% (85.3% for broadleaf weeds and 87.2% for grassy weeds). The results suggest that YOLOS presents a compelling alternative for distinguishing between broadleaf and grassy weeds in fallowed fields.
Hyperspectral sensing is a valuable tool for detecting anomalies and distinguishing between materials in a scene. Hyperspectral anomaly detection (HS-AD) helps characterize the captured scenes and separates them into anomaly and background classes. It is vital in agriculture, environment, and military applications such as RSTA (reconnaissance, surveillance, and target acquisition) missions. We previously designed an equal voting ensemble of hyperspectral unmixing and three unsupervised HS-AD algorithms. We later utilized a supervised classifier to determine the weights of a voting ensemble, creating a hybrid of heterogeneous unsupervised HS-AD algorithms with a supervised classifier in a model stacking, which improved detection accuracy. However, supervised classification methods usually fail to detect novel or unknown patterns that substantially deviate from those seen previously. In this work, we evaluate our technique and other supervised and unsupervised methods using general hyperspectral data to provide new insights.
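The weighted-vote stacking the abstract describes can be sketched simply: each unsupervised detector emits a per-pixel anomaly decision, and a per-detector weight (learned from a supervised classifier in the paper; fixed here for illustration) combines them into one decision map:

```python
import numpy as np

# Weighted-vote ensemble sketch for anomaly detection: several detectors each
# emit a binary anomaly map, and per-detector weights (assumed fixed here; the
# paper learns them with a supervised classifier) combine them into one map.
def weighted_vote(detections, weights, threshold=0.5):
    """detections: (n_detectors, n_pix) in {0,1}; weights sum to 1."""
    score = np.average(detections, axis=0, weights=weights)  # weighted agreement
    return (score >= threshold).astype(int)

detections = np.array([
    [1, 0, 1, 0],   # detector A
    [1, 0, 0, 0],   # detector B
    [1, 1, 1, 0],   # detector C
])
weights = np.array([0.5, 0.3, 0.2])
print(weighted_vote(detections, weights))  # [1 0 1 0]
```

With equal weights this reduces to the authors' earlier equal-voting ensemble; the supervised stacking step replaces those equal weights with learned ones, which is where the reported accuracy gain comes from, and also where the noted weakness against unseen patterns enters.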