Presentation + Paper
7 June 2024
Generative EO/IR multi-scale vision transformer for improved object detection
Jonathan Christian, Max Bright, Jason Summers, Ashley Olson, Tim Havens
Abstract
For certain objects, panchromatic or 3-band (RGB) imagery may be insufficient for accurate object identification; additional bands within the infrared (IR) spectrum may be needed to exploit unique spectral characteristics and improve object detection. Many existing generative modeling techniques are applied solely to visible wavelengths, and a need exists to fully explore their application to multispectral imagery (MSI), specifically the IR bands. Generative models used for data augmentation in object detection must have sufficient fidelity to avoid generating data that are out of distribution with respect to actual measured data or that contain systemic bias or artifacts. This work demonstrates the utility of a conditionally generative, multi-scale vision transformer that learns the spatial and spectral structures, and the interactions between them, in order to accurately synthesize near-infrared (NIR) and short-wave infrared (SWIR) data from RGB. This synthesis is performed over a diverse set of target objects observed over multiple seasons, at multiple look angles, and over varying terrains, with images sampled globally from multiple satellites. For both training and inference, the model is provided no contextual information or metadata as input. Compared to using RGB alone, the average precision (AP) of an off-the-shelf object detection model (YOLOv5) trained with the additional synthesized IR data improves by up to 48% on a target class that is difficult for an analyst to identify. In conjunction with RGB data, using synthetic instead of true IR data for object detection provides higher AP values over all target classes.
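The abstract reports its results as average precision (AP) per target class. As a rough illustration of that metric only, the sketch below computes AP for one class as the area under the precision-recall curve, using the common all-point interpolation. The abstract does not state the evaluation protocol (IoU threshold, interpolation scheme), so those choices, the function name `average_precision`, and the example detections are assumptions for illustration, not the authors' evaluation code or data.

```python
from typing import List, Tuple


def average_precision(
    detections: List[Tuple[float, bool]], num_ground_truth: int
) -> float:
    """Area under the precision-recall curve for a single class.

    detections: (confidence, is_true_positive) pairs. Matching detections
        to ground-truth boxes (e.g. by IoU) is assumed to be done already.
    num_ground_truth: number of ground-truth objects of this class.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")

    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    precisions: List[float] = []
    recalls: List[float] = []
    true_pos = 0
    false_pos = 0
    for _, is_true_positive in ranked:
        if is_true_positive:
            true_pos += 1
        else:
            false_pos += 1
        precisions.append(true_pos / (true_pos + false_pos))
        recalls.append(true_pos / num_ground_truth)

    # Precision envelope: each precision becomes the maximum precision
    # observed at that rank or any lower-confidence rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum precision over each increment in recall.
    ap = 0.0
    previous_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


# Hypothetical example: three detections, two ground-truth objects.
example = [(0.9, True), (0.8, False), (0.7, True)]
print(round(average_precision(example, num_ground_truth=2), 4))
```

Comparing a detector trained on RGB alone with one trained on RGB plus synthesized IR would then amount to computing this value per class for each detector on the same test set.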
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Jonathan Christian, Max Bright, Jason Summers, Ashley Olson, and Tim Havens "Generative EO/IR multi-scale vision transformer for improved object detection", Proc. SPIE 13035, Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350N (7 June 2024); https://doi.org/10.1117/12.3023596
KEYWORDS
RGB color model
Data modeling
Education and training
Infrared imaging
Transformers
Machine learning
Object detection