The military is looking to adopt artificial intelligence (AI)-based computer vision for autonomous systems and decision support. This transition requires test methods to ensure the safe and effective use of such systems. Performance assessment of deep learning (DL) models, such as object detectors, typically requires extensive datasets. Simulated data offers a cost-effective alternative for generating large image datasets without requiring access to potentially restricted operational data. However, to use simulated data effectively as a virtual proxy for real-world testing, the suitability of the simulation must be evaluated. This study evaluates the use of simulated data for testing DL-based object detectors, focusing on three key aspects: comparing performance on real versus simulated data, assessing the cost-effectiveness of generating simulated datasets, and evaluating the accuracy with which simulations represent reality. Using two automotive datasets, one publicly available (KITTI) and one internally developed (INDEV), we conducted experiments with both real and simulated versions. We found that although simulations can approximate real-world performance, evaluating whether a simulation accurately represents reality remains challenging. Future research should focus on developing validation approaches that are independent of real-world datasets to enhance the reliability of simulations for testing AI models.
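As an illustration of the first aspect, the sketch below is not taken from the study: the detections, ground-truth boxes and IoU threshold are hypothetical placeholders. It computes precision and recall at a fixed IoU threshold for the same detector on a real and a simulated test set, which is the kind of side-by-side comparison such an evaluation relies on.

```python
# Illustrative sketch (not from the paper): compare a detector's precision/recall
# on real and simulated test sets at a fixed IoU threshold.
# Boxes are (x1, y1, x2, y2); all data below are hypothetical placeholders.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def precision_recall(detections, ground_truths, iou_thr=0.5):
    """Greedy one-to-one matching of detections to ground truths per image."""
    tp = fp = fn = 0
    for dets, gts in zip(detections, ground_truths):
        unmatched = list(gts)
        for det in sorted(dets, key=lambda d: d["score"], reverse=True):
            best = max(unmatched, key=lambda g: iou(det["box"], g), default=None)
            if best is not None and iou(det["box"], best) >= iou_thr:
                tp += 1
                unmatched.remove(best)
            else:
                fp += 1
        fn += len(unmatched)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical detector outputs on matching real and simulated scenes.
real_gts = [[(10, 10, 50, 50)], [(30, 20, 80, 90)]]
real_dets = [[{"box": (12, 11, 49, 52), "score": 0.9}],
             [{"box": (100, 100, 140, 150), "score": 0.6}]]
sim_gts = [[(10, 10, 50, 50)], [(30, 20, 80, 90)]]
sim_dets = [[{"box": (11, 10, 51, 50), "score": 0.95}],
            [{"box": (32, 22, 78, 88), "score": 0.8}]]

print("real:      P=%.2f R=%.2f" % precision_recall(real_dets, real_gts))
print("simulated: P=%.2f R=%.2f" % precision_recall(sim_dets, sim_gts))
```

In practice, the placeholder lists would be replaced by detector outputs and annotations from the real and simulated versions of KITTI or INDEV, and a full mAP implementation would be used instead of a single-threshold precision/recall.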
The use of deep neural networks (DNNs) is the dominant approach for image classification, as it achieves state-of-the-art performance when sufficiently large training datasets are available. The best DNN performance is reached when the test data conditions are similar to the training data conditions. However, if the test conditions differ, there is usually a loss of classification performance, for instance when the test targets are more distant, blurred or occluded than those observed in the training data. It is desirable to have an estimate of the expected classification performance prior to using a DNN in practice; a low expected performance may render the DNN unsuitable for the operational task at hand. While the effect of a single changed test condition on classification performance has been investigated before, this paper studies the combined effect of multiple changed test conditions. In particular, we compare two prediction models that estimate the expected performance relative to the DNN performance on the development data. Our approach allows performance estimation for operational use based on knowledge of the expected operational conditions, but without access to operational data itself. We investigate the aforementioned steps for image classification on the MARVEL vessel dataset and the Stanford Cars dataset. The changing test conditions consist of several common image degradations imposed on the original images. We find that the prediction models produce acceptable results in the case of small degradations, and when the degradations show a constant accuracy falloff over their range.
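To make the prediction step concrete, the sketch below combines accuracies measured under single degradations into an estimate for a combined degradation. The two combination rules shown (multiplicative and worst-case) are simple illustrative baselines under stated independence assumptions, not necessarily the two prediction models studied in the paper, and all accuracy values are hypothetical.

```python
# Illustrative sketch (assumed baselines, not necessarily the paper's models):
# estimate classification accuracy under combined image degradations from
# accuracies measured under each single degradation, relative to the accuracy
# on the undegraded development data. All numbers are hypothetical.

DEV_ACCURACY = 0.92  # hypothetical accuracy on the clean development set

# Hypothetical accuracies measured with one degradation applied at a time.
single_condition_accuracy = {
    "blur": 0.85,
    "occlusion": 0.78,
    "reduced_resolution": 0.88,
}

def predict_multiplicative(conditions):
    """Assume relative accuracy losses are independent and multiply them."""
    acc = DEV_ACCURACY
    for c in conditions:
        acc *= single_condition_accuracy[c] / DEV_ACCURACY
    return acc

def predict_worst_case(conditions):
    """Assume the most damaging single degradation dominates."""
    return min(single_condition_accuracy[c] for c in conditions)

combo = ["blur", "occlusion"]
print("multiplicative:", round(predict_multiplicative(combo), 3))
print("worst-case:    ", round(predict_worst_case(combo), 3))
```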