Many studies show that using synthetic data, or a mix of synthetic and real data, may improve machine learning (ML) performance, but it is difficult to draw generalizable conclusions. One contributing factor is that the synthetic data from most vendors are improperly filtered and contain aliased (wrapped-around) high-frequency information that they should not possess. Most vendors use spatial-domain, low-pass FIR filters to generate synthetic images at various ranges. Unfortunately, these FIR filters are designed to interpolate to a desired frequency-domain cutoff and spatial spacing (in general, a non-integer scale factor and/or decimation). As a result, instead of a sharp cutoff at the desired low band, they produce aliased data. This erroneous information in synthetic imagery could mislead an ML algorithm. In addition, most synthetic images do not account for a camera’s Modulation Transfer Function (MTF). Fourier-based filtering can easily incorporate any MTF in the frequency domain, based on Rayleigh resolution theory, the properties of the camera lens, and the properties of the digital image. We study the spectral properties of images acquired with real sensors and compare them with synthetic images from several vendors. We have also developed a metric showing that, for real images, the camera system’s MTF yields the same spectral property at different ranges. The metric can help determine whether a synthetic image generation engine violates this property and, hence, produces erroneous information.
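The Fourier-based filtering described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `diffraction_mtf` and `fourier_downsample`, the parameter `mtf_cutoff`, and the choice of a diffraction-limited circular-aperture MTF are assumptions made for illustration only. The sketch keeps only the central low band of the spectrum (a sharp cutoff), optionally weights it by the assumed MTF, and inverts the result on the coarser grid.

```python
import numpy as np

def diffraction_mtf(shape, cutoff):
    """Assumed MTF model: diffraction-limited circular aperture.

    Evaluated on a centered (fftshift-ed) frequency grid; `cutoff` is a
    hypothetical parameter in cycles per output sample.
    """
    fy = np.fft.fftshift(np.fft.fftfreq(shape[0]))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(shape[1]))[None, :]
    # Normalized radial frequency, clipped so the MTF is zero beyond cutoff.
    r = np.clip(np.hypot(fy, fx) / cutoff, 0.0, 1.0)
    return (2.0 / np.pi) * (np.arccos(r) - r * np.sqrt(1.0 - r**2))

def fourier_downsample(img, out_shape, mtf_cutoff=None):
    """Downsample a 2-D image by cropping its centered spectrum.

    Frequencies above the output Nyquist limit are discarded outright,
    so they cannot wrap around into the low band.
    """
    in_h, in_w = img.shape
    out_h, out_w = out_shape
    spec = np.fft.fftshift(np.fft.fft2(img))
    # Keep the central block so that DC stays at index (out_h//2, out_w//2).
    top = in_h // 2 - out_h // 2
    left = in_w // 2 - out_w // 2
    band = spec[top:top + out_h, left:left + out_w]
    if mtf_cutoff is not None:
        band = band * diffraction_mtf(out_shape, mtf_cutoff)
    out = np.fft.ifft2(np.fft.ifftshift(band))
    # Rescale so that mean intensity is preserved on the smaller grid.
    return out.real * (out_h * out_w) / (in_h * in_w)

# A 20-cycle cosine exceeds the Nyquist limit (16 cycles) of a 32-sample grid.
x = np.arange(64)
high = np.tile(np.cos(2 * np.pi * 20 * x / 64), (64, 1))
filtered = fourier_downsample(high, (32, 32))   # out-of-band content removed
decimated = high[::2, ::2]                      # naive decimation: aliased
```

In this toy case the naive decimation keeps the out-of-band cosine at full amplitude as a lower-frequency alias, while the spectral crop removes it. Passing a value for `mtf_cutoff` additionally attenuates the retained band according to the assumed MTF model.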