The emergence of Generative Adversarial Network (GAN)-based single-image super-resolution (SISR) has enabled finer textures in super-resolved images, making them appear realistic to human observers. However, GAN-based models may depend on large amounts of high-quality data and are known to be costly and unstable to train. Variational Autoencoders (VAEs), by contrast, have well-understood mathematical properties and are relatively cheap and stable to train, but they produce blurry images, which has prevented their use for super-resolution. In this paper, we propose a first-of-its-kind SISR method that takes advantage of a self-evaluating Variational Autoencoder (IntroVAE). Our network, called SRVAE, judges the quality of generated high-resolution (HR) images against the target images in an adversarial manner, which allows it to generate images of high perceptual quality. First, the encoder and the decoder of our IntroVAE-based method learn the manifold of HR images. In parallel, a second encoder and decoder learn to reconstruct the low-resolution (LR) images. Next, the reconstructed LR images are fed to the encoder of the HR network to learn a mapping from LR images to their corresponding HR versions. Using the encoder as a discriminator makes SRVAE a fast single-stream framework that performs super-resolution by generating photo-realistic images. Moreover, SRVAE has the same training stability and "nice" latent manifold structure as VAEs, while playing a max-min adversarial game between the generator and the encoder, as GANs do. Our experiments show that our super-resolved images are comparable to those of state-of-the-art GAN-based super-resolution methods.
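To make the idea of using the encoder as a discriminator concrete, the following minimal sketch shows how the two adversarial objectives could be computed in an IntroVAE-style setup. The abstract does not state the loss functions, so the hinge-margin form, the function names, and the parameters `margin`, `alpha`, and `beta` are assumptions borrowed from the general IntroVAE formulation rather than the exact SRVAE losses.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL divergence between N(mu, exp(logvar)) and N(0, I) for one latent code."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def encoder_loss(kl_real, kl_fake, recon, margin, alpha, beta):
    """Assumed encoder (discriminator) objective.

    Keeps the KL of real HR images small while pushing the KL of
    generated images up to at least `margin` (hinge term), plus a
    weighted reconstruction error.
    """
    return kl_real + alpha * max(0.0, margin - kl_fake) + beta * recon


def generator_loss(kl_fake, recon, alpha, beta):
    """Assumed generator (decoder) objective.

    Tries to make generated images look real to the encoder by
    minimising their KL, plus the same weighted reconstruction error.
    """
    return alpha * kl_fake + beta * recon


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    kl_real = kl_to_standard_normal([0.1, -0.2], [0.0, 0.0])
    print(encoder_loss(kl_real, kl_fake=0.5, recon=2.0, margin=2.0, alpha=0.5, beta=1.0))
    print(generator_loss(kl_fake=0.5, recon=2.0, alpha=0.5, beta=1.0))
```

In this sketch the encoder and the generator pull the KL term of generated images in opposite directions, which is the adversarial game, while the shared reconstruction term is what would keep training anchored to the target images.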