Video super-resolution (VSR) aims to generate a high-resolution (HR) video by reconstructing each frame from the corresponding low-resolution (LR) frame and its neighboring frames. However, generating a clean HR frame is challenging because of the displacement between frames. To alleviate this problem, most existing approaches explicitly align the neighboring frames to the reference frame. These methods, however, tend to produce noticeable artifacts in the aligned results. In this paper, we propose a detail-structure blending network (DSBN), which structurally aligns the adjacent frames by blending them in the deep feature domain. The DSBN extracts deep features from the reference frame and each neighboring frame separately, and blends the extracted detail and structure information to obtain well-aligned results. Afterward, a simple reconstruction network generates the final HR frame from the aligned frames. Experimental results on benchmark datasets demonstrate that our method produces high-quality HR results even for videos with non-rigid motion and occlusions.
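
To make the blending idea concrete, the following is a minimal illustrative sketch and not the DSBN itself. It assumes that "structure" can be approximated by a low-pass (box-filtered) version of a frame and "detail" by the residual, and that an aligned neighbor combines the structure of the reference with the detail of the neighbor. The actual network performs this blending on learned deep features. The function names (`box_blur`, `split_detail_structure`, `blend`, `align_neighbors`) and the kernel size `k` are hypothetical and chosen only for illustration.

```python
import numpy as np


def box_blur(img, k=5):
    """Mean filter with edge padding; a crude stand-in for a structure extractor."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    # Sum the k*k shifted copies of the padded image, then normalize.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def split_detail_structure(img, k=5):
    """Split a frame into (detail, structure) so that detail + structure == img."""
    structure = box_blur(img, k)
    detail = img - structure
    return detail, structure


def blend(reference, neighbor, k=5):
    """Assumed blending rule: reference structure plus neighbor detail."""
    _, ref_structure = split_detail_structure(reference, k)
    nbr_detail, _ = split_detail_structure(neighbor, k)
    return ref_structure + nbr_detail


def align_neighbors(reference, neighbors, k=5):
    """Blend every neighboring frame against the same reference frame."""
    return [blend(reference, n, k) for n in neighbors]
```

Under these assumptions, blending a frame with itself returns the frame unchanged, and every blended neighbor shares the coarse layout of the reference. A reconstruction stage would then take the reference frame together with the output of `align_neighbors` as its input.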