In this paper, we introduce a new solution and its underlying architecture that allow remote participants to interact with hosts in a broadcast scenario. First, background extraction is applied to the video received from remote participants to isolate their faces and bodies. Because video from remote participants is usually of lower resolution than content produced by professional studio cameras, we scale the extracted video with a super-resolution module. Finally, the processed video from remote participants is merged with the studio video and streamed to audiences. To meet the real-time and high-quality requirements, both the background-extraction and super-resolution modules are learning-based and run on GPUs. The proposed solution has been deployed in the Advanced Mixed Reality (AdMiRe) project. Objective and subjective assessment results show that it performs well in real-world applications.
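The paper's pipeline ends by compositing the matted, upscaled participant onto the studio feed. As a rough illustration of that final step only, the sketch below alpha-blends a participant cut-out into a studio frame; it assumes the soft alpha matte has already been produced by the background-extraction network, and it uses OpenCV bicubic resizing as a stand-in for the learning-based super-resolution module. The function name and parameters are illustrative, not the paper's API.

```python
import numpy as np
import cv2


def composite_participant(studio, participant, alpha, position, scale):
    """Blend an upscaled participant cut-out into the studio frame.

    studio      -- HxWx3 uint8 studio frame
    participant -- hxwx3 uint8 remote-participant frame
    alpha       -- hxw float32 soft matte in [0, 1] (assumed output of the
                   background-extraction network)
    position    -- (y, x) top-left placement in the studio frame
    scale       -- upscaling factor (bicubic here; a placeholder for the
                   super-resolution network)
    """
    h, w = participant.shape[:2]
    new_size = (int(w * scale), int(h * scale))  # cv2 wants (width, height)

    # Upscale both the cut-out and its matte.
    fg = cv2.resize(participant, new_size, interpolation=cv2.INTER_CUBIC)
    a = cv2.resize(alpha, new_size, interpolation=cv2.INTER_CUBIC)
    a = np.clip(a, 0.0, 1.0)[..., None]  # add channel axis for broadcasting

    y, x = position
    out = studio.astype(np.float32)
    roi = out[y:y + new_size[1], x:x + new_size[0]]

    # Standard alpha compositing: matte * foreground + (1 - matte) * studio.
    out[y:y + new_size[1], x:x + new_size[0]] = a * fg + (1.0 - a) * roi
    return out.astype(np.uint8)
```

In the deployed system this blend would run per frame on the GPU alongside the two networks; the NumPy/OpenCV version here is only meant to make the data flow between the three stages concrete.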