23 August 2022 TFD-MelGAN: personalized voiceprint feature speech synthesis based on multi-domain signal processing
Daigang Chen, Hua Jiang, Chengxi Pu, Shaowen Yao
Proceedings Volume 12305, International Symposium on Artificial Intelligence Control and Application Technology (AICAT 2022); 1230509 (2022) https://doi.org/10.1117/12.2645688
Event: International Symposium on Artificial Intelligence Control and Application Technology (AICAT 2022), 2022, Hangzhou, China
Abstract
In recent years, with the rise of deep learning, speech synthesis technology has developed rapidly and produced many strong results. Speech synthesis with personalized voiceprint features has become a research focus within this field. In existing work, a GAN-based model for personalized voiceprint feature speech synthesis has achieved some success: it synthesizes speech with personalized voiceprint features in a non-autoregressive way, but both the audio quality of the synthesized speech and its efficiency were low, and the model took a long time to train. In this paper, we improve that model through techniques such as multi-domain signal processing. Specifically, we substantially reduce training time by optimizing several of the model's parameters. We also refine the model architecture, which effectively improves the mean opinion score (MOS) of the synthesized speech.
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Daigang Chen, Hua Jiang, Chengxi Pu, and Shaowen Yao "TFD-MelGAN: personalized voiceprint feature speech synthesis based on multi-domain signal processing", Proc. SPIE 12305, International Symposium on Artificial Intelligence Control and Application Technology (AICAT 2022), 1230509 (23 August 2022); https://doi.org/10.1117/12.2645688
KEYWORDS
Convolution
Autoregressive models
Signal processing
Systems modeling
