ON ADVERSARIAL TRAINING AND LOSS FUNCTIONS FOR SPEECH ENHANCEMENT

Published in ICASSP, 2018

Abstract

Generative adversarial networks (GANs) are becoming increasingly popular for image processing tasks. Researchers have started using GANs for speech enhancement, but the advantage of using the GAN framework has not been established for speech enhancement. For example, a recent study reports encouraging enhancement results, but we find that the architecture of the generator used in the GAN gives better performance when it is trained alone using the L1 loss. This work presents a new GAN for speech enhancement, and obtains performance improvement with the help of adversarial training. A deep neural network (DNN) is used for time-frequency mask estimation, and it is trained in two ways: regular training with the L1 loss and training using the GAN framework with the help of an adversary discriminator. Experimental results suggest that the GAN framework improves speech enhancement performance. Further exploration of loss functions, for speech enhancement, suggests that the L1 loss is consistently better than the L2 loss for improving the perceptual quality of noisy speech. —