DNN, Predicted f0, Predicted ceps
VCC2SF1 Original
VCC2TF1 Original
Converted TF1 with Frame DNN with 64T64T64T64T
Converted TF1 with Frame DNN with 128T128T128T128T
Converted TF1 with Frame DNN with 256T256T256T256T
Converted TF1 with Frame DNN with 1024T1024T1024T1024T
AugmentedDNN, Predicted f0, Predicted ceps
VCC2SF1 Original
VCC2TF1 Original
Converted TF1 with frame DNN of 128T128T128T128T augmented with 100 arctic sentences
Converted TF1 with frame DNN of 256T256T256T256T augmented with 100 arctic sentences
TargetSpeakerPretrainedDNN, Predicted f0, Predicted ceps
VCC2SF1 Original
VCC2TF1 Original
Converted TF1 with pretrained frame DNN of 128T128T128T128T
Converted TF1 with pretrained frame DNN of 256T256T256T256T
VED, Predicted f0, Predicted ceps
VCC2SF1 Original
VCC2TF1 Original
Converted TF1 with Frame Variational Encoder Decoder with 32T32T32T32T
Converted TF1 with Frame Variational Encoder Decoder with 512T512T512T512T