Second attempt at fine-tuning wav2vec2-base. The results are similar to those reported at https://huggingface.co/facebook/wav2vec2-base-100h.

The model was trained on librispeech-clean-train.100 with the following hyperparameters:

2 Titan RTX GPUs
Total update steps: 11000
Batch size per GPU: 32, corresponding to a total batch size of roughly 750 seconds of audio
Adam with a linearly decaying learning rate and 3000 warmup steps
Dynamic padding for each batch
fp16
attention_mask was not used during training
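
Two of the settings above, the learning-rate schedule and dynamic padding, can be sketched in a few lines of plain Python. This is a minimal illustration, not the training script that was actually used. The function names `lr_multiplier` and `pad_batch` are hypothetical, the schedule is assumed to warm up linearly from zero and decay linearly to zero at the last update step, and the peak learning rate is not stated here, so the function returns only a multiplier to apply to it.

```python
def lr_multiplier(step, warmup_steps=3000, total_steps=11000):
    """Factor in [0, 1] to multiply the (unstated) peak learning rate by.

    Assumption: linear warmup from 0 over `warmup_steps`, then linear
    decay to 0 at `total_steps`.
    """
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def pad_batch(waveforms, pad_value=0.0):
    """Dynamic padding: pad each sequence only to the longest one in this batch."""
    max_len = max(len(w) for w in waveforms)
    return [list(w) + [pad_value] * (max_len - len(w)) for w in waveforms]


if __name__ == "__main__":
    for step in (0, 1500, 3000, 7000, 11000):
        print(step, lr_multiplier(step))
    print(pad_batch([[0.1, 0.2, 0.3], [0.4]]))
```

Padding to the longest sequence in each batch, rather than to a fixed global length, keeps the amount of padded audio small when utterance lengths vary.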

Results (WER, in %) on LibriSpeech:

"clean" (% rel difference to results in paper)	"other" (% rel difference to results in paper)
6.2 (-1.6%)	15.2 (-11.2%)
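
The relative differences in the table can be reproduced with a short calculation. The sketch below rests on two assumptions that the table does not state: the reference WERs are taken to be 6.1 ("clean") and 13.5 ("other"), and the difference is taken to be relative to this model's WER, so a negative value means this model is worse. Both were back-derived because they match the percentages shown. The function name `relative_difference` is hypothetical.

```python
def relative_difference(model_wer, reference_wer):
    """Relative difference in percent, measured against the model's own WER.

    Assumed convention: negative means the model's WER is higher (worse)
    than the reference WER.
    """
    return (reference_wer - model_wer) / model_wer * 100.0


if __name__ == "__main__":
    # Reference values are assumptions, see the note above.
    print(round(relative_difference(6.2, 6.1), 1))    # -1.6
    print(round(relative_difference(15.2, 13.5), 1))  # -11.2
```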

patrickvonplaten/wav2vec2-base-100h-2nd-try
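
Since attention_mask was not used during training, it is reasonable to pass only the raw input values at inference time as well. The following is a hedged sketch of how the checkpoint might be used with the `transformers` library. It assumes that `torch` and `transformers` are installed, that the repository provides the processor files next to the weights, and that the audio is a mono 16 kHz waveform. The helper name `transcribe` is hypothetical.

```python
def transcribe(waveform, sampling_rate=16000,
               model_id="patrickvonplaten/wav2vec2-base-100h-2nd-try"):
    """Greedy CTC decoding of one mono waveform (1-D float array or list)."""
    # Imports live inside the function so the heavy dependencies are only
    # needed when the function is actually called.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Assumption: the repository contains processor (tokenizer) files.
    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    inputs = processor(waveform, sampling_rate=sampling_rate,
                       return_tensors="pt", padding=True)

    # Pass only input_values: no attention_mask, matching the training setup.
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe(samples)` with a list or NumPy array of 16 kHz audio samples should return the decoded transcription as a string. The model is downloaded on the first call.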
