Full-Transformer-Encoder-Decoder-for-Speech-Recognition

For running the model, just run through the cells in the notebook.

the experiments: I tuned the number of attention heads from 1-4, layers from 1-2, and dropout rate from 0.1 to 0.3.

architectures you tried: the provided speech transformer architecture.

the number of epochs you trained for: for part 1, i stopped at epoch 52. For part2, i stopped at epoch 12.

hyperparameters you used (and specify which combination resulted in the best score): ] config {'train_dataset': 'train-clean-100', 'cepstral_norm': True, 'input_dim': 27, 'batch_size': 64, 'enc_dropout': 0.23, 'enc_num_layers': 1, 'enc_num_heads': 1, 'dec_dropout': 0.23, 'dec_num_layers': 2, 'dec_num_heads': 2, 'd_model': 512, 'd_ff': 2048, 'learning_rate': '1E-4', 'optimizer': 'AdamW', 'momentum': 0.0, 'nesterov': True, 'scheduler': 'CosineAnnealing', 'factor': 0.9, 'patience': 3, 'epochs': 100, 'Name': 'Zhenjian Wang'}

I have tried to set dropout rate to 0.1, and adding layers and heads, but got worse result, with distance higher than 100. I also tried to increase the attention head in the second part of the HW, but it did not increase the performance of the model.

data loading scheme: download the data from kaggle, then use the speechdataset class to read in the data.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
main.ipynb		main.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Full-Transformer-Encoder-Decoder-for-Speech-Recognition

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Full-Transformer-Encoder-Decoder-for-Speech-Recognition

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages