Synesthetic-AI

(Image from the article: https://www.frontiersin.org/articles/10.3389/fnins.2018.00777/full, to represent a 2D-CNN)

The aim of this project is to apply 2D convolution neural networks (CNNs) to visually perform speaker recognition. The CNNs are applied to "see" audios representations generated from the original audios by an algorithm.

requirements

The following libs were used:

pandas
numpy
torch
matplotlib
librosa
sklearn

The versions are the default used on google colab in the ned of 2022, as the project was done using it. I hope the most recent versions installed with pip (whenever you are reading this) might work as well.

Original Dataset

This project uses the dataset: https://www.kaggle.com/datasets/vjcalling/speaker-recognition-audio-dataset, which has 50 different speakers. Only 10 of them were used due to lack of memory resources. All this audios were converted to a single vector with each position being a value related to the volume at some moment of time.

The following image shows an example:

Audio's Images dataset

There are 1D convolution neural nets, which might work for this problem. But this project uses 2D CNNs that works on images. To adapt the representation it is necessary to transform the input into a series of images. The algorithm used was a composition of the results of many Fast Fourier Transforms stacked applied to sections of the original audio. Besides this main part of the algorithm, several approchas to split the audios were tried to increase the number of samples of each speaker and decrease the size of the images without compromising their quality.

The following image is an example of the transformation:

Models

3 types of sizes of CNNs were used (gradually increasing the number of parameters), with 3 types of challenges: binary classification, classification of 5 speakers, classification of 10 speakers.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Preparacao.ipynb		Preparacao.ipynb
README.md		README.md
redes_sinais.ipynb		redes_sinais.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Synesthetic-AI

requirements

Original Dataset

Audio's Images dataset

Models

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Synesthetic-AI

requirements

Original Dataset

Audio's Images dataset

Models

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages