I think a discussion post would be more appropriate for this, but discussions are not open at the time of writing.
Have you considered making a version of this repo that works with audio files instead of images? My understanding is that replacing CLIP with CLAP should give similar performance, but for audio files.