---
license: cc-by-4.0
datasets:
- sarulab-speech/mls_sidon
- mythicinfinity/Libriheavy-HQ
language:
- en
pipeline_tag: audio-to-audio
tags:
- Audio
- Codec
- TTS
---
# LayaCodec
LayaCodec: Rapid, High-Fidelity Audio Compression: Reaching the Pareto Frontier in Neural Audio Codecs
This is a neural audio codec/tokenizer that encodes 16 kHz audio at a rate from 12.5 t/s (0.16 kbps) to 50 t/s (0.65 kbps) using a single codebook of 8192 entries, and decodes it into 44.1 kHz audio.
This allows for much faster and more scalable TTS models compared to other modern codecs, for several reasons:
1. **Much** lower token rates than other single-pass codecs such as Xcodec2 (50 t/s), Snac (83 t/s), Dac (774 t/s), etc.
2. **Much** smaller codebook size (8192) compared to Xcodec2 (65536), for faster TTS model training.
3. Over 40x faster than most diffusion-based codecs, allowing for **much** simpler and larger-scale TTS models where the codec is not the bottleneck.
4. Decodes audio into 44.1 kHz, which is much higher quality than the common 24 kHz or 16 kHz sampling rates.
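
The bitrate and token-rate figures quoted above can be checked with a little arithmetic. The following is a minimal sketch, not part of the LayaCodec codebase: the helper names `bitrate_kbps` and `tokens_for` are hypothetical, and it assumes each token carries exactly log2(codebook size) bits with no entropy coding.

```python
import math


def bitrate_kbps(tokens_per_second: float, codebook_size: int) -> float:
    """Bitrate of a single-codebook tokenizer in kbps.

    Assumes every token costs log2(codebook_size) bits (no entropy coding).
    """
    bits_per_token = math.log2(codebook_size)
    return tokens_per_second * bits_per_token / 1000.0


def tokens_for(seconds: float, tokens_per_second: float) -> int:
    """Number of tokens a TTS model must generate for a clip of this length."""
    return math.ceil(seconds * tokens_per_second)


# LayaCodec operating points from this card: 8192-entry codebook, 13 bits per token.
for rate in (12.5, 50.0):
    print(f"LayaCodec @ {rate} t/s: {bitrate_kbps(rate, 8192):.4f} kbps")

# Sequence length for 10 seconds of audio, using the token rates listed above.
token_rates = {
    "LayaCodec (lowest rate)": 12.5,
    "LayaCodec (highest rate)": 50.0,
    "Xcodec2": 50.0,
    "Snac": 83.0,
    "Dac": 774.0,
}
for name, rate in token_rates.items():
    print(f"{name}: {tokens_for(10, rate)} tokens")
```

At 12.5 t/s this gives 12.5 × 13 = 162.5 bits/s, which is the roughly 0.16 kbps quoted above, and at 50 t/s it gives 650 bits/s (0.65 kbps). Fewer tokens per second means shorter sequences for the TTS model to generate for the same clip.
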
Repo: https://github.com/ysharma3501/LayaCodec
This is still a work in progress: the model has only seen a few hundred hours of training data, but the quality is already surprisingly good. It still needs more training.
The model is released under the permissive CC-BY-4.0 license, and the code is released under the Apache-2.0 license.
Thanks very much to the authors of FocalCodec and Anime-XCodec2.