---
license: cc-by-4.0
datasets:
- sarulab-speech/mls_sidon
- mythicinfinity/Libriheavy-HQ
language:
- en
pipeline_tag: audio-to-audio
tags:
- Audio
- Codec
- TTS
---
# LayaCodec

LayaCodec: Rapid, High-Fidelity Audio Compression: Reaching the Pareto Frontier in Neural Audio Codecs


This is a neural audio codec/tokenizer that encodes 16 kHz audio at rates from 12.5 t/s (0.16 kbps) to 50 t/s (0.65 kbps) using a single codebook of size 8192, and decodes it into 44.1 kHz audio.
This allows for much faster and more scalable TTS models compared to other modern codecs, for several reasons:
1. **Much** lower token rates than other single-pass codecs such as Xcodec2 (50 t/s), SNAC (83 t/s), DAC (774 t/s), etc.
2. **Much** smaller codebook size (8192) compared to Xcodec2 (65536), for faster TTS model training.
3. Over 40x faster than most diffusion-based codecs, allowing for **much** simpler and larger-scale TTS models where the codec is not the bottleneck.
4. Decodes audio at 44.1 kHz, which is much higher quality than the common 24 kHz or 16 kHz sampling rates.
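
The bitrates quoted above follow from the token rate and the codebook size: a single codebook with 8192 entries carries log2(8192) = 13 bits per token. The sketch below only reproduces that arithmetic; the function names `bitrate_kbps` and `tokens_for_duration` are illustrative helpers and are not part of the LayaCodec code.

```python
import math


def bitrate_kbps(tokens_per_second: float, codebook_size: int) -> float:
    """Bitrate of a single-codebook codec in kilobits per second."""
    # Each token indexes one codebook entry, so it carries log2(size) bits.
    bits_per_token = math.log2(codebook_size)
    return tokens_per_second * bits_per_token / 1000.0


def tokens_for_duration(seconds: float, tokens_per_second: float) -> int:
    """Number of tokens needed to represent a clip, rounded up."""
    return math.ceil(seconds * tokens_per_second)


if __name__ == "__main__":
    # Rates and codebook size taken from the description above.
    for rate in (12.5, 50.0):
        print(f"{rate} t/s -> {bitrate_kbps(rate, 8192):.2f} kbps")
    # Sequence length a TTS model would have to generate for 10 s of audio.
    print(tokens_for_duration(10.0, 12.5), "tokens for 10 s at 12.5 t/s")
```

By the same arithmetic, a 65536-entry codebook at 50 t/s carries 16 bits per token, or 0.8 kbps, which gives a rough sense of how the codebook size and token rate figures in the list relate to bitrate.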

Repo: https://github.com/ysharma3501/LayaCodec

This is still a work in progress: the model has only seen a few hundred hours of training data, but the quality is already surprisingly good. It still needs more training.

The model is released under the permissive CC-BY-4.0 license, and the code is released under the Apache-2.0 license.

Thanks very much to the authors of FocalCodec and Anime-XCodec2.