How to use gafiatulin/vibevoice-tts-7b-coreml with VibeVoice:
import torch
import soundfile as sf
import librosa

from vibevoice.processor.vibevoice_processor import VibeVoiceProcessor
from vibevoice.modular.modeling_vibevoice_inference import VibeVoiceForConditionalGenerationInference

# Load the reference voice sample and convert it to 24 kHz mono.
voice, sr = sf.read("path/to/voice_sample.wav")
if voice.ndim > 1:
    voice = voice.mean(axis=1)  # downmix multi-channel audio to mono
if sr != 24000:
    # Recent librosa releases require keyword arguments for the sample rates.
    voice = librosa.resample(voice, orig_sr=sr, target_sr=24000)

processor = VibeVoiceProcessor.from_pretrained("gafiatulin/vibevoice-tts-7b-coreml")
model = VibeVoiceForConditionalGenerationInference.from_pretrained(
    "gafiatulin/vibevoice-tts-7b-coreml", torch_dtype=torch.bfloat16
).to("cuda").eval()
model.set_ddpm_inference_steps(5)

# Each script line is prefixed with "Speaker N:" to mark who is talking.
inputs = processor(text=["Speaker 0: Hello!\nSpeaker 1: Hi there!"],
                   voice_samples=[[voice]], return_tensors="pt")
audio = model.generate(**inputs, cfg_scale=1.3,
                       tokenizer=processor.tokenizer).speech_outputs[0]
# Cast to float32 before converting: NumPy has no bfloat16 dtype.
sf.write("output.wav", audio.float().cpu().numpy().squeeze(), 24000)

VibeVoice 7B (Qwen2.5-7B): CoreML INT8, fused LM+head, fused diffusion loop, DPM-Solver++ 10-step. Multi-speaker TTS with voice cloning.
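The cfg_scale=1.3 argument in the Python snippet above is a classifier-free guidance weight for the diffusion head. As a rough illustration, and assuming the standard guidance formula (this card does not state exactly how the model applies it), the guided noise prediction extrapolates from the unconditional prediction toward the conditional one. The function name apply_cfg is hypothetical and not part of the VibeVoice API.

```python
import numpy as np


def apply_cfg(cond_eps, uncond_eps, cfg_scale=1.3):
    """Blend noise predictions with standard classifier-free guidance.

    Hypothetical helper for illustration; not a VibeVoice function.
    cfg_scale = 1.0 returns the conditional prediction unchanged,
    values above 1.0 push further away from the unconditional one.
    """
    cond_eps = np.asarray(cond_eps, dtype=np.float32)
    uncond_eps = np.asarray(uncond_eps, dtype=np.float32)
    return uncond_eps + cfg_scale * (cond_eps - uncond_eps)


# Toy predictions standing in for the diffusion head's outputs.
cond = np.array([1.0, 2.0], dtype=np.float32)
uncond = np.array([0.0, 1.0], dtype=np.float32)
guided = apply_cfg(cond, uncond, cfg_scale=1.3)
```

With a scale only slightly above 1.0, such as the 1.3 used here, the guided prediction stays close to the conditional one.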
Add vibevoice-coreml to your Swift package. Models auto-download from this repo on first use.
import VibeVoiceCoreML
let tts = try await MultispeakerTTS(architecture: .model7B)
let voices = try await tts.encodeVoices(from: [referenceAudioURL])
for try await frame in tts.speak("Hello world", config: MultispeakerConfig(), voices: voices) {
    // frame.samples: [Float] at 24kHz
}
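The loop above yields audio in chunks of float samples at 24 kHz. As a language-neutral illustration of what a consumer might do with such chunks, here is a minimal Python sketch that appends them to a 16-bit mono WAV file using only the standard library. It assumes samples lie in [-1, 1] (out-of-range values are clipped); the function name write_frames_to_wav is hypothetical and unrelated to the Swift package.

```python
import struct
import wave

SAMPLE_RATE = 24000  # sample rate stated in the card


def write_frames_to_wav(frames, path, sample_rate=SAMPLE_RATE):
    """Write an iterable of float sample chunks to a 16-bit mono WAV file.

    Hypothetical helper for illustration. Returns the number of samples written.
    """
    total = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit PCM
        wav.setframerate(sample_rate)
        for frame in frames:
            # Clip to [-1, 1] so the integer conversion cannot overflow.
            clipped = [max(-1.0, min(1.0, s)) for s in frame]
            ints = [int(round(s * 32767)) for s in clipped]
            wav.writeframes(struct.pack("<%dh" % len(ints), *ints))
            total += len(ints)
    return total
```

Because chunks are written as they arrive, the whole utterance never has to be held in memory at once.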
See the GitHub repo for CLI usage, Python pipelines, and conversion scripts.
ct.StateType for stateful models)
.mlmodelc: no on-device compilation needed

lm_decoder_fused_int8.mlmodelc
diffusion_loop.mlmodelc
vae_decoder_streaming.mlmodelc
semantic_encoder_streaming.mlmodelc
acoustic_connector.mlmodelc
semantic_connector.mlmodelc
vae_encoder.mlmodelc
embed_tokens.bin
tokenizer.json
tokenizer_config.json

License: MIT (same as upstream VibeVoice models from Microsoft)
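The artifact names listed above can be used to sanity-check a local copy of the repository before loading it. The sketch below assumes a flat layout with every artifact at the top level of the directory, which this card does not confirm; missing_model_files is a hypothetical helper, not part of any VibeVoice tooling.

```python
from pathlib import Path

# Artifact names as listed in this card.
EXPECTED_FILES = [
    "lm_decoder_fused_int8.mlmodelc",
    "diffusion_loop.mlmodelc",
    "vae_decoder_streaming.mlmodelc",
    "semantic_encoder_streaming.mlmodelc",
    "acoustic_connector.mlmodelc",
    "semantic_connector.mlmodelc",
    "vae_encoder.mlmodelc",
    "embed_tokens.bin",
    "tokenizer.json",
    "tokenizer_config.json",
]


def missing_model_files(model_dir):
    """Return the expected artifact names that are absent from model_dir.

    Assumes a flat directory layout (an assumption, not stated in the card).
    Compiled .mlmodelc bundles are directories, so exists() is used
    rather than is_file().
    """
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).exists()]
```

An empty return value means every listed artifact was found; otherwise the list names what still needs to be downloaded.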