fabiosuizu 
posted an update 11 days ago
Hi everyone!

I've been working on a pronunciation assessment engine optimized for edge deployment and real-time feedback, and I wanted to share it with the community and hear your thoughts.

**What it does**: Scores English pronunciation at four levels of granularity: phoneme, word, sentence, and overall (0-100 each). Returns IPA and ARPAbet notation for every phoneme.
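To make the four levels concrete, here is a rough sketch of how a result could be modeled on the client side. This is only an illustration: the class names, field names, `weakest_phonemes` helper, and the example scores are all hypothetical and are not the engine's actual response schema. The ARPAbet-to-IPA table covers just the four phones needed for the example.

```python
from dataclasses import dataclass
from typing import List

# Tiny ARPAbet -> IPA table, only the phones used in the example below.
ARPABET_TO_IPA = {"HH": "h", "AH": "ʌ", "L": "l", "OW": "oʊ"}


@dataclass
class PhonemeScore:
    arpabet: str   # e.g. "HH"
    score: float   # 0-100

    @property
    def ipa(self) -> str:
        return ARPABET_TO_IPA[self.arpabet]


@dataclass
class WordScore:
    word: str
    score: float   # 0-100
    phonemes: List[PhonemeScore]


@dataclass
class Assessment:          # hypothetical container, not the real schema
    overall: float         # 0-100
    sentence: float        # 0-100
    words: List[WordScore]


def weakest_phonemes(result: Assessment, threshold: float) -> List[str]:
    """Return the IPA symbols of all phonemes scoring below the threshold."""
    return [
        p.ipa
        for w in result.words
        for p in w.phonemes
        if p.score < threshold
    ]


# Made-up scores for the word "hello" (HH AH L OW).
result = Assessment(
    overall=78.0,
    sentence=80.0,
    words=[
        WordScore(
            word="hello",
            score=76.0,
            phonemes=[
                PhonemeScore("HH", 92.0),
                PhonemeScore("AH", 55.0),
                PhonemeScore("L", 88.0),
                PhonemeScore("OW", 69.0),
            ],
        )
    ],
)
flagged = weakest_phonemes(result, threshold=70.0)
```

With a structure like this, a tutoring app could surface only the phonemes that fall below a chosen threshold instead of showing every score.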

**Key specs**:
- 17MB total model size (NeMo Citrinet-256, INT4 quantized)
- 257ms median inference on CPU
- Exceeds human inter-annotator agreement at the phone level (+4.5%) and the sentence level (+5.2%)
- Benchmarked on speechocean762 (2,500 test utterances)
- Tested across 7 L1 backgrounds (Chinese, Japanese, Korean, Arabic, Spanish, Vietnamese, Russian)

**Architecture**: CTC forced alignment + Viterbi decoding + GOP (Goodness of Pronunciation) scoring + MLP/XGBoost ensemble heads. No wav2vec2 dependency: the entire pipeline fits in 17MB.
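For readers unfamiliar with GOP, here is a minimal sketch of one common posterior-based variant: for each aligned segment, take the mean log-posterior of the expected phone and subtract the best mean log-posterior of any phone over the same frames. It assumes the alignment step has already produced `(phone_idx, start, end)` segments. The function name, the toy posteriors, and the choice of variant are assumptions for illustration, not necessarily what this engine computes.

```python
import math
from typing import List, Sequence


def gop_score(
    posteriors: Sequence[Sequence[float]],
    start: int,
    end: int,
    phone_idx: int,
) -> float:
    """GOP for one aligned segment (frames start..end-1).

    Returns 0.0 when the expected phone is the best match over the
    segment, and a more negative value the worse it fits.
    """
    frames = posteriors[start:end]
    if not frames:
        raise ValueError("segment must contain at least one frame")
    eps = 1e-10  # floor to avoid log(0)
    n_phones = len(frames[0])
    # Mean log-posterior of every phone over the segment.
    mean_logp: List[float] = [
        sum(math.log(max(frame[q], eps)) for frame in frames) / len(frames)
        for q in range(n_phones)
    ]
    return mean_logp[phone_idx] - max(mean_logp)


# Toy frame-level posteriors: 4 frames x 3 phones (made-up numbers).
posteriors = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
]
# Hypothetical alignment: phone 0 is expected in both segments.
alignment = [(0, 0, 2), (0, 2, 4)]  # (phone_idx, start_frame, end_frame)
scores = [gop_score(posteriors, s, e, p) for p, s, e in alignment]
```

In the toy data the first segment matches the expected phone, so its score is 0, while the second segment looks more like phone 1 and gets a clearly negative score. Per-phone values like these are the kind of feature that downstream heads (the MLP/XGBoost ensemble mentioned above) could map to a 0-100 scale.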

**Try it**: fabiosuizu/pronunciation-assessment

The demo lets you record audio or upload a file, enter the expected text, and get instant scoring down to individual phonemes.

**API access**: Available via REST API, MCP servers (for AI agents), and Azure Marketplace. Details in the Space description.

Would love feedback on:
1. Use cases you'd find this useful for
2. Languages you'd want supported next
3. Whether the scoring feels calibrated for your experience level

Thanks!

Wow this is super cool! Thanks for sharing!