AI Engine Model Summary
Simplified ASR-Only Configuration
This engine has been simplified to use ONLY the IndicWav2Vec Hindi model for Automatic Speech Recognition (ASR).
Active Model
1. IndicWav2Vec Hindi (Primary & Only Model)
- Model ID: ai4bharat/indicwav2vec-hindi
- Type: Wav2Vec2ForCTC
- Purpose: Automatic Speech Recognition (ASR) for Hindi and Indian languages
- Status: Active (loaded at startup)
- Location: detect_stuttering.py, lines 26, 148-156
- Authentication: Requires the HF_TOKEN environment variable
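Because the model is loaded at startup and needs HF_TOKEN, a missing token is worth catching before the download is attempted. The following is a minimal sketch, not the code in detect_stuttering.py: the helper names `read_hf_token` and `load_asr_model` are hypothetical, and it assumes the checkpoint can be loaded with the standard `Wav2Vec2Processor` and `Wav2Vec2ForCTC` classes from transformers.

```python
import os


def read_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token, or fail early with a clear message."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"{env_var} is not set; the ASR model cannot be downloaded")
    return token


def load_asr_model(model_id: str = "ai4bharat/indicwav2vec-hindi"):
    """Load the processor and CTC model (hypothetical helper, assumed loading path)."""
    token = read_hf_token()  # check the token before importing the heavy library

    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id, token=token)
    model = Wav2Vec2ForCTC.from_pretrained(model_id, token=token)
    model.eval()  # inference only, no training-time behaviour such as dropout
    return processor, model
```

Calling `load_asr_model()` once at startup and reusing the returned objects matches the "loaded at startup" status listed above.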
Features:
- Speech-to-text transcription
- Confidence scoring from model predictions
- Text-based stutter analysis (simple repetition detection)
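The last two features can be illustrated with plain Python. This is a sketch under stated assumptions, not the engine's actual implementation: it assumes confidence is the mean of the highest per-frame softmax probability, and that "simple repetition detection" means finding the same word repeated back to back. The function names and the keys `word`, `count` and `index` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def frame_confidence(logits: Sequence[Sequence[float]]) -> float:
    """Mean of the per-frame maximum softmax probability (assumed metric)."""
    if not logits:
        return 0.0
    total = 0.0
    for frame in logits:
        peak = max(frame)
        # Subtract the peak before exponentiating for numerical stability.
        denom = sum(math.exp(x - peak) for x in frame)
        total += 1.0 / denom  # softmax probability of the arg-max class
    return total / len(logits)


def find_word_repetitions(transcript: str) -> List[Dict[str, object]]:
    """Return runs of the same word repeated consecutively."""
    words = transcript.split()
    events: List[Dict[str, object]] = []
    i = 0
    while i < len(words):
        j = i + 1
        while j < len(words) and words[j] == words[i]:
            j += 1
        if j - i > 1:  # the word occurs at least twice in a row
            events.append(
                {"type": "repetition", "word": words[i], "count": j - i, "index": i}
            )
        i = j
    return events
```

In a real run, the logits would come from the CTC model's output, one row per audio frame; here they are ordinary nested lists so the sketch has no dependencies.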
Removed Models
The following models have been removed to simplify the engine:
- MMS Language Identification (LID): facebook/mms-lid-126
  - Previously used for language detection
  - No longer needed: IndicWav2Vec handles Hindi natively
- Isolation Forest (sklearn)
  - Previously used for anomaly detection
  - Removed: simple text-based analysis is used instead
Removed Libraries
The following signal processing libraries are no longer used:
- parselmouth (Praat) - Voice quality analysis
- fastdtw - Repetition detection via DTW
- sklearn - Machine learning algorithms
- Complex acoustic feature extraction (MFCC, formants, etc.)
Current Pipeline
Audio Input
↓
IndicWav2Vec Hindi ASR
↓
Text Transcription
↓
Basic Text Analysis
↓
Results (transcript + simple stutter detection)
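The stages above can be sketched as a single function with the ASR step passed in as a callable, which makes the flow testable without loading the model. This is an illustrative sketch only: `run_pipeline` and the `repeated_words` key are hypothetical and are not part of the engine's API.

```python
from typing import Callable, Dict, List


def run_pipeline(audio_path: str, transcribe: Callable[[str], str]) -> Dict[str, object]:
    """Audio input -> ASR -> transcript -> basic text analysis -> results."""
    # ASR stage: in the engine this would be the IndicWav2Vec Hindi model.
    transcript = transcribe(audio_path)

    # Basic text analysis: collect words that immediately repeat the previous word.
    words: List[str] = transcript.split()
    repeated = [word for prev, word in zip(words, words[1:]) if prev == word]

    # Results: the transcript plus the simple repetition findings.
    return {"actual_transcript": transcript, "repeated_words": repeated}


if __name__ == "__main__":
    # A stand-in transcriber so the sketch runs without any model.
    print(run_pipeline("sample.wav", lambda path: "जा जा रहा हूँ"))
```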
API Response Format
The simplified engine returns:
{
  "actual_transcript": "transcribed text",
  "target_transcript": "expected text (if provided)",
  "mismatched_chars": ["timestamps of low confidence regions"],
  "mismatch_percentage": 0.0,
  "ctc_loss_score": 0.0,
  "stutter_timestamps": [{"type": "repetition", "start": 0.0, "end": 0.5, ...}],
  "total_stutter_duration": 0.0,
  "stutter_frequency": 0.0,
  "severity": "none|mild|moderate|severe",
  "confidence_score": 0.8,
  "speaking_rate_sps": 0.0,
  "analysis_duration_seconds": 0.0,
  "model_version": "indicwav2vec-hindi-asr-v1"
}
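A client consuming this response may want to check its shape before using it. The sketch below is one possible check, with the field types inferred from the example above rather than taken from the engine's code. It treats `target_transcript` as optional because the example marks it "if provided", and the name `check_response` is hypothetical.

```python
from typing import Any, Dict, List

SEVERITY_LEVELS = ("none", "mild", "moderate", "severe")
NUMBER = (int, float)

# Field types inferred from the example response (an assumption).
REQUIRED_FIELDS = {
    "actual_transcript": str,
    "mismatched_chars": list,
    "mismatch_percentage": NUMBER,
    "ctc_loss_score": NUMBER,
    "stutter_timestamps": list,
    "total_stutter_duration": NUMBER,
    "stutter_frequency": NUMBER,
    "severity": str,
    "confidence_score": NUMBER,
    "speaking_rate_sps": NUMBER,
    "analysis_duration_seconds": NUMBER,
    "model_version": str,
}


def check_response(result: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the shape looks right."""
    problems: List[str] = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in result:
            problems.append(f"missing field: {name}")
        elif not isinstance(result[name], expected):
            problems.append(f"wrong type for {name}: {type(result[name]).__name__}")
    severity = result.get("severity")
    if isinstance(severity, str) and severity not in SEVERITY_LEVELS:
        problems.append(f"unknown severity: {severity}")
    return problems
```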
Dependencies
Required:
- transformers 4.35.0 - For IndicWav2Vec model
- torch 2.0.1 - PyTorch backend
- librosa ≥0.10.0 - Audio loading (16 kHz resampling)
- numpy - Array operations
Optional (for legacy methods, not used in ASR mode):
- parselmouth - Voice quality (not used)
- fastdtw - DTW algorithm (not used)
- sklearn - ML algorithms (not used)
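To confirm which of the required packages are present in an environment, the standard library can report installed versions without importing the packages themselves. This is a small convenience sketch; the helper name `installed_versions` is hypothetical, and it reports versions only, without checking them against the numbers listed above.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

REQUIRED_PACKAGES = ("transformers", "torch", "librosa", "numpy")


def installed_versions(
    packages: Iterable[str] = REQUIRED_PACKAGES,
) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```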
Usage
from diagnosis.ai_engine.detect_stuttering import get_stutter_detector

detector = get_stutter_detector()
result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    proper_transcript="expected text",  # optional
    language="hindi",                   # default: hindi
)
print(result['actual_transcript'])  # ASR transcription
Notes
- The engine focuses only on ASR transcription
- Stutter detection is simplified to text-based repetition analysis
- No complex acoustic feature extraction
- Faster and lighter than the previous multi-model approach
- Optimized for Hindi but can handle other Indian languages