
AI Engine Model Summary

Simplified ASR-Only Configuration

This engine has been simplified to use ONLY the IndicWav2Vec Hindi model for Automatic Speech Recognition (ASR).


Active Model

1. IndicWav2Vec Hindi (Primary & Only Model)

  • Model ID: ai4bharat/indicwav2vec-hindi
  • Type: Wav2Vec2ForCTC
  • Purpose: Automatic Speech Recognition (ASR) for Hindi and other Indian languages
  • Status: ✅ Active - Loaded at startup
  • Location: detect_stuttering.py lines 26, 148-156
  • Authentication: Requires HF_TOKEN environment variable
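
A minimal sketch of how the model might be loaded with the token, assuming the standard transformers API. The helper names read_hf_token and load_asr_model are illustrative and are not taken from detect_stuttering.py; the token argument is accepted by recent transformers releases, while older ones may expect use_auth_token instead.

```python
import os

MODEL_ID = "ai4bharat/indicwav2vec-hindi"


def read_hf_token(env=None):
    """Return the Hugging Face token from HF_TOKEN, or raise if it is missing."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; it is required to download " + MODEL_ID)
    return token


def load_asr_model(model_id=MODEL_ID, token=None):
    """Load the processor and CTC model (hypothetical helper, not the engine's own code)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    token = token or read_hf_token()
    processor = Wav2Vec2Processor.from_pretrained(model_id, token=token)
    model = Wav2Vec2ForCTC.from_pretrained(model_id, token=token)
    model.eval()  # inference only, no training
    return processor, model
```

Failing early when HF_TOKEN is absent gives a clearer error at startup than a download failure from the hub.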

Features:

  • Speech-to-text transcription
  • Confidence scoring from model predictions
  • Text-based stutter analysis (simple repetition detection)
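
As an illustration of what simple text-based repetition detection can look like, the sketch below flags words that are repeated back to back in a transcript. The function name and the shape of the returned dictionaries are assumptions for this example; the engine's actual rules may differ.

```python
def find_word_repetitions(transcript):
    """Return runs of the same word repeated back to back in a transcript."""
    words = transcript.split()
    runs = []
    i = 0
    while i < len(words):
        j = i + 1
        # Extend the run while the next word is identical to the current one.
        while j < len(words) and words[j] == words[i]:
            j += 1
        if j - i > 1:
            runs.append(
                {"type": "repetition", "word": words[i], "count": j - i, "start_index": i}
            )
        i = j
    return runs


# Example: "मैं" is repeated three times and "जा" twice.
for run in find_word_repetitions("मैं मैं मैं घर जा जा रहा हूँ"):
    print(run["word"], run["count"])
```

This only sees whole-word repetitions that survive transcription; sound-level events such as prolongations or blocks are not visible in text alone.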

Removed Models

The following models have been removed to simplify the engine:

  1. ❌ MMS Language Identification (LID) - facebook/mms-lid-126

    • Previously used for language detection
    • No longer needed - IndicWav2Vec handles Hindi natively
  2. ❌ Isolation Forest (sklearn)

    • Previously used for anomaly detection
    • Removed - using simple text-based analysis instead

Removed Libraries

The following libraries and processing steps are no longer used:

  • ❌ parselmouth (Praat) - Voice quality analysis
  • ❌ fastdtw - Repetition detection via DTW
  • ❌ sklearn - Machine learning algorithms
  • ❌ Complex acoustic feature extraction (MFCC, formants, etc.)

Current Pipeline

Audio Input
    ↓
IndicWav2Vec Hindi ASR
    ↓
Text Transcription
    ↓
Basic Text Analysis
    ↓
Results (transcript + simple stutter detection)
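
The ASR-to-transcription step in the pipeline above relies on CTC decoding. The sketch below shows greedy CTC decoding over per-frame probabilities, together with one possible confidence score (the mean winning probability over the emitted characters). It is written in plain Python for readability; the blank index of 0, the toy vocabulary, and the confidence definition are assumptions, not the engine's documented behaviour.

```python
def ctc_greedy_decode(frame_probs, vocab, blank_id=0):
    """Greedy CTC decoding over per-frame probability rows.

    Returns (text, confidence), where confidence is the mean winning
    probability over the frames that emitted a character.
    """
    chars, kept_probs = [], []
    prev_id = None
    for row in frame_probs:
        best_id = max(range(len(row)), key=row.__getitem__)
        # CTC rule: collapse consecutive repeats, then drop blanks.
        if best_id != prev_id and best_id != blank_id:
            chars.append(vocab[best_id])
            kept_probs.append(row[best_id])
        prev_id = best_id
    confidence = sum(kept_probs) / len(kept_probs) if kept_probs else 0.0
    return "".join(chars), confidence


# Toy example with a three-symbol vocabulary; index 0 is the CTC blank.
vocab = ["<pad>", "a", "b"]
frames = [
    [0.10, 0.80, 0.10],  # "a"
    [0.10, 0.70, 0.20],  # "a" again, collapsed into the previous frame
    [0.90, 0.05, 0.05],  # blank, separates the two "a" characters
    [0.10, 0.60, 0.30],  # "a"
    [0.20, 0.20, 0.60],  # "b"
]
text, confidence = ctc_greedy_decode(frames, vocab)
print(text, round(confidence, 3))
```

In practice the frame probabilities would come from a softmax over the model's logits, and the processor's own decoder would map token IDs back to text.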

API Response Format

The simplified engine returns:

{
  "actual_transcript": "transcribed text",
  "target_transcript": "expected text (if provided)",
  "mismatched_chars": ["timestamps of low confidence regions"],
  "mismatch_percentage": 0.0,
  "ctc_loss_score": 0.0,
  "stutter_timestamps": [{"type": "repetition", "start": 0.0, "end": 0.5, ...}],
  "total_stutter_duration": 0.0,
  "stutter_frequency": 0.0,
  "severity": "none|mild|moderate|severe",
  "confidence_score": 0.8,
  "speaking_rate_sps": 0.0,
  "analysis_duration_seconds": 0.0,
  "model_version": "indicwav2vec-hindi-asr-v1"
}
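
A client may want to sanity-check a response before using it. The sketch below checks for the keys listed above and for a valid severity value. The validate_response helper is hypothetical, and treating every listed key as always present is an assumption.

```python
EXPECTED_KEYS = {
    "actual_transcript", "target_transcript", "mismatched_chars",
    "mismatch_percentage", "ctc_loss_score", "stutter_timestamps",
    "total_stutter_duration", "stutter_frequency", "severity",
    "confidence_score", "speaking_rate_sps", "analysis_duration_seconds",
    "model_version",
}
SEVERITIES = {"none", "mild", "moderate", "severe"}


def validate_response(resp):
    """Return a list of problems found in a response dict (empty if none)."""
    problems = []
    missing = sorted(EXPECTED_KEYS - set(resp))
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    if resp.get("severity") not in SEVERITIES:
        problems.append("unexpected severity: " + repr(resp.get("severity")))
    for event in resp.get("stutter_timestamps", []):
        # Each event should span a non-negative interval in seconds.
        if event.get("end", 0.0) < event.get("start", 0.0):
            problems.append("event ends before it starts: " + repr(event))
    return problems
```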

Dependencies

Required:

  • transformers 4.35.0 - For IndicWav2Vec model
  • torch 2.0.1 - PyTorch backend
  • librosa ≥0.10.0 - Audio loading (16kHz resampling)
  • numpy - Array operations
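
To confirm that an environment matches these versions, a small check along the following lines could be run at startup. It uses only the standard library; the helper names are illustrative, and treating transformers and torch as exact pins while librosa is a minimum follows the list above.

```python
from importlib import metadata
from itertools import takewhile

PINNED = {"transformers": "4.35.0", "torch": "2.0.1"}
MINIMUM = {"librosa": "0.10.0"}
UNPINNED = ["numpy"]


def version_tuple(version):
    """Turn '2.0.1' or '2.0.1+cu118' into a comparable tuple such as (2, 0, 1)."""
    core = version.split("+")[0]  # drop local tags such as "+cu118"
    parts = []
    for piece in core.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_dependencies():
    """Return a list of human-readable dependency problems (empty if none)."""
    problems = []
    for name in list(PINNED) + list(MINIMUM) + UNPINNED:
        found = installed_version(name)
        if found is None:
            problems.append(f"{name} is not installed")
        elif name in PINNED and version_tuple(found) != version_tuple(PINNED[name]):
            problems.append(f"{name} {found} found, {PINNED[name]} expected")
        elif name in MINIMUM and version_tuple(found) < version_tuple(MINIMUM[name]):
            problems.append(f"{name} {found} found, at least {MINIMUM[name]} expected")
    return problems
```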

Optional (for legacy methods, not used in ASR mode):

  • parselmouth - Voice quality (not used)
  • fastdtw - DTW algorithm (not used)
  • sklearn - ML algorithms (not used)

Usage

from diagnosis.ai_engine.detect_stuttering import get_stutter_detector

detector = get_stutter_detector()
result = detector.analyze_audio(
    audio_path="path/to/audio.wav",
    proper_transcript="expected text",  # optional
    language="hindi"  # default: hindi
)

print(result['actual_transcript'])  # ASR transcription
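
Beyond the transcript, the other response fields can be read in the same way. The sketch below formats a short report from a result dictionary. The summarize_result helper is hypothetical and assumes each stutter event carries at least type, start, and end.

```python
def summarize_result(result):
    """Build a short, human-readable report from an analysis result dict."""
    lines = [
        f"Transcript: {result['actual_transcript']}",
        f"Severity: {result['severity']} (confidence {result['confidence_score']:.2f})",
    ]
    for event in result.get("stutter_timestamps", []):
        lines.append(f"  {event['type']}: {event['start']:.2f}s to {event['end']:.2f}s")
    return "\n".join(lines)


# Example with a hand-written result; real values come from analyze_audio().
sample = {
    "actual_transcript": "namaste namaste",
    "severity": "mild",
    "confidence_score": 0.8,
    "stutter_timestamps": [{"type": "repetition", "start": 0.0, "end": 0.5}],
}
print(summarize_result(sample))
```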

Notes

  • The engine focuses only on ASR transcription
  • Stutter detection is simplified to text-based repetition analysis
  • No complex acoustic feature extraction
  • Faster and lighter than the previous multi-model approach, since only one model is loaded at startup
  • Optimized for Hindi but can handle other Indian languages