# Transcript Debugging Guide

## Issue: Empty Transcripts ("No transcript available")

## Complete Flow Analysis

### 1. Django App → API Request (`slaq-version-c/diagnosis/ai_engine/detect_stuttering.py`)

**Location:** Lines 269-274

```python
response = requests.post(
    self.api_url,
    files=files,
    data={
        "transcript": proper_transcript if proper_transcript else "",
        "language": lang_code,
    },
    timeout=self.api_timeout
)
```

**Status:** ✅ Sends the transcript parameter correctly

---

### 2. API Receives Request (`slaq-version-c-ai-enginee/app.py`)

**Location:** Lines 70-73

```python
@app.post("/analyze")
async def analyze_audio(
    audio: UploadFile = File(...),
    transcript: str = Form("")  # ✅ Fixed: now uses Form() for multipart
):
```

**Status:** ✅ Fixed - now correctly receives the transcript via Form()

---

### 3. API Calls Model (`slaq-version-c-ai-enginee/app.py`)

**Location:** Line 106

```python
result = detector.analyze_audio(temp_file, transcript)
```

**Status:** ✅ Passes the transcript correctly

---

### 4. Model Transcribes Audio (`slaq-version-c-ai-enginee/diagnosis/ai_engine/detect_stuttering.py`)

**Location:** Lines 313-369 (`_transcribe_with_timestamps`)

**Potential issues:**
- ❓ IndicWav2Vec decoding might not work with `processor.batch_decode()`
- ❓ The tokenizer may need to be used directly
- ❓ The model might not be producing valid predictions

**Status:** ⚠️ **LIKELY ISSUE HERE** - the decoding method may be incorrect

---

### 5. Model Returns Result (`slaq-version-c-ai-enginee/diagnosis/ai_engine/detect_stuttering.py`)

**Location:** Lines 787-794

```python
actual_transcript = transcript if transcript else ""
target_transcript = proper_transcript if proper_transcript else transcript if transcript else ""
return {
    'actual_transcript': actual_transcript,
    'target_transcript': target_transcript,
    # ... other result fields ...
}
```

**Status:** ✅ Returns the transcripts correctly (provided `transcript` is not empty)

---

### 6. API Returns Response (`slaq-version-c-ai-enginee/app.py`)

**Location:** Lines 109-113

```python
actual = result.get('actual_transcript', '')
target = result.get('target_transcript', '')
logger.info(f"📝 Result transcripts - Actual: '{actual[:100]}' (len: {len(actual)}), Target: '{target[:100]}' (len: {len(target)})")
return result
```

**Status:** ✅ Returns JSON with both transcripts

---

### 7. Django Receives Response (`slaq-version-c/diagnosis/ai_engine/detect_stuttering.py`)

**Location:** Lines 279-410

```python
result = response.json()
# ... formatting ...
actual_transcript = str(api_result.get('actual_transcript', '')).strip()
target_transcript = str(api_result.get('target_transcript', '')).strip()
```

**Status:** ✅ Extracts the transcripts correctly

---

### 8. Django Saves to Database (`slaq-version-c/diagnosis/tasks.py`)

**Location:** Lines 141-142

```python
actual_transcript=actual_transcript,
target_transcript=target_transcript,
```

**Status:** ✅ Saves correctly

---

## Root Cause Analysis

### Most Likely Issue: Transcription Decoding

The IndicWav2Vec model (`ai4bharat/indicwav2vec-hindi`) may require:

1. **Direct tokenizer access** instead of `processor.batch_decode()`
2. **CTC decoding** with the proper tokenizer
3. **Special handling** for Indic scripts

### Fix Applied

Updated `_transcribe_with_timestamps()` to:

1. Try multiple decoding methods (see the sketch after this list)
2. Use the tokenizer directly if available
3. Add comprehensive error logging
4. Log predicted IDs for debugging
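For reference, the fallback pattern looks roughly like the sketch below. This is a minimal illustration, not the repository's actual code: the helper name `transcribe_with_fallback` is hypothetical, timestamp extraction and the app's logger are omitted, and it assumes the checkpoint loads with the standard Hugging Face `Wav2Vec2Processor` / `Wav2Vec2ForCTC` classes.

```python
# Minimal sketch of the multi-method decoding fallback (hypothetical helper;
# the real _transcribe_with_timestamps() also extracts timestamps and logging).
import torch

def transcribe_with_fallback(processor, model, waveform, sampling_rate=16000):
    """Greedy CTC decode with a tokenizer fallback. Returns '' on failure."""
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    # Greedy CTC decoding: take the most likely token at every frame.
    predicted_ids = torch.argmax(logits, dim=-1)
    print(f"📝 predicted_ids (first 50): {predicted_ids[0][:50].tolist()}")

    # Method 1: the usual processor.batch_decode() path.
    try:
        text = processor.batch_decode(predicted_ids)[0]
        if text.strip():
            return text
    except Exception as e:
        print(f"⚠️ processor.batch_decode() failed: {e}")

    # Method 2: fall back to the tokenizer directly, if the processor has one.
    tokenizer = getattr(processor, "tokenizer", None)
    if tokenizer is not None:
        try:
            return tokenizer.batch_decode(predicted_ids)[0]
        except Exception as e:
            print(f"⚠️ tokenizer.batch_decode() failed: {e}")
    return ""
```

If the logged `predicted_ids` are dominated by a single repeated id (typically the CTC blank/padding token), the model is not producing useful predictions in the first place, and no change to the decoding method will help.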
---

## Debugging Steps

### 1. Check API Logs

When processing audio, look for:

```
📝 Transcribed text: '...' (length: X)
📝 Final return - Actual: '...' (len: X), Target: '...' (len: Y)
📝 Result transcripts - Actual: '...' (len: X), Target: '...' (len: Y)
```

### 2. Check Django Logs

Look for:

```
📝 Final transcripts - Actual: X chars, Target: Y chars
📝 Saving transcripts - Actual: X chars, Target: Y chars
```

### 3. Check Database

Query the `AnalysisResult` table:

```sql
SELECT actual_transcript, target_transcript,
       LENGTH(actual_transcript) AS actual_len,
       LENGTH(target_transcript) AS target_len
FROM diagnosis_analysisresult
ORDER BY created_at DESC
LIMIT 5;
```

### 4. Test API Directly

```bash
curl -X POST "http://localhost:7860/analyze" \
  -F "audio=@test.wav" \
  -F "transcript=test transcript" \
  -F "language=hin"
```

Check the response JSON for `actual_transcript` and `target_transcript`.

---

## Next Steps

1. **Rebuild the Docker image** with the latest changes
2. **Check logs** during audio processing
3. **Verify the processor structure** - the logs will show the processor attributes
4. **Test with Hindi audio** - the model is optimized for Hindi
5. **Check that the model loads correctly** - verify HF_TOKEN is working

---

## Expected Log Output (Success)

```
🚀 Initializing Advanced AI Engine on cpu...
✅ HF_TOKEN found - using authenticated model access
📋 Processor type:
📋 Processor attributes: ['batch_decode', 'decode', 'feature_extractor', 'tokenizer', ...]
📋 Tokenizer type:
📝 Transcribed text: 'नमस्ते मैं हिंदी बोल रहा हूं' (length: 25)
📝 Final return - Actual: 'नमस्ते मैं हिंदी बोल रहा हूं' (len: 25), Target: '...' (len: X)
```

---

## If Still Empty

1. **The model may not be loaded correctly** - check HF_TOKEN
2. **Audio format issue** - ensure 16 kHz mono WAV
3. **The model is not producing predictions** - check `predicted_ids` in the logs
4. **Tokenizer mismatch** - IndicWav2Vec may need special tokenizer initialization
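To isolate whether the fault is in the model or in the API plumbing, you can also run the model directly on a WAV file, bypassing the API entirely. A minimal sketch, assuming `soundfile` is installed, HF_TOKEN is set in the environment, `test.wav` is a placeholder for a 16 kHz mono recording, and the checkpoint loads with the standard Wav2Vec2 classes:

```python
# Standalone sanity check: transcribe a WAV file directly, bypassing the API.
import os

import soundfile as sf
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_ID = "ai4bharat/indicwav2vec-hindi"
token = os.environ.get("HF_TOKEN")  # None is fine if the checkpoint is public

processor = Wav2Vec2Processor.from_pretrained(MODEL_ID, token=token)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID, token=token)
model.eval()

waveform, rate = sf.read("test.wav")  # placeholder path
assert rate == 16000, f"expected 16 kHz audio, got {rate} Hz"
assert waveform.ndim == 1, "expected mono audio"

inputs = processor(waveform.astype("float32"), sampling_rate=rate, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)

# Text here but not through the API points at the plumbing (steps 5-8);
# no text here points at the model/decoding path itself (step 4).
print("predicted_ids:", predicted_ids[0][:50].tolist())
print("decoded:", processor.batch_decode(predicted_ids)[0])
```

If this script produces a transcript while the API does not, focus on steps 5-8 of the flow above; if it produces nothing, focus on step 4.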