BioMistral-7B Symptom-to-Diagnosis Classifier
Fine-tuned BioMistral-7B for medical symptom classification using QLoRA
Model Summary
This is BioMistral-7B fine-tuned to classify symptom descriptions into 10 common diagnoses. The model was trained with QLoRA (Quantized Low-Rank Adaptation) on a curated dataset of 10,000 symptom-diagnosis pairs (8,000 for training, 1,000 for validation, 1,000 for testing) and reaches 99.1% accuracy on the test set.
- Base Model: BioMistral/BioMistral-7B
- Task: Multi-class Text Classification (10 classes)
- Fine-tuning Method: QLoRA with 4-bit quantization
- Training Data: 8,000 samples (10 diagnosis classes)
- Validation Data: 1,000 samples
- Test Data: 1,000 samples
- Model Type: Sequence Classification
- Language: English
- License: MIT
Intended Use
✅ Appropriate Uses
- Educational demonstrations of medical AI systems
- Research in biomedical NLP and text classification
- Experiments with medical symptom understanding
- Teaching about AI in healthcare contexts
- Baseline model for medical classification tasks
❌ Not Intended For
- Clinical diagnosis or real medical decision-making
- Emergency medical decisions
- Treatment planning or recommendations
- Any deployment in healthcare settings
- Replacement of professional medical judgment
⚠️ Medical Disclaimer
This model is for educational and research purposes ONLY.
- Outputs may be incorrect, incomplete, or biased
- Does NOT replace professional medical advice
- NOT validated for clinical use
- NOT approved by any regulatory body
Supported Diagnoses (10 Classes)
| Class ID | Diagnosis | Example Symptoms |
|---|---|---|
| 0 | Acute Bronchitis | cough, chest pain, shortness of breath, mucus production |
| 1 | Anxiety | anxiety and nervousness, rapid heartbeat, shortness of breath, panic attacks |
| 2 | Conjunctivitis due to Allergy | eye redness, itchiness of eye, lacrimation, watery eyes |
| 3 | Eczema | skin rash, skin dryness, itching of skin, abnormal appearing skin |
| 4 | Infectious Gastroenteritis | nausea, vomiting, diarrhea, abdominal cramps |
| 5 | Pneumonia | fever, cough, difficulty breathing, chest pain |
| 6 | Psoriasis | abnormal appearing skin, skin lesion, skin rash |
| 7 | Spondylosis | back pain, neck pain, neck stiffness, limited mobility |
| 8 | Sprain or Strain | joint pain, swelling, bruising, limited movement |
| 9 | Strep Throat | sore throat, fever, difficulty swallowing, swollen lymph nodes |
Performance
Test Set Results (n=1,000)
| Metric | Score |
|---|---|
| Overall Accuracy | 99.1% |
| Precision (weighted) | 99.11% |
| Recall (weighted) | 99.10% |
| F1-Score (weighted) | 99.10% |
| Test Loss | 0.0313 |
Per-Class Performance
| Diagnosis | Accuracy | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|---|
| Acute Bronchitis | 97.0% | 97.98% | 97.0% | 97.49% | 100 |
| Anxiety | 100.0% | 100.0% | 100.0% | 100.0% | 100 |
| Conjunctivitis | 100.0% | 100.0% | 100.0% | 100.0% | 100 |
| Eczema | 100.0% | 100.0% | 100.0% | 100.0% | 100 |
| Gastroenteritis | 100.0% | 100.0% | 100.0% | 100.0% | 100 |
| Pneumonia | 98.0% | 96.08% | 98.0% | 97.03% | 100 |
| Psoriasis | 100.0% | 100.0% | 100.0% | 100.0% | 100 |
| Spondylosis | 100.0% | 97.09% | 100.0% | 98.52% | 100 |
| Sprain or Strain | 97.0% | 100.0% | 97.0% | 98.48% | 100 |
| Strep Throat | 99.0% | 100.0% | 99.0% | 99.50% | 100 |
Error Analysis:
- Total misclassifications: 9 out of 1,000 (0.9% error rate)
- Main confusion: Acute Bronchitis ↔ Pneumonia (5 errors)
- Minor confusion: Sprain/Strain ↔ Spondylosis (3 errors)
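The headline numbers can be cross-checked against the per-class table with a little arithmetic. The sketch below assumes that the per-class Accuracy column equals recall (correct predictions out of 100 samples per class); the variable and function names are illustrative and not part of the model's code.

```python
# Correct predictions per class, read from the Recall column (support = 100 each).
correct_per_class = {
    "Acute Bronchitis": 97,
    "Anxiety": 100,
    "Conjunctivitis": 100,
    "Eczema": 100,
    "Gastroenteritis": 100,
    "Pneumonia": 98,
    "Psoriasis": 100,
    "Spondylosis": 100,
    "Sprain or Strain": 97,
    "Strep Throat": 99,
}
SUPPORT = 100  # test samples per class

total = SUPPORT * len(correct_per_class)
correct = sum(correct_per_class.values())
errors = total - correct
accuracy = correct / total


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given on the same scale)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall for Acute Bronchitis, taken from the table (in percent).
bronchitis_f1 = f1_score(97.98, 97.0)

print(f"accuracy={accuracy:.1%}, errors={errors}, bronchitis F1={bronchitis_f1:.2f}%")
# accuracy=99.1%, errors=9, bronchitis F1=97.49%
```

Because every class has the same support, the overall accuracy is simply the mean of the per-class recalls.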
Validation Performance
| Metric | Score |
|---|---|
| Validation Accuracy | 97.7% |
| Validation Loss | 0.0576 |
Model Architecture
Base Model: BioMistral-7B
- Parameters: 7 billion
- Architecture: Mistral-based transformer optimized for biomedical text
- Specialization: Pre-trained on biomedical literature
Fine-Tuning: QLoRA Configuration
LoRA Config:
- Task Type: SEQ_CLS (Sequence Classification)
- Rank (r): 16
- Alpha: 32
- Dropout: 0.1
- Target Modules: ['q_proj', 'v_proj', 'k_proj', 'o_proj']
- Bias: none
- Trainable Parameters: 13,672,448 (0.19% of total)
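The trainable-parameter count can be reproduced from the LoRA settings. This is a sketch under stated assumptions: the layer dimensions below are the standard Mistral-7B values (hidden size 4096, 1024-dimensional key/value projections, 32 layers), which this card does not list, and the bias-free classification head is assumed to be trained in full alongside the adapters.

```python
# Assumed Mistral-7B dimensions (not stated in this card).
HIDDEN_SIZE = 4096
KV_SIZE = 1024       # 8 key/value heads x 128 dims (grouped-query attention)
NUM_LAYERS = 32

# Values taken from this card.
RANK = 16
NUM_LABELS = 10
TOTAL_PARAMS = 7_124_373_504

# (in_features, out_features) of each targeted projection.
target_shapes = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "k_proj": (HIDDEN_SIZE, KV_SIZE),
    "v_proj": (HIDDEN_SIZE, KV_SIZE),
    "o_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
}


def lora_params(in_features, out_features, rank):
    """LoRA adds two matrices per adapted layer: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in target_shapes.values())
adapter_params = per_layer * NUM_LAYERS
head_params = HIDDEN_SIZE * NUM_LABELS  # assumed: bias-free head, fully trainable
trainable = adapter_params + head_params

print(f"{trainable:,} trainable ({trainable / TOTAL_PARAMS:.4%} of total)")
# 13,672,448 trainable (0.1919% of total)
```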
Quantization
BitsAndBytes Config:
- Load in 4-bit: True
- Quantization Type: nf4
- Compute dtype: float16
- Double Quantization: True
Total Parameters: 7,124,373,504
Trainable Parameters: 13,672,448 (0.1919%)
Memory Footprint: ~4.5 GB (4-bit quantized)
Training Details
Dataset
Total Samples: 10,000 symptom-diagnosis pairs
| Split | Samples | Percentage |
|---|---|---|
| Train | 8,000 | 80% |
| Validation | 1,000 | 10% |
| Test | 1,000 | 10% |
Data Format:
{
    "text": "cough ,fever and difficulty breathing",
    "diagnosis": "pneumonia",
    "label": 5  # Mapped to class ID
}
Important: Symptoms follow a specific format:
- Space before comma: symptom1 ,symptom2
- Use "and" before the last symptom
- Lowercase medical terminology
- Example: "nausea ,vomiting ,diarrhea and abdominal cramps"
Training Hyperparameters
| Parameter | Value |
|---|---|
| Training Regime | Supervised Fine-tuning |
| Epochs | 10 (early stopped at epoch 4) |
| Batch Size (per device) | 8 |
| Gradient Accumulation Steps | 4 |
| Effective Batch Size | 32 |
| Learning Rate | 2e-4 |
| Learning Rate Scheduler | Linear |
| Warmup Steps | 100 |
| Weight Decay | 0.01 |
| Max Sequence Length | 128 |
| Optimizer | AdamW (8-bit paged) |
| Early Stopping Patience | 3 epochs |
| FP16 Training | Enabled |
Training Infrastructure
- Hardware: NVIDIA A100 GPU
- Training Time: 44 minutes (2,668 seconds)
- Training Steps: 1,000 (out of planned 2,500)
- Evaluation Strategy: Every 100 steps
- Save Strategy: Best model based on accuracy
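The step counts follow from the hyperparameters and the dataset size. A minimal sketch of that arithmetic, using only values reported in this card (the constant names are illustrative):

```python
import math

TRAIN_SAMPLES = 8_000
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1          # single A100, as reported in this card
PLANNED_EPOCHS = 10
STEPS_COMPLETED = 1_000

# One optimizer step consumes this many samples.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
steps_per_epoch = math.ceil(TRAIN_SAMPLES / effective_batch)
planned_steps = steps_per_epoch * PLANNED_EPOCHS
epochs_completed = STEPS_COMPLETED / steps_per_epoch

print(effective_batch, steps_per_epoch, planned_steps, f"{epochs_completed:.1f}")
# 32 250 2500 4.0
```

With 250 optimizer steps per epoch, stopping at step 1,000 corresponds to the end of epoch 4.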
Training Progress
| Step | Train Loss | Val Loss | Val Acc | Val F1 |
|---|---|---|---|---|
| 100 | 0.3509 | 0.3096 | 93.4% | 93.3% |
| 200 | 0.2489 | 0.3245 | 96.4% | 96.4% |
| 300 | 0.1496 | 0.1042 | 96.9% | 96.9% |
| 400 | 0.0973 | 0.0994 | 96.8% | 96.8% |
| 500 | 0.1595 | 0.1555 | 97.5% | 97.5% |
| 600 | 0.0907 | 0.0850 | 97.3% | 97.3% |
| 700 | 0.1784 | 0.0576 | 97.7% | 97.7% |
| 800 | 0.0646 | 0.0857 | 97.7% | 97.7% |
| 900 | 0.0559 | 0.1474 | 97.7% | 97.7% |
| 1000 | 0.0732 | 0.0958 | 96.7% | 96.7% |
Best Checkpoint: Step 700 (lowest validation loss: 0.0576)
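The stopping point can be replayed from the logged validation losses. The sketch below is one possible reading, not the actual training code: it assumes the monitored metric is validation loss and that patience is counted in evaluations (one every 100 steps). Under those assumptions it reproduces both the best checkpoint and the final step.

```python
# (step, validation loss) pairs copied from the Training Progress table.
val_loss_log = [
    (100, 0.3096), (200, 0.3245), (300, 0.1042), (400, 0.0994), (500, 0.1555),
    (600, 0.0850), (700, 0.0576), (800, 0.0857), (900, 0.1474), (1000, 0.0958),
]


def replay_early_stopping(log, patience):
    """Return (best_step, stop_step); stop_step is None if patience is never exhausted."""
    best_step, best_loss = None, float("inf")
    bad_evals = 0  # consecutive evaluations without improvement
    for step, loss in log:
        if loss < best_loss:
            best_step, best_loss = step, loss
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return best_step, step
    return best_step, None


best_step, stop_step = replay_early_stopping(val_loss_log, patience=3)
print(best_step, stop_step)
# 700 1000
```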
How to Use
Installation
pip install transformers peft torch bitsandbytes accelerate
Quick Start
from transformers import AutoTokenizer, AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import PeftModel
import torch
# Model configuration
MODEL_NAME = "Sugandha-Chauhan/BioMistral-7B-SymptomDiagnosis"
BASE_MODEL = "BioMistral/BioMistral-7B"
# Quantization config (for efficient inference)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
# Load base model
model = AutoModelForSequenceClassification.from_pretrained(
    BASE_MODEL,
    num_labels=10,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.float16
)
# Load LoRA adapters
model = PeftModel.from_pretrained(model, MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
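# Fall back to the EOS token if the tokenizer has no pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
# The classification head needs pad_token_id to find the last real token in padded batches
model.config.pad_token_id = tokenizer.pad_token_id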
model.eval()
print("Model loaded successfully!")
Inference
# Diagnosis class mapping
DIAGNOSIS_CLASSES = {
    0: "acute bronchitis",
    1: "anxiety",
    2: "conjunctivitis due to allergy",
    3: "eczema",
    4: "infectious gastroenteritis",
    5: "pneumonia",
    6: "psoriasis",
    7: "spondylosis",
    8: "sprain or strain",
    9: "strep throat"
}
def predict_diagnosis(symptoms_text):
    """
    Predict diagnosis from symptoms
    Args:
        symptoms_text: str, formatted symptoms
            e.g., "nausea ,vomiting ,diarrhea and fever"
    Returns:
        diagnosis: str, predicted diagnosis name
        confidence: float, prediction confidence (0-1)
    """
    # Tokenize
    inputs = tokenizer(
        symptoms_text,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=128
    )
    # Move to device
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}
    # Predict
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
    # Get probabilities
    probabilities = torch.softmax(logits, dim=-1)
    confidence, predicted_class = torch.max(probabilities, dim=-1)
    # Map to diagnosis
    diagnosis = DIAGNOSIS_CLASSES[predicted_class.item()]
    confidence_score = confidence.item()
    return diagnosis, confidence_score
# Example usage
symptoms = "nausea ,vomiting ,diarrhea and abdominal cramps"
diagnosis, confidence = predict_diagnosis(symptoms)
print(f"Diagnosis: {diagnosis}")
print(f"Confidence: {confidence:.1%}")
Output:
Diagnosis: infectious gastroenteritis
Confidence: 100.0%
Batch Prediction
def batch_predict(symptoms_list):
    """Predict multiple symptom texts at once"""
    inputs = tokenizer(
        symptoms_list,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=128
    )
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
    probabilities = torch.softmax(logits, dim=-1)
    confidences, predicted_classes = torch.max(probabilities, dim=-1)
    results = []
    for pred_class, conf in zip(predicted_classes, confidences):
        results.append({
            'diagnosis': DIAGNOSIS_CLASSES[pred_class.item()],
            'confidence': conf.item()
        })
    return results
# Example
symptoms_batch = [
    "fever ,cough and difficulty breathing",
    "anxiety and nervousness ,rapid heartbeat and shortness of breath",
    "skin rash ,itching of skin and abnormal appearing skin"
]
results = batch_predict(symptoms_batch)
for i, result in enumerate(results):
    print(f"{i+1}. {result['diagnosis']} ({result['confidence']:.1%})")
Input Format Requirements
CRITICAL: The model expects symptoms in a specific format matching its training data.
✅ Correct Format
# Space BEFORE comma, 'and' before last symptom
"nausea ,vomiting ,diarrhea and abdominal cramps"
"cough ,fever and difficulty breathing"
"eye redness ,itchiness of eye and lacrimation"
❌ Incorrect Format
# No spaces before commas
"nausea, vomiting, diarrhea, fever" # Will likely fail
# Missing 'and' before last symptom
"nausea ,vomiting ,diarrhea ,fever" # Suboptimal
# Capitalized
"Nausea ,Vomiting ,Diarrhea and Fever" # Wrong case
Format Rules
- Spacing: Space before each comma (symptom1 ,symptom2)
- Conjunction: Use "and" before the last symptom
- Case: Lowercase text
- Terminology: Medical terminology preferred
- Punctuation: No period at the end
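The format rules above can be applied mechanically when the symptoms are available as a list. The helper below is a hypothetical convenience function, not part of the released model code; returning a single symptom unchanged is an assumption, since the card only shows multi-symptom examples.

```python
def format_symptoms(symptoms):
    """Join symptom phrases in the training-data style: 'a ,b and c'."""
    # Lowercase, trim whitespace, drop trailing periods and empty entries.
    cleaned = [s.strip().lower().rstrip(".") for s in symptoms if s.strip()]
    if not cleaned:
        raise ValueError("at least one symptom is required")
    if len(cleaned) == 1:
        return cleaned[0]
    # Space BEFORE each comma, and 'and' before the last symptom.
    return " ,".join(cleaned[:-1]) + " and " + cleaned[-1]


print(format_symptoms(["Nausea", "vomiting", "diarrhea", "abdominal cramps"]))
# nausea ,vomiting ,diarrhea and abdominal cramps
```

The returned string can be passed directly to `predict_diagnosis`.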
Limitations
Scope Limitations
- Limited Conditions: Only 10 diagnoses (not comprehensive)
- Symptom Format: Highly dependent on exact text formatting
- No Severity: Cannot assess urgency or severity levels
- Single Diagnosis: Returns only one diagnosis (no differential)
- No Confidence Threshold: Always returns a prediction
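Two of these limitations (single diagnosis, no confidence threshold) can be softened in post-processing. The sketch below ranks all classes from raw logits and abstains when the top probability is low. The function names, the example logits, and the 0.5 cutoff are illustrative assumptions; softmax scores are not calibrated probabilities, so any threshold would need to be tuned on validation data.

```python
import math

DIAGNOSIS_CLASSES = [
    "acute bronchitis", "anxiety", "conjunctivitis due to allergy", "eczema",
    "infectious gastroenteritis", "pneumonia", "psoriasis", "spondylosis",
    "sprain or strain", "strep throat",
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_diagnoses(logits, top_k=3, min_confidence=0.5):
    """Return the top_k (diagnosis, probability) pairs, or [] to abstain."""
    probs = softmax(logits)
    ranked = sorted(zip(DIAGNOSIS_CLASSES, probs), key=lambda pair: pair[1], reverse=True)
    if ranked[0][1] < min_confidence:
        return []  # best guess is too uncertain to report
    return ranked[:top_k]


# Hypothetical logits, for illustration only.
confident = [0.1, 0.0, 0.2, 0.1, 0.0, 6.0, 0.1, 0.0, 0.2, 0.1]
ambiguous = [2.0, 0.1, 0.0, 0.1, 0.0, 2.1, 0.1, 0.0, 0.1, 0.0]

print(rank_diagnoses(confident)[0][0])  # pneumonia
print(rank_diagnoses(ambiguous))        # []
```

To use it with the model, pass `outputs.logits[0].tolist()` from the inference code as `logits`.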
Performance Limitations
- Symptom Overlap: Lower accuracy on conditions with similar symptoms
  - Eczema vs. Psoriasis (both skin conditions)
  - Acute Bronchitis vs. Pneumonia (both respiratory)
- Format Sensitivity: Performance drops with incorrectly formatted input
- Training Distribution: Best performance on symptoms similar to training data
- No Rare Conditions: Cannot identify conditions outside the 10 classes
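One way to reduce the format sensitivity noted above is to normalize free-text input before inference. The function below is a hypothetical pre-processing sketch, not part of the released code. It assumes symptoms are comma-separated, and it cannot tell a conjunction from a symptom name that itself contains "and" (for example "anxiety and nervousness"), so a final item that already contains " and " is left as it is.

```python
def normalize_symptom_text(text):
    """Rewrite a comma-separated symptom string into the 'a ,b and c' style."""
    # Lowercase and drop a trailing period.
    text = text.strip().lower().rstrip(".").strip()
    parts = [p.strip() for p in text.split(",") if p.strip()]
    if not parts:
        raise ValueError("no symptoms found")
    if len(parts) > 1 and parts[-1].startswith("and "):
        # "a, b, and c": drop the conjunction so it is re-added below.
        parts[-1] = parts[-1][4:].strip()
    elif " and " in parts[-1]:
        # The last item already carries a conjunction; only fix the comma spacing.
        return " ,".join(parts)
    if len(parts) == 1:
        return parts[0]
    return " ,".join(parts[:-1]) + " and " + parts[-1]


print(normalize_symptom_text("Nausea, Vomiting, Diarrhea, Fever."))
# nausea ,vomiting ,diarrhea and fever
```

Input that is already correctly formatted passes through unchanged.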
Technical Limitations
- Quantization Effects: 4-bit quantization may introduce minor accuracy variations
- Sequence Length: Inputs are truncated to 128 tokens (sufficient for symptom lists)
- No Multi-label: Cannot predict multiple concurrent conditions
- Fixed Vocabulary: Limited to medical terms seen during training
Bias and Ethical Considerations
Potential Biases
- Training Data Bias: Reflects symptom descriptions in training corpus
- Language Bias: English-only; may not generalize to other languages
- Medical Terminology: May perform better on formal medical terms
- Demographic Bias: Training data may not represent all populations equally
Ethical Use
- Transparency: Always disclose AI-generated predictions
- Human Oversight: Require medical professional review
- Educational Context: Frame as learning tool, not diagnostic tool
- No Harm: Do not use in ways that could harm patients
- Privacy: Do not input actual patient data without proper safeguards
Environmental Impact
- Training: ~44 minutes on 1× NVIDIA A100 GPU
- Carbon Footprint: Estimated ~0.05 kg CO2eq (training only)
- Inference: Efficient 4-bit quantization reduces deployment carbon cost
Citation
If you use this model in your research or applications, please cite:
@misc{chauhan2025biomistral_symptom_classifier,
title={BioMistral-7B Symptom-to-Diagnosis Classifier},
author={Sugandha Chauhan},
year={2025},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/Sugandha-Chauhan/BioMistral-7B-SymptomDiagnosis}},
note={Fine-tuned with QLoRA for medical symptom classification}
}
Acknowledgments
- BioMistral Team: For the excellent biomedical language model
- Hugging Face: For transformers library and model hosting
- PEFT Team: For the efficient fine-tuning framework
- Medical Dataset: Curated from publicly available resources
Model Card Authors
- Sugandha Chauhan (@Sugandha-Chauhan)
Model Card Contact
For questions, issues, or feedback:
- Open an issue in the Community tab
- Hugging Face: @Sugandha-Chauhan
Last Updated: November 2025
Model Version: 1.0