BioMistral-7B Symptom-to-Diagnosis Classifier

Fine-tuned BioMistral-7B for medical symptom classification using QLoRA


Model Summary

This is a fine-tuned BioMistral-7B model for classifying medical symptoms into 10 common diagnoses. The model was trained using QLoRA (Quantized Low-Rank Adaptation) on a curated dataset of 10,000 symptom-diagnosis pairs, achieving 99.1% accuracy on the test set.

  • Base Model: BioMistral/BioMistral-7B
  • Task: Multi-class Text Classification (10 classes)
  • Fine-tuning Method: QLoRA with 4-bit quantization
  • Training Data: 8,000 samples (10 diagnosis classes)
  • Validation Data: 1,000 samples
  • Test Data: 1,000 samples
  • Model Type: Sequence Classification
  • Language: English
  • License: MIT

Intended Use

✅ Appropriate Uses

  • Educational demonstrations of medical AI systems
  • Research in biomedical NLP and text classification
  • Experiments with medical symptom understanding
  • Teaching about AI in healthcare contexts
  • Baseline model for medical classification tasks

❌ Not Intended For

  • Clinical diagnosis or real medical decision-making
  • Emergency medical decisions
  • Treatment planning or recommendations
  • Any deployment in healthcare settings
  • Replacement of professional medical judgment

⚠️ Medical Disclaimer

This model is for educational and research purposes ONLY.

  • Outputs may be incorrect, incomplete, or biased
  • Does NOT replace professional medical advice
  • NOT validated for clinical use
  • NOT approved by any regulatory body

Supported Diagnoses (10 Classes)

| Class ID | Diagnosis | Example Symptoms |
|---|---|---|
| 0 | Acute Bronchitis | cough, chest pain, shortness of breath, mucus production |
| 1 | Anxiety | anxiety and nervousness, rapid heartbeat, shortness of breath, panic attacks |
| 2 | Conjunctivitis due to Allergy | eye redness, itchiness of eye, lacrimation, watery eyes |
| 3 | Eczema | skin rash, skin dryness, itching of skin, abnormal appearing skin |
| 4 | Infectious Gastroenteritis | nausea, vomiting, diarrhea, abdominal cramps |
| 5 | Pneumonia | fever, cough, difficulty breathing, chest pain |
| 6 | Psoriasis | abnormal appearing skin, skin lesion, skin rash |
| 7 | Spondylosis | back pain, neck pain, neck stiffness, limited mobility |
| 8 | Sprain or Strain | joint pain, swelling, bruising, limited movement |
| 9 | Strep Throat | sore throat, fever, difficulty swallowing, swollen lymph nodes |

Performance

Test Set Results (n=1,000)

| Metric | Score |
|---|---|
| Overall Accuracy | 99.1% |
| Precision (weighted) | 99.11% |
| Recall (weighted) | 99.10% |
| F1-Score (weighted) | 99.10% |
| Test Loss | 0.0313 |
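
For reference, weighted metrics like these can be computed with scikit-learn. The sketch below uses dummy label lists purely for illustration; the actual evaluation script is not included in this card.

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Dummy labels for illustration; in practice y_true / y_pred come from running
# the classifier over the 1,000-sample test split.
y_true = [0, 1, 2, 5, 5, 9]
y_pred = [0, 1, 2, 5, 0, 9]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"accuracy={accuracy:.4f}  precision={precision:.4f}  "
      f"recall={recall:.4f}  f1={f1:.4f}")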

Per-Class Performance

| Diagnosis | Accuracy | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|---|
| Acute Bronchitis | 97.0% | 97.98% | 97.0% | 97.49% | 100 |
| Anxiety | 100.0% | 100.0% | 100.0% | 100.0% | 100 |
| Conjunctivitis | 100.0% | 100.0% | 100.0% | 100.0% | 100 |
| Eczema | 100.0% | 100.0% | 100.0% | 100.0% | 100 |
| Gastroenteritis | 100.0% | 100.0% | 100.0% | 100.0% | 100 |
| Pneumonia | 98.0% | 96.08% | 98.0% | 97.03% | 100 |
| Psoriasis | 100.0% | 100.0% | 100.0% | 100.0% | 100 |
| Spondylosis | 100.0% | 97.09% | 100.0% | 98.52% | 100 |
| Sprain or Strain | 97.0% | 100.0% | 97.0% | 98.48% | 100 |
| Strep Throat | 99.0% | 100.0% | 99.0% | 99.50% | 100 |

Error Analysis:

  • Total misclassifications: 9 out of 1,000 (0.9% error rate)
  • Main confusion: Acute Bronchitis ↔ Pneumonia (5 errors)
  • Minor confusion: Sprain/Strain ↔ Spondylosis (3 errors)
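
Confusion pairs like these can be recovered from a standard confusion matrix. A small illustrative helper, assuming y_true and y_pred are the integer class IDs from the test-set run and class names come from the DIAGNOSIS_CLASSES mapping shown in the Inference section below:

from sklearn.metrics import confusion_matrix

def top_confusions(y_true, y_pred, class_names, k=5):
    """Return the k most frequent (true, predicted, count) misclassification pairs."""
    cm = confusion_matrix(y_true, y_pred, labels=list(range(len(class_names))))
    pairs = [
        (class_names[i], class_names[j], int(cm[i, j]))
        for i in range(len(class_names))
        for j in range(len(class_names))
        if i != j and cm[i, j] > 0
    ]
    return sorted(pairs, key=lambda p: -p[2])[:k]

# Example: top_confusions(y_true, y_pred, [DIAGNOSIS_CLASSES[i] for i in range(10)])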

Validation Performance

| Metric | Score |
|---|---|
| Validation Accuracy | 97.7% |
| Validation Loss | 0.0576 |

Model Architecture

Base Model: BioMistral-7B

  • Parameters: 7 billion
  • Architecture: Mistral-based transformer optimized for biomedical text
  • Specialization: Pre-trained on biomedical literature

Fine-Tuning: QLoRA Configuration

LoRA Config:
  - Task Type: SEQ_CLS (Sequence Classification)
  - Rank (r): 16
  - Alpha: 32
  - Dropout: 0.1
  - Target Modules: ['q_proj', 'v_proj', 'k_proj', 'o_proj']
  - Bias: none
  - Trainable Parameters: 13,672,448 (0.19% of total)
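
For illustration, this configuration corresponds roughly to the following peft LoraConfig (a reconstruction, not the exact training script):

from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    bias="none",
)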

Quantization

BitsAndBytes Config:
  - Load in 4-bit: True
  - Quantization Type: nf4
  - Compute dtype: float16
  - Double Quantization: True

Total Parameters: 7,124,373,504
Trainable Parameters: 13,672,448 (0.1919%)
Memory Footprint: ~4.5 GB (4-bit quantized)
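
These figures can be checked on a loaded model; a minimal sketch, assuming model is the PEFT-wrapped model loaded as in the Quick Start section below:

# peft reports the parameter breakdown directly:
model.print_trainable_parameters()
# -> trainable params: 13,672,448 || all params: 7,124,373,504 || trainable%: 0.1919

# A manual count reproduces the trainable numerator:
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{trainable:,} trainable parameters")
# Note: summing numel() over *all* parameters may under-report the base model,
# because bitsandbytes stores 4-bit weights in packed form.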

Training Details

Dataset

Total Samples: 10,000 symptom-diagnosis pairs

| Split | Samples | Percentage |
|---|---|---|
| Train | 8,000 | 80% |
| Validation | 1,000 | 10% |
| Test | 1,000 | 10% |

Data Format:

{
  "text": "cough ,fever and difficulty breathing",
  "diagnosis": "pneumonia",
  "label": 5
}

The label field is the diagnosis mapped to its integer class ID (see the class table above).

Important: Symptom text must follow the specific formatting used in the training data:

  • Space before comma: symptom1 ,symptom2
  • Use and before last symptom
  • Lowercase medical terminology
  • Example: "nausea ,vomiting ,diarrhea and abdominal cramps"
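
Records in this shape can be turned into a tokenized dataset for sequence classification. A minimal sketch, assuming the datasets library and a tokenizer loaded as in the Quick Start section below (illustrative only, not the original preprocessing pipeline):

from datasets import Dataset

records = [
    {"text": "cough ,fever and difficulty breathing", "label": 5},
    {"text": "nausea ,vomiting ,diarrhea and abdominal cramps", "label": 4},
]
ds = Dataset.from_list(records)

def tokenize(batch):
    # max_length matches the 128-token limit used in training
    return tokenizer(batch["text"], truncation=True, max_length=128)

ds = ds.map(tokenize, batched=True)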

Training Hyperparameters

| Parameter | Value |
|---|---|
| Training Regime | Supervised Fine-tuning |
| Epochs | 10 (early stopped at epoch 4) |
| Batch Size (per device) | 8 |
| Gradient Accumulation Steps | 4 |
| Effective Batch Size | 32 |
| Learning Rate | 2e-4 |
| Learning Rate Scheduler | Linear |
| Warmup Steps | 100 |
| Weight Decay | 0.01 |
| Max Sequence Length | 128 |
| Optimizer | AdamW (8-bit paged) |
| Early Stopping Patience | 3 evaluations |
| FP16 Training | Enabled |
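
These settings map roughly onto the following transformers TrainingArguments. This is a reconstruction for illustration rather than the original training script, and argument names can differ slightly across transformers versions:

from transformers import TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir="./biomistral-symptom-classifier",   # illustrative path
    num_train_epochs=10,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,                  # effective batch size 32
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    warmup_steps=100,
    weight_decay=0.01,
    fp16=True,
    optim="paged_adamw_8bit",
    eval_strategy="steps",        # "evaluation_strategy" in older transformers
    eval_steps=100,
    save_strategy="steps",
    save_steps=100,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
)

# Patience is counted in evaluation rounds by the Trainer
early_stopping = EarlyStoppingCallback(early_stopping_patience=3)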

Training Infrastructure

  • Hardware: NVIDIA A100 GPU
  • Training Time: 44 minutes (2,668 seconds)
  • Training Steps: 1,000 (out of planned 2,500)
  • Evaluation Strategy: Every 100 steps
  • Save Strategy: Best model based on accuracy

Training Progress

| Step | Train Loss | Val Loss | Val Acc | Val F1 |
|---|---|---|---|---|
| 100 | 0.3509 | 0.3096 | 93.4% | 93.3% |
| 200 | 0.2489 | 0.3245 | 96.4% | 96.4% |
| 300 | 0.1496 | 0.1042 | 96.9% | 96.9% |
| 400 | 0.0973 | 0.0994 | 96.8% | 96.8% |
| 500 | 0.1595 | 0.1555 | 97.5% | 97.5% |
| 600 | 0.0907 | 0.0850 | 97.3% | 97.3% |
| 700 | 0.1784 | 0.0576 | 97.7% | 97.7% |
| 800 | 0.0646 | 0.0857 | 97.7% | 97.7% |
| 900 | 0.0559 | 0.1474 | 97.7% | 97.7% |
| 1000 | 0.0732 | 0.0958 | 96.7% | 96.7% |

Best Checkpoint: Step 700 (lowest validation loss: 0.0576)

How to Use

Installation

pip install transformers peft torch bitsandbytes accelerate

Quick Start

from transformers import AutoTokenizer, AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import PeftModel
import torch

# Model configuration
MODEL_NAME = "Sugandha-Chauhan/BioMistral-7B-SymptomDiagnosis"
BASE_MODEL = "BioMistral/BioMistral-7B"

# Quantization config (for efficient inference)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load base model
model = AutoModelForSequenceClassification.from_pretrained(
    BASE_MODEL,
    num_labels=10,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.float16
)

# Load LoRA adapters
model = PeftModel.from_pretrained(model, MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model.config.pad_token_id = tokenizer.pad_token_id
model.eval()

print("Model loaded successfully!")

Inference

# Diagnosis class mapping
DIAGNOSIS_CLASSES = {
    0: "acute bronchitis",
    1: "anxiety",
    2: "conjunctivitis due to allergy",
    3: "eczema",
    4: "infectious gastroenteritis",
    5: "pneumonia",
    6: "psoriasis",
    7: "spondylosis",
    8: "sprain or strain",
    9: "strep throat"
}

def predict_diagnosis(symptoms_text):
    """
    Predict diagnosis from symptoms
    
    Args:
        symptoms_text: str, formatted symptoms
                      e.g., "nausea ,vomiting ,diarrhea and fever"
    
    Returns:
        diagnosis: str, predicted diagnosis name
        confidence: float, prediction confidence (0-1)
    """
    # Tokenize
    inputs = tokenizer(
        symptoms_text,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=128
    )
    
    # Move to device
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    # Predict
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
    
    # Get probabilities
    probabilities = torch.softmax(logits, dim=-1)
    confidence, predicted_class = torch.max(probabilities, dim=-1)
    
    # Map to diagnosis
    diagnosis = DIAGNOSIS_CLASSES[predicted_class.item()]
    confidence_score = confidence.item()
    
    return diagnosis, confidence_score

# Example usage
symptoms = "nausea ,vomiting ,diarrhea and abdominal cramps"
diagnosis, confidence = predict_diagnosis(symptoms)

print(f"Diagnosis: {diagnosis}")
print(f"Confidence: {confidence:.1%}")

Output:

Diagnosis: infectious gastroenteritis
Confidence: 100.0%

Batch Prediction

def batch_predict(symptoms_list):
    """Predict multiple symptom texts at once"""
    
    inputs = tokenizer(
        symptoms_list,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=128
    )
    
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
    
    probabilities = torch.softmax(logits, dim=-1)
    confidences, predicted_classes = torch.max(probabilities, dim=-1)
    
    results = []
    for pred_class, conf in zip(predicted_classes, confidences):
        results.append({
            'diagnosis': DIAGNOSIS_CLASSES[pred_class.item()],
            'confidence': conf.item()
        })
    
    return results

# Example
symptoms_batch = [
    "fever ,cough and difficulty breathing",
    "anxiety and nervousness ,rapid heartbeat and shortness of breath",
    "skin rash ,itching of skin and abnormal appearing skin"
]

results = batch_predict(symptoms_batch)
for i, result in enumerate(results):
    print(f"{i+1}. {result['diagnosis']} ({result['confidence']:.1%})")

Input Format Requirements

CRITICAL: The model expects symptoms in a specific format matching its training data.

✅ Correct Format

# Space BEFORE comma, 'and' before last symptom
"nausea ,vomiting ,diarrhea and abdominal cramps"
"cough ,fever and difficulty breathing"
"eye redness ,itchiness of eye and lacrimation"

❌ Incorrect Format

# No spaces before commas
"nausea, vomiting, diarrhea, fever"  # Will likely fail

# Missing 'and' before last symptom
"nausea ,vomiting ,diarrhea ,fever"  # Suboptimal

# Capitalized
"Nausea ,Vomiting ,Diarrhea and Fever"  # Wrong case

Format Rules

  1. Spacing: Space before each comma (symptom1 ,symptom2)
  2. Conjunction: Use and before the last symptom
  3. Case: Lowercase text
  4. Terminology: Medical terminology preferred
  5. Punctuation: No period at the end
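
To avoid hand-formatting inputs, a small helper can apply these rules automatically (an illustrative utility, not part of the released model):

def format_symptoms(symptoms):
    """Join raw symptom phrases into the ' ,' / 'and' format the model expects."""
    symptoms = [s.strip().lower().rstrip(".") for s in symptoms]
    if len(symptoms) == 1:
        return symptoms[0]
    return " ,".join(symptoms[:-1]) + " and " + symptoms[-1]

print(format_symptoms(["Nausea", "vomiting", "diarrhea", "abdominal cramps."]))
# -> nausea ,vomiting ,diarrhea and abdominal cramps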

Limitations

Scope Limitations

  1. Limited Conditions: Only 10 diagnoses (not comprehensive)
  2. Symptom Format: Highly dependent on exact text formatting
  3. No Severity: Cannot assess urgency or severity levels
  4. Single Diagnosis: Returns only one diagnosis (no differential)
  5. No Confidence Threshold: Always returns a prediction, even when confidence is low (a possible workaround is sketched below)
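
Because the model always returns one of the 10 classes, downstream code may want its own abstention rule and a short differential list. The sketch below reuses model, tokenizer, torch, and DIAGNOSIS_CLASSES from the sections above; the 0.80 cutoff and top-3 depth are arbitrary illustrations, not validated thresholds:

def predict_with_threshold(symptoms_text, threshold=0.80, top_k=3):
    """Return top-k candidate diagnoses plus an 'uncertain' flag."""
    inputs = tokenizer(symptoms_text, return_tensors="pt",
                       truncation=True, max_length=128)
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]

    top_probs, top_ids = torch.topk(probs, k=top_k)
    candidates = [(DIAGNOSIS_CLASSES[i.item()], p.item())
                  for i, p in zip(top_ids, top_probs)]
    uncertain = candidates[0][1] < threshold
    return candidates, uncertain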

Performance Limitations

  1. Symptom Overlap: Lower accuracy on conditions with similar symptoms
    • Eczema vs. Psoriasis (both skin conditions)
    • Acute Bronchitis vs. Pneumonia (both respiratory)
  2. Format Sensitivity: Performance drops with incorrectly formatted input
  3. Training Distribution: Best performance on symptoms similar to training data
  4. No Rare Conditions: Cannot identify conditions outside the 10 classes

Technical Limitations

  1. Quantization Effects: 4-bit quantization may introduce minor accuracy variations
  2. Context Window: Limited to 128 tokens (sufficient for symptom lists)
  3. No Multi-label: Cannot predict multiple concurrent conditions
  4. Fixed Vocabulary: Limited to medical terms seen during training

Bias and Ethical Considerations

Potential Biases

  • Training Data Bias: Reflects symptom descriptions in training corpus
  • Language Bias: English-only; may not generalize to other languages
  • Medical Terminology: May perform better on formal medical terms
  • Demographic Bias: Training data may not represent all populations equally

Ethical Use

  • Transparency: Always disclose AI-generated predictions
  • Human Oversight: Require medical professional review
  • Educational Context: Frame as learning tool, not diagnostic tool
  • No Harm: Do not use in ways that could harm patients
  • Privacy: Do not input actual patient data without proper safeguards

Environmental Impact

  • Training: ~44 minutes on 1× NVIDIA A100 GPU
  • Carbon Footprint: Estimated ~0.05 kg CO2eq (training only)
  • Inference: Efficient 4-bit quantization reduces deployment carbon cost
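
As a rough, assumption-laden back-of-envelope check (not a measured value): 44 minutes ≈ 0.73 h at an assumed average draw of ~0.3 kW gives ≈ 0.22 kWh; at a grid intensity of roughly 0.23 kg CO2eq/kWh, that is on the order of 0.05 kg CO2eq.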

Citation

If you use this model in your research or applications, please cite:

@misc{chauhan2025biomistral_symptom_classifier,
  title={BioMistral-7B Symptom-to-Diagnosis Classifier},
  author={Sugandha Chauhan},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/Sugandha-Chauhan/BioMistral-7B-SymptomDiagnosis}},
  note={Fine-tuned with QLoRA for medical symptom classification}
}

Acknowledgments

  • BioMistral Team: For the excellent biomedical language model
  • Hugging Face: For transformers library and model hosting
  • PEFT Team: For the efficient fine-tuning framework
  • Medical Dataset: Curated from publicly available resources

Model Card Authors

  • Sugandha Chauhan (@Sugandha-Chauhan)

Model Card Contact

For questions, issues, or feedback:


Last Updated: November 2025
Model Version: 1.0
