BioMistral-7B Symptom-to-Diagnosis Classifier

Fine-tuned BioMistral-7B for medical symptom classification using QLoRA


Model Summary

This is a fine-tuned BioMistral-7B model for classifying medical symptoms into 10 common diagnoses. The model was trained using QLoRA (Quantized Low-Rank Adaptation) on a curated dataset of 10,000 symptom-diagnosis pairs, achieving 99.1% accuracy on the test set.

  • Base Model: BioMistral/BioMistral-7B
  • Task: Multi-class Text Classification (10 classes)
  • Fine-tuning Method: QLoRA with 4-bit quantization
  • Training Data: 8,000 samples (10 diagnosis classes)
  • Validation Data: 1,000 samples
  • Test Data: 1,000 samples
  • Model Type: Sequence Classification
  • Language: English
  • License: MIT

Intended Use

✅ Appropriate Uses

  • Educational demonstrations of medical AI systems
  • Research in biomedical NLP and text classification
  • Experiments with medical symptom understanding
  • Teaching about AI in healthcare contexts
  • Baseline model for medical classification tasks

❌ Not Intended For

  • Clinical diagnosis or real medical decision-making
  • Emergency medical decisions
  • Treatment planning or recommendations
  • Any deployment in healthcare settings
  • Replacement of professional medical judgment

⚠️ Medical Disclaimer

This model is for educational and research purposes ONLY.

  • Outputs may be incorrect, incomplete, or biased
  • Does NOT replace professional medical advice
  • NOT validated for clinical use
  • NOT approved by any regulatory body

Supported Diagnoses (10 Classes)

| Class ID | Diagnosis | Example Symptoms |
|---|---|---|
| 0 | Acute Bronchitis | cough, chest pain, shortness of breath, mucus production |
| 1 | Anxiety | anxiety and nervousness, rapid heartbeat, shortness of breath, panic attacks |
| 2 | Conjunctivitis due to Allergy | eye redness, itchiness of eye, lacrimation, watery eyes |
| 3 | Eczema | skin rash, skin dryness, itching of skin, abnormal appearing skin |
| 4 | Infectious Gastroenteritis | nausea, vomiting, diarrhea, abdominal cramps |
| 5 | Pneumonia | fever, cough, difficulty breathing, chest pain |
| 6 | Psoriasis | abnormal appearing skin, skin lesion, skin rash |
| 7 | Spondylosis | back pain, neck pain, neck stiffness, limited mobility |
| 8 | Sprain or Strain | joint pain, swelling, bruising, limited movement |
| 9 | Strep Throat | sore throat, fever, difficulty swallowing, swollen lymph nodes |

Performance

Test Set Results (n=1,000)

| Metric | Score |
|---|---|
| Overall Accuracy | 99.1% |
| Precision (weighted) | 99.11% |
| Recall (weighted) | 99.10% |
| F1-Score (weighted) | 99.10% |
| Test Loss | 0.0313 |
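
For reference, weighted metrics like these can be computed with scikit-learn. The sketch below uses dummy label lists purely for illustration; the actual evaluation script is not included in this card.

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Dummy labels for illustration; in practice y_true / y_pred come from running
# the classifier over the 1,000-sample test split.
y_true = [0, 1, 2, 5, 5, 9]
y_pred = [0, 1, 2, 5, 0, 9]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"accuracy={accuracy:.4f}  precision={precision:.4f}  "
      f"recall={recall:.4f}  f1={f1:.4f}")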

Per-Class Performance

| Diagnosis | Accuracy | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|---|
| Acute Bronchitis | 97.0% | 97.98% | 97.0% | 97.49% | 100 |
| Anxiety | 100.0% | 100.0% | 100.0% | 100.0% | 100 |
| Conjunctivitis | 100.0% | 100.0% | 100.0% | 100.0% | 100 |
| Eczema | 100.0% | 100.0% | 100.0% | 100.0% | 100 |
| Gastroenteritis | 100.0% | 100.0% | 100.0% | 100.0% | 100 |
| Pneumonia | 98.0% | 96.08% | 98.0% | 97.03% | 100 |
| Psoriasis | 100.0% | 100.0% | 100.0% | 100.0% | 100 |
| Spondylosis | 100.0% | 97.09% | 100.0% | 98.52% | 100 |
| Sprain or Strain | 97.0% | 100.0% | 97.0% | 98.48% | 100 |
| Strep Throat | 99.0% | 100.0% | 99.0% | 99.50% | 100 |

Error Analysis:

  • Total misclassifications: 9 out of 1,000 (0.9% error rate)
  • Main confusion: Acute Bronchitis ↔ Pneumonia (5 errors)
  • Minor confusion: Sprain/Strain ↔ Spondylosis (3 errors)
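
Confusion pairs like these can be recovered from a standard confusion matrix. A small illustrative helper, assuming y_true and y_pred are the integer class IDs from the test-set run and class names come from the DIAGNOSIS_CLASSES mapping shown in the Inference section below:

from sklearn.metrics import confusion_matrix

def top_confusions(y_true, y_pred, class_names, k=5):
    """Return the k most frequent (true, predicted, count) misclassification pairs."""
    cm = confusion_matrix(y_true, y_pred, labels=list(range(len(class_names))))
    pairs = [
        (class_names[i], class_names[j], int(cm[i, j]))
        for i in range(len(class_names))
        for j in range(len(class_names))
        if i != j and cm[i, j] > 0
    ]
    return sorted(pairs, key=lambda p: -p[2])[:k]

# Example: top_confusions(y_true, y_pred, [DIAGNOSIS_CLASSES[i] for i in range(10)])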

Validation Performance

| Metric | Score |
|---|---|
| Validation Accuracy | 97.7% |
| Validation Loss | 0.0576 |

Model Architecture

Base Model: BioMistral-7B

  • Parameters: 7 billion
  • Architecture: Mistral-based transformer optimized for biomedical text
  • Specialization: Pre-trained on biomedical literature

Fine-Tuning: QLoRA Configuration

LoRA Config:
  - Task Type: SEQ_CLS (Sequence Classification)
  - Rank (r): 16
  - Alpha: 32
  - Dropout: 0.1
  - Target Modules: ['q_proj', 'v_proj', 'k_proj', 'o_proj']
  - Bias: none
  - Trainable Parameters: 13,672,448 (0.19% of total)
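
For illustration, this configuration corresponds roughly to the following peft LoraConfig (a reconstruction, not the exact training script):

from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    bias="none",
)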

Quantization

BitsAndBytes Config:
  - Load in 4-bit: True
  - Quantization Type: nf4
  - Compute dtype: float16
  - Double Quantization: True

Total Parameters: 7,124,373,504
Trainable Parameters: 13,672,448 (0.1919%)
Memory Footprint: ~4.5 GB (4-bit quantized)
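
These figures can be checked on a loaded model; a minimal sketch, assuming model is the PEFT-wrapped model loaded as in the Quick Start section below:

# peft reports the parameter breakdown directly:
model.print_trainable_parameters()
# -> trainable params: 13,672,448 || all params: 7,124,373,504 || trainable%: 0.1919

# A manual count reproduces the trainable numerator:
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{trainable:,} trainable parameters")
# Note: summing numel() over *all* parameters may under-report the base model,
# because bitsandbytes stores 4-bit weights in packed form.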

Training Details

Dataset

Total Samples: 10,000 symptom-diagnosis pairs

| Split | Samples | Percentage |
|---|---|---|
| Train | 8,000 | 80% |
| Validation | 1,000 | 10% |
| Test | 1,000 | 10% |

Data Format:

{
  "text": "cough ,fever and difficulty breathing",
  "diagnosis": "pneumonia",
  "label": 5
}

The label field is the diagnosis mapped to its integer class ID (see the class table above).

Important: Symptom text must follow the specific formatting used in the training data:

  • Space before comma: symptom1 ,symptom2
  • Use and before last symptom
  • Lowercase medical terminology
  • Example: "nausea ,vomiting ,diarrhea and abdominal cramps"
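
Records in this shape can be turned into a tokenized dataset for sequence classification. A minimal sketch, assuming the datasets library and a tokenizer loaded as in the Quick Start section below (illustrative only, not the original preprocessing pipeline):

from datasets import Dataset

records = [
    {"text": "cough ,fever and difficulty breathing", "label": 5},
    {"text": "nausea ,vomiting ,diarrhea and abdominal cramps", "label": 4},
]
ds = Dataset.from_list(records)

def tokenize(batch):
    # max_length matches the 128-token limit used in training
    return tokenizer(batch["text"], truncation=True, max_length=128)

ds = ds.map(tokenize, batched=True)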

Training Hyperparameters

| Parameter | Value |
|---|---|
| Training Regime | Supervised Fine-tuning |
| Epochs | 10 (early stopped at epoch 4) |
| Batch Size (per device) | 8 |
| Gradient Accumulation Steps | 4 |
| Effective Batch Size | 32 |
| Learning Rate | 2e-4 |
| Learning Rate Scheduler | Linear |
| Warmup Steps | 100 |
| Weight Decay | 0.01 |
| Max Sequence Length | 128 |
| Optimizer | AdamW (8-bit paged) |
| Early Stopping Patience | 3 evaluations |
| FP16 Training | Enabled |
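
These settings map roughly onto the following transformers TrainingArguments. This is a reconstruction for illustration rather than the original training script, and argument names can differ slightly across transformers versions:

from transformers import TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir="./biomistral-symptom-classifier",   # illustrative path
    num_train_epochs=10,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,                  # effective batch size 32
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    warmup_steps=100,
    weight_decay=0.01,
    fp16=True,
    optim="paged_adamw_8bit",
    eval_strategy="steps",        # "evaluation_strategy" in older transformers
    eval_steps=100,
    save_strategy="steps",
    save_steps=100,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
)

# Patience is counted in evaluation rounds by the Trainer
early_stopping = EarlyStoppingCallback(early_stopping_patience=3)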

Training Infrastructure

  • Hardware: NVIDIA A100 GPU
  • Training Time: 44 minutes (2,668 seconds)
  • Training Steps: 1,000 (out of planned 2,500)
  • Evaluation Strategy: Every 100 steps
  • Save Strategy: Best model based on accuracy

Training Progress

| Step | Train Loss | Val Loss | Val Acc | Val F1 |
|---|---|---|---|---|
| 100 | 0.3509 | 0.3096 | 93.4% | 93.3% |
| 200 | 0.2489 | 0.3245 | 96.4% | 96.4% |
| 300 | 0.1496 | 0.1042 | 96.9% | 96.9% |
| 400 | 0.0973 | 0.0994 | 96.8% | 96.8% |
| 500 | 0.1595 | 0.1555 | 97.5% | 97.5% |
| 600 | 0.0907 | 0.0850 | 97.3% | 97.3% |
| 700 | 0.1784 | 0.0576 | 97.7% | 97.7% |
| 800 | 0.0646 | 0.0857 | 97.7% | 97.7% |
| 900 | 0.0559 | 0.1474 | 97.7% | 97.7% |
| 1000 | 0.0732 | 0.0958 | 96.7% | 96.7% |

Best Checkpoint: Step 700 (lowest validation loss: 0.0576)

How to Use

Installation

pip install transformers peft torch bitsandbytes accelerate

Quick Start

from transformers import AutoTokenizer, AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import PeftModel
import torch

# Model configuration
MODEL_NAME = "Sugandha-Chauhan/BioMistral-7B-SymptomDiagnosis"
BASE_MODEL = "BioMistral/BioMistral-7B"

# Quantization config (for efficient inference)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load base model
model = AutoModelForSequenceClassification.from_pretrained(
    BASE_MODEL,
    num_labels=10,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.float16
)

# Load LoRA adapters
model = PeftModel.from_pretrained(model, MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model.config.pad_token_id = tokenizer.pad_token_id
model.eval()

print("Model loaded successfully!")

Inference

# Diagnosis class mapping
DIAGNOSIS_CLASSES = {
    0: "acute bronchitis",
    1: "anxiety",
    2: "conjunctivitis due to allergy",
    3: "eczema",
    4: "infectious gastroenteritis",
    5: "pneumonia",
    6: "psoriasis",
    7: "spondylosis",
    8: "sprain or strain",
    9: "strep throat"
}

def predict_diagnosis(symptoms_text):
    """
    Predict diagnosis from symptoms
    
    Args:
        symptoms_text: str, formatted symptoms
                      e.g., "nausea ,vomiting ,diarrhea and fever"
    
    Returns:
        diagnosis: str, predicted diagnosis name
        confidence: float, prediction confidence (0-1)
    """
    # Tokenize
    inputs = tokenizer(
        symptoms_text,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=128
    )
    
    # Move to device
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    # Predict
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
    
    # Get probabilities
    probabilities = torch.softmax(logits, dim=-1)
    confidence, predicted_class = torch.max(probabilities, dim=-1)
    
    # Map to diagnosis
    diagnosis = DIAGNOSIS_CLASSES[predicted_class.item()]
    confidence_score = confidence.item()
    
    return diagnosis, confidence_score

# Example usage
symptoms = "nausea ,vomiting ,diarrhea and abdominal cramps"
diagnosis, confidence = predict_diagnosis(symptoms)

print(f"Diagnosis: {diagnosis}")
print(f"Confidence: {confidence:.1%}")

Output:

Diagnosis: infectious gastroenteritis
Confidence: 100.0%

Batch Prediction

def batch_predict(symptoms_list):
    """Predict multiple symptom texts at once"""
    
    inputs = tokenizer(
        symptoms_list,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=128
    )
    
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
    
    probabilities = torch.softmax(logits, dim=-1)
    confidences, predicted_classes = torch.max(probabilities, dim=-1)
    
    results = []
    for pred_class, conf in zip(predicted_classes, confidences):
        results.append({
            'diagnosis': DIAGNOSIS_CLASSES[pred_class.item()],
            'confidence': conf.item()
        })
    
    return results

# Example
symptoms_batch = [
    "fever ,cough and difficulty breathing",
    "anxiety and nervousness ,rapid heartbeat and shortness of breath",
    "skin rash ,itching of skin and abnormal appearing skin"
]

results = batch_predict(symptoms_batch)
for i, result in enumerate(results):
    print(f"{i+1}. {result['diagnosis']} ({result['confidence']:.1%})")

Input Format Requirements

CRITICAL: The model expects symptoms in a specific format matching its training data.

✅ Correct Format

# Space BEFORE comma, 'and' before last symptom
"nausea ,vomiting ,diarrhea and abdominal cramps"
"cough ,fever and difficulty breathing"
"eye redness ,itchiness of eye and lacrimation"

❌ Incorrect Format

# No spaces before commas
"nausea, vomiting, diarrhea, fever"  # Will likely fail

# Missing 'and' before last symptom
"nausea ,vomiting ,diarrhea ,fever"  # Suboptimal

# Capitalized
"Nausea ,Vomiting ,Diarrhea and Fever"  # Wrong case

Format Rules

  1. Spacing: Space before each comma (symptom1 ,symptom2)
  2. Conjunction: Use and before the last symptom
  3. Case: Lowercase text
  4. Terminology: Medical terminology preferred
  5. Punctuation: No period at the end
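
To avoid hand-formatting inputs, a small helper can apply these rules automatically (an illustrative utility, not part of the released model):

def format_symptoms(symptoms):
    """Join raw symptom phrases into the ' ,' / 'and' format the model expects."""
    symptoms = [s.strip().lower().rstrip(".") for s in symptoms]
    if len(symptoms) == 1:
        return symptoms[0]
    return " ,".join(symptoms[:-1]) + " and " + symptoms[-1]

print(format_symptoms(["Nausea", "vomiting", "diarrhea", "abdominal cramps."]))
# -> nausea ,vomiting ,diarrhea and abdominal cramps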

Limitations

Scope Limitations

  1. Limited Conditions: Only 10 diagnoses (not comprehensive)
  2. Symptom Format: Highly dependent on exact text formatting
  3. No Severity: Cannot assess urgency or severity levels
  4. Single Diagnosis: Returns only one diagnosis (no differential)
  5. No Confidence Threshold: Always returns a prediction, even when confidence is low (a possible workaround is sketched below)
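
Because the model always returns one of the 10 classes, downstream code may want its own abstention rule and a short differential list. The sketch below reuses model, tokenizer, torch, and DIAGNOSIS_CLASSES from the sections above; the 0.80 cutoff and top-3 depth are arbitrary illustrations, not validated thresholds:

def predict_with_threshold(symptoms_text, threshold=0.80, top_k=3):
    """Return top-k candidate diagnoses plus an 'uncertain' flag."""
    inputs = tokenizer(symptoms_text, return_tensors="pt",
                       truncation=True, max_length=128)
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]

    top_probs, top_ids = torch.topk(probs, k=top_k)
    candidates = [(DIAGNOSIS_CLASSES[i.item()], p.item())
                  for i, p in zip(top_ids, top_probs)]
    uncertain = candidates[0][1] < threshold
    return candidates, uncertain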

Performance Limitations

  1. Symptom Overlap: Lower accuracy on conditions with similar symptoms
    • Eczema vs. Psoriasis (both skin conditions)
    • Acute Bronchitis vs. Pneumonia (both respiratory)
  2. Format Sensitivity: Performance drops with incorrectly formatted input
  3. Training Distribution: Best performance on symptoms similar to training data
  4. No Rare Conditions: Cannot identify conditions outside the 10 classes

Technical Limitations

  1. Quantization Effects: 4-bit quantization may introduce minor accuracy variations
  2. Context Window: Limited to 128 tokens (sufficient for symptom lists)
  3. No Multi-label: Cannot predict multiple concurrent conditions
  4. Fixed Vocabulary: Limited to medical terms seen during training

Bias and Ethical Considerations

Potential Biases

  • Training Data Bias: Reflects symptom descriptions in training corpus
  • Language Bias: English-only; may not generalize to other languages
  • Medical Terminology: May perform better on formal medical terms
  • Demographic Bias: Training data may not represent all populations equally

Ethical Use

  • Transparency: Always disclose AI-generated predictions
  • Human Oversight: Require medical professional review
  • Educational Context: Frame as learning tool, not diagnostic tool
  • No Harm: Do not use in ways that could harm patients
  • Privacy: Do not input actual patient data without proper safeguards

Environmental Impact

  • Training: ~44 minutes on 1× NVIDIA A100 GPU
  • Carbon Footprint: Estimated ~0.05 kg CO2eq (training only)
  • Inference: Efficient 4-bit quantization reduces deployment carbon cost
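
As a rough, assumption-laden back-of-envelope check (not a measured value): 44 minutes ≈ 0.73 h at an assumed average draw of ~0.3 kW gives ≈ 0.22 kWh; at a grid intensity of roughly 0.23 kg CO2eq/kWh, that is on the order of 0.05 kg CO2eq.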

Citation

If you use this model in your research or applications, please cite:

@misc{chauhan2025biomistral_symptom_classifier,
  title={BioMistral-7B Symptom-to-Diagnosis Classifier},
  author={Sugandha Chauhan},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/Sugandha-Chauhan/BioMistral-7B-SymptomDiagnosis}},
  note={Fine-tuned with QLoRA for medical symptom classification}
}

Acknowledgments

  • BioMistral Team: For the excellent biomedical language model
  • Hugging Face: For transformers library and model hosting
  • PEFT Team: For the efficient fine-tuning framework
  • Medical Dataset: Curated from publicly available resources

Model Card Authors

  • Sugandha Chauhan (@Sugandha-Chauhan)

Model Card Contact

For questions, issues, or feedback:


Last Updated: November 2025
Model Version: 1.0
