RL adapter (LoRA, 4 generators) — best model

GRPO LoRA adapter (r=32, α=64) trained on top of the CPT (4 generators) model, using the HealthBench-BR train split as reward. This is the best configuration in the paper, outperforming GPT-5.2, Claude Sonnet 4.6, Gemini 3.1 Pro and Google AI Overview on both benchmarks.

Base model: hugo/protocolos-clinicos-br-cpt-4gen-14b
Type: LoRA adapter (PEFT)

Test-split accuracy

Benchmark	Accuracy
HealthBench-BR	83.9%
PCDT-QA	85.4%

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("hugo/protocolos-clinicos-br-cpt-4gen-14b", torch_dtype="auto", device_map="auto")
tok  = AutoTokenizer.from_pretrained("hugo/protocolos-clinicos-br-rl-4gen-14b")
model = PeftModel.from_pretrained(base, "hugo/protocolos-clinicos-br-rl-4gen-14b")

Intended use & limitations

Research model for studying domain adaptation of LLMs to Brazilian clinical guidelines. Not a certified medical device. Even at the best accuracy reported in the paper, residual errors may involve consequential details (dosages, contraindications). Use only under qualified professional supervision.

Citation

See the paper and code at the project repository:

Code & paper: https://github.com/hugoabonizio/clinical-protocols-br

Downloads last month: 16

Model tree for hugo/protocolos-clinicos-br-rl-4gen-14b

Base model

hugo/protocolos-clinicos-br-cpt-4gen-14b

Adapter

(1)

this model

Collection including hugo/protocolos-clinicos-br-rl-4gen-14b

Protocolos Clínicos BR

Collection

Adapting Qwen2.5-14B to Brazilian SUS clinical guidelines: 2 benchmarks, the synthetic corpus, and 8 model checkpoints from the paper's ablations. • 11 items • Updated May 4