XLM-RoBERTa-based model with a CRF layer for location and address span extraction.
Trained on a weakly labeled dataset (part of Dataset 10 of the Epstein files). Geographical entities in the dataset were labeled with Qwen3 70B, and BIO tags were added automatically.
The model still needs testing in the wild.
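
To illustrate what automatic BIO tagging of labeled geographical spans can look like, here is a minimal sketch. It is not the script used to build the training data: the `bio_tags` helper, the `LOC` label name, whitespace tokenization, and the example sentence are all assumptions made for illustration.

```python
def bio_tags(tokens, spans, label="LOC"):
    """Assign BIO tags to tokens.

    tokens: list of str
    spans:  list of (start, end) token indices, end exclusive
    label:  hypothetical entity label; the model's real tag set may differ
    """
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


# Hypothetical example sentence with two geographical spans.
tokens = "He flew from Palm Beach to New York".split()
tags = bio_tags(tokens, [(3, 5), (6, 8)])

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Each entity starts with a `B-` tag and continues with `I-` tags, so adjacent entities stay distinguishable after tagging.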

Trained until: Epoch 6 | Loss: 62.9988 (CRF loss) | Token F1: 0.8452 | Binary F1: 0.8170 | Token Acc: 0.9842 | Span Acc: 0.6357 | Partial: 0.7419
Token F1 - F1 based on token-level tag matching
Binary F1 - shows performance of geo entity extraction only
Token Acc - accuracy based on token-level tag matching
Span Acc - based on span (whole geo entity) matching
Partial - based on span (whole geo entity) matching with at least 50% correct overlap
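
The difference between Span Acc and Partial can be sketched as follows. This is an assumed reading of the definitions above, not the evaluation code used for the reported numbers: the helper names are hypothetical, both scores are computed per gold span, and "50% overlap" is measured relative to the gold span length.

```python
def extract_spans(tags):
    """Return (start, end) token spans (end exclusive) from a BIO tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            if start is not None:
                spans.append((start, i))    # close the previous entity
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))    # entity runs to the end
    return spans


def span_scores(gold_tags, pred_tags, min_overlap=0.5):
    """Return (exact-match rate, partial-match rate) over gold spans."""
    gold = extract_spans(gold_tags)
    pred = extract_spans(pred_tags)
    if not gold:
        return 0.0, 0.0
    exact = sum(1 for g in gold if g in pred)
    partial = 0
    for gs, ge in gold:
        # Largest token overlap between this gold span and any predicted span.
        best = max((min(ge, pe) - max(gs, ps) for ps, pe in pred), default=0)
        if best / (ge - gs) >= min_overlap:
            partial += 1
    return exact / len(gold), partial / len(gold)


gold = ["O", "B-LOC", "I-LOC", "I-LOC", "O", "B-LOC", "I-LOC"]
pred = ["O", "B-LOC", "I-LOC", "O", "O", "B-LOC", "I-LOC"]
span_acc, partial_acc = span_scores(gold, pred)
print(span_acc, partial_acc)  # 0.5 1.0
```

In this example the first predicted entity misses one of three gold tokens, so it fails the whole-span match but still counts as a partial match, which is why Partial is expected to be at least as high as Span Acc.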
