XLM-RoBERTa for Nepali-English Bilingual Fake News Detection
This model is a fine-tuned version of XLM-RoBERTa-base optimized for detecting fake news in the bilingual (Nepali and English) media landscape. It specifically addresses challenges in low-resource NLP such as morphological complexity and code-switching.
Model Details
Model Description
- Developed by: Plan Ghimire and Pranjal Shrestha (Department of Electronics and Computer Engineering, IOE, Thapathali Campus, Tribhuvan University, Nepal)
- Model type: Transformer-based Text Classifier
- Language(s) (NLP): Nepali (Devanagari) and English
- License: MIT (or as specified by the authors)
- Finetuned from model: xlm-roberta-base
Model Sources
- Paper: Bilingual fake-news detection in low-resource media: A Transformer-based framework for Nepali–English content
- Journal: JIEE 2025, Vol. 8, Issue 1.
Uses
Direct Use
This model is intended for the classification of news articles and social media posts into "Real" or "Fake" (a minimal pipeline sketch follows the list below). It is specifically trained to handle:
- Code-switched content (mixing Nepali and English).
- Agglutinative morphology of the Nepali language.
- Social media text from platforms like Facebook, X, and TikTok.
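For quick experiments, the checkpoint can also be loaded through the transformers pipeline API. The sketch below is illustrative only: the repository id is taken from the getting-started example further down, and the code-switched sample sentence is made up for demonstration.

```python
from transformers import pipeline

# Repository id as used in the getting-started example below.
classifier = pipeline(
    "text-classification",
    model="planghimire/nepali-english-fake-news-detector",
)

# Illustrative code-switched (Nepali + English) input.
sample = "काठमाडौंमा आज breaking news: सरकारले सबै social media platforms बन्द गर्ने निर्णय गरेको छ।"
print(classifier(sample))  # prints a list like [{'label': ..., 'score': ...}]
```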
Out-of-Scope Use
The model should not be used as the sole arbiter of truth without human oversight, particularly in sensitive political contexts. It is not designed for languages other than Nepali and English.
Bias, Risks, and Limitations
Limitations
- Sequence Length: Optimized for a maximum sequence length of 128 tokens; longer inputs are truncated (see the chunking sketch below this list).
- Context: While the model achieves high accuracy, it may struggle with nuanced satire that mimics formal journalism without obvious linguistic "red flags."
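Because of this limit, longer articles are effectively truncated. The sketch below shows one way to score a long article in overlapping 128-token windows and average the class probabilities; the windowing strategy is our suggestion rather than part of the original paper, and the label order is assumed to match the getting-started example.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "planghimire/nepali-english-fake-news-detector"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME).eval()

def predict_long(text: str, window: int = 128, stride: int = 64):
    # Split the article into overlapping 128-token windows using the
    # tokenizer's overflow handling, then average probabilities over windows.
    enc = tokenizer(
        text,
        truncation=True,
        max_length=window,
        stride=stride,
        return_overflowing_tokens=True,
        padding=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(input_ids=enc["input_ids"],
                       attention_mask=enc["attention_mask"]).logits
    probs = F.softmax(logits, dim=-1).mean(dim=0)
    # Assumed label order: index 0 = Fake, index 1 = Real.
    return {"fake_prob": probs[0].item(), "real_prob": probs[1].item()}
```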
Recommendations
The authors recommend using SHAP (SHapley Additive exPlanations) alongside the model to visualize token-level contributions, ensuring that the classification is based on credible linguistic patterns rather than dataset artifacts.
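The snippet below is a minimal sketch of standard SHAP usage with a transformers text-classification pipeline (not code released by the authors); it produces token-level contributions that can be rendered inline in a notebook. The sample input is illustrative.

```python
import shap
from transformers import pipeline

# Wrap the fine-tuned checkpoint in a pipeline that returns scores for every label.
pipe = pipeline(
    "text-classification",
    model="planghimire/nepali-english-fake-news-detector",
    top_k=None,
)

explainer = shap.Explainer(pipe)  # uses SHAP's built-in text masker
shap_values = explainer(["काठमाडौंमा आज सबै बैंक बन्द हुने खबर सामाजिक सञ्जालमा फैलियो।"])

# Visualize token-level contributions for each class.
shap.plots.text(shap_values)
```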
How to Get Started with the Model
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
MODEL_NAME = "planghimire/nepali-english-fake-news-detector"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME).to(DEVICE)
model.eval()

def predict_news(text: str):
    # Tokenize to at most 128 tokens, matching the sequence length the model
    # is optimized for (see "Limitations" above).
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=128,
        padding=True,
    ).to(DEVICE)

    with torch.no_grad():
        probs = F.softmax(model(**inputs).logits, dim=-1)[0]

    # Assumed label order: index 0 = Fake, index 1 = Real
    # (check model.config.id2label if in doubt).
    fake_prob, real_prob = probs.tolist()
    is_real = real_prob > fake_prob

    print(
        f"Prediction: {'REAL' if is_real else 'FAKE'} | "
        f"Confidence: {max(real_prob, fake_prob):.2%} | "
        f"Real: {real_prob:.3f} | Fake: {fake_prob:.3f}"
    )
    return {
        "label": "Real" if is_real else "Fake",
        "confidence": max(real_prob, fake_prob),
        "real_prob": real_prob,
        "fake_prob": fake_prob,
    }

# Test with a Nepali-language example
text = "आर्थिकतामा पर्याप्त ध्यान नदिएको भन्दै आएका गुनासोलाई बेवास्ता गर्न खोज्दै ट्रम्पले डिसेम्बर ९ मा सभामा कडा आलोचना गरे।"
predict_news(text)
```