SmolLM2-360M-Khasi-CPT (Phase 2)
Model Description
This model is a fine-tuned version of Bapynshngain/SmolLM2-360M-Khasi-Base on the Khasi monolingual dataset. It is a Continued Pre-Training (CPT) checkpoint of the SmolLM2-360M-Instruct model, specifically adapted for the Khasi language. It represents Phase 2 of a multi-stage training pipeline aimed at developing lightweight, highly efficient linguistic models for Meghalayan languages under the Tynrai AI initiative.
⚠️ CRITICAL WARNING: INTERMEDIATE CHECKPOINT ⚠️
This is not an instruction-following model or a translator. This is a foundational CPT model trained strictly on next-token prediction. It has acquired the Khasi vocabulary but has not yet undergone semantic alignment. If prompted, it will likely exhibit Token Collision (hallucinating in Romanized Hindi, Vietnamese, or English) because its nascent Khasi neural pathways are still competing with its massive pre-trained Latin-script latent space.
Do not use this model for production tasks. It is published for research tracking and as a base for Supervised Fine-Tuning (SFT).
Training Pipeline & Methodology
This model was adapted using a careful, non-destructive vocabulary injection method to prevent catastrophic forgetting of the base model's English and logical reasoning capabilities.
1. Tokenizer Surgery & Smart Initialization
Rather than completely replacing the base BPE tokenizer (which destroys pre-trained embeddings), we performed a vocabulary merge:
- Extracted tokens from a custom 12K Unigram Khasi SentencePiece model (Bapynshngain/enkha-hybrid-tokenizer).
- Filtered and injected 10,899 strictly new Khasi tokens into the SmolLM2 vocabulary.
- Smart Initialization: The newly added embedding rows were not left randomized. Instead, they were initialized by averaging the weights of the existing English sub-words that previously comprised those Khasi words. This granted the new tokens immediate semantic weight.
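The merge and initialization steps above can be illustrated with a small NumPy sketch. It is not the author's script: the toy vocabulary, the candidate tokens, and the `old_pieces` segmentations are all hypothetical, and a real run would obtain the segmentations from the base tokenizer and would typically go through `tokenizer.add_tokens` and `model.resize_token_embeddings` in Transformers.

```python
import numpy as np


def select_new_tokens(base_vocab, candidate_tokens):
    """Return candidates missing from the base vocabulary, in order, without duplicates."""
    seen = set(base_vocab)
    new_tokens = []
    for tok in candidate_tokens:
        if tok not in seen:
            seen.add(tok)
            new_tokens.append(tok)
    return new_tokens


def mean_init_embeddings(embeddings, old_pieces_per_new_token):
    """Append one row per new token: the mean of the rows of the base-tokenizer
    pieces that previously spelled that token. Each id list must be non-empty."""
    new_rows = [embeddings[ids].mean(axis=0) for ids in old_pieces_per_new_token]
    return np.vstack([embeddings, np.stack(new_rows)])


# Toy base vocabulary and embedding matrix (hypothetical values).
base_vocab = {"ka": 0, "sh": 1, "nong": 2, "bah": 3}
embeddings = np.arange(16, dtype=np.float32).reshape(4, 4)

# Candidate tokens from the new tokenizer; "ka" already exists and is skipped.
candidates = ["ka", "nongbah", "shnong", "nongbah"]
new_tokens = select_new_tokens(base_vocab, candidates)

# Assumed base-tokenizer segmentations of each new token (hypothetical).
old_pieces = {"nongbah": [2, 3], "shnong": [1, 2]}
resized = mean_init_embeddings(embeddings, [old_pieces[t] for t in new_tokens])

print(new_tokens)
print(resized.shape)
```

The original rows are left untouched, which is what keeps the pre-trained embeddings intact while the new rows start from a meaningful position instead of random noise.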
2. Continued Pre-Training (CPT)
The resized model underwent standard Causal Language Modeling (CLM) to teach the new tokens syntactic relationships.
- Khasi Data: ~740K monolingual Khasi sentences (Bapynshngain/Bapyn-Kha-News).
- English Anchor Data: ~100K high-quality English documents from FineWeb-Edu (acting as ~15% of the mix to retain structural reasoning and prevent catastrophic forgetting).
- Hardware: Trained via Hugging Face Trainer with bfloat16 precision and Cosine Learning Rate decay.
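For readers unfamiliar with cosine decay, the shape of such a schedule can be sketched in plain Python. The peak learning rate, step counts, and warmup length below are illustrative assumptions, not the values used for this checkpoint. With the Hugging Face Trainer, this kind of schedule is normally selected through `lr_scheduler_type="cosine"` in `TrainingArguments` rather than written by hand.

```python
import math


def cosine_lr(step, total_steps, peak_lr, warmup_steps=0, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical settings: 1000 steps, 100 warmup steps, peak LR of 2e-4.
checkpoints = (0, 100, 550, 1000)
schedule = [cosine_lr(s, total_steps=1000, peak_lr=2e-4, warmup_steps=100) for s in checkpoints]

for step, lr in zip(checkpoints, schedule):
    print(step, lr)
```

The rate rises during warmup, peaks at step 100, passes half of the peak midway through the decay, and reaches the floor at the final step.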
How to Use (Inference)
Because this is a base model, you must prompt it with the beginning of a Khasi sentence and allow it to autocomplete. Chat templates will not work correctly yet.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Bapynshngain/SmolLM2-360M-Khasi-CPT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "Ka nongbah jong ka Meghalaya ka long"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        temperature=0.2,  # Keep temperature LOW (0.1 - 0.2) to prevent latent space bleed
        top_p=0.9,
        do_sample=True,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
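To see why a low temperature helps here, the sketch below applies temperature scaling to a made-up set of logits. The three logit values are purely illustrative and do not come from this model. The point is only that dividing logits by a small temperature concentrates probability on the top candidate, leaving little mass for off-language tokens.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature, computed in a numerically stable way."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate next tokens.
logits = [2.0, 1.0, 0.0]
p_default = softmax_with_temperature(logits, 1.0)
p_low = softmax_with_temperature(logits, 0.2)

print([round(p, 3) for p in p_default])
print([round(p, 3) for p in p_low])
```

At temperature 1.0 the top token holds roughly two thirds of the probability mass, while at 0.2 it holds more than 99%, so sampling becomes close to greedy decoding.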
Model tree for Bapynshngain/SmolLM2-360M-Khasi-CPT
Base model
HuggingFaceTB/SmolLM2-360M