You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

SmolLM2-360M-Khasi-CPT (Phase 2)

Model Description

This model is a fine-tuned version of Bapynshngain/SmolLM2-360M-Khasi-Base, trained on a Khasi monolingual dataset. It is a Continued Pre-Training (CPT) checkpoint derived from SmolLM2-360M-Instruct and adapted specifically for the Khasi language. It represents Phase 2 of a multi-stage training pipeline, run under the Tynrai AI initiative, that aims to develop lightweight, efficient language models for Meghalayan languages.

⚠️ CRITICAL WARNING: INTERMEDIATE CHECKPOINT ⚠️ This is not an instruction-following model or a translator. This is a foundational CPT model trained strictly on next-token prediction. It has acquired the Khasi vocabulary but has not yet undergone semantic alignment. If prompted, it will likely exhibit Token Collision (hallucinating in Romanized Hindi, Vietnamese, or English) because its nascent Khasi neural pathways are still competing with its massive pre-trained Latin-script latent space.

Do not use this model for production tasks. It is published for research tracking and as a base for Supervised Fine-Tuning (SFT).

Training Pipeline & Methodology

This model was adapted using a careful, non-destructive vocabulary injection method to prevent catastrophic forgetting of the base model's English and logical reasoning capabilities.

1. Tokenizer Surgery & Smart Initialization

Rather than completely replacing the base BPE tokenizer (which destroys pre-trained embeddings), we performed a vocabulary merge:

Extracted tokens from a custom 12K Unigram Khasi SentencePiece model (Bapynshngain/enkha-hybrid-tokenizer).
Filtered and injected 10,899 strictly new Khasi tokens into the SmolLM2 vocabulary.
Smart Initialization: The newly added embedding rows were not left randomized. Instead, they were initialized by averaging the weights of the existing English sub-words that previously comprised those Khasi words. This granted the new tokens immediate semantic weight.
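
The merge and initialization steps above can be illustrated with a dependency-free toy sketch. It is not the author's script: the four-entry vocabulary, the embedding values, the greedy segmenter, and the helper names `select_new_tokens`, `greedy_segment`, and `mean_init` are all hypothetical stand-ins. In a real run the base SmolLM2 tokenizer would do the segmentation, and the averaged rows would be written into the model's resized embedding matrix.

```python
from statistics import fmean

# Toy stand-ins (assumptions): a tiny "base" vocabulary with 2-dimensional embeddings.
base_vocab = {"ka": 0, "no": 1, "ng": 2, "bah": 3}
base_embeddings = [[1.0, 0.0], [0.0, 2.0], [2.0, 4.0], [4.0, 6.0]]


def select_new_tokens(vocab, candidates):
    """Keep only candidates absent from the base vocabulary, dropping duplicates."""
    seen = set(vocab)
    new_tokens = []
    for token in candidates:
        if token not in seen:
            seen.add(token)
            new_tokens.append(token)
    return new_tokens


def greedy_segment(word, vocab):
    """Hypothetical longest-match segmenter standing in for the base tokenizer."""
    ids, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                ids.append(vocab[word[start:end]])
                start = end
                break
        else:
            raise ValueError(f"cannot segment {word!r} at position {start}")
    return ids


def mean_init(embeddings, piece_ids):
    """Average the rows of the old sub-word pieces, one dimension at a time."""
    rows = [embeddings[i] for i in piece_ids]
    return [fmean(column) for column in zip(*rows)]


# Candidate tokens as they might come from the Khasi SentencePiece model (made up here).
candidates = ["ka", "nongbah", "nongbah"]
new_tokens = select_new_tokens(base_vocab, candidates)

for token in new_tokens:
    piece_ids = greedy_segment(token, base_vocab)  # segment BEFORE adding the token
    base_vocab[token] = len(base_embeddings)       # new id = next free row
    base_embeddings.append(mean_init(base_embeddings, piece_ids))

print(new_tokens)                                  # ['nongbah']
print(base_embeddings[base_vocab["nongbah"]])      # [2.0, 4.0]
```

Here "nongbah" was previously split into "no", "ng", and "bah", so its new row is the mean of those three rows. Starting from such an average, rather than from random values, is what gives a new token a plausible position in the embedding space before any training.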

2. Continued Pre-Training (CPT)

The resized model underwent standard Causal Language Modeling (CLM) to teach the new tokens syntactic relationships.
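
For readers unfamiliar with the objective, the causal language modeling loss can be sketched in plain Python as follows. This is an illustration of the general technique, not the training code used for this checkpoint; the function name `next_token_loss` and the toy inputs are assumptions.

```python
import math


def next_token_loss(logits, token_ids):
    """Mean cross-entropy of predicting token t+1 from the logits at position t."""
    losses = []
    for position in range(len(token_ids) - 1):
        row = logits[position]
        target = token_ids[position + 1]   # labels are the inputs shifted by one
        peak = max(row)                    # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        losses.append(log_norm - row[target])
    return sum(losses) / len(losses)


# Toy check: with uniform logits over 4 tokens, the loss equals log(4).
vocab_size = 4
token_ids = [0, 1, 2]
uniform_logits = [[0.0] * vocab_size for _ in token_ids]
loss = next_token_loss(uniform_logits, token_ids)
print(loss, math.log(vocab_size))
```

The last position contributes no loss term because it has no following token to predict.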

Khasi Data: ~740K monolingual Khasi sentences (Bapynshngain/Bapyn-Kha-News).
English Anchor Data: ~100K high-quality English documents from FineWeb-Edu (acting as ~15% of the mix to retain structural reasoning and prevent catastrophic forgetting).
Training setup: Trained with the Hugging Face Trainer in bfloat16 precision with cosine learning-rate decay.
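
As a quick sanity check on the data mix, the English share can be computed from the two approximate counts above. Counted per item, it comes out a little below the quoted ~15%. One possible explanation, which is an assumption and not stated in the card, is that the ~15% figure is measured in tokens, since FineWeb-Edu documents are typically longer than single news sentences.

```python
# Approximate counts quoted above; the units differ (sentences vs. documents).
khasi_items = 740_000
english_items = 100_000

english_share = english_items / (khasi_items + english_items)
print(f"English share by item count: {english_share:.1%}")  # 11.9%
```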

How to Use (Inference)

Because this is a base model, you must prompt it with the beginning of a Khasi sentence and allow it to autocomplete. Chat templates will not work correctly yet.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Bapynshngain/SmolLM2-360M-Khasi-CPT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    torch_dtype=torch.bfloat16, 
    device_map="auto"
)

prompt = "Ka nongbah jong ka Meghalaya ka long"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        temperature=0.2, # Keep temperature LOW (0.1 - 0.2) to prevent latent space bleed
        top_p=0.9,
        do_sample=True,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
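
To see why a low temperature keeps generation close to the model's top choices, the sketch below applies temperature scaling to a made-up set of logits. It uses only the standard library; the function name `softmax_with_temperature` and the logit values are illustrative assumptions, not values taken from this model.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalize with a stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
p_default = softmax_with_temperature(logits, 1.0)
p_low = softmax_with_temperature(logits, 0.2)

print([round(p, 3) for p in p_default])
print([round(p, 3) for p in p_low])
```

At temperature 0.2 almost all probability mass moves to the highest-scoring token, so sampling rarely picks lower-ranked tokens, which is the behavior the low-temperature recommendation in the snippet above relies on.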


Safetensors
Model size: 0.4B params
Tensor type: BF16

Model tree for Bapynshngain/SmolLM2-360M-Khasi-CPT
Base model: HuggingFaceTB/SmolLM2-360M
Quantized: HuggingFaceTB/SmolLM2-360M-Instruct
Finetuned (147): this model
