Model Card for Kirim-1-Math
Model Details
Model Description
Kirim-1-Math is a 30-billion-parameter large language model specialized for advanced mathematical reasoning and problem solving. It is the first model in the Kirim series with built-in tool calling, allowing it to perform mathematical computations, symbolic manipulation, and code execution for numerical solutions.
- Developed by: Kirim AI Team
- Model type: Causal Language Model (Decoder-only Transformer)
- Language(s): Chinese, English
- License: Apache 2.0
- Base Model: Kirim-V1-base (expanded from 13B to 30B)
- Specialization: Mathematical reasoning, theorem proving, symbolic computation
Model Capabilities
- Mathematical Reasoning: Solve problems from elementary to olympiad level
- Tool Calling: Invoke calculator, symbolic-solver, differentiation, integration, and code-execution tools (see the sketch after this list)
- Step-by-Step Solutions: Show detailed work for problem-solving
- LaTeX Output: Format mathematical expressions properly
- Bilingual: Handle problems in both Chinese and English
- Code Generation: Write and execute Python/SymPy code for numerical solutions
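A minimal sketch of what a tool-calling exchange might look like, shown as Python literals; the tool name and argument fields are assumptions, since the card does not publish the exact schema.

# Illustrative sketch only: the tool name and argument fields below are
# assumptions, not the published Kirim-1-Math tool-calling format.
tool_call = {
    "name": "symbolic_solver",                                   # hypothetical tool
    "arguments": {"equation": "x**2 - 5*x + 6 = 0", "variable": "x"},
}
tool_result = {"roots": [2, 3]}   # fed back to the model as a tool message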
Model Sources
- Repository: github.com/Kirim-ai/Kirim-1-Math
- Paper: Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling
- Demo: huggingface.co/spaces/Kirim-ai/Kirim-1-Math-demo
- Base Model: Kirim-ai/Kirim-V1-base
Uses
Direct Use
The model can be used directly for:
- Educational Tutoring: Explain mathematical concepts with step-by-step reasoning
- Homework Assistance: Solve problems from elementary to olympiad difficulty
- Competition Preparation: Practice for AMC, AIME, IMO, Putnam
- Research Assistance: Verify proofs and perform symbolic computations
- Code-Assisted Problem Solving: Use numerical methods for complex calculations
Downstream Use
Fine-tuning possibilities:
- Domain-specific mathematical applications (physics, engineering, finance)
- Custom tool integration for specialized computations (a hypothetical registration sketch follows this list)
- Educational platforms with adaptive difficulty
- Mathematical theorem proving systems
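For example, custom tool integration might follow a simple registration pattern like the sketch below; the registry and the finance helper are hypothetical and are not an API shipped with the model.

# Hypothetical registration pattern for exposing a domain-specific tool;
# the model card does not define a custom-tool API, so all names are illustrative.
CUSTOM_TOOLS = {}

def register_tool(name):
    def wrap(fn):
        CUSTOM_TOOLS[name] = fn
        return fn
    return wrap

@register_tool("present_value")
def present_value(cash_flow: float, rate: float, periods: int) -> float:
    """Finance-domain helper: discount a single future cash flow."""
    return cash_flow / (1 + rate) ** periods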
Out-of-Scope Use
The model should NOT be used for:
- Academic dishonesty: Cheating on exams or assignments
- Safety-critical systems: Without human verification (e.g., structural engineering calculations)
- Financial advice: Trading or investment decisions without expert review
- Medical calculations: Drug dosages or medical equipment calibration
- Legal matters: Without professional mathematician/lawyer verification
Bias, Risks, and Limitations
Known Limitations
Technical Limitations:
- Cannot process visual mathematics (diagrams, geometric figures)
- May struggle with extremely novel mathematical concepts
- Limited to training data through October 2024
- Tool execution can fail for edge cases
- Performance degrades on extremely complex graduate-level problems
Reasoning Limitations:
- May make logical errors in complex proofs
- Can hallucinate intermediate steps
- Occasionally produces incorrect final answers
- May not recognize when a problem has no solution
Computational Limitations:
- Cannot perform arbitrarily large calculations without tools
- Numerical precision limited by underlying libraries
- May timeout on very long computations
Risks and Biases
Potential Risks:
- Students may become over-reliant on AI assistance
- Could generate plausible but incorrect mathematical reasoning
- May perpetuate biases in mathematical education approaches
- Tool execution could consume excessive computational resources
Mitigation Strategies:
- Always verify critical results with human experts
- Use a low temperature (e.g., 0.1) for near-deterministic mathematical reasoning
- Enable tool calling for numerical verification
- Cross-check answers with multiple methods (see the verification sketch after this list)
- Implement appropriate safeguards in educational settings
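As an example of cross-checking, a model-reported root can be verified symbolically before it is trusted; the sketch below uses plain SymPy and is independent of the model itself.

import sympy as sp

# Cross-check a model-reported answer for x^2 - 5x + 6 = 0 before trusting it.
x = sp.symbols("x")
expr = x**2 - 5*x + 6
reported_roots = [2, 3]  # taken from the model's output
for root in reported_roots:
    assert expr.subs(x, root) == 0, f"{root} is not a root"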
How to Get Started
Installation
pip install torch transformers accelerate sympy
Basic Usage
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    trust_remote_code=True,
)

# Solve a problem
messages = [
    {"role": "user", "content": "Solve: x² - 5x + 6 = 0"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=2048, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Using the Inference Script
# Interactive mode
python inference_math.py --interactive
# Single problem
python inference_math.py --problem "Calculate the derivative of x^3 + 2x^2"
# With quantization
python inference_math.py --load_in_4bit --interactive
Training Details
Training Data
Mathematical Corpus (500B tokens):
- Mathematical proofs: ProofWiki, Lean, Coq, Isabelle (125B tokens)
- Olympiad problems: IMO, USAMO, AMC, AIME, Putnam (150B tokens)
- arXiv papers: math.AC, math.AG, math.NT, math.CO (100B tokens)
- Textbooks: undergraduate to graduate level (75B tokens)
- Q&A: Math StackExchange, MathOverflow (50B tokens)
Code Corpus (200B tokens):
- Mathematical Python libraries (NumPy, SymPy, SciPy)
- Computational notebooks from Kaggle, GitHub
- Algorithm implementations
General Corpus (800B tokens):
- From Kirim-V1-base pre-training
Total: 1.5 Trillion tokens
Training Procedure
Stage 1: Model Expansion (15 days)
- Expanded from 13B to 30B parameters
- Progressive width and depth scaling
- Hidden size: 4096 → 5120
- Layers: 32 → 48
Stage 2: Mathematical Pre-training (30 days)
- 500B tokens of mathematical content
- Hardware: 512x NVIDIA H100 80GB
- Batch size: 2048
- Learning rate: 1.5e-4 with cosine decay
- Optimization: AdamW, BF16 precision
Stage 3: Instruction Tuning (5 days)
- 200K mathematical instruction-response pairs
- Balanced across algebra, calculus, geometry, etc.
- Learning rate: 2e-5
- 3 epochs
Stage 4: Tool Calling Training (3 days)
- 50K tool-calling examples
- Function definition and execution (a hypothetical example format follows this list)
- Error handling and recovery
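The training example format is not published; the sketch below shows one plausible structure for a tool-calling pair, with all field and tool names assumed.

# Hypothetical structure of one tool-calling training example;
# the actual dataset format is not published.
example = {
    "messages": [
        {"role": "user", "content": "Evaluate the integral of sin(x) from 0 to pi."},
        {"role": "assistant", "tool_call": {
            "name": "integrate",                                   # assumed tool name
            "arguments": {"expression": "sin(x)", "lower": "0", "upper": "pi"},
        }},
        {"role": "tool", "content": "2"},
        {"role": "assistant", "content": "The integral equals 2."},
    ]
}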
Stage 5: Reinforcement Learning (7 days)
- PPO-based training
- Reward based on solution correctness
- Symbolic and numerical verification (a reward sketch follows this list)
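The reward model itself is not released; a minimal sketch of a correctness reward with symbolic verification, using SymPy for the equivalence check, might look like this:

import sympy as sp

def correctness_reward(predicted: str, reference: str) -> float:
    """Hypothetical reward: 1.0 if the predicted answer is symbolically equal
    to the reference, else 0.0. The production reward is not published."""
    try:
        diff = sp.simplify(sp.sympify(predicted) - sp.sympify(reference))
        return 1.0 if diff == 0 else 0.0
    except (sp.SympifyError, TypeError):
        return 0.0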
Training Hyperparameters
- Optimizer: AdamW
- Learning rate: 1.5e-4 (pre-training), 2e-5 (fine-tuning); see the schedule sketch after this list
- Weight decay: 0.1
- Warmup steps: 2000
- Gradient clipping: 1.0
- Precision: BFloat16
- Total GPU hours: 30,720
- Estimated cost: $450,000 USD
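A minimal sketch of the reported optimizer and schedule, assuming linear warmup into cosine decay and an illustrative total step count; it reuses `model` from the usage example above and is not the actual training script.

import torch
from transformers import get_cosine_schedule_with_warmup

# Illustrative sketch of the reported settings; the real run used DeepSpeed
# ZeRO-3 across 512 H100s. The total step count is assumed.
optimizer = torch.optim.AdamW(model.parameters(), lr=1.5e-4, weight_decay=0.1)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=2000, num_training_steps=120_000
)
# Per step: loss.backward();
# torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping 1.0
# optimizer.step(); scheduler.step(); optimizer.zero_grad()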
Compute Infrastructure
- Pre-training: 512x NVIDIA H100 80GB GPUs
- Fine-tuning: 128x NVIDIA H100 80GB GPUs
- Framework: PyTorch 2.1, DeepSpeed ZeRO-3 (an illustrative ZeRO-3 config follows this list)
- Parallelism: Tensor (8-way), Pipeline (4-way), Data (16-way)
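An illustrative DeepSpeed ZeRO-3 configuration consistent with the setup above, written as a Python dict; the values are assumptions, not the published configuration.

# Illustrative ZeRO-3 configuration (values assumed); a dict like this is
# passed to deepspeed.initialize(...).
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_clipping": 1.0,
    "train_micro_batch_size_per_gpu": 4,  # assumed; the global batch size was 2048
}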
Evaluation
Mathematical Reasoning
| Benchmark | Score | Comparison |
|---|---|---|
| GSM8K | 94.2% | GPT-4: 92.0% |
| MATH | 78.5% | GPT-4: 76.4% |
| MMLU-Math | 88.7% | GPT-4: 86.9% |
| AMC10/12 | 72.3% | Human avg: 45% |
| AIME | 38.7% | Human qualifier: 40% |
Tool Calling
| Metric | Score |
|---|---|
| Tool Selection | 96.8% |
| Parameter Extraction | 94.2% |
| Execution Success | 92.5% |
| Result Integration | 95.1% |
Code Generation
| Task | Pass@1 | Pass@10 |
|---|---|---|
| HumanEval-Math | 78.3% | 92.1% |
| SymPy Tasks | 82.5% | 94.7% |
| NumPy Tasks | 75.6% | 89.3% |
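The card does not state how Pass@1 and Pass@10 were estimated; a common choice is the unbiased estimator introduced with HumanEval, sketched below (n is the number of samples drawn per task, c the number that pass, both assumptions about the evaluation setup).

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples per task, c of which are correct."""
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))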
Performance
- Inference Speed: 45 tokens/second (A100 80GB)
- Memory: 60GB (BF16), 30GB (INT8), 20GB (INT4); see the 4-bit loading sketch after this list
- Latency: 89ms mean, 145ms p95
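The INT8/INT4 figures assume weight quantization; a minimal 4-bit loading sketch with bitsandbytes via transformers follows (the quantization settings are illustrative, not an officially validated configuration).

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit load (requires the bitsandbytes package); settings assumed.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)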
Environmental Impact
- Hardware: NVIDIA H100 GPUs
- Training Time: 60 days (30,720 GPU hours)
- Estimated CO₂: ~8,500 kg CO₂eq
- Power Consumption: ~850 MWh
We are committed to reducing environmental impact through efficient training and model optimization.
Technical Specifications
Model Architecture
| Parameter | Value |
|---|---|
| Parameters | 30B |
| Hidden Size | 5,120 |
| Layers | 48 |
| Attention Heads | 40 |
| KV Heads | 8 (GQA) |
| Intermediate Size | 13,824 |
| Vocabulary | 102,400 |
| Context Length | 32,768 |
| Position Encoding | RoPE with YaRN |
| Activation | SiLU |
| Normalization | RMSNorm |
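As a quick sanity check on the table, the head dimension and grouped-query layout can be derived directly from the listed values (this is not code from the repository).

# Derived from the architecture table above.
hidden_size, num_heads, num_kv_heads = 5120, 40, 8
head_dim = hidden_size // num_heads               # 128 dimensions per head
queries_per_kv = num_heads // num_kv_heads        # 5 query heads share each KV head
kv_cache_ratio = num_kv_heads / num_heads         # KV cache is 1/5 of full MHA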
Special Features
- Tool Calling: JSON-based function calling
- Symbolic Solver: SymPy integration
- Code Execution: Sandboxed Python runtime
- LaTeX Formatting: Automatic equation formatting
Citation
@misc{kirim2025math,
  title={Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling},
  author={Qiling Research},
  year={2025},
  publisher={Kirim AI},
  url={https://huggingface.co/Kirim-ai/Kirim-1-Math}
}
Model Card Authors
Qiling Research
Ethical Considerations
Educational Impact
- May affect traditional mathematics education
- Could reduce development of mental math skills
- Should be used as a learning aid, not replacement
Accessibility
- Makes advanced mathematics more accessible
- Could democratize STEM education
- May widen gap if access is unequal
Verification
- Always verify results for critical applications
- Use multiple methods for important calculations
- Maintain human oversight in education
Glossary
- Tool Calling: Ability to invoke external functions for computation
- Symbolic Solver: Algebraic manipulation system (SymPy)
- GQA: Grouped Query Attention; several query heads share each key/value head to shrink the KV cache
- RoPE: Rotary Position Embedding
- YaRN: Yet another RoPE extensioN; a method for extending the context window of RoPE-based models