Model Card for Kirim-1-Math
Model Details
Model Description
Kirim-1-Math is a 30-billion-parameter large language model specialized for advanced mathematical reasoning and problem solving. It is the first model in the Kirim series with built-in tool calling, allowing it to perform mathematical computations, symbolic manipulation, and code execution for numerical solutions.
- Developed by: Kirim AI Team
- Model type: Causal Language Model (Decoder-only Transformer)
- Language(s): Chinese, English
- License: Apache 2.0
- Base Model: Kirim-V1-base (expanded from 13B to 30B)
- Specialization: Mathematical reasoning, theorem proving, symbolic computation
Model Capabilities
- Mathematical Reasoning: Solve problems from elementary to olympiad level
- Tool Calling: Invoke calculator, symbolic-solver, differentiation, integration, and code-execution tools (see the sketch after this list)
- Step-by-Step Solutions: Show detailed work for problem-solving
- LaTeX Output: Format mathematical expressions properly
- Bilingual: Handle problems in both Chinese and English
- Code Generation: Write and execute Python/SymPy code for numerical solutions
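A minimal sketch of what a tool-calling exchange might look like, shown as Python literals; the tool name and argument fields are assumptions, since the card does not publish the exact schema.

# Illustrative sketch only: the tool name and argument fields below are
# assumptions, not the published Kirim-1-Math tool-calling format.
tool_call = {
    "name": "symbolic_solver",                                   # hypothetical tool
    "arguments": {"equation": "x**2 - 5*x + 6 = 0", "variable": "x"},
}
tool_result = {"roots": [2, 3]}   # fed back to the model as a tool message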
Model Sources
- Repository: github.com/Kirim-ai/Kirim-1-Math
- Paper: Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling
- Demo: huggingface.co/spaces/Kirim-ai/Kirim-1-Math-demo
- Base Model: Kirim-ai/Kirim-V1-base
Uses
Direct Use
The model can be used directly for:
- Educational Tutoring: Explain mathematical concepts with step-by-step reasoning
- Homework Assistance: Solve problems from elementary to olympiad difficulty
- Competition Preparation: Practice for AMC, AIME, IMO, Putnam
- Research Assistance: Verify proofs and perform symbolic computations
- Code-Assisted Problem Solving: Use numerical methods for complex calculations
Downstream Use
Fine-tuning possibilities:
- Domain-specific mathematical applications (physics, engineering, finance)
- Custom tool integration for specialized computations (a hypothetical registration sketch follows this list)
- Educational platforms with adaptive difficulty
- Mathematical theorem proving systems
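For example, custom tool integration might follow a simple registration pattern like the sketch below; the registry and the finance helper are hypothetical and are not an API shipped with the model.

# Hypothetical registration pattern for exposing a domain-specific tool;
# the model card does not define a custom-tool API, so all names are illustrative.
CUSTOM_TOOLS = {}

def register_tool(name):
    def wrap(fn):
        CUSTOM_TOOLS[name] = fn
        return fn
    return wrap

@register_tool("present_value")
def present_value(cash_flow: float, rate: float, periods: int) -> float:
    """Finance-domain helper: discount a single future cash flow."""
    return cash_flow / (1 + rate) ** periods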
Out-of-Scope Use
The model should NOT be used for:
- Academic dishonesty: Cheating on exams or assignments
- Safety-critical systems: Without human verification (e.g., structural engineering calculations)
- Financial advice: Trading or investment decisions without expert review
- Medical calculations: Drug dosages or medical equipment calibration
- Legal matters: Without professional mathematician/lawyer verification
Bias, Risks, and Limitations
Known Limitations
Technical Limitations:
- Cannot process visual mathematics (diagrams, geometric figures)
- May struggle with extremely novel mathematical concepts
- Limited to training data through October 2024
- Tool execution can fail for edge cases
- Performance degrades on extremely complex graduate-level problems
Reasoning Limitations:
- May make logical errors in complex proofs
- Can hallucinate intermediate steps
- Occasionally produces incorrect final answers
- May not recognize when a problem has no solution
Computational Limitations:
- Cannot perform arbitrarily large calculations without tools
- Numerical precision limited by underlying libraries
- May timeout on very long computations
Risks and Biases
Potential Risks:
- Students may become over-reliant on AI assistance
- Could generate plausible but incorrect mathematical reasoning
- May perpetuate biases in mathematical education approaches
- Tool execution could consume excessive computational resources
Mitigation Strategies:
- Always verify critical results with human experts
- Use a low temperature (e.g., 0.1) for near-deterministic mathematical reasoning
- Enable tool calling for numerical verification
- Cross-check answers with multiple methods (see the verification sketch after this list)
- Implement appropriate safeguards in educational settings
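As an example of cross-checking, a model-reported root can be verified symbolically before it is trusted; the sketch below uses plain SymPy and is independent of the model itself.

import sympy as sp

# Cross-check a model-reported answer for x^2 - 5x + 6 = 0 before trusting it.
x = sp.symbols("x")
expr = x**2 - 5*x + 6
reported_roots = [2, 3]  # taken from the model's output
for root in reported_roots:
    assert expr.subs(x, root) == 0, f"{root} is not a root"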
How to Get Started
Installation
pip install torch transformers accelerate sympy
Basic Usage
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    trust_remote_code=True,
)

# Solve a problem
messages = [
    {"role": "user", "content": "Solve: x² - 5x + 6 = 0"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=2048, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Using the Inference Script
# Interactive mode
python inference_math.py --interactive
# Single problem
python inference_math.py --problem "Calculate the derivative of x^3 + 2x^2"
# With quantization
python inference_math.py --load_in_4bit --interactive
Training Details
Training Data
Mathematical Corpus (500B tokens):
- Mathematical proofs: ProofWiki, Lean, Coq, Isabelle (125B tokens)
- Olympiad problems: IMO, USAMO, AMC, AIME, Putnam (150B tokens)
- arXiv papers: math.AC, math.AG, math.NT, math.CO (100B tokens)
- Textbooks: undergraduate to graduate level (75B tokens)
- Q&A: Math StackExchange, MathOverflow (50B tokens)
Code Corpus (200B tokens):
- Mathematical Python libraries (NumPy, SymPy, SciPy)
- Computational notebooks from Kaggle, GitHub
- Algorithm implementations
General Corpus (800B tokens):
- From Kirim-V1-base pre-training
Total: 1.5 Trillion tokens
Training Procedure
Stage 1: Model Expansion (15 days)
- Expanded from 13B to 30B parameters
- Progressive width and depth scaling
- Hidden size: 4096 → 5120
- Layers: 32 → 48
Stage 2: Mathematical Pre-training (30 days)
- 500B tokens of mathematical content
- Hardware: 512x NVIDIA H100 80GB
- Batch size: 2048
- Learning rate: 1.5e-4 with cosine decay
- Optimization: AdamW, BF16 precision
Stage 3: Instruction Tuning (5 days)
- 200K mathematical instruction-response pairs
- Balanced across algebra, calculus, geometry, etc.
- Learning rate: 2e-5
- 3 epochs
Stage 4: Tool Calling Training (3 days)
- 50K tool-calling examples
- Function definition and execution (a hypothetical example format follows this list)
- Error handling and recovery
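The training example format is not published; the sketch below shows one plausible structure for a tool-calling pair, with all field and tool names assumed.

# Hypothetical structure of one tool-calling training example;
# the actual dataset format is not published.
example = {
    "messages": [
        {"role": "user", "content": "Evaluate the integral of sin(x) from 0 to pi."},
        {"role": "assistant", "tool_call": {
            "name": "integrate",                                   # assumed tool name
            "arguments": {"expression": "sin(x)", "lower": "0", "upper": "pi"},
        }},
        {"role": "tool", "content": "2"},
        {"role": "assistant", "content": "The integral equals 2."},
    ]
}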
Stage 5: Reinforcement Learning (7 days)
- PPO-based training
- Reward based on solution correctness
- Symbolic and numerical verification (a reward sketch follows this list)
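The reward model itself is not released; a minimal sketch of a correctness reward with symbolic verification, using SymPy for the equivalence check, might look like this:

import sympy as sp

def correctness_reward(predicted: str, reference: str) -> float:
    """Hypothetical reward: 1.0 if the predicted answer is symbolically equal
    to the reference, else 0.0. The production reward is not published."""
    try:
        diff = sp.simplify(sp.sympify(predicted) - sp.sympify(reference))
        return 1.0 if diff == 0 else 0.0
    except (sp.SympifyError, TypeError):
        return 0.0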
Training Hyperparameters
- Optimizer: AdamW
- Learning rate: 1.5e-4 (pre-training), 2e-5 (fine-tuning); see the schedule sketch after this list
- Weight decay: 0.1
- Warmup steps: 2000
- Gradient clipping: 1.0
- Precision: BFloat16
- Total GPU hours: 30,720
- Estimated cost: $450,000 USD
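A minimal sketch of the reported optimizer and schedule, assuming linear warmup into cosine decay and an illustrative total step count; it reuses `model` from the usage example above and is not the actual training script.

import torch
from transformers import get_cosine_schedule_with_warmup

# Illustrative sketch of the reported settings; the real run used DeepSpeed
# ZeRO-3 across 512 H100s. The total step count is assumed.
optimizer = torch.optim.AdamW(model.parameters(), lr=1.5e-4, weight_decay=0.1)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=2000, num_training_steps=120_000
)
# Per step: loss.backward();
# torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping 1.0
# optimizer.step(); scheduler.step(); optimizer.zero_grad()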
Compute Infrastructure
- Pre-training: 512x NVIDIA H100 80GB GPUs
- Fine-tuning: 128x NVIDIA H100 80GB GPUs
- Framework: PyTorch 2.1, DeepSpeed ZeRO-3 (an illustrative ZeRO-3 config follows this list)
- Parallelism: Tensor (8-way), Pipeline (4-way), Data (16-way)
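An illustrative DeepSpeed ZeRO-3 configuration consistent with the setup above, written as a Python dict; the values are assumptions, not the published configuration.

# Illustrative ZeRO-3 configuration (values assumed); a dict like this is
# passed to deepspeed.initialize(...).
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_clipping": 1.0,
    "train_micro_batch_size_per_gpu": 4,  # assumed; the global batch size was 2048
}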
Evaluation
Mathematical Reasoning
| Benchmark | Score | Comparison |
|---|---|---|
| GSM8K | 94.2% | GPT-4: 92.0% |
| MATH | 78.5% | GPT-4: 76.4% |
| MMLU-Math | 88.7% | GPT-4: 86.9% |
| AMC10/12 | 72.3% | Human avg: 45% |
| AIME | 38.7% | Human qualifier: 40% |
Tool Calling
| Metric | Score |
|---|---|
| Tool Selection | 96.8% |
| Parameter Extraction | 94.2% |
| Execution Success | 92.5% |
| Result Integration | 95.1% |
Code Generation
| Task | Pass@1 | Pass@10 |
|---|---|---|
| HumanEval-Math | 78.3% | 92.1% |
| SymPy Tasks | 82.5% | 94.7% |
| NumPy Tasks | 75.6% | 89.3% |
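The card does not state how Pass@1 and Pass@10 were estimated; a common choice is the unbiased estimator introduced with HumanEval, sketched below (n is the number of samples drawn per task, c the number that pass, both assumptions about the evaluation setup).

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples per task, c of which are correct."""
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))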
Performance
- Inference Speed: 45 tokens/second (A100 80GB)
- Memory: 60GB (BF16), 30GB (INT8), 20GB (INT4); see the 4-bit loading sketch after this list
- Latency: 89ms mean, 145ms p95
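The INT8/INT4 figures assume weight quantization; a minimal 4-bit loading sketch with bitsandbytes via transformers follows (the quantization settings are illustrative, not an officially validated configuration).

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit load (requires the bitsandbytes package); settings assumed.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)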
Environmental Impact
- Hardware: NVIDIA H100 GPUs
- Training Time: 60 days (30,720 GPU hours)
- Estimated CO₂: ~8,500 kg CO₂eq
- Power Consumption: ~850 MWh
We are committed to reducing environmental impact through efficient training and model optimization.
Technical Specifications
Model Architecture
| Parameter | Value |
|---|---|
| Parameters | 30B |
| Hidden Size | 5,120 |
| Layers | 48 |
| Attention Heads | 40 |
| KV Heads | 8 (GQA) |
| Intermediate Size | 13,824 |
| Vocabulary | 102,400 |
| Context Length | 32,768 |
| Position Encoding | RoPE with YaRN |
| Activation | SiLU |
| Normalization | RMSNorm |
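As a quick sanity check on the table, the head dimension and grouped-query layout can be derived directly from the listed values (this is not code from the repository).

# Derived from the architecture table above.
hidden_size, num_heads, num_kv_heads = 5120, 40, 8
head_dim = hidden_size // num_heads               # 128 dimensions per head
queries_per_kv = num_heads // num_kv_heads        # 5 query heads share each KV head
kv_cache_ratio = num_kv_heads / num_heads         # KV cache is 1/5 of full MHA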
Special Features
- Tool Calling: JSON-based function calling
- Symbolic Solver: SymPy integration
- Code Execution: Sandboxed Python runtime
- LaTeX Formatting: Automatic equation formatting
Citation
@misc{kirim2025math,
  title={Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling},
  author={Qiling Research},
  year={2025},
  publisher={Kirim AI},
  url={https://huggingface.co/Kirim-ai/Kirim-1-Math}
}
Model Card Authors
Qiling Research
Ethical Considerations
Educational Impact
- May affect traditional mathematics education
- Could reduce development of mental math skills
- Should be used as a learning aid, not replacement
Accessibility
- Makes advanced mathematics more accessible
- Could democratize STEM education
- May widen gap if access is unequal
Verification
- Always verify results for critical applications
- Use multiple methods for important calculations
- Maintain human oversight in education
Glossary
- Tool Calling: Ability to invoke external functions for computation
- Symbolic Solver: Algebraic manipulation system (SymPy)
- GQA: Grouped Query Attention; several query heads share each key/value head to shrink the KV cache
- RoPE: Rotary Position Embedding
- YaRN: Yet another RoPE extensioN; a method for extending the context window of RoPE-based models