Model Card for Kirim-1-Math

Model Details

Model Description

Kirim-1-Math is a 30-billion-parameter large language model specialized for advanced mathematical reasoning and problem solving. It is the first model in the Kirim series with built-in tool calling, which lets it invoke external tools for numerical computation, symbolic manipulation, and code execution.

  • Developed by: Kirim AI Team
  • Model type: Causal Language Model (Decoder-only Transformer)
  • Language(s): Chinese, English
  • License: Apache 2.0
  • Base Model: Kirim-V1-base (expanded from 13B to 30B)
  • Specialization: Mathematical reasoning, theorem proving, symbolic computation

Model Capabilities

  • Mathematical Reasoning: Solve problems from elementary to olympiad level
  • Tool Calling: Invoke calculator, symbolic-solver, differentiation, integration, and code-execution tools (see the sketch after this list)
  • Step-by-Step Solutions: Show detailed work for problem-solving
  • LaTeX Output: Format mathematical expressions properly
  • Bilingual: Handle problems in both Chinese and English
  • Code Generation: Write and execute Python/SymPy code for numerical solutions
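
As a rough illustration of the built-in tool calling, the model emits a structured function call that the host application executes and feeds back as a message. The exact schema is defined by the model's chat template; every field name in this sketch is illustrative, not the documented format:

# Hypothetical tool-call payload emitted by the model; field names
# are illustrative, not the model's documented schema.
tool_call = {
    "name": "symbolic_solver",
    "arguments": {
        "expression": "x**2 - 5*x + 6",
        "variable": "x",
        "operation": "solve",
    },
}

# The host executes the tool and returns the result as a new message:
tool_result = {"role": "tool", "name": "symbolic_solver", "content": "[2, 3]"}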

Model Sources

  • Repository: https://huggingface.co/Kirim-ai/Kirim-1-Math

Uses

Direct Use

The model can be used directly for:

  • Educational Tutoring: Explain mathematical concepts with step-by-step reasoning
  • Homework Assistance: Solve problems across all difficulty levels
  • Competition Preparation: Practice for AMC, AIME, IMO, Putnam
  • Research Assistance: Verify proofs and perform symbolic computations
  • Code-Assisted Problem Solving: Use numerical methods for complex calculations

Downstream Use

Fine-tuning possibilities:

  • Domain-specific mathematical applications (physics, engineering, finance)
  • Custom tool integration for specialized computations
  • Educational platforms with adaptive difficulty
  • Mathematical theorem proving systems

Out-of-Scope Use

The model should NOT be used for:

  • Academic dishonesty: Cheating on exams or assignments
  • Safety-critical systems: Without human verification (e.g., structural engineering calculations)
  • Financial advice: Trading or investment decisions without expert review
  • Medical calculations: Drug dosages or medical equipment calibration
  • Legal matters: Without review by qualified legal professionals

Bias, Risks, and Limitations

Known Limitations

Technical Limitations:

  • Cannot process visual mathematics (diagrams, geometric figures)
  • May struggle with extremely novel mathematical concepts
  • Limited to training data through October 2024
  • Tool execution can fail for edge cases
  • Performance degrades on extremely complex graduate-level problems

Reasoning Limitations:

  • May make logical errors in complex proofs
  • Can hallucinate intermediate steps
  • Occasionally produces incorrect final answers
  • May not recognize when a problem has no solution

Computational Limitations:

  • Cannot perform arbitrarily large calculations without tools
  • Numerical precision limited by underlying libraries
  • May timeout on very long computations

Risks and Biases

Potential Risks:

  • Students may become over-reliant on AI assistance
  • Could generate plausible but incorrect mathematical reasoning
  • May perpetuate biases in mathematical education approaches
  • Tool execution could consume excessive computational resources

Mitigation Strategies:

  • Always verify critical results with human experts
  • Use a low sampling temperature (e.g., temperature=0.1) for more consistent reasoning
  • Enable tool calling for numerical verification
  • Cross-check answers with multiple methods (see the example after this list)
  • Implement appropriate safeguards in educational settings
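
As an example of cross-checking with multiple methods, a symbolic result can be verified independently by substitution. A minimal SymPy sketch (the equation matches the quickstart below):

import sympy as sp

x = sp.symbols("x")
expr = x**2 - 5*x + 6

roots = sp.solve(expr, x)        # symbolic method: [2, 3]
for r in roots:
    assert expr.subs(x, r) == 0  # independent check by substitution
print(roots)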

How to Get Started

Installation

pip install torch transformers accelerate sympy

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    trust_remote_code=True
)

# Solve a problem
messages = [
    {"role": "user", "content": "Solve: x² - 5x + 6 = 0"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker
    return_tensors="pt",
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=2048, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))  # new tokens only
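
Recent versions of transformers can also render Python functions into the prompt via the tools argument of apply_chat_template. Whether Kirim-1-Math's chat template consumes tools this way is an assumption, and the differentiate function below is purely illustrative:

# Hypothetical tool: the signature and docstring are rendered into the prompt.
def differentiate(expression: str, variable: str) -> str:
    """Differentiate a symbolic expression.

    Args:
        expression: The expression to differentiate, e.g. "x**3 + 2*x**2".
        variable: The variable to differentiate with respect to.
    """
    import sympy as sp
    return str(sp.diff(sp.sympify(expression), sp.symbols(variable)))

inputs = tokenizer.apply_chat_template(
    messages,
    tools=[differentiate],       # advertised to the model as a callable tool
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)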

Using the Inference Script

# Interactive mode
python inference_math.py --interactive

# Single problem
python inference_math.py --problem "Calculate the derivative of x^3 + 2x^2"

# With quantization
python inference_math.py --load_in_4bit --interactive

Training Details

Training Data

Mathematical Corpus (500B tokens):

  • Mathematical proofs: ProofWiki, Lean, Coq, Isabelle (125B tokens)
  • Olympiad problems: IMO, USAMO, AMC, AIME, Putnam (150B tokens)
  • arXiv papers: math.AC, math.AG, math.NT, math.CO (100B tokens)
  • Textbooks: undergraduate to graduate level (75B tokens)
  • Q&A: Math StackExchange, MathOverflow (50B tokens)

Code Corpus (200B tokens):

  • Mathematical Python libraries (NumPy, SymPy, SciPy)
  • Computational notebooks from Kaggle, GitHub
  • Algorithm implementations

General Corpus (800B tokens):

  • From Kirim-V1-base pre-training

Total: 1.5 trillion tokens

Training Procedure

Stage 1: Model Expansion (15 days)

  • Expanded from 13B to 30B parameters
  • Progressive width and depth scaling
  • Hidden size: 4096 → 5120
  • Layers: 32 → 48

Stage 2: Mathematical Pre-training (30 days)

  • 500B tokens of mathematical content
  • Hardware: 512x NVIDIA H100 80GB
  • Batch size: 2048
  • Learning rate: 1.5e-4 with cosine decay
  • Optimization: AdamW, BF16 precision

Stage 3: Instruction Tuning (5 days)

  • 200K mathematical instruction-response pairs
  • Balanced across algebra, calculus, geometry, etc.
  • Learning rate: 2e-5
  • 3 epochs

Stage 4: Tool Calling Training (3 days)

  • 50K tool-calling examples
  • Function definition and execution
  • Error handling and recovery

Stage 5: Reinforcement Learning (7 days)

  • PPO-based training
  • Reward based on solution correctness
  • Symbolic and numerical verification of answers (reward sketch below)
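
The reward implementation is not published; the sketch below shows one way a correctness reward could be scored with SymPy-based symbolic verification. The function name and scoring scheme are our assumptions:

import sympy as sp

def correctness_reward(model_answer: str, reference: str) -> float:
    """Reward 1.0 if the answers are symbolically equivalent, else 0.0."""
    try:
        diff = sp.simplify(sp.sympify(model_answer) - sp.sympify(reference))
        return 1.0 if diff == 0 else 0.0
    except (sp.SympifyError, TypeError):
        return 0.0  # unparseable answers earn no reward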

Training Hyperparameters

  • Optimizer: AdamW
  • Learning rate: 1.5e-4 (pre-training), 2e-5 (fine-tuning)
  • Weight decay: 0.1
  • Warmup steps: 2000 (scheduler sketch below)
  • Gradient clipping: 1.0
  • Precision: BFloat16
  • Total GPU hours: 30,720
  • Estimated cost: $450,000 USD
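
The warmup-plus-cosine schedule above can be reproduced with standard tooling. A sketch using transformers' scheduler helper, with model as loaded in the quickstart; the total step count is a placeholder, not a published figure:

import torch
from transformers import get_cosine_schedule_with_warmup

optimizer = torch.optim.AdamW(model.parameters(), lr=1.5e-4, weight_decay=0.1)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=2000,       # warmup steps from the table above
    num_training_steps=100_000,  # placeholder; depends on tokens per step
)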

Compute Infrastructure

  • Pre-training: 512x NVIDIA H100 80GB GPUs
  • Fine-tuning: 128x NVIDIA H100 80GB GPUs
  • Framework: PyTorch 2.1, DeepSpeed ZeRO-3
  • Parallelism: Tensor (8-way), Pipeline (4-way), Data (16-way)

Evaluation

Mathematical Reasoning

| Benchmark | Score | Comparison |
|-----------|-------|------------|
| GSM8K | 94.2% | GPT-4: 92.0% |
| MATH | 78.5% | GPT-4: 76.4% |
| MMLU-Math | 88.7% | GPT-4: 86.9% |
| AMC10/12 | 72.3% | Human average: 45% |
| AIME | 38.7% | Human qualifier: 40% |

Tool Calling

| Metric | Score |
|--------|-------|
| Tool Selection | 96.8% |
| Parameter Extraction | 94.2% |
| Execution Success | 92.5% |
| Result Integration | 95.1% |

Code Generation

| Task | Pass@1 | Pass@10 |
|------|--------|---------|
| HumanEval-Math | 78.3% | 92.1% |
| SymPy Tasks | 82.5% | 94.7% |
| NumPy Tasks | 75.6% | 89.3% |

Performance

  • Inference Speed: 45 tokens/second (A100 80GB)
  • Memory: 60GB (BF16), 30GB (INT8), 20GB (INT4); 4-bit loading sketched below
  • Latency: 89ms mean, 145ms p95
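
The INT4 footprint above corresponds to 4-bit weight quantization, which can be applied at load time with bitsandbytes. The configuration values below are common defaults, not the team's published settings:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in BF16, store weights in 4-bit
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)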

Environmental Impact

  • Hardware: NVIDIA H100 GPUs
  • Training Time: 60 days (30,720 GPU hours)
  • Estimated CO₂: ~8,500 kg CO₂eq
  • Power Consumption: ~850 MWh

We are committed to reducing environmental impact through efficient training and model optimization.

Technical Specifications

Model Architecture

| Parameter | Value |
|-----------|-------|
| Parameters | 30B |
| Hidden Size | 5,120 |
| Layers | 48 |
| Attention Heads | 40 |
| KV Heads | 8 (GQA) |
| Intermediate Size | 13,824 |
| Vocabulary | 102,400 |
| Context Length | 32,768 |
| Position Encoding | RoPE with YaRN |
| Activation | SiLU |
| Normalization | RMSNorm |

Special Features

  • Tool Calling: JSON-based function calling
  • Symbolic Solver: SymPy integration
  • Code Execution: Sandboxed Python runtime (illustrated below)
  • LaTeX Formatting: Automatic equation formatting
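
The sandbox internals are not documented here; as a rough illustration of the pattern, model-generated code can be executed in a separate isolated process with a hard timeout:

import subprocess
import sys

def run_sandboxed(code: str, timeout_s: float = 5.0) -> str:
    """Run model-generated Python in an isolated subprocess with a timeout.

    Illustrative only: a production sandbox also needs memory/CPU limits,
    filesystem isolation, and network restrictions.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "error: execution timed out"
    return result.stdout if result.returncode == 0 else result.stderr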

Citation

@misc{kirim2025math,
  title={Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling},
  author={Qiling Research},
  year={2025},
  publisher={Kirim AI},
  url={https://huggingface.co/Kirim-ai/Kirim-1-Math}
}

Model Card Authors

Qiling Research

Ethical Considerations

Educational Impact

  • May affect traditional mathematics education
  • Could reduce development of mental math skills
  • Should be used as a learning aid, not replacement

Accessibility

  • Makes advanced mathematics more accessible
  • Could democratize STEM education
  • May widen gap if access is unequal

Verification

  • Always verify results for critical applications
  • Use multiple methods for important calculations
  • Maintain human oversight in education

Glossary

  • Tool Calling: Ability to invoke external functions for computation
  • Symbolic Solver: Algebraic manipulation system (SymPy)
  • GQA: Grouped Query Attention for efficiency
  • RoPE: Rotary Position Embedding
  • YaRN: Yet another RoPE extensioN, a method for extending RoPE-based context length