# Model Card for Kirim-1-Math

## Model Details

### Model Description

**Kirim-1-Math** is a 30-billion parameter large language model specialized for advanced mathematical reasoning and problem-solving. It is the first model in the Kirim series to feature built-in tool calling capabilities, allowing it to execute mathematical computations, symbolic manipulations, and code for numerical solutions.

- **Developed by:** Kirim AI Team
- **Model type:** Causal Language Model (Decoder-only Transformer)
- **Language(s):** Chinese, English
- **License:** Apache 2.0
- **Base Model:** Kirim-V1-base (expanded from 13B to 30B)
- **Specialization:** Mathematical reasoning, theorem proving, symbolic computation

### Model Capabilities

- **Mathematical Reasoning**: Solve problems from elementary to olympiad level
- **Tool Calling**: Invoke calculator, symbolic-solver, differentiation, integration, and code-execution tools
- **Step-by-Step Solutions**: Show detailed work for problem-solving
- **LaTeX Output**: Format mathematical expressions properly
- **Bilingual**: Handle problems in both Chinese and English
- **Code Generation**: Write and execute Python/SymPy code for numerical solutions

## Model Sources

- **Repository:** [github.com/Kirim-ai/Kirim-1-Math](https://github.com/Kirim-ai/Kirim-1-Math)
- **Paper:** [Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling](https://huggingface.co/papers)
- **Demo:** [huggingface.co/spaces/Kirim-ai/Kirim-1-Math-demo](https://huggingface.co/spaces/Kirim-ai/Kirim-1-Math-demo)
- **Base Model:** [Kirim-ai/Kirim-V1-base](https://huggingface.co/Kirim-ai/Kirim-V1-base)

## Uses

### Direct Use

The model can be used directly for:

- **Educational Tutoring**: Explain mathematical concepts with step-by-step reasoning
- **Homework Assistance**: Solve problems across all difficulty levels
- **Competition Preparation**: Practice for AMC, AIME, IMO, Putnam
- **Research Assistance**: Verify proofs and perform symbolic computations
- **Code-Assisted Problem Solving**: Use numerical methods for complex calculations

### Downstream Use

Fine-tuning possibilities:

- Domain-specific mathematical applications (physics, engineering, finance)
- Custom tool integration for specialized computations
- Educational platforms with adaptive difficulty
- Mathematical theorem proving systems

### Out-of-Scope Use

The model should NOT be used for:

- **Academic dishonesty**: Cheating on exams or assignments
- **Safety-critical systems**: Without human verification (e.g., structural engineering calculations)
- **Financial advice**: Trading or investment decisions without expert review
- **Medical calculations**: Drug dosages or medical equipment calibration
- **Legal matters**: Without professional mathematician/lawyer verification

## Bias, Risks, and Limitations

### Known Limitations

**Technical Limitations:**

- Cannot process visual mathematics (diagrams, geometric figures)
- May struggle with extremely novel mathematical concepts
- Limited to training data through October 2024
- Tool execution can fail for edge cases
- Performance degrades on extremely complex graduate-level problems

**Reasoning Limitations:**

- May make logical errors in complex proofs
- Can hallucinate intermediate steps
- Occasionally produces incorrect final answers
- May not recognize when a problem has no solution

**Computational Limitations:**

- Cannot perform arbitrarily large calculations without tools
- Numerical precision limited by underlying libraries
- May time out on very long computations

### Risks and Biases

**Potential Risks:**

- Students may become over-reliant on AI assistance
- Could generate plausible but incorrect mathematical reasoning
- May perpetuate biases in mathematical education approaches
- Tool execution could consume excessive computational resources

**Mitigation Strategies:**

- Always verify critical results with human experts
- Use temperature=0.1 for deterministic mathematical reasoning
- Enable tool calling for numerical verification
- Cross-check answers with multiple methods (see the sketch below)
- Implement appropriate safeguards in educational settings
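One lightweight way to apply the cross-checking mitigation above is to re-derive the model's answer with an independent tool. The snippet below is a minimal sketch using SymPy (listed in the installation requirements below); the example problem and the claimed roots are illustrative placeholders, not captured model output.

```python
import sympy as sp

# Example problem: solve x^2 - 5x + 6 = 0
x = sp.symbols("x")
equation = sp.Eq(x**2 - 5*x + 6, 0)

# Roots claimed in a hypothetical model response
claimed_roots = {2, 3}

# Independently derived roots from SymPy's symbolic solver
symbolic_roots = set(sp.solve(equation, x))

# Accept the answer only if both methods agree
if symbolic_roots == claimed_roots:
    print("Verified roots:", sorted(symbolic_roots))
else:
    print("Mismatch: model claimed", claimed_roots, "but SymPy found", symbolic_roots)
```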
## How to Get Started

### Installation

```bash
pip install torch transformers accelerate sympy
```

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    trust_remote_code=True
)

# Solve a problem
messages = [
    {"role": "user", "content": "Solve: x² - 5x + 6 = 0"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(
    inputs, max_new_tokens=2048, do_sample=True, temperature=0.1
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Using the Inference Script

```bash
# Interactive mode
python inference_math.py --interactive

# Single problem
python inference_math.py --problem "Calculate the derivative of x^3 + 2x^2"

# With quantization
python inference_math.py --load_in_4bit --interactive
```
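The `--load_in_4bit` flag above presumably relies on bitsandbytes-style 4-bit quantization. If you load the model manually rather than through the inference script, a roughly equivalent setup is sketched below; it assumes `bitsandbytes` is installed in addition to the packages listed earlier, and the NF4 settings shown are common defaults rather than values prescribed by this card. Memory usage should land near the 4-bit figure quoted in the Performance section.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization via bitsandbytes (settings here are common defaults)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "Kirim-ai/Kirim-1-Math", trust_remote_code=True
)
```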
## Training Details

### Training Data

**Mathematical Corpus (500B tokens):**

- Mathematical proofs: ProofWiki, Lean, Coq, Isabelle (125B tokens)
- Olympiad problems: IMO, USAMO, AMC, AIME, Putnam (150B tokens)
- arXiv papers: math.AC, math.AG, math.NT, math.CO (100B tokens)
- Textbooks: undergraduate to graduate level (75B tokens)
- Q&A: Math StackExchange, MathOverflow (50B tokens)

**Code Corpus (200B tokens):**

- Mathematical Python libraries (NumPy, SymPy, SciPy)
- Computational notebooks from Kaggle, GitHub
- Algorithm implementations

**General Corpus (800B tokens):**

- From Kirim-V1-base pre-training

**Total: 1.5 Trillion tokens**

### Training Procedure

#### Stage 1: Model Expansion (15 days)

- Expanded from 13B to 30B parameters
- Progressive width and depth scaling
- Hidden size: 4096 → 5120
- Layers: 32 → 48

#### Stage 2: Mathematical Pre-training (30 days)

- 500B tokens of mathematical content
- Hardware: 512x NVIDIA H100 80GB
- Batch size: 2048
- Learning rate: 1.5e-4 with cosine decay
- Optimization: AdamW, BF16 precision

#### Stage 3: Instruction Tuning (5 days)

- 200K mathematical instruction-response pairs
- Balanced across algebra, calculus, geometry, etc.
- Learning rate: 2e-5
- 3 epochs

#### Stage 4: Tool Calling Training (3 days)

- 50K tool-calling examples
- Function definition and execution
- Error handling and recovery

#### Stage 5: Reinforcement Learning (7 days)

- PPO-based training
- Reward based on solution correctness
- Symbolic and numerical verification

#### Training Hyperparameters

- **Optimizer:** AdamW
- **Learning rate:** 1.5e-4 (pre-training), 2e-5 (fine-tuning)
- **Weight decay:** 0.1
- **Warmup steps:** 2000
- **Gradient clipping:** 1.0
- **Precision:** BFloat16
- **Total GPU hours:** 30,720
- **Estimated cost:** $450,000 USD

### Compute Infrastructure

- **Pre-training:** 512x NVIDIA H100 80GB GPUs
- **Fine-tuning:** 128x NVIDIA H100 80GB GPUs
- **Framework:** PyTorch 2.1, DeepSpeed ZeRO-3
- **Parallelism:** Tensor (8-way), Pipeline (4-way), Data (16-way)

## Evaluation

### Mathematical Reasoning

| Benchmark | Score | Comparison |
|-----------|-------|------------|
| GSM8K | 94.2% | GPT-4: 92.0% |
| MATH | 78.5% | GPT-4: 76.4% |
| MMLU-Math | 88.7% | GPT-4: 86.9% |
| AMC10/12 | 72.3% | Human avg: 45% |
| AIME | 38.7% | Human qualifier: 40% |

### Tool Calling

| Metric | Score |
|--------|-------|
| Tool Selection | 96.8% |
| Parameter Extraction | 94.2% |
| Execution Success | 92.5% |
| Result Integration | 95.1% |

### Code Generation

| Task | Pass@1 | Pass@10 |
|------|--------|---------|
| HumanEval-Math | 78.3% | 92.1% |
| SymPy Tasks | 82.5% | 94.7% |
| NumPy Tasks | 75.6% | 89.3% |

### Performance

- **Inference Speed:** 45 tokens/second (A100 80GB)
- **Memory:** 60GB (BF16), 30GB (INT8), 20GB (INT4)
- **Latency:** 89ms mean, 145ms p95

## Environmental Impact

- **Hardware:** NVIDIA H100 GPUs
- **Training Time:** 60 days (30,720 GPU hours)
- **Estimated CO₂:** ~8,500 kg CO₂eq
- **Power Consumption:** ~850 MWh

We are committed to reducing environmental impact through efficient training and model optimization.

## Technical Specifications

### Model Architecture

| Parameter | Value |
|-----------|-------|
| Parameters | 30B |
| Hidden Size | 5,120 |
| Layers | 48 |
| Attention Heads | 40 |
| KV Heads | 8 (GQA) |
| Intermediate Size | 13,824 |
| Vocabulary | 102,400 |
| Context Length | 32,768 |
| Position Encoding | RoPE with YaRN |
| Activation | SiLU |
| Normalization | RMSNorm |

### Special Features

- **Tool Calling:** JSON-based function calling
- **Symbolic Solver:** SymPy integration
- **Code Execution:** Sandboxed Python runtime
- **LaTeX Formatting:** Automatic equation formatting

## Citation

```bibtex
@misc{kirim2025math,
  title={Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling},
  author={Qiling Research},
  year={2025},
  publisher={Kirim AI},
  url={https://huggingface.co/Kirim-ai/Kirim-1-Math}
}
```

## Model Card Authors

Qiling Research

## Ethical Considerations

### Educational Impact

- May affect traditional mathematics education
- Could reduce development of mental math skills
- Should be used as a learning aid, not a replacement

### Accessibility

- Makes advanced mathematics more accessible
- Could democratize STEM education
- May widen the gap if access is unequal

### Verification

- Always verify results for critical applications
- Use multiple methods for important calculations
- Maintain human oversight in education

## Glossary

- **Tool Calling:** Ability to invoke external functions for computation (see the illustrative sketch below)
- **Symbolic Solver:** Algebraic manipulation system (SymPy)
- **GQA:** Grouped Query Attention for efficiency
- **RoPE:** Rotary Position Embedding
- **YaRN:** Yet another RoPE extension method
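For readers unfamiliar with the term, the following is a minimal illustration of what a JSON-based tool call from the model might look like. The schema shown (`name` and `arguments` fields, a `symbolic_solver` tool) is an assumed, generic format used here for illustration only; the actual tool names and fields are defined by the model's chat template and the inference script.

```json
{
  "tool_call": {
    "name": "symbolic_solver",
    "arguments": {
      "expression": "x**2 - 5*x + 6 = 0",
      "variable": "x"
    }
  }
}
```

In such a setup, the serving runtime would presumably parse this object, execute the requested tool in the sandboxed environment, and return the result to the model before it produces its final step-by-step solution.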