Commit cf7f23a by bkartal · verified · 1 Parent(s): 43c0df9

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -220,7 +220,7 @@ We follow the jinja chat template provided below. This template conditionally ad

  ## Training, Testing, and Evaluation Datasets

- The post-training corpus for Nemotron-H-8B-Reasoning-128K consists of English and multilingual text (German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese and English). Our sources cover a variety of document types such as: webpages, dialogue, articles, and other written materials. The corpus spans domains including code, legal, math, science, finance, and more. We also include a small portion of question-answering, and alignment style data to improve model accuracies. For several of the domains listed above we used synthetic data, specifically reasoning traces, from R1.
+ The post-training corpus for Nemotron-H-8B-Reasoning-128K consists of English and multilingual text (German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese and English). Our sources cover a variety of document types such as: webpages, dialogue, articles, and other written materials. The corpus spans domains including code, legal, math, science, finance, and more. We also include a small portion of question-answering, and alignment style data to improve model accuracies. For several of the domains listed above we used synthetic data, specifically reasoning traces, from DeepSeek R1.

  **Data Collection for Training & Testing Datasets:** Hybrid: Automated, Human, Synthetic
