DuarteMRAlves committed (verified)
Commit 1563148 · Parent(s): 5ac7876

Update README.md

Files changed (1): README.md (+88 −0)
- **Language(s) (NLP):** Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian.
- **License:** Apache License 2.0.

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.12.2`
```yaml
auto_resume_from_checkpoints: true
use_tensorboard: true

base_model: utter-project/EuroLLM-2512
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

dataset_processes: 64
datasets:
  - path: utter-project/EuroBlocks-SFT-2512
    type: chat_template
    split: train
    conversation: chatml
    field_messages: conversations
    message_field_role: role
    message_field_content: content
    roles_to_train: ["assistant"]
    train_on_eos: all

chat_template_jinja: "{% for message in messages %}{% if message['role'] == 'assistant' %}{% set role = 'assistant' %}{% else %}{% set role = message['role'] %}{% endif %}<|im_start|>{{ role }}\n{{ message['content'] | trim }}<|im_end|>\n{% endfor %}{% if add_generation_prompt %}{{'<|im_start|>assistant\n'}}{% endif %}"

output_dir: checkpoints
val_set_size: 0

sequence_len: 32768
sample_packing: true
pad_to_sequence_len: true

# sequence_parallel_degree: 4
# heads_k_stride: 1
# ring_attn_func:

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true

# N_GPUS * GRAD_ACC_STEPS * MICRO_BATCH_SIZE * SEQ_LEN = tokens/step ->
# Assuming 32 gpus (32 * 2 * 2 * 32k = 4 096 000 tokens/step)
gradient_accumulation_steps: 2
micro_batch_size: 2

eval_batch_size: 1
num_epochs: 5
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 1e-5

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
logging_steps: 1
flash_attention: true
flash_attn_cross_entropy: false
flash_attn_rms_norm: false
flash_attn_fuse_qkv: false
flash_attn_fuse_mlp: false

warmup_steps: 125
eval_sample_packing: false
save_steps: 500
save_total_limit: 2
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.01

special_tokens:
  eos_token: "<|im_end|>"
```
</details><br>
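The `chat_template_jinja` string in the config produces ChatML-style prompts. A minimal sketch of how it renders, using the third-party `jinja2` package directly (the same templating engine Hugging Face tokenizers use for chat templates); the example messages are illustrative, not from the dataset:

```python
from jinja2 import Template

# The chat template from the axolotl config (ChatML-style formatting).
CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "{% if message['role'] == 'assistant' %}{% set role = 'assistant' %}"
    "{% else %}{% set role = message['role'] %}{% endif %}"
    "<|im_start|>{{ role }}\n{{ message['content'] | trim }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{'<|im_start|>assistant\n'}}{% endif %}"
)

# Hypothetical conversation, just to show the rendered layout.
messages = [
    {"role": "user", "content": "Translate 'hello' to Portuguese."},
    {"role": "assistant", "content": "Olá."},
]

prompt = Template(CHAT_TEMPLATE).render(
    messages=messages, add_generation_prompt=True
)
print(prompt)
# <|im_start|>user
# Translate 'hello' to Portuguese.<|im_end|>
# <|im_start|>assistant
# Olá.<|im_end|>
# <|im_start|>assistant
```

Note that `add_generation_prompt: true` appends an open `<|im_start|>assistant\n` turn, which is what inference-time prompting expects, while `train_on_eos: all` in the config makes the trainer learn the closing `<|im_end|>` of every assistant turn.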

## Model Details

The EuroLLM project has the goal of creating a suite of LLMs capable of understanding and generating text in all European Union languages as well as some additional relevant languages.
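The batching comment in the fine-tuning config above can be checked with a quick calculation. The GPU count is an assumption the config comment itself makes (it is not stored in the YAML); with the exact `sequence_len` of 32768, the figure comes out slightly above the comment's rounded 4 096 000:

```python
# Effective tokens per optimizer step for the run configured above.
# n_gpus is an assumption taken from the config comment; the other
# values come directly from the axolotl YAML.
n_gpus = 32                 # assumed world size (per the config comment)
grad_acc_steps = 2          # gradient_accumulation_steps
micro_batch_size = 2        # micro_batch_size
seq_len = 32_768            # sequence_len

tokens_per_step = n_gpus * grad_acc_steps * micro_batch_size * seq_len
print(tokens_per_step)  # 4194304 (~4.2M; the comment rounds 32k to 32 000)
```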