license: other
license_name: taide-l-models-community-license-agreement
license_link: https://drive.google.com/file/d/1ICTxogjS9Bc2O3K1P9ZauQYVoruT13n5/view
extra_gated_heading: 您需要先同意授權條款才能使用此模型 (You must agree to the license terms before using this model)
extra_gated_fields:
  姓名(Name): text
  生日(Date of birth): date_picker
  國家(Country): country
  所屬單位(Affiliation): text
  geo: ip_location
  按下送出表示您同意社群授權同意書與個人資料蒐集告知聲明(By clicking Submit below I accept the terms of the license and privacy policy): checkbox
extra_gated_prompt: >-
  * ### [（Llama 版次）-TAIDE
  模型授權條款 (TAIDE Model License Terms, Llama edition)](https://drive.google.com/file/d/1ICTxogjS9Bc2O3K1P9ZauQYVoruT13n5/view)

  * ### [個人資料蒐集告知聲明(Privacy
  policy)](https://drive.google.com/file/d/1MfYktH3jBK61YVA1yBLruU7nZlKWFYGd/view)
extra_gated_button_content: 送出(Submit)

The license is inherited from the TAIDE Model.

This is an Eagle3 draft model for Llama-3.1-TAIDE-LX-8B-Chat, trained on a custom sharegpt_gpt4 dataset and intended for speculative-decoding inference with sglang.

The following benchmark was run with this benchmarking file and these settings:

- A single H100 GPU
- dtype: float16
- attention-backend: flashinfer
- mem-fraction-static: 0.8
- max-total-tokens: 131072
- cuda-graph-max-bs: 32
- speculative decoding related:
  - speculative-algorithm: EAGLE3
  - speculative-num-steps: 3
  - speculative-eagle-topk: 24
  - speculative-num-draft-tokens: 128
- num_prompts: 1
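
The settings above can be turned into sglang server launch commands. The following is a minimal sketch, not the author's actual benchmarking script. It assumes each setting maps one-to-one onto an sglang command-line flag of the same name, that the draft model is passed with `--speculative-draft-model-path`, and that the base model lives at `taide/Llama-3.1-TAIDE-LX-8B-Chat`. The draft model path is a placeholder, and `build_launch_command` is a hypothetical helper. `num_prompts` is left out because it is a benchmark-client setting rather than a server setting.

```python
import shlex

# Server settings copied from the list above. The mapping of each name to an
# identically named sglang CLI flag is an assumption; check it against your
# installed sglang version.
SETTINGS = {
    "dtype": "float16",
    "attention-backend": "flashinfer",
    "mem-fraction-static": 0.8,
    "max-total-tokens": 131072,
    "cuda-graph-max-bs": 32,
    "speculative-algorithm": "EAGLE3",
    "speculative-num-steps": 3,
    "speculative-eagle-topk": 24,
    "speculative-num-draft-tokens": 128,
}


def build_launch_command(model_path, draft_model_path=None, speculative=True):
    """Hypothetical helper: build the argv list for an sglang server launch.

    With speculative=False, every speculative-* setting is dropped so the
    same function yields the baseline command.
    """
    cmd = ["python", "-m", "sglang.launch_server", "--model-path", model_path]
    if speculative:
        # Assumed flag name for pointing sglang at the Eagle3 draft model.
        cmd += ["--speculative-draft-model-path", draft_model_path]
    for flag, value in SETTINGS.items():
        if not speculative and flag.startswith("speculative-"):
            continue
        cmd += [f"--{flag}", str(value)]
    return cmd


if __name__ == "__main__":
    base_model = "taide/Llama-3.1-TAIDE-LX-8B-Chat"  # assumed repository id
    draft_model = "path/to/this-eagle3-model"  # placeholder, not a real path
    print(shlex.join(build_launch_command(base_model, speculative=False)))
    print(shlex.join(build_launch_command(base_model, draft_model)))
```

Printing both commands makes it easy to confirm that the baseline and Eagle3 runs differ only in the speculative decoding flags.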
Left: Baseline
Right: Eagle3

Eagle3 achieves roughly a 1.56x increase in inference speed over the baseline.