taide-eagle3-sglang / README.md
metadata
license: other
license_name: taide-l-models-community-license-agreement
license_link: https://drive.google.com/file/d/1ICTxogjS9Bc2O3K1P9ZauQYVoruT13n5/view
extra_gated_heading: You must agree to the license terms before using this model
extra_gated_fields:
  Name: text
  Date of birth: date_picker
  Country: country
  Affiliation: text
  geo: ip_location
  By clicking Submit below I accept the terms of the license and privacy policy: checkbox
extra_gated_prompt: >-
  * ### [(Llama version) TAIDE
  Model License Terms](https://drive.google.com/file/d/1ICTxogjS9Bc2O3K1P9ZauQYVoruT13n5/view)

  * ### [Personal Data Collection Notice (Privacy
  policy)](https://drive.google.com/file/d/1MfYktH3jBK61YVA1yBLruU7nZlKWFYGd/view)
extra_gated_button_content: Submit

The license is inherited from the TAIDE Model.

This is an Eagle3 draft model for Llama-3.1-TAIDE-LX-8B-Chat, trained on a custom sharegpt_gpt4 dataset and intended for inference with SGLang.

The following benchmark was run with this benchmarking file and these settings:

  • A single H100 GPU

  • dtype: float16

  • attention-backend: flashinfer

  • mem-fraction-static: 0.8

  • max-total-tokens: 131072

  • cuda-graph-max-bs: 32

  • Speculative decoding settings:

    • speculative-algorithm: EAGLE3
    • speculative-num-steps: 3
    • speculative-eagle-topk: 24
    • speculative-num-draft-tokens: 128
  • num_prompts: 1

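  • Left-hand side of the chart: Baseline

  • Right-hand side of the chart: Eagle3

  • Result: Eagle3 is around 1.56x faster than the baseline at inference

[Image: benchmark chart comparing the baseline (left) with Eagle3 (right)]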
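
The settings listed above can be turned into an SGLang server launch command. The following is a minimal sketch, not the exact command used for the benchmark: the two model paths are placeholders, the helper `build_launch_command` is hypothetical, and the flag names are assumed to match the `sglang.launch_server` options that the settings list mirrors (plus `--model-path` and `--speculative-draft-model-path`). Check them against your installed SGLang version.

```python
import shlex

# Placeholder paths (assumptions): point these at the actual target and draft models.
TARGET_MODEL = "path/to/Llama-3.1-TAIDE-LX-8B-Chat"
DRAFT_MODEL = "path/to/taide-eagle3-sglang"

# Settings copied from the benchmark description above.
SETTINGS = {
    "dtype": "float16",
    "attention-backend": "flashinfer",
    "mem-fraction-static": 0.8,
    "max-total-tokens": 131072,
    "cuda-graph-max-bs": 32,
    "speculative-algorithm": "EAGLE3",
    "speculative-num-steps": 3,
    "speculative-eagle-topk": 24,
    "speculative-num-draft-tokens": 128,
}


def build_launch_command(target, draft=None, settings=SETTINGS):
    """Return the argv list for launching an SGLang server.

    With draft=None, all speculative-* flags are skipped, which gives
    a plain baseline server for comparison.
    """
    cmd = ["python", "-m", "sglang.launch_server", "--model-path", target]
    if draft is not None:
        cmd += ["--speculative-draft-model-path", draft]
    for name, value in settings.items():
        if draft is None and name.startswith("speculative-"):
            continue  # baseline run: no speculative decoding flags
        cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    # Print both commands so they can be copied into a shell.
    print(shlex.join(build_launch_command(TARGET_MODEL, DRAFT_MODEL)))
    print(shlex.join(build_launch_command(TARGET_MODEL)))
```

Running the script prints two commands: one with Eagle3 speculative decoding enabled and one baseline without it, so both sides of the comparison can be started with otherwise identical settings.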