whisper.cpp / ggml / src / ggml-cuda

Commit History

metal : improve FA + improve MoE (llama/12612)
04a3389

ggerganov committed on

files : remove old wkv6 (#0)
ee92ae5

ggerganov committed on

HIP: Add support for RDNA4 targets (llama/12372)
a73f01f

Slobodan Josic committed on

CUDA: Fix clang warnings (llama/12540)
efa6dac

R0CKSTAR committed on

musa: refine compute capability (llama/12493)
5e508d2

R0CKSTAR committed on

CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llama/12183)
3a7ca19

Gaurav Garg and JohannesGaessler committed on

musa: override warp_size of musa device to 32 (llama/12445)
184c152

R0CKSTAR committed on

llama: Add support for RWKV v7 architecture (llama/12412)
727de7e

mollysama committed on

cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394)
1e69b8c

Gaurav Garg committed on

CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (llama/12315)
2adc060

uvos committed on

CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. (llama/12177)
1f75790

uvos and JohannesGaessler committed on

CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (llama/12222)
4dc8a81

JohannesGaessler committed on

HIP/CUDA: set the parameter value in maintain_cuda_graph instead of replacing it. (llama/12209)
18afa4b

uvos committed on

HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (llama/12032)
a027c1d

David Huang committed on

CUDA: compress mode option and default to size (llama/12029)
4ec988a

Erik Scholz committed on

ggml : upgrade init_tensor API to return a ggml_status (llama/11854)
d6b6852

William Tambellini and slaren committed on

CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (llama/12098)
0b52fcc

JohannesGaessler committed on

cuda: unary ops as float + de-duplicate (ggml/1130)
4bec2e4

cmdr2 committed on

cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129)
f959b90

cmdr2 committed on

cuda/cpu: Increase support for fp16 unary operations (ggml/1125)
67e8c32

cmdr2 committed on

Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (ggml/1121)
2b94a24

cmdr2 committed on

CUDA: add option to compile without FlashAttention (llama/12025)
fbc5f16

JohannesGaessler committed on

CUDA: optimize FA for GQA + large batches (llama/12014)
6662d54

JohannesGaessler committed on

cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (llama/12000)
6cb8158

Garf committed on

CUDA: correct the lowest Maxwell supported by CUDA 12 (llama/11984)
6641178

PureJourney and JohannesGaessler committed on

MUSA: support ARM64 and enable dp4a, etc. (llama/11843)
ab96dac

Bodhi Hu committed on

CUDA: use async data loading for FlashAttention (llama/11894)
5b9980d

JohannesGaessler and Diego Devesa committed on

cuda : add ampere to the list of default architectures (llama/11870)
1d19dec

Diego Devesa committed on

musa: bump MUSA SDK version to rc3.1.1 (llama/11822)
ff2d3eb

R0CKSTAR committed on

HIP: Remove GCN from list of devices that avoid MMQ (llama/11831)
78aed55

uvos committed on

HIP: Switch to std::vector in rocblas version check (llama/11820)
e144c94

uvos committed on

CUDA: fix CUDART_VERSION checks (llama/11821)
04f123a

JohannesGaessler committed on

CUDA: use arch list for compatibility check (llama/11775)
b88e163

JohannesGaessler and Diego Devesa committed on

CUDA: fix min. version for movmatrix (llama/11751)
9ac5316

JohannesGaessler committed on

CUDA: support for mat. mul. with ne03 != ne13 (llama/11656)
78e36a2

JohannesGaessler committed on

CUDA: fix Volta FlashAttention logic (llama/11615)
6df9571

JohannesGaessler committed on

HIP: fix flash_attn_stream_k_fixup warning (llama/11604)
acfd94f

JohannesGaessler committed on

CUDA/HIP: add support for selectable warp size to mmv (llama/11519)
ed08269

uvos committed on

HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing CC architectures for AMD GPUs are not supersets of each other (llama/11601)
4850c24

uvos committed on

CUDA: use mma PTX instructions for FlashAttention (llama/11583)
f328957

JohannesGaessler and Diego Devesa committed on

HIP: Prepare reduction operators for wave 64
bc1c1a4

uvos committed on

CUDA/HIP: add warp_size to cuda_device_info
e538e2c

uvos committed on

HIP: Suppress transformation warning in softmax.cu
72c6f1d

uvos committed on

HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (llama/11080)
82bb7f3

Nikita Sarychev committed on

AMD: parse the architecture as supplied by gcnArchName (llama/11244)
04b01d8

Haus1 committed on

HIP: disable VMM on HIP as it seems that it doesn't work in some configurations (llama/11420)
2cc4df4

uvos committed on

hip : Add hipGraph and VMM support to ROCM (llama/11362)
089afa0

uvos committed on

CUDA: fix FP16 cuBLAS GEMM (llama/11396)
7b7c5d3

JohannesGaessler committed on

rocBLAS: Avoid fp32->fp16->fp32 conversion on CDNA (llama/11356)
6f5687a

uvos committed on