whisper.cpp / ggml / src / ggml-cuda

Commit History

metal : improve FA + improve MoE (llama/12612)
04a3389

ggerganov committed on

files : remove old wkv6 (#0)
ee92ae5

ggerganov committed on

HIP: Add support for RDNA4 targets (llama/12372)
a73f01f

Slobodan Josic committed on

CUDA: Fix clang warnings (llama/12540)
efa6dac

R0CKSTAR committed on

musa: refine compute capability (llama/12493)
5e508d2

R0CKSTAR committed on

CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llama/12183)
3a7ca19

Gaurav Garg and JohannesGaessler committed on

musa: override warp_size of musa device to 32 (llama/12445)
184c152

R0CKSTAR committed on

llama: Add support for RWKV v7 architecture (llama/12412)
727de7e

mollysama committed on

cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394)
1e69b8c

Gaurav Garg committed on

CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (llama/12315)
2adc060

uvos committed on

CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. (llama/12177)
1f75790

uvos and JohannesGaessler committed on

CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (llama/12222)
4dc8a81

JohannesGaessler committed on

HIP/CUDA: set the parameter value in maintain_cuda_graph instead of replacing it. (llama/12209)
18afa4b

uvos committed on

HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (llama/12032)
a027c1d

David Huang committed on

CUDA: compress mode option and default to size (llama/12029)
4ec988a

Erik Scholz committed on

ggml : upgrade init_tensor API to return a ggml_status (llama/11854)
d6b6852

William Tambellini and slaren committed on

CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (llama/12098)
0b52fcc

JohannesGaessler committed on

cuda: unary ops as float + de-duplicate (ggml/1130)
4bec2e4

cmdr2 committed on

cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129)
f959b90

cmdr2 committed on

cuda/cpu: Increase support for fp16 unary operations (ggml/1125)
67e8c32

cmdr2 committed on

Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (ggml/1121)
2b94a24

cmdr2 committed on

CUDA: add option to compile without FlashAttention (llama/12025)
fbc5f16

JohannesGaessler committed on

CUDA: optimize FA for GQA + large batches (llama/12014)
6662d54

JohannesGaessler committed on

cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (llama/12000)
6cb8158

Garf committed on

CUDA: correct the lowest Maxwell supported by CUDA 12 (llama/11984)
6641178

PureJourney and JohannesGaessler committed on

MUSA: support ARM64 and enable dp4a, etc. (llama/11843)
ab96dac

Bodhi Hu committed on

CUDA: use async data loading for FlashAttention (llama/11894)
5b9980d

JohannesGaessler and Diego Devesa committed on

cuda : add ampere to the list of default architectures (llama/11870)
1d19dec

Diego Devesa committed on

musa: bump MUSA SDK version to rc3.1.1 (llama/11822)
ff2d3eb

R0CKSTAR committed on

HIP: Remove GCN from list of devices that avoid MMQ (llama/11831)
78aed55

uvos committed on

HIP: Switch to std::vector in rocblas version check (llama/11820)
e144c94

uvos committed on

CUDA: fix CUDART_VERSION checks (llama/11821)
04f123a

JohannesGaessler committed on

CUDA: use arch list for compatibility check (llama/11775)
b88e163

JohannesGaessler and Diego Devesa committed on

CUDA: fix min. version for movmatrix (llama/11751)
9ac5316

JohannesGaessler committed on

CUDA: support for mat. mul. with ne03 != ne13 (llama/11656)
78e36a2

JohannesGaessler committed on

CUDA: fix Volta FlashAttention logic (llama/11615)
6df9571

JohannesGaessler committed on

HIP: fix flash_attn_stream_k_fixup warning (llama/11604)
acfd94f

JohannesGaessler committed on

CUDA/HIP: add support for selectable warp size to mmv (llama/11519)
ed08269

uvos committed on

HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing CC architectures for AMD GPUs are not supersets of each other (llama/11601)
4850c24

uvos committed on

CUDA: use mma PTX instructions for FlashAttention (llama/11583)
f328957

JohannesGaessler and Diego Devesa committed on

HIP: Prepare reduction operators for wave 64
bc1c1a4

uvos committed on

CUDA/HIP: add warp_size to cuda_device_info
e538e2c

uvos committed on

HIP: Suppress transformation warning in softmax.cu
72c6f1d

uvos committed on

HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (llama/11080)
82bb7f3

Nikita Sarychev committed on

AMD: parse the architecture as supplied by gcnArchName (llama/11244)
04b01d8

Haus1 committed on

HIP: disable VMM on HIP as it seems that it doesn't work in some configurations (llama/11420)
2cc4df4

uvos committed on

hip : Add hipGraph and VMM support to ROCM (llama/11362)
089afa0

uvos committed on

CUDA: fix FP16 cuBLAS GEMM (llama/11396)
7b7c5d3

JohannesGaessler committed on

rocBLAS: Avoid fp32->fp16->fp32 conversion on CDNA (llama/11356)
6f5687a

uvos committed on