HIP: disable rocWMMA on gfx12 by default until ROCm 7.0 (llama/14202) f95736f uvos committed on Jun 16, 2025
CUDA/HIP: Share the same unified memory allocation logic. (llama/12934) 143cb70 David Huang committed on Apr 15, 2025
HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (llama/12032) a027c1d David Huang committed on Mar 3, 2025
CUDA: add option to compile without FlashAttention (llama/12025) fbc5f16 JohannesGaessler committed on Feb 22, 2025
CUDA: use mma PTX instructions for FlashAttention (llama/11583) f328957 JohannesGaessler Diego Devesa committed on Feb 2, 2025
HIP: disable VMM on HIP as it seems that it doesn't work in some configurations (llama/11420) 2cc4df4 uvos committed on Jan 25, 2025
ggml : do not define GGML_USE_CUDA when building with GGML_BACKEND_DL (llama/11211) 79f750d rgerganov committed on Jan 13, 2025
ggml : add support for dynamic loading of backends (llama/10469) b73266f Diego Devesa ggerganov committed on Nov 25, 2024
CUDA: remove DMMV, consolidate F16 mult mat vec (llama/10318) e446f60 JohannesGaessler committed on Nov 17, 2024
ggml : build backends as libraries (llama/10256) 3dc93f3 Diego Devesa ggerganov R0CKSTAR committed on Nov 14, 2024