ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors (llama/15379) a575f57 compilade committed on Aug 18
vulkan: Use larger workgroups for mul_mat_vec when M is small (llama/15355) 054584a jeffbolznv OccamRazor committed on Aug 17
vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id (llama/15334) a6fa78e jeffbolznv committed on Aug 16
vulkan : fix out-of-bounds access in argmax kernel (llama/15342) 78a1865 ggerganov committed on Aug 15
finetune: SGD optimizer, more CLI args (llama/13873) f585fe7 Jonathan Graehl OccamRazor JohannesGaessler committed on Aug 14
CUDA: Optimize `reduce_rows_f32` kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n (llama/15132) c768824 ORippler committed on Aug 13
ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors over RPC (macOS & others) (llama/15188) c8284f2 aixsatoshi Shinnosuke Takagi committed on Aug 13
HIP: disable sync warp shuffel operators from clr amd_warp_sync_functions.h (llama/15273) 8fca6dd uvos committed on Aug 12
sycl: Fix and disable more configurations of mul_mat (llama/15151) 7b868ed Romain Biessy committed on Aug 12
musa: fix failures in test-backend-ops for mul_mat_id op (llama/15236) 4168dda yeahdongcn committed on Aug 12
CUDA: attention sinks for mma FlashAttention (llama/15157) 0ab9aba JohannesGaessler committed on Aug 8
vulkan: Add env var to disable host visible vidmem (llama/15109) 5ec4382 jeffbolznv committed on Aug 7
HIP: add cmake option to enable compiler output of kernel resource usage metrics (llama/15103) 577f7e4 uvos committed on Aug 7
ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (llama/15094) f84562e Christian Kastner committed on Aug 7
CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (llama/15131) 1d24833 JohannesGaessler committed on Aug 7
ggml : fix fallback to CPU for ununsupported ops (llama/15118) 2b7ae5e Diego Devesa committed on Aug 6