CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (llama/7921) 5931562 JohannesGaessler committed on Jun 14, 2024
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (llama/7860) 154bf2b JohannesGaessler committed on Jun 11, 2024
CUDA: revise q8_1 data layout for mul_mat_q (llama/7824) fcfd59e JohannesGaessler committed on Jun 9, 2024