Commit History

CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (llama/7921)
5931562

JohannesGaessler committed

CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (llama/7860)
154bf2b

JohannesGaessler committed

CUDA: use tensor cores for MMQ (llama/7676)
78a5b67

JohannesGaessler committed

CUDA: revise q8_1 data layout for mul_mat_q (llama/7824)
fcfd59e

JohannesGaessler committed

CUDA: refactor mmq, dmmv, mmvq (llama/7716)
849ff52

JohannesGaessler committed

sync : ggml (#2001)
cbbfa9e

ggerganov committed