whisper.cpp / ggml-cuda.cu

Commit History

CUDA: mul_mat_vec_q tiling, refactor mul mat logic (llama/5434)
c0cfa9b
unverified

JohannesGaessler slaren committed on

CUDA: more warps for mmvq on NVIDIA (llama/5394)
7ab774c
unverified

JohannesGaessler committed on

CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (llama/5386)
3ff7660
unverified

JohannesGaessler committed on

CUDA: mul_mat_vec_q max. batch size 8 -> 4 (llama/5370)
7aa3216
unverified

JohannesGaessler committed on

CUDA: mul_mat_vec_q for batch sizes > 1 (llama/5351)
ae45b38
unverified

JohannesGaessler committed on

cuda : fix LLAMA_CUDA_F16 (llama/5262)
5fd8fb7
unverified

slaren committed on

llava : add MobileVLM support (llama/5132)
f17a416
unverified

JidongZhang-THU slaren committed on

sync : ggml (llama/0)
cdb7964
unverified

ggerganov committed on

SOTA 3-bit quants (llama/5196)
4649943
unverified

Kawrakow ikawrakow committed on

`ggml_cuda_cpy` support for 4d tensors and float16->float32 upcasting (ggml/686)
75d438c
unverified

John Balis slaren committed on

cuda : fix tensor size calculation for non-split buffer (llama/5145)
8f3eb65
unverified

slaren committed on

cuda : fix 2-bit quants on amd hip (llama/5105)
aadbd67
unverified

Engininja2 committed on

CUDA: more info when no device code (llama/5088)
e96ba7d
unverified

JohannesGaessler committed on

cuda : fix compile error in jetson platform (llama/4975)
0935414
unverified

Kylin committed on

ggml : add IQ2 to test-backend-ops + refactoring (llama/4990)
227f2ae
unverified

ggerganov committed on

ggml : introduce GGML_CALL function annotation (llama/4850)
7815f68
unverified

jartine committed on

cuda : fix dequantize kernel names (llama/4938)
95f6502
unverified

ggerganov committed on

CUDA: faster dequantize kernels for Q4_0 and Q4_1 (llama/4938)
73c6598
unverified

Kawrakow ikawrakow committed on

CUDA: faster q8_0 -> f16 dequantization (llama/4895)
0a1a178
unverified

JohannesGaessler committed on

CUDA: fix softmax compile for old CUDA versions (llama/4862)
5eda533
unverified

JohannesGaessler committed on

ggml : SOTA 2-bit quants (add IQ2_XS) (llama/4856)
5e827d5
unverified

Kawrakow ikawrakow committed on

CUDA: faster softmax via shared memory + fp16 math (llama/4742)
52c45b9
unverified

JohannesGaessler committed on

SOTA 2-bit quants (llama/4773)
75de5bf
unverified

Kawrakow ikawrakow committed on

CUDA: fixed redundant value dequantization (llama/4809)
70c8d60
unverified

JohannesGaessler committed on

ggml : use __builtin_amdgcn_sudot4 in __dp4a for gfx11 (llama/4787)
f391d7a
unverified

Konstantin Zhuravlyov committed on

fix : cuda order of synchronization when setting a buffer (ggml/679)
e48c553
unverified

Erik Scholz slaren committed on

ggml : add error handling to graph_compute (#1714)
92f24ee
unverified

finnvoorhees committed on

cuda : simplify expression
cda4a91

ggerganov slaren committed on

cuda : mark I16 and I32 ops as unsupported
cec288d

ggerganov committed on

CUDA: fixed tensor cores not being used on RDNA3 (llama/4697)
654d245

JohannesGaessler committed on

CUDA: fix tensor core logic for Pascal and HIP (llama/4682)
977baeb

JohannesGaessler committed on

cuda: fix vmm oom issue on NVIDIA AGX Orin (llama/4687)
6980ee4

hydaitw committed on

sync : ggml (VMM, sync-ggml-am, dotprod ARM fixes, CUDA fixes) (#1691)
919a447
unverified

ggerganov committed on

sync : ggml (ggml_scale, ggml_row_size, etc.) (#1677)
aa86ade
unverified

ggerganov committed on

sync : ggml (Metal fixes, new ops, tests) (#1633)
a0d4b48
unverified

ggerganov committed on

sync : ggml (new ops, new backend, etc) (#1602)
895e87a
unverified

ggerganov committed on

cuda : sync some minor stuff from llama.cpp (#1548)
cc06e31
unverified

ggerganov committed on

cuda : assert ggml_add sources to be contiguous
c012035
unverified

ggerganov committed on

whisper : add batched decoding (#1486)
0131aa6
unverified

ggerganov committed on

ggml : fix some compile warnings
ad6c9c1
unverified

ggerganov committed on

whisper : add full CUDA and Metal offloading (#1472)
da4acca
unverified

ggerganov committed on

cuda : fix HIPBLAS build
46033e6
unverified

ggerganov committed on

sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.) (#1422)
7006035
unverified

ggerganov Chris Raethke committed on

sync : ggml (const correctness)
4ce2d25
unverified

ggerganov committed on

sync : ggml (CUDA faster rope)
44e3164
unverified

ggerganov committed on

ggml : sync latest llama.cpp (view_src + alloc improvements) (#1247)
8bb66c1
unverified

ggerganov committed on

ggml : sync (ggml-alloc, GPU, eps, etc.) (#1220)
d41ba35
unverified

ggerganov committed on

whisper : initial hipBLAS support (#1209)
e093092
unverified

ardfork committed on