Commit History

sync : ggml
278a9b3
unverified

ggerganov committed on

ggml : add unified SYCL backend for Intel GPUs (llama/2690)
01169e0
unverified

Abhilash Majumder jianyuzh KevinLy hengyu ggerganov committed on

ggml : minor type fix (int64_t -> size_t)
1bbb1a9
unverified

ggerganov committed on

common : fix input buffer check (#1812)
6c38a7f
unverified

ggerganov committed on

talk-llama : sync llama.cpp
92cfd93
unverified

ggerganov committed on

sync : ggml
5a9540e
unverified

ggerganov committed on

Add OpenCL add kernel (llama/5151)
f833987
unverified

OccamRazor committed on

cuda : fix tensor size calculation for non-split buffer (llama/5145)
8f3eb65
unverified

slaren committed on

ggml-alloc : add 10% margin to the buffer sizes (llama/5149)
c55bdf8
unverified

slaren committed on

ggml : update softmax n_task calculation (llama/5126)
3a3eb8e
unverified

snadampal committed on

metal : remove unused `n_buffers` and `buffers` (llama/5129)
a3e87d3
unverified

Paul Tsochantaris committed on

metal : show compile log messages
ae08f31
unverified

ggerganov committed on

cuda : fix 2-bit quants on amd hip (llama/5105)
aadbd67
unverified

Engininja2 committed on

llama : pre-allocate input tensors in a separate buffer (llama/5100)
20a4ca1
unverified

slaren committed on

metal : disable support for MUL_MAT F32 x F16
7fbc01f
unverified

ggerganov committed on

CUDA: more info when no device code (llama/5088)
e96ba7d
unverified

JohannesGaessler committed on

minor : clean-up some warnings and style (llama/5094)
7df090b
unverified

ggerganov committed on

ggml : parallelize FP32 conversion when using BLAS (llama/5045)
7bf2c87
unverified

reinforce20001 ggerganov committed on

llava : MobileVLM support (llama/4954)
dc8f956
unverified

cxt123 Chenxiaotao03 committed on

llama : run all KQV ops on the CPU with no KV offload (llama/5049)
97ce95c
unverified

slaren committed on

cuda : fix compile error in jetson platform (llama/4975)
0935414
unverified

Kylin committed on

ggml : check ggml_add src1 type (ggml/708)
aa5d6ed
unverified

Judd Judd committed on

docs : make model options / model install methods clearer (#1806)
a2bec1d
unverified

mikey-rrr committed on

cmake : make libwhisper.so position independent (#1792)
1cf1553
unverified

trixirt committed on

cmake : temporary remove VLA check (#1795)
1a32e6f
unverified

ggerganov committed on

whisper.android : return output from benchmarks (#1785)
5cff61b
unverified

lcfrs committed on

server : implement "verbose_json" format with token details (#1781)
d6e13b6
unverified

rmmh committed on

ggml : sync ggml-metal.m
b4085c3
unverified

ggerganov committed on

sync : llama.cpp
5de718a
unverified

ggerganov committed on

sync : ggml
34bdd70
unverified

ggerganov committed on

ggml : add IQ2 to test-backend-ops + refactoring (llama/4990)
227f2ae
unverified

ggerganov committed on

imatrix : offload to GPU support (llama/4957)
6490f98
unverified

ggerganov committed on

backend : add eval callback (llama/4935)
3cc64d6
unverified

ggerganov committed on

metal : create autorelease pool during library build (llama/4970)
9027276
unverified

ggerganov committed on

ggml : importance matrix support for legacy quants (llama/4969)
d8bb9d8
unverified

Kawrakow ikawrakow committed on

metal : log `recommendedMaxWorkingSetSize` on iOS 16+ (llama/4936)
e2cc0e5
unverified

azarovalex ggerganov committed on

ggml : introduce GGML_CALL function annotation (llama/4850)
7815f68
unverified

jartine committed on

cuda : fix dequantize kernel names (llama/4938)
95f6502
unverified

ggerganov committed on

CUDA: faster dequantize kernels for Q4_0 and Q4_1 (llama/4938)
73c6598
unverified

Kawrakow ikawrakow committed on

Add ability to use importance matrix for all k-quants (llama/4930)
7032309
unverified

Kawrakow ikawrakow committed on

talk-llama : optional wake-up command and audio confirmation (#1765)
542e8da
unverified

rakksor committed on

server : fix building and simplify lib deps on Windows (#1772)
f928f33
unverified

Przemysław Pawełczyk committed on

talk-llama : sync llama.cpp
62ad8e0
unverified

ggerganov committed on

talk-llama : llama.cpp
d128cb3
unverified

ggerganov committed on

sync : ggml
6a472b5
unverified

ggerganov committed on

metal : correctly set SIMD support flags on iOS (llama/4923)
1cf2fa9
unverified

azarovalex committed on

2-bit quantizations (llama/4897)
8a399ab
unverified

Kawrakow ikawrakow committed on

scripts : sync-ggml-am.sh add option to skip commits
c34dd82
unverified

ggerganov committed on

talk-llama : sync llama.cpp
b9d2bd9
unverified

ggerganov committed on