Commit History

2-bit quantizations (llama/4897)
8a399ab
unverified

Kawrakow ikawrakow committed on

scripts : sync-ggml-am.sh add option to skip commits
c34dd82
unverified

ggerganov committed on

talk-llama : sync llama.cpp
b9d2bd9
unverified

ggerganov committed on

sync : ggml
18bfc83
unverified

ggerganov committed on

examples : adapt to metal API
b65decb
unverified

ggerganov committed on

ggml: cache sin/cos for RoPE (llama/4908)
c315fbf
unverified

JohannesGaessler committed on

metal : remove old API (llama/4919)
d6abb6a
unverified

ggerganov committed on

metal : disable log for loaded kernels (llama/4794)
2305485
unverified

ggerganov committed on

gguf : fix potential infinite for-loop (llama/4600)
0e93179
unverified

texmex76 Bernhard Gstrein committed on

metal : refactor kernel loading code (llama/4794)
53e6bf8
unverified

ggerganov committed on

CUDA: faster q8_0 -> f16 dequantization (llama/4895)
0a1a178
unverified

JohannesGaessler committed on

talk-llama : add optional CLI arg to set the bot name (#1764)
63c8089
unverified

RhinoDevel committed on

examples : add python example for transcription (#1744)
d600e4c
unverified

contractorwolf committed on

whisper : load the model into multiple buffers of max size 1GB (#1763)
0e9101f
unverified

ggerganov committed on

talk-llama : sync llama.cpp
75c5f9c
unverified

ggerganov committed on

sync : ggml
2ed0a44
unverified

ggerganov committed on

backend_sched : fix assignments
cb91db5
unverified

slaren committed on

CUDA: fix softmax compile for old CUDA versions (llama/4862)
5eda533
unverified

JohannesGaessler committed on

Importance Matrix calculation (llama/4861)
c0b17f1
unverified

Kawrakow ikawrakow ggerganov committed on

models : make all scripts to be POSIX Compliant (#1725)
f7aef3e
unverified

sonphantrung committed on

ggml : fix 32-bit ARM compat for IQ2_XS (#1758)
d5836c9
unverified

ggerganov committed on

go : add SetInitialPrompt method to bindings (#1753)
5fd6678
unverified

blib321 committed on

server : add more parameters to server api (#1754)
cb0cf7b
unverified

George Hindle committed on

whisper : fix segment length with params.no_timestamps == true
720d738
unverified

ggerganov committed on

params : don't compute timestamps when not printing them (#1755)
251825e
unverified

George Hindle committed on

talk-llama : sync llama.cpp
f33490f
unverified

ggerganov committed on

swift : remove local ggml.h reference
98b68e8
unverified

ggerganov committed on

swift : track ggml release branch
ece2b9d
unverified

ggerganov committed on

sync : ggml
9af4c11
unverified

ggerganov committed on

sync : llama.cpp
569565f
unverified

ggerganov committed on

ggml : SOTA 2-bit quants (add IQ2_XS) (llama/4856)
5e827d5
unverified

Kawrakow ikawrakow committed on

metal : put encoder debug group behind a define (llama/4873)
6e822b8
unverified

Paul Tsochantaris committed on

metal : improve dequantize precision to match CPU (llama/4836)
f2da2a4
unverified

ggerganov committed on

ggml : fix vld1q_s8_x4 32-bit compat (llama/4828)
efed5ba
unverified

ggerganov committed on

CUDA: faster softmax via shared memory + fp16 math (llama/4742)
52c45b9
unverified

JohannesGaessler committed on

metal : fix deprecation warning (ggml/690)
b1e29bc
unverified

ggerganov committed on

ggml : remove ggml_cpy_inplace and ggml_cont_inplace (ggml/693)
6469bfe
unverified

Timothy Cronin committed on

metal : wrap each operation in debug group (ggml/690)
b5e360f
unverified

Jack Mousseau committed on

ggml : change GGML_MAX_NAME at compile time (ggml/682)
ded2b1a
unverified

leejet committed on

Fix execlp call (ggml/689)
abda16e
unverified

Halalaluyafail3 committed on

SOTA 2-bit quants (llama/4773)
75de5bf
unverified

Kawrakow ikawrakow committed on

CUDA: fixed redundant value dequantization (llama/4809)
70c8d60
unverified

JohannesGaessler committed on

ggml : use __builtin_amdgcn_sudot4 in __dp4a for gfx11 (llama/4787)
f391d7a
unverified

Konstantin Zhuravlyov committed on

ggml : do not sched_yield when calling BLAS (llama/4761)
5d1dffc
unverified

ggerganov committed on

ggml : include stdlib.h before intrin.h (llama/4736)
743cace
unverified

ggerganov committed on

swift : checkout ggml commit instead of branch (#1750)
6ab88cc
unverified

Alexandru Mariuti committed on

talk-llama : add optional Piper TTS support (#1749)
fb92e62
unverified

RhinoDevel committed on

server : add request path option (#1741)
6c319ac
unverified

eschmidbauer committed on

main : add cli option to disable system prints (#1740)
97e710a
unverified

ggerganov committed on