Commit History

ggml : fix compile warnings (unused vars) (llama/4966)
97fa2e3
unverified

ggerganov committed on

ggml : add mmla kernels for quantized GEMM (llama/4966)
0d50a29
unverified

snadampal committed on

metal : use autoreleasepool to avoid memory leaks (llama/5437)
c276f12
unverified

irbull committed on

ggml-alloc : v3 (ggml/727)
5cffd6f
unverified

slaren committed on

examples : added audio_ctx argument to main and server (#1857)
469988b
unverified

dscripka ggerganov committed on

metal : option to embed MSL source into compiled binary (#1842)
a46b62a
unverified

Didzis Gosko committed on

examples : initialize context params properly (#1852)
3443ee7
unverified

ggerganov committed on

talk-llama : sync llama.cpp
e6d6e1d
unverified

ggerganov committed on

sync : ggml
94800c5
unverified

ggerganov committed on

src : relocate new backend sources
44cd2d4
unverified

ggerganov committed on

ggml : fix `error C2078: too many initializers` for MSVC ARM64 (llama/5404)
8ebb36c
unverified

Michael Podvitskiy committed on

CUDA: more warps for mmvq on NVIDIA (llama/5394)
7ab774c
unverified

JohannesGaessler committed on

CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (llama/5386)
3ff7660
unverified

JohannesGaessler committed on

Basic Vulkan Multi-GPU implementation (llama/5321)
5d130aa
unverified

OccamRazor slaren committed on

CUDA: mul_mat_vec_q max. batch size 8 -> 4 (llama/5370)
7aa3216
unverified

JohannesGaessler committed on

Slight quantization improvement for Q4_K and Q5_K (llama/5361)
e3cd020
unverified

Kawrakow ikawrakow committed on

CUDA: mul_mat_vec_q for batch sizes > 1 (llama/5351)
ae45b38
unverified

JohannesGaessler committed on

ggml : make use of ggml-quants.h possible in C++ code (llama/5338)
963ade6
unverified

Kawrakow ikawrakow committed on

ggml : avoid duplicating function calls using MIN/MAX macros (llama/5325)
9bb2b0a
unverified

Dr. Tom Murphy VII Ph.D ggerganov committed on

iq2_xxs: tune quantization (llama/5320)
11e5f6b
unverified

Kawrakow ikawrakow committed on

cuda : fix LLAMA_CUDA_F16 (llama/5262)
5fd8fb7
unverified

slaren committed on

metal : add im2col F32 dst support (llama/5132)
26aec77
unverified

ggerganov committed on

llava : add MobileVLM support (llama/5132)
f17a416
unverified

JidongZhang-THU slaren committed on

ggml : limit n_threads to the max n_tasks (llama/5238)
2645c33
unverified

slaren committed on

kompute : llama-bench support and ggml_cpu_has_kompute() (llama/5226)
0c9c434
unverified

Cebtenzzre committed on

ggml : add abort_callback for cpu backend (ggml/725)
a8ea91b
unverified

Michael Podvitskiy committed on

extra : update sync scripts
d99e873
unverified

ggerganov committed on

server : allow CORS request with authorization headers (#1850)
16a6639
unverified

Valentin Gosu committed on

whisper.android : how to build with CLBlast (#1809)
eea7f53
unverified

lcfrs ggerganov committed on

whisper : expose CUDA device setting in public API (#1840)
d13ee66
unverified

Didzis Gosko committed on

make : add macOS deployment target option (#1839)
9c90601
unverified

Didzis Gosko committed on

talk-llama : stream response (#1121)
2193f2b
unverified

ggerganov committed on

sync : ggml (#0)
fded75b
unverified

ggerganov committed on

ggml : fix IQ3_XXS on Metal (llama/5219)
f066321
unverified

Kawrakow ikawrakow committed on

sync : ggml (llama/0)
cdb7964
unverified

ggerganov committed on

Faster AVX2 dot product for IQ2_XS (llama/5187)
187ae44
unverified

Kawrakow ikawrakow PeterReid committed on

SOTA 3-bit quants (llama/5196)
4649943
unverified

Kawrakow ikawrakow committed on

ggml alloc: Fix for null dereference on alloc failure (llama/5200)
8181686
unverified

Paul Tsochantaris committed on

ggml : add max buffer sizes to opencl and metal backends (llama/5181)
3d354d0
unverified

slaren committed on

metal : free metal objects (llama/5161)
ea7167a
unverified

Paul Tsochantaris committed on

gguf : fix comparison (ggml/715)
80cfca4
unverified

ggerganov committed on

`ggml_cuda_cpy` support for 4d tensors and float16->float32 upcasting (ggml/686)
75d438c
unverified

John Balis slaren committed on

gguf : add input validation, prevent integer overflows (ggml/709)
5bf1614
unverified

ggerganov committed on

ci : fix yolo URLs + fix metal capture (ggml/712)
588f789
unverified

ggerganov committed on

metal : add debug capture backend function (ggml/694)
ece88c3
unverified

Jack Mousseau ggerganov committed on

common : fix wav buffer detection (#1819)
bc84057
unverified

JacobLinCool committed on

server : add fields to `verbose_json` response (#1802)
763d09d
unverified

JacobLinCool committed on

make : update MSYS_NT (#1813)
587152f
unverified

jwijffels committed on

talk-llama : sync llama.cpp
1453539
unverified

ggerganov committed on