Commit History
faster avx512 exp implementation (llama/7551)
6dbbbab
ggml : fix loongarch build (O2 issue) (llama/7636)
133ffbf
junchao-loongson
committed on
metal : remove invalid asserts (llama/7617)
562afce
metal : add missing asserts (llama/7617)
be552ab
ggml : fix YARN + add tests + add asserts (llama/7617)
15da5f7
cuda : non-cont concat support (llama/7610)
64d3007
llama-bench : add support for the RPC backend (llama/7435)
d460266
ggml : use atomic_flag for critical section (llama/7598)
68c6582
slaren
committed on
examples : adapt to new ggml_concat (ggml/0)
36af6c5
ggml : fix typo in ggml.c (llama/7603)
f06f1cb
Align GEMM dispatch (llama/7566)
2171dc6
sycl : fix assert (llama/7563)
b4fb287
vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (llama/7552)
da90a1e
rpc : resource management rework (llama/7562)
7571b13
fix ggml_sycl_mul_mat_id() to match the change of api (llama/7436)
f0ee71c
Neo Zhang
committed on
ggml : generalize GGML_OP_CONCAT (llama/7563)
8d359ad
update HIP_UMA #7399 (llama/7414)
7097123
Allow multiple copy function pointers for CUDA graph kernel param updates (llama/7565)
143f6df
agray3
committed on
Fix q_xxs using mul_mat_q (llama/7459)
0be4f48
AidanBeltonS
committed on
Add freq factors (llama/7495)
340b830
AidanBeltonS
committed on
metal : add GGML_OP_REPEAT kernels (llama/7557)
0534b5d
metal : disable FA kernel for HS=256 (llama/7556)
0c32e28
ggml : restore ggml_rope_xpos_inplace (ggml/0)
0641dee
ggml: aarch64: SVE kernels for q8_0_q8_0, q4_0_q8_0 vector dot (llama/7433)
51f504f
Masaya, Kato
committed on
ggml : silence UB sanitizer error during iq2_xxs quantization (llama/0)
9f41704
ggml : remove ggml_flash_attn and ggml_flash_ff (llama/7463)
4005bca
ggml : drop support for QK_K=64 (llama/7473)
8737d46
Update vulkan rope implementation to support frequency factors (llama/7475)
be0ec58
CUDA: fix FA out-of-bounds reads (llama/7479)
b38d0f9
CUDA: fix FA out-of-bounds writes (llama/7465)
2e26e3a
cuda : fix compile warning (llama/7454)
58db6c8
CUDA: remove incorrect precision check (llama/7454)
eb4b5e0
cuda : fix rope + add tests (llama/7452)
215ce5c
llama : add phi3 128K model support (llama/7225)
ef68527
metal : handle F16 inf values, fix FA partial offload (llama/7434)
8d153a7
CUDA: fix unused warning in mmq.cu (llama/7442)
f16510d
CUDA: deduplicate mmq code (llama/7397)
e7b20b1
rpc : track allocated buffers (llama/7411)
925eb7a
Update SYCL upscale operation (llama/7321)
3984ba6
AidanBeltonS
committed on
ggml-opencl, llama: using reserve() if count already known (llama/7272)
8325ed5
ggml : add loongarch lsx and lasx support (llama/6454)
9794ea7
junchao-loongson
Jinyang He
committed on
Add provisions for windows support for BF16 code including CMake provision for enabling AVX512_BF16 (llama/7258)
cf52931
Srihari-mcw
committed on
Vulkan Embedding Fix (llama/7360)
2bfeba3
ggml : fix another case of quants nans (llama/7387)
645c367
slaren
committed on
ggml: implement quantized KV cache for FA (llama/7372)
aef1b4b
cuda : clear error after buffer allocation failure (llama/7376)
b7f6691
slaren
committed on
Capture CUDA logging output (llama/7298)
3519475
fraxy-v
slaren
committed on