Commit History
SYCL: Fix and switch to GGML_LOG system instead of fprintf (llama/10579)
f083887
ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_q4_0_4x4_q8_0() (llama/10567)
1c781a8
Adrien Gallouët
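The change above is representative of a broader cleanup: hand-written AArch64 assembly replaced with portable NEON intrinsics from <arm_neon.h>. Below is a minimal sketch of the intrinsics style using the dotprod extension; the function is illustrative only, not the actual ggml_gemv_q4_0_4x4_q8_0() kernel.

```c
// Minimal sketch: int8 dot product with NEON dotprod intrinsics
// (illustrative only; not the actual ggml_gemv_q4_0_4x4_q8_0() kernel).
// Build with: gcc -O2 -march=armv8.2-a+dotprod dot.c
#include <arm_neon.h>
#include <stdint.h>
#include <stdio.h>

// Dot product of two 16-byte int8 vectors, accumulated in int32 lanes.
static int32_t dot_i8x16(const int8_t *a, const int8_t *b) {
    int8x16_t va  = vld1q_s8(a);         // load 16 signed bytes
    int8x16_t vb  = vld1q_s8(b);
    int32x4_t acc = vdupq_n_s32(0);      // zeroed accumulator
    acc = vdotq_s32(acc, va, vb);        // 4 groups of 4-byte dot products
    return vaddvq_s32(acc);              // horizontal sum of the 4 lanes
}

int main(void) {
    int8_t a[16], b[16];
    for (int i = 0; i < 16; ++i) { a[i] = (int8_t)i; b[i] = 1; }
    printf("%d\n", dot_i8x16(a, b));     // prints 120 (0+1+...+15)
    return 0;
}
```

Intrinsics leave register allocation and instruction scheduling to the compiler, which is the main maintainability win over inline assembly.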
vulkan: Dynamic subgroup size support for Q6_K mat_vec (llama/10536)
59600b5
Eve
ggml : fix I8MM Q4_1 scaling factor conversion (llama/10562)
664be9a
ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (llama/10580)
c7a861a
sycl : offload of get_rows set to 0 (llama/10432)
47b6bff
Alberto Cabrera Pérez
sycl : Reroute permuted mul_mats through oneMKL (llama/10408)
af13def
Alberto Cabrera Pérez
CANN: RoPE operator optimization (llama/10563)
3ad7b0a
vulkan: get the first command buffer submitted sooner (llama/10499)
e1c1e73
ggml : remove redundant copyright notice + update authors
c78cdd7
ggml : fix row condition for i8mm kernels (llama/10561)
01c713f
cmake : fix ARM feature detection (llama/10543)
c04a34f
ggml-cpu: support IQ4_NL_4_4 by runtime repack (llama/10541)
bf73242
kompute : improve backend to pass test_backend_ops (llama/10542)
c8008b8
CANN: Fix SOC_TYPE compile bug (llama/10519)
7f24ebb
leo-pony
CANN: RoPE operator optimization (llama/10540)
63ee002
Add some minimal optimizations for CDNA (llama/10498)
bf49bbe
uvos
metal : fix group_norm support condition (llama/0)
20ee62d
vulkan: define all quant data structures in types.comp (llama/10440)
cea89af
vulkan: Handle GPUs with less shared memory (llama/10468)
18a0ad1
vulkan: further optimize q5_k mul_mat_vec (llama/10479)
cb018d4
vulkan: skip integer div/mod in get_offsets for batch_idx==0 (llama/10506)
c6d15e0
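The idea behind this shader change, sketched in C rather than GLSL: integer division and modulo are relatively expensive per invocation, and for batch_idx == 0 their results are known to be zero, so the fast path can skip them entirely. The names below (get_offset, ne0, strides) are illustrative, not the shader's actual code.

```c
#include <stdint.h>

// Illustrative C version of the idea (the real change is in a Vulkan
// shader): when batch_idx == 0 the div/mod results are known to be 0,
// so the expensive integer ops can be bypassed.
static uint32_t get_offset(uint32_t batch_idx, uint32_t ne0,
                           uint32_t stride0, uint32_t stride1) {
    if (batch_idx == 0) {
        return 0;   // fast path: div and mod would both yield 0
    }
    return (batch_idx % ne0) * stride0 + (batch_idx / ne0) * stride1;
}
```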
vulkan: optimize Q2_K and Q3_K mul_mat_vec (llama/10459)
c032c06
mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (llama/10516)
f2a87fc
R0CKSTAR
vulkan: fix group_norm (llama/10496)
8f5eeb8
cmake : enable warnings in llama (llama/10474)
26a670b
ggml-cpu: cmake add arm64 cpu feature check for macos (llama/10487)
6d586a0
Charles Xu
CANN: Improve the Inference Performance for Ascend NPU Device (llama/10454)
f9fd6d6
Shanshan Shen
Frank Mai
CANN: RoPE and CONCAT operator optimization (llama/10488)
b357ea7
vulkan: Fix a vulkan-shaders-gen argument parsing error (llama/10484)
6a4b6ae
metal : enable mat-vec kernels for bs <= 4 (llama/10491)
6d07dee
llama : accept a list of devices to use to offload a model (llama/10497)
6d7599e
Diego Devesa
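A sketch of how such a device list is typically passed, assuming the devices field this change adds to llama_model_params (a NULL-terminated list of ggml_backend_dev_t, per the PR description); treat the exact field and helper names as assumptions rather than verified API.

```c
// Sketch only: restricting offload to specific devices, assuming the
// llama_model_params.devices field added by this change. Field and
// helper names are assumptions based on the PR description.
#include "llama.h"
#include "ggml-backend.h"

struct llama_model * load_on_first_device(const char * path) {
    ggml_backend_dev_t devices[2] = {
        ggml_backend_dev_get(0),   // first registered device only
        NULL,                      // NULL terminator ends the list
    };
    struct llama_model_params params = llama_model_default_params();
    params.devices = devices;      // list must stay valid during load
    return llama_load_model_from_file(path, params);
}
```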
ggml : add support for dynamic loading of backends (llama/10469)
b73266f
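The mechanism underneath dynamic backend loading, sketched with plain dlopen/dlsym: this shows the general technique, not ggml's actual loader, and the entry-point symbol name "backend_init" below is hypothetical.

```c
// General technique behind runtime backend loading: dlopen a shared
// object and resolve a registration entry point. A sketch of the
// mechanism, not ggml's actual loader; the symbol name "backend_init"
// is hypothetical.
#include <dlfcn.h>
#include <stdio.h>

typedef void * (*backend_init_fn)(void);

void * load_backend(const char * so_path) {
    void * handle = dlopen(so_path, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }
    backend_init_fn init = (backend_init_fn) dlsym(handle, "backend_init");
    if (!init) {
        fprintf(stderr, "missing entry point: %s\n", dlerror());
        dlclose(handle);
        return NULL;
    }
    return init();   // backend registers itself and returns a handle
}
```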
metal : minor code formatting
385a521
ggml : do not use ARM features not included in the build (llama/10457)
0001327
Diego Devesa
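The issue this fixes is the gap between compile-time and runtime capability: a macro such as __ARM_FEATURE_DOTPROD only says what the binary was compiled with, not what the CPU it runs on supports. A minimal Linux/AArch64 sketch of the runtime side using getauxval():

```c
// Runtime check that the CPU actually has a feature the binary may
// have been compiled with (Linux/AArch64, via the auxiliary vector).
#include <stdio.h>
#include <sys/auxv.h>   // getauxval
#include <asm/hwcap.h>  // HWCAP_ASIMDDP etc.

int main(void) {
    unsigned long hwcap = getauxval(AT_HWCAP);
    int has_dotprod = (hwcap & HWCAP_ASIMDDP) != 0;  // sdot/udot support
    printf("dotprod: %s\n", has_dotprod ? "yes" : "no");
    // A kernel built with +dotprod must only be dispatched to when
    // has_dotprod is true, regardless of compile-time macros.
    return 0;
}
```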
CANN: Support Ascend310P to accelerate F32 and F16 Model (llama/10216)
c9e03e6
leo-pony
cuda : optimize argmax (llama/10441)
69ae50d
vulkan: predicate max operation in soft_max shaders (llama/10437)
0a14325
vulkan: copy iq4_nl LUT into shared memory (llama/10409)
c31abdb
vulkan: further optimize mul_mat_vec using larger loads (llama/10387)
50a2978
add cmake rvv support (llama/10411)
e0bf47c
haopeng
CUDA: remove unnecessary warp reduce in FA (ggml/1032)
9a8c238
feat: add `GGML_UNARY_OP_ARGMAX` Metal kernel (ggml/1019)
c7e59ef
metal : add `GGML_OP_CONV_TRANSPOSE_1D` kernels (ggml/1026)
9c845f4
Do not include arm_neon.h when compiling CUDA code (ggml/1028)
80663f4
Frankie Robertson
ggml-opt: fix data corruption (ggml/1022)
a916e92
ggml/sched : do not skip views in pre-assignments
b1eba61
slaren