Commit History
ci : disable CUDA and Android builds fcafd21
ci : disable Obj-C build + fixes 3859606
make : shim cmake 15c1d58
talk-llama : sync llama.cpp 5908a19
sync : ggml 00d464f
ggml : add predefined list of CPU backend variants to build (llama/10626) 1794b43
Diego Devesa committed on
ggml-cpu : fix HWCAP2_I8MM value (llama/10646) b3e6ea8
Diego Devesa committed on
vulkan: Implement "fast divide" (mul+shift) for unary ops like copy (llama/10642) e9ee893
SYCL : Move to compile time oneMKL interface backend selection for NVIDIA backend (llama/10584) 385f335
Nicolò Scipione committed on
Avoid using __fp16 on ARM with old nvcc (llama/10616) 19743b6
Frankie Robertson committed on
vulkan: optimize and reenable split_k (llama/10637) bca95f5
ggml: add `GGML_SET` Metal kernel + i32 CPU kernel (ggml/1037) dd775d5
ggml : add `GGML_PAD_REFLECT_1D` operation (ggml/1034) 154bbc0
files : remove make artifacts d3e3ea1
common : fix compile warning 6a0d528
ggml : move AMX to the CPU backend (llama/10570) 3732429
Diego Devesa committed on
metal : small-batch mat-mul kernels (llama/10581) 58b0822
SYCL: Fix and switch to GGML_LOG system instead of fprintf (llama/10579) f083887
ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_q4_0_4x4_q8_0() (llama/10567) 1c781a8
Adrien Gallouët committed on
vulkan: Dynamic subgroup size support for Q6_K mat_vec (llama/10536) 59600b5
Eve committed on
ggml : fix I8MM Q4_1 scaling factor conversion (llama/10562) 664be9a
ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (llama/10580) c7a861a
sycl : offload of get_rows set to 0 (llama/10432) 47b6bff
Alberto Cabrera Pérez committed on
sycl : Reroute permuted mul_mats through oneMKL (llama/10408) af13def
Alberto Cabrera Pérez committed on
CANN: RoPE operator optimization (llama/10563) 3ad7b0a
vulkan: get the first command buffer submitted sooner (llama/10499) e1c1e73
ggml : remove redundant copyright notice + update authors c78cdd7
ggml : fix row condition for i8mm kernels (llama/10561) 01c713f
cmake : fix ARM feature detection (llama/10543) c04a34f
ggml-cpu: support IQ4_NL_4_4 by runtime repack (llama/10541) bf73242
kompute : improve backend to pass test_backend_ops (llama/10542) c8008b8
CANN: Fix SOC_TYPE compile bug (llama/10519) 7f24ebb
leo-pony committed on
CANN: ROPE operator optimization (llama/10540) 63ee002
Add some minimal optimizations for CDNA (llama/10498) bf49bbe
uvos committed on
metal : fix group_norm support condition (llama/0) 20ee62d
vulkan: define all quant data structures in types.comp (llama/10440) cea89af
vulkan: Handle GPUs with less shared memory (llama/10468) 18a0ad1
vulkan: further optimize q5_k mul_mat_vec (llama/10479) cb018d4
vulkan: skip integer div/mod in get_offsets for batch_idx==0 (llama/10506) c6d15e0
vulkan: optimize Q2_K and Q3_K mul_mat_vec (llama/10459) c032c06
mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (llama/10516) f2a87fc
R0CKSTAR committed on
vulkan: fix group_norm (llama/10496) 8f5eeb8
cmake : enable warnings in llama (llama/10474) 26a670b
ggml-cpu: cmake add arm64 cpu feature check for macos (llama/10487) 6d586a0
Charles Xu committed on
CANN: Improve the Inferencing Performance for Ascend NPU Device (llama/10454) f9fd6d6
Shanshan Shen, Frank Mai committed on
CANN: RoPE and CONCAT operator optimization (llama/10488) b357ea7
vulkan: Fix a vulkan-shaders-gen argument parsing error (llama/10484) 6a4b6ae
metal : enable mat-vec kernels for bs <= 4 (llama/10491) 6d07dee
llama : accept a list of devices to use to offload a model (llama/10497) 6d7599e
Diego Devesa committed on