js : remove un-needed request header from fetchRemote (#2119) 6c54394 unverified Mark Karpelès committed on May 13, 2024
main : dont print timings with --no-prints (#2108) 685d1c1 unverified Daniel Ziegenberg committed on May 13, 2024
main : add options for temperature control (#2088) 9a3f777 unverified Daniel Ziegenberg committed on May 13, 2024
whisper.android : update example, add field to print timestamp (#2072) 03fb680 unverified codezjx committed on May 13, 2024
main : fix double quote escaping in csv output (#2090) 9952a85 unverified mashizora committed on May 13, 2024
ggml : optimize for ppc64le using VSX intrinsics (ggml/784) 05d3824 Hong Bo PENG ggerganov committed on May 12, 2024
metal : fix flash attention kernel requirements (llama/7169) 6cb3028 ggerganov committed on May 10, 2024
Minor arithmetic improvement to mmvq wrapper kernel (llama/7172) ae75124 Ouadie EL FAROUKI committed on May 10, 2024
opencl : alignment size converted from bits to bytes (llama/7090) 2692ce5 albertjin Cebtenzzre committed on May 9, 2024
metal : use `vm_allocate` instead of `posix_memalign` on macOS (llama/7078) eb910b1 Gilad S committed on May 8, 2024
CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (llama/7019) 4cf786d JohannesGaessler committed on May 1, 2024
ggml : add Flash Attention (llama/5021) 34d3b03 ggerganov JohannesGaessler phymbert committed on Apr 30, 2024
Fix more int overflow during quant (PPL/CUDA). (llama/6563) 531387f dranger003 committed on Apr 28, 2024
gguf : enforce that tensor names are unique (llama/6905) 22e446d Xuan Son Nguyen slaren committed on Apr 28, 2024
Reset schedule earlier to allow overlap with ggml graph computation on device (llama/6933) 3a8eea8 agray3 committed on Apr 26, 2024
gguf : fix mismatch between alloc and free functions (llama/6929) d8fb433 slaren committed on Apr 26, 2024
ggml : fix redefinition of vaddvq_f32 for 32-bit ARM (llama/6906) f900de6 ggerganov committed on Apr 25, 2024
ggml : fix ggml_backend_cpu_supports_op() for CPY (llama/0) d645791 ggerganov committed on Apr 21, 2024
ggml : group all experts in a single ggml_mul_mat_id (llama/6505) f0b5c67 slaren ggerganov committed on Apr 18, 2024