Commits · Xenobd/whisper.cpp

HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (llama/12032)

a027c1d

David Huang committed on Mar 3, 2025

ggml : fix kleidiai build (llama/12159)

dbc0180

ag2s20150909 committed on Mar 3, 2025

SYCL: Move CPY kernels to a separate file and add few missing kernels (llama/12133)

1d6d451

Akarshan Biswas committed on Mar 3, 2025

ggml-backend : keep paths in native string type when possible (llama/12144)

6e89d8c

Diego Devesa committed on Mar 2, 2025

CUDA: compress mode option and default to size (llama/12029)

4ec988a

Erik Scholz committed on Mar 1, 2025

ggml : upgrade init_tensor API to return a ggml_status (llama/11854)

d6b6852

William Tambellini slaren committed on Feb 28, 2025

vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (llama/11595)

d7d82b9

Rémy O committed on Feb 28, 2025

CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (llama/12098)

0b52fcc

JohannesGaessler committed on Feb 28, 2025

ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (llama/12064)

459beb1

Prashant Vithule vithulep committed on Feb 28, 2025

CANN: Fix build error with GCC 13 (llama/11990)

dcf68db

hipudding committed on Feb 28, 2025

vulkan: matmul dequantization improvements (llama/12015)

ffdf466

Eve committed on Feb 28, 2025

vulkan: improve im2col (llama/11826)

f6cff0a

Daniele committed on Feb 28, 2025

cmake: Fix ggml backend dependencies and installation (llama/11818)

c6c2a2c

Vladimir Vuksanovic committed on Feb 27, 2025

vulkan: fix assertion when qy_needs_dequant (llama/12068)

271c7e4

jeffbolznv committed on Feb 25, 2025

ggml-cpu: Fix build with sve (llama/12059)

4be146e

mollysama committed on Feb 25, 2025

cuda: unary ops as float + de-duplicate (ggml/1130)

4bec2e4

cmdr2 committed on Mar 3, 2025

cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129)

f959b90

cmdr2 committed on Feb 28, 2025

cuda/cpu: Increase support for fp16 unary operations (ggml/1125)

67e8c32

cmdr2 committed on Feb 28, 2025

Told cmake to install ggml-cpp.h as a public header file. (ggml/1126)

3d4f29c

petterreinholdtsen Petter Reinholdtsen committed on Feb 26, 2025

common : more general m_audio_len update logic (#2855)

4674264
unverified

Ivy233 Ivy233 committed on Mar 7, 2025

go : improve model download (#2756)

168712d
unverified

Ryan Johnson committed on Mar 7, 2025

common : fix audio loading by miniaudio (#2862)

494fb84
unverified

Dmitry Atamanov committed on Mar 4, 2025

fix: missing include common-whisper (#2858)

2271d56
unverified

Lin Xiaodong committed on Mar 2, 2025

ruby : follow audio library change (#2851)

b94e7d3
unverified

KitaitiMakoto committed on Feb 28, 2025

whisper : support GGML_BACKEND_DL (#2843)

2e6437e
unverified

Diego Devesa

ggerganov committed on Feb 27, 2025

common : separate whisper sources (#2846)

0447b9d
unverified

ggerganov committed on Feb 27, 2025

common : fix build min/max (#2845)

07533a2
unverified

ggerganov committed on Feb 27, 2025

examples : use miniaudio for direct decoding flac, mp3, ogg and wav (#2759)

7a280a4
unverified

Dmitry Atamanov committed on Feb 27, 2025

stream : stop on ^C when no audio is received (#2822)

45399ad
unverified

petterreinholdtsen Petter Reinholdtsen committed on Feb 27, 2025

sync : ggml

7926873

ggerganov committed on Feb 26, 2025

Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (ggml/1121)

2b94a24

cmdr2 committed on Feb 25, 2025

metal : copy kernels for quant to F32/F16 conversions (llama/12017)

6c8e7ec

Garf

ggerganov committed on Feb 25, 2025

opencl: fix for small models (llama/11950)

4532dc6

lhez Shawn Gu Skyler Szot committed on Feb 24, 2025

Optimize mul_mat for Q4_0 on Intel GPU (llama/12035)

14fd317

Neo Zhang Jianyu arthw committed on Feb 24, 2025

SYCL: Fix GGML_SYCL_DEBUG macro (llama/11995)

310a36c

qnixsynapse committed on Feb 24, 2025

ggml-cpu: Support s390x SIMD Instruction Set (llama/12019)

4aa54ec

Aaron Teo Jinyang He junchao-zhao committed on Feb 22, 2025

CUDA: app option to compile without FlashAttention (llama/12025)

fbc5f16

JohannesGaessler committed on Feb 22, 2025

CUDA: optimize FA for GQA + large batches (llama/12014)

6662d54

JohannesGaessler committed on Feb 22, 2025

cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (llama/12000)

6cb8158

Garf committed on Feb 22, 2025

CUDA: correct the lowest Maxwell supported by CUDA 12 (llama/11984)

6641178

PureJourney

JohannesGaessler committed on Feb 21, 2025

MUSA: support ARM64 and enable dp4a .etc (llama/11843)

ab96dac

Bodhi Bodhi Hu committed on Feb 21, 2025

ggml-cpu: Add CPU backend support for KleidiAI library (llama/11390)

9de6d81

Charles Xu committed on Feb 20, 2025

ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (llama/11917)

1a1acd2

Prashant Vithule vithulep

ggerganov committed on Feb 20, 2025

CUDA: use async data loading for FlashAttention (llama/11894)

5b9980d

JohannesGaessler Diego Devesa committed on Feb 17, 2025

vulkan: implement several ops relevant for ggml_opt (llama/11769)

3c2171d

Rémy O committed on Feb 17, 2025

vulkan: support multi/vision rope, and noncontiguous rope (llama/11902)

1c7a669

jeffbolznv committed on Feb 16, 2025

metal : fix the crash caused by the lack of residency set support on Intel Macs. (llama/11904)

afbd891

Hale Chan committed on Feb 16, 2025

metal : optimize dequant q6_K kernel (llama/11892)

376cbe6

Adrian Kretz committed on Feb 15, 2025

repo : update links to new url (llama/11886)

9705bb5

ggerganov committed on Feb 15, 2025

vulkan: initial support for IQ1_S and IQ1_M quantizations (llama/11528)

0d2e888

Rémy O committed on Feb 15, 2025

