Update README.md
README.md CHANGED
@@ -52,21 +52,6 @@ The tensor operators are optimized heavily for Apple silicon CPUs. Depending on
 intrinsics or CBLAS Accelerate framework routines are used. The latter are especially effective for bigger sizes since
 the Accelerate framework utilizes the special-purpose AMX coprocessor available in modern Apple products.
 
-## Limitations
-
-- Inference only
-- No GPU support
-- Very basic greedy sampling scheme - always picks the token with the highest probability.
-  This should be similar to the [GreedyDecoder](https://github.com/openai/whisper/blob/main/whisper/decoding.py#L249-L274)
-  from the original Python implementation, so in order to make a fair comparison between the two implementations, make sure
-  to run the Python code with the following parameters:
-
-  ```
-  whisper --best_of None --beam_size None ...
-  ```
-
-  In the future, `whisper.cpp` will support more sampling strategies.
-
 ## Quick start
 
 First, download one of the Whisper models converted in [ggml format](models). For example:
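The context lines above describe two CPU paths: hand-written SIMD intrinsics for small tensor sizes, and Accelerate's CBLAS routines, which can engage the AMX coprocessor, for big ones. As a rough sketch of what such size-based dispatch can look like (illustrative only; the threshold and function name are invented here, and this is not the actual ggml code):

```cpp
// Hypothetical sketch of size-based dispatch, NOT the real ggml code.
// Build on macOS and link with: -framework Accelerate
#include <Accelerate/Accelerate.h>

// C = A * B with row-major float matrices: A is m x k, B is k x n, C is m x n.
void mat_mul(const float * A, const float * B, float * C, int m, int n, int k) {
    if ((long long) m * n * k < 64 * 64 * 64) {
        // Small products: a plain loop stands in here for hand-written
        // SIMD intrinsics.
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                float sum = 0.0f;
                for (int l = 0; l < k; l++) {
                    sum += A[i * k + l] * B[l * n + j];
                }
                C[i * n + j] = sum;
            }
        }
    } else {
        // Big products: hand off to Accelerate's CBLAS sgemm, which can
        // route the work through the AMX coprocessor on Apple hardware.
        cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, n, k, 1.0f, A, k, B, n, 0.0f, C, n);
    }
}
```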
@@ -220,6 +205,21 @@ make large
 | medium | 1.5 GB | ~2.6 GB | `fd9727b6e1217c2f614f9b698455c4ffd82463b4` |
 | large  | 2.9 GB | ~4.7 GB | `0f4c8e34f21cf1a914c59d8b3ce882345ad349d6` |
 
+## Limitations
+
+- Inference only
+- No GPU support
+- Very basic greedy sampling scheme - always picks the token with the highest probability.
+  This should be similar to the [GreedyDecoder](https://github.com/openai/whisper/blob/main/whisper/decoding.py#L249-L274)
+  from the original Python implementation, so in order to make a fair comparison between the two implementations, make sure
+  to run the Python code with the following parameters:
+
+  ```
+  whisper --best_of None --beam_size None ...
+  ```
+
+  In the future, `whisper.cpp` will support more sampling strategies.
+
 ## Another example
 
 Here is another example of transcribing a [3:24 min speech](https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg)
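For reference, the greedy sampling rule the moved section describes reduces to an argmax over the decoder's output distribution at each step. A minimal illustrative sketch (not the actual whisper.cpp sampler):

```cpp
// Illustrative only - not the actual whisper.cpp sampler. Greedy decoding
// picks the single most probable token at every step, i.e. an argmax.
#include <algorithm>
#include <iterator>
#include <vector>

// probs: the decoder's probability (or logit) vector over the vocabulary,
// assumed non-empty. Returns the id of the most probable token.
int sample_greedy(const std::vector<float> & probs) {
    return (int) std::distance(probs.begin(),
                               std::max_element(probs.begin(), probs.end()));
}
```

With `--best_of None --beam_size None`, the reference Python decoder should behave the same way, which is what makes the comparison between the two implementations fair.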