This week OpenAI released a new version of Whisper, their audio-to-text model. It's called turbo, and it supposedly transcribes faster than the large model with only a minimal reduction in accuracy. In this video we're going to try it out on my 2021 M1 Max machine, with an MP3 file containing an episode of the AI Daily Brief podcast. I'm also trying out the uv Python package manager, so we're going to tell it that we want to run with openai-whisper and torch, and then we're going to call the whisper command-line tool, passing in the MP3 file and telling it which model we want to run. You can see from the warning messages we get here that it's saying fp16 is not supported on CPUs, which makes me think it's running on the CPU, and you can see in real time that this is transcribing really, really slowly. That's not what we want at all, so let's kill it and run it again, and this time we're going to pass in the device as mps, which
should hopefully make it run on the GPU. Let's run that, and you can see it fails quite quickly. If we scroll up, it takes a little while, but there's an error at the top. I've been searching for this, and apparently it's a known error on GitHub, so using the Whisper CLI from the repository is a no-go if you're working on a Mac. Luckily for us, there's another tool we can use called insanely-fast-whisper, which uses Hugging Face's Transformers library, and the Hugging Face folks have got the new model working and uploaded the weights to openai/whisper-large-v3-turbo. So let's give it a try. We need to set an environment variable first, and then we're going to run it on our file, passing in the device ID mps. We'll run the large-v3 model first so we have a comparison: turbo is smaller and therefore faster than large, so large-v3 will be our baseline. We'll give it a batch size of four; I found anything bigger seems to give me out-of-memory exceptions.
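Reconstructed as commands, the two runs described here might look something like this. This is a sketch: the audio and output filenames are placeholders, and the environment variable set in the video isn't named in the transcript, so it's left as a comment.

```shell
# (An environment variable is exported first in the video; it isn't
# named in the transcript, so it's omitted here.)

# Baseline run: large-v3 on the Apple GPU via MPS.
insanely-fast-whisper \
  --file-name episode.mp3 \
  --device-id mps \
  --model-name openai/whisper-large-v3 \
  --batch-size 4 \
  --transcript-path large-v3.json

# Same again with the turbo weights, writing to a separate file.
insanely-fast-whisper \
  --file-name episode.mp3 \
  --device-id mps \
  --model-name openai/whisper-large-v3-turbo \
  --batch-size 4 \
  --transcript-path turbo.json
```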
Then we tell it where to write the transcript, and we kick that off. You can see it's running; we'll speed it up for the purposes of the video, and it finishes in just over 3 minutes. Let's now try the turbo model: we update the command to write the transcript to a different JSON file, update the model name to point at turbo, and kick it off again. We'll speed it up, and this time it finishes in a little under 2 minutes, so we can see there's been a speed-up. I've tried it on a bunch of different podcast files of different lengths, which you can see in this table, and I've tried both of the models. Apart from the run we just looked at, which is only about 1.7 times faster (probably because we were recording the video at the same time as running the tool), it's generally 2.3 to 2.4 times faster than the large-v3 model. Keep in mind when looking at
these numbers that I'm using a 2021 M1 Max with 64 GB of RAM, so if you're on a 2023 M2 Max, or maybe an M3 Max or an Ultra, it's possible you'll see even faster times than this. Now let's have a look at what it's come up with. We'll look at the JSON file, starting with its keys: it's got chunks, it's got speakers, and it's got text. speakers is generally empty, but let's have a look at the chunks. If we scroll down a bit, we can see this episode is about Mira Murati leaving OpenAI, and then further down, Sam Altman replying to that. The content is more or less the same in the JSON file for the large model, but let's see what happens if we ask an LLM to summarize each file. I'm going to cat the file, pull out the text, and then call Ollama, using the Llama 3.1 model, and say: above is a podcast transcript, what are your three main takeaways? It identifies that people are leaving because of exhaustion, that AI investors will pay whatever's needed, and that more transparency is needed at OpenAI. So that was the turbo model; let's now try the large model. That looks like a pretty similar summary to me. There is a hallucination in Mira's name that isn't actually in the files; I'm not sure what went wrong there. It also identifies that the podcast host has some skepticism about what OpenAI is doing. I'll put links to all the JSON files in the description below,
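A sketch of that inspect-and-summarize step. The transcript filename here is a placeholder, and a tiny stand-in file is used to show the JSON shape; the exact way the text was extracted isn't shown in the video, so `python3 -c` is just one way to do it.

```shell
# A tiny stand-in file with the same top-level shape as the real
# transcript (placeholder content; the real file is much larger).
cat > turbo.json <<'EOF'
{"chunks": [{"timestamp": [0.0, 4.2], "text": "Hello"}], "speakers": [], "text": "Hello"}
EOF

# The three top-level keys: chunks, speakers, text.
python3 -c "import json; print(list(json.load(open('turbo.json')).keys()))"

# Pull out the text field and hand it to Llama 3.1 via Ollama, with the
# question appended after the transcript (requires a local Ollama install):
#   text=$(python3 -c "import json; print(json.load(open('turbo.json'))['text'])")
#   ollama run llama3.1 "$text
#   Above is a podcast transcript. What are your three main takeaways?"
```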
and if you want to learn more about insanely-fast-whisper, check out this video next.