Do you want to convert speech to text in your own project but don't know where to get started? Then look no further, because in this video we have a look at the best free speech-to-text APIs and also at the top open source libraries for speech recognition.

Converting speech to text is an exciting but also a challenging task. Luckily, there are existing solutions out there that we can use. Basically, we have two options: we can either use an API or we can use an existing open source library. So in this video we have a look at the best free solutions. Of course, normally you have to pay for an API, but all the listed services in this video also come with a free tier that might be enough for a simple project or to get started with your MVP.

So before we have a look at each service and library, let's go over the advantages and disadvantages of both approaches. With an API, it's much easier to get started. You don't even need any deep learning knowledge about how the underlying model actually works. APIs usually offer a well-trained, state-of-the-art language model, so the accuracy is much better, and they can offer additional out-of-the-box features like entity detection or sentiment analysis. But on the downside, you have to pay for the service, and you always need an internet connection to access it.

On the other hand, open source libraries are completely free, and with open source you can see what's going on under the hood, and you can even contribute and help to improve it. Also, by working with open source libraries you learn a lot. But on the downside, they can be difficult to set up, and oftentimes you need a lot of prerequisites. For example, a lot of libraries require a Linux build system, and you need a good GPU, and you need programming skills and oftentimes also deep-learning-specific knowledge for a speech-to-text library.

So now that we know about the different pros and cons of each approach, let's go over the different options we have. First, let's have a look at the different speech-to-text APIs
that also come with a free tier.

Google's Speech-to-Text API is probably the most popular API for speech recognition. They offer 60 minutes of free transcription per month, and as a new user you also get $300 in free credits for Google Cloud. After that, it costs $0.006 per 15 seconds or $0.009 per 15 seconds, depending on the different options. Their API has good accuracy and support for over 60 different languages. On the downside, you need to sign up for a Google Cloud account and create a project in there, and it's surprisingly complicated to get started with it.

Next, we have a look at AssemblyAI. AssemblyAI offers a state-of-the-art speech-to-text API which is built for developers. Their API documentation is great and they also provide a lot of tutorials, so you can get started and integrate speech recognition into your app in under five minutes. With the free tier you can transcribe three hours of audio content each month, and after that pricing is very straightforward: transcribing simply costs $0.00025 per second. This results in $0.00375 per 15 seconds, as compared to the $0.006 per 15 seconds we have with Google. Additional optional audio intelligence features cost 0.000 dollars per second on top, which makes the total amount still pretty cheap. And these features are awesome: you can get sentiment analysis, content summarization, topic detection, entity detection, and much more, and all of this can be obtained with a few simple API calls. Now, on the downside, as of today AssemblyAI only supports English transcription, but more language models will be available soon. And also their SDKs are still a little bit limited, but their API is so easy to work with that it allows for a quick setup with native HTTP libraries in any programming language. So out of all options in this video, I think this is the easiest one to set up.

And the last API option I want to show you is the AWS Transcribe service. The free tier offers one hour free per month for the first 12 months of use. Pricing can vary depending on different options, but in the first category it is, for example, $0.024 per minute, which is 0.