Hello everyone, and a very warm welcome to the channel. In this video I'm going to show you how you can dub videos with the help of AI. The project we are going to cover in this video is called SoniTranslate. SoniTranslate is a very interesting project that provides a powerful, user-friendly web app that lets you easily translate videos into different languages. The project hosts the code for the web UI, and it is built on Gradio to provide a seamless and interactive user experience. So we
are going to install it on free Google Colab. You can also use the same instructions to install it locally in a Jupyter notebook or in any Python script. So let's get started. Before I do that, let me introduce you to the sponsor of this video, AgentQL. AgentQL is a query language that turns any web page into a data source. With its Python SDK and live debugging tool, you can script and interact with web content. AgentQL works on any page; it is resilient, it is reusable, and it structures output according
to the shape of your query. I will also drop the link to their website in the video's description. Okay, let's go to Google Colab now. It's free Google Colab, which you can access at colab.research.google.com. Once you have signed in with your free Google account, click on Runtime, then Change runtime type, and select the free GPU provided by Google itself, which is very generous of them. Now let's install some of the prerequisites for this project. I will also give you the link to this notebook, so no need to worry about all of these
commands. The interesting bit here is that it uses two TTS projects, or models, which we have already covered on our channel. The first one is Coqui TTS. Coqui TTS is a library for advanced text-to-speech generation, and it comes with pretrained models supporting more than 1,100 languages. It also provides tools for training new models and fine-tuning, but that is a topic for another video, and I have already covered it on my channel. The other project it supports is called Parler-TTS, another project which we already
have covered on the channel. Parler-TTS is a lightweight text-to-speech model that can generate high-quality, natural-sounding speech in the style of a given speaker, which means it covers gender, pitch, speaking style, etc. So that's a pretty good selection of tooling behind the scenes. Now, if you look at this code, all we are doing is cloning the SoniTranslate repo (I will also drop the link to it in the video description), and then we are installing some of the prerequisites, like Git LFS, which allows us to download
large files from Git. Then this is where we're installing the remaining prerequisites: this one is for Piper TTS and this one is for Coqui TTS, as I mentioned earlier, and then it installs both of them. It is going to take a bit of time, so let's wait for it to finish. While the installation is in progress, let me show you the list of languages it supports. There are heaps of them, as you can see on your screen: almost all of the European languages, Arabic, and then the list goes
on and on: Urdu, Hindi, Vietnamese, and even the regional languages of India are supported. A lot of regional languages from Central Asia and Africa are also there, which is quite interesting. They also support transcription, and if you scroll down you will see some non-transcription languages, some of which I have never even heard of, but it's really good stuff. On their GitHub repo page they have also shared a lot of examples which you can check out. Okay, it is still running; as I said, it
is going to take a bit of time. If you are on Google Colab Pro, which gives you more powerful GPUs, the speed will of course improve, but I think this T4 GPU should be good enough, so we can wait. The installation is now complete. Next up, we also need to authenticate with Hugging Face, because we are downloading models from there. To do that, first go to the Hugging Face website, and from the top right, click on your profile.
On the left-hand side, click on Settings, and there click on Access Tokens. If you don't already have a token, click on Create new token, select the Read type, give it any name, and create the token. Then grab that token and go back to the notebook (the Hugging Face account is a free one, by the way) and run this cell. I already have a token, so when it asks me for it, I'm just going to paste it in. You can see it is asking me for the token here, so let me quickly do that.
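In the notebook the token is pasted into an interactive prompt. If you run the same setup as a Python script instead, one option is to read the token from an environment variable rather than hard-coding it, which also makes rotating the token painless. This is a minimal sketch: `HF_TOKEN` is the variable name `huggingface_hub` reads by default and `huggingface_hub.login(token=...)` is its login function, but the helper `get_hf_token` is hypothetical and not part of SoniTranslate.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return a Hugging Face access token read from an environment variable.

    Hypothetical helper: keeping the token out of the notebook means it can
    be rotated without editing any code.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {env_var} to a Hugging Face access token (a Read token is enough)."
        )
    return token


# Usage in a script (requires `pip install huggingface_hub`):
#   from huggingface_hub import login
#   login(token=get_hf_token())
```

Reading from the environment keeps the secret out of shared notebooks and screen recordings.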
I have put in my token here, and of course I'm going to rotate it after the video is done. You can see that it is now downloading the models from Hugging Face. It is going to download quite a lot, and then it is going to enable Piper TTS, so let's wait for it. This is the step that needed the token. Now we have this public URL; let's click on it, and it is going to start the Gradio demo for us. It is loading, and there you
go: we have our SoniTranslate app, where you can upload your video. You can choose your video source by clicking on this drop-down. Then there's the source language; I think you can also specify it, but it will do automatic detection (on the previous page we had selected English). You can also choose how many speakers there are and which voice you want, for example an en-US voice or any other, and there are a lot of options which you can
select from here. You can even pick the TTS (text-to-speech) option based on the audio, and there are a few advanced settings, which I would suggest you keep as they are, because most of them are already optimized. Okay, so let me upload a video file here; I'll just upload one of my own small video files. I have uploaded one of my video files, as you can see here (f2c), and now I'm going to click on Translate. It is processing the video, and then it should start transcribing and translating it.
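The video doesn't show SoniTranslate's internals, but the processing described here (transcribe, translate, then synthesize speech with a chosen voice per speaker) needs each transcribed segment to be tied to a speaker so that speaker can get their own voice. Below is a minimal sketch of one common way to do that, assuming you already have transcript segments and diarization turns as start/end times in seconds: give each segment to the speaker whose turns overlap it the most. The `Turn` and `Segment` types and the `assign_speakers` function are hypothetical illustrations, not SoniTranslate's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Turn:
    """One diarization turn: who spoke between start and end (seconds)."""
    start: float
    end: float
    speaker: str


@dataclass
class Segment:
    """One transcribed segment; speaker is filled in by assign_speakers."""
    start: float
    end: float
    text: str
    speaker: Optional[str] = None


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Segment]:
    """Label each segment with the speaker whose total overlap is largest."""
    for seg in segments:
        totals: Dict[str, float] = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        # Segments that no turn overlaps keep speaker=None.
        seg.speaker = max(totals, key=totals.get) if totals else None
    return segments
```

For example, with turns `SPEAKER_00` from 0 to 4 s and `SPEAKER_01` from 4 to 9 s, a segment spanning 3 to 8 s overlaps the second speaker for 4 s and the first for only 1 s, so it is assigned to `SPEAKER_01`.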
It can take a bit of time, because remember, it's a free GPU: if there is more load on Google's GPUs you might face some issues, and sometimes it fails, in which case you just need to rerun it. So if you're looking to do this on a regular basis, or if you have massive video files, I would highly suggest that instead of running it on free Google Colab, you either run it locally in a Jupyter notebook following the same instructions or you get Google Colab Pro.
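Before deciding between free Colab, Colab Pro, or a local Jupyter setup, it can help to check which GPU your runtime actually has. This is a minimal sketch that uses only the standard library and the `nvidia-smi` command-line tool, which ships with NVIDIA drivers and is available on Colab GPU runtimes. On a machine without it, the function simply returns an empty list. The function names `parse_gpu_csv` and `list_gpus` are hypothetical.

```python
import shutil
import subprocess
from typing import List, Tuple


def parse_gpu_csv(output: str) -> List[Tuple[str, str]]:
    """Parse `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` output."""
    gpus = []
    for line in output.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so the memory column is always separated out.
        name, memory = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, memory))
    return gpus


def list_gpus() -> List[Tuple[str, str]]:
    """Return (name, total memory) for each visible NVIDIA GPU, or [] if none."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver/CLI here, e.g. a CPU-only runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


# Usage: print whatever GPU the runtime provides (a T4 on the free Colab tier used here).
for gpu_name, gpu_memory in list_gpus():
    print(f"{gpu_name}: {gpu_memory}")
```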
It failed there, and it is asking me to accept the user agreement on the Hugging Face website, so let me show you how. There are two models: one is pyannote speaker diarization and the other one is segmentation. By the way, I have already covered both of them in this video, if you're interested. I thought I had already accepted the agreements, but maybe that was with another email. So let me put in my information: for company I'm just going to say "personal", and for website I will just
give it my own website address. So I have put in my website (fahdmirza.com), "personal", and then "research and testing". Let's agree. I have been granted access, and the same goes for this one, so let me put it in here. Now I have access to both of the gated models, so let's go back to the Gradio demo we were running from Google Colab, and I will just click on Translate here. Let's wait for it. You can see it has created a few files here.
We hadn't set the target language yet, so I have now set it to Urdu, because the audio is in English, and I'm going to click Translate again. Let's translate it again; it shouldn't take too long. It's quite quick, and I'm quite impressed by it. It is doing text-to-speech (TTS), and it is also doing the diarization. If you look here, we have one model for diarization. Diarization in audio transcription refers to the process of identifying and labeling the different speakers within an audio recording, which determines who is speaking
at what time. Then we have segmentation, which splits the audio into speaker-specific segments, assigning a unique identifier to each speaker, and that is also where it creates an SRT file. So you see here that it has created not only our MP4 file but also the SRT file. SRT is the SubRip subtitle format: a plain-text file that contains the subtitle information, namely sequential numbering, timestamps, and the subtitle text. Okay, so let me download this MP4, which is now in Urdu, and I will play both the English and the Urdu versions for
you one by one, and I will also show you the SRT file. Let me just download it by clicking here. Okay, let me play the original video file first for a few seconds. "And I am Fahd Mirza, and we are both AWS Community Builders. Together we were looking at building a project for December, and while looking at generative AI we both realized: how about Santa letters?" Okay, this is a very old video of a project which I did for AWS. Anyway, let me now play the translated one. Okay, so you see
that it has been translated, but I don't think it's Urdu. It seems like some closely related language, maybe Persian or Pashto, but definitely not Urdu. Also, it has selected a female voice; I think we can also select a male voice from the Gradio demo, so you can play around with it, since, as you know, there are a lot of configurations you can change. Another interesting thing I wanted to show you is that although the dubbed audio doesn't sound that good, it was able to
do the text translation quite well in that SRT file. You see, this is the subtitle file I was telling you about, and if you look at this translation, it's perfect Urdu, but while speaking I think it was unable to pick that up. If I take you back to the Gradio demo, in the Translate section, this is where we needed to select the male voice instead of the female one, and then we can play around with different configurations to get it right. But all
in all, not a bad tool, since you can run it on free Google Colab, which is always a good thing. So that's it, guys. I hope you enjoyed it; let me know what you think about it. If you like the content, please consider subscribing to the channel, and if you're already subscribed, please share it among your network, as it helps. Thank you for watching.
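The SRT file shown in the video follows the SubRip layout described earlier: a sequential number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timestamp line, the subtitle text, and a blank line between entries. Below is a minimal sketch of producing that layout from timed segments, assuming each segment is a `(start, end, text)` tuple in seconds. The helpers `format_timestamp` and `to_srt` are hypothetical, not SoniTranslate functions.

```python
from typing import Iterable, Tuple


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm (comma before milliseconds)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Build SRT text: index line, time range line, subtitle text, blank separator."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


print(to_srt([(0.0, 2.5, "Hello everyone"), (2.5, 61.04, "Welcome to the channel")]))
```

Working in integer milliseconds avoids floating-point drift when splitting the time into hours, minutes, seconds, and milliseconds.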