Hi Lan, I have heard a lot about AI, but I haven't seen very many applications using AI to solve real business problems. Do you know of any? Yes, I do. I wrote one. Is it yet another chatbot? No, it is multimodal AI. Let me show you. [Music] It's so good to have you back on Serverless Expeditions again, Lan. Happy to be here, Martin. What's your current job at Google? I'm a Customer Solutions Engineer based in Google's Singapore office. My job is to drive technical innovation and deploy customized AI solutions in advertising sales. And you have

Ryan with you there. Ryan, what do you do for Google? I'm Lan's teammate. I'm also a Customer Solutions Engineer. Lan and I built the application together that we're showing today. Very nice. What business problem does your application solve? In marketing, you often create a long video ad first. That video is used in some places, like on YouTube or other video platforms. Then you reuse parts of that video to make shorter video ads for other form factors and platforms, like YouTube Shorts and other mobile devices. So AI can do more than just write text

to respond to chat messages? Yeah, that's right. Gen AI doesn't only generate new text, images, or videos; it can also enhance what you already have. This can be very useful in business. We often see Google Cloud customers combining generative AI with other AI technologies. I see. Now, how does your application do that? Well, it uses AI to transcribe your video and identify shots in the video. That data is then fed into generative AI. We can then generate new videos. We call that multimodal generative AI. Nice. Could you show us a demo? Of course. Here's the

last video that you and I shot together. Let's make a shorter version of it. Here are the topics it found. Here I can review the shortened video. Look, it's only 3 minutes instead of the original 8 minutes. It looks good, so I will click Generate Video. It will take a few minutes, and then we get the shortened video in landscape and portrait mode. Wow, that's a lot easier than watching the entire video, taking notes, and then editing it manually in some video editing software. What does the code look like? The application is built in

Cloud Functions that are called from the client-side JavaScript in the user's web browser. These are the most important functions: transcribe video, summarize video, and cut video. The first function is called transcribe video. Its job is to generate a text transcript based on the audio track of the video. It extracts the audio track from the uploaded video here. Then it loads that audio file into a RecognitionAudio object and creates a SpeechClient. Both those classes are provided by Google. Then it calls the long_running_recognize method, which will create text from

the audio file. This operation may take a while to complete. It uploads the text transcript to Cloud Storage, and finally it calls refined by video shots, which uses Google Video Intelligence AI to sync the shots with the audio to make the transitions seamless. This is then returned to the caller. And then I'm guessing the summarize video function is called? That's right. The summarize video function asks Gemini to summarize the text transcript. First, it prepares the data for Gemini. Then it calls the send transcript to LLM function. Let's go to that file where that function is

defined. The send transcript to LLM function creates a text generation model. That class has been provided by Google. Then it calls the predict method on that model and returns the response to the caller. And what is the prompt that you use for the large language model? That's defined right here in the root prompt variable. Our English prompt is: "You are a senior copywriter for an advertising agency who excels at summarizing transcripts for video ads. Shorten the transcript by keeping important lines and removing other lines." And here's the same prompt in Chinese, for

when we are working with videos in that language. And the third function you mentioned is cut video? Yes, that function takes the original video and the summary that Gen AI just created. It creates a shorter video based on those inputs, using the MoviePy library. It doesn't call Gemini or do any AI at all. All right, well, thanks for showing us how this works. As you were building this application, what were your takeaways? Well, generative AI is more than just chatbots. It can also enhance content that you already have, and this

can be very useful in business. We often see Google Cloud customers combining generative AI with other AI techniques and using multimodal AI. Please remind me, what does multimodal AI mean? It means AI can handle video, audio, text, and other data. Gemini was built and trained to be multimodal from the start. And it looked pretty easy to call it using the Google libraries. So if someone watching this wants to experiment with multimodal AI, how can they get started? You can go to our GitHub repo named adlip. You can look at the code and deploy it

in your own Google Cloud project. Sounds good. Lan, Ryan, I will add the link to the repo in the video description below. Thanks for sharing all this with us. Thanks for having me again, Martin. It was great to be here. And thank you, everyone, for watching. If you have questions for Lan, Ryan, or me, please enter them in the comments below. Also, let me know if there are other serverless topics you'd like to see in future episodes. I read every single comment. Until next time. [Music]
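Editor's sketch: the episode describes a summarize step that prepends a root prompt to the transcript, and a cut step that keeps only the lines the model retained. The Python below is a minimal illustration of that idea and is not the code shown in the video. The names Line, build_prompt, segments_to_keep, and max_gap are hypothetical, the prompt text is taken from the spoken version and may differ from the repo, and the actual model call and the MoviePy cut are left out.

```python
from dataclasses import dataclass

# Prompt wording as spoken in the episode; the repo's exact text may differ.
ROOT_PROMPT = (
    "You are a senior copywriter for an advertising agency who excels at "
    "summarizing transcripts for video ads. Shorten the transcript by keeping "
    "important lines and removing other lines."
)


@dataclass
class Line:
    """One transcript line with its timing (hypothetical structure)."""
    start: float  # seconds from the start of the video
    end: float
    text: str


def build_prompt(lines):
    """Join the root prompt and the transcript into one prompt string."""
    transcript = "\n".join(line.text for line in lines)
    return f"{ROOT_PROMPT}\n\nTranscript:\n{transcript}"


def segments_to_keep(lines, summary, max_gap=0.5):
    """Map the lines kept in the summary back to (start, end) time ranges.

    Assumes the model returns kept lines verbatim, one per line.
    Kept lines closer together than max_gap seconds are merged into one range.
    """
    kept = {s.strip() for s in summary.splitlines() if s.strip()}
    segments = []
    for line in lines:
        if line.text.strip() not in kept:
            continue
        if segments and line.start - segments[-1][1] <= max_gap:
            # Extend the previous range instead of starting a new cut.
            segments[-1] = (segments[-1][0], line.end)
        else:
            segments.append((line.start, line.end))
    return segments


lines = [
    Line(0.0, 4.0, "Welcome to the show."),
    Line(4.0, 9.0, "Today we talk about multimodal AI."),
    Line(9.0, 15.0, "First, a word about the weather."),
    Line(15.0, 21.0, "The app transcribes, summarizes, and cuts video."),
]
# Stand-in for the model's response; a real run would call the LLM here.
summary = (
    "Welcome to the show.\n"
    "Today we talk about multimodal AI.\n"
    "The app transcribes, summarizes, and cuts video."
)
print(segments_to_keep(lines, summary))  # [(0.0, 9.0), (15.0, 21.0)]
```

The resulting time ranges are what a video library would need in order to cut and join the clips. Matching on exact line text is the simplest possible choice here; a real pipeline would likely need something more tolerant if the model rewords lines.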

Multimodal AI