Hello everyone, welcome to the AI Anytime channel. In today's video we are going to create a Streamlit application to summarize documents, and this application will be powered by a language model called LaMini-Flan-T5 with 248 million parameters. This is a completely open-source model; we are not going to rely on OpenAI here, and we do not need any API keys to build this application. We are living in the era of large language models, where models like LLaMA, Alpaca, Dolly, etc. have been released by different organizations and research groups. These are all open-source models, though their licenses differ when it comes to commercial use. In this video we are going to rely on LaMini-Flan-T5, which is not that large at around 248 million parameters. As the name suggests, it is fine-tuned from Flan-T5, a language model released by Google a few years ago; Flan-T5 is an extremely underrated model in the community. The LaMini model supports a couple of pipelines that I am familiar with: a summarization pipeline and a text-to-text generation pipeline. We are going to use the summarization pipeline directly from Hugging Face, where this model is hosted; there are also inference APIs you can use if you want. So we are going to see how we can leverage this language model to build this application in Streamlit, and in an upcoming video we'll also create an API for the same thing using FastAPI, and maybe a web app with FastAPI and React, for example.
In today's video we'll focus on LaMini-Flan-T5 and build the application using Streamlit. Currently I am on the model's Hugging Face repository, which is by MBZUAI, the Mohamed bin Zayed University of Artificial Intelligence in the United Arab Emirates. This university has great potential: when I was looking at different universities to pursue my Master's in AI, I shortlisted it as well, because of the research facilities they have and the faculty who teach there. If you scroll down you'll see all their programs; they have separate Master's programs in the main subfields of AI, such as machine learning, natural language processing, and computer vision, and a PhD program as well, so if you want to pursue a Master's you can consider this university too. They are still taking admissions, at least for UAE nationals; if you are outside the Emirates you may need to clear an entrance exam such as the GMAT or GRE. This is the organization that developed and fine-tuned this model. If you look at their organization page on Hugging Face, they have around 22 models, and most of them are part of the LaMini series.
They have multiple LaMini models, and we are using LaMini-Flan-T5-248M. The model card describes the series as "a diverse herd of distilled models from large-scale instructions"; their dataset contains 2.58 million samples for instruction fine-tuning, and you can go ahead and read the paper as well. Flan-T5 is the base model that they fine-tuned to produce the LaMini-LM series, and we are using the 248M variant, which is not that large but gets the job done, guys. It's one of the more capable models with fewer than 500 million parameters, it still works, and it's easy to set up on a local CPU machine, which is what we are going to try in this video. The model card includes some example code, but we will not rely on it. As I said, the model has a text-to-text generation pipeline and a summarization pipeline, and we are going to use the summarization pipeline in this video. If you want to build a local GPT or private GPT, people are talking about that in the industry right now; you'll see multiple videos where people use privateGPT, localGPT, MiniGPT, etc., which are mostly based on Vicuna, Alpaca, or LLaMA models. Those are difficult to set up on a CPU machine and hallucinate a lot, though hallucination is a problem with language models in general. So let's see how we are going to build this application. First, what I have done in my case: you can either load the model from Hugging Face directly, or download the complete "Files and versions" folder. Create a folder called LaMini-Flan-T5-248M, download all the files, and keep them locally on your CPU machine.
Here you can see I have that folder, LaMini-Flan-T5-248M; this is the model checkpoint, which is what we call it when working with Hugging Face models, and all the files are inside it. I have downloaded it, so everything remains private, nothing goes outside your environment, and once you've downloaded everything the first time you don't have to rely on the internet either. To clone the repository you need Git LFS, since it contains large files. On the model page you can also find details on deploying the model if you want to do that. Now, in my workspace I have a folder called lamini-llm-summarization, and inside a data folder I have a couple of files I want to summarize. Let me show you those documents: I created these PDFs from articles on the McKinsey website, just to have something to summarize. We'll use LangChain to perform the heavy lifting for us; file loading, preprocessing, etc. will be done by LangChain, and then LaMini-Flan-T5-248M will summarize the text via the summarization pipeline. Let me run `code .` to open the folder in VS Code. Now you can see I've opened it, and the first thing is the requirements: you need langchain, sentence-transformers, torch, sentencepiece, and accelerate as dependencies. I have also kept chromadb in the requirements.
That's because, as I said, one more video will be created very soon using the text generation pipeline; for that we need vector embeddings, and that's why we'll use ChromaDB there. Then we have pypdf, tiktoken, streamlit, etc. You do not need fastapi, uvicorn, python-multipart, or aiofiles for this particular video, but I am going to use the same environment when I create the FastAPI-based API and application, as I said at the beginning, so I created a single environment and installed everything into it. So that's what you need. I have the LaMini-Flan-T5-248M folder and a data folder where I've saved the documents. Now let me create a file called app.py, not inside data, but at the top level. In app.py we'll start writing the code, guys. What do we need? We need LangChain, and we need Torch and Transformers as the backend engine for LaMini-Flan-T5-248M. The first thing I'm going to import is Streamlit: `import streamlit as st`. If you are not familiar with Streamlit, it's a web framework that helps you create data science apps faster; it's very simple and you don't need any web-technology expertise. Next, from langchain.text_splitter we are going to import RecursiveCharacterTextSplitter, because we need to split the text in the document. This is pretty much straightforward, and LangChain's documentation is one of the best I have ever seen in my six or seven years of development; they are growing so fast that my guess is it has been one of the most used libraries this year. With the splitter you configure things like chunk size and chunk overlap; I will keep a smaller chunk size just to make it run faster. Then we need a document loader, because we are going to use a PDF file in this case; you can extend this further to Word documents, txt files, JSON, etc. So from langchain.document_loaders we import PyPDFLoader, and we can also import DirectoryLoader. The next thing we need is the summarize chain from langchain.chains; LangChain has multiple chains (question answering, retrieval, conversational) and you can use any of them.
So: `from langchain.chains.summarize import load_summarize_chain`. If you want to know more about chains, you do not have to go anywhere except the LangChain documentation; as I said, they have covered everything. Let me show you what I'm talking about: search for the LangChain documentation, open it, and in the left-hand sidebar you have everything you need, so you don't have to go and watch videos. Navigate to the chains section, then LLMs, then summarization, and you'll see this load_summarize_chain that I'm using; they document the chain types too, like map_reduce and stuff. Just go through the documentation, it will help you learn faster. With langchain.chains.summarize done, we move to the Transformers side. From transformers we need the tokenizer and the model class: we'll use T5Tokenizer for the tokenizer, because we are using Flan-T5, and T5ForConditionalGeneration to load the model. So: `from transformers import T5Tokenizer, T5ForConditionalGeneration`. My editor isn't autocompleting it for some reason, but that's okay; this looks good. Next, `from transformers import pipeline`, because we are going to use the summarization pipeline with the LaMini model. And then we need torch, because we are going to define the floating-point type when loading the model if required, so `import torch`.
Next we need `import base64`. The reason I'm using base64 is that on the Streamlit page we are going to have a PDF viewer where we display the PDF, and for that we need base64 for the file encoding. With the imports done, the first thing we need, guys, is to load the model and tokenizer, so we have to define the checkpoint. We have downloaded all the model files locally because we don't want to rely on the internet; strictly speaking, after the first run it doesn't require the internet anyway, since Hugging Face stores the files in a local cache, but it's better to download them yourself since the model isn't that big. So I define the checkpoint as the name of the folder where you kept the model; inside it you can see pytorch_model.bin, which is the important file here, the model weights. With the checkpoint defined, let's have a variable called tokenizer, using T5Tokenizer.from_pretrained; this is pretty much straightforward if you have worked with Hugging Face models. We are saying: go to the T5Tokenizer class and load from the pretrained checkpoint. Then we have the base model: T5ForConditionalGeneration.from_pretrained, where the first argument is again the checkpoint, i.e. where you have kept your model. We have it locally; if you are loading from Hugging Face instead, you give the repository name, like MBZUAI/LaMini-Flan-T5-248M. Then we pass a couple of parameters: device_map="auto" and torch_dtype=torch.float32. If you don't know what device_map is, it basically helps you run inference with bigger models; it can place the model on CUDA, on CPU, or decide automatically. With device_map="auto" it distributes the model for you: if you have CUDA enabled it can use CUDA, and if you only want to rely on CPU you can specify CPU instead. You can read more about device_map in the Hugging Face documentation. So what we are doing here, guys: the tokenizer comes from T5Tokenizer.from_pretrained on LaMini-Flan-T5-248M, and the base model from T5ForConditionalGeneration.from_pretrained with device_map="auto" and torch_dtype=torch.float32.
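As a sketch, the loading step might look like the following. The folder name LaMini-Flan-T5-248M is whatever you named your local checkpoint folder, and I've wrapped the two calls in a function only to keep the snippet self-contained; in the video the variables are defined at module level:

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

def load_checkpoint(checkpoint: str = "LaMini-Flan-T5-248M"):
    """Load tokenizer and model from a local folder, or from a Hugging Face
    repo id such as "MBZUAI/LaMini-Flan-T5-248M"."""
    tokenizer = T5Tokenizer.from_pretrained(checkpoint)
    base_model = T5ForConditionalGeneration.from_pretrained(
        checkpoint,
        device_map="auto",          # let accelerate pick the device (CPU here)
        torch_dtype=torch.float32,  # full precision; fine for a 248M model
    )
    return tokenizer, base_model
```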
There are different data types you can load a model with; this setting controls the tensors' dtype. You may have heard about 4-bit or 8-bit quantized models; there are different ways of loading a model that help it run on CPU as well. Here we are relying on float32 for the tensors. So our model and tokenizer are done; we have loaded them successfully. Now we need something that will load the file and perform the preprocessing, i.e. split the text, and for that we are going to rely on LangChain. I'm going to reuse a function I have used in many of my previous videos; you can see I already have it in my gists as pdf_loader.py, so I'm just going to copy it. I can share the link in the description, and you can also go back and look at my earlier videos that use the same function. What it does: file_preprocessing takes a file as input (that will be a PDF file), uses LangChain's PyPDFLoader document loader, and calls its load_and_split function, which loads the file and splits it into pages. Then we use a chunk size of 200, a small chunk size so it runs a little faster; you can play around with the chunk size and chunk overlap numbers. We split the documents, because there might be a number of pages, and then we build final_texts, which starts as an empty string, because we only need the text: for each chunk we do final_texts = final_texts + text.page_content, since we only need the page content, and the spacing in between is taken care of. Finally we return final_texts. That function is completely based on LangChain; it does the heavy-lifting kind of stuff for you. If you don't want to use LangChain you can use PyPDF2 or pdfplumber, etc.; there are other libraries for working with PDF files. I basically go with LangChain because it's easy to use. Now we are done with this, and next we have to write the LLM pipeline, guys, the language model pipeline. So I'm going to define a function called llm_pipeline which takes a file path.
Let's create the pipeline, guys. As I said, the model has two pipelines: one is summarization and the other is text generation. In a variable called pipe_sum we are going to define the pipeline, passing all the required arguments. The first argument is the task, "summarization"; the model also has a text-generation pipeline, and you can see that on the Hugging Face repository as well. If you want to use this model for text generation, please try that too: you can use a vector database for embeddings, store your embeddings, and then pass the embeddings and the LLM into one of LangChain's chains, whether question answering, retrieval, conversational, or whichever chain you are using. Here we are using summarization. Our model is nothing but the base model, so model=base_model; then the tokenizer, so tokenizer=tokenizer; then a maximum length, let's keep it at 500, so the maximum length we return is 500, and a minimum length of 50. You can change these numbers if you want.
So we are done with this; we have defined our pipeline. We have a variable called pipe_sum which uses the pipeline class from Transformers with the summarization task and the LaMini model, and then we define the model, tokenizer, maximum length, and minimum length; there are multiple other parameters you can pass if you want. The next thing: let's have a variable called input_text, and in it we use the file_preprocessing function we wrote above, passing in the file path; it loads the file, does the preprocessing, and returns the final text, which is now stored in input_text. Then we have result, where we call pipe_sum on the input_text. We only need the summary text, so we have to extract the summary_text value: I write result = result[0]["summary_text"], because that is how the LaMini model returns it, guys; it comes back with some metadata as well. Then I just return the result, and that's it. So what have we done in this function? We have an llm_pipeline function where we pass the file; pipe_sum is a variable where we define the summarization pipeline, passing the model and tokenizer and setting the maximum-length and minimum-length parameters; then we use the file_preprocessing function written above, get the result, and extract only the summary_text. We do not need the other details in the output, only the summary text.
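Here's how I'd sketch llm_pipeline, assuming base_model, tokenizer, and file_preprocessing are already defined earlier in app.py as in the sections above:

```python
from transformers import pipeline

def llm_pipeline(filepath):
    """Summarize a PDF file with LaMini's summarization pipeline.

    Assumes base_model, tokenizer, and file_preprocessing are defined
    earlier in app.py."""
    pipe_sum = pipeline(
        "summarization",
        model=base_model,
        tokenizer=tokenizer,
        max_length=500,  # longest summary the pipeline may return
        min_length=50,   # shortest summary the pipeline may return
    )
    input_text = file_preprocessing(filepath)
    result = pipe_sum(input_text)
    # the pipeline returns a list like [{"summary_text": "..."}];
    # keep only the text and drop the metadata
    result = result[0]["summary_text"]
    return result
```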
Now let's write the Streamlit code. Actually, we need one more function first, because we are going to display the uploaded PDF file: we'll have a PDF file uploader, and we also want to show that PDF. I'm reusing another of my previous functions, displayPDF, which helps you display PDF files in a Streamlit application; I'm not writing it from scratch, and it's also available on the Streamlit community discussion forum. What it does: we use a cache decorator. There are two different cache decorators available in Streamlit, st.cache_data and st.cache_resource. st.cache_resource is for things like models or heavy objects; once cached, you don't have to load them every time. st.cache_data is for data such as CSV files, JSON files, etc., and that's what we are using here. So displayPDF takes the file, reads it, base64-encodes it, and embeds the PDF in HTML, because we are going to show it on the Streamlit UI: we build an iframe whose source is the base64 PDF data, define a width and height, and render it with st.markdown, since Streamlit supports markdown. Now let's write the Streamlit code, guys. The first thing I always do when I create a Streamlit application is define the layout; I want a wider layout, so the first thing you should do is set the page configuration: st.set_page_config(layout="wide"). This looks good. Then I'm going to write a main function, and for now just put a pass inside it.
Then let me come down to the bottom and add the `if __name__ == "__main__": main()` guard. Now inside main we'll write our Streamlit code, which is pretty much straightforward. First, let's have a title with st.title; we'll name the app something like "Document Summarization App using Language Model". That's the title; we also have a page_title parameter in set_page_config, where we can write something like "Summarization App". With the title done, the first thing we need is an uploaded file, using the file uploader widget: uploaded_file = st.file_uploader, with a label like "Upload your PDF file", and a type restriction so that only PDF is allowed. That's done. Next, we check whether there's a value inside it: if uploaded_file is not None, let's have a button, st.button("Summarize"); when you click the Summarize button, the next block of code gets executed. Inside that, we'll have two columns, guys: col1, col2 = st.columns(2) divides the layout into two parts. Everything that goes inside column one sits under `with col1:`; let's put an st.info there saying "Uploaded PDF file" or something like that. Then for column two, `with col2:`, let's also use st.info rather than st.success.
We'll label it "Summarization is below", or maybe "Summarizing PDF file" or "Summarization is complete"; for now let's keep "Summarization is below". Now let's run this, guys. We have written the code for the template: two columns, with the uploaded PDF shown in column one and the summarization shown in column two. To run it, I'll say `streamlit run app.py`. First I have to activate the virtual environment: `cd venv`, `cd Scripts`, `./activate` on PowerShell, then `cd ../..`, clear the terminal, and use the arrow key to bring back `streamlit run app.py`. It says "cannot import name T5Tokenizer from transformers", so that's the error we are getting. Ah, I think I have done something wrong in the import: it should be capitalized, T5Tokenizer and T5ForConditionalGeneration. That looks right now, so let me save it with Ctrl+S, come back, and rerun; it takes a little time if you're running it for the first time. Now you can see "Document Summarization App using Language Model"; this is the UI we have got, guys. Here you can upload a file up to 200 MB, your PDF file; I'm going to upload a PDF from the desktop, document 1 for now. Once you choose document1.pdf, it will show you the Summarize button. Once you click Summarize, it will use the LangChain function we have written for preprocessing, and once the preprocessing steps are done and you have the final text, it gets passed to the LaMini model, which utilizes the summarization pipeline to return a summary of the PDF file. Let's click on Summarize.
Right now we are not seeing anything, because we haven't completed the code yet: you just see the two columns, "Uploaded PDF file" and "Summarization is below". In the first column we will display the PDF, and in the second we'll call the summarization function and show the summary it returns. Let's come down and write that. Under the st.info in column one, what I'm going to do is this. Actually, I'm just thinking about where we should define things: we need a file path, and let's define it above the columns because we need it for both of them, column one to display the PDF and column two to summarize it. So filepath will point into the data folder you saw earlier: "data/" plus uploaded_file.name, the name of the uploaded file. That's what the file path now contains, and we can use it in both columns. Now, since we are storing the uploaded file there in the data folder, we also have to use open to write it to disk, as a temp file: `with open(filepath, "wb") as temp_file`.
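A sketch of that saving step; the write call and the makedirs guard are my assumptions about how the with-block continues:

```python
import os

def save_uploaded_file(uploaded_file) -> str:
    """Write a Streamlit UploadedFile into the data/ folder, return its path."""
    filepath = "data/" + uploaded_file.name
    os.makedirs("data", exist_ok=True)  # assumption: ensure data/ exists
    with open(filepath, "wb") as temp_file:
        temp_file.write(uploaded_file.read())
    return filepath
```

In app.py the same logic appears inline under the Summarize button, with filepath then handed to displayPDF in column one and llm_pipeline in column two.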