Today we will take you through a hands-on lab demo of how to detect fake news using machine learning. Before we start, I hope the screen is clearly visible and the audio is fine. If yes, please type "yes" in the chat; if there are any issues, let us know in the chat section so that we can resolve them.

Let's wait a few more minutes to let more people join. Until then, let me tell you that we post regular updates on multiple technologies. If you are a tech geek on a continuous hunt for the latest technological trends, consider subscribing to our YouTube channel and press the bell icon so you never miss an update from Simplilearn.

Great, I think we can get started. In today's session we will discuss what fake news is, and at the end we will do a hands-on lab demo of how to detect fake news using machine learning. Before we move on to the programming part, let's first discuss what fake news is.

What is fake news? False or misleading information that is reported as news is called fake news. A common goal of fake news is to harm the reputation of someone or something, or to profit through advertising. The term "fake news" was first used in the 1890s, a time when dramatic newspaper reports were common, even though incorrect information has always been disseminated throughout history. However, the phrase has no clear definition and is often used to refer to all misleading information. High-profile individuals have also used it to refer to any news that is
not favorable to them.

So, dear learners, if you want to upskill in AI and machine learning, give yourself a chance with Simplilearn's Professional Certificate Program in AI and Machine Learning, which comes with a completion certificate and in-depth knowledge of AI and machine learning. Check the course details in the description box below.

Now let's move to the programming part. First we open a command prompt and type the command to open Jupyter Notebook: jupyter notebook, then Enter. Here I select New and then a Python kernel file. This is what the notebook looks like.

First we will import some major Python libraries. I write import pandas as pd, import numpy as np, import seaborn as sns, and import matplotlib.pyplot as plt. Then from sklearn.model_selection import train_test_split, from sklearn.metrics import accuracy_score, and from sklearn.metrics import classification_report. Then import re and import string. Press Enter. It reports an error; okay, here I have to write "from". Now everything seems good. It is loading, so let's see.

Until then: NumPy is a Python library used for working with arrays. It also has functions for working in the domains of linear algebra and matrices. It is an open-source project and you can use it freely. NumPy stands for Numerical Python.

Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.

Seaborn is an open-source Python library based on Matplotlib. It is used for data exploration and data visualization, and it works easily with DataFrames and the pandas library.

Matplotlib is a cross-platform data visualization and graphical charting package for Python and its numerical extension NumPy. As a result, it
presents a strong open-source substitute for MATLAB. The APIs for Matplotlib allow programmers to incorporate graphs into GUI applications.

Then train_test_split: we can build our training data and test data with the aid of scikit-learn's train_test_split function. Starting with a single dataset, we divide it into two datasets, one to train the model and one to test it, because the original dataset otherwise has to serve as both.

Then accuracy_score: the accuracy score is used to gauge the model's effectiveness by calculating the ratio of correct predictions (true positives plus true negatives) to all of the model's predictions.

Then re, the regular expression module: its functions allow you to determine whether a given text fits a given regular expression or not.

Then string: a collection of letters, words, or other characters is called a string. It is one of the basic data structures that serves as the foundation for manipulating data. The str class is the built-in string class in Python. Because Python strings are immutable, they cannot be modified after they have been created.

Now let's import the datasets. We are going to import two datasets, one for the fake news and one for the true news, or you can say not-fake news. I write data_fake = pd.read_csv("fake.csv"); you can download this dataset from the description box below. Then data_true = pd.read_csv("true.csv"). Press Enter. These are the two datasets.

Let's look at both datasets. I write data_fake.head(); this is the fake data. Then data_true.head(); this is the true data, the news that is not fake. If you want to see the top five rows of a dataset, you can
use head, and if you want to see the last five rows, you can use tail instead of head. Let me add some space for better visibility.

Now we will insert a column named class as the target feature. I write data_fake["class"] = 0 and data_true["class"] = 1. Then I write data_fake.shape and data_true.shape and press Enter. The shape attribute returns the shape of an array as a tuple of integers; each number is the length of the corresponding dimension. In other words, the shape is a tuple containing the number of entries along each axis. So what does the shape mean here? The fake dataset has 23,481 rows and 5 columns, and the true dataset has 21,417 rows and 5 columns.

Now let's remove the last 10 rows from each dataset for manual testing. I write data_fake_manual_testing = data_fake.tail(10); tail gives the last rows, so I pass 10. Then: for i in range(23481), sorry, range(23480, 23470, -1),
and inside the loop, data_fake.drop([i], axis=0, inplace=True); instead of 1 I write i here. I do the same for the true dataset: I copy this, paste it, and make the changes, so data_fake becomes data_true and the range becomes range(21416, 21406, -1). Press Enter. There is an error, and then it says axis is not defined, so let me correct the typos in the drop call. Now it's working.

Let me check data_fake.shape and data_true.shape. As you can see, 10 rows have been deleted from each dataset.

Next I write data_fake_manual_testing["class"] = 0 and data_true_manual_testing["class"] = 1. Just ignore this warning. Then let's see data_fake_manual_testing.head(); as you can see, we have these rows. And then data_true_manual_testing.head(10).
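The labelling and hold-out steps dictated above can be sketched as follows. This is a minimal sketch under stated assumptions, not the presenter's exact notebook: the two DataFrames are small made-up stand-ins for fake.csv and true.csv (the helper make_frame and its contents are hypothetical), and the row-by-row drop loop is swapped for an equivalent iloc slice. Taking a .copy() of the tail is one way to avoid the pandas warning mentioned in the session.

```python
import pandas as pd


def make_frame(prefix, n):
    """Hypothetical helper: build a made-up stand-in for one of the CSV files."""
    return pd.DataFrame({
        "title": [f"{prefix} title {i}" for i in range(n)],
        "text": [f"{prefix} body {i}" for i in range(n)],
        "subject": ["news"] * n,
        "date": ["January 1, 2017"] * n,
    })


# In the session these come from pd.read_csv(...); here they are toy data.
data_fake = make_frame("fake", 30)
data_true = make_frame("true", 25)

# Target column: 0 for fake news, 1 for true news.
data_fake["class"] = 0
data_true["class"] = 1

# Hold out the last 10 rows of each frame for manual testing.
# .copy() makes the held-out frames independent of the originals.
data_fake_manual_testing = data_fake.tail(10).copy()
data_true_manual_testing = data_true.tail(10).copy()

# Keep everything except the last 10 rows (swapped in for the drop loop).
data_fake = data_fake.iloc[:-10]
data_true = data_true.iloc[:-10]

print(data_fake.shape, data_true.shape)
print(data_fake_manual_testing.shape, data_true_manual_testing.shape)
```

Because the class column is added before the tail is taken, the held-out frames already carry their labels and do not need a second assignment.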
This is the true dataset.

Now I will merge them: data_merge = pd.concat([data_fake, data_true], axis=0); concat is used for concatenation. Then data_merge.head(10) shows the top 10 rows. As you can see, the data is merged: the fake news comes first, followed by the true news. So we have merged the true and fake DataFrames.

Let's look at the columns with data_merge.columns. These are the columns: title, text, subject, date, and class. Now let's remove the columns that are not required for further processing. I write data = data_merge.drop(["title", "subject", "date"], axis=1); we don't need the title, the subject, or the date.

Let's check for null values with data.isnull().sum(). Enter. There are no null values.

Then let's randomly shuffle the DataFrame. For that we write data = data.sample(frac=1), then data.head(). Now you can see that the rows are shuffled, with 1 for the true news and 0 for the fake news. Then I write data.reset_index(inplace=True) and data.drop(["index"], axis=1, inplace=True). Let me check the columns with data.columns: we have only two columns left, text and class; the rest we have deleted. Let me look at data.head(). Yes, everything seems good.

Let's proceed and create a function to process the text. I name it wordopt, though you can use any name, with text as its parameter. First text = text.lower(). Then a series of re.sub calls removes unwanted substrings from the data, each replacing its matches and assigning the result back to text. One of them matches URLs, with a pattern ending in www\.\S+,
with an empty string and text as the other two arguments. Another builds its pattern from re.escape(string.punctuation) to strip punctuation, and another uses \d to remove tokens that contain digits. At the end, return text. So special characters of this kind will be removed from the dataset. Let's run this.

Then I apply it: data["text"] = data["text"].apply(wordopt), where wordopt is the function name. Press Enter.

Now let's define the dependent and independent variables: x = data["text"] and y = data["class"].

Then we split the data into training and testing sets. I write x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25). Press Enter.

Now let's convert the text to vectors. I write from sklearn.feature_extraction.text import TfidfVectorizer, then vectorization = TfidfVectorizer(), then xv_train = vectorization.fit_transform(x_train) and xv_test = vectorization.transform(x_test). Press Enter.

Now let's look at our first model, logistic regression. I write from sklearn.linear_model import LogisticRegression, then LR = LogisticRegression(), then LR.fit(xv_train, y_train); I first typed xv_test there by mistake, and it has to be y_train. Press Enter. Then I store the predictions from LR.predict(xv_test). For the accuracy I write LR.score(xv_test, y_test). As you can see, the accuracy is quite good: 98 percent. Now let's print the classification report with classification_report, passing y_test and the logistic regression predictions. Here you can see the precision, the F1-score, the support, and the accuracy.

Now we will do the same for the decision tree, gradient boosting, and random forest classifiers. Then we will do the model testing and predict the score.

For the decision tree classifier, I write from sklearn.tree import DecisionTreeClassifier. Then DT = DecisionTreeClassifier(), and the rest is the same as before, so I copy it and change logistic regression to decision tree: DT.fit(xv_train, y_train), and the predictions come from DT.predict(xv_test). It is still loading; it will take time. Until
then, let me write the accuracy line: DT.score(xv_test, y_test). Let's wait. Okay, let's run the accuracy. As you can see, the accuracy is better than with logistic regression. Let me also print the classification report, so this is the accuracy score and this is the full report.

Now let's move on to the gradient boosting classifier. For that I write from sklearn.ensemble import GradientBoostingClassifier, then GB = GradientBoostingClassifier(random_state=0). Then GB.fit(xv_train, y_train). Press Enter. Then I store the predictions from GB.predict(xv_test); it is still loading. Then for the score I write GB.score(xv_test, y_test). Let's wait while this part is running; until then let me write the line that prints the report. It is taking time. Okay, it's done now. You can see the accuracy is not as good as the decision tree's, but it is also good: 99.
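To recap the steps so far (clean the text, vectorize with TF-IDF, then fit and score each classifier), here is a minimal sketch on a tiny made-up corpus. Several things in it are assumptions rather than the presenter's code: the regular-expression patterns in wordopt are reconstructed from the fragments audible in the recording, the toy texts and topic words are invented, and random_state and stratify are added to train_test_split only so that the small example is reproducible and both classes appear in the test set. Scores on toy data say nothing about the real dataset.

```python
import re
import string

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def wordopt(text):
    """Lowercase the text and strip URLs, markup, punctuation and digit tokens.

    The patterns are assumptions, not the exact ones typed in the session.
    """
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"\[.*?\]", " ", text)                 # bracketed notes
    text = re.sub(r"<.*?>", " ", text)                   # HTML-like tags
    text = re.sub("[%s]" % re.escape(string.punctuation), " ", text)
    text = re.sub(r"\w*\d\w*", " ", text)                # tokens with digits
    return re.sub(r"\s+", " ", text).strip()             # collapse whitespace


# Made-up toy corpus: 8 "fake" and 8 "true" texts.
topics = ["health", "energy", "sports", "travel",
          "finance", "science", "weather", "education"]
fake = [f"SHOCKING!!! Secret {t} miracle they hide from you, see www.example.com/{t}"
        for t in topics]
true = [f"The ministry published its quarterly {t} report on Tuesday."
        for t in topics]

x = [wordopt(t) for t in fake + true]
y = [0] * len(fake) + [1] * len(true)  # 0 = fake, 1 = true

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0, stratify=y)

# Fit the vocabulary on the training texts only, then reuse it for the test texts.
vectorization = TfidfVectorizer()
xv_train = vectorization.fit_transform(x_train)
xv_test = vectorization.transform(x_test)

models = {
    "logistic regression": LogisticRegression(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(xv_train, y_train)
    scores[name] = model.score(xv_test, y_test)  # mean accuracy on the test split
    print(name, scores[name])

pred_lr = models["logistic regression"].predict(xv_test)
print(classification_report(y_test, pred_lr, zero_division=0))
```

Calling fit_transform on the training split and only transform on the test split keeps the test texts out of the learned vocabulary, which matches the order used in the session.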