All right, so after four videos this is probably going to be the last video about this project, and today we're going to see the latest modifications and features that you asked for and that I have added. At the end, I will pick some of the questions you guys have left me and try to answer them. We're going to start by talking about the API keys: you don't need a .env file, you can just put your API keys right here, and that matters not only if we're running this locally but also if we're running it in a Docker container, which we'll see later. The second thing I've added is the attended mode, which lets you interactively help your scraper get the right data out of a website; that's actually a very good suggestion I got from you guys. And the last thing we're going to see is the Docker container. A lot of you have asked about this throughout my videos, so I'll show you where you can find the container and how to run it on your machine. Then, of course, I'll answer some of the questions you guys have left me. So let's jump right into it. First of all, as always, you'll find this project on automationcampus.com, here under Scrape Master 4.0; this is where you'll find the whole code and how to set it up. This time I've actually re-uploaded it on GitHub. I created a new account, because I simply cannot share anything on my last one. I don't trust them, guys; maybe they'll suspend me again, I don't know what's going to happen. But as long as it's up, you can go to GitHub and get it as quickly and easily as possible; otherwise it will always be accessible on my website under Scrape Master 4.
0, and this is probably going to be the last update of this application. Okay, so let's go back to the API keys. As you can see here, we can give the application three API keys. The first one works for the first two models, the GPT ones; the Gemini key works for Gemini; the Groq key works for Groq; and Llama 3.1 8B is where you run Llama locally in LM Studio, so it does not need an API key. To test this out, we're going to copy-paste one of these API keys. By the way, you can paste just one of them, and as long as you choose the matching model it's going to work. So here I paste the Gemini API key and I've chosen the Gemini model; if you want to have all three in there, you can absolutely do that. You can also always create a .env file inside your project: as long as you name the variables like this, OPENAI_API_KEY, GOOGLE_API_KEY, and GROQ_API_KEY, and put the values in the .env file, it's going to work, and you won't need to copy-paste your API keys in here every time. Anyway, here we're going to use Gemini Flash, and I'm going to scrape data out of Amazon. Let's say, for example, I search for Alexa, I get a page like this, and I want to scrape data out of this page. Let's put our URL in here, enable the scraping, and say we want to scrape the name, the rating, the number of reviews, the number of items sold or bought, and the price; that's basically it. So let's add the name, the rating, the number of reviews, the number of items bought, and lastly the price, and launch the scraping, and let's see what happens. As you can see here, it has failed. I did not think this would fail, but it actually helps us demonstrate the second feature, which is the attended mode. So let's enable the attended mode, launch it again, and see what happens this time. As you can see here, in this case we can actually help it: we
can go here, search for Alexa, and then just click resume scraping; it will then take the page we navigated to and scrape all the data from that exact page, not from the page where it failed. I did not think I was going to show the attended mode at this point, I thought I was going to show the API keys, but as you can see, this is actually a good use case for the attended mode. And this is what it has been able to scrape: here we have the name, the number of items bought, the number of reviews, the price, and the rating. Good. So now we have the data we want, and we can also enable the pagination if we want to get multiple pages. Let's launch the scraper again just to get the pagination, because I want to show you something before we get to where the attended mode really shines. So let's launch the scraper, and let's go to the second page so that it detects this little argument we have here, which makes it easier for it to find the pages. Here we have all of this, and these are the pages we want; let's go back and resume the scraping. That's good: we got the URL and we got the pagination. If we go to this page, that's actually really good, we are on page six. Anyway, we got the pagination, but what I wanted to show you is this: let's say, for example, I want to scrape from these three websites at the same time, and I copy-paste these websites in here. Once you do that, you're not going to be able to use the pagination or the attended mode, because you are scraping from multiple websites, and this becomes, in a way, a scraper at scale. Pagination does not make sense here in the first place, because which pagination would you scrape, the first, the second, or the third site's? And the attended mode does not make sense either, because with the attended mode, every time it will
open the page, it will have to stop, you will have to do the action, and so on, which makes it very prone to errors. This is why, if you have multiple websites, just make sure you can actually access them, and from there it will only do the scraping; it's not going to do the pagination or the attended mode. You can always go into the code and change that behavior if that works for you, but in my opinion, once you are scraping from multiple websites, the only thing you should be doing is scraping, because otherwise it's simply going to be very slow. All right, so now let's actually talk about the attended mode. There are three main use cases where I have been using it. The first is when I need to log into a page, meaning I launch the URL but still have to log in to access the data. The second is when I need to do some kind of UI interaction, clicking buttons in order to reveal the data. And the third is when I want to use it as a fallback option, which is what we just saw with Amazon: if you're not sure whether the website you are going to open will block you, you can use the attended mode as a fallback. Now let's go to the example. Here I am on one of my videos, and what I want to do in this case is get all the comments, because I want to use them to get suggestions for my next video; this is how I actually use this scraper myself. So first of all we grab the URL and paste it in here, make sure our API key is already in place, choose Gemini Flash, which for me is the best one for scraping the data, and then enable the scraping and define the data we want: the name of the person who commented, the comment, the number of likes, and the replies. So: the name, the comment, the number of
likes, and the number of replies. Here I'm going to enable the attended mode, and you'll see why. Let's launch the scraper. Once we do, we get this page, and first of all, with YouTube you always get this "before you continue" prompt; you have to accept or reject it, whatever, I accepted it. Then you have to wait for all of these comments to appear. That part is already handled by the application, so even if you don't use the attended mode, it will still scrape the comments. But here I need to load the replies as well, so I need to click on this, and that can't be done by the application; this is why I used the attended mode in this case. So I'm going to get all these replies shown and do some infinite scrolling. By the way, the infinite scroll is also done inside the application, so you don't need to do that yourself, but in this case we have clicks as well. Now we go back, click resume scraping, and from there it should get all the data we just revealed and give it to me in a table format that I can copy-paste into ChatGPT to get, say, three ideas about what I should cover in the next video. As you can see here, I get the names of the people who commented, the comments, the number of likes, and the number of replies. So what we're going to do in this case is get this in JSON format: let's open it, copy it, go to ChatGPT, and tell it, "I'm going to give you a JSON with all the comments on my latest video about a universal web scraper, and I want you to give me three suggestions about what I should cover next according to the comments," and then paste the JSON. By far the thing that has been asked for the most is integrating Llama models, and by the way, guys, I've already done that: the previous video was all about integrating Llama, so you can go back here,
"Local AI Scraper: Can I Run This Locally?", and you'll find it in there. So for the people asking about integrating a local model, that has already been done; you just need to watch that video. Next, handling website scraper blockers: this is what we've just done, if you want to solve the captchas yourself when the application can't, or when you get blocked for some reason. You shouldn't get blocked, though, because I have already integrated user agents; that was one of the suggestions you guys gave me on my first video, so I already have a list of user agents that rotates so the website does not detect that you are a scraper. That should not happen, but if it does, there is the attended mode I've just added, so that was a great suggestion that I've used. And the last things are the pagination and the infinite scroll, also suggestions that I have added. Okay, so now let's talk about Docker. I have shared this container with you guys; you'll find the link in the description below, and there you can download the image and start working with it. I have created another account just to show you how to work with it. First of all, you need Docker Desktop, so make sure you have it on your machine: just Google it and install it. Once you do that, open the CMD; you need to run two commands. You can actually do this inside Docker Desktop, but let's just use the CMD, it's better. So open the CMD, and first you need to pull the image: docker pull, then the name of the user, and then the name of the application, scrape-master. It will pull the latest image, and if you go back here, a new image will be added inside your Docker Desktop. Good. As you can see, this is the new image, and now, to run it, all you need is this command: docker run -p, then the port mapping; I would suggest you stay with the same numbers as me, 8501:8501, and then the username and scrape-master. This will run it,
and here you can click on the localhost link, it will open in the browser, and everything will run in your container instead of on your local machine. By the way, the attended mode does not work with Docker, because when you are working with Docker you are probably running this in a container on a server, so there is no graphical user interface unless you add one yourself. This is why the attended mode will not work if you are running this in a Docker container; it will always work in the background, so there is no need to enable the attended mode when you are working with Docker. If you want to use the attended mode, just clone the project from GitHub or set up the project following the steps I have described on my website. In Docker mode the attended mode does not work, but the pagination works, and the scraping works as well. And by the way, this is why I added the API keys to the UI: if you are working through Docker, you would otherwise have to pass the API keys as arguments while running the Docker container, or pass in the .env file while running the container, which makes the docker run command more complicated. Anyway, this is how to run it through Docker. We can test it if you want: I can add an API key in here, let's say I copy this one and put it in the OpenAI field, and I'm going to scrape this website, scrape me.
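To make the .env option concrete, here is a minimal sketch of what such a file could contain. The variable names OPENAI_API_KEY, GOOGLE_API_KEY, and GROQ_API_KEY are my assumption based on what is said in the video, so check the project's README for the authoritative names; the values below are placeholders, not real keys.

```shell
# Sketch: write a .env file at the project root with placeholder values.
# Variable names are assumed from the video, not verified against the code.
cat > .env <<'EOF'
OPENAI_API_KEY=your-openai-key-here
GOOGLE_API_KEY=your-google-key-here
GROQ_API_KEY=your-groq-key-here
EOF
```

When running through Docker, you could then hand this same file to the container with `docker run -p 8501:8501 --env-file .env <username>/scrape-master`, or pass a single key with `-e GOOGLE_API_KEY=...` (the username and image name here are placeholders; use the exact ones from the link in the description).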