Hello community! Open-R1: what is it, and how can we use it? As you know, we have DeepSeek R1, with close to 200,000 downloads on Hugging Face in the last days. We have it, we can use it, and you might ask: so what is Open-R1? Let's open up this video and have a deeper look. Now, you know there is a beautiful research paper by DeepSeek explaining R1 in detail, with the reinforcement learning algorithm and the group relative policy optimization (GRPO) described in detail,
and everything is beautiful. But then there was this particular spark: in the video I showed you earlier, I talked about the R1 distilled versions, smaller versions rather than the original huge R1. We have a 32-billion-parameter version, and even a 1.5-billion version, distilled from R1. And I showed you that we have to be careful, because there is a DeepSeek-V3 base model and a DeepSeek-V3 chat model, and that distinction is particularly important for our chain-of-thought models. Then we saw that there are two R1 models: the Zero model and the main R1 model. We found that the Zero model encounters challenges such as endless repetition, poor readability, and language mixing, so R1-Zero is not really usable, and we went with R1. R1's performance is at least comparable to OpenAI o1 across mathematics, code, and reasoning tasks. And I showed you in the publication how we looked at the reinforcement learning with cold start, looked at the training pipeline, and went through it.
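As a quick reference, here is how I would jot down those four training stages in a few lines of Python. The stage names are my own shorthand summarizing the paper, not official identifiers from DeepSeek.

```python
# The four-stage DeepSeek-R1 training pipeline as I summarized it from the
# paper. Stage names are my own shorthand, not official identifiers.
R1_PIPELINE = [
    ("cold_start_sft", "supervised fine-tuning on a small, curated chain-of-thought set"),
    ("reasoning_rl", "large-scale reinforcement learning (GRPO) on math, code, and logic"),
    ("rejection_sampling_sft", "fresh SFT data sampled from the RL checkpoint and filtered"),
    ("final_rl", "a last RL round covering helpfulness and harmlessness across tasks"),
]

for step, (name, description) in enumerate(R1_PIPELINE, start=1):
    print(f"step {step}: {name}: {description}")
```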
All four steps were beautiful. But you know, there is something missing, and of course there is: OpenAI tells us nothing about o1's internals, about the datasets, about anything, so why should DeepSeek tell us everything? They have given us so much more than any other company: the complete model, open source. But if you really want to understand it, you find out, hey, there are missing pieces that we have to discover. So the release of R1 was just great for the AI community, but of course they didn't release everything; I mean, why should they, as a startup company? You know what is missing? The most important part: the datasets. The datasets for all the different steps that we already looked at, and the real code used to train the model, are not released. And this is absolutely fine, because of course a company will not give you all their secrets, and it is amazing what DeepSeek R1 has already given us. So let's come to the core of this video: there is now, since January 28,
2025, the day I'm filming this video, this initiative called Open-R1: a fully open reproduction of DeepSeek R1. Hugging Face says: hey, you know what, let's really try to reverse engineer it. We have so much from R1 already; let's see if we can deduce the missing pieces, so we can really build, if you want, a complete open R1 system. Of course this is really complicated, but you see the upvotes, 266 just on the first day. This is nice. So what is the goal of Open-R1? Just to remind you, this is only starting. It will take weeks; maybe by the end of February 2025 we'll have it, I don't know. It all depends on you, because you can participate. This is now a community exercise, and they are asking the community to help them. So it is not just those three people; there are people coming in from the complete AI community, trying to rebuild the last missing pieces, so that we have a truly open R1: everything from the datasets to the extreme hyperparameter tuning that we will see. We want to see everything and understand everything. And you know what, this is so much better than OpenAI o1, where we have nothing. Here we, as an AI community, have the chance to learn, and something beautiful: really the leading and bleeding edge of AI research. Hugging Face tells us their plan of attack is the following. Step one: get a high-quality reasoning dataset by distilling it from the open-source DeepSeek R1. And you might say: why go for the high-quality reasoning dataset first? Well, do
you remember the video where I told you about DeepSeek R1's reasoning and explained the distillation process (section 2.4)? To equip more efficient, smaller language models with the reasoning capabilities of R1, DeepSeek directly fine-tuned open-source models, like the Qwen and Llama models, using their special set of 800,000 samples that were created with DeepSeek R1 for the reasoning process: mathematics, code, and so on. We already went through this in that video, and I showed you what they built with this dataset: the Llama-70B R1-Distill, or the Qwen-32B distill that I particularly like for causal reasoning, because with 32 billion freely trainable parameters it is great to run locally. So you see, they built all these beautiful models, down to a 1.5-billion version that reflects the pure R1 reasoning capability, reduced and smaller of course, but this is what we are looking for.
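To make the distillation idea concrete, here is a minimal sketch of how one of those 800,000 reasoning samples might be flattened into a plain supervised fine-tuning string. The `<think>` tag template is an assumption on my part, modeled on R1's visible output style; DeepSeek's real data format is exactly one of the unreleased pieces.

```python
# Minimal sketch: turning an R1-generated reasoning trace into one supervised
# fine-tuning sample. The <think> tag template is an ASSUMPTION modeled on
# R1's visible output style; DeepSeek's real data format is not released.

def format_distill_sample(question: str, reasoning: str, answer: str) -> str:
    """Flatten (question, chain of thought, answer) into one training string."""
    return (
        f"User: {question}\n"
        f"Assistant: <think>\n{reasoning}\n</think>\n{answer}"
    )

sample = format_distill_sample(
    question="What is 7 * 8?",
    reasoning="7 * 8 is 7 * 10 - 7 * 2 = 70 - 14 = 56.",
    answer="56",
)
print(sample)
```

The point of such a flat format is that any standard SFT trainer can consume it; the smaller model simply learns to imitate R1's reasoning traces token by token.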
Now the second step is completely different. Please notice that we also have R1-Zero, remember, and as I told you in my other videos, there we are looking at the reinforcement learning pipeline. So what they need, and you're not going to believe it: they need to create new large-scale datasets for R1-Zero, for mathematics, reasoning, code, everything. Then, once they have both the data for the fine-tuning and the data for the reinforcement learning, they will go on and attempt the complete training pipeline: the supervised fine-tuning and the reinforcement learning via multi-stage training, as described in the R1 research paper.
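To give a feel for what such an R1-Zero-style RL setup needs, here is a toy sketch of the two rule-based rewards the R1 paper describes: an accuracy reward (does the final answer match a verifiable ground truth?) and a format reward (did the model wrap its reasoning in the expected template?). The exact tag names and the 1.0/0.0 scoring here are my assumptions, not DeepSeek's published values.

```python
import re

# Toy sketch of R1-Zero-style rule-based rewards: an accuracy reward against a
# verifiable ground truth, plus a format reward for the reasoning template.
# Tag names and the 1.0/0.0 scoring are my assumptions, not DeepSeek's values.

def format_reward(completion: str) -> float:
    """1.0 if the completion follows <think>...</think> <answer>...</answer>."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, completion.strip(), re.DOTALL) else 0.0

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the text inside <answer> tags equals the verifiable answer."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if match and match.group(1).strip() == ground_truth else 0.0

completion = "<think>2 + 2 = 4</think> <answer>4</answer>"
print(format_reward(completion), accuracy_reward(completion, "4"))
```

This is why math and code are the natural domains for this kind of dataset: the reward can be checked by a rule, with no learned reward model in the loop.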
So this is the goal of Open-R1, and you can help them. Let's look at the facts we have. Oh, and I forgot to mention: there is some incorrect information out there about DeepSeek, so let me just tell you. A year ago we had this publication from DeepSeek, "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". Just look at who their cooperation partners were: you have DeepSeek-AI at that time, and you have Tsinghua University, which I call the Harvard of the East (I'm sorry, maybe they are already better than Harvard), and Peking University. So to everybody who tells you, hey, this is a completely new, advanced GRPO methodology: may I just point out that in the AI research literature this is more than a year old. DeepSeek officially published it to the complete AI community and explained their training modalities in detail; the group relative policy optimization is described in this paper in absolute detail. At the time, a year ago, there was also the question whether OpenAI would use this beautiful new methodology for their models.
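As a reminder of what GRPO actually computes, here is a tiny numeric sketch of its core trick from the DeepSeekMath formulation: sample a group of completions per prompt, score them, and normalize each reward against the group's own mean and standard deviation, so no separate value network is needed as a baseline. The numbers are made up for illustration.

```python
from statistics import mean, stdev

# Core of GRPO from the DeepSeekMath paper: each sampled completion's advantage
# is its reward normalized by the statistics of its own group, which replaces
# the learned value-function baseline used in PPO. Example numbers are made up.

def group_relative_advantages(rewards: list[float]) -> list[float]:
    mu = mean(rewards)
    sigma = stdev(rewards)
    if sigma == 0.0:           # uniform group: no learning signal
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# One prompt, a group of four sampled answers scored 1.0 (correct) or 0.0:
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Notice the design choice: correct answers get a positive advantage only relative to their own group, which is what makes the method cheap enough to run at the scale R1-Zero needed.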
But we never received any feedback, because OpenAI is profit-oriented, and therefore there is no information available on this. Just to tell you: they published this, they went with Tsinghua University, and they are not just a little startup company; that is a completely wrong impression you can sometimes read on the internet. I recommend version 3 of this paper to you, from the end of April 2024: DeepSeek-AI, Tsinghua University, Peking University. You will learn a lot about the training process they published a year ago, and I especially recommend chapter 4.1.3, on process supervision reinforcement learning with GRPO. Great; that was one fact. The other fact, of course, is this new Open-R1 initiative by Hugging Face. Go to GitHub and you see, hey, more than 11,000 stars already. Great. So they are now building everything here, but watch out: it is not just a code base. Let's have a deeper look. Here is the Hugging Face Open-R1 repo on GitHub today, beautiful, and you already see what they do: "Open-R1, a
fully open reproduction of DeepSeek R1. This repo is a work in progress; let's build it together." Hopefully in the next weeks we as a community will really be able to get the datasets that are still missing, to completely understand R1. They give us the overview: the goal of this repo is to build the missing pieces of the R1 pipeline so that everybody can reproduce it and build on top of it. What they already have, I've shown you: we already have the beautiful code, we know everything about supervised fine-tuning, we know how to evaluate it and how to generate synthetic data from the model; all of that is already there. But now, as they tell us: replicate the R1-Distill models by distilling a high-quality corpus from DeepSeek R1, then replicate the pure reinforcement learning pipeline to get the new datasets on mathematics, reasoning, and code. So if you want to participate: easy installation, beautiful, and then you have the training modes. If you want to go with supervised fine-tuning, everything is already available for you. But please notice, this
is for experts only. So if you are an expert in your field, in anything where you can participate, you have all the code already available. Have a look at it, and if you detect, hey, I have knowledge where I can help them, or hey, there is a part missing that I can contribute, why not go and help the community? You see, there is a lot of code already available. Given the information I received from Hugging Face today, they are, let's say, hopeful that in the next weeks they will be able to find everything they are looking for, and then they will publish it for the community, free of charge. So we will make a huge jump in our understanding of reasoning models, and especially of our R1 model. Maybe in the future we won't have to pay proprietary LLM providers extremely high prices, because it is open source. Wouldn't this be a beautiful idea for the future development of AI? And you know what else is beautiful, coming back now to today, also January 28, 2025:
for me, the other news is that Hugging Face now has new inference providers, directly on the Hub. And wow, 112 members already, great. So what do we have? They already had Google, AWS, and Microsoft, but now they add four more: fal, Replicate, SambaNova Systems, and Together AI. So these are new serverless inference providers working directly on the Hub model pages. This means you don't have to leave Hugging Face, and you don't have to care about how many GPUs you need, whether 80 GB of VRAM or maybe 1,000 GB of VRAM; you have a direct connection to four new inference providers, and all you need is your credit card. So let's have a look. I was interested today in the billing, and I found out there are two processes. Either you go with direct requests: if you already have an account at, let's say, Together AI, and you already have your credit card there, then you get a key and you can use it here on Hugging Face, on all the models
on Hugging Face, and on the datasets on Hugging Face, directly. You just get a key and you choose the direct-request method. If you are completely new and say, I have nothing, I'm starting fresh, maybe you should check out the routed requests instead. Here you pay the standard provider API rates to Hugging Face, and Hugging Face tells us: hey, there is no additional markup on the prices of, let's say, Together AI; we just pass the provider cost directly through to you. And your partner, if you want, is Hugging Face: your credit card is only given to Hugging Face. Now, this is the first day. Would I personally choose to be an alpha tester for a service on its first day? Maybe I'd wait a week or two, and then, if we see that it has stabilized, that there are no mistakes and everything is beautiful, then I personally would also use this service, because you don't have to care about any GPU infrastructure or any GPU configuration; everything is already set up for you. But of course you need your credit
card, because if you need, I don't know, 1,000 GB of VRAM, yes, you have to pay for those GPUs. You can already run the full DeepSeek R1 model on one of the providers, Together AI, and the code on Hugging Face is so simple. If you want to go this way, you have your Hub, you have your InferenceClient, you have your API key from Together AI, and then you have your question: "What is the capital of France?" Maybe I wouldn't ask such a highly complicated question of one of the best models on the planet, but it is just a demonstration. Then you just say, hey, the model is deepseek-ai/DeepSeek-R1 on Hugging Face; you don't have to care about anything, everything is done for you. So this is a brand-new opportunity, offered to you by Hugging Face and by four inference providers. It is the first day and I have not tried out any of them, so if you try them and have positive feedback, or maybe negative feedback, why not leave a reply in the comments?
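The flow I just described can be sketched in a few lines. Note this is a sketch under assumptions: it assumes the `huggingface_hub` client with the new `provider` parameter and the model id `deepseek-ai/DeepSeek-R1`; check the current Hub documentation for the exact API before relying on it.

```python
# Sketch of querying DeepSeek R1 through a serverless inference provider on the
# Hugging Face Hub. ASSUMES the `huggingface_hub` package with the new
# `provider` argument and the model id "deepseek-ai/DeepSeek-R1"; check the
# current Hub documentation before relying on the exact names.
import os

def build_messages(question: str) -> list[dict]:
    # OpenAI-style chat payload used by the Hub's chat-completion endpoint.
    return [{"role": "user", "content": question}]

def ask_r1(question: str, api_key: str) -> str:
    from huggingface_hub import InferenceClient  # imported lazily on purpose
    client = InferenceClient(provider="together", api_key=api_key)
    completion = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1",
        messages=build_messages(question),
    )
    return completion.choices[0].message.content

if __name__ == "__main__" and os.environ.get("HF_TOKEN"):
    print(ask_r1("What is the capital of France?", os.environ["HF_TOKEN"]))
```

Billing then follows whichever mode you chose: with routed requests the call above is billed through Hugging Face at the provider's standard rates, with direct requests you would use your own provider key.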
And I think this is it for today. So, just to make sure: Open-R1 is looking to reconstruct the synthetic datasets, and if we get them, if we are able to reconstruct a high-quality dataset, we have, if you want, the fuel for the AI engine. This will allow everybody to turn their existing or new LLMs into reasoning models, simply by fine-tuning them with supervised fine-tuning, also available on Hugging Face. So you see, we just need those specific datasets. If we succeed in this community exercise by Hugging Face to regenerate those synthetic datasets, this will allow everybody to fine-tune their own LLM, your little Llama or whatever you have, into a reasoning model with almost the reasoning capabilities of R1. That is why these datasets are so important. And of course, if we then go for the next step, the reinforcement learning, these new datasets, training recipes included, will also serve as a starting point for
anybody who builds similar models from scratch, with even more methodology available. But then we are really completely transparent: we have everything, the multiple datasets we need, the data pipeline, the training pipeline, all the weights, the complete models. We then have everything to understand how to further develop the performance of AI systems. And you know what? I like OpenAI; I have paid for their services for multiple years now, and given the name "OpenAI", I somehow wish they would have provided an open-source model. But unfortunately, as of today, I can tell you there is nothing to learn from OpenAI o1. Okay, we have to accept this. Therefore we have Open-R1, and we have R1 itself: highly intelligent, highly specialized experts at DeepSeek have given us R1, and I think it is a chance for the worldwide AI community to learn and to improve their own systems first. And I can tell you, AI in the future is going to be amazing. So why not subscribe to the channel, and I'll see you in my next video.