HuggingFace is a platform that hosts thousands of AI models that we can integrate into our Flowwise applications for free. So if you're tired of paying for services like OpenAI and Anthropic, then this video might be for you. Before we proceed, I do need to issue a fair warning.
Although it can be a lot of fun playing with these models, it can also be exceptionally frustrating getting them to work correctly. I will give you practical advice in this video on how to improve the results from these models, but you might have a newfound respect for services like OpenAI and Anthropic after you punch a hole through your PC trying to get these models to behave correctly. That's not to say that these models don't have their place, though; I've used some of these models to perform simple tasks for free while leaving the more complex tasks for models like GPT and Claude.
So now that that's out of the way, let's first have a look at HuggingFace. We can access HuggingFace by going to huggingface.co, and we can then search for specific models, or we can click on the Models menu to get a list of all the available models.
And on the left-hand side, we can filter these models, so we have multimodal models, computer vision, natural language, and more. If we only wanted to look at text generation models, we can simply click on this text generation filter, and this will give us all the text generation models; we can see that there are nearly 70,000 models at the time of this recording. We can see the specific details of a model by clicking on its link, for instance this Mixtral model, which is extremely popular, and on this page we can see additional information about this model.
And most importantly, if we look on the right-hand side, we can see this section called "Inference API" and we can also see this text that says this model can be loaded on "Inference API" which means we are able to integrate with this model from tools like Flowwise. If you do not see this "Inference API" section, it means that integration with this model is not set up on HuggingFace and you might have to self-host that model. And that is not something that we'll have a look at in this video as I only want to focus on the free options.
So as long as you see this "Inference" section, we are good to go. We can also test out that model by sending a message over here like "What is the capital of South Africa?" and we can see the type of responses that we can expect from this model.
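As an aside, the same hosted "Inference API" that Flowwise calls can also be reached directly over HTTP. Here is a minimal sketch of building that request in Python; the model name and the "hf_xxx" token are placeholders, and the request is constructed but not actually sent:

```python
# Sketch of a direct call to the HuggingFace Inference API.
# Assumes a model that supports hosted inference; token "hf_xxx" is a placeholder.
import json
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/{model}"

def build_request(model: str, prompt: str, token: str) -> urllib.request.Request:
    """Construct the POST request the Inference API expects."""
    payload = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL.format(model=model),
        data=payload,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_request("mistralai/Mixtral-8x7B-Instruct-v0.1",
                    "What is the capital of South Africa?",
                    "hf_xxx")  # placeholder token
# urllib.request.urlopen(req) would send it; skipped here to stay offline.
print(req.full_url)
```

This is the same call Flowwise makes for us behind the scenes once credentials and the model name are configured.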
So if we're happy with these results, we can go ahead and implement this model in Flowwise. So back in Flowwise, I've created a new chat flow and let's start by adding a new node. Let's go to "Chains", let's add an LLM chain and let's start by adding our LLM.
So under "Chat Models", let's add the "Chat HuggingFace" node. This node allows us to call those "Inference APIs". Let's connect this to our LLM chain and let's set up our HuggingFace credentials.
So under this drop down, click on "Create New". Let's give it a name like "HuggingFace API". And now for the HuggingFace API key, go back to HuggingFace, then create a new account or log into your account.
I'm already logged in, so after logging in, simply go to "Settings", then go to "Access Tokens", click on "New Token", give your token a name. You can leave the type as "Read" and then click on "Generate a Token". Now go ahead and copy this token, then paste it into this field in Flowwise and click on "Add".
Now we need to specify the model that we'd like to use. Now in order to get the name, go back to your model's page on HuggingFace and simply click on this "Copy" button next to the name and then paste that into this model field. We will not look at the endpoint in this video as this is the endpoint that will be generated when you decide to self-host your models.
And basically how that works is let's say that you go to a model that does not have this "Inference API" set up. If you can see this "Deploy" button, you can actually go and set up your own inference endpoint for this model. So you'll simply click on this option, you will load your credit card details.
You can pretty much leave all of these settings as the default options and you can then make this a public API as an example. And since I haven't loaded payment details, I'm not able to host this. But if you do load your details, you will be able to deploy this, and at the end of the deployment you will receive a URL endpoint which you can copy and paste into this field.
But again, I'm only going to look at the free models in this video. But some of you might find this useful. Right, all we have to do now is add our prompt template.
So under "Add Nodes", let's go to "Prompts" and let's add our prompt template. Let's connect this prompt template to the chain and let's set the template. And let's do something like tell me a joke about, and let's set a variable with curly braces and let's call the variable "subject".
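Under the hood, the prompt template does simple variable substitution: whatever we type in the chat is inserted in place of the "{subject}" placeholder before the prompt is sent to the model. A rough sketch of that behavior:

```python
# Sketch of what the Flowwise prompt template does: the {subject}
# variable is filled in with the user's input before the prompt
# is sent to the model.
template = "Tell me a joke about {subject}"
prompt = template.format(subject="dog")
print(prompt)  # Tell me a joke about dog
```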
Let's save this and let's save the chat flow and let's test this in the chat. Let's expand the chat and let's try this. Let's enter something like "dog" and now you will notice a very strange looking response coming back.
And it kind of looks like the model is generating more than one joke about dogs, and it's also added this random word at the start of the response. This is one of those things that can be extremely frustrating when working with these open source models. But I am going to show you how to improve this.
Let's go back to the model page and let's see if we missed something. If we scroll down on this page, we can see this section for the "instruction format", and it's very important that you get this right. Most models, if not all models, will have some sort of section in the documentation explaining how to prompt them.
So here we can see in this example that they start the prompt with this "<s>" token and then they use these "[INST]" square brackets to pass an instruction to the model. So in between these brackets we have instruction 1, then we're expecting the model's answer, and then we can pass in a follow-up instruction. So let's see what happens if we simply copy this example and change the prompt template.
Let's actually remove this, let's paste in that example, and let's replace the instruction with the text that we just deleted: "Tell me a joke about {subject}".
Let's save this. Let's save the chat flow and let's see what happens now. Let's actually clear this chat and let's enter dog again and we can see the response has improved but it's still adding a whole bunch of funny words and characters towards the end.
So let's continue with this instruction format. Let's also add this "<s>" token to the prompt. So before the instruction let's add the opening "<s>" and at the end let's add the closing "</s>".
Let's save this. Let's save the flow and let's see what we get now. So let's pass in dog, and this time the response has greatly improved. I think this is one of those issues that a lot of you have been struggling with, and it really just comes down to getting the instructions on how to prompt these models from the documentation and then implementing those instructions in your prompt templates.
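Sketched in code, the final template wraps the instruction in the markers from the model's documentation. The exact tokens here follow the Mistral/Mixtral instruction format; other models use different markers, so always check the model card:

```python
# Sketch of the instruction format described above: the instruction is
# wrapped in [INST] ... [/INST] markers, and the sequence starts with the
# "<s>" begin-of-sequence token. In multi-turn prompts, a closing "</s>"
# also follows each model answer. Tokens vary per model - check the model card.
def build_prompt(subject: str) -> str:
    return f"<s>[INST] Tell me a joke about {subject} [/INST]"

print(build_prompt("dog"))
# <s>[INST] Tell me a joke about dog [/INST]
```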
So I hope you found this video useful, and if you did, please hit the like button, subscribe to my channel, and let me know down in the comments which open source models you like to use and which prompts you use to get the best out of those models. If you liked this video, then you might also like this other video where we run these open source models locally using Ollama.