One of the most exciting and played out topics in the world of technology today is AI. It seems you can't use a product without it claiming to have some sort of AI enhancement. And I'm honestly torn on AI.
On one hand, you have this really great technology that can have profound impacts on our daily lives. On the other hand, we have to give up our privacy to use it. And this is amplified by the fact that the more we use it and the more it learns from us, the better it becomes and the more it can help.
So it seems like using AI means giving up our privacy. Or does it? I set out to learn about AI systems that are private and local.
I started building a system in my own home that could run AI workloads and help me see if there are any practical uses for running AI at home, all without any data ever leaving my home network. I've managed to build and host some pretty advanced systems that help me with everything from chatbots to image generation to help with my home and more. Let's see if we can self-host an AI stack comparable to what you see from the big players.
Since we're on the topic of privacy, I think it's a good time to talk about keeping your internet private with today's sponsor. You all know the scenario, right? You travel to a far off place and you finally get to your destination.
When you arrive at the place you're staying, you want to connect to Wi-Fi, so you search for the network name and password. You're immediately presented with a choice: stay on your carrier's data network and use up your data plan (not to mention allowing them to snoop on you), or connect to the location's Wi-Fi network, where you have no control over what they do or who else is connected, and where their ISP can snoop on you too.
Is that like a double or a triple snoop? Well, I'll choose the third option and that's using a VPN like Surfshark and surfing the web without tracking and blocking all of the snoopers involved. Surfshark keeps prying eyes off of my activities when browsing the internet.
It encrypts my connection using WireGuard, a fast and modern VPN protocol that has quickly become one of the most popular and secure protocols out there. Surfshark also adds some really nice features: dynamic multi-hop, which uses multiple servers to connect me to my destination online; an integrated ad blocker so I'm not targeted by ads, cookies, or trackers; and even a browser extension that lets me quickly connect to a VPN right from my browser. This is all backed by a no-logging policy and over 3,200 RAM-only servers, meaning nothing's written to disk.
So on your next trip, if you're like me and you bring your iPhone, tablet, Android phone, MacBook, Windows laptop, Apple TV, and even a travel router, know that Surfshark will keep all of your devices' connections secure with their unlimited device policy. So sign up today with their 30-day money-back guarantee, and if you use the code "TECHNO-TIM" you'll get an extra four months for free on top of their already low price.
Now, we'll start with Ollama.
So Ollama is an open source tool for creating, running, and managing models, all exposed through a local API. It helps you download and use trained models from the big players, like Llama 3 from Meta, Gemma 2 from Google, and Phi-3 from Microsoft. These are open models that are pre-trained to help you with a variety of chat-based tasks, or that specialize in certain tasks, like StarCoder2, which can help you with coding. There's even image generation, but we'll cover that a little bit later.
But Ollama is the tool that I use to load and manage the models I want to use. Now, Ollama by itself is kind of useful, but it's exponentially more useful when you pair it with something that uses the API, like a web UI. One of the easiest ways to see the benefits of Ollama is to use it with a chat UI like Open WebUI.
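Under the hood, a chat UI is just talking to Ollama's REST API, and you can hit it directly too. Here's a minimal sketch in Python, assuming Ollama is running on its default port (11434) and you've already pulled a model with `ollama pull llama3`; the helper names are my own.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

def build_request(model, prompt):
    """Build the JSON payload for Ollama's /api/generate endpoint."""
    # stream=False returns one JSON object instead of a stream of chunks
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model, prompt):
    """Send a prompt to the local Ollama server and return the response text."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires `ollama serve` running and the model pulled first):
# print(ask("llama3", "Why is the sky blue?"))
```

Nothing in that request ever leaves your network, which is the whole point of this stack.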
Now, if you're familiar with ChatGPT, you'll feel right at home using Open WebUI, but it also comes with some features that even ChatGPT doesn't have. When using Open WebUI you can download and manage models pretty easily right in the UI. It's also nice because it allows you to create multiple accounts, set default models, and even restrict which users can access which models.
They're adding memory features too, which can remember some of your answers and provide more contextual results. It's pretty cool. After getting things set up, you can use it just like you would a chatbot.
Want to list all the US states and capitals? Sure. Easy.
Want it to summarize something for you? No problem. Want it to do that in the voice of John Dunbar from Dances with Wolves, as if he's writing an entry in his journal?
Nice party trick. "Two Socks, like Cisco, has become a trusted friend. He still won't eat from my hand, but his keen eyes and ears never fail to alert me when something is wrong."
Maybe soon it can bring in local AI text-to-speech and mimic Kevin Costner's voice too. So this is fun and all, but here's where Open WebUI can do some tricks that might make some of the public chatbots lose their digital minds: you can enable web search too.
Here in my chat I can enable web search to find and summarize my search results. You can see here in the results it's also citing sources so you know where it's getting its information. Cool.
Now, I know you're also worried about privacy. That's why I've integrated this web search within Open WebUI using SearXNG, a free internet metasearch engine that aggregates results from different sources without tracking or profiling you as a user. SearXNG is a really great project that I'm running here locally, and I can easily hook it into Open WebUI, bringing anonymous web searches into my chat client.
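If you're curious what makes that integration easy, SearXNG can return its results as JSON, which other tools can consume directly. A minimal sketch, assuming a local instance on port 8080 with the JSON output format enabled in SearXNG's settings.yml; the port and helper name are my assumptions:

```python
import urllib.parse

SEARXNG_URL = "http://localhost:8080/search"  # adjust to your SearXNG instance

def build_search_url(query):
    """Build a SearXNG query URL asking for JSON results.

    Note: the "json" format must be enabled in SearXNG's settings.yml
    (under search -> formats), or the instance will refuse the request.
    """
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{SEARXNG_URL}?{params}"

# Each result in the JSON response carries fields like "title", "url",
# and "content", which is the kind of data a chat UI can cite as sources.
# Example: urllib.request.urlopen(build_search_url("self-hosted AI"))
```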
This is one of the many things you can do in Open WebUI and one of the many reasons I'm going to continue using it as an interface for Ollama. We'll see some more cool tricks it can do later, but let's move on. We've all seen the incredible images that can be generated with a system like DALL·E from OpenAI.
They range from creative to scary to realistic to even laughable. So, output aside, creating images from text is a powerful tool for getting ideas and inspiration, killing time, or even just getting a laugh. Stability AI and other companies have released models of their own that are open to use.
Each company and each model has been trained on a variety of images and artworks. These models can be found on Hugging Face, which is a community for building and sharing AI models. The models can be downloaded and used, but they require an engine and a UI to make that possible, similar to Ollama and Open WebUI.
Models are one thing; the engine and the UI are another. There are quite a few to pick from in this space, from the simple AUTOMATIC1111, which I started with, to ComfyUI, which I've started using more and more. Now, ComfyUI is a little more advanced, but what I like about it is that it supports new models, allows you to visually tweak your pipelines, and seems to get more regular updates.
Now, I'm going to be honest: I'm not an expert when it comes to this UI, but I know how to adjust my settings according to prompts that I find on the internet. And that's one of the key learnings here: the better the prompt, the better the output. This is why things like prompt generators exist and are very helpful for getting the most out of some of these models.
For the most part, I run Stable Diffusion in ComfyUI for fun. It's really cool seeing some of the results, but I can see how this could help creatives find some inspiration. The cool thing about having this service running within your stack is that you can then hook it into existing systems.
Remember Open WebUI? Yeah, I've even integrated image generation into chat. Now, it's still early and it's less configurable than the dedicated UI, but if you want an image from your chatbot in Open WebUI, it's totally doable and really cool to play with.
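For the curious, ComfyUI also exposes an API of its own: if you enable dev mode and export a workflow with "Save (API Format)", you can queue generations programmatically. A rough sketch, assuming ComfyUI on its default port 8188; the helper names are mine:

```python
import json
import urllib.request

COMFY_URL = "http://localhost:8188/prompt"  # ComfyUI's default port

def build_payload(workflow):
    """Wrap an exported workflow graph in the shape the /prompt endpoint expects."""
    return {"prompt": workflow}

def queue_workflow(workflow_path):
    """Queue a workflow exported from ComfyUI ("Save (API Format)") for generation."""
    with open(workflow_path) as f:
        workflow = json.load(f)
    payload = json.dumps(build_payload(workflow)).encode()
    req = urllib.request.Request(
        COMFY_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # The response includes a prompt_id you can use to poll for the result
        return json.loads(resp.read())
```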
Using Ollama with Open WebUI and various models can help you accomplish all kinds of tasks, even highly technical ones like break/fix scenarios and debugging. Now, I personally found it really helpful when fixing errors in some of my scripts, converting my code to a different or newer version of a language, and even explaining how to use components that I haven't used before. You can even integrate this into your IDE or editor.
Now, I've never used Copilot for my code, but many claim that integrating Ollama with a model and an extension is a free and private way to get a Copilot-like experience. Let me show you what I mean. I've been testing a free extension called Continue that can connect to your local, private instance of Ollama.
While the model that I'm using isn't the greatest for code generation, it's really good at code suggestions as you type. If you enable autocomplete, it can suggest code for tab completion as you type, which is much better than depending on suggestions from VS Code alone. I found this feature super helpful, and it works out of the box with the StarCoder2 model really well.
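For reference, pointing Continue at Ollama is just a few lines of config. This is roughly what my ~/.continue/config.json looks like; exact field names can shift between Continue versions, so treat this as a sketch rather than gospel:

```json
{
  "models": [
    {
      "title": "Ollama (Llama 3)",
      "provider": "ollama",
      "model": "llama3"
    }
  ],
  "tabAutocompleteModel": {
    "title": "StarCoder2",
    "provider": "ollama",
    "model": "starcoder2"
  }
}
```

The chat model and the autocomplete model are configured separately, which is handy because a small, fast model like StarCoder2 works better for tab completion than a general chat model.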
Another tool I have in my self-hosted AI tool belt is Whisper. Whisper is an automatic speech recognition neural network created and trained by OpenAI. It handles things like accents, different languages, and even background noises with surprising accuracy.
OpenAI has released the models and even the code to run them, but the interface is largely up to you. There's a great freemium desktop client called MacWhisper that I paid for, which is wonderful, but I wanted a universal way to get transcription regardless of which platform I'm using. So I'm running an open source, web-based version of Whisper that not only transcribes audio into text, but does it visually and lets you edit the captions before exporting them into various formats.
I can either upload a video or choose an existing YouTube video, then choose the Whisper model to use. I found that smaller models work great for spoken word or unscripted talks because they will remove repeats and stutters from the captions; if you're doing something scripted, I found that the larger models are what you want. After choosing your options, you'll shortly have your video transcribed, and you can visually scrub through it and fix any captions you don't like.
You can then export the captions into various formats and even translate them into another language. Now, this might seem helpful only to me since I'm a YouTuber and need to generate subtitles, but I plan on using this for home videos too, and possibly even for meetings. The project is undergoing a complete rewrite, and I'm excited to see where it goes.
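If you ever want to script this instead of using a UI, the open source whisper Python package exposes the same models, and turning its output into subtitles is simple. A sketch with my own helper names; the transcription calls are shown as comments since they require downloading model weights:

```python
# Transcribing locally with the open source whisper package looks roughly like:
#   pip install openai-whisper
#   import whisper
#   model = whisper.load_model("base")      # try a larger model for scripted content
#   result = model.transcribe("video.mp4")
#   result["segments"] -> [{"start": 0.0, "end": 3.5, "text": "..."}, ...]

def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Turn Whisper-style segments into SubRip (.srt) caption text."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```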
I would love to see multi-speaker options, find all, and other features you see in some of the other clients, but it's pretty awesome. One of the other things I've supercharged at home is Home Assistant. Home Assistant has started embracing AI, starting with an integrated chat assistant, and, you guessed it, you can integrate it with Ollama.
After integrating Home Assistant with Ollama, your virtual assistant gets supercharged by Ollama's LLM support. In the most basic example, if you ask the default assistant a question, it doesn't know how to respond, but when you switch to your supercharged Ollama assistant, it knows exactly what you're talking about and has the context of your smart home. I've also enabled Whisper to transcribe my voice and Piper to read text-to-speech; however, these are considerably slower since they aren't GPU-enabled, and they require their own instances of Whisper and Piper.
I do wish I could use my existing Whisper server, but I'll take what I can get, because it's usually faster than typing it all out on mobile anyway. One thing to note: although you can integrate with Ollama, Home Assistant won't actually perform actions through it. This is a little odd, since the response says it did it for you. Currently, it can only perform actions if you integrate with the cloud versions of Google AI and OpenAI, which is kind of backwards: you integrate this self-hosted thing with the cloud, which can then do things on your smart home network.
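If you want to poke at an assistant programmatically, Home Assistant does expose its conversation agents over its REST API. A hedged sketch, assuming a long-lived access token and Home Assistant's default port 8123; the agent_id here is a placeholder you'd look up in your own instance:

```python
import json
import urllib.request

HA_URL = "http://homeassistant.local:8123"  # your Home Assistant instance
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"      # created from your HA profile page

def build_conversation_request(text, agent_id=None):
    """Build the body for HA's /api/conversation/process endpoint.

    agent_id selects which assistant answers (e.g. an Ollama-backed agent);
    leave it out to use the default agent.
    """
    body = {"text": text, "language": "en"}
    if agent_id:
        body["agent_id"] = agent_id
    return body

def ask_assistant(text, agent_id=None):
    """Send a sentence to a Home Assistant conversation agent."""
    payload = json.dumps(build_conversation_request(text, agent_id)).encode()
    req = urllib.request.Request(
        f"{HA_URL}/api/conversation/process",
        data=payload,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```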
But Home Assistant says Ollama action support is coming soon. Now, this is just the tip of the iceberg for what's to come with AI, and I hope that other self-hosted platforms make their systems pluggable with Ollama and similar services. As we see more and more systems sprinkle AI onto all of their products, I sincerely hope there's also a future where we can plug in our own systems, like Ollama, with our own models, giving us the control and privacy we need.
After all, some of these products let you plug in ChatGPT, so plugging in our own should also be an option. I hope that companies see this as a way to keep our data private, as well as theirs, and local to the person using it, because that's the future I want to live in. I'm Tim, thanks for watching.