You've gotten LLaMA into tinygrad, you've gotten Stable Diffusion into tinygrad. What was that like? Can you comment on what these models are, what's interesting about porting them, what the challenges are, what's naturally easy, all that kind of stuff?

There's a really simple way to get these models into tinygrad: you can just export them as ONNX, and then tinygrad can run ONNX. So the ports that I did of LLaMA, Stable Diffusion, and now Whisper are more academic, to
teach me about the models. But they are cleaner than the PyTorch versions. You can read the code; I think the code is easier to read, it's fewer lines. There's just a few things about the way tinygrad writes things. Here's a complaint I have about PyTorch: nn.ReLU is a class, right? So when you create an nn.Module, you'll put your nn.ReLUs in __init__, and this makes no sense. ReLU is completely stateless. Why should that be a class?

But that's more like a software engineering thing, or do
you think it has a cost on performance?

Oh no, it doesn't have a cost on performance. But yeah, that's what I mean about tinygrad's front end being cleaner.

I see. What do you think about Mojo? I don't know if you've been paying attention to the programming language; it has some interesting ideas that kind of intersect with tinygrad.

I think that there's a spectrum: on one side you have Mojo, and on the other side you have ggml. ggml is this like,
we're going to run LLaMA fast on Mac, and okay, we're going to expand out a little bit, but we're going to basically go depth first, right? Mojo is like, we're going to go breadth first, we're going to go so wide that we're going to make all of Python fast. And tinygrad's in the middle: we are going to make neural networks fast.

Yeah, but they try to really get it to be fast, compiled down to specific hardware, and make that compilation step as flexible and resilient as possible.

Yeah, but they have Turing completeness, and
that limits you.

Turing completeness, that's what you're saying. It's somewhere in the middle. So you're actually going to be targeting some accelerators, some number, not one?

My goal is, step one: build an equally performant stack to PyTorch on NVIDIA and AMD, but with way fewer lines. And then step two is, okay, how do we make an accelerator? But you need step one: you have to first build the framework before you can build the accelerator.

Can you explain MLPerf? What's your approach in general to benchmarking tinygrad performance?

So
I'm much more of a "build it the right way and worry about performance later." There's a bunch of things where I haven't even really dove into performance. The only place where tinygrad is competitive performance-wise right now is on Qualcomm GPUs. So tinygrad is actually used in openpilot to run the model; the driving model is tinygrad.

When did that happen, that transition?

Eight months ago now. And it's 2x faster than Qualcomm's library.

What's the hardware that openpilot runs on, the
comma?

It's a Snapdragon 845.

Okay, so this is using the GPU?

So the GPU is an Adreno GPU. There's a really good Microsoft paper that talks about mobile GPUs and why they're different from desktop GPUs. One of the big things is, in a desktop GPU you can use buffers; on a mobile GPU, image textures are a lot faster.

On a mobile GPU, image textures [inaudible]. Okay, and so you want to be able to leverage that?

I want to be able to leverage it in a way that it's
completely generic, right? So there's a lot of this. Xiaomi has a pretty good open source library for mobile GPUs called MACE, where they have these kernels, but they're all hand coded, right? So that's great if you're doing three by three convs, that's great if you're doing dense matmuls, but the minute you go off the beaten path a tiny bit, well, your performance is nothing.
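
To make the hand-coded kernel point concrete, here is a minimal, framework-free sketch in plain Python. It is not MACE or tinygrad code: `conv2d_generic`, `conv2d_3x3`, `HAND_CODED`, and `conv2d` are hypothetical names invented for illustration, and the "fast path" is only a stand-in for a tuned GPU kernel. It shows, under those assumptions, how a library of shape-specific kernels covers the common case and quietly falls back to a generic path for everything else.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]


def conv2d_generic(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid (no padding) 2D cross-correlation for any kernel size."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def conv2d_3x3(image: Matrix, kernel: Matrix) -> Matrix:
    """Stand-in for a hand-written kernel: fully unrolled, 3x3 only."""
    (a, b, c), (d, e, f), (g, h, i) = kernel
    out_h, out_w = len(image) - 2, len(image[0]) - 2
    out = []
    for y in range(out_h):
        r0, r1, r2 = image[y], image[y + 1], image[y + 2]
        out.append([
            r0[x] * a + r0[x + 1] * b + r0[x + 2] * c
            + r1[x] * d + r1[x + 1] * e + r1[x + 2] * f
            + r2[x] * g + r2[x + 1] * h + r2[x + 2] * i
            for x in range(out_w)
        ])
    return out


# Hypothetical registry: only the shapes someone bothered to hand-tune.
HAND_CODED: Dict[Tuple[int, int], Callable[[Matrix, Matrix], Matrix]] = {
    (3, 3): conv2d_3x3,
}


def conv2d(image: Matrix, kernel: Matrix) -> Tuple[Matrix, str]:
    """Dispatch to a hand-coded kernel if one exists, else fall back."""
    shape = (len(kernel), len(kernel[0]))
    fast = HAND_CODED.get(shape)
    if fast is not None:
        return fast(image, kernel), "hand-coded"
    return conv2d_generic(image, kernel), "generic fallback"


if __name__ == "__main__":
    img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    _, path_3x3 = conv2d(img, [[1, 0, -1], [2, 0, -2], [1, 0, -1]])
    _, path_2x2 = conv2d(img, [[1, 1], [1, 1]])
    print(path_3x3, "/", path_2x2)
```

In this sketch the 3x3 kernel takes the hand-written path while the 2x2 kernel drops to the generic one. Both paths compute the same thing; the difference in a real library would be speed, and the fallback is where "off the beaten path" performance tends to collapse.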