New AI HYENA Destroys Old AI Models and Breaks Speed and Memory Records

Channel: AI Revolution
Something big just dropped, and it's not another Transformer upgrade. It's something completely different. Liquid AI, a Boston startup spun out of MIT, just revealed Hyena Edge on April 25th, right before ICLR 2025 kicks off in Singapore.
And yeah, hyena, like the animal that laughs at lions, except this thing is laughing at latency graphs on your phone. It's built to run powerful AI right on your device, faster and lighter than anything we're used to. And it might just be the first real sign that the Transformer era is starting to crack.
So, let's rewind just a bit. For years, we've been in this romance with the Transformer architecture because of that parallelizable attention mechanism Vaswani and friends introduced back in 2017. It's gotten us some wild breakthroughs, sure, but there's a big catch.
Squeezing these chunky Transformer models onto a smartphone without frying your battery or devouring your RAM has been painful. Most edge-optimized models, like SmolLM2, like Phi, even Meta's Llama 3.2 1B, still lug around standard attention blocks and rely on kernels that are great on a data-center-grade GPU, but not so hot on the Snapdragon inside your pocket.
Liquid AI is basically saying, "Why not ditch most of that attention baggage and do something leaner?" Enter Hyena Edge, a convolution-based multi-hybrid model. Convolutions in a language model?
Yep. ConvNets aren't new. They rule in vision, but here they're part of this broader family of operators called Hyena, which Michael Poli's group kicked off a couple years ago.
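To make "a convolution as a sequence mixer" concrete, here is a minimal pure-Python sketch of a causal 1-D convolution followed by an elementwise gate. It is only an illustration of the general idea, not Liquid AI's Hyena operator: the function names are made up, and the gate is passed in directly, whereas a real model would compute both the gate and the filter from learned projections.

```python
def causal_conv(x, kernel):
    """Causal 1-D convolution: out[t] only sees x[t], x[t-1], ... (never the future)."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:  # positions before the start of the sequence count as zero
                acc += w * x[t - k]
        out.append(acc)
    return out


def gated_conv(x, gate, kernel):
    """Multiply each convolved position by a gate value (hypothetical helper)."""
    return [g * y for g, y in zip(gate, causal_conv(x, kernel))]


x = [1.0, 2.0, 3.0, 4.0]
gate = [1.0, 0.0, 1.0, 0.5]   # assumed gate values; a real model derives them from the input
kernel = [0.5, 0.5]           # averages the current and previous position
print(gated_conv(x, gate, kernel))  # prints [0.5, 0.0, 2.5, 1.75]
```

The inner loop runs once per filter tap, so the cost grows with sequence length times filter length rather than with sequence length squared.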
The Edge variant takes it further by replacing roughly two-thirds of the grouped-query attention (GQA) operators inside a top-shelf Transformer++ backbone with these gated convolutions from the Hyena family. That swap alone cuts a heap of memory overhead and avoids the quadratic-time blow-up that attention brings. Now, Liquid AI didn't just eyeball some abstract benchmarks on a desktop GPU and declare victory.
They ran the whole thing on an actual Samsung Galaxy S24 Ultra, yes, the phone you might have in your pocket right now, and compared it to a parameter-matched GQA Transformer++ model.
Prefill latency? Hyena Edge was faster across the board, and at longer contexts we're talking up to 30% quicker.
Decode latency, same story. Once you hit sequences over 256 tokens, that convolution magic really kicks in. And memory usage was lower at every sequence length they measured, which is huge when your app's data has to fit between Spotify, TikTok, and the photo album of your cat.
What about accuracy, though? Because, let's be honest, nobody cares if a model is lightning fast if it can't finish your sentence. Liquid AI trained both models on the exact same 100 billion tokens and then unleashed them on a battery of standard language model benchmarks.
On WikiText, Hyena Edge's perplexity dropped to 16.2, compared to 17.3 from the Transformer baseline.
LAMBADA went from 10.8 down to 9.4.
On PIQA, the accuracy nudged up from 71.1 to 72.3.
HellaSwag saw a jump from 49.3 to 52.8.
WinoGrande climbed from 51.4 to 54.8.
ARC-Easy crept up from 63.2 to 64.4, and ARC-Challenge pushed from 53.34 to 55.2.
One funny footnote: both models tied on a PIQA variant at 31.7, so the Hyena didn't win absolutely everything, but it never fell behind either.
Net result: speed and memory savings come with equal or better predictive oomph, which is the holy grail for on-device AI. Okay, but how did they actually arrive at that architecture?
This is where it gets super nerdy, but also kind of sci-fi cool. Back in December 2024, Liquid AI unveiled something called STAR, the Synthesis of Tailored Architectures framework. Picture an evolutionary algorithm wearing a lab coat.
You feed it a bunch of primitive operators, some constraints about latency and memory, sprinkle in linear systems theory, and then let it evolve architectures generation after generation. For Hyena Edge, they kicked off with a population of 16 candidate models. Over 24 generations, STAR juggled 18 different convolution options:
Hyena-Full, Hyena-X, and Hyena-Y, with filter lengths ranging from 3 to 128, plus several flavors of grouped-query attention and SwiGLU feed-forward layers. Every candidate got its latency and memory profiled on the actual S24 Ultra, not a random desktop card. And they even trained each mini model for 5 billion tokens to keep score on perplexity in real time.
As the evolutionary cycles rolled on, the Hyena-Y operator kept muscling its way to the front of the pack. It turns out this variant strikes that sweet balance: plenty of expressive power without the inner convolution overhead you see in Hyena-Full, and a lighter gating setup than Hyena-X. STAR could literally visualize how many self-attention, Hyena, and SwiGLU blocks were inside each generation. Watch the walkthrough video Liquid AI posted and you'll see those histograms shifting over time: self-attention bars shrinking, Hyena-Y bars swelling, latency curves dipping. It's kind of like watching natural selection, but with code blocks instead of genes.
By the final generation, STAR spat out the design that became Hyena Edge: 32 layers deep, width 2,048, attention head size 64, and two-thirds of what used to be GQA replaced by Hyena gated convolutions.
No hand-tuning, no "trust me, bro," just hardcore automated search. And they stress-tested the outcome directly on the phone again to be sure those earlier approximations held true at full scale, which they did. Otherwise, we wouldn't be talking about this.
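For a feel of what "evolving architectures generation after generation" means, here is a toy mutate-and-select loop in Python. It is not STAR itself, which is considerably more sophisticated. The per-operator costs and the quality penalty are invented numbers, and the six-layer genome is an assumption chosen to keep the example small. Only the population size of 16 and the 24 generations come from the video.

```python
import random

# Invented cost per operator (latency plus a loss proxy), lower is better.
OPERATOR_COST = {"gqa": 40, "hyena_x": 28, "hyena_y": 22}
MIN_ATTENTION_LAYERS = 2   # assumed: too little attention hurts quality
QUALITY_PENALTY = 50       # assumed penalty when that minimum is not met
NUM_LAYERS = 6             # toy depth, far smaller than a real model
POPULATION = 16            # population size quoted in the video
GENERATIONS = 24           # generation count quoted in the video


def fitness(genome):
    """Toy objective: summed operator cost plus a penalty for too little attention."""
    cost = sum(OPERATOR_COST[op] for op in genome)
    if genome.count("gqa") < MIN_ATTENTION_LAYERS:
        cost += QUALITY_PENALTY
    return cost


def mutate(genome, rng):
    """Copy the genome and swap one randomly chosen layer for a random operator."""
    child = list(genome)
    child[rng.randrange(len(child))] = rng.choice(list(OPERATOR_COST))
    return child


def evolve(seed=0):
    rng = random.Random(seed)
    names = list(OPERATOR_COST)
    population = [[rng.choice(names) for _ in range(NUM_LAYERS)]
                  for _ in range(POPULATION)]
    for _ in range(GENERATIONS):
        population.sort(key=fitness)
        survivors = population[: POPULATION // 2]   # keep the better half
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(POPULATION - len(survivors))]
        population = survivors + children
    return min(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

Because the better half always survives, the best score never gets worse from one generation to the next. With these made-up numbers, the cheapest possible genome keeps two attention layers and fills the rest with `hyena_y`.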
One angle I love is how they benchmarked responsiveness on short prompts, because that's where hybrids usually fall flat. Edge apps like voice assistants often fire off queries under 20 tokens, so shaving milliseconds there is everything.
Liquid AI reports their prefill latency advantage is visible right from the shortest sequences. That's basically the model's first impression for the user, and the gap only widens as you feed in longer context. Even if you're just sending a single sentence to your on-device language model, Hyena Edge can answer faster than its Transformer twin.
Now, a quick side tour. Grouped-query attention, GQA, was already an optimization to make Transformers more manageable by letting multiple query heads share key-value heads. It's lighter than full attention, but still attention at heart.
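A rough way to see why sharing key-value heads helps is to size the key-value cache a decoder keeps around during generation. The sketch below is back-of-the-envelope arithmetic for a generic Transformer, not a measurement of Hyena Edge or its baseline. The depth of 32 and head size of 64 are the figures quoted in the video, while the head counts, sequence length, and 16-bit storage are assumptions for illustration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence (the factor 2 is K plus V)."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


LAYERS, HEAD_DIM = 32, 64       # depth and head size quoted in the video
QUERY_HEADS, KV_HEADS = 32, 8   # assumed head counts, for illustration only
SEQ_LEN = 1024                  # assumed context length

# Full multi-head attention keeps one KV head per query head.
full = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, SEQ_LEN)
# Grouped-query attention lets several query heads share each KV head.
grouped = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)

print(full // 2**20, "MiB vs", grouped // 2**20, "MiB")  # prints 256 MiB vs 64 MiB
```

The cache shrinks in proportion to the number of KV heads, but it still grows with every token in the context, which is the part GQA does not remove.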
Hyena Edge swaps most of those heads out entirely, and that's what slices the quadratic term down. Convolution operations scale linearly with sequence length. So for 512 or 1,024 tokens, you're looking at serious compute savings.
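The quadratic-versus-linear point can be checked with a simple operation count. This is a rough sketch that counts only the token-mixing multiply-adds and ignores projections, feed-forward layers, and real kernel behavior, so it says nothing about measured latency on a phone. The width is an assumed value (it cancels out of the ratio), and the filter length of 128 is the longest one mentioned in the video.

```python
def attention_mixing_ops(seq_len, width):
    """Rough multiply-adds for the score matrix plus the weighted sum over values."""
    return 2 * seq_len * seq_len * width


def conv_mixing_ops(seq_len, width, filter_len):
    """Rough multiply-adds for a per-channel (depthwise) causal convolution."""
    return seq_len * width * filter_len


WIDTH = 2048       # assumed model width; it cancels in the ratio below
FILTER_LEN = 128   # longest filter length mentioned in the video

for n in (256, 512, 1024):
    ratio = attention_mixing_ops(n, WIDTH) / conv_mixing_ops(n, WIDTH, FILTER_LEN)
    print(f"{n:5d} tokens: attention needs about {ratio:.0f}x the mixing ops")
```

Under these assumptions the ratio is 2 × sequence length ÷ filter length, so it doubles every time the context doubles: about 4x at 256 tokens, 8x at 512, and 16x at 1,024.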
The kicker is that Liquid AI engineered their gated convolutions so they still capture long-range dependencies, something older convolutions struggled with. That's how they kept or improved the perplexity numbers without falling back to heavy attention. All that said, Liquid AI isn't keeping this in a vault.
They've already stated loudly that they plan to open-source Hyena Edge and a series of Liquid foundation models in the coming months. If you're like me, that sentence feels like Christmas morning. It means developers will get a turnkey model that can run natively on stuff like the S24 Ultra, maybe the iPhone 16 Pro, maybe even a Raspberry Pi if you dare, without requiring a cloud subscription or a 5-watt charger.
And because it's open, we'll see a thousand forks. Someone will quantize it to 4-bit. Someone else will fine-tune it for coding assistance.
Another team will port it to watchOS. Mark my words. More broadly, this is part of a bigger trend.
We're stepping into a post-Transformer world, or at least a poly-architecture ecosystem. Transformers are still unbeatable for heavy GPU jobs, though. But when it comes to edge devices where every bit of energy matters, hybrids built from convolutions, recurrent models, and even state-space models are finally getting their moment.
And with smart tools like STAR doing the architecture search, we're moving faster than manual tweaking ever allowed. But the best part: everything's tested directly on real devices like smartphones, not just on GPUs in a lab. So what works in theory also works in your hand.
Zooming out a little, phones now have powerful NPUs, laptops are shipping with crazy AI accelerators, and there's pressure to keep AI local for privacy. Models that can crush benchmarks like LAMBADA at a perplexity of 9.4, use less RAM, and respond up to 30% faster could be exactly what tips the scale.
Running everything on devices just feels better. No lag, no cloud dependency, no leaking personal data when you're offline. Credit where it's due.
The team behind this, listed under Liquid Science, includes Armin Thomas, Stefano Massaroli, Michael Poli, and the rest of Liquid Edge. They built on the Hyena-X and Hyena-Y work, plus tweaks like SwiGLU, and of course, the original Transformer ideas. It's less about one-off brilliance and more about letting algorithms evolve designs smarter than we could by hand.
One quick note: early versions of Hyena Edge were tested at a width of 512 before scaling up to 2,048 for the final model, keeping attention head size at 64. During STAR's evolution runs, they estimated whole-model latency and memory by adding up pre-measured operator stats, which let them move fast without wasting full training cycles. They even visualized it.
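The "adding up pre-measured operator stats" idea can be sketched in a few lines of Python. Everything numeric here is invented for illustration: the operator names, the microsecond and kilobyte figures, and the two six-layer candidates are assumptions, not Liquid AI's measurements. Treating memory as a plain sum is also a simplification.

```python
# Hypothetical per-operator measurements: (latency in microseconds, memory in kilobytes).
OPERATOR_STATS = {
    "gqa": (1800, 12000),
    "hyena_y": (900, 5000),
    "swiglu": (1100, 8000),
}


def estimate_model_cost(layers):
    """Estimate whole-model latency and memory by summing per-operator stats."""
    latency_us = sum(OPERATOR_STATS[op][0] for op in layers)
    memory_kb = sum(OPERATOR_STATS[op][1] for op in layers)
    return latency_us, memory_kb


# Two made-up candidates: attention everywhere vs. mostly Hyena-Y.
candidate_a = ["gqa", "swiglu"] * 3
candidate_b = ["hyena_y", "swiglu", "hyena_y", "swiglu", "gqa", "swiglu"]

print(estimate_model_cost(candidate_a))  # prints (8700, 60000)
print(estimate_model_cost(candidate_b))  # prints (6900, 46000)
```

A lookup-and-sum like this costs almost nothing per candidate, which is why it suits a search loop that has to score many architectures before any of them is trained at full scale.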
Watching those models slide toward the bottom left corner where low latency meets low perplexity was like watching a stock ticker only cooler. So where are we headed? If the last few years belong to transformers, the next could be ruled by automated architecture search, hybrid models, and real practical edge AI.
Hyena Edge proves you can rip out most of the attention, still match or beat quality, and get way faster on real-world devices. And since Liquid AI plans to open-source it, they're inviting everyone to take it even further. Is this the future?
Powerful AI running straight from your pocket with no cloud in sight, or are we just dreaming too big, too soon? Thanks for watching, and I'll catch you in the next one.