Walking into a virtual world that literally builds itself as each step is taken, or watching a quick idea become a polished video scene without a massive production crew: this is the kind of future hinted at by Google's new AI model, Genie 2. Meanwhile, Tencent's HunyuanVideo is giving us a taste of Hollywood-level visuals without the Hollywood budget, all completely open source. So let's talk about it. All right, let's start with Google DeepMind's Genie 2. If you're not familiar with world models, they're basically AI systems that create and simulate entire environments in real time.
Genie 2 is a significant leap from its predecessor, which was stuck in two dimensions. Now we get full 3D worlds, generated as you or another AI agent move through them. It's not a game engine, though; it's a diffusion model that outputs frames as you explore. Think of it like walking forward in a virtual world and having the scenery appear right before your eyes, as if it's being rendered on the fly. This approach allows for the dynamic creation of complex scenes, and even though it's still early days, it offers a window into how future AI-driven simulations might evolve.
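To make that concrete, here's a minimal sketch of how an action-conditioned world model of this kind rolls out frames. Genie 2's actual architecture hasn't been published, so everything here, including the `denoise_next_frame` placeholder and the sliding context window, is illustrative rather than a description of the real system:

```python
# Conceptual sketch of an action-conditioned world-model rollout.
# Genie 2's internals are not public; denoise_next_frame and the
# context window below are illustrative placeholders.
import numpy as np

FRAME_SHAPE = (360, 640, 3)  # height, width, RGB channels

def denoise_next_frame(context_frames, action, rng):
    """Stand-in for the diffusion step: a real world model would
    iteratively denoise a latent conditioned on recent frames and the
    agent's action. Here we just return noise of the right shape."""
    return rng.standard_normal(FRAME_SHAPE).astype(np.float32)

def rollout(prompt_frame, actions, context_len=16, seed=0):
    """Autoregressive generation: each new frame is conditioned on a
    sliding window of recent frames plus the current action."""
    rng = np.random.default_rng(seed)
    frames = [prompt_frame]  # a single image prompt starts the world
    for action in actions:
        context = frames[-context_len:]  # finite context is one reason
                                         # consistency decays over time
        frames.append(denoise_next_frame(context, action, rng))
    return frames

start = np.zeros(FRAME_SHAPE, dtype=np.float32)  # e.g. an Imagen 3 image
frames = rollout(start, actions=["forward"] * 24)  # ~1 second at 24 fps
print(f"generated {len(frames)} frames")
```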
It can even depict basic interactions with environmental elements like water, smoke, and simple physics effects, showcasing how these digital worlds could one day feel more reactive and natural. Genie 2 can handle multiple viewpoints: third person, first person, isometric. And it only needs a single image prompt to get going; that prompt can come from Google's Imagen 3 model or even a photo snapped in real life. Once it starts generating frames, the model tries to maintain internal consistency: it can remember where objects are even after they're out of sight, then reconstruct them accurately when they come back into view. This solves a big problem previous models like Oasis had, where the environment would forget key details whenever they left the camera's perspective. However, this model isn't perfect. DeepMind admits it can maintain a consistent world for up to around 60 seconds; beyond that, visual artifacts begin appearing, details degrade, and the illusion of a stable environment breaks down. Most demo clips are shorter, around 10 to 20 seconds, so we don't yet have sustained, long-duration sequences that feel fully coherent. It's also unclear exactly how Genie 2 was trained, other than that it used a large-scale video dataset.
At the moment, DeepMind isn't releasing the model publicly, viewing it instead as a research tool for training and evaluating AI agents. In fact, DeepMind specifically notes that Genie 2 could be used to train and evaluate its own SIMA agent, indicating how integral it could become in the development of other advanced AI systems. It could also help artists and designers rapidly sketch out concepts, acting as a creative prototyping engine. In the long run, DeepMind believes that world models like Genie 2 could be crucial in pushing toward artificial general intelligence, as they let AI learn in diverse, ever-changing simulated settings. What's particularly compelling about Genie 2 is how it hints at a broader future for AI in fields like game development, virtual cinematography, or even VR training simulations. With a more mature version of such a tool, you could imagine developers rapidly spinning up environments that shift based on a player's actions while preserving certain narrative beats or artistic elements. Future iterations might overcome the short time limit, enabling minutes or even hours of consistent generation. The hope is that scaling the dataset, refining training methods, or incorporating more advanced memory mechanisms could lead to more stable, richer, and far longer-lasting worlds. Over time, these models may integrate with traditional game engines or be guided by human designers who provide narrative structures and style parameters. This could produce a hybrid creation pipeline where the raw building blocks of the world are quickly generated by AI, and humans refine and shape the final product to maintain that emotional resonance and handcrafted charm. All right, now let's take a closer look at Tencent's HunyuanVideo, which has quietly entered the scene.
While OpenAI has been teasing Sora for ages, Tencent quietly launched its own solution and made it open source. They claim HunyuanVideo rivals or outperforms top-notch models like Runway Gen-3, Luma 1.6, and several leading Chinese video generators. According to human evaluations, its results hold up strongly in terms of image quality and motion consistency. Early tests even suggest it can match or surpass the quality of other commercial heavyweights, including Luma Labs' Dream Machine and Kling AI, placing HunyuanVideo in the top tier of available solutions. HunyuanVideo's architecture is interesting: it uses a decoder-only multimodal large language model as its text encoder, rather than the standard CLIP or T5-XXL setups. This approach helps it follow instructions better and grasp fine details. It also employs a token refiner that takes a simple prompt, like "a man walking his dog," and automatically enriches it with more details to produce higher-quality output. This refinement can add elements like specific lighting conditions, intricate scene setups, or subtle attributes of the subjects, ensuring the final output feels more nuanced and complete than what a brief initial prompt might suggest. The result is more vivid, descriptive generations without the user needing to write a long prompt.
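Here's a tiny sketch of the input/output contract that kind of prompt refinement implies. The real token refiner is a learned component inside the model; the hand-written `enrich_prompt` below is just a stand-in to show the idea:

```python
# Minimal sketch of the prompt-enrichment idea behind a token refiner.
# The real refiner is a learned model; this fixed template is only a
# stand-in to illustrate the short-prompt-in, rich-prompt-out contract.
def enrich_prompt(prompt: str) -> str:
    """Expand a terse user prompt with the kinds of details a video
    model benefits from: lighting, camera work, scene composition."""
    return (
        f"{prompt}, golden-hour lighting, shallow depth of field, "
        "steady tracking shot, detailed background, natural motion"
    )

print(enrich_prompt("a man walking his dog"))
```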
Tencent went all in on size, giving HunyuanVideo 13 billion parameters, and training this beast isn't simple. They used a multi-stage approach, starting with low-resolution image training at 256 pixels, then scaling up and mixing images and video over time. By gradually increasing resolution and length, they achieved stable convergence and good results. The final product can generate text-to-video, turn still images into moving scenes, create animated avatars, and even produce audio for video content.
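To picture what that multi-stage curriculum looks like, here's an illustrative schedule. Only the 256-pixel starting resolution comes from what's described above; the other stage values are invented for the sake of the example:

```python
# Illustrative progressive-training curriculum in the spirit of the
# multi-stage recipe described above. Only the 256px starting point is
# from the source; the other values are invented for illustration.
STAGES = [
    # (resolution, frames per clip, fraction of image-only batches)
    (256, 1,  1.0),  # stage 1: low-resolution images only
    (256, 16, 0.5),  # stage 2: mix in short video clips
    (512, 32, 0.3),  # stage 3: scale resolution and clip length
    (720, 64, 0.1),  # stage 4: mostly video at target quality
]

for i, (res, n_frames, img_frac) in enumerate(STAGES, start=1):
    kind = "images" if n_frames == 1 else f"{n_frames}-frame clips"
    print(f"stage {i}: {res}px {kind}, {img_frac:.0%} image batches")
```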
Now, running HunyuanVideo locally is tough: you need a huge GPU with at least 60 GB of memory, which is more than most gaming PCs have. But since it's open source, developers can work around this. There are cloud services like fal.ai offering pay-per-video solutions, and the official HunyuanVideo server sells credits, so if you're willing to pay a bit, you can access it without top-tier hardware. It's worth noting that early testers report a generation time of about 15 minutes per video, which, while not instant, is still a practical turnaround for producing high-quality AI-generated sequences.
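For a rough sense of why a 13-billion-parameter video model wants that much memory, here's a back-of-the-envelope calculation. The parameter count comes from above; the activation overhead factor is purely an assumption:

```python
# Back-of-the-envelope for the reported ~60 GB requirement. The 13B
# parameter count is from the source; the activation overhead factor
# is a rough assumption, not a measured figure.
params = 13e9
bytes_per_param = 2  # fp16 / bf16 weights
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")  # ~26 GB

# Denoising long, high-resolution latent video adds activations and
# latents on top of the weights; assuming roughly 1.3x the weight
# footprint for that overhead lands near the reported figure.
total_gb = weights_gb * (1 + 1.3)
print(f"rough total: ~{total_gb:.0f} GB")  # ~60 GB
```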
Early tests are promising: videos can look photorealistic, with smooth motion, and can depict humans, animals, and environments with surprising accuracy. One minor drawback: it's not as strong with English prompts compared to some commercial models. But because it's open source, the community can improve it over time. By releasing HunyuanVideo openly, Tencent is not only challenging models like Runway's and Luma's but also setting the stage for competition with upcoming solutions like OpenAI's Sora. This openness matters. For years, the biggest breakthroughs often came from private labs that didn't share their code; now Tencent is giving developers the full model, which can accelerate innovation. Startups, researchers, and indie creators can tweak and fine-tune it, close any language gaps, and maybe even push quality beyond what Tencent achieved. Think of it as a community-driven R&D approach to video generation. Comparing Genie 2 and HunyuanVideo: Genie 2 focuses on generating entire 3D worlds for agents to move around in real time, mainly for training or testing AI, while HunyuanVideo focuses on generating video content, photorealistic sequences, animated scenes, and more. Genie 2 is about environments and interactable spaces, while HunyuanVideo is about producing high-quality video from text or images. Both represent important frontiers in how AI can create visual content on demand.
So where does OpenAI's teased Sora fit in? OpenAI has been on a roll with ChatGPT, image models, and various improvements, but Sora remains a no-show. Meanwhile, Tencent just released a tool that might match or surpass what we've seen from commercial generators. This could prompt OpenAI to move faster or offer something more groundbreaking. We'll see how OpenAI responds: will Sora arrive with a splash, or has Tencent already raised the bar too high? As these models evolve, we'll find out what direction the industry takes. Until then, keep an eye on Genie 2 and HunyuanVideo; these are signs that the future of AI-generated worlds and video has already begun. All right, let me know what you think in the comments, and if you enjoyed this, make sure to like and subscribe for more AI updates. Thanks for watching, and see you in the next one.