Did you know that the AI startup DeepSeek, the one that shook up the AI world and sent shockwaves through America's biggest tech companies, is actually run by a team of fresh graduates and undergrad interns? Their V2 model didn't just raise eyebrows; it started a price war in China, forcing giants like Tencent, Alibaba, and Baidu to slash their own AI prices overnight. And their CEO, Liang Wenfeng, has made a bold promise: DeepSeek will always stay open source. His goal is for China to stop imitating US innovation and start leading it. At the same time, another AI challenger, Kimi K1.5 from Moonshot AI, has been making waves of its own, outperforming even GPT-4o in math and coding benchmarks and proving that China's AI scene isn't just catching up; it's rewriting the rules. How did DeepSeek pull this off? What's their secret? And where does Kimi K1.5 fit into this race? This is the story of DeepSeek and the AI revolution happening right now.

DeepSeek emerged on the Chinese AI scene as a low-profile startup with a surprising impact. It began attracting attention partly because it had a private quant fund, High-Flyer, behind it, which had invested heavily in Nvidia A100 GPUs, and partly because it drove a sudden price war in the large language model space. Its model, DeepSeek V2, introduced a cost for inference so low, about one yuan per million tokens, that competitors felt pressure to slash their own prices. Tencent, Baidu, Alibaba, and ByteDance eventually followed the move, even if it meant taking a loss. Observers compared DeepSeek's approach to the discount-oriented Pinduoduo, calling it the "Pinduoduo of AI," though DeepSeek itself did not rely on subsidies or burn cash to offer its low prices. The secret hinged on a new model architecture that brought GPU memory usage down to a fraction of the usual baseline: MLA, multi-head latent attention, replaced the typical multi-head attention mechanism (MHA) while operating at only 5 to 13% of the memory footprint. The team also implemented DeepSeekMoE, a sparse mixture-of-experts design that cut unnecessary computation, which allowed them to reduce operating costs so effectively that they could turn a profit while other companies struggled.

In global circles, analysts began to talk about DeepSeek as a mysterious new force in AI. Andrew Carr, a former OpenAI employee, adopted some of its training ideas in his own work, and Anthropic's Jack Clark described DeepSeek's researchers as a group of extremely capable minds. He saw the company as part of China's drive to produce influential technology on par with its role in drones and electric vehicles. DeepSeek's founder, Liang Wenfeng, had always believed that chasing immediate applications was less valuable than pushing the boundaries of architecture. He had a background in advanced engineering and AI research, and he spent years in the background developing ways to scale up deep learning systems with minimal overhead. He felt that reliance on short-term replication of foreign breakthroughs would perpetually keep Chinese AI labs behind the curve, so he preferred to focus on fundamental changes, even if that meant risking time and resources on projects that might not pan out. Part of the company's broader goal is progress toward AGI, so the team devotes itself to exploring the infrastructure and frameworks that might support generalized intelligence rather than merely shipping a ChatGPT clone.
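The MLA idea described above is easiest to picture as a smaller KV cache: instead of storing full per-head keys and values for every token, the model stores one compressed latent per token and re-expands it at attention time. The snippet below is a minimal, illustrative sketch of that principle, not DeepSeek's actual implementation; the dimensions (d_model, n_heads, head_dim, latent_dim) are invented purely to show the cache-size difference.

```python
# Minimal sketch of the low-rank KV-cache idea behind MLA (multi-head latent
# attention). NOT DeepSeek's implementation; all sizes below are assumptions.
import torch
import torch.nn as nn

d_model, n_heads, head_dim, latent_dim = 4096, 32, 128, 512  # assumed sizes

class LatentKVCache(nn.Module):
    def __init__(self):
        super().__init__()
        # Compress each token's hidden state into one small latent vector...
        self.down = nn.Linear(d_model, latent_dim, bias=False)
        # ...and re-expand it into per-head keys and values when attending.
        self.up_k = nn.Linear(latent_dim, n_heads * head_dim, bias=False)
        self.up_v = nn.Linear(latent_dim, n_heads * head_dim, bias=False)

    def compress(self, hidden):            # hidden: [batch, seq, d_model]
        return self.down(hidden)           # cached: [batch, seq, latent_dim]

    def expand(self, latent):              # latent: [batch, seq, latent_dim]
        return self.up_k(latent), self.up_v(latent)  # [batch, seq, n_heads*head_dim] each

cache = LatentKVCache()
latents = cache.compress(torch.randn(1, 16, d_model))  # only this goes in the KV cache
k, v = cache.expand(latents)                            # re-expanded at attention time

# Per-token cache cost in floats: standard MHA stores full K and V per head,
# the latent cache stores only the compressed vector.
mha_cache_per_token = 2 * n_heads * head_dim   # 8192
mla_cache_per_token = latent_dim               #  512
print(mla_cache_per_token / mha_cache_per_token)  # 0.0625 -> roughly the 5-13% range quoted above
```

With these toy numbers the per-token cache is about 6% of the MHA baseline, the order of saving the transcript attributes to MLA; the real design also has to handle positional encodings and the attention computation itself, which this sketch omits.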
Moonshot AI, another newcomer in Beijing, took a different route, focusing on a multimodal large language model named Kimi K1.5. The model gained attention for outrunning GPT-4o and Claude 3.5 Sonnet on metrics such as MATH-500, where it earned a score of 96.2; it also hit 77.5 on AIME and landed in the 94th percentile on Codeforces. Beyond these tests it achieved impressive numbers on MathVista and MMMU, thanks to strategies like rejection sampling, partial rollouts, and length penalties during its reinforcement learning phase.
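Moonshot has only described those training tricks at a high level. As a rough illustration, rejection sampling keeps only sampled answers that pass a correctness check, and a length penalty discounts the reward of overly long chains of thought. The sketch below is hypothetical and simplified, not Moonshot's pipeline; the verifier function, the target length, and the penalty weight are all assumptions.

```python
# Illustrative sketch of rejection sampling (keep only answers judged correct)
# combined with a length penalty on the reward. Hypothetical, not Moonshot's code.
from typing import Callable, List, Tuple

def select_training_samples(
    samples: List[str],                    # candidate answers sampled from the model
    is_correct: Callable[[str], bool],     # assumed verifier, e.g. a math answer checker
    max_len: int = 2048,                   # assumed target length
    penalty_weight: float = 0.5,           # assumed strength of the length penalty
) -> List[Tuple[str, float]]:
    kept = []
    for text in samples:
        if not is_correct(text):           # rejection sampling: drop wrong answers
            continue
        length = len(text.split())         # stand-in for a real token count
        overflow = max(0, length - max_len) / max_len
        reward = 1.0 - penalty_weight * overflow   # longer-than-target answers earn less
        kept.append((text, reward))
    return kept

# Example: keep only correct answers, rewarding shorter chains of thought slightly more.
demo = select_training_samples(
    ["... therefore x = 4", "... therefore x = 5"],
    is_correct=lambda t: t.strip().endswith("x = 4"),
)
print(demo)
```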
The developers equipped Kimi K1.5 with a 128K-token context window, which means it can handle extremely lengthy input without losing track of earlier details. Because it supports text, image, and code processing, users can feed it various forms of content in a single go. It manages up to 50 files at once, including PDFs, slides, and documents, and it can even run real-time searches across over 100 sites. The team at Moonshot made it freely accessible through their chat interface at kimi.ai: to use it, one can create an account, pick the offline or online mode, and toggle between normal Kimi and the Kimi K1.5 long-thinking version. The offline mode is meant to analyze local inputs without searching the internet, and the online mode brings in web results.

DeepSeek R1, a newer release from DeepSeek, came out soon after V2 and caused just as much conversation in tech circles. Reviews pointed to strong performance in coding and reasoning tasks, along with a generally open-source approach that aligned with DeepSeek's philosophy. Observers decided to compare Kimi K1.5 with DeepSeek R1 on several practical tasks, beginning with image analysis. Each model received two images, each containing numeric data for various large language models. Kimi K1.5 managed to parse the text more accurately and identify the correct entries, while DeepSeek R1 ended up comparing values that had not been mentioned for one of the models. Both mixed parameters to a degree, since neither stuck strictly to overlapping attributes in the images, but Kimi K1.5 still looked stronger because it interpreted the data a little better.
Then they tried web searching to locate red gowns under $200. DeepSeek R1 returned multiple links, though a few turned out to be irrelevant or outside the desired price range; Kimi K1.5 gave two direct links that fulfilled the request and posted supplemental options in a side panel, which made it more focused on the price and color constraints. Next they tested how each model handled multiple files at once: Kimi K1.5 parsed at least two files out of three and then summarized them, while DeepSeek R1 stumbled and returned no effective unified summary unless given the files separately. Finally, the two models were tasked with generating HTML code for a snakes-and-ladders game. DeepSeek R1's attempt came across as more advanced, with clearer modular features and a more playable interface, while Kimi K1.5 presented something simpler that allowed the game tokens to stray beyond the boundaries of the board. Neither model managed to implement real snakes or ladders, so the code they produced stuck mostly to random movement on a board layout, and DeepSeek R1 was recognized for the stronger coding output there.
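For context on what both models left out, a "real" snakes-and-ladders implementation needs more than random movement: it needs a jump table that sends a piece from the bottom of a ladder to its top and from a snake's head to its tail. The short sketch below shows that move logic in Python rather than HTML; the board size and jump positions are invented for illustration.

```python
# Minimal sketch of the move logic the generated games skipped:
# a dice roll plus a snakes/ladders jump table. Layout is invented.
import random

JUMPS = {3: 22, 8: 30, 28: 84,     # ladders: land on the bottom, climb to the top
         17: 4, 54: 19, 93: 73}    # snakes: land on the head, slide to the tail

def move(position: int, board_size: int = 100) -> int:
    roll = random.randint(1, 6)
    target = position + roll
    if target > board_size:          # overshooting the last square wastes the turn
        return position
    return JUMPS.get(target, target) # apply a snake or ladder if one starts here

pos = 0
while pos < 100:
    pos = move(pos)
print("Reached square 100")
```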
After the scores were tallied, Kimi K1.5 finished with three points and DeepSeek R1 finished with one. Although these tests were not exhaustive, the comparison showcased noticeable differences in functionality: DeepSeek R1 struggled with large combined file inputs but excelled at coding, while Kimi K1.5 performed better on tasks like web searching, basic image analysis, and summarizing multiple documents.
Kimi K1.5 is also known for having no usage limits in its free tier, while DeepSeek usually focuses on advanced architectural breakthroughs to reduce cost and keep usage fees low for enterprise-scale clients. DeepSeek R1 and Kimi K1.5 both follow an open-source methodology, which they credit for speeding up progress across the AI community. DeepSeek's leadership believes that code secrecy only gives a fleeting advantage, so they choose to publish their breakthroughs, convinced that true value comes from the team's deep knowledge rather than from locking up the code. Kimi K1.5 is also free partly because Moonshot wants to encourage widespread development and attract outside collaborators. Users who want to explore these tools can register at chat.deepseek.com for DeepSeek R1 or go to kimi.ai for Kimi K1.5.
DeepSeek has a straightforward interface called DeepThink, while Kimi's chat interface offers toggles for both online and offline modes.
Kimi K1.5 can read up to 128K tokens in a single session, which makes it compatible with entire books or large data sets that require context. It handles chain of thought in a short or long format, which can be adjusted based on the user's preference for depth and step-by-step explanation. Kimi's math and coding accomplishments come from that combination of reinforcement learning and specialized training sets that included partial rollouts and structured feedback. The math tests in particular, such as the MATH-500 data set, show that it can solve advanced problems with an accuracy that surpasses GPT-4o, and the Codeforces results place it in the 94th percentile, an indicator of how well it can generate or evaluate code. These feats are especially impressive because Moonshot AI only started up in 2023, so it has not been around very long.

DeepSeek's story dates back a bit further, through the founder's roots in Guangdong and his years spent at High-Flyer. The earliest rumblings about DeepSeek involved the purchase of thousands of A100 GPUs, which seemed excessive until people realized the team was preparing to build enormous next-level models. The DeepSeek V2 release stunned observers by delivering something that required only a fraction of the usual GPU memory load: the MLA design replaced MHA, and DeepSeekMoE sparsity further trimmed compute requirements. Analysts said the new architecture likely matched the best American labs in efficiency, which gave DeepSeek an edge in price competition.
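The DeepSeekMoE sparsity mentioned here comes down to routing: each token is sent to only a few small expert networks instead of one large dense feed-forward block, so most parameters stay idle on any given token. The snippet below is a generic top-k routing sketch of that principle, not DeepSeek's architecture; the expert count, top-k value, and layer sizes are assumptions chosen for illustration.

```python
# Generic sketch of sparse mixture-of-experts routing, the principle behind a
# DeepSeekMoE-style layer. NOT DeepSeek's implementation; sizes are assumptions.
import torch
import torch.nn as nn

d_model, n_experts, top_k = 1024, 16, 2   # assumed configuration

class SparseMoE(nn.Module):
    def __init__(self):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                            # x: [tokens, d_model]
        scores = self.router(x)                      # [tokens, n_experts]
        weights, picked = scores.topk(top_k, dim=-1) # keep only the top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(top_k):                    # only k of n_experts run per token
            for e in range(n_experts):
                mask = picked[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

tokens = torch.randn(8, d_model)
print(SparseMoE()(tokens).shape)   # torch.Size([8, 1024]); ~2/16 of expert compute used per token
```

The point of the sketch is the compute pattern rather than the routing details: only a couple of the expert blocks run for each token, which is where this kind of cost saving comes from; DeepSeek's published design adds refinements such as shared experts and load balancing that are omitted here.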
By the time ByteDance, Alibaba, Tencent, and others responded, it was clear that DeepSeek's approach wasn't a short-term tactic but a fundamental difference in architecture. Liang Wenfeng referred to it as a strategy for bridging the gap between Chinese labs and Silicon Valley rather than as an attempt to generate quick returns. His perspective mirrored how he had worked on AI research at a quant fund for over a decade without seeking major publicity, and in interviews he spoke about the importance of tackling problems that other teams avoided, such as rewriting the standard attention scheme from scratch. Observers see these developments as part of China's growing confidence in AI, moving from an era of merely following open-source releases from the United States to a new period of parallel or even leading-edge breakthroughs.

DeepSeek V2, DeepSeek R1, and Kimi K1.5 are not alone in that push. Other Chinese LLM startups exist, and there are also massive players like Alibaba's Qwen, Baidu's ERNIE, and Tsinghua's open research labs. What set DeepSeek and Moonshot apart was the decision to share insights with everyone, pivot toward real architectural innovation, and keep their user-facing services accessible at a lower cost. Kimi's advanced handling of chain of thought, multimodal tasks, and large-scale context signals that these teams see a future in bridging text and visual understanding; in parallel, DeepSeek invests in efficiency and cost reduction. One focuses on user-friendly, broad application features, while the other refines underlying computations until they reach minimal overhead. Both companies aim for AGI, or at least something closer to a general intelligence than a standard chatbot: one invests in new ways to store and process data with minimal hardware constraints, and the other deepens an integrated approach to text, code, and images. They share a belief that an AI revolution relies on more than chasing quick apps for chat or short-term revenue from enterprise subscriptions, and enthusiasts in the AI community are eager to see how far these methods can go and whether they will affect competition with well-known models like GPT-4 or Claude 3.5 in the long run.

DeepSeek's meteoric rise triggered ripples well beyond China, causing United States tech stocks to seesaw as investors scrambled to determine how new open-source breakthroughs might affect established giants like Microsoft, Google, and Meta. When DeepSeek V2 drove inference costs down to nearly one yuan per million tokens, analysts on Wall Street began questioning the sustainability of current pricing models in the West. Meta's stock in particular dipped briefly after rumors circulated that DeepSeek would eventually release a social platform plugin at a fraction of the cost, while Microsoft Azure and Google Cloud rushed to announce new AI pricing tiers, presumably to match the affordability that DeepSeek had so swiftly showcased. The sudden drop in inference costs led hedge funds like Point72 and Renaissance to rebalance their portfolios, adopting a wait-and-see approach and giving fresh capital to open-source AI projects that mirrored DeepSeek's cost-saving architecture.

A substantial portion of the global developer community rallied behind DeepSeek, seeing its fully open-source release as a direct counterweight to Big Tech's walled-garden approach. Forums like Hacker News and Reddit's r/MachineLearning lit up with praise for Liang Wenfeng and his team, hailing their MLA and DeepSeekMoE designs as genuine game changers. The loudest cheers often came from indie developers, small startups, and academic researchers who felt locked out by steep GPU prices and closed-source solutions; they viewed DeepSeek as the people's champion, putting advanced technology within reach without demanding massive budgets. In interviews, respected AI thinkers like Andrej Karpathy and Yann LeCun expressed cautious optimism about the changes DeepSeek might bring, applauding its architectural contributions while acknowledging that large incumbents wouldn't simply fade away. Still, DeepSeek's message of innovation over imitation resonated strongly with a global audience eager to see open-source AI dethrone entrenched Big Tech gatekeepers.

So far, the race is still wide open. OpenAI, Anthropic, and Cohere have each taken notice of DeepSeek's rapid ascent, with some insiders hinting that upcoming versions of GPT and Claude might
incorporate memory-saving tricks inspired by MLA. Meanwhile, Kimi K1.