This is a breakdown of Grok 3 from Elon Musk and xAI, as we cover its five newest features and how it's about to transform robotics, gaming, and so much more.

Developed using the company's Colossus supercluster, which leverages 200,000 GPUs and offers 10 times the computational capacity of prior models, Grok 3 demonstrates notable improvements in mathematics, coding, scientific reasoning, and instruction-following tasks. Alongside the primary model, xAI also unveiled Grok 3 mini, a smaller variant optimized for cost-efficient performance. Currently, both models are still in training and are expected to improve with user feedback, with deployment to users scheduled to begin in the coming days.

As for intelligence, Grok 3 incorporates significant advancements in reasoning, which are achieved through extensive reinforcement learning. This approach enables the model to work on complex problems for several seconds or minutes, refining its solutions by identifying errors, exploring alternative methods, and verifying outcomes. To showcase these capabilities, two beta reasoning variants, named Grok 3 and Grok 3 mini, were introduced. Testing on the 2025 American Invitational Mathematics Examination (AIME) yielded a score of 93.3% for Grok 3 at its highest test-time compute setting. Additional benchmarks include 84.6% on the graduate-level GPQA test for expert reasoning and 79.4% on LiveCodeBench for coding tasks. As for Grok 3 mini, it achieved 95.8% on the American Invitational Mathematics Examination and 80.4% on LiveCodeBench, demonstrating efficiency in STEM-focused applications.

Users can also access the reasoning process of Grok 3 through the Think feature, which provides transparency into its step-by-step problem solving. This functionality is intended to support educational and professional use cases. And in user preference metrics, Grok 3 achieved an Elo score of 1,142 on Chatbot Arena, indicating strong performance relative to competing models.

But even without its reasoning mode activated, Grok 3 still delivers rapid responses thanks to its extensive pre-training on the Colossus supercluster. In fact, it performs competitively across benchmarks including MMLU-Pro for general knowledge, GPQA for scientific expertise, and AIME for mathematical proficiency. Additionally, the model supports multimodal tasks, including image analysis, generation, and video comprehension. Its context window of 1 million tokens is eight times larger than that of previous xAI models, enabling it to process lengthy documents and complex prompts effectively. And on the LOFT benchmark, which evaluates long-context retrieval across 12 tasks, Grok 3 also achieved leading accuracy scores. An early version of Grok 3, internally codenamed Chocolate, even previously topped the LMArena Chatbot Arena leaderboard, outperforming its peers across multiple categories. xAI plans to further scale its pre-training efforts using the 200,000-GPU cluster, with even larger models anticipated in the future.

But xAI has also introduced DeepSearch, its first AI agent integrated with Grok 3. Equipped with internet access and code interpreters, DeepSearch is designed to retrieve and analyze information, resolve conflicting data, and produce detailed summaries. The agent targets applications ranging from real-time news aggregation to in-depth scientific research, offering a more advanced alternative to traditional search tools. DeepSearch will be available to users and enterprise partners as part of the upcoming Grok 3 API release.

As for access, Grok 3 is already available to a huge audience, with xAI offering unfiltered answers and advanced capabilities in reasoning, coding, and visual processing through the model's integration on the X platform gro.