I'm super excited to talk about something that's seriously changing the game in AI. You might have heard a lot of buzz about AI agents that can browse the web for you, do complicated tasks, and basically act like your own digital helper. Well, OpenAI's new agent, called Operator, is right at the center of this wave, and it's powered by a model they refer to as Computer-Using Agent, or CUA for short. There's a ton to unpack here, so let's talk about it. This video is sponsored by Growth School.

All right, so basically, Operator is an AI that navigates the internet just like a person would. It clicks, scrolls, and types inside a built-in browser interface, and it can do multi-step tasks you'd normally do yourself. For instance, it can book flights, look for deals on your favorite soda, fill out forms, or even handle your to-do lists in apps. The real magic is that it's using the same GUIs (graphical user interfaces) that we humans see, with no special developer-friendly interface or anything like that. It literally sees the screen as pixels, moves a virtual mouse, and types on a virtual keyboard. That's made possible by combining GPT-4o's vision capabilities (the "o" stands for "omni," meaning it handles images as well as text) with next-level reasoning, thanks to reinforcement learning.

Now, OpenAI has run CUA through several tests. One major benchmark is called OSWorld, which checks how effectively an AI can operate an entire operating system like Windows, Ubuntu, or macOS. CUA managed a 38.1% success rate there, which is below the 72.4% success rate for humans, but it's notably above previous AI methods that hovered around 22%. Another set of tests, WebArena and WebVoyager, focuses on web browsing tasks like filling out forms or navigating e-commerce websites. Here, CUA hit 58.1% success on WebArena and 87% on WebVoyager, again an improvement over previous state-of-the-art models. That 87% might look really high, but keep in mind WebVoyager tasks are often simpler, so there's still a challenge to get it closer to the human-level performance of around 78.2% on the more complex WebArena tasks.

To show some real-world examples, OpenAI tested CUA with tasks like updating a software license in GitLab, finding canceled orders in Magento to figure out who cancels the most, merging PDF documents from emails into one file, compressing images, or even finishing a grammar quiz on the Cambridge Dictionary site. All these tasks were basically done by letting the agent take over and click or type exactly like a person would. Sometimes it got stuck and had to try multiple times, or eventually passed control back to the user, but overall it has shown it can do a pretty wide range of tasks, albeit with some stumbles.

OpenAI is rolling out Operator in a research preview for ChatGPT Pro subscribers in the US. Now, the monthly subscription is not cheap, 200 bucks a month, and that definitely positions Operator as more of a business or advanced-user product for now. But OpenAI says they plan to open it up to additional tiers like Plus, Team, and Enterprise in the future. They also want to bring it to their API so that outside developers can build their own products using the same CUA technology. So maybe we'll see a new wave of apps that rely on a universal interface, essentially the idea of letting the AI look at screens, click stuff, and solve tasks in any digital environment.

With tools like Operator redefining how we work with AI, it's clear this technology isn't just a luxury; it's becoming essential for staying ahead in an unpredictable job market. That's why we've teamed up with Growth School for this video to bring you something truly valuable that could help you thrive in this rapidly evolving landscape. 2024 has been a whirlwind: jobs popping up everywhere, but layoffs are just as
common. It's a wild ride, and even if things feel secure now, you never know what's around the corner. That's why I think having multiple streams of income isn't just smart, it's essential. Here's where AI comes in: with the right tools and skills, you could seriously start earning an extra $10,000 a month. Now, if you're wondering how to get started, Growth School has something really cool. They're offering a 3-hour hands-on AI training where you'll learn to use over 25 powerful AI tools. Normally it's paid, but the first 1,000 AI Revolution viewers can join for free using the link in the description. On top of that, you'll get $500 worth of bonus resources just for signing up. The training covers everything: job hunting tips, salary negotiation, mastering Excel, even content creation. And it's not just for tech experts. Whether you're in finance, sales, marketing, HR, or even still studying, this can work for you. Growth School has already helped millions of people level up, and this could be your turn to stay ahead in an AI-driven world. So if this sounds like your kind of thing, hit the link below to grab your free spot. Plus, don't miss joining Growth School's WhatsApp community; it's a great place to connect with others diving into AI too.

Now that you know how to equip yourself with AI skills for the future, let's dive back into the fascinating capabilities and challenges of OpenAI's Operator. And of course, this kind of web-browsing AI agent has people stoked about convenience but also worried about potential misuse. Operator can do so many tasks that if a malicious user tried to push it to break the law or do unethical stuff, that could be a big problem. So OpenAI says they've layered a bunch of safety measures in. For one thing, the AI is trained to refuse harmful or illegal tasks. They also keep a real-time block list for websites with content that's adult, gambling-related, or otherwise off limits. Plus, they run automated moderation checks: if the AI or user does something suspicious, like
repeated attempts at hacking or policy violations, they can issue warnings or block usage. There's also an offline detection pipeline that flags potential child safety issues or deceptive behavior.

Another concern is whether the AI might make mistakes that end up costing you, like entering the wrong shipping address or deleting an important file. With Operator, the model asks for user confirmation before finalizing any big moves, like sending an email or making a purchase. On top of that, Operator has Watch Mode for especially sensitive websites, so the user can directly supervise the AI's moves. And the AI tries to detect prompt injections, which is when a web page might try to trick it into doing something malicious, like revealing personal data or making unauthorized changes. They say they've tested it internally and that it only slipped up once in an early test, but obviously the cat-and-mouse game with prompt injections is never-ending. They also have a separate monitoring system that can freeze execution if it sees suspicious commands on the screen.

Now, in terms of everyday usage, the end goal is to have an agent that can replicate the ways humans handle all sorts of digital chores, like ordering takeout, booking dinner reservations, or searching online real estate listings. Actually, we're seeing more examples of that from OpenAI's rivals too. Perplexity AI just rolled out its own agent for Android; that one can set reminders, hail a ride, or book a table. Anthropic, which has an enterprise-focused model called Claude, already offers some agent-like features and just introduced a citations feature so you can see exactly where your LLM-generated answers are coming from in your documents. Meanwhile, Apple even launched an advanced Apple Intelligence system integrated with Siri and partnered with OpenAI to bring ChatGPT features to iPhones with user permission.

In terms of raw performance, these agent-based systems used to be the stuff of sci-fi, or at least borderline vaporware, but the arrival of
advanced chain-of-thought reasoning in large language models like OpenAI's GPT-4o has started making them feasible. An executive told Reuters that the big shift is basically that these LLMs can now break down tasks step by step, which is crucial when you're letting them roam across multiple web pages and fill out forms in real time.

So how do you actually get started with Operator if you're in the US and on ChatGPT Pro? Well, you head over to operator.chatgpt.com and you can basically type in, "Hey Operator, book me a flight from LA to Seattle next Wednesday morning with a budget of $200." Or you could say, "Find me a three-bedroom townhouse in Seattle on Redfin. It needs two bathrooms, should be between $600,000 and $800,000, and I want something with solar panels." Sure, it might not do it perfectly yet, and sometimes it only manages it three out of 10 times, especially if you didn't specify the steps or the site's a bit quirky. But the vision is that you prompt it, it navigates everything, and you just watch until it's done, or you step in if it gets confused.

What about fancy text editing or controlling specialized sites? Right now, it's kind of hit or miss. The team at OpenAI explained that if the layout is unfamiliar or the tasks are super advanced, like creating a slideshow or precisely editing text in an HTML editor, the agent can get stuck in a trial-and-error loop. They gave an example: when you're using something like an HTML5 editor, you might want your text color changed or something aligned to the right. The agent can eventually do it, but it might flail around. They also mentioned that giving the system more explicit instructions can boost the success rate, so you might say, "Click the filter section on the left, set the date to February 22nd, 2025, from 9:00 a.m.