Good day everybody, it's your boy Dan here from Relevance AI. Today I'm joined by Lucia.

Hi, I'm Lucia. I'm a full-stack software engineer at Relevance, and I've been here for the last year.

So, Lucia, why are we here today? I've heard there's some big news that came out of Anthropic. Can you tell me more about that?

Yeah, so it's a massive day at Anthropic. Hot on the heels of OpenAI's o1 release and OpenAI's Swarm release (we've got videos about those two things), today Anthropic have upgraded Sonnet 3.5, they've upgraded Haiku, and really
interestingly, they've released a new capability for their models called computer use.

So what does that mean?

Computer use essentially gives the model the ability to take control of your computer. It runs on a loop where it has access to actions like moving your mouse, clicking, typing, basic computer actions. Essentially it emulates a person sitting at a computer, and it does that by, on a loop, taking screenshots of the screen, looking at where everything is, and deciding it needs to move the mouse up a certain number of pixels.

Right, is that not risky?

Yeah, it is
risky. In fact, they say it's risky, but right now it's quite a limited model in terms of its ability to use your computer. In the same way that giving anyone access to your computer and letting them click around is risky, I guess you could say this is risky. They've made some statements about how they want to put it out there while it's quite limited, to learn about how people use it so that they can make it safer in the future. They've spoken about implementing more safeguards to analyze the behavior and bail out if
it starts doing something risky.

Yeah, it is. It's interesting what this means. So, obviously Relevance is an AI agent building platform. What does that mean for our AI agent builders?

Great question. I look at this as almost the ultimate extension of the co-pilot model for agents. It's almost comical to think about this as how agents will impact automation, because it's an agent doing the thing that a human has to do right now. What's exciting about agentic automation, and where I think it takes the world, is that we'll get
to entirely reshape how work is done. Because agents will have access to APIs, we'll have to build less software, so there'll be fewer things to click and scroll and move. So really, an agent that can do those things is working in the same way our current world works, but it's not what we think the future looks like. We think the future looks like AI workforces full of agents that work on autopilot; they don't need your intervention, and they certainly don't need your computer to click around in. That being said,
I do think it's interesting how it might be able to automate some more legacy systems. There are heaps of industries that still use software that hasn't been updated for 20 years and doesn't have good APIs, and I think those industries could really benefit from this, because the agent can go use the legacy system the way a person at a computer would, clicking and scrolling. So I think it's a really cool addition to your agentic automation system, your multi-agent system, that allows it to interact with these legacy platforms. I don't think it really reflects
what automation and what agents will look like in the future, but it's still really cool.

What do you think about it? Now that you know what Anthropic released, what are your thoughts?

The first thing I'm trying to think of is the use cases for me as a developer. I don't see many ways as a dev that I would use it, but one that does come to mind is QA testing. That can be quite difficult: we do set up front-end tests to be able to test how a user
might interact with the system, but that can be constantly changing, so being able to automate it, I can definitely see that happening.

That's kind of cool. If you think about QA or end-to-end tests on software, we reach for software-like solutions: we write rules and get it to do these three things. But we're actually trying to replicate human behavior, and that's the whole point of agents, to replicate human behavior. So if you can get your agent to do that QA testing, that makes a lot of sense. That's
a really cool use case, actually.

The other direction that's becoming very popular with LLMs is integrating multimodal ways to communicate with the model. You can think of a computer's speaker as the mouth and its mic as the ears, so it's also interesting that now there's this additional component of being able to physically interact with a device.

Again, another cool way to think about it. We think about multimodal models, image models, video models; you can almost think of this as the computer-use modality of a model. I think
this ties into a theme we've seen, like we saw with o1 from OpenAI: the model providers are now starting to have to innovate on what their models can do. For a while, all they did was make their models bigger and bigger, and we all celebrated every time they got bigger. Now it feels like they've plateaued there a bit; they make things bigger, but does it actually make performance better? It's hard to tell. So now they're innovating: they're creating things like o1 that does chain-of-thought, and they're building computer use that can point
and click and scroll and type. From our point of view, building multi-agent systems, that's so cool, because the more models we have that can do interesting, different things, the more problems we can solve with agents. So, really exciting times.

It is. Do you want to tell the audience a bit about what you're working on at Relevance right now?

Yeah. I've realized a theme of the work that I do is actually telling the agents what not to do. For example, even when hearing about Anthropic's new computer use, my first thought was, is
that risky? Oftentimes you'll have a bit of apprehension about using AI or LLMs, because you feel as though you can't manage them in a way that feels natural and reliable. So a lot of what we do here is make it as reliable as possible, and that includes features. A big feature that I worked on is agent cadence, which is essentially a rules engine that limits how an agent processes its messages. You might shove a whole bunch of messages into its system, but for a lot of
different use cases, you don't want those messages to run all at once. For example, even with OpenAI rate limits, we want to be able to spread those out. The other use case: a lot of our customers need to do outreach to their clients, but if you have your agent messaging them at 5:00 a.m., it's quite rude. So, to be able to mimic how an employee would work, we introduced cadence. You can tell it: you can only work Monday through Friday, you can only work from 9 to 5, and you
can only process 50 messages a day, things like that.

Such important features in our platform. When you're building these production multi-agent systems, these are the things you have to worry about, because as we find, managing these agent systems is like managing humans: you want to implement some framework around how and when they should work.

Well, it's been awesome having you. See you in the next video.
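The screenshot-then-act loop described earlier in the conversation can be sketched roughly like this. This is a minimal illustration of the idea, not Anthropic's actual API: the `ask_model`, `take_screenshot`, and `actions` callables are hypothetical stand-ins that a real implementation would wire to an LLM call and an OS automation tool.

```python
import time

def computer_use_loop(goal, ask_model, take_screenshot, actions, max_steps=20):
    """Sketch of the computer-use loop: on each step, screenshot the
    screen, let the model decide on an action, execute it, and repeat
    until the model says it's done (or a step budget runs out)."""
    for _ in range(max_steps):
        # The model sees the current screen and the goal, and returns a
        # decision like {"type": "click"} or {"type": "move_mouse", "args": {...}}.
        decision = ask_model(take_screenshot(), goal)
        if decision["type"] == "done":
            return "done"
        actions[decision["type"]](**decision.get("args", {}))
        time.sleep(0.5)  # let the UI settle before the next screenshot
    return "step limit reached"
```

Injecting the model and the action handlers keeps the sketch independent of any particular SDK, and makes it easy to script the model's decisions for testing.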
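The cadence rules Lucia describes (workdays, working hours, and a daily message cap) could be sketched as a tiny rules engine like the one below. The class and field names are illustrative only, not Relevance AI's actual implementation.

```python
from datetime import datetime

class Cadence:
    """Minimal sketch of a cadence rules engine: decides whether an
    agent may process a message right now, or must defer it."""

    def __init__(self, workdays=frozenset({0, 1, 2, 3, 4}),
                 start_hour=9, end_hour=17, daily_limit=50):
        self.workdays = workdays      # Monday=0 ... Friday=4
        self.start_hour = start_hour  # inclusive
        self.end_hour = end_hour      # exclusive
        self.daily_limit = daily_limit
        self._counts = {}             # date -> messages processed that day

    def may_process(self, now: datetime) -> bool:
        in_schedule = (now.weekday() in self.workdays
                       and self.start_hour <= now.hour < self.end_hour)
        under_limit = self._counts.get(now.date(), 0) < self.daily_limit
        return in_schedule and under_limit

    def record(self, now: datetime) -> None:
        """Call after the agent processes a message, to count it
        against the daily limit."""
        self._counts[now.date()] = self._counts.get(now.date(), 0) + 1
```

An agent's message loop would check `may_process` before handling each message and requeue anything that falls outside the schedule or over the cap, which is how a 5:00 a.m. outreach message ends up waiting until 9.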