Anthropic’s Claude Computer Use Is A Game Changer
The age of AI agents is here. Models can read, see, talk, and now, even use a computer all by themselves. YC President and CEO Garry Tan dives into how Claude Computer Use works, what it can do, and how it may change AI forever.
Transcript
The rocks can talk, but they can also read. They can see, and now they can use a computer: browsing the web, clicking buttons, typing text, all by themselves. The age of AI agents is here. One of the first out of the gates is Claude Computer Use, Anthropic's brand-new AI agent. Let's dive into how it works, what it can do, and how it may change AI forever.
In October, Anthropic made waves when it released a set of upgraded models, Claude 3.5 Haiku and a new Claude 3.5 Sonnet. They also released something special: computer use. But they're not the only ones in this space. We already know Sam Altman is working to recreate Samantha from the movie Her, and OpenAI is said to be releasing its own agent, Operator, in the new year.
Google is working on something similar too. The landscape for AI agents is growing fast, and so far, Anthropic is the first of the big AI labs to get into the game. Right now, Claude Computer Use is still in public beta as developers put it to the test. But already, it's looking like a complete game changer. So how does it work?
Claude has had the ability to understand images for a while, so the next step was to train it on how and when to perform specific actions, like clicking buttons or writing text based on what's displayed on the screen. Claude has had, since Claude 3 back in March, the ability to analyze images and respond to them with text.
The only new thing we added is that those images can be screenshots of a computer. And in response, we trained the model to give a location on the screen where you can click, and/or buttons on the keyboard you can press, in order to take action. And it turns out that with actually not all that much additional training, the models can get quite good at that task. It's a good example of generalization.
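The description above suggests what the model's output has to look like in practice: a small, structured action (a click coordinate, text to type, or a screenshot request) that some harness then executes against the real screen. Below is a minimal Python sketch of such a dispatcher. The action names and fields are modeled on the kinds of actions the video describes (clicking at a pixel location, typing text); the `dispatch` function itself is a hypothetical stand-in for a real GUI driver, not Anthropic's implementation.

```python
# Hypothetical dispatcher for model-issued screen actions.
# In a real harness, each branch would drive the actual mouse and
# keyboard; here each branch just returns a description string.

def dispatch(action: dict) -> str:
    """Translate one model-issued action into a (simulated) GUI call."""
    kind = action["action"]
    if kind == "screenshot":
        # The model asks for a fresh view of the screen.
        return "captured screenshot"
    if kind == "left_click":
        # The model names an exact pixel location to click.
        x, y = action["coordinate"]
        return f"clicked at ({x}, {y})"
    if kind == "type":
        # The model supplies literal text to enter.
        return f"typed {action['text']!r}"
    raise ValueError(f"unsupported action: {kind}")

# Example: the model has located a button and asks to click it.
result = dispatch({"action": "left_click", "coordinate": [640, 360]})
print(result)
```

The key point, as the quote above notes, is that the model side of this exchange is just image-in, structured-action-out; the novelty is the training, not the plumbing.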
For this, Anthropic needed to train Claude to recognize exact locations on the screen down to the pixel. Anthropic was then able to train Claude to understand what's happening on screen and to reason about how it should use its software tools to do tasks. For example, it might help you automate boring and repetitive tasks.
Claude's gonna start taking screenshots of my screen and quickly realizes that the Ant Equipment Company isn't actually in the spreadsheet. Luckily, we get a search match, and Claude then starts scrolling through the page looking for all the information it needs to fill out this form. To get started with computer use, developers have to run it in a virtual machine or container like Docker.
You'll also need an Anthropic API key. Once that's all set, you can then open a dedicated browser window which shows the user prompt on the left and Claude's activity on the right. Claude starts by analyzing the prompt and deciding which tool to use. As it works, it takes a screenshot at each step to check its progress, making sure the task is on track.
If adjustments are needed, Claude loops back to try different actions or tools until it completes the task. This repeatable loop of deciding, evaluating, and acting is called the agent loop, and it's how Claude handles complicated step by step tasks all on its own. So what else can computer use make possible?
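The agent loop described above (decide, act, screenshot, evaluate, repeat until done) can be sketched in a few lines of Python. The model call here is stubbed out with a canned plan; a real implementation would send each screenshot to the Anthropic API and execute whatever action comes back. Everything except the loop structure itself is an assumption for illustration.

```python
# Sketch of the agent loop: on each iteration, capture the screen,
# ask the model what to do next, execute the action, and repeat.
# fake_model is a stand-in for a real call to Claude.

def fake_model(screenshot: str, goal: str, step: int):
    """Stand-in for the model: returns the next action, or None when done."""
    plan = [
        {"action": "left_click", "coordinate": [100, 200]},
        {"action": "type", "text": goal},
        None,  # the model decides the task is complete
    ]
    return plan[step] if step < len(plan) else None

def agent_loop(goal: str, max_steps: int = 10):
    history = []
    for step in range(max_steps):
        screenshot = f"<screenshot at step {step}>"   # evaluate progress
        action = fake_model(screenshot, goal, step)   # decide
        if action is None:                            # task complete
            break
        history.append(action)                        # act
    return history

steps = agent_loop("fill out the vendor form")
print(f"finished after {len(steps)} actions")
```

The `max_steps` cap is a practical safeguard: because the loop is open-ended, a real harness needs some bound so a confused agent can't run forever.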
In their own demos, Anthropic shows us a few different tasks, like this one of Claude helping to plan a sunrise hike at the Golden Gate Bridge. It searches the web, figures out some important details, and then creates an event in Google Calendar.
In another example, Wharton professor Ethan Mollick puts Claude Computer Use to the test by feeding it a video of a construction site and prompting Claude to monitor the site and look for issues with safety. You'll see Claude take screenshot after screenshot, analyzing different parts of the site, making note of all the gear and materials, and trying to spot any potential issues.
It even finishes up by putting everything together in a nice, neat spreadsheet. Automated OSHA compliance? Check. By now, it should be clear that computer use is a step forward for AI. Up until now, developers have had to make tools to fit the model, coming up with custom environments where AIs use specially designed tools to do various tasks. Now, we can make the model fit the tools.
That's a powerful change. Computer use opens up so many applications. Businesses can automate repetitive tasks and increase efficiency, while the average user can save time on routine things like booking flights or ordering food. It's easy to see a future where AI agents handle most of the drudge work for us. And for developers, computer use massively lowers the barriers to entry.
LLMs have already made tasks like coding way more accessible to the average person, and computer use takes that a whole step further. Computer use is still a work in progress, so it has some bugs and limitations. It's much slower than typical models and has a tendency to crash from time to time. So reliability is still an early concern.
Occasionally, Claude will misstep in its tool selection, get confused, or even sometimes veer off task. During one session that Anthropic shared on YouTube, Claude inexplicably started searching for pictures of Yellowstone National Park in the middle of its task. To be fair, humans get distracted and sometimes do that too. Claude does have guardrails.
Since it could easily be abused, it steers clear of things like account creation or content generation for social media. It's also vulnerable to prompt injection, a security risk where the model can be tricked into following instructions embedded in the online sources it visits rather than sticking to the original prompt.
Imagine a website prompt injecting Claude to upload the contents of your password manager. That'd be bad. Anthropic thought about this and tries to keep users safe by keeping actions contained to a secure virtual machine, limiting access to sensitive data and strictly controlling approved sites. However, many of these limitations could be lifted soon.
Because this beta is just the beginning, Anthropic has already said that computer use will rapidly improve to become faster, more reliable, and more useful for the tasks users want to complete. Plenty of startups are getting into the mix too.
Just recently, a YC company, Cura, released their own browser agent that seems to outperform Claude Computer Use on the WebVoyager benchmark, achieving a new state of the art. In the near future, LLMs with the full ability to use and control computers will reshape everything: how developers write software, how CEOs run their companies, and even how we all live our daily lives.
Each new groundbreaking application will transform how we work, connect, and live. This kind of AI won't just be an assistant. It'll take on entire tasks that once needed whole teams or companies. So what will you build with computer use?