I will never forget the kung pao chicken I sat down to eat a few months ago. Not because the taste blew me away â 20 minutes on the back of a delivery riderâs scooter had sullied that somewhat. What made the meal memorable was that I hadnât really ordered it at all. Yet there it was, in front of me.Â
An AI assistant called Operator, developed by ChatGPT-maker OpenAI, had ordered the food on my behalf. The tech industry has dubbed such assistants âAI agentsâ, and several are now commercially available. These AI agents have the potential to transform our lives by carrying out mundane tasks, from answering emails to shopping for clothes and ordering food. Microsoft chief financial officer Amy Hood reportedly said in a recent internal memo that agents âare pushing each of us to think differently, work differentlyâ and are âa glimpse of whatâs aheadâ. In that sense, my kung pao chicken was a taste of the future.Â
But what will that future be like? To find out, I decided to put Operator and a rival product named Manus, developed by Chinese start-up Butterfly Effect, through their paces. Working with them was a mixed bag: amid the flashes of brilliance, there were moments of frustration, too. In the process, I also got a glimpse of the risks to which we are exposing ourselves. Because fully embracing these tools requires handing them the keys to our finances and our list of social contacts, as well as trusting them to perform tasks the way we want them to. Are we ready for the world of AI agents, or will they be hard to stomach?Â
Since 2023, we have lived in the era of generative AI. Built using large language models (LLMs) and trained on huge volumes of data scraped mainly from web pages, generative AI can create original content such as text or images in response to commands given in everyday language. It would be fair to say that this AI has made quite a splash, judging by the volume of media coverage devoted to the technology, and has already changed the world significantly.Â
Agentic AI promises to take things one step further. It is âempowered with actually doing something for youâ, says Peter Stone at the University of Texas at Austin. Over the past few years, many of us have grown used to the idea of asking a generative AI for information â recommendations of favourite dishes available in the neighbourhood, for instance, and contact details for the restaurants from which that food can be ordered. But ask agentic AI, âWhat should I eat tonight?â and it can pick out dishes it thinks you will like from a restaurantâs website and â if there is an online order form â pay for the food using your credit card, arrange for it to be sent to your home and let you know when to expect the delivery. âThat will feel like a fundamentally different experience,â says Stone: AI as an autopilot rather than a copilot.Â
Building an agentic AI with this sort of capability is trickier than it might appear. LLMs are still the driving force under the surface, but with agentic AI, they focus their processing power on the decisions they can make and the real-world actions they can take based on the digital tools â including web browsers and other computer-based apps â at their disposal. When given a goal such as âorder dinnerâ or âbuy me some shoesâ, the AI agent develops a multi-step plan involving those digital tools. It then monitors and analyses how close the output at each step is to the ultimate goal, and reassesses what else needs to be done. This process continues until the agent is satisfied it has reached the ultimate goal â or come as close to doing so as possible. And once the act is done, the system asks whether it achieved the goal successfully, a form of feedback also present in AI chatbots, called reinforcement learning from human feedback.Â
Stone, who is the founder and director of the Learning Agents Research Group at his university, has spent decades thinking about the possibility of AI agents. They are, he says, systems that âsense the environment, decide what to do and take an actionâ. Put in those terms, it may feel as if AI agents have been with us for years. For instance, IBMâs Deep Blue computer appeared to have reacted to events on a real-world chessboard to beat former World Chess Champion Garry Kasparov in 1997. But Deep Blue wasnât an agentic AI, says Stone. âIt was decision-making, but it wasnât sensing or acting,â he says. It relied on human operators to move chess pieces on its behalf and to inform it about Kasparovâs moves. An AI agent doesnât need human help to interact with the real world. âLanguage models that were disembodied or disconnected from the world are now being connected [to it],â says Stone.Â
Early versions of these agentic AIs are now available from many tech firms, with each, whether it is Microsoft, Amazon or the software firm Oracle, offering its own. I was eager to see how they work in practice, but doing so isnât cheap: some come with annual subscription fees running to thousands of dollars. I reached out to OpenAI and Butterfly Effect and asked for a free trial of their products â Operator and Manus, respectively. Both accepted my request. My plan was to use the AIs as personal assistants, taking on my grunt work so I would have more free time.Â
The results were mixed. I was due to give a presentation in a few weeks, so I uploaded my slide deck to Manusâs online interface and asked the AI agent to reformat it. Manus seemed to have done a good job, but after opening the slide deck in PowerPoint, I realised that it had placed every line of text in a separate text box, meaning it would be annoyingly fiddly for me to make additional edits myself. Manus did, however, fare better at compiling code for an app I wanted to upload into an app store-ready format, using various tools and its remote computerâs command line to do so.Â
Turning to Operator, I began by asking the AI agent to handle my online invoicing system. Like a well-meaning but not particularly helpful intern, it insisted on filling out the form the wrong way: inputting text defining the work for which I was invoicing into a box that could receive only numeric codes. I eventually managed to break it out of that habit, but then Operator got confused when copying over details from my âto invoiceâ list to the system, with potentially embarrassing results. Notably, it suggested I submit an invoice to the New Scientist accounts team asking for an ÂŁ8001 payment for a single article.Â
It was with some trepidation, then, that I gave Operator a promotion and asked for its help in reporting this story. I had already used ChatGPT to identify AI experts who could comment on the rise of agentic AIs. I asked Operator to send each expert an email on my behalf requesting an interview. The results, which I didnât see until the emails had already been sent, made me inwardly cringe â not least because Operator decided against acknowledging its role in composing them, giving the impression that I had written them myself. The language the AI agent used was simultaneously naive and too formal, with staccato sentences fired with a semi-hostility that put me â and, in all likelihood, the would-be interviewees â on edge. Operator also failed to mention some key information, including that my story would be published by New Scientist. In that way, it felt a lot like a junior assistant. Not really knowing how to write an email as I would, Operator made many mistakes.Â
In Operatorâs defence, however, the emails were at least partially successful. It was through an Operator email that I made contact with Stone, for instance, who took the AI-sent email in his stride. Another researcher complimented me on the approach when I later disclosed that the email had been written by Operator. âThatâs serious dogfooding!â they said â tech slang for testing experimental new products â although they declined to speak for this story because the funders of a project they were working on wouldnât let them.
The tech companies behind these AI agents present the technology as if it is an indefatigable digital assistant. But the truth is that, in my experience, we arenât quite there yet. Still, assuming the tech is going to improve, how should we view these new tools? To start with, it is worth pondering the commercial incentives that underpin all the hype, says Carissa VĂ©liz at the University of Oxford. âOf course, the AI agent works for a company before they work for you, in the sense that they are produced by a company with financial interests,â she says. âWhat will happen when there are conflicts of interest between the company who essentially leases the AI agent and your own interests?â Â
We can already see examples of this in the early AI agents: OpenAI has signed agreements with companies to collaborate on its system, so when searching for holiday flights, Operator may prefer Skyscanner over competitors, or turn first to the Financial Times and Associated Press if you ask it about the news. VĂ©liz also suggests users consider privacy concerns before leaping headfirst into using agentic AI, given the techâs access to our personal information. âThe essence of cybersecurity is to have different boxes for different things,â says VĂ©liz â using unique passwords for online banking and email, for instance, and never saving those passwords in a single document â but to use an AI agent, we must break down the barriers between those boxes. âWeâre giving these agents the key to a system in which everything is connected, and that makes them very unsafe,â she says. Â
It is a warning I can appreciate. I wasnât particularly happy that my trial with Operator necessarily involved ceding control of my email and accounting software to the AI agent â and my level of unease hit new heights when I asked Operator to order the dish of kung pao chicken on my behalf. At one point, the AI agent asked me to type my credit card details into a computer window that had popped up in the Operator chatbot interface. I reluctantly did so, even though I felt I didnât fully control the window and that I was placing an enormous amount of trust in Operator.Â
Moreover, as things stand, it isnât completely clear that AI agents have earned such trust. By definition, they tend to âaccess a lot of tools and interact a lot more with the outside worldâ, says Mehrnoosh Sameki, principal project manager of generative AI evaluation and governance at Microsoft. This makes them vulnerable to certain types of attack. Â
Tianshi Li at Northeastern University in Massachusetts recently looked at six leading agents, and studied those vulnerabilities. She and her team found that agents could fall prey to relatively simple tricks. For instance, deep within the text of a privacy policy that few people would read, a malicious actor might hide a request to click a link and insert credit card details. Liâs team found that an AI agent wouldnât hesitate to carry out the request. âI think there are a lot of very legitimate concerns these agents might not act in accordance with peopleâs expectations,â she says. âAnd there is no effective mechanism to allow people to intervene or remind them of this possibility and to avoid the possible consequences.â Â
OpenAI declined to comment on the concerns raised by Liâs research â although my experience using Operator suggests the company is aware of the trust-and-control issue. For instance, Operator seemed to go out of its way to constantly ping me notifications to check if the actions it wanted to take aligned with my expectations. The inevitable downside to that strategy, however, is that it made me feel that I was devoting so much time to micromanaging the agentâs work that I would have been quicker just performing the tasks myself.Â
âWeâre still [in the] early days in a lot of these agentic experiences,â admits Colin Jarvis, who leads OpenAIâs deployed engineering team. Jarvis says the current crop of AI agents are far from achieving their full potential. âIt still needs quite a bit of work to get that reliability,â he says. Â
Butterfly Effect made a similar point. When I reached out to the firm to discuss my problems using its agent, I was told that âManus is currently in its beta stage, and we are actively working on optimising and improving its performance and functionalityâ.Â
Tech firms have arguably been struggling to get agentic AI working for several years. In 2018, for instance, Google argued that a version of an AI agent it had developed, called Duplex, was going to change the world. The company touted Duplexâs ability to call up restaurants and reserve tables for its customers. But, for reasons unknown, it never took off as an everyday tool with widespread appeal. Â
Nevertheless, AI companies and tech analysts alike say the agentic AI revolution is just around the corner. The number of mentions of agentic AI on financial earnings calls at the end of last year was 51 times greater than it was in the first quarter of 2022. The interest here is not merely in using agents to assist human employees, but also to replace them. For example, companies including Salesforce, which helps businesses manage customer relations, are rolling out AI agents to sell services. Â
Stone doesnât think the technology is quite ready for that kind of application. âThereâs a lot of overhype right now,â he says. âItâs certainly not going to be within the next few years that all jobs are gone or that autonomous agents are doing everything.â To make good on the most ambitious claims, he says, âfundamental algorithms⊠would need to be discoveredâ.
Enthusiasm may be high because tools like ChatGPT perform so well that they have raised expectations of what AI can achieve more generally. âPeople have extrapolated to say, âOh, if they can do that, they can do everything,ââ says Stone. Certainly, I found that agentic AI can work extremely well â some of the time. But Stone says we shouldnât infer from a few limited examples that AI agents can do it all.
On reflection, I am inclined to agree with him â at least until my version of Operator recognises that I consider no order from a Chinese restaurant truly complete without a side of prawn crackers.Â
Topics:If you often open multiple tabs and struggle to keep track of them, Tabs Reminder is the solution you need. Tabs Reminder lets you set reminders for tabs so you can close them and get notified about them later. Never lose track of important tabs again with Tabs Reminder!
Try our Chrome extension today!
Share this article with your
friends and colleagues.
Earn points from views and
referrals who sign up.
Learn more