From Assistant to “Agent”: The Next Generation of AI

The next big thing in artificial intelligence (AI) is taking shape: AI agents, capable of performing more complex and autonomous tasks. These new tools are transforming the technology landscape and could redefine how we relate to it.

The new horizon of artificial intelligence

By: Gabriel E. Levy B.

In the world of technology, few events generate as much buzz as the annual conferences of large companies. This year, at the Google I/O conference, the focus was on a term that promises to revolutionize our interaction with technology: AI agents. The presentation of Astra, Google’s new agent, marked a milestone by demonstrating its ability to interact through audio and video. But Google isn’t alone in this race. OpenAI, with its GPT-4o model, has also joined the effort to build AI agents, and other tech giants are following suit.

The concept of AI agents is not new, but its evolution is. According to Melissa Heikkilä in MIT Technology Review, tech companies are investing considerable sums in their development, which could make the AI we’ve envisioned for decades a reality. But what exactly are these AI agents, and how will they transform our lives?

What are AI agents?

AI agents are models and algorithms designed to make autonomous decisions in a dynamic world.

Jim Fan, a researcher at Nvidia, explained in a recent article for MIT Technology Review that these agents can execute a wide range of tasks, much as a human assistant would.

Imagine an agent that not only books your vacation, but remembers your preferences, suggests hotels and flights, plans your itinerary, and even notifies your friends at the destination to coordinate meet-ups. This view of AI agents promises more personalized and efficient assistance.

These agents could also revolutionize the business and government sphere.

David Barber of University College London noted in the same article that they could significantly improve internal processes, functioning as customer service bots far more advanced than current ones.

While current assistants are limited to generating probability-based text, AI agents have the ability to act autonomously, processing complex tasks without supervision.

The types of AI agents

Jim Fan distinguishes two main categories of AI agents: software agents and embodied agents. Software agents work on devices such as computers and mobile phones, executing office tasks and automating events.

These agents are especially useful in enterprise applications where they can automate email management, schedule meetings, manage databases, and perform complex data analysis.

For example, a software agent could automate the financial audit process in a company, analyzing thousands of transactions and detecting possible irregularities without human intervention.

Another example is virtual personal assistants, such as Siri and Alexa, which are already integrated into home devices to manage everyday tasks such as setting alarms, controlling smart home devices, and providing real-time information.

Embodied agents, on the other hand, operate in three-dimensional environments such as video games, or in robots, where they can perform physical and reasoning tasks in the real world.

These agents are designed to understand and navigate complex environments, allowing them to execute physical and cognitive tasks that are beyond the capabilities of software agents.

A notable example is MineDojo, an agent developed for the game Minecraft, which learns and executes complex tasks within the game.

This agent not only performs simple actions, but can also develop strategies and adapt to new situations within the virtual world.

Another example of embodied agents is service robots, such as home cleaning robots that not only clean the floor, but can also map and remember the layout of a house to optimize their cleaning route.

In the industrial realm, embodied agents are used in factory automation, where robots assemble products, manage inventories, and perform quality inspections with an accuracy and efficiency that can surpass those of humans.

Additionally, in the healthcare sector, embodied AI agents are starting to assist in surgeries, giving doctors greater precision during complex procedures. These agents can analyze medical images in real time and guide surgeons in making critical decisions.

These examples demonstrate the vast potential of AI agents to understand and operate in complex environments, which is essential for their real-life application.

The ability of these agents to learn, adapt, and execute autonomous tasks in various contexts points to a future where artificial intelligence will not only assist humans in specific tasks, but also transform entire industries by taking on increasingly sophisticated and essential roles.

The current challenges and limitations

Despite the advances, AI agents still face significant challenges. Kanjun Qiu, CEO of Imbue, compares the current state of AI agents to that of self-driving cars a decade ago: promising but not entirely reliable.

A coding agent, for example, can generate code but still needs human supervision to ensure its accuracy and functionality.

Another limitation is the restricted ability of agents to maintain context over time. Current systems can lose sight of the task they are working on, limiting their effectiveness in lengthy processes.

To address this, Google is increasing the amount of data its models can process at once, so that they can maintain longer and contextually consistent interactions.
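The context limitation described above can be illustrated with a minimal sketch in Python. It assumes a hypothetical agent that keeps only as many recent messages as fit within a fixed budget, counted here in words for simplicity. The function name `trim_history`, the sample messages, and the budget values are all invented for illustration and do not describe how any particular product works.

```python
from collections import deque


def trim_history(messages, max_words):
    """Keep the most recent messages whose total word count fits the budget.

    This is an illustrative assumption about how a limited context might
    behave, not a description of any real system.
    """
    kept = deque()
    used = 0
    # Walk backwards from the newest message and stop once the budget is full.
    for message in reversed(messages):
        cost = len(message.split())
        if used + cost > max_words:
            break
        kept.appendleft(message)
        used += cost
    return list(kept)


# Hypothetical conversation: the first message states the task.
history = [
    "Task: book a hotel in Lisbon for three nights",
    "User prefers a sea view",
    "Searching available hotels now",
    "Found twelve candidate hotels near the coast",
]

small = trim_history(history, max_words=12)  # small budget
large = trim_history(history, max_words=40)  # larger budget

print("Task still visible with small budget:", history[0] in small)
print("Task still visible with large budget:", history[0] in large)
```

With the small budget, the message that states the task falls out of the retained history, which is the kind of lost context the paragraph describes. With the larger budget, the whole conversation fits.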

Specific application cases

Despite the limitations, we are already seeing practical applications of these agents in various areas. For example, more advanced customer support agents are already being implemented by businesses to handle inquiries and complaints more efficiently. These agents can analyze customer emails, verify information against databases, and make decisions based on business policies, all without human intervention.
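The support workflow described above (read the email, check a database, apply a business policy) can be sketched in a few lines of Python. Everything in this sketch is a hypothetical stand-in: the `ORDERS` table, the order ID format, the 30-day refund window, and the function names are assumptions made for illustration, and a real agent would rely on a language model and live systems rather than a regular expression and a dictionary.

```python
import re
from dataclasses import dataclass

# Hypothetical order database (assumption for illustration only).
ORDERS = {
    "A100": {"customer": "ana@example.com", "days_since_delivery": 10},
    "B200": {"customer": "luis@example.com", "days_since_delivery": 45},
}

REFUND_WINDOW_DAYS = 30  # assumed business policy


@dataclass
class Decision:
    action: str  # "refund", "deny", or "escalate"
    reason: str


def extract_order_id(body):
    """Find an order ID such as 'A100' in the email text, if present."""
    match = re.search(r"\b[A-Z]\d{3}\b", body)
    return match.group(0) if match else None


def handle_refund_request(sender, body):
    """Analyze the email, verify it against the database, apply the policy."""
    order_id = extract_order_id(body)
    if order_id is None:
        return Decision("escalate", "no order ID found in email")
    order = ORDERS.get(order_id)
    if order is None:
        return Decision("escalate", "order not found")
    if order["customer"] != sender:
        return Decision("escalate", "sender does not match order")
    if order["days_since_delivery"] > REFUND_WINDOW_DAYS:
        return Decision("deny", "outside refund window")
    return Decision("refund", "within refund window")


print(handle_refund_request("ana@example.com", "Please refund order A100."))
print(handle_refund_request("luis@example.com", "I want a refund for B200"))
```

In this sketch, cases the policy does not clearly cover are escalated rather than decided automatically. That is one possible design choice, in keeping with the reliability concerns raised earlier in the article.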

In the realm of workflow automation, tools like Zapier are starting to incorporate AI agents to streamline processes. These agents can manage repetitive tasks, allowing employees to focus on more strategic and creative activities.

Moreover, in the entertainment industry, AI agents are transforming video games. AI-controlled non-player characters, such as those developed in MineDojo, make games more dynamic and immersive. These advancements not only improve the user experience, but also open up new possibilities for training and learning through simulated environments.

In conclusion

The next generation of AI agents promises to revolutionize our interaction with technology, making our lives more convenient and efficient.

Although they still face significant challenges, progress is undeniable. With continuous investments and efforts, it is only a matter of time before these agents become an integral part of our day-to-day lives, performing complex and personalized tasks with unprecedented autonomy. The age of assistants has given way to the age of agents, and the future of AI has never looked more promising.