GATO: the promise of true artificial intelligence

Although artificial intelligence (AI) is much discussed, in practice it has so far been more a collection of good intentions than a real breakthrough. That could be changing thanks to emerging technologies that seem to challenge the known limits of autonomous algorithms.

What is GATO and what makes it different from other AIs?

By Gabriel E. Levy B.


The British company DeepMind recently unveiled a new “generalist” artificial intelligence (AI) technology, which it has dubbed GATO. The company, which is dedicated to advancing AI and runs several ongoing programs, aims to contribute new ideas and improvements in machine learning, engineering, simulation and computing infrastructure. [1]

According to the scientists in charge of the GATO project, who work at the artificial intelligence laboratory of Alphabet, the same group that owns Google, the model has notable new autonomous capabilities, such as playing Atari video games, captioning images, chatting and stacking blocks with a real robotic arm. Its creators report that GATO can perform up to 604 different tasks, many of them simultaneously.

DeepMind explains that GATO is trained on a large number of image, natural language and other datasets that comprise the agent’s experience in simulated and real-world environments.

Nando de Freitas, a principal investigator at DeepMind and co-author of the GATO paper, posted “The game is over!” on his Twitter account, suggesting that GATO’s path toward autonomous artificial general intelligence (AGI) is an indisputable reality. [2]

GATO, like other machine-learning systems, learns by example: it ingests billions of words, images from real-world and simulated environments, button presses, joint torques and more, all in the form of tokens.
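The idea of turning words, button presses and continuous joint values into tokens can be illustrated with a minimal sketch. This is not DeepMind's tokenizer: the byte-level text encoding, the vocabulary sizes, the bin count and all function names below are assumptions chosen only to show how very different kinds of data can end up in one flat sequence of integers.

```python
# Illustrative sketch only: every constant and function name here is hypothetical.
TEXT_VOCAB = 256      # assumption: text is encoded as raw UTF-8 bytes
NUM_BUTTONS = 18      # assumption: size of a discrete set of controller buttons
NUM_BINS = 1024       # assumption: number of bins for continuous values

BUTTON_OFFSET = TEXT_VOCAB                 # button tokens come after text tokens
CONT_OFFSET = BUTTON_OFFSET + NUM_BUTTONS  # continuous tokens come after buttons


def tokenize_text(text):
    """Map a string to one token per UTF-8 byte (range 0..255)."""
    return list(text.encode("utf-8"))


def tokenize_button(button_id):
    """Map a discrete button press to its own token range."""
    if not 0 <= button_id < NUM_BUTTONS:
        raise ValueError("unknown button id")
    return BUTTON_OFFSET + button_id


def tokenize_continuous(value, low=-1.0, high=1.0):
    """Clip a continuous value to [low, high] and discretize it into a bin token."""
    clipped = min(max(value, low), high)
    fraction = (clipped - low) / (high - low)
    bin_index = min(int(fraction * NUM_BINS), NUM_BINS - 1)
    return CONT_OFFSET + bin_index


def build_sequence(caption, button_id, joint_values):
    """Concatenate text, a button press and joint values into one token list."""
    tokens = tokenize_text(caption)
    tokens.append(tokenize_button(button_id))
    tokens.extend(tokenize_continuous(v) for v in joint_values)
    return tokens


sequence = build_sequence("hi", 3, [0.0, 1.0])
```

Because each kind of data gets its own non-overlapping range of integers, a single sequence model can read all of them without needing a separate input format per task.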

The downside is that GATO does not perform tasks as well as models that specialize in a single one. Robots have yet to learn “common sense” about how the world works from text, explains Jacob Andreas, an assistant professor at MIT specializing in artificial intelligence and natural language and speech processing, in a recent article in MIT Technology Review. [3]

Similarly, a rigorous review by scientists at the same institution found that GATO’s architecture is not that different from that of many AI systems in use today. [4] It does, however, represent a breakthrough: it operates as a multimodal, multitask, multi-embodiment network, meaning that the same network (i.e., a single architecture with a single set of weights) can perform all the tasks, even though they involve inherently different types of inputs and outputs. [5]
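What “the same network with a single set of weights” means in practice can be shown with a toy example. The sketch below is not GATO's architecture: the class name, the sizes, the random untrained weights and the averaging step (a stand-in for a transformer) are all assumptions. The only point it makes is that one parameter set serves every task, and that tasks differ solely in the tokens placed in the context.

```python
import random

VOCAB_SIZE = 32   # assumption: tiny shared vocabulary covering every modality
EMBED_DIM = 8     # assumption: embedding width of the toy model


class SharedModel:
    """Toy next-token predictor with one set of weights for every task."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # One embedding table and one output matrix, shared by all tasks.
        self.embed = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
                      for _ in range(VOCAB_SIZE)]
        self.out = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
                    for _ in range(VOCAB_SIZE)]

    def next_token(self, tokens):
        """Return the highest-scoring next token for a context of tokens."""
        if not tokens:
            raise ValueError("context must contain at least one token")
        # Average the context embeddings (a crude stand-in for attention).
        context = [sum(self.embed[t][d] for t in tokens) / len(tokens)
                   for d in range(EMBED_DIM)]
        scores = [sum(w * c for w, c in zip(row, context)) for row in self.out]
        return max(range(VOCAB_SIZE), key=scores.__getitem__)


model = SharedModel()
caption_prompt = [1, 5, 9]      # hypothetical tokens from an image-captioning task
control_prompt = [20, 21, 30]   # hypothetical tokens from a robot-control task

# The same object, and therefore the same weights, answers both prompts.
caption_token = model.next_token(caption_prompt)
control_token = model.next_token(control_prompt)
```

A specialized design would instead keep one model per task; here nothing is swapped between the two calls except the input sequence.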

The difference with other AI

Artificial intelligence systems available on the market, and already adapted to digital life, are good at performing a specific task. Some have become famous for beating the best human players at games as complex as chess or Go, while others go unnoticed even though they are present in our daily lives, such as Spotify’s algorithm, which automatically generates music recommendations with great precision. [6]

The AI that is currently available is known as weak or narrow artificial intelligence: it is efficient at processing large volumes of information for one specific purpose. But the same AI that can beat the world Go champion can do nothing other than play that game.

Thus, the main difference between GATO and these other AIs is its ability to perform multiple tasks efficiently and functionally, which demonstrates the versatility of transformer-based architectures for machine learning and shows how they can be adapted to a variety of tasks.

In that sense, many specialized neural networks that exist in laboratories can play games, translate text or caption images, among other activities. GATO can perform all these tasks by itself, using a single set of weights and a relatively simple architecture. This contrasts with specialized approaches that require several modules to work together, integrated in a way that depends on the problem to be solved.

The GATO results also support previous findings by other scientists that training on data of different kinds can lead to better learning of the information provided.

The context provided by humans

As with every announcement of a new artificial intelligence technology, it is pertinent to ask whether this or any future development could surpass human intelligence.

In this regard, as we analyzed in previous articles, Kenneth Cukier, Viktor Mayer-Schönberger and Francis de Véricourt state in their book Framers. Human Virtue in the Digital Age that the ability to make sense of information is the dividing line between machines and humans. [7]

For these authors, only people can formulate new questions within the same context and change the frame of reference. In other words, establishing frames of interpretation is a quality unique to human beings: we create mental models that we use to understand the most complex problems or the most disruptive activities, those that demand creativity, critical thinking and innovation.

“Our minds are full of frames. That’s how we think. Frames can be simple or sophisticated, precise or imprecise. But they all capture some aspect of reality. And because of them we can explain, focus, and decide.” [8]

Democracy is a frame, as is monarchy; religion is a frame, as is secular humanism; the rule of law is a frame, as is the notion of acting rightly; racial equality is a frame, as is racism.

The problem with machines, robots, algorithms and artificial intelligence systems is that they are incapable of framing correctly. This view was already put forward in 1969 by John McCarthy, one of the pioneers of the artificial intelligence concept. In an article entitled “Some Philosophical Problems from the Standpoint of Artificial Intelligence”, McCarthy made it clear that the greatest problem facing the technology lay on the philosophical plane, since, in the author’s words, there was an inability to “define frameworks or contexts”. [9]

The philosopher and cognitive scientist Daniel Dennett also agrees with this approach. In an article entitled “Cognitive Wheels: The Frame Problem of AI”, he explained how, in a thought experiment, every available resource was used to get a robot to make an appropriate decision in a particular context; all the efforts, however, were in vain. The author concludes that:

“Machines are capable of performing a lot of logical calculations and processing a wide range of data, but they definitely can’t frame.

Framing, that is, capturing part of the essence of reality through a mental model in order to map out an effective plan of action, is a uniquely human ability, not a machine one.”[10]

In conclusion, GATO, the new technology developed by Alphabet through its DeepMind lab, did not prove superior to existing AI technologies on individual tasks. By executing many different tasks, however, it demonstrated for the first time that a generalist type of AI is possible, one that simulates the thinking of the human brain much more closely. Even so, as Cukier, Mayer-Schönberger and de Véricourt explain in the book Framers, the ability to create mental maps, contexts and frames remains unique to humans, at least for now.

[1] InfoQ – Specialized article on the new GATO technology
[2] MIT TR article on the new GATO technology
[3] MIT TR article on the new GATO technology
[4] MIT TR article on the new GATO technology
[5] MIT TR article on the new GATO technology
[6] Analysis by the specialized outlet El Confidencial
[7] Cukier, K., Mayer-Schönberger, V. and de Véricourt, F. (2021). Framers. Human virtue in the digital age.
[8] Cukier, K., Mayer-Schönberger, V. and de Véricourt, F. (2021). Framers. Human virtue in the digital age.
[9] McCarthy, J. (1969). “Some philosophical problems from the standpoint of Artificial Intelligence,” in Machine Intelligence 4, available at Edinburgh University Press.
[10] Dennett, D. (2007). “Cognitive wheels. The framing problem of artificial intelligence,” in Philosophical Readings in Cognitive Science, pp. 317-348, available at