Big Data: are algorithms the new creators?

Big Data is a complex concept of English-language origin, widespread in the information age. It describes a set of information so large and complex that it requires specialized computer systems, both software and hardware, to process the data efficiently in real time.

Big Data systems allow companies to know their audiences better, identify niches, offer better and more appropriate content for each user, and geolocate offers and advertising, among many other applications that are revolutionizing the contemporary economy.

How has Big Data impacted the audiovisual and content industry?

The concept of Big Data does not belong to a single author. The first papers date back to 1983, when the Japanese researcher Takuya Katayama wrote an article called “Treatment of Big Values in an applicative language HGP”, which is believed to be the first time the term was referenced in an academic article. Later, in the last decade of the 20th century, John Mashey, a researcher with a Ph.D. in computer science from The Pennsylvania State University, popularized the concept in a number of references.

In 2003, a paper published by Google defined a distributed file model called the “Google File System (GFS)”, which could be used to manage the Big Data collected by the search engine, making the concept a reality.

In 2005, engineer Doug Cutting and his colleagues brought Google's principles to the creation of the first open-source Big Data platform, which they called “Hadoop” and which became an Apache project. It served as inspiration and reference for contemporary Big Data systems, and a huge ecosystem of tools has grown up around it, using it as their main structure.

Nowadays, managing massive data involves three key variables, known as the 3 Vs:

Volume: Refers to the amount of data that must be collected and processed.

Velocity: Refers to the speed at which the volume of information must be collected and processed.

Variety: Refers to the diversity of the types of information that must be collected and processed, such as text, numbers, algorithms, equations, video, audio, dark data, etc.

Practically all business decisions made today by large companies are based on “Big Data” models. This has reinforced the concept of “business intelligence”, which allows organizations to decide which strategies to implement based on data collected over the Internet about consumer habits, preferences, tastes, thoughts, behaviors, etc.

Can we imagine a future in which the data provided by servers is the input for creating scripts or audiovisual stories?

It is not necessary to imagine it, because it already happens. The compilation and analysis of large amounts of data, which we know as Big Data, is already a very important part of the world of telecommunications.

Yet the most striking application is the one that allows entire series to be created, and the best example comes from the top of the industry. In 2011, the video-catalog company Netflix, which was just consolidating its position in the OTT video-on-demand market, understood that it had to be more than a content aggregator and distributor and become a producer of series and films.

An article by analyst Roberto Baldwin in Wired magazine explains how Netflix decided to invest 100 million dollars at the beginning of 2012 to commission two complete seasons of the American remake of the British mini-series House of Cards, produced by the BBC in 1990. The production would move the drama, which originally took place in London, to the political intrigues of the White House, the seat of the U.S. government.

Risking such enormous capital before viewers had approved a single episode of the remake would seem madness, but the company had great faith in the touchstone of the 21st century: the data derived from the original series.

The strategy was simple in its approach but enormously complex in its execution. The company commissioned an analysis of the viewing data for the British series.

The analysis revealed that people who had watched complete seasons were also likely to watch films starring Kevin Spacey (such as American Beauty or The Usual Suspects) and also liked films directed by David Fincher, one of the producers of the new series and director of its first episodes. In addition, the data revealed detailed preferences regarding dramatic structures and favorite characters.
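The kind of overlap described here can be illustrated with a small co-viewing calculation. The following is only a minimal sketch: the viewing histories are invented for illustration, the function name co_viewing_rates is hypothetical, and nothing here reflects how Netflix actually performed its analysis.

```python
from collections import Counter

# Hypothetical viewing histories: each set holds the titles one subscriber finished.
histories = [
    {"House of Cards (1990)", "American Beauty", "The Social Network"},
    {"House of Cards (1990)", "The Usual Suspects", "Se7en"},
    {"House of Cards (1990)", "American Beauty"},
    {"Se7en", "The Social Network"},
    {"American Beauty"},
]

def co_viewing_rates(histories, anchor):
    """For each other title, return the share of anchor viewers who also watched it."""
    anchor_viewers = [h for h in histories if anchor in h]
    if not anchor_viewers:
        return {}  # nobody watched the anchor title, so there is nothing to compare
    counts = Counter(t for h in anchor_viewers for t in h if t != anchor)
    return {title: n / len(anchor_viewers) for title, n in counts.items()}

rates = co_viewing_rates(histories, "House of Cards (1990)")
# Print titles from the most to the least shared among anchor viewers.
for title, share in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{title}: {share:.0%} of anchor viewers")
```

In this toy data set, the title most often shared by viewers of the original series stands out immediately; a real analysis would work on millions of histories and would also have to control for how popular each title is overall.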

With all this information, Netflix began planning its remake in the safest possible way. In short, instead of doing things the traditional way (testing a product and then gauging viewers' reactions), the company made a series to order, tailored to the already known tastes of its subscribers.

Of course, the company did not start from scratch: the series does not even have an original plot. But the revolution triggered by the strategy is still going on. The same methodologies were used to plan the American versions of series like Black Mirror. Other video-on-demand companies are taking advantage of the huge benefits of being able to track their subscribers' consumption patterns second by second.

Most importantly, this case demonstrates that data is not the touchstone of television but only its substrate, the raw material that will be transformed and will enrich its owners. The real magic is created by analytics: the algorithms for data mining, analysis and visualization. These are produced and controlled by specialized companies, now increasingly common around the world, on whose work a good part of the commercial TV business depends.

A similar scenario is becoming real in the music industry, where the major record companies now rely on programs to generate the catchy song of the summer. The algorithms analyze which parts of successful songs and current hits are most repeated, sung, shared or mixed in clubs and on OTT platforms such as YouTube.

They generate new songs from the chords of these sections. The results are increasingly similar to their predecessors, but they almost guarantee the same reception and become hits. Many critics consider these songs to be of very low quality, yet the music industry values them as its lifeline in times of commercial uncertainty such as the ones we are experiencing.
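As a rough illustration of the idea of finding the most repeated sections, the sketch below counts which four-chord progressions recur most often across a handful of songs. The chord sequences are invented, the function name common_progressions is hypothetical, and real hit-prediction systems are certainly far more elaborate than this.

```python
from collections import Counter

# Hypothetical chord sequences for the most-replayed section of each hit.
hit_sections = {
    "song_a": ["Am", "F", "C", "G", "Am", "F", "C", "G"],
    "song_b": ["C", "G", "Am", "F", "C", "G", "Am", "F"],
    "song_c": ["Am", "F", "C", "G", "Dm", "G"],
}

def common_progressions(sections, length=4):
    """Count every run of `length` consecutive chords across all sections."""
    counts = Counter()
    for chords in sections.values():
        # Slide a window of `length` chords along the section.
        for i in range(len(chords) - length + 1):
            counts[tuple(chords[i:i + length])] += 1
    return counts

top_progression, occurrences = common_progressions(hit_sections).most_common(1)[0]
print(" - ".join(top_progression), "appears", occurrences, "times")
```

A generator that simply reuses the top-ranked progression will, by construction, produce songs that resemble the ones it was trained on, which is the homogenizing effect the critics point to.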

The Amplified Rating Tyranny

But not everything is a “fairy tale”. The closest precedent is the ratings indicator, a variable that has historically served to measure viewers' audiovisual consumption habits and on which many commercial television networks base their content design and programming decisions. The results have not necessarily been the best. According to many academic analysts, ratings drive the production of so-called “trash content”: audiovisual productions based on very predictable stories with very conventional plots, where sex and violence are used to attract audiences, often by appealing to human “morbid curiosity”. The outcome is thousands of industrially produced pieces of content in which the story is always the same and only the characters and contexts change.

Without a doubt, the history of ratings should serve as a warning bell: what audiences want en masse is not always the best content. If history has shown anything, it is that the best productions have emerged from the disruptive spirit of their creators, from daring to propose content out of the ordinary, to innovate and to try new forms, and that is hardly achieved by fulfilling the whims of consumers.

What we definitely must not allow is for “Big Data”, in the design of audiovisual content, to replace human creativity with the dictates of algorithms whose sole purpose is to generate a highly consumable product.


Gabriel E. Levy B.

Sergio A. Urquijo M.