Is Humanity Ready to Find What It Needs on the Network?
By: Gabriel E. Levy B.
www.galevy.com
During the next 60 seconds, while you read this paragraph, 208,000 meetings will take place on Zoom, 20,000 of which will be uploaded to the cloud; Facebook users will upload 147,000 photos; 277,000 stories will be published on Instagram; YouTube users will upload 500 hours of video; 28 new songs will be added to Spotify; 41 million messages will be sent on WhatsApp; 500,000 tweets will be published on Twitter; and 600 new web pages will be created and launched [1]. Google will process 5 million searches, and 400,000 applications will be downloaded from the App Store. All of this constitutes only 4% of the visible, indexable Internet: the remaining 96% is hidden in the Deep Web, with no measurable, traceable or calculable record of the information circulating there, from banking transactions and data exchanged between servers to criminal activity [2].
Humanity and its digital developments have become colossal machines of information saturation, and as these numbers grow every minute, it becomes ever more complex to find mechanisms for managing and classifying data, and above all, for searching and locating it.
An Obsolete Search Model
Until now, information searches on the Internet have relied on metadata: a set of data that describes the content of a resource or file. In other words, it is information that describes other information, which, of course, further increases the amount of information stored [3].
When someone publishes a website, or uploads an image to Instagram or a video to YouTube, they include metadata: keywords, references, concepts or hashtags associated with the element being published. A search engine such as Google or Bing associates those words with the content, so that when a person performs a search matching or resembling this metadata, the engine shows the published content among its results. This model is combined with many other variables, such as relevance to the published content, how frequently the information is updated, the site's score, the user experience and the amount of multimedia content, among many other factors, which together make up the strategy known as SEO (Search Engine Optimization) [4].
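The keyword-matching model described above rests on a classic data structure: an inverted index that maps each term to the documents containing it. The sketch below is a minimal illustration with invented documents, not any real engine's implementation, which would add ranking signals like the SEO factors mentioned:

```python
from collections import defaultdict

# Invented toy corpus: document id -> text (stands in for page metadata).
documents = {
    1: "open source search engine for all data",
    2: "photo sharing with hashtags and metadata",
    3: "distributed search and analytics engine",
}

# Build the inverted index: term -> set of document ids containing it.
index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(query):
    """Return ids of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

print(sorted(search("search engine")))  # → [1, 3]
```

Real search engines combine this lookup with the ranking variables the text lists (freshness, site score, user experience) to order the matching documents.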
The algorithms of search engines such as Google have been refined over time, systematically learning from every search users perform, building patterns of behavior, identifying priorities and interests, and making searches ever more efficient. However, one element has remained constant: searches require the user to enter the key phrase, concept, word or image they need.
This model, functional though it has proven so far, is insufficient and inefficient: only a small percentage of information is ever located, while vast amounts of data sit stored and unused on servers and computers, simply because there are no vectors that allow them to be found. In other words, billions of web pages, photos, texts, podcasts, videos and data in general lack traffic, not for lack of public interest, but because the public is unable to locate and consume them.
Elasticsearch as an Alternative
In response to the need for better and more efficient web searches, an open source technology has emerged in recent years that looks very promising for solving information search and classification problems: Elasticsearch, a distributed, free and open search and analytics engine for all types of data, including textual, numeric, geospatial, structured and unstructured. Elasticsearch is built on Apache Lucene and is accompanied by a set of free and open tools for ingesting, enriching, storing, analyzing and visualizing all types of data; in other words, it does not work only with text and photos [5].
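Queries against Elasticsearch are expressed in its JSON Query DSL. As a hedged illustration, the sketch below only constructs a `multi_match` request body as a Python dictionary; the field names (`title`, `body`) are hypothetical, and actually sending the query would require a running cluster and a client such as the official Python one:

```python
# Sketch: build an Elasticsearch multi_match query body (Query DSL).
# Field names are invented for illustration; no cluster is contacted here.
def build_match_query(text, fields, size=10):
    """Return a Query DSL body searching `text` across `fields`."""
    return {
        "size": size,  # maximum number of hits to return
        "query": {
            "multi_match": {
                "query": text,
                "fields": fields,
            }
        },
    }

body = build_match_query("distributed search", ["title", "body"])
```

In a live deployment, a client would POST this body to an index's `_search` endpoint and receive ranked hits back.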
In an interview with the BBC in the United Kingdom, Elastic's founder, Shay Banon, stated:
“Searching in the old days was about typing text. Not so today. Searching can involve swiping right, moving a map with your fingers or talking to an app.” Shay Banon [6]
Thanks to the technology developed by Banon, applications such as Tinder learn through their users' fingers: by swiping a photo to the right or to the left, users deliver essential information that lets the app determine their interests and tastes, so the quality of the results improves with every photo swiped.
However, Tinder is not the only company transforming the way we search for information on the Internet. Netflix has completely changed the video search experience through a sophisticated algorithm that offers users content to watch without requiring them to enter any keyword or additional information, learning from each selection the user makes, the time spent on each piece of content, and many other variables.
In the case of Uber, Elastic's technology has made it possible to connect drivers with users based on geo-referencing and historical and statistical traffic information, so that the driver statistically likely to reach the user in the shortest time provides the service, while other aspects such as rating, experience and destination route, among many other variables, are evaluated in parallel.
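The geo-referenced matching idea can be reduced to its simplest form: choosing the driver at the smallest great-circle (haversine) distance from the rider. The coordinates and driver identifiers below are invented for illustration; a real system would additionally weigh traffic history, ratings and routes, as described above:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km: mean Earth radius

def nearest_driver(rider, drivers):
    """Return the id of the driver closest to the rider's position."""
    return min(drivers, key=lambda d: haversine_km(*rider, *drivers[d]))

# Invented positions (roughly around Bogotá) for illustration only.
drivers = {"d1": (4.65, -74.05), "d2": (4.60, -74.08), "d3": (4.70, -74.10)}
rider = (4.61, -74.07)
print(nearest_driver(rider, drivers))  # → d2
```

Straight-line distance is only a first approximation; production matching would score candidates on estimated travel time along the road network instead.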
In conclusion, efficiently indexing information amid the colossal avalanche of data produced every minute on the Internet is one of the greatest challenges facing the industry and the entire ICT sector, and one of the defining technological challenges of our time.
Humanity is experiencing a paradox in which the problem is not the existence of information but the inability to locate and manage it efficiently. New indexing and search technologies must therefore be designed, and Elasticsearch is the tool that promises to solve many of these problems in the coming years, while new developments and experiments emerge in parallel.
[1] Infographic What happens on the Internet in one minute?
[2] BBVA article on the Dark Net and Dark Web
[3] Definition of Metadata in Power Data
[4] Blog about Digital Marketing and SEO strategies
[5] Elastic’s official website
[6] BBC World article
Disclaimer: The published articles correspond to contextual reviews or analyses on digital transformation in the information society, duly supported by reliable and verified academic and/or journalistic sources. The publications are NOT opinion articles and therefore the information they contain does not necessarily represent Andinalink’s position, nor that of their authors or the entities with which they are formally linked, regarding the topics, persons, entities or organizations mentioned in the text.