The “Scraping” phenomenon and its worrying global escalation

A hacker recently captured the information of 700 million LinkedIn users worldwide and put it up for sale on the Dark Web for about five thousand dollars. The technique behind the incident is known as “Scraping”, and although it cannot be precisely defined as a crime, it has become a headache for large Internet technology companies.

What is Scraping and how does it impact users?

By: Gabriel E. Levy B.

Web Scraping, or simply Scraping, is a computer technique that uses specialized software to extract information from websites. It relies especially on the same kind of robots, or automated programs, that indexing systems use to organize information, a technique employed by practically all search engines and by websites that store large amounts of information, or Big Data.

It is important to clarify that the Web Scraping technique can be used for legitimate purposes, such as data indexing, or for malicious ones, such as data theft[1].

As its name suggests, Scraping is done by “scraping” the public surface of platforms: automatic programs take any content that is available about users and systematically store the information obtained, step by step.
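The mechanism described above can be illustrated with a minimal sketch that uses only Python's standard library. The sample page, its class names and the `extract_profile` helper are all hypothetical, invented for illustration; the code reads a hard-coded string and does not contact any real website.

```python
from html.parser import HTMLParser

# Hypothetical public profile page; the markup and class names are invented.
SAMPLE_PAGE = """
<html><body>
  <div class="profile">
    <span class="name">Ana Example</span>
    <span class="job">Engineer</span>
    <span class="city">Bogota</span>
  </div>
</body></html>
"""


class ProfileParser(HTMLParser):
    """Collects the text of every <span> that carries a class attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # class of the <span> being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._current = dict(attrs).get("class")

    def handle_data(self, data):
        # Only keep text that sits inside a labelled <span>.
        if self._current and data.strip():
            self.fields[self._current] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


def extract_profile(html):
    """Return the publicly visible fields of one page as a dictionary."""
    parser = ProfileParser()
    parser.feed(html)
    return parser.fields


print(extract_profile(SAMPLE_PAGE))
```

Each page yields one small record. Repeating the same extraction over millions of public pages and storing every record is what turns scattered public data into a single database.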

Unlike other criminal activities carried out by hackers, Scraping is not about leaking protected or confidential information, such as a password or an identity document number. It is about obtaining the public information of users on a massive scale and later selling it for extortion, commercial or marketing purposes.

In practical terms, the main problem with Scraping is that it enables activities that are harmful to Internet users, because it gathers the data of individual people on a massive scale.

Cybersecurity expert Troy Hunt, asked about the issue in a recent BBC interview, stated:

“It’s definitely not about breaches. Most of this data is public anyway.”… “The question to ask in each case is how much of this information is publicly accessible by user choice and how much is not expected to be.”[2] Troy Hunt.

A scheme that demands a lot of patience

Obtaining information through the Scraping technique is like filling a pool drop by drop. It would be practically impossible without the highly specialized software that hackers use to collect the information in larger batches. Even so, it takes a long time and could be described as a craft technique within computer science.

A hacker who calls himself “Tom Liner”, and whose origin and real name are unknown, recently compiled the information of 700 million LinkedIn users from all over the world into a database and put it on sale for about US$5,000[3].

In an interview with BBC journalist Joe Tidy, the hacker said that obtaining the information was very time-consuming:

“It took me several months to do it. It was very complex. I had to hack the LinkedIn API. If you make too many user data requests at the same time, the system permanently vetoes you,” Tom Liner.
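The limit on user data requests that the hacker mentions can be pictured with a minimal sketch of a request limiter. This is an assumption made for illustration and not LinkedIn's actual mechanism: the `RequestLimiter` class, its thresholds and the client names are all hypothetical.

```python
from collections import defaultdict, deque


class RequestLimiter:
    """Permanently bans a client that exceeds max_requests per time window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.history = defaultdict(deque)  # client id -> recent timestamps
        self.banned = set()

    def allow(self, client_id, now):
        if client_id in self.banned:
            return False
        recent = self.history[client_id]
        # Forget requests that have fallen out of the time window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            self.banned.add(client_id)  # the ban is never lifted
            return False
        recent.append(now)
        return True


limiter = RequestLimiter(max_requests=3, window_seconds=10.0)

# A client that sends four requests almost at once is banned for good.
fast = [limiter.allow("fast-client", t) for t in (0.0, 0.1, 0.2, 0.3, 60.0)]

# A client that spaces its requests out never trips the limit.
slow = [limiter.allow("slow-client", t) for t in (0.0, 5.0, 10.0, 15.0, 20.0)]

print(fast)  # [True, True, True, False, False]
print(slow)  # [True, True, True, True, True]
```

Under these assumptions, a client that stays below the threshold is never blocked, which is consistent with a large collection effort taking months rather than hours.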

The data market

Data brokers, also known as data sellers or marketers, are individuals or companies that collect information about consumers through algorithms, with or without their permission, or buy it in legal or illegal markets. They then sell it to interested third parties for multiple purposes, legal or not.

Hackers who use the Scraping technique to build databases readily find potential customers on the Dark Web, a growing market where companies, individuals and organizations are willing to pay for information.

A phenomenon exacerbated by the Pandemic

The prolonged confinement that resulted from the current pandemic significantly increased the use of the Internet globally. This in turn triggered a greater generation of data by users, through virtual purchases and the use of platforms and devices, and enlarged their digital footprint. All of this has boosted the data trade. It has even reached many people who until now had resisted digital platforms but were forced by the pandemic to connect, as it was their only means of communication, and therefore had to sacrifice their privacy.

The current pandemic has represented a major turning point, as it is no longer about the Internet as an alternative, but the Internet as the primary means of human communication.

Companies minimize impact

LinkedIn’s recent statements regarding the action taken by the hacker “Tom Liner” are limited to emphasizing that there was no leak of sensitive data and that, in the end, the information was public:

“This was not a LinkedIn data breach and no LinkedIn member’s private data was exposed. LinkedIn data mining is a violation of our Terms of Service and we are constantly working to ensure that our members’ privacy is protected.” official LinkedIn Press Release

Both LinkedIn now and Facebook in the past have tried to minimize the impact of such incidents by stressing that the data was public. It cannot be hidden, however, that these incidents represent a risk to all parties. Ultimately it is the credibility of these companies that ends up compromised, and many users simply prefer to stop posting content. That is why large Internet technology companies should take much stronger action to prevent such incidents from being repeated.

In conclusion, although strictly speaking Scraping cannot be considered a criminal activity, it directly harms users, who see their information exposed for extortion or commercial uses, while large Internet technology companies lose credibility with their users. For these reasons, the issue deserves more attention from the authorities and from the technology companies.

[1] Antevenio specialized blog article on the Web Scraping phenomenon.

[2] Troy Hunt in an interview with the British media BBC Mundo.

[3] News published by the British media BBC Mundo.