
Mindaugas Čaplinskas, co-founder of proxy provider IPRoyal, discusses the process of web scraping, how it supports the internet and its value to digital businesses.

Web scraping, the practice of automatically extracting information from web pages at scale, is an often misunderstood concept. This is primarily because most of us deal with the end result of web scraping – the business side – and not with the underlying process.

The current iteration of the internet, however, would be almost impossible without the existence of web scraping and web crawling.

Many people are inherently suspicious of automated data extraction from the public internet, yet use the services enabled by web scraping every day.

How web scraping works

Web scraping relies on two essential components: automated access to websites, and proxies. Automated programs (often called bots) visit a website and download its HTML to capture most of the information that’s visible on the page.
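As a rough illustration of those two components, here is a minimal Python sketch that uses only the standard library. It is not the method used by any particular provider: the names `fetch_html`, `parse_page` and `TextAndLinkParser`, the `example-scraper/0.1` user agent and the proxy address in the usage note are all hypothetical, chosen only for this example.

```python
from html.parser import HTMLParser
import urllib.request


class TextAndLinkParser(HTMLParser):
    """Collect visible text and link targets from an HTML document."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style content.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def parse_page(html):
    """Return the visible text and the links found in an HTML string."""
    parser = TextAndLinkParser()
    parser.feed(html)
    parser.close()
    return {"text": " ".join(parser.text_parts), "links": parser.links}


def fetch_html(url, proxy_url=None, timeout=10):
    """Download a page, optionally routing the request through a proxy."""
    handlers = []
    if proxy_url:
        # Assumption: the same proxy serves both HTTP and HTTPS traffic.
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        )
    opener = urllib.request.build_opener(*handlers)
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-scraper/0.1"}
    )
    with opener.open(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A call might look like `parse_page(fetch_html("https://example.com", proxy_url="http://proxy.example:8080"))`, where both addresses are placeholders. Real scrapers typically add rate limiting, retries and respect for a site's robots.txt, all of which this sketch leaves out.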

While the process seems simple...

  • The big question is how the output from search engines changes when the basic web sources (used in the scraping process) change radically.

    For example, the reduction in emphasis on BLM (Black Lives Matter) and DEI (diversity, equality and inclusion) policies and initiatives with the arrival of the new US administration.