Public data collection is advancing, but still far from its full potential

by Anthony Weaver

The web scraping industry is maturing from both the technology and business perspectives, but it still lacks proper regulation. For this reason, key market players are launching the Ethical Web Data Collection Initiative (EWDCI) to share best practices and advocate for common principles. These were some of the main takeaways from this year’s edition of OxyCon, a prominent industry conference.

Organized by Oxylabs, a leading provider of public web data gathering solutions, OxyCon connected global web scraping experts for a two-day online event. From practical tips for engineers to high-level panel discussions, the conference speakers reviewed the most recent developments in the field.

Allen O’Neill, CEO and CTO at The DataWorks, argued that although the web scraping industry has developed rapidly in recent years, much of its potential is still untapped:

“The web scraping industry hasn’t even scratched the surface with its potential yet. There will be many new unicorns in the industry in the upcoming ten years – those who will be able to harness the power of information extraction (not data extraction, but information extraction) and use that to gain insights that have never been seen before,” said Allen.

The industry’s fast growth was evident in the fact that scaling was the hottest topic at OxyCon. Karsten Madsen, CEO at SEO company Morningscore, shared the story of his team moving from small data requests to competing with SEO industry giants. According to him, it’s not always about having the most data or the smartest data – it’s about having smarter algorithms to manage it.

Glen De Cauwsemaecker, Lead Crawler Engineer at OTA Insight, had another tip for scaling data operations: “Be pragmatic and look for cost-reward balance,” he recommended to fast-growing data companies.

Besides the technical challenges of scaling, legal issues are often near the top of the list of concerns. The participants of the panel discussion “Lawyers discuss scraping” emphasized the ambiguity and the many unclear areas that come with the lack of proper industry regulation. As a result, the industry itself must be proactive in safeguarding itself from within and in sharing best practices among its members.

In this light, Christian Dawson, Executive Director at I2Coalition, announced a new web scraping industry initiative. I2Coalition, together with five public data aggregators – Oxylabs, Zyte, Smartproxy, Coresignal, and Sprious – has launched the Ethical Web Data Collection Initiative (EWDCI). The aim of the group will be to promote the industry’s best practices and advocate for beneficial technical standards.

OxyCon is an annual knowledge-sharing opportunity for the global web scraping community. The sessions from this year’s edition are available to watch on demand.

– ENDS –

About Oxylabs

Established in 2015, Oxylabs is a premium proxy and public web data acquisition solution provider, enabling companies of all sizes to utilize the power of big data. Constant innovation, a large patent portfolio, and a focus on ethics have allowed Oxylabs to become a global leader in the data acquisition industry and forge close ties with dozens of Fortune Global 500 companies. In 2022, Oxylabs was named the fastest-growing public data gathering solutions company in Europe in the Financial Times’ FT 1000 list.
