AI Chatbots Are Web Scraping News Outlets and Copyrighted Content: 7 Alarming Dangers for the Industry

By Scott Faulkner

A prominent news media trade group, the News Media Alliance, has raised concerns about AI companies that scrape the web to train their chatbots and generative AI systems. The organisation, which represents nearly 2,000 media outlets in the United States, recently published research showing how companies such as OpenAI and Google have used news, magazine, and digital media content to train their AI systems.

Notably, the research revealed that these AI companies have instructed their bots to place significantly more trust in information sourced from reputable publishers, as opposed to content from other sources on the internet. This raises important questions about the use of copyrighted news material and the credibility of AI-generated information.

“The research and analysis we’ve conducted shows that AI companies and developers are not only engaging in unauthorised copying of our members’ content to train their products, but they are using it pervasively and to a greater extent than other sources,” said Danielle Coffey in a statement.

What Is Web Scraping?

“This acknowledgment underscores their awareness of the distinctive worth of our content. However, it’s crucial to note that many of these developers are not acquiring the necessary permissions through licensing agreements or providing compensation to publishers for utilising this content,” Coffey emphasised. “This failure to respect high-quality, human-generated content negatively impacts not only publishers but also the long-term viability of AI models and the accessibility of dependable, credible information.”

In its white paper, the trade group dismissed arguments that AI bots have merely “learned” facts in the same way humans do, by absorbing information from various datasets. The group called that conclusion “inaccurate”, since these models retain the expressions of facts found in the works in their training materials (which are protected by copyright) without genuinely comprehending the underlying concepts.

Publishers, who have been engaged in a sort of Cold War with AI companies, have recently begun implementing defensive measures to safeguard their content. In August, a review by Reliable Sources revealed that a dozen prominent media firms had embedded code into their websites to protect their content from AI bots that scrape the internet for information. Furthermore, many more publishers have since adopted similar protective measures to preserve the integrity of their content.
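The review does not say exactly what code the publishers embedded. One widely used mechanism is a robots.txt file that names specific crawlers, so the following is only a minimal sketch under that assumption. The rules shown are illustrative and not taken from any publisher's site; `GPTBot` and `Google-Extended` are the crawler tokens published by OpenAI and Google, and `is_allowed` is a hypothetical helper built on Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not copied from any real site):
# block two AI-training crawler tokens, allow every other user agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.com/news/some-story"  # hypothetical URL
    for agent in ("GPTBot", "Google-Extended", "SomeBrowser"):
        print(agent, is_allowed(ROBOTS_TXT, agent, story))
```

Rules like these are advisory: robots.txt only works when a crawler chooses to honour it, and it has no effect on content that was collected before the rules were added.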

Unlawful News Scraping Has Got to Stop

"After discovering an obscure website was using AI bot scraping to generate news articles, Redditors laid a trap." https://t.co/CwaT5sIrpX (mike cook, @mtrc, July 21, 2023)

Indeed, these defensive measures primarily focus on safeguarding news organisations from future web scraping activities. Unfortunately, they do not address the issue of prior scraping, which, as noted by news outlets, has been used to train AI bots. To address this challenge, the News Media Alliance has put forward a set of recommendations aimed at preserving the place of news publishers in this rapidly evolving landscape.

These recommendations call for policymakers to acknowledge that the unauthorised use of copyrighted material for training AI bots constitutes infringement, and they emphasise the importance of allowing publishers to efficiently license the use of their content under fair terms.

“Our culture, our economy, and our democracy require a solution that allows the news and media industry to grow and flourish, and both to share in the profit from and participate in the development of the GAI revolution that is being built upon the fruits of its labor,” the News Media Alliance said.

Why Is News Scraping Bad for the Industry? 7 Potential Dangers

Web scraping, the automated process of extracting news content from websites and other sources, is considered problematic for several reasons:

1. Copyright Infringement

Many news articles, images, and videos are protected by copyright law, meaning that they are owned by the original content creators or publishers. Web scraping without proper authorisation can infringe upon these copyrights, as the scraper may use this content without permission.

2. Revenue Impact

News organisations rely on various revenue streams, such as advertising, subscriptions, and content licensing, to support their journalism. Web scraping can undermine these revenue models by freely reproducing content that should be subject to licensing fees or advertising placements.

3. Credibility and Accuracy

Scraped content may not always be properly attributed or may be taken out of context. This can affect the credibility and accuracy of the information presented, potentially misleading readers or distorting the original intent of the content.

4. Privacy Concerns

Web scraping can also raise privacy concerns if personal or sensitive information is extracted and used without consent. This may apply to both the individuals mentioned in news articles and the website visitors themselves.

5. Resource Drain

Frequent web scraping can put a strain on the resources and infrastructure of news websites, potentially leading to slower loading times, increased server costs, and other technical issues.

6. Content Manipulation

Some scrapers use the content for malicious purposes, such as generating fake news or spamming online forums and social media with misleading information.

7. Ethical Considerations

Engaging in web scraping without proper authorisation may be seen as unethical, as it may violate the principles of respecting intellectual property, fair use, and the terms of service of websites.

The Good of the People Is the Greatest Law

To address these concerns, there are ongoing discussions and legal battles surrounding web scraping, focusing on the need to strike a balance between data access for legitimate purposes (such as data analytics) and the protection of content creators’ rights. Data protection is essential to safeguard individuals’ privacy and prevent unauthorised access or misuse of sensitive information. What are your thoughts on this? Do you think web scraping is bad, or is it an essential tool for work efficiency?
