New York Times: Don't use our content to train AI systems

Many large language models are trained using website content without permission – and some brands are demanding compensation.

Danny Goodwin on August 10, 2023 at 9:47 am | Reading time: 2 minutes

Although Google wants all online content available for AI training, the New York Times clearly wants to opt out.

The Times has changed its terms of service, aiming to prevent AI companies from using the media organization’s content to train their systems.

Why we care. Many large language models are trained using website content (see: Search the 15.7 million websites in Google’s C4 dataset). While Google is exploring alternative or supplemental ways of controlling crawling and indexing beyond robots.txt, many brands (e.g., Reddit) are making it clear right now that they don’t want their content used to improve the products and increase the profits of Google, Microsoft and OpenAI – at least not without compensation. You may want to consider adding similar AI-related messaging to your website’s terms page.
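For readers who want to see how robots.txt rules are evaluated today, here is a minimal sketch using Python's standard urllib.robotparser module. The crawler name "ExampleAIBot", the rules and the example.com URL are hypothetical placeholders, not any real crawler's token or any publisher's actual file. Keep in mind that robots.txt is a voluntary convention that only well-behaved crawlers honor, and it is separate from terms-of-service language.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one illustrative AI crawler, allow everyone else.
# "ExampleAIBot" is a made-up user-agent token used only for this sketch.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"  # placeholder URL
    print("ExampleAIBot allowed:", is_allowed(ROBOTS_TXT, "ExampleAIBot", url))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

In this sketch the blocked crawler is denied for every path, while any other user agent falls through to the wildcard rule and is allowed.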

What has changed. The New York Times updated its terms of service page Aug. 3. It includes AI-specific additions that apply to its content (which it defines as “including, but not limited to text, photographs, images, illustrations, designs, audio clips, video clips, ‘look and feel,’ metadata, data, or compilations”).

In the “Prohibited use of the services” section:

(3) use the Content for the development of any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system.

Will AI companies compensate publishers? OpenAI and the Associated Press signed a deal last month. OpenAI licensed the AP’s news article archive dating back to 1985 for training.

Google and the New York Times Co. already have a lucrative “commercial agreement” in place, but that deal is about working together on “tools for content distribution and subscriptions.”

Microsoft is also promising publishers some sort of revenue sharing. However, most of the benefits will apparently go to members of its Start program.
