When news broke last year that AI heavyweight OpenAI and Axel Springer had reached a financial agreement and partnership, it seemed to bode well for harmony between the folks who write words and the tech companies that use them to help create and train artificial intelligence models. For reference, OpenAI had also come to an agreement with the AP by that time.
Then, as the year ended, the New York Times sued OpenAI and its backer Microsoft, alleging that the AI company’s generative AI models were “built by copying and using millions of The Times’s copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more.” Due to what the Times considers to be “unlawful use of [its] work to create artificial intelligence products,” the suit says, OpenAI’s models “can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by scores of examples.”
The Times added in its suit that it “objected after it discovered that Defendants were using Times content without permission to develop their models and tools,” and that “negotiations have not led to a resolution” with OpenAI.
The question of how to balance the need to respect copyright with the need to ensure that AI development doesn’t grind to a halt will not be answered quickly. But the mix of agreements and more fractious disputes between creators and the AI companies that want to ingest and use their work to build artificial intelligence models makes for an unhappy moment on both sides of the conflict. Tech companies are busy baking new generative AI models into their software products, and those models are trained on data that includes copyright-protected material; Microsoft is a leader in that particular work, it’s worth noting. And media companies that have spent massively over time to build up a corpus of reported and otherwise created materials are incensed that their efforts are being subsumed into machines that give nothing back to the folks who provided their training data.
techcrunch.com