On Thursday, November 7, a New York district court dismissed a copyright lawsuit against OpenAI that accused the company of misusing copyrighted material to train ChatGPT. News outlets Raw Story and AlterNet filed the lawsuit in February. The district judge ruled that the outlets had not shown sufficient harm caused by ChatGPT, but is allowing them to refile if they can present more substantial evidence. See https://ipwatchdog.com/2024/11/08/barks-reports-finds-record-number-global-patent-filings-itc-finds-semiconductor-compan/id=183027/
My thoughts:
The court held that just because an AI company removes your copyright information doesn't mean you're automatically harmed. You need to show real damage, not just theoretical harm. The court pointed out that while OpenAI may have removed that information during training, there was no evidence this directly harmed the news organizations.
But how do we measure such harm? How do we know that AI won't use those materials to harm these organizations in the future, and should they simply wait until it does? The court addressed the question of future harm as well, but was skeptical. Given the massive amount of training data ChatGPT uses, the court did not see a "substantial risk" that the model would specifically reproduce these organizations' articles.
So, does having more training data actually protect AI companies from liability?