AI Ethics in Question: OpenAI Accuses China's DeepSeek of Data Theft
OpenAI suspects that China's DeepSeek AI models, which are significantly cheaper than their Western counterparts, may have been trained using OpenAI's data. The accusation follows DeepSeek's rapid rise in popularity, which triggered a market downturn for major AI companies. Nvidia, whose GPUs are crucial for AI, suffered the largest single-day loss of market value in Wall Street history, shedding nearly $600 billion in market capitalization. Other tech giants, including Microsoft, Meta, Google, and Dell, also experienced significant drops.
DeepSeek's R1 model, built on the open-source DeepSeek-V3, reportedly cost far less to train (an estimated $6 million) than comparable Western models. Although this figure is disputed, it fueled investor concerns about the massive sums American companies are investing in AI. DeepSeek's success, particularly its top ranking on U.S. app download charts, further amplified these concerns.
OpenAI and Microsoft are investigating whether DeepSeek violated OpenAI's terms of service by using its API to train its own models through a technique called distillation, in which a smaller model is trained on the outputs of a larger, more capable one. OpenAI acknowledges that Chinese companies, among others, actively attempt to replicate leading U.S. AI models. It says it is employing countermeasures and collaborating with the U.S. government to protect its intellectual property.
Donald Trump's AI advisor, David Sacks, said there is substantial evidence that DeepSeek employed distillation, a practice OpenAI considers a violation of its terms. The situation highlights the ongoing debate over the use of copyrighted material in AI model training.
The irony is not lost on observers, given OpenAI's own history. OpenAI has previously argued that creating AI models like ChatGPT is impossible without using copyrighted material, because copyright covers such a broad range of human expression. It made this argument in a submission to the UK's House of Lords and maintains it in its ongoing legal battles. The New York Times and 17 authors, including George R. R. Martin, have filed lawsuits against OpenAI and Microsoft alleging copyright infringement. OpenAI defends its actions as "fair use."
The legal landscape surrounding AI training data remains complex, particularly where copyright is concerned. In a 2022 ruling on an application first filed in 2018, the U.S. Copyright Office held that AI-generated art is not copyrightable because it lacks a "nexus between the human mind and creative expression."





