Did DeepSeek Use Google Gemini to Train Its New AI Model?

Chinese AI lab DeepSeek recently released an updated version of its R1 reasoning model, with improved performance on math and coding benchmarks. However, the company has not disclosed the source of its training data, which has sparked speculation: some AI researchers suspect DeepSeek trained on outputs from Google's Gemini family of models.

Evidence Points to Gemini

Developer Sam Paech claims DeepSeek's R1-0528 model exhibits linguistic patterns similar to Google's Gemini 2.5 Pro. He shared his findings on X (formerly Twitter):

If you're wondering why new deepseek r1 sounds a bit different, I think they probably switched from training on synthetic openai to synthetic gemini outputs.

— Sam Paech (@sam_paech) May 29, 2025

Another developer supports this theory, noting that the "thoughts" DeepSeek's model produces as it works toward an answer read like Gemini's intermediate reasoning traces.
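Comparisons like these typically rest on stylistic fingerprints: the word and phrase frequencies in one model's outputs are measured against another's. Below is a minimal sketch of that idea, comparing word-frequency vectors with cosine similarity. The file names are hypothetical sample corpora, and this is not Paech's actual methodology.

```python
# Minimal sketch: compare the lexical "fingerprint" of two models' outputs
# by measuring cosine similarity between their word-frequency vectors.
# File names are hypothetical; this is NOT Sam Paech's actual method.
from collections import Counter
import math
import re

def word_freqs(path: str) -> Counter:
    """Count lowercase word tokens in a text file of model outputs."""
    with open(path, encoding="utf-8") as f:
        return Counter(re.findall(r"[a-z']+", f.read().lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

r1 = word_freqs("deepseek_r1_0528_outputs.txt")      # hypothetical sample file
gemini = word_freqs("gemini_2_5_pro_outputs.txt")    # hypothetical sample file
print(f"Lexical similarity: {cosine_similarity(r1, gemini):.3f}")
```

A high score alone proves nothing, but an unexpected jump in similarity between model versions is exactly the kind of signal that fuels this sort of speculation.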

Past Accusations and Industry Concerns

This isn't the first time DeepSeek has faced accusations of training on competitor data. A previous instance involved OpenAI's ChatGPT and prompted investigations by OpenAI and Microsoft. These incidents highlight growing concern over distillation, a technique in which a smaller model is trained on the outputs of a larger, more capable one.

While distillation itself is a common and legitimate practice, OpenAI's terms of service forbid customers from using its model outputs to build competing AI.
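In practice, distillation of this kind usually means prompting a stronger "teacher" model through its public API, collecting the responses, and fine-tuning a smaller "student" model on the resulting pairs. Here is a minimal sketch of the data-collection step, assuming a generic OpenAI-compatible chat API; the model name and output file are placeholders, not any lab's actual pipeline.

```python
# Minimal sketch of the data-collection step of distillation: query a
# stronger "teacher" model and save prompt/response pairs for later
# fine-tuning of a smaller "student" model. The model name and file
# are placeholders, not any lab's actual pipeline.
import json
from openai import OpenAI  # assumes an OpenAI-compatible API client

client = OpenAI()  # reads the API key from the OPENAI_API_KEY env var

prompts = [
    "Prove that the square root of 2 is irrational.",
    "Write a Python function that merges two sorted lists.",
]

with open("synthetic_training_data.jsonl", "w", encoding="utf-8") as out:
    for prompt in prompts:
        response = client.chat.completions.create(
            model="teacher-model-name",  # placeholder teacher model
            messages=[{"role": "user", "content": prompt}],
        )
        answer = response.choices[0].message.content
        # Each line becomes one supervised fine-tuning example.
        out.write(json.dumps({"prompt": prompt, "completion": answer}) + "\n")
```

This is why Lambert's point below about compute matters: API calls are far cheaper than the GPU hours needed to generate comparable training signal from scratch.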

The Challenge of AI Data Contamination

The increasing prevalence of AI-generated content online makes it hard to filter training datasets effectively. This "contamination" also muddies attribution: a model that sounds like Gemini may have been deliberately distilled from it, or may simply have been trained on a web that is increasingly full of Gemini-generated text.
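One common, if crude, mitigation is heuristic filtering: scanning scraped documents for boilerplate phrases characteristic of chatbot output and discarding matches. A minimal sketch follows; the phrase list is purely illustrative, and production pipelines rely on trained classifiers rather than a handful of strings.

```python
# Minimal sketch of heuristic filtering for AI-generated "contamination"
# in a scraped corpus. The phrase list is purely illustrative; real
# pipelines use trained classifiers and far richer heuristics.
AI_TELL_PHRASES = [
    "as an ai language model",
    "i cannot assist with",
    "i hope this helps!",
]

def looks_ai_generated(document: str) -> bool:
    """Flag a document containing a known chatbot boilerplate phrase."""
    lowered = document.lower()
    return any(phrase in lowered for phrase in AI_TELL_PHRASES)

corpus = [
    "The mitochondria is the powerhouse of the cell.",
    "As an AI language model, I cannot browse the internet.",
]
clean_corpus = [doc for doc in corpus if not looks_ai_generated(doc)]
print(f"Kept {len(clean_corpus)} of {len(corpus)} documents")
```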

Still, AI experts like Nathan Lambert of the Allen Institute for AI (Ai2) consider it plausible that DeepSeek trained on Gemini outputs, given the company's resources and the potential benefits:

If I was DeepSeek I would definitely create a ton of synthetic data from the best API model out there. Theyre short on GPUs and flush with cash. It’s literally effectively more compute for them. yes on the Gemini distill question.

— Nathan Lambert (@natolambert) June 3, 2025

AI Companies Increase Security Measures

To combat this kind of misuse, AI companies are tightening security. OpenAI now requires organizations to complete an ID verification process to access its most advanced models, and China is not among the supported countries. Google and Anthropic have both begun summarizing the raw reasoning traces their models expose, making it harder for competitors to distill them and protecting their competitive advantages.

Google has been contacted for comment.