Unveiling DeepSeek: Technology Battle or Publicity Battle?

Breaking News: Nvidia, the darling of Wall Street investors over the past year, recently plunged 16% in a single day, losing over $600 billion in market value. Behind the drop was a relatively unknown small Chinese company that had unveiled an AI model named DeepSeek, boasting advanced capabilities at a low cost.

It is well known that Nvidia is highly regarded for its high-end AI chips. With the emergence of a supposedly cheap yet advanced model like DeepSeek, investors concluded that demand for those chips might fall, and Nvidia's stock was sold off. But is that the true picture? Has DeepSeek genuinely made a groundbreaking discovery deep inside China, or is this a propaganda war orchestrated by the Chinese Communist Party?

Independent TV producer Li Jun stated on NTDTV's "Elite Forum" program that DeepSeek caused quite a stir when it was introduced around Christmas 2024. The online community buzzed with claims that China had once again leapt ahead in the AI field, surging far past the competition.

DeepSeek is a product of Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., established in July 2023 by the Chinese quant giant High-Flyer (Huanfang Quantitative). The founder and CEO is 40-year-old Liang Wenfeng.

Li Jun mentioned that the company claimed to have spent $5 million to train an AI comparable to OpenAI's ChatGPT. The day the news was released, it even made the headlines on China Central Television. That the Party media promoted it, after obtaining approval from the Propaganda Department, was telling: almost all media outlets reported that DeepSeek's model outperformed OpenAI's in multiple tests while costing only a fraction of the training expense, possibly as little as 1% of OpenAI's. This latest advance purportedly shook the seemingly invincible position of the US tech industry.

However, scholars at home and abroad soon began to question how a company with only four employees registered for social security could have achieved this. The CEO of an American AI company claimed that DeepSeek used at least 50,000 Nvidia H100 chips, which are manufactured on a 4-nanometer process. He asked where those 50,000 chips could have come from, given the comprehensive US chip sanctions against China.

This issue may be of great concern to Microsoft, a major investor in OpenAI, and to other high-tech companies as more problems emerge. The Financial Times reported that Microsoft security researchers found individuals associated with DeepSeek using the OpenAI API to exfiltrate large amounts of data.

Li Jun reported that Trump's AI chief, David Sacks, stated that there was ample evidence that DeepSeek developed its technology by distilling data from OpenAI's models. Reuters mentioned that DeepSeek's information accuracy was only 17%, far below Western AI software standards. In short, DeepSeek is neither original nor advanced; what it excels at is publicity.

Taiwan AI Labs founder Du Yijin expressed on "Elite Forum" that DeepSeek's work likely came in two stages of research papers. The first stage, DeepSeek V3, was their self-trained foundational model. This model used a mixture-of-experts architecture and a multi-head latent attention mechanism, enabling training on lower-tier GPUs and taking around 2.788 million GPU hours to train.
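The compute saving that a mixture-of-experts design buys can be seen in a toy numerical sketch (purely illustrative, not DeepSeek's actual architecture or code): a router scores all experts for each input, but only the top-k experts actually run, so most of the layer's parameters sit idle on any given token.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer: 8 tiny "expert" weight matrices, but the
# router activates only the top 2 per input, so 6 of the 8 experts cost
# nothing on this step -- the source of the training-compute savings.
n_experts, d_in, d_out, top_k = 8, 4, 4, 2
experts = [rng.normal(size=(d_in, d_out)) for _ in range(n_experts)]
router = rng.normal(size=(d_in, n_experts))

def moe_forward(x):
    scores = x @ router                    # one router logit per expert
    top = np.argsort(scores)[-top_k:]      # indices of the top-k experts
    weights = np.exp(scores[top])
    weights /= weights.sum()               # normalized gate weights
    # Only the chosen experts are evaluated; the rest are skipped entirely.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = moe_forward(rng.normal(size=d_in))
print(y.shape)  # (4,)
```

Real MoE layers (including DeepSeek's published DeepSeekMoE variant) add load-balancing losses and far more experts, but the routing principle is the same.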

If this is true, their foundational model's training was relatively cheap compared with others'. But why, then, did they need knowledge distillation to extract data from OpenAI? Data preparation is crucial when training a foundational model, yet instead of organizing the data themselves, they relied on distillation. Knowledge distillation uses a high-performing foundational model to teach a smaller model, so that the smaller model performs better when answering similar questions.
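The mechanism described above can be sketched numerically (a minimal illustration of the standard distillation objective, not any lab's actual training code): the student is trained to match the teacher's full, temperature-softened output distribution rather than just its final answer.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax: higher T yields softer distributions."""
    z = np.asarray(z, dtype=float) / T
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """KL divergence between teacher's and student's softened outputs.

    Minimizing this trains the student to mimic the teacher's whole
    probability distribution, which carries more signal than the
    teacher's single top answer.
    """
    p = softmax(teacher_logits, T)  # teacher's soft targets
    q = softmax(student_logits, T)  # student's predictions
    return float(np.sum(p * np.log(p / q)))

# A student whose logits match the teacher's incurs zero loss;
# a mismatched student incurs a positive loss.
teacher = [4.0, 1.0, 0.5]
print(distillation_loss(teacher, [4.0, 1.0, 0.5]))       # 0.0
print(distillation_loss(teacher, [0.5, 1.0, 4.0]) > 0)   # True
```

In practice the controversy is not about this loss function, which is textbook material, but about whose model supplies the teacher signal.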

The information obtained from OpenAI suggests that OpenAI had previously offered knowledge distillation as a paid service, intended to let customers distill knowledge into models of their own. DeepSeek, however, appears to have extracted this distilled content, trained it into its own model, and even left traces of it in its open-source code. That is the controversial part.

In essence, the high cost of training large models today lies mainly in the foundational model. If everything from data preparation through training is derived directly from another large model's outputs, the low cost does not reflect genuinely cheaper training; it reflects reuse of others' model results through knowledge distillation before training one's own, and that is where the savings come from.

Furthermore, DeepSeek introduced a model inference capability. Large language models of the past responded quickly but with little capacity for inference. The difference between GPT-4 and o1 is that o1 goes through an inference process before providing an answer, whereas GPT-4 answers directly. Reasoning before answering generally yields better results, though at higher cost in time and compute.

The R1 portion of DeepSeek's release, judging from both DeepSeek's announcement and OpenAI's response, learned to perform model inference the way o1 does, which enabled them to train models using relatively modest resources.

Du Yijin noted that mixture-of-experts architectures and model distillation are widespread in the AI industry. Many believe that what DeepSeek did resembles OpenAI's o1 method, which OpenAI had not previously published. Hence the debate in the market: many praise DeepSeek for nearly matching the results of OpenAI's large model, while those who actually train models may find it unimpressive, since it leans on existing results through knowledge distillation.

Regarding the purported investment of only $5 million, there are several crucial cost components to consider. The first is computing power, followed by data collection and organization. From these perspectives, DeepSeek's $5 million figure likely does not account for the computing resources involved; and since they extracted content from OpenAI for knowledge distillation, their investment in data organization was likely minimal.

DeepSeek has also not disclosed the original sources or organization of the data behind DeepSeek V3. It therefore remains unclear whether they genuinely trained their foundational model on vast amounts of data at a cost of several million dollars, or merely used knowledge extracted from others to hold down training costs.

As a result, those genuinely engaged in foundational model training may find the $5 million claim unimpressive once the true costs are considered. The market already has numerous models doing knowledge distillation; it requires no genuine innovation.

In conclusion, DeepSeek's recent release departs from the pattern of traditional technological advancement in the industry. AI startups, whether OpenAI or others, keep introducing new models or evolving existing ones; advances in AI models are built on continual accumulation and progress over time.

DeepSeek's release, by contrast, was accompanied by extensive promotion from Chinese state media, visible in the mobilization of coordinated groups on online platforms. AI-synthesized content can also be identified on platforms like Facebook and YouTube, where numerous long-dormant accounts suddenly flooded with AI-generated posts about DeepSeek's rise and Nvidia's dim prospects.

This entire promotional campaign, led by official media and spreading across online platforms, came at an interesting time: the eve of Chinese New Year, during pre-market trading before the US market opened. The narrative of Nvidia's stock falling more than 10% and dragging the market down, amplified by the actual pre-market decline in Nvidia's share price, produced widespread market repercussions as the news was heavily promoted.

Therefore, DeepSeek's rollout deviates from the traditional approach of tech companies, which typically introduce new technology for trial and adoption. It is clear that the dissemination of information about DeepSeek was heavily shaped by national-level forces.

Veteran media figure Guo Jun shared on "Elite Forum" that over the past 20 years, China's rapid technological advances have created an unprecedented sense of crisis in the US, and the discussions among American and Chinese elites reflect intense technological competition. In Guo Jun's view, however, this competition is not simply about one or two technologies but about an entire system, a comprehensive ecological environment. He believes that China's prospects in such a competition depend on more than breakthrough technologies.

Guo Jun highlighted the importance of an ecosystem for fostering new technology, emphasizing that without one, even revolutionary technologies and emerging companies may not sustain China's growth. The environment that nurtured tech giants like Alibaba, Tencent, and Huawei has now fundamentally changed.

He pointed out the shift in career aspirations among American and Chinese university graduates, where the former gravitate towards high-tech and startup companies, while the latter increasingly seek positions in the government sector. The differing systems in place hint at disparate outcomes, as evidenced by past competitions between the US and the Soviet Union.

In the ongoing disengagement between the US and China, the financial and technological rifts are only one facet; the detachment from the ecosystem that cultivates new technology may be the most crucial aspect. Without it, China may still produce groundbreaking new technologies and companies, but will struggle to sustain their growth.

Guo Jun explained that the unique combination of Silicon Valley's technological innovation and Wall Street's financial support has been crucial. Wall Street not only provides funds but also offers marketing services, engineering and financial management, and strategies to attract more investors, establishing a mechanism that is anything but hands-off.

In Guo Jun's view, an innovative company operates at its best when supported by a team of experts performing Wall Street's functions. Such professionals, detached from the system yet capable of independent action, are vital components of Wall Street firms. China currently lacks a deep pool of such professionals, and many of those it has are leaving the country.

The program "Elite Forum," produced by NTDTV and The Epoch Times, is a high-end television forum rooted in the Chinese-speaking world, bringing together global elites to focus on hot topics, analyze global trends, and offer insights into current affairs and historical truths.

For the complete content of “Elite Forum,” kindly watch online.

Producer group of “Elite Forum”

For reproduction and citation of articles from “Elite Forum,” please maintain original content and provide proper attribution.