Tsinghua University’s Open Source Project Breaks Through Large Model Computational Bottleneck: RTX 4090 Single Card Runs DeepSeek-R1 at Full Capacity
Currently, the main ways users can access DeepSeek-R1 are cloud services or local deployment. However, the official servers frequently experience downtime, and personal deployments typically run a distilled version with roughly 90% fewer parameters. As a result, it is very difficult for ordinary users to run the full version of DeepSeek-R1 on consumer hardware, and the cost of renting servers is a significant burden even for developers.
This week, the KVCache.AI team at Tsinghua University, in collaboration with Quijing Technology, announced a major update to the KTransformers (pronounced “Quick Transformers”) open-source project. The update solves a key problem in deploying models with hundreds of billions of parameters locally, a significant step toward democratizing large-model inference and moving it from “cloud monopoly” to “universal access.”
As shown in the image, the KTransformers team ran the full 671-billion-parameter DeepSeek-R1/V3 on a PC with 24 GB of VRAM and 382 GB of RAM on February 10, achieving a speedup of 3 to 28 times. Today, KTransformers announced support for longer context lengths (4K–8K on a single 24 GB card) and a further 15% speed improvement (up to 16 tokens per second).
According to the official introduction, KTransformers is a flexible, Python-centric framework designed for extensibility. With a single line of code, users can inject optimization modules and gain access to Transformers-compatible interfaces, RESTful APIs compliant with the OpenAI and Ollama standards, and even a simplified ChatGPT-style web UI. The framework now supports running the full 671-billion-parameter DeepSeek-R1/V3 on a single consumer-grade GPU with 24 GB of VRAM (such as the RTX 4090D), with preprocessing (prefill) speeds of up to 286 tokens per second and generation speeds of up to 14 tokens per second, ending the reliance on expensive cloud servers for large AI models.
DeepSeek-R1 is based on a mixture-of-experts (MoE) architecture: tasks are routed to different expert modules, and only a fraction of the parameters is activated during each inference step. The team’s key innovation was to offload the non-shared sparse expert matrices to CPU memory and combine this with high-speed operator optimizations, cutting the VRAM requirement from the roughly 320 GB of a traditional 8×A100 setup down to a single 24 GB card. Exploiting these characteristics of the MoE architecture, the KTransformers team also implemented matrix quantization via Marlin GPU operators,
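The MoE idea described above can be illustrated with a toy sketch. This is not KTransformers’ actual code: the router, expert names, and the `placement` strings are illustrative assumptions. It only shows the principle that a router selects the top-k experts per token (so only a fraction of the parameters run) and that rarely shared expert weights can live in CPU memory while hot shared weights stay on the GPU.

```python
# Toy illustration of MoE routing with CPU/GPU weight placement.
# All names and placements here are hypothetical, for explanation only.
from dataclasses import dataclass

@dataclass
class Expert:
    name: str
    placement: str  # "gpu" or "cpu": where this expert's weights are kept

def route(scores, k=2):
    """Return the indices of the top-k scoring experts for one token."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# One "shared" expert kept on the GPU; the sparse, non-shared ones offloaded to CPU RAM.
experts = [Expert(f"expert{i}", "gpu" if i == 0 else "cpu") for i in range(8)]

scores = [0.05, 0.40, 0.10, 0.02, 0.30, 0.05, 0.05, 0.03]  # router output for one token
active = route(scores, k=2)

# Only the selected experts' parameters participate in this inference step,
# which is why a 671B-parameter model needs far less than 671B of "hot" weights.
print(active)
print([experts[i].placement for i in active])
```

The point of the sketch is the asymmetry it encodes: per token, only `k` of the 8 expert weight matrices are touched, so the bulk of the parameters can sit in cheaper CPU memory and be streamed in on demand.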
Private deployments of large models such as DeepSeek are increasing rapidly. Cybersecurity firm: nearly 90% of them are “running naked”
After the domestic large model DeepSeek became the focus of the AI field, some companies and individuals began building private deployments of it. On the 14th, a Global Times reporter learned from the cybersecurity company Qi’anxin that as many as 88.9% of active servers running large models such as DeepSeek have taken no security measures, exposing them to risks including computing-power theft, data leakage, service interruption, and even deletion of the deployed model files.
Since DeepSeek became popular, the number of servers running the DeepSeek-R1 model has grown rapidly. Qi’anxin’s Asset Mapping Eagle Map platform found that of 8,971 Ollama large-model servers, 6,449 were active, and 88.9% of those were “running naked” on the Internet. “Of the 8,971 servers, 5,669 are in China, and the proportion running naked is essentially the same as worldwide,” Qi’anxin technical staff told the Global Times reporter.
In this unprotected state, anyone can call these services at will, with no authentication, and access them without authorization. That can lead to data leakage and service interruption, and attackers can even send commands to delete the deployed DeepSeek, Qwen, and other model files.
Ollama is a tool for easily obtaining and running large models. It supports a variety of advanced language models, including but not limited to Qwen, Llama, and DeepSeek-R1, and lets users run and serve these models on a server. Ollama provides no security authentication by default, so many users who deploy DeepSeek overlook the necessary restrictions and never set up access control for the service. As a result, anyone can use these services without authorization.
The technicians describe Ollama as a warehouse full of smart furniture that can quickly deliver high-end devices such as a “DeepSeek whole-house butler,” a “Qwen smart air conditioner,” or a “Llama robot vacuum.” “But this warehouse has no door locks by default. The owner enjoys the convenience of shout-and-it-arrives delivery and forgets to lock up, so passers-by sneak in: one secretly uses the ‘DeepSeek butler’ to change the room temperature, another strips the ‘Qwen air conditioner’ for parts to sell, and in the blink of an eye all the furniture is cleared out, leaving only an empty shell of a house,” the technician said.
The Eagle Map platform has also observed incidents in which exposed DeepSeek servers were scanned by automated scripts that maliciously occupied large amounts of computing resources, stole computing power, and crashed some users’ servers. Technical experts therefore recommend that all companies and individuals deploying DeepSeek services immediately modify their Ollama configuration to add identity authentication, and promptly update related security configurations such as firewalls, WAFs, and
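The exposure described above can be self-checked with a small sketch. It queries Ollama’s standard `/api/tags` endpoint (the model-listing API an open server answers); if the call succeeds with no credentials, the instance is “running naked.” The port is Ollama’s default, and the loopback address below is only an example target; point it at your own deployment, never at servers you do not control.

```python
# Minimal exposure self-check for an Ollama deployment (sketch, stdlib only).
import json
import urllib.request

def ollama_tags_url(host: str, port: int = 11434) -> str:
    """Build the unauthenticated model-list URL that an open Ollama server answers."""
    return f"http://{host}:{port}/api/tags"

def is_exposed(host: str, port: int = 11434, timeout: float = 3.0) -> bool:
    """True if the server lists its models without any authentication."""
    try:
        with urllib.request.urlopen(ollama_tags_url(host, port), timeout=timeout) as resp:
            return "models" in json.load(resp)
    except OSError:
        # Connection refused, timeout, or HTTP error: not reachable without auth.
        return False

if __name__ == "__main__":
    # Example only: check a local instance on the default port.
    print(is_exposed("127.0.0.1"))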
Hunan University opens DeepSeek for free, and students are eager to try it out
Recently, the National Supercomputing Changsha Center of Hunan University announced that it has completed a local deployment of the DeepSeek-R1 large model and begun trial operation, open to all of the university’s teachers and students free of charge. The reporter learned that, at present, they only need to connect to the campus network and log in to the DeepSeek platform with their student ID; the platform will later support convenient access without a VPN to meet the needs of off-campus use.
Hunan University graduate student Chen told reporters that although the system was slightly laggy the night before because of the large number of users, today’s experience was fairly smooth; she noticed that Line 1 had 301 simultaneous users in the morning. Chen successfully translated an English paper on the DeepSeek platform and asked the AI what it thought of the school providing this service. DeepSeek immediately gave positive feedback, praising the move as a wise decision that adapts to the times and uses advanced technology to improve the quality and efficiency of education.
Although he has not yet returned to campus, Xiang, a junior, has already shared the news with his classmates and received screenshots of their successful logins. After several sessions, the students told the reporter: “It was very smooth at the beginning, but there were some freezes in the middle. After all, it is only a preliminary deployment. Overall, I am still looking forward to it!”
Powerful features, promising future. Netizen “HuDa Baishitong” shared three DeepSeek functions he tested himself: generating a course mind map in 3 seconds, producing a PPT in one click together with Kimi, and fixing bugs in Matlab code. He exclaimed: “Although it is only a 32B distilled model, it is actually faster than me at writing papers and building tables!” He also pointed out that the platform’s response speed is still occasionally unstable and that its terminology for some engineering disciplines needs improvement, but “completely free plus campus-network speeds ten times faster, it wins!”
Continuous optimization, serving society. The National Supercomputing Changsha Center has released demonstration cases of DeepSeek in the campus-network environment, and says it will continue to explore technologies based on CPUs, domestic accelerator cards, domestic computing power, heterogeneous fusion, and the fusion of supercomputing and intelligent computing to provide stable computing-power support for large models, and will successively launch services for the public network, enterprises, and the public.
WeChat tests access to DeepSeek
DeepSeek continues to expand its ecosystem. On February 15, some WeChat users found that WeChat search had launched an “AI search” feature connected to the “deep thinking” service provided by DeepSeek-R1. On February 16, the reporter confirmed with Tencent that WeChat search, in addition to calling the Hunyuan large model to enrich AI search, has officially begun gray-release (limited rollout) testing of DeepSeek integration.
Tencent said that test users will see an “AI search” entry at the top of the WeChat dialog box; tapping it gives free access to the full-strength DeepSeek-R1 model for a more diverse search experience. If the entry is not displayed, the gray release has not yet reached that account, and users can wait for the subsequent rollout. Some users reported that when they asked WeChat AI search how to use DeepSeek’s R1 model on WeChat, the answer was that the feature is in gray-release testing and visible only to some users; WeChat must be updated to the latest version, and if an account is not yet in the test, the WeChat team is gradually expanding coverage, so users should check regularly for updates and changes to the search feature.
Users can also download the Tencent Yuanbao app to use the full-strength DeepSeek-R1 for free. On February 13, Tencent’s AI assistant Tencent Yuanbao received a major update, supporting both the Hunyuan and DeepSeek models; opening Yuanbao and entering the chat interface gives free access to full-strength DeepSeek-R1. According to reports, the DeepSeek provided in Tencent Yuanbao supports online search and integrates Tencent-ecosystem sources such as WeChat official accounts and video channels, offering users more stable, real-time, comprehensive, and accurate answers.
Recently, DeepSeek has attracted widespread attention for its inference performance, cost-effectiveness, and open-source approach, but its official entry point has also frequently returned “server busy” errors under overload. As a result, more and more services built on DeepSeek continue to emerge: many cloud providers at home and abroad have announced access and support, many GPU chip makers have announced hardware adaptation, and application developers in industries including office software, automobiles, healthcare, and financial securities have also announced DeepSeek integration.