Кластер #4176 - News Clusters

Gemini Deep Think: Redefining the Future of Scientific Research — Google DeepMind

closed

Тип события	product_launch
Тема	large language models
Организация	Google
Страна	China

Статей	4
Уник. источников	3
Важность / Момент	1.52 / 0
Период	11.02.2026 15:00 — 19.02.2026 16:00
Создан	06.04.2026 06:19:52

Статьи в кластере 4

Заголовок

Источник

Дата публикации

Score

Gemini Deep Think: Redefining the Future of Scientific Research — Google DeepMind

deepmind

11.02.2026 15:00

Embedding sim.	1
Entity overlap	1
Title sim.	1
Time proximity	1

NLP тип	experiment
NLP организация	Google
NLP тема	large language models
NLP страна

Открыть оригинал

Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad
 July 2025 Research
 Learn more

Gemini 3 Deep Think: Advancing science, research and engineering

deepmind

12.02.2026 16:13

0.771

Embedding sim.	0.8672
Entity overlap	0.1429
Title sim.	0.3763
Time proximity	0.8499

NLP тип	product_launch
NLP организация	Google
NLP тема	large language models
NLP страна	United States

Открыть оригинал

Breadcrumb

 Innovation & AI

 Models & research

 Gemini Models

 Gemini 3 Deep Think: Advancing science, research and engineering

 Feb 12, 2026

 ·

 Share

 x.com

 Facebook

 LinkedIn

 Mail

 Copy link

 Our most specialized reasoning mode is now updated to solve modern science, research and engineering challenges.

 The Deep Think team

 Read AI-generated summary

 General summary

 Gemini 3 Deep Think has a major upgrade to help solve science, research and engineering challenges. Google AI Ultra subscribers can now access the updated Deep Think in the Gemini app. Researchers, engineers and enterprises can express interest in early access to test Deep Think via the Gemini API.

 Summaries were generated by Google AI. Generative AI is experimental.

 Share

 x.com

 Facebook

 LinkedIn

 Mail

 Copy link

 Your browser does not support the audio element.

 Listen to article

 This content is generated by Google AI. Generative AI is experimental

 [[duration]] minutes

 Voice

 Speed

 Voice

 Speed
 0.75X
 1X
 1.5X
 2X

 Today, we’re releasing a major upgrade to Gemini 3 Deep Think , our specialized reasoning mode, built to push the frontier of intelligence and solve modern challenges across science, research, and engineering.
 We updated Gemini 3 Deep Think in close partnership with scientists and researchers to tackle tough research challenges — where problems often lack clear guardrails or a single correct solution and data is often messy or incomplete. By blending deep scientific knowledge with everyday engineering utility, Deep Think moves beyond abstract theory to drive practical applications.
 The new Deep Think is now available in the Gemini app for Google AI Ultra subscribers and, for the first time, we’re also making Deep Think available via the Gemini API to select researchers, engineers and enterprises. Express interest in early access here .
 Here is how our early testers are already using the latest Deep Think:

 Lisa Carbone, a mathematician at Rutgers University, works on the mathematical structures required by the high-energy physics community to bridge the gap between Einstein’s theory of gravity and quantum mechanics. In a field with very little existing training data, she used Deep Think to review a highly technical mathematics paper. Deep Think successfully identified a subtle logical flaw that had previously passed through human peer review unnoticed.

 At Duke University, the Wang Lab utilized Deep Think to optimize fabrication methods for complex crystal growth for the potential discovery of semiconductor materials. Deep Think successfully designed a recipe for growing thin films larger than 100 μm, meeting a precise target that previous methods had challenges to hit.

 Anupam Pathak, an R&D lead in Google’s Platforms and Devices division and former CEO of Liftware, tested the new Deep Think to accelerate the design of physical components.

 Elevating reasoning with mathematical and algorithmic rigor
 Last year, we showed that specialized versions of Deep Think could successfully navigate some of the toughest challenges in reasoning, achieving gold-medal standards at math and programming world championships. More recently, Deep Think has enabled specialized agents to conduct research-level mathematics exploration.
 The updated Deep Think mode continues to push the frontiers of intelligence, reaching new heights across the most rigorous academic benchmarks, including:
 Setting a new standard (48.4%, without tools) on Humanity’s Last Exam, a benchmark designed to test the limits of modern frontier models
 Achieving an unprecedented 84.6% on ARC-AGI-2, verified by the ARC Prize Foundation
 Attaining a staggering Elo of 3455 on Codeforces, a benchmark consisting of competitive programming challenges
 Reaching gold-medal level performance on the International Math Olympiad 2025

 Navigating complex scientific domains
 Beyond mathematics and competitive coding, Gemini 3 Deep Think now also excels across broad scientific domains such as chemistry and physics. Our updated Deep Think mode demonstrates gold medal-level results on the written sections of the 2025 International Physics Olympiad and Chemistry Olympiad. It also demonstrates proficiency in advanced theoretical physics, achieving a score of 50.5% on CMT-Benchmark.

 Accelerating real-world engineering
 In addition to its state-of-the-art performance, Deep Think is built to drive practical applications, enabling researchers to interpret complex data, and engineers to model physical systems through code. Most importantly, we are working to bring Deep Think to researchers and practitioners where they need it most — beginning with surfaces such as the Gemini API.

 With the updated Deep Think, you can turn a sketch into a 3D-printable reality. Deep Think analyzes the drawing, models the complex shape and generates a file to create the physical object with 3D printing.

 Available to Google AI Ultra Subscribers and the Gemini API via our Early Access Program
 Google AI Ultra subscribers will be able to access the updated Deep Think mode starting today in the Gemini app. Scientists, engineers and enterprises can also now express interest in our early access program to test Deep Think via the Gemini API.
 We can’t wait to see what you discover.

 POSTED IN:

Gemini models

AI

Google DeepMind

Google One

Gemini 3.1 Pro: A smarter model for your most complex tasks

deepmind

19.02.2026 16:00

0.677

Embedding sim.	0.8748
Entity overlap	0.1333
Title sim.	0.1143
Time proximity	0.0013

NLP тип	product_launch
NLP организация	Google
NLP тема	large language models
NLP страна

Открыть оригинал

Breadcrumb

 Innovation & AI

 Models & research

 Gemini Models

 Gemini 3.1 Pro: A smarter model for your most complex tasks

 Feb 19, 2026

 ·

 Share

 x.com

 Facebook

 LinkedIn

 Mail

 Copy link

 3.1 Pro is designed for tasks where a simple answer isn’t enough.

 The Gemini Team

 Read AI-generated summary

 General summary

 Gemini 3.1 Pro is here to help you tackle complex tasks. The upgraded core intelligence is rolling out across consumer and developer products. You can access 3.1 Pro through the Gemini API, Vertex AI, the Gemini app, and NotebookLM.

 Summaries were generated by Google AI. Generative AI is experimental.

 Bullet points

 "Gemini 3.1 Pro: A smarter model for your most complex tasks" introduces Google's upgraded AI model.

 Gemini 3.1 Pro is rolling out to developers, enterprises, and consumers via various platforms.

 This new model shows improved reasoning, scoring significantly higher on complex problem-solving benchmarks.

 It's designed for tasks needing advanced reasoning, like synthesizing data or explaining complex topics.

 Google is releasing 3.1 Pro in preview to validate updates and advance agentic workflows further.

 Summaries were generated by Google AI. Generative AI is experimental.

 Explore other styles:

 General summary

 Bullet points

 Share

 x.com

 Facebook

 LinkedIn

 Mail

 Copy link

 Your browser does not support the audio element.

 Listen to article

 This content is generated by Google AI. Generative AI is experimental

 [[duration]] minutes

 Voice

 Speed

 Voice

 Speed
 0.75X
 1X
 1.5X
 2X

 Last week, we released a major update to Gemini 3 Deep Think to solve modern challenges across science, research and engineering. Today, we’re releasing the upgraded core intelligence that makes those breakthroughs possible: Gemini 3.1 Pro. We are shipping 3.1 Pro across our consumer and developer products to bring this progress in intelligence to your everyday applications.
 Starting today, 3.1 Pro is rolling out:
 For developers in preview via the Gemini API in Google AI Studio , Gemini CLI , our agentic development platform Google Antigravity and Android Studio
 For enterprises in Vertex AI and Gemini Enterprise
 For consumers via the Gemini app and NotebookLM
 Building on the Gemini 3 series, 3.1 Pro represents a step forward in core reasoning. 3.1 Pro is a smarter, more capable baseline for complex problem-solving. This is reflected in our progress on rigorous benchmarks. On ARC-AGI-2, a benchmark that evaluates a model’s ability to solve entirely new logic patterns, 3.1 Pro achieved a verified score of 77.1%. This is more than double the reasoning performance of 3 Pro.

 Intelligence applied
 3.1 Pro is designed for tasks where a simple answer isn’t enough, taking advanced reasoning and making it useful for your hardest challenges. This improved intelligence can help in practical applications — whether you’re looking for a clear, visual explanation of a complex topic, a way to synthesize data into a single view, or bringing a creative project to life.

 Code-based animation: 3.1 Pro can generate website-ready, animated SVGs directly from a text prompt. Because these are built in pure code rather than pixels, they remain crisp at any scale and maintain incredibly small file sizes compared to traditional video.

 Complex system synthesis: 3.1 Pro utilizes advanced reasoning to bridge the gap between complex APIs and user-friendly design. In this example, the model built a live aerospace dashboard, successfully configuring a public telemetry stream to visualize the International Space Station’s orbit.

 Interactive design: 3.1 Pro codes a complex 3D starling murmuration. It doesn't just generate the visual code; it builds an immersive experience where users can manipulate the flock with hand-tracking and listen to a generative score that shifts based on the birds’ movement. For researchers and designers, this provides a powerful way to prototype sensory-rich interfaces.

 Creative coding: 3.1 Pro can translate literary themes into functional code. When prompted to build a modern personal portfolio for Emily Brontë’s "Wuthering Heights," the model didn’t just summarize the text. It reasoned through the novel’s atmospheric tone to design a sleek, contemporary interface, creating a website that captures the essence of the protagonist.

 What’s next
 Since releasing Gemini 3 Pro in November, your feedback and the pace of progress have driven these rapid improvements. We are releasing 3.1 Pro in preview today to validate these updates and continue to make further advancements in areas such as ambitious agentic workflows before we make it generally available soon.
 Starting today, Gemini 3.1 Pro in the Gemini app is rolling out with higher limits for users with the Google AI Pro and Ultra plans. 3.1 Pro is also now available on NotebookLM exclusively for Pro and Ultra users. And developers and enterprises can access 3.1 Pro now in preview in the Gemini API via AI Studio, Antigravity, Vertex AI, Gemini Enterprise, Gemini CLI and Android Studio.
 We can’t wait to see what you build and discover with it.

 POSTED IN:

Gemini models

DeepSeek’s Next Move: What V4 Will Look Like 👀

ai_supremacy

13.02.2026 11:15

0.622

Embedding sim.	0.7235
Entity overlap	0.037
Title sim.	0.0515
Time proximity	0.8867

NLP тип	product_launch
NLP организация	DeepSeek
NLP тема	large language models
NLP страна	China

Открыть оригинал

DeepSeek
 DeepSeek’s Next Move: What V4 Will Look Like 👀
 China's open-source AI momentum will push the frontier in 2026 and 2027.

 Michael Spencer and Tony Peng
 Feb 13, 2026

 61

 4
 11

 Share

 Good Morning,
 We are on the cusp of Chinese New Year and something is happening that’s a bit dark on the U.S. frontier.
 Silicon Valley has adopted a 996 lifestyle for AI research. As models get more fine tuned to agentic capabilities, something remarkable is happening. A few Chinese AI startups are now showing signs of being more innovative at the frontier of open-weight models (than Silicon Valley and the rest of the world), this - with a fraction of the capital and with significantly less access to the best AI chips. How could this be?
 Share
 After the DeepSeek moment of last year (January, 2025), China has surged ahead in Open-source AI models. A flurry of Chinese IPOs means the following AI startups and open-source incumbents are going to be a big deal:
 Qwen (Alibaba Cloud)

 Zhipu AI (Z.AI)

 Moonshot AI

 DeepSeek

 Minimax

 It’s not just the quality of their models, it’s the pace of iteration. 🚀 The DeepSeek moment of last year snowballed into a very different open-source AI reality.
 One year after Chinese startup DeepSeek rattled the global tech industry with the release of a low-cost artificial intelligence model, its domestic rivals are better prepared, vying with it to launch new models, some designed with more consumer appeal. - Reuters

 Read more DeepSeek articles
 There’s a buzz for what DeepSeek is ready to announce, and high expectations. There’s software, reinforcement learning, LLM training and hardware innovations that DeepSeek is making that has rubbed off on China’s vibrant open-weight LLM ecosystem. (The U.S. has no equivalent).
 The bifurcation of closed models and open-models has never been this intense. While American companies get the hype, Chinese models are helping developers build real products.

 DeepSeek have been working more behind the scenes in recent months.
 DeepSeek as a Research Lab is taking a different approach

 There is a sense of anticipation because DeepSeek is currently on the verge of its next major release cycle. As of February 2026, the primary focus is on DeepSeek V4 , which is expected to launch around the Lunar New Year (Tuesday, February 17th begins) .
 Rumored to be codenamed "MODEL1," DeepSeek V4 is expected to be a total architectural overhaul rather than just a minor update. Meanwhile its next flagship model DeepSeek-R2 has been significantly delayed and is still likely perhaps weeks away.
 What will be DeepSeek’s next play?

 Engram : A breakthrough “conditional memory” system that separates factual recall from reasoning. This allows the model to access vast amounts of information (over 1 million tokens) without the usual performance “amnesia.”

 mHC (Manifold-Constrained Hyper-Connections): A technique designed to stabilize training and improve scalability while reducing energy consumption.

 DeepSeek Sparse Attention (DSA) : This enables an massive 1 million+ token context window . It allows the model to “read” an entire codebase—thousands of files—in one go, enabling true multi-file debugging and refactoring.

 MODEL1 Architecture : A tiered storage system for the KV cache that reduces GPU memory consumption by 40% , making it much cheaper to run than previous frontier models.

 I asked China AI blogger Tony Peng to take a deeper look into DeepSeek’s expected moves in the days and weeks ahead. DeepSeek has been busy behind the scenes the past year and we don’t know its full capabilities as it comes out of stealth so to speak with major releases.
 Join Recode China AI for China AI spotlights and deeper resources, trends and stories around the ecosystem.

 Recode China AI China AI Spotlight: Your weekly guide to China's AI breakthroughs, trends, and stories.
 By Tony Peng

 Zhipu AI and MiniMax Just Went Public, But They’re Not China’s OpenAI

 DeepSeek-R1 and Kimi k1.5: How Chinese AI Labs Are Closing the Gap with OpenAI’s o1

 The Most Important AI Panel of 2026: Can China Lead the Next Paradigm?

 While Western closed-source models double down on AI coding capabilities, Chinese open-source models are showing more agentic, browser and fundamental capabilities. Major IPOs and consolidation is occurring and China’s AI engineering prowess is starting to show ultra competitive results. The convergence of China’s AI ecosystem and AI chip makers hurrying to catch up with the rest of the world in semiconductors and HBM, China’s place in the world of AI is likely on the ascent.
 I try my best to give global coverage around AI and also to call upon people with “ boosts on the ground ” in East Asia. That’s why I respect people like Tony Peng , Grace Shao and Poe Zhao so much, they are the ones immersed and connected in the latest AI news in China. You can also get full access to this post soon on Tony’s blog .
 But how good will DeepSeek’s latest models be? 🤔
 “I believe V4 and R2 will remain among the best open-source LLMs available, potentially even narrowing the gap with leading proprietary models. “ - Tony Peng (former AI reporter for Synced)

 That being said let’s try to dig into that anticipation and those key details of what we know before the unveiling event:
 👀DeepSeek’s Next Move: What V4 Will Look Like

 By Tony Peng
 Sparsity is DeepSeek’s sauce to scale intelligence under hard constraints

 The widely anticipated DeepSeek V4 large language model (LLM) is expected to meet the public in mid-February, right before the Chinese New Year. V4, DeepSeek’s upcoming generation base model, is supposed to be the largest challenge of open-source models to the proprietary models, and the most anticipated releases in the AI space for early 2026.
 According to Reuters and The Information, DeepSeek V4 is optimized primarily for coding and long-context software engineering tasks. Internal tests (per reporting) suggest V4 could outperform Claude and ChatGPT on long-context coding tasks.

 2025 was a watershed moment for DeepSeek and the following open-source movement fueled by Chinese AI labs. The release of V3 and R1 upended multiple AI narratives: that only spending hundreds of millions could produce a frontier LLM, that only Silicon Valley companies had talents to train competitive models, that the U.S.-China gap in AI was widening due to chip shortages.
 Throughout the rest of 2025, DeepSeek continued churning out notable models, including DeepSeek-V3.2-Thinking and DeepSeek-Math-V2, which won the International Olympiad in Informatics. Yet the widely anticipated DeepSeek-V4 and DeepSeek-R2, which were reportedly slated for release in the first half of 2025, were delayed. According to reports, DeepSeek CEO Liang Wenfeng was dissatisfied with the results and chose to delay the launch.
 The Financial Times offered an alternative explanation: DeepSeek initially attempted to train R2 using Huawei’s Ascend AI chips rather than Western silicon like Nvidia’s GPUs, partly due to pressure from the Chinese government to reduce reliance on U.S.-made hardware. The training runs encountered repeated failures and performance issues stemming from stability problems, slow chip-to-chip interconnect speeds, and immature software tooling for Huawei’s chips. Ultimately, DeepSeek had to revert to Nvidia hardware for training while relegating Huawei chips to inference tasks only. This back-and-forth process and subsequent re-engineering significantly delayed the timeline.
 Sparsity Through Iteration

 DeepSeek’s architectural evolution has been driven by a consistent principle: Sparsity.
 While computation can be scaled relatively easily by adding more data, more parameters of the model, and more chips, sparsity is the most straightforward way DeepSeek can scale intelligence under constraints, including compute, memory bandwidth, and chips. These constraints shaped every iteration of their model architecture, from DeepSeekMoE’s initial Mixture of Experts (MoE) through the attention optimizations in V2, V3, and V3.2.
 In a Transformer, you have two main computational blocks: Attention and Network (specifically Feed-Forward Network). DeepSeek started by making the network layers sparse using MoE. Instead of every token going through the entire network, they route each token to only a small, relevant subset of parameters as experts.
 DeepSeekMoE laid the groundwork with a MoE architecture containing 64 specialized experts and 2 shared experts. For each input, the model would route it to the 6 most relevant experts (topk=6).

 DeepSeek-V2 expanded the expert pool to 160 specialized experts while keeping 2 shared experts and the same topk=6 routing, with improvements focused on distribution and efficiency.

 DeepSeek-V3 scaled further to 256 experts with 1 shared expert, and increased routing to topk=8. The architecture became more sophisticated in how it managed experts, and introduced a new communication system called DeepEP that makes expert coordination more efficient.

 Once the network sparsity was well-optimized, attention became the next target for improvement. As the core module of the Transformer architecture, attention is where each token in a sequence analyzes and weights the importance of every other token to understand context and relationships. Attention presents a different challenge as computational complexity exponentially grows along with the increasing context window.
 Multi-head Latent Attention (MLA) was DeepSeek’s first major attention innovation introduced in V2. Instead of storing complete key-value information for every token (as standard Multi-Head Attention does), MLA compresses this information into a smaller representation. This compression reduces how much data needs to be moved in and out of memory.

 Then in a February 2025 paper, DeepSeek introduced Native Sparse Attention (NSA) with optimized design for modern hardware. Tokens are processed through three attention paths: compressed coarse-grained tokens, selectively retained fine-grained tokens, and sliding windows for local contextual information.

 DeepSeek Sparse Attention (DSA) , introduced in V3.1 and V3.2, streamlined the NSA design. Instead of selecting blocks of tokens, DSA selects individual tokens. It uses a lightweight indexer model to identify the 2,048 most relevant tokens from the full context. This indexer is trained through a process where it learns to mimic the full attention pattern—the model first trains with full attention, then the indexer learns to predict which tokens the full attention would focus on. DSA can work with MLA.

 DSA+mHC+Engram

 With the foundation of DSA established in V3.2, DeepSeek appears ready to push optimization further in V4. The evidence points to two other papers that DeepSeek released over the past few months.
 Manifold-Constrained Hyper-Connections (mHC)

 mHC represents a rethinking of how information flows through deep neural networks. While traditional networks pass information sequentially from one layer to the next, mHC introduces richer connectivity patterns between layers—essentially creating multiple pathways for information to flow across the model’s depth.
 To understand mHC’s, we must first revisit residual connections, the backbone of modern deep neural networks regardless of architecture (CNN or Transformer). Proposed in the landmark 2015 paper Deep Residual Learning for Image Recognition , residual connections add the input (x) of a block directly to its output F(x), typically formatted as y = F(x) + x.
 This simple shortcut proved revolutionary. By allowing gradients to bypass layers, residual connections solved the vanishing gradient problem that had plagued deep networks, enabling effective training of architectures with hundreds or even thousands of layers.
 Then in 2024, ByteDance researchers proposed hyper-connections (HC) as an alternative approach. Rather than simple additive shortcuts, HC creates richer connectivity patterns that allow information to flow more freely between non-adjacent layers. This architectural flexibility offers advantages over standard residual connections, particularly in avoiding the gradient-representation tradeoffs.
 But HC introduced its own challenge. As training scales increase and connectivity grows richer, the risk of training instability rises. More pathways for information flow means more opportunities for gradients to either explode (grow uncontrollably large) or vanish (shrink to near-zero) during training.
 This is where DeepSeek mHC’s innovation emerges. The approach treats the model’s parameter space as existing on a high-dimensional geometric structure, a manifold. By imposing mathematical constraints based on this manifold geometry, mHC creates a guardrail system that maintains stable information flow even as connectivity increases.
 Think of it as a multi-lane highway with intersections. Vehicles can change lanes, merge, or split—increasing routing flexibility. But the total traffic flow must remain constant and balanced. Information can take different paths through the network, but the overall flow is constrained to prevent either congestion (gradient explosion) or emptiness (gradient vanishing).
 The mathematical formulation ensures that information transformations remain well-behaved across the manifold. This prevents the instabilities that would otherwise emerge from unconstrained hyper-connections, enabling deeper and more flexible architectures while maintaining training stability.
 For V4, mHC could appear strategic. As DeepSeek pushes toward increasingly sparse and selective processing through DSA, Engram, and other mechanisms, having stable hyper-connections allows these components to interact more effectively across layers.
 Engram

 Engram , detailed in DeepSeek’s January 2026 paper Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models , introduces a new dimension to model architecture by adding conditional memory to the Transformer through efficient lookup mechanisms.
 Traditional Transformer-based LLMs compress all learned knowledge into neural network weights. Whether answering a simple factual query like “Barack Obama was a U.S. president” or solving a complex mathematical proof, the model must route every computation through the same expensive neural processing.
 Consider the phrase “New York City.” A standard Transformer must learn that “New,” “York,” and “City” together form a specific entity, then rebuild that relationship through attention computation every single time. This is static knowledge that never changes, yet the model treats it like novel information requiring full neural processing each time.
 Engram’s premise is elegantly simple: not all knowledge requires neural computation. Static facts and established patterns can be stored in a complementary memory system and retrieved efficiently when needed. This mirrors biological memory: You don’t re-calculate that 2+2=4 each time; you simply recall it.
 Engram implements this through a modernized N-gram lookup module operating in constant time (O(1))—retrieval speed remains constant regardless of how much information is stored. This creates a fundamental architectural separation:
 Neural computation (attention and MoE): Complex reasoning, novel synthesis, context-dependent processing

 Memory lookup (Engram): Static knowledge, established patterns, factual recall

 The architecture optimizes the balance using a U-shaped scaling law—a mathematical framework determining ideal parameter allocation between neural computation and memory lookup at different model scales. This is crucial because the tradeoff isn’t straightforward: too much reliance on lookup tables risks brittleness; too much neural computation wastes resources on static knowledge. The U-shaped law identifies the sweet spot where both systems work synergistically.

 Memory efficiency improves dramatically because Engram offloads static knowledge from expensive GPU memory to host CPU memory. Empirical results demonstrate that offloading a 100B-parameter lookup table to host memory incurs negligible overhead (less than 3%). This enables massive knowledge bases without proportional GPU memory costs, while supporting much longer effective context windows.
 For reasoning and knowledge-intensive tasks, Engram delivers measurable improvements over comparable MoE-only architectures. Benchmarks show particular gains in multi-hop reasoning and long-context understanding—tasks that benefit from quick access to established knowledge while applying neural computation to novel reasoning steps.
 Model1: New Clues to DeepSeek V4?

 Just over a year after DeepSeek-R1 became the most-liked model on Hugging Face, sharp-eyed developers have spotted a mysterious “Model1” in recent code updates to the FlashMLA library. The timing is suggestive—could Model1 be the codename for DeepSeek V4?
 Analysis of recent commits reveals several architectural signatures suggesting Model1 is an entirely new flagship model:
 The 512-Dimensional Shift : Model1 switches from V3.2’s 576-dimensional configuration to 512 dimensions, likely optimizing for NVIDIA’s Blackwell (SM100) architecture where power-of-2 dimensions align better with hardware.

 Blackwell GPU Optimization : New SM100-specific interfaces, CUDA 12.9 requirement, and performance benchmarks showing 350 TFlops on B200 for sparse MLA operations represent deep integration with next-generation hardware.

 Token-Level Sparse MLA : Separate test scripts for sparse and dense decoding indicate parallel processing pathways. The implementation uses FP8 for storing KV cache and bfloat16 for matrix multiplication, suggesting design for extreme long-context scenarios.

 Value Vector Position Awareness (VVPA) : This new VVPA mechanism likely addresses a known weakness in traditional MLA—positional information decay over long contexts. As sequences extend into hundreds of thousands of tokens, compressed representations can lose fine-grained positional details. VVPA appears designed to preserve this spatial information even under aggressive compression.

 Engram Integration : References to Engram throughout the codebase suggest deep integration into Model1’s architecture.

 Concluding Remarks

 I believe V4 and R2 will remain among the best open-source LLMs available, potentially even narrowing the gap with leading proprietary models. However, since DeepSeek R1’s release over a year ago, the competitive race to push LLM capabilities forward has only intensified. The “DeepSeek effect” has motivated several other Chinese AI labs—including Moonshot AI, MiniMax, and Zhipu—to redouble their efforts in releasing top-tier LLMs. This doesn’t even account for tech giants Alibaba and ByteDance, both of which are producing frontier models while simultaneously expanding into chatbots, AI cloud services, chips, and hardware.
 Given this rapidly evolving landscape, expecting another watershed “DeepSeek moment” like the one in early 2025 seems nearly impossible.

 61

 4
 11

 Share

 Previous

 A guest post by
 Tony Peng
 I’m Tony Peng, ex-Baidu Global Head of Comms and a former AI reporter; a longtime AI observer with a keen focus on China’s AI development.

 Subscribe to Tony