← All clusters
Keeping your data safe when an AI agent clicks a link
Status: closed
Event type: other
Topic: ai security
Organization: OpenAI
Country: United States
Articles: 14
Unique sources: 7
Importance / Momentum: 2.46 / 0
Period: 28.01.2026 00:00 — 13.02.2026 10:00
Created: 06.04.2026 06:19:40
Articles in cluster: 14
Title Source Publication date Score
S Keeping your data safe when an AI agent clicks a link openai 28.01.2026 00:00 1
Embedding sim. 1
Entity overlap 1
Title sim. 1
Time proximity 1
NLP type: other
NLP organization: OpenAI
NLP topic: ai security
NLP country:

Learn how OpenAI protects user data when AI agents open links, preventing URL-based data exfiltration and prompt injection with built-in safeguards.
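Each article row pairs a final Score with four component signals (embedding similarity, entity overlap, title similarity, time proximity). A hypothetical weighted combination, purely to illustrate how such signals are typically fused — the weights and formula below are assumptions, not the system's documented behavior:

```python
# Hypothetical sketch of how the per-article Score might combine the four
# similarity signals shown for each article. The weights are illustrative
# assumptions; the dashboard's actual formula is not documented here.

def cluster_score(embedding_sim, entity_overlap, title_sim, time_proximity,
                  weights=(0.6, 0.1, 0.1, 0.2)):
    """Convex combination of similarity signals, each in [0, 1]."""
    signals = (embedding_sim, entity_overlap, title_sim, time_proximity)
    return sum(w * s for w, s in zip(weights, signals))

# The seed article scores 1 on every signal, so any convex combination is 1.
print(round(cluster_score(1, 1, 1, 1), 3))
# The second article's signals give roughly 0.758 under these weights,
# close to, but not exactly, the listed 0.756, so the real weighting differs.
print(round(cluster_score(0.8403, 0.2727, 0.2683, 1.0), 3))
```

Because every signal lies in [0, 1] and the weights sum to 1, any score produced this way is also bounded in [0, 1], matching the listed values.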
Bringing ChatGPT to GenAI.mil openai 09.02.2026 11:00 0.756
Embedding sim. 0.8403
Entity overlap 0.2727
Title sim. 0.2683
Time proximity 1
NLP type: product_launch
NLP organization: OpenAI
NLP topic: generative ai
NLP country: United States

OpenAI for Government announces the deployment of a custom ChatGPT on GenAI.mil, bringing secure, safety-forward AI to U.S. defense teams.
Introducing Trusted Access for Cyber openai 05.02.2026 10:00 0.734
Embedding sim. 0.8171
Entity overlap 0.2
Title sim. 0.2745
Time proximity 0.9762
NLP type: product_launch
NLP organization: OpenAI
NLP topic: cybersecurity
NLP country:

OpenAI introduces Trusted Access for Cyber, a trust-based framework that expands access to frontier cyber capabilities while strengthening safeguards against misuse.
Introducing OpenAI Frontier openai 05.02.2026 06:00 0.696
Embedding sim. 0.8027
Entity overlap 0.3333
Title sim. 0.2593
Time proximity 0.5714
NLP type: product_launch
NLP organization: OpenAI
NLP topic: enterprise ai
NLP country:

OpenAI Frontier is an enterprise platform for building, deploying, and managing AI agents with shared context, onboarding, permissions, and governance.
Inside OpenAI’s in-house data agent openai 29.01.2026 10:00 0.693
Embedding sim. 0.7984
Entity overlap 0.1667
Title sim. 0.1806
Time proximity 0.7976
NLP type: other
NLP organization: OpenAI
NLP topic: large language models
NLP country:

How OpenAI built an in-house AI data agent that uses GPT-5, Codex, and memory to reason over massive datasets and deliver reliable insights in minutes.
Snowflake and OpenAI partner to bring frontier intelligence to enterprise data openai 02.02.2026 06:00 0.691
Embedding sim. 0.8217
Entity overlap 0.25
Title sim. 0.2135
Time proximity 0.4524
NLP type: partnership
NLP organization: OpenAI
NLP topic: enterprise ai
NLP country:

OpenAI and Snowflake partner in a $200M agreement to bring frontier intelligence into enterprise data, enabling AI agents and insights directly in Snowflake.
Practical Security Guidance for Sandboxing Agentic Workflows and Managing Execution Risk | NVIDIA Technical Blog nvidia_dev_blog 30.01.2026 16:13 0.663
Embedding sim. 0.7892
Entity overlap 0.25
Title sim. 0.078
Time proximity 0.6178
NLP type: other
NLP organization: nvidia
NLP topic: ai security
NLP country:

AI coding agents enable developers to work faster by streamlining tasks and driving automated, test-driven development. However, they also introduce a significant, often overlooked, attack surface: they run tools from the command line with the same permissions and entitlements as the user, making them computer-use agents, with all the risks those entail.

The primary threat to these tools is indirect prompt injection, where a portion of the content ingested by the LLM driving the agent is supplied by an adversary through vectors such as malicious repositories or pull requests, git histories containing prompt injections, .cursorrules or CLAUDE/AGENT.md files that contain prompt injections, or malicious MCP responses. Such malicious instructions can cause the LLM to take attacker-influenced actions with adverse consequences.

Manual approval of agent actions is the most common way to manage this risk, but it introduces ongoing developer friction, requiring developers to repeatedly return to the application to review and approve actions. This creates a risk of habituation, where developers simply approve potentially risky actions without reviewing them. A key requirement for agentic system security is finding the balance between hands-on user input and automation.

The following controls are what the NVIDIA AI Red Team considers either required or highly recommended, but they should be implemented to reflect your specific use case and your organization's risk tolerance. Based on the NVIDIA AI Red Team's experience, the following mandatory controls mitigate the most serious attacks achievable with indirect prompt injection:

- Network egress controls: Blocking network access to arbitrary sites prevents exfiltration of data or establishment of a remote shell without additional exploits.
- Block file writes outside of the workspace: Blocking write operations to files outside of the workspace prevents a number of persistence mechanisms, sandbox escapes, and remote code execution (RCE) techniques.
- Block writes to configuration files, no matter where they are located: Blocking writes to config files prevents exploitation of hooks, skills, and local Model Context Protocol (MCP) configurations, which often run outside of a sandbox context.

These recommended controls further reduce the attack surface, making host enumeration and exploration more difficult, limiting risks posed by hooks, local MCP configurations, and kernel exploits, and closing other exploitation and disclosure risks:

- Prevent reads from files outside of the workspace.
- Sandbox the entire integrated development environment (IDE) and all spawned functions (e.g., hooks, MCP startup scripts, skills, and tool calls), and, where possible, run them as their own user.
- Use virtualization to isolate the sandbox kernel from the host kernel (e.g., microVM, Kata container, full VM).
- Require user approval for every instance of specific actions (e.g., a network connection) that otherwise violate isolation controls. Allow-once / run-many is not an adequate control.
- Use a secret injection approach to prevent secrets (e.g., in environment variables) from being shared with the agent.
- Establish lifecycle management controls for the sandbox to prevent the accumulation of code, intellectual property, or secrets.

Note: This post doesn't address risks arising from inaccurate or adversarially manipulated output from AI-powered tools, which are treated as user-level responsibilities.

Why enforce sandbox controls at an OS level?

Agentic tools, particularly for coding, perform arbitrary code execution by design. Automating test- or specification-driven development requires that the agent create and execute code to observe the results.
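The file-write controls above hinge on deciding whether a path is actually inside the workspace. A naive string-prefix check is bypassable via `..` components and symlinks; a minimal containment sketch (illustrative only — real enforcement belongs at the OS/sandbox layer, where the same pitfall applies):

```python
from pathlib import Path

def is_write_allowed(target: str, workspace: str) -> bool:
    """True only if `target` resolves to a path inside `workspace`."""
    workspace_real = Path(workspace).resolve()
    target_real = Path(target).resolve()  # normalizes `..`, follows symlinks
    return target_real == workspace_real or workspace_real in target_real.parents

print(is_write_allowed("/home/dev/project/src/main.py", "/home/dev/project"))  # True
print(is_write_allowed("/home/dev/project/../.zshrc", "/home/dev/project"))    # False
```

Note that comparing resolved path components also rejects sibling directories like `/home/dev/project-backup`, which a raw `startswith` prefix test would wrongly accept.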
In addition, tool-using agents are moving toward writing and executing throwaway scripts to perform tasks. This makes application-level controls insufficient: they can intercept tool calls and arguments before execution, but once control passes to a subprocess, the application has no visibility into or control over it. Attackers also commonly use indirection (calling a more restricted tool through a safer, approved one) to bypass application-level controls such as allowlists. OS-level controls, like macOS Seatbelt, work beneath the application layer and cover every process in the sandbox: no matter how those processes start, they are kept from reaching risky system capabilities, even through indirect paths.

Mandatory sandbox security controls

This section briefly outlines controls that the Red Team considers mandatory for agentic applications and the classes of attacks they help mitigate. When implemented together, they block simple exploitation techniques observed in practice. The section concludes with guidance on layering controls in real-world deployments.

Block network egress except to known-good locations

The most obvious and direct threat of network access is remote access (a network implant, malware, or a simple reverse shell), giving an attacker access to the victim machine, where they can directly probe and enumerate controls and attempt to pivot or escape. Another significant threat is data exfiltration. Developer machines often contain a wide range of secrets and intellectual property of value to an attacker, often even in the current workspace (e.g., .env files with API tokens). Exfiltrating the contents of directories such as ~/.ssh to gain access to other systems is a major target, as is exfiltrating sensitive source code. Network connections created by sandbox processes should not be permitted without manual approval.
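The default-deny egress posture described above can be sketched as a simple decision function, as a proxy or firewall hook might apply it. Hostnames and policy here are illustrative assumptions, not any product's actual configuration:

```python
# Illustrative sketch of a default-deny egress decision. The hosts below
# are placeholder examples, not a recommended policy.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org", "github.com"}
ENTERPRISE_DENY = {"pastebin.com"}  # can never be overridden by user approval

def egress_decision(host: str) -> str:
    """Return 'allow', 'deny', or 'ask' for an outbound connection."""
    if host in ENTERPRISE_DENY:
        return "deny"    # enterprise denylist always wins
    if host in ALLOWED_HOSTS:
        return "allow"   # tightly scoped allowlist
    return "ask"         # default-ask: require manual user approval

print(egress_decision("pypi.org"))          # allow
print(egress_decision("attacker.example"))  # ask
```

The ordering matters: the enterprise denylist is checked first so that no allowlist entry or approval can reach a known-bad destination, and anything unrecognized falls through to manual approval rather than silent allow.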
Tightly scoped allowlists enforced through HTTP proxy, IP, or port-based controls reduce user interaction and approval fatigue. Limiting DNS resolution to designated trusted resolvers to avoid DNS-based exfiltration is also recommended. A default-ask posture combined with enterprise-level denylists that cannot be overridden by local users provides a good balance between functionality and security.

Block file writes outside of the active workspace

Writing files outside of an active workspace is a significant risk. Files such as ~/.zshrc are executed automatically and can lead to both RCE and sandbox escape. URLs in key files such as ~/.gitconfig or ~/.curlrc can be overwritten to redirect sensitive data to attacker-controlled locations. Malicious files, such as a backdoored python or node binary, could be placed in ~/.local/bin to establish persistence or escape the sandbox.

Write operations outside of the active workspace must be blocked at an OS level. As with network controls, use an enterprise-level policy that blocks any such operation on known-sensitive paths, regardless of whether the user manually approves the action. These protected files should include dotfiles, configuration directories, and any additional paths enumerated by enterprise policy. Any other out-of-workspace file write may be permitted with manual user approval.

Block all writes to any agent configuration file or extension

Many agentic systems, including agentic IDEs, permit the creation of extensions that enhance functionality and often include executable code. "Hooks" may define shell code to be executed on specific events (such as prompt submission). MCP servers using the stdio transport define shell commands required to start the server. Claude Skills can include scripts, code, or helper functions that run as soon as the skill is invoked.
Files such as .cursorrules, CLAUDE.md, and copilot-instructions.md can provide adversaries with a durable way to shape the agent's behavior and, in some cases, gain full control or even arbitrary code execution. Agentic IDEs also often contain global and local settings, including command allowlists and denylists, with local configuration stored in the active workspace; modifying these local settings can give attackers the ability to pivot or extend their reach. For example, adding a poisoned hooks configuration to a Git repository in a workspace can affect every user who clones it. Additionally, hooks and MCP initialization functions often run outside of a sandbox environment, offering an opportunity to escape sandbox controls.

Application-specific configuration files, including those located within the current workspace, must be protected from any modification by the agent, with no possibility of the IDE asking the user to approve such an action. Direct, manual modification by the user is the only acceptable mechanism for changing these sensitive files.

Tiered implementation of controls

Defining universally applicable allow/denylists is difficult, given the wide range of use cases to which agentic tools may be applied. The goal should be to block exploitable behavior while preserving manual user intervention as an infrequently used fallback for unanticipated cases, using a tiered approach such as the following:

- Establish clear enterprise-level denylists for access to critical files outside the current workspace that can't be overridden by user-level allowlists or manual approval decisions.
- Allow read-write access within the agent's workspace (with the exception of configuration files) without user approval.
- Permit specific allowlisted operations (e.g., read from ~/.ssh/gitlab-key) that may be required for the proper functionality of specific features.
- Assume default-deny for all other actions, permitting case-by-case user approval.
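The tiered approach above amounts to an ordered policy evaluation. A minimal sketch, with illustrative placeholder paths (not recommendations for any specific product):

```python
from pathlib import PurePosixPath

# Hypothetical tiered file-access policy; all paths are illustrative.
ENTERPRISE_DENY = ["/home/dev/.zshrc", "/home/dev/.aws"]  # tier 1
WORKSPACE = "/home/dev/project"                           # tier 2
EXPLICIT_ALLOW = ["/home/dev/.ssh/gitlab-key"]            # tier 3

def _under(path: str, prefix: str) -> bool:
    p, pre = PurePosixPath(path), PurePosixPath(prefix)
    return p == pre or pre in p.parents

def access_decision(path: str, config_file: bool = False) -> str:
    """Return 'deny', 'allow', or 'ask' by walking the tiers in order."""
    if any(_under(path, d) for d in ENTERPRISE_DENY):
        return "deny"   # tier 1: never overridable by user approval
    if config_file:
        return "deny"   # agent config files are off-limits even in-workspace
    if _under(path, WORKSPACE):
        return "allow"  # tier 2: free read-write inside the workspace
    if path in EXPLICIT_ALLOW:
        return "allow"  # tier 3: narrowly allowlisted operations
    return "ask"        # tier 4: default-deny, case-by-case user approval

print(access_decision("/home/dev/project/src/app.py"))  # allow
print(access_decision("/home/dev/.zshrc"))              # deny
print(access_decision("/etc/hosts"))                    # ask
```

The evaluation order encodes the policy: the enterprise denylist and the config-file rule are checked before anything that could grant access, so no later tier can override them.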
This post doesn't specifically address command allow/denylisting: OS-level restrictions should make command-level blocks redundant, though they may be useful as a defense-in-depth mitigation against potential sandbox misconfigurations.

Recommended sandbox security controls

The required controls discussed above provide strong protection against indirect prompt injection and help reduce approval fatigue. However, potential vulnerabilities remain, including:

- Ingestion of malicious hooks or local MCP initialization commands.
- Kernel-level vulnerabilities that lead to sandbox escape and full host control.
- Agent access to secrets.
- Failure modes in product-specific caching of manual approvals.
- The accumulation of secrets, IP, or exploitable code in the sandbox.

The following additional controls and considerations help close some of these remaining gaps.

Sandbox the IDE and all spawned functions

Many agentic systems apply sandboxing only at the time of tool invocation (commonly only for shell/command-line tools). While this prevents a wide range of abuse, many agentic functionalities still default to running outside the sandbox: hooks, MCP configurations that spawn local processes, scripts used by "skills," and other tools managed at the application layer. This often happens when sandboxes are associated only with command-line tools, while file-editing or search tools execute outside the sandbox and are controlled at the application level. These unsandboxed execution paths can make it easier for attackers to bypass sandbox controls or obtain remote code execution. The sandbox restrictions discussed here should be enforced for all agentic operations, not just command-line tool invocations.
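One way to cover every spawned function, not just shell tools, is to route all process creation through a single sandbox wrapper. A minimal sketch assuming Linux Bubblewrap (`bwrap`) is available; the profile shown is an illustrative example, not a complete or recommended policy:

```python
# Illustrative: build a bwrap invocation so hooks, MCP startup scripts,
# and skill scripts get the same restrictions as shell tools.
# Profile: read-only root, writable workspace, no network namespace.

def sandboxed_argv(cmd, workspace):
    """Wrap `cmd` in a bwrap command line (returned, not executed)."""
    return [
        "bwrap",
        "--ro-bind", "/", "/",           # whole filesystem read-only
        "--bind", workspace, workspace,  # workspace stays writable
        "--unshare-net",                 # drop network access entirely
        "--dev", "/dev",
        "--proc", "/proc",
        *cmd,
    ]

print(sandboxed_argv(["bash", "-c", "make test"], "/home/dev/project"))
```

Any process-creation path in the agent (tool calls, hooks, skill scripts) would go through a wrapper like this instead of invoking the command directly, so indirect spawns inherit the same isolation. Note that bwrap shares the host kernel; the virtualization guidance below addresses that gap.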
Restrictions on write operations for files outside of the current workspace and for configuration files are the most critical, while network egress from the sandbox should be permitted only for properly configured remote MCP server calls.

Use virtualization to isolate the sandbox kernel from the host kernel

Many sandbox solutions (macOS Seatbelt, Windows AppContainer, Linux Bubblewrap, Dockerized dev containers) share the host kernel, leaving it exposed to any code executed within the sandbox. Because agentic tools often execute arbitrary code by design, kernel vulnerabilities can be directly targeted as a path to full system compromise. To prevent these attacks at an architectural level, run agentic tools within a fully virtualized environment isolated from the host kernel at all times, such as VMs, unikernels, or Kata containers. Intermediate mitigations like gVisor, which mediates system calls via a separate user-space kernel, are preferable to fully shared solutions but offer different, and potentially weaker, security guarantees than full virtualization. While virtualization typically introduces some overhead, it is frequently modest compared to that induced by LLM calls. The lifecycle management of the virtualized environment should be tuned against the associated overhead to minimize developer friction while preventing the accumulation of information.

Prevent reads from files outside of the workspace

Sandbox solutions often require access to certain files outside of the workspace, such as ~/.zshrc, to reproduce the developer's environment. Unrestricted read access, however, exposes information of value to an attacker, enabling enumeration and exploration of the user's device, secrets, and intellectual property. Reads should follow a tiered approach consistent with the principle of least access:

- Use enterprise-level denylists to block reads from highly sensitive paths or patterns not required for sandbox operation.
- Limit allowlisted external reads to what is strictly necessary, ideally permitting reads only during sandbox initialization and blocking them thereafter.
- Block all other reads outside the workspace unless manually approved by the user.

Require manual user approval every time an action would violate default-deny isolation controls

As described in the tiered implementation approach, default-deny actions that aren't allowlisted or explicitly blocked should require manual user approval before execution. Enterprise-level denylists should never be overridable by user approval. Critically, approvals should never be cached or persisted, as a single legitimate approval immediately opens the door to future adversarial abuse. For instance, permitting modification of ~/.zshrc once to perform a legitimate function may allow later adversarial activity to implant code on a subsequent execution without requiring re-approval. Each potentially dangerous action should require fresh user confirmation.

Use a secret injection approach to prevent secrets from being exposed to the agent

Developer environments commonly contain a wide range of secrets, such as API keys in environment variables, credentials in ~/.aws, tokens in .env files, and SSH keys. These secrets are often inherited by sandboxed processes or accessible within the filesystem, even when they aren't required for the task at hand, creating unnecessary exposure. Even with network controls in place, exposed secrets remain a risk. Sandbox environments should rely on explicit secret injection to scope credentials to the minimum required for a given task, rather than inheriting the full set of host environment credentials. In practice:

- Start the sandbox with a minimal or empty credential set.
- Remove any secrets that aren't required for the current task.
- Inject required secrets based only on the specific task or project, ideally via a mechanism that is not directly accessible to the agent (e.g., a credential broker that provides short-lived tokens on demand rather than long-lived credentials in environment variables).
- Continue enforcing standard security practices, such as least privilege, for all secrets.

The goal is to limit the blast radius of any compromise, so that a hypothetical attacker who gains control of agent behavior can use only the secrets explicitly provisioned for the current task, not the full set of credentials available on the host system.

Establish lifecycle management controls for the sandbox

Long-running sandbox environments can accumulate artifacts over time: downloaded dependencies, generated scripts, cached credentials, intellectual property from previous projects, and temporary files that persist longer than intended. This expands the potential attack surface and increases the value of a compromise. An attacker who gains access to an agent operating in a stale sandbox may find secrets, proprietary code, or tools from earlier work that can be repurposed. The details of lifecycle management vary with sandbox architecture, initialization overhead, and project complexity. The key principle is ensuring that sandbox state doesn't persist indefinitely, whether through:

- Ephemeral sandboxes: Using sandbox architectures where the environment exists only for the duration of a specific task or command (e.g., Kata containers created and destroyed per execution), preventing accumulation.
- Explicit lifecycle management: Periodically destroying and recreating the sandbox environment in a known-good state (e.g., weekly for VM-based sandboxes), ensuring accumulated state is cleared on a known schedule.
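The secret-injection steps above boil down to constructing the child environment explicitly instead of inheriting it. A minimal sketch; the broker function and variable names are hypothetical stand-ins, not a real API:

```python
import subprocess

def fetch_short_lived_token(task: str) -> str:
    """Stand-in for a hypothetical credential broker that would mint a
    short-lived, narrowly scoped token on demand."""
    return f"demo-token-for-{task}"

def run_in_sandbox(cmd, task, extra_env=None):
    """Run `cmd` with an explicitly constructed environment instead of
    inheriting the host's variables (and any secrets living in them)."""
    env = {
        "PATH": "/usr/bin:/bin",  # just enough to locate binaries
        "TASK_API_TOKEN": fetch_short_lived_token(task),
    }
    env.update(extra_env or {})
    # Passing env= replaces, rather than extends, the parent environment.
    return subprocess.run(cmd, env=env, capture_output=True, text=True)

result = run_in_sandbox(["/usr/bin/env"], task="deploy-docs")
print(result.stdout)  # only PATH and TASK_API_TOKEN, not the host environment
```

The key design point is that the allowlist is built up from empty rather than filtered down from `os.environ`, so a newly added host secret is excluded by default instead of leaking until someone remembers to filter it.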
While the provider of the agentic tool is responsible for ensuring lifecycle management, organizations should evaluate their sandbox architecture and establish lifecycle policies that balance initialization overhead and developer friction against accumulation risk.

Learn more

Agentic tools represent a significant shift in how developers work. They offer productivity gains through automated code generation, testing, and execution, but these benefits come with a corresponding expansion of the attack surface. As agentic tools continue to evolve, gaining new capabilities, integrations, and autonomy, the attack surface evolves with them. The principles outlined in this post should be revisited as new features appear, and organizations should regularly validate that their sandbox implementations provide the isolation and security controls they expect. Learn more about agentic security from the NVIDIA AI Red Team, including the risks of unsandboxed code, research on agentic developer tools, and how the AI Red Team does threat modeling for agentic applications.

About the author: Rich Harang is a Principal Security Architect at NVIDIA, specializing in ML/AI systems, with over a decade of experience at the intersection of computer security, machine learning, and privacy. He received his PhD in Statistics from the University of California, Santa Barbara in 2010. Prior to joining NVIDIA, he led the Algorithms Research team at Duo, led research on using machine learning to detect malicious software, scripts, and web content at Sophos AI, and worked as a team lead at the US Army Research Laboratory. His research interests include adversarial machine learning, addressing bias and uncertainty in machine learning, and ways to use machine learning to support human analysis.
His work has been presented at USENIX, Black Hat, IEEE S&P workshops, and the DEF CON AI Village, among others, and has been featured in The Register and KrebsOnSecurity.
The Future of the Global Open-Source AI Ecosystem: From DeepSeek to AI+ huggingface 03.02.2026 15:03 0.647
Embedding sim. 0.7539
Entity overlap 0.0256
Title sim. 0.1356
Time proximity 0.8032
NLP type: other
NLP organization: DeepSeek
NLP topic: open source
NLP country: China

The Future of the Global Open-Source AI Ecosystem: From DeepSeek to AI+. Published February 3, 2026, by Adina Yakefu and Irene Solaiman (Hugging Face).

This is the third and final blog in a three-part series on the Chinese open source community's advancements since January 2025's "DeepSeek Moment." The first blog covered strategic changes and open artifact growth, and the second covered architectural and hardware shifts. In this third article, we examine the paths and trajectories of prominent Chinese AI organizations and posit future directions for open source. For AI researchers and developers contributing to and relying on the open source ecosystem, and for policymakers tracking this rapidly changing environment, the takeaway is that, thanks to intraorganizational and global community gains, open source is the dominant approach for Chinese AI organizations for the near future. Openly sharing artifacts, from models to papers to deployment infrastructure, maps to a strategy aimed at large-scale deployment and integration.

China's Organic Open Source AI Ecosystem

Having examined strategic and architectural changes since DeepSeek's R1, we get a first glimpse of how an organic open source AI ecosystem is taking shape in China. A culmination of powerful players, some established in open source, some new, and some changing course entirely to contribute to the new open culture, signals that the open, collaborative approach is mutually beneficial. This collaboration reaches beyond national boundaries: the most followed organization on Hugging Face is DeepSeek, and the fourth most followed is Qwen.
In addition to models, openly sharing science and techniques has informed not only other AI organizations but the entire open source community. The most popular papers on Hugging Face largely come from Chinese organizations, namely ByteDance, DeepSeek, Tencent, and Qwen (source: https://huggingface.co/spaces/evijit/PaperVerse).

The Established

Alibaba positioned open source as an ecosystem and infrastructure strategy. Qwen was not shaped as a single flagship model but continuously expanded into a family covering multiple sizes, tasks, and modalities, with frequent updates on Hugging Face and on Alibaba's own platform, ModelScope. Its influence did not concentrate in any single version; instead, it was repeatedly reused as a component across different scenarios, gradually taking on the role of a general AI foundation. By mid-2025, Qwen had become the model with the most derivatives on Hugging Face, with over 113k models using Qwen as a base and over 200k model repositories tagging Qwen, far exceeding Meta's Llama at 27k or DeepSeek at 6k. Organization-wide, Alibaba boasts almost as many derivatives as Google and Meta combined. At the same time, Alibaba aligned model development with cloud and hardware infrastructure, integrating models, chips, platforms, and applications into a single engineering stack.

Tencent made a significant move from borrowing to building. As one of the first major companies to integrate DeepSeek into core consumer-facing products after R1's release, Tencent did not initially frame open source as a public narrative. Instead, it brought mature models in through plug-in-style integration, ran large-scale internal validation, and only later began to release its own capabilities. From May 2025 onward, Tencent accelerated open releases in areas where it already had strengths, such as vision, video, and 3D, under its own brand, Tencent Hunyuan (now Tencent HY), and these models quickly gained adoption in the community.
ByteDance, following its "AI application factory" approach, selectively open sources high-value components while keeping its competitive focus on product entry points and large-scale usage. In this context, the ByteDance Seed team has contributed several notable open-source artifacts, including UI-TARS-1.5 for multimodal UI understanding, Seed-Coder for data-centric code modeling, and the SuperGPQA dataset for systematic reasoning evaluation. Despite a relatively low-profile open-source presence, ByteDance has achieved significant scale in China's AI market, with its AI application Doubao surpassing 100 million DAU in December 2025.

The most notable change came from Baidu, whose CEO had openly talked down open source. After years of prioritizing closed models, Baidu re-entered the ecosystem through free access and open releases such as the Ernie 4.5 series. This shift was accompanied by renewed investment in its open-source framework, PaddlePaddle, as well as its own AI chip, Kunlunxin, which announced an IPO on January 1, 2026. By connecting models, chips, and PaddlePaddle within a more open system, Baidu can lower costs, attract developers, and influence standards, while maintaining strategic control under shared constraints of compute, cost, and regulation.

The Normalcy of "DeepSeek Moments"

Among startups, Moonshot, Z.ai, and MiniMax adjusted rapidly and brought new momentum to the open-source community within months of R1. Models such as Kimi K2, GLM-4.5, and MiniMax M2 all earned places on AI-World's open-model milestone rankings. At the end of 2025, Z.ai and MiniMax released their most advanced open-source models to date and subsequently announced their IPO plans in close succession. The open-sourcing of Kimi K2 was widely described as "another DeepSeek moment" for the community.
Although Moonshot has not announced an IPO, market reports indicate that the company had raised approximately $500M in funding by the end of 2025, with AGI and agent-based systems positioned as its primary commercialization objectives.

Application-first companies such as Xiaohongshu, Bilibili, Xiaomi, and Meituan, previously focused only on the application layer, began training and releasing their own models. With their native advantage in real usage scenarios and domain data, building in-house models became practical once strong reasoning became available at low cost through open source. It lets them tune AI around their specific businesses rather than being constrained by the cost structures or limits of external providers.

If the business world seized an ROI-positive opportunity for growth, research institutions and the broader community welcomed the shift even more willingly. Organizations such as BAAI and Shanghai AI Lab redirected more effort toward toolchains, evaluation systems, data platforms, and deployment infrastructure, through projects like FlagOpen, OpenDataLab, and OpenCompass. These efforts did not chase single-model performance but instead strengthened the long-term foundations of the ecosystem.

Foundations for the Future

The defining feature of the new ecosystem is not that there are more models, but that an entire chain has formed: models can be open-sourced and extended; deployments can be reused and scaled; software and hardware can be coordinated and swapped; and governance capabilities can be embedded and audited. This is a shift from isolated breakthroughs to a system that can actually run in the real world.

This ecosystem did not appear overnight. It is built on years of accumulated infrastructure "tailwind" since 2017. Over the past several years, China has steadily invested in data centers and compute centers, gradually forming a nationwide, integrated compute layout centered on the "East Data, West Compute" strategy.
The national plan established 8 major compute hubs and 10 data center clusters, guiding compute demand from the east toward the central and western regions. Public information suggests China intends to keep investing in energy capacity. China's total compute capacity is around 1590 EFLOPS as of 2025, ranking among the top globally. Sources in China assert that intelligent compute capacity, tailored for AI training and deployment, is expected to grow by roughly 43% year over year, far outpacing general-purpose compute. At the same time, the average data center power usage effectiveness (PUE) fell to around 1.46, indicating better efficiency and providing a solid hardware foundation for AI at scale. Energy is a clear key focus.

If the 2017 "New Generation AI Development Plan" was mainly about setting direction and building foundations, the August 2025 "AI+" action plan clearly shifted focus toward large-scale deployment and deep integration. This marks a directionally different pursuit from AGI. The emergence of R1 provided the missing "lift" at the engineering and ecosystem level: it was the catalyst that systematically activated the compute, energy, and data infrastructure that had already been built.

As a result, in the year following R1's release, China's AI development accelerated along two main paths. First, AI became more deeply embedded in industrial processes, moving beyond chatbots toward agents and workflows. Second, greater emphasis was placed on autonomous and controllable AI systems, reflected in more flexible training pathways and increasingly localized deployment strategies. Looking back, the real turning point was not the growth in the number of models but a fundamental change in how open-source models are used: open source moved from an optional choice to a default assumption in system design, and models became reusable, composable components within larger engineering systems.
Looking Back to Look Forward

From DeepSeek to "AI+", China's path in 2025 was not about chasing peak performance. It was about building a practical path organized around open source, engineering efficiency, and scalable delivery, a path that has already begun to run on its own. Resource constraints did not limit China's AI development. In some respects, they reshaped its trajectory. The release of DeepSeek R1 acted as a catalytic event, triggering a chain of responses across the domestic industry and accelerating the formation of a more organically structured ecosystem. At the same time, this shift created a critical window for continued domestic research and development. As this ecosystem matures, its longer-term impact, and how the global AI community may engage with an increasingly self-sustaining AI ecosystem in China, will become important questions for future discussion.
Introducing Lockdown Mode and Elevated Risk labels in ChatGPT openai 13.02.2026 10:00 0.647
Embedding sim.: 0.7765
Entity overlap: 0.1
Title sim.: 0.2059
Time proximity: 0.4345
NLP type: product_launch
NLP organization: ChatGPT
NLP topic: ai security
NLP country:

Open original

Introducing Lockdown Mode and Elevated Risk labels in ChatGPT to help organizations defend against prompt injection and AI-driven data exfiltration.
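Each article row above lists four similarity components (embedding similarity, entity overlap, title similarity, time proximity) next to a single Score. The aggregator's actual formula and weights are not published; the sketch below shows one plausible weighted blend, with weights that are purely illustrative assumptions.

```python
# Hypothetical reconstruction of how four per-article similarity
# components might combine into one cluster-matching score. The real
# aggregator's weights are unknown; these are illustrative only.

def cluster_score(embedding_sim: float, entity_overlap: float,
                  title_sim: float, time_proximity: float,
                  weights=(0.6, 0.15, 0.1, 0.15)) -> float:
    """Weighted blend of similarity components; weights sum to 1."""
    components = (embedding_sim, entity_overlap, title_sim, time_proximity)
    return sum(w * c for w, c in zip(weights, components))

# Components from the "Lockdown Mode" row above:
score = cluster_score(0.7765, 0.1, 0.2059, 0.4345)
```

With every component in [0, 1] and weights summing to 1, the blended score also stays in [0, 1], which matches the range of the Score column.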
Testing ads in ChatGPT openai 09.02.2026 11:00 0.645
Embedding sim.: 0.7694
Entity overlap: 0.375
Title sim.: 0.1321
Time proximity: 0.4226
NLP type: other
NLP organization: OpenAI
NLP topic: enterprise ai
NLP country:

Open original

OpenAI begins testing ads in ChatGPT to support free access, with clear labeling, answer independence, strong privacy protections, and user control.
OpenAI researcher quits over ChatGPT ads, warns of "Facebook" path arstechnica_ai 11.02.2026 20:44 0.642
Embedding sim.: 0.7518
Entity overlap: 0.1111
Title sim.: 0.1667
Time proximity: 0.6563
NLP type: leadership_change
NLP organization: OpenAI
NLP topic: ai ethics
NLP country: United States

Open original

Ad nausea OpenAI researcher quits over ChatGPT ads, warns of “Facebook” path Zoë Hitzig resigned on the same day OpenAI began testing ads in its chatbot. Benj Edwards – Feb 11, 2026 3:44 pm Credit: Aurich Lawson | Getty Images On Wednesday, former OpenAI researcher Zoë Hitzig published a guest essay in The New York Times announcing that she resigned from the company on Monday, the same day OpenAI began testing advertisements inside ChatGPT. Hitzig, an economist and published poet who holds a junior fellowship at the Harvard Society of Fellows, spent two years at OpenAI helping shape how its AI models were built and priced. She wrote that OpenAI’s advertising strategy risks repeating the same mistakes that Facebook made a decade ago. “I once believed I could help the people building A.I. get ahead of the problems it would create,” Hitzig wrote. “This week confirmed my slow realization that OpenAI seems to have stopped asking the questions I’d joined to help answer.” Hitzig did not call advertising itself immoral. Instead, she argued that the nature of the data at stake makes ChatGPT ads especially risky. Users have shared medical fears, relationship problems, and religious beliefs with the chatbot, she wrote, often “because people believed they were talking to something that had no ulterior agenda.” She called this accumulated record of personal disclosures “an archive of human candor that has no precedent.” She also drew a direct parallel to Facebook’s early history, noting that the social media company once promised users control over their data and the ability to vote on policy changes. Those pledges eroded over time, Hitzig wrote, and the Federal Trade Commission found that privacy changes Facebook marketed as giving users more control actually did the opposite. 
She warned that a similar trajectory could play out with ChatGPT: “I believe the first iteration of ads will probably follow those principles. But I’m worried subsequent iterations won’t, because the company is building an economic engine that creates strong incentives to override its own rules.” Ads arrive after a week of AI industry sparring Hitzig’s resignation adds another voice to a growing debate over advertising in AI chatbots. OpenAI announced in January that it would begin testing ads in the US for users on its free and $8-per-month “Go” subscription tiers, while paid Plus, Pro, Business, Enterprise, and Education subscribers would not see ads. The company said ads would appear at the bottom of ChatGPT responses, be clearly labeled, and would not influence the chatbot’s answers. The rollout on Sunday followed a week of public jabs between OpenAI and its rival, Anthropic. Anthropic declared Claude would remain ad-free, then ran Super Bowl ads with the tagline “Ads are coming to AI. But not to Claude,” which depicted AI chatbots awkwardly inserting product placements into personal conversations. OpenAI CEO Sam Altman called the ads “funny” but “clearly dishonest,” writing on X that OpenAI “would obviously never run ads in the way Anthropic depicts them.” He framed the ad-supported model as a way to bring AI to users who cannot afford subscriptions, writing that “Anthropic serves an expensive product to rich people.” Anthropic responded, as part of an advertising campaign of its own, that including ads in conversations with its Claude chatbot “would be incompatible with what we want Claude to be: a genuinely helpful assistant for work and for deep thinking.” The company said more than 80 percent of its revenue comes from enterprise customers. What Hitzig saw from the inside Regardless of the debate over whether AI chatbots should carry ads, OpenAI’s support documentation reveals that ad personalization is enabled by default for users in the test. 
If left on, ads will be selected using information from current and past chat threads, as well as past ad interactions. Advertisers do not receive users’ chats or personal details, OpenAI says, and ads will not appear near conversations about health, mental health, or politics. In her essay, Hitzig pointed to what she called an existing tension in OpenAI’s principles. She noted that while the company states it does not optimize for user activity solely to generate advertising revenue, reporting has suggested that OpenAI already optimizes for daily active users, “likely by encouraging the model to be more flattering and sycophantic.” She warned that this optimization can make users feel more dependent on AI models for support, pointing to psychiatrists who have documented instances of “chatbot psychosis” and allegations that ChatGPT reinforced suicidal ideation. OpenAI currently faces multiple wrongful death lawsuits, including one alleging ChatGPT helped a teenager plan his suicide and another alleging it validated a man’s paranoid delusions about his mother before a murder-suicide. Rather than framing the debate as ads versus no ads, Hitzig proposed several structural alternatives. These included cross-subsidies modeled on the FCC’s universal service fund (in which businesses that pay for high-value AI labor would subsidize free access for others), independent oversight boards with binding authority over how conversational data is used in ad targeting, and data trusts or cooperatives in which users retain control of their information. She pointed to the Swiss cooperative MIDATA and Germany’s co-determination laws as partial precedents. Hitzig closed her essay with what she described as the two outcomes she fears most: “a technology that manipulates the people who use it at no cost, and one that exclusively benefits the few who can afford to use it.” A changing of the AI seasons Hitzig was not the only prominent AI researcher to publicly resign this week. 
On Sunday, Mrinank Sharma, who led Anthropic’s Safeguards Research Team and co-authored a widely cited 2023 study on AI sycophancy, announced his departure in a letter warning that “the world is in peril.” He wrote that he had “repeatedly seen how hard it is to truly let our values govern our actions” inside the organization and said he plans to pursue a poetry degree (Hitzig, coincidentally, is also a published poet). On Monday, xAI co-founder Yuhuai “Tony” Wu also resigned, followed the next day by fellow co-founder Jimmy Ba. They were part of a larger wave: at least nine xAI employees, including the two co-founders, publicly announced their departures over the past week, according to TechCrunch. Six of the company’s 12 original co-founders have now left. The departures follow Elon Musk’s decision to merge xAI with SpaceX in an all-stock deal ahead of a planned IPO, a transaction that converted xAI equity into shares of a company valued at $1.25 trillion, though it is unclear whether the timing of the departures is related to vesting schedules. The three sets of departures across OpenAI, Anthropic, and xAI appear unrelated in their specifics, but they arrive during a period of rapid commercialization across the AI industry that has tested the patience of researchers at multiple companies, and they fit a broader pattern of turnover and burnout that has become common at major AI labs.
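The ad-eligibility behavior described in OpenAI's support documentation (personalization on by default, no ads near conversations about health, mental health, or politics) can be sketched as a simple topic filter. Everything in this sketch, including the category names and the set-based matching, is a hypothetical illustration, not OpenAI's implementation.

```python
# Hypothetical sketch of the ad-suppression rule described above: ads
# are withheld when a conversation touches sensitive topics. Topic
# detection and these category names are illustrative assumptions.

SENSITIVE_TOPICS = {"health", "mental_health", "politics"}

def ads_eligible(detected_topics: set) -> bool:
    """Suppress ads if any detected topic falls in the sensitive set."""
    return not (detected_topics & SENSITIVE_TOPICS)

print(ads_eligible({"travel", "cooking"}))  # True
print(ads_eligible({"health", "cooking"}))  # False
```

A deny-list like this fails closed for listed categories only; the harder, unspecified part is the classifier that maps a free-form conversation to topic labels in the first place.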
Thoughts on the job market in the age of LLMs interconnects 30.01.2026 15:49 0.641
Embedding sim.: 0.7318
Entity overlap: 0.1154
Title sim.: 0.0549
Time proximity: 0.9953
NLP type: other
NLP organization: ai2
NLP topic: large language models
NLP country:

Open original

Thoughts on the job market in the age of LLMs On standing out and finding gems. Nathan Lambert Jan 30, 2026 There’s a pervasive, mutual challenge in the job market today for people working in (or wanting to work in) the cutting edge of AI. On the hiring side, it often feels impossible to close, or even get interest from, the candidates you want. On the individual side, it quite often feels like the opportunity cost of your current job is extremely high — even if on paper the actual work and life you’re living is extremely good — due to the crazy compensation figures. For established tech workers, the hiring process in AI can feel like a bit of a constant fog. For junior employees, it can feel like a bit of a wall. In my role as a bit of a hybrid research lead, individual contributor, and mentor, I spend a lot of time thinking about how to get the right people for me to work with and the right jobs for my mentees. The advice here is shaped by the urgency of the current moment in LLMs. These are hiring practices optimized for a timeline of relevance that may need revisiting every 1-2 years as the core technology changes — which may not be best for long-term investment in people, the industry, or yourself. I’ve written separately about the costs of this pace, and don’t intend to carry this on indefinitely. The most defining feature of hiring in this era is the complexity and pace of progress in language models. This creates two categories. For one, senior employees are much more covetable because they have more context of how to work in and steer complex systems over time. It takes a lot of perspective to understand the right direction for a library when your team can make vastly more progress on incremental features given AI agents. Without vision, the repositories can get locked with too many small additions. 
With powerful AI tools I expect the impact of senior employees to grow faster than adding junior members to the team could. This view on the importance of key senior talent has been a recent swing, given my experiences and expectations for current and future AI agents , respectively: Every engineer needs to learn how to design systems. Every researcher needs to learn how to run a lab. Agents push the humans up the org chart. On the other side, junior employees have to prove themselves in a different way. The number one defining trait I look for in a junior engineering employee is an almost fanatical obsession with making progress, both in personal understanding and in modeling performance. The only way to learn how the sausage gets made is to do it, and to catch up it takes a lot of hard work in a narrow area to cultivate ownership. With sufficient motivation, a junior employee can scale to impact quickly, but without it, it’s almost replaceable with coding agents (or will be soon). This is very hard work and hard to recruit for. The best advice I have on finding these people is “vibes,” so I am looking for advice on how to find them too! 1 For one, when I brought Florian Brand on to help follow open models for Interconnects, when I first chatted with him he literally said “since ChatGPT came out I’ve been fully obsessed with LLMs.” You don’t need to reinvent the wheel here — if it’s honest, people notice. For junior researchers, there’s much more grace, but that’s due to them working in an education institution first and foremost, instead of the understatedly brutal tech economy. A defining feature that creates success here is an obsession with backing up claims. So a new idea improves models, why? So our evaluation scores are higher, what does this look like in our harness? Speed of iteration follows from executing on this practice. Too many early career researchers try to build breadth of impact (e.g. 
collecting contributions on many projects) before clearly demonstrating, to themselves and their advisors, depth. The best researchers then bring both clarity of results and velocity in trying new ideas. Working in academia today is therefore likely to be a more nurturing environment for junior talent, but it comes with even greater opportunity costs financially. I’m regularly asked if one should leave a Ph.D. to get an actual job, and my decision criteria is fairly simple. If you’re not looking to become a professor and have an offer to do modeling research at a frontier lab (Gemini, Anthropic, OpenAI is my list) then there’s little reason to stick around and finish your Ph.D. The little reason that keeps people often ends up being personal pride in doing something hard, which I respect. It’s difficult to square these rather direct pieces of career advice with my other recommendations of choosing jobs based on the people, as you’ll spend a ton of your life with them, more than the content of what you’ll be doing. Choosing jobs based on people is one of the best ways to choose your job based on the so-called “vibes.” Working in a frontier lab in product as an alternative to doing a Ph.D. is a path to get absorbed in the corporate machine and not stand out, reducing yourself to the standard tech career ladder. Part of what I feel like works so well for me , and other people at Ai2, is having the winning combination of responsibility, public visibility, and execution in your work. There is something special for career progression that comes from working publicly, especially when the industry is so closed, where people often overestimate your technical abilities and output. Maybe this is just the goodwill that comes from open-source contributions paying you back. Share If you go to a closed lab, visibility is almost always not possible, so you rely on responsibility and execution. 
It doesn’t matter if you execute if you’re doing great work on a product or model that no one ever touches. Being in the core group matters. This then all comes back to finding the people hiring pipeline. There are many imperfect signals out there, both positive and negative. For individuals building their portfolio, it’s imperative to avoid negative signals because the competition for hiring is so high. A small but clear negative signal is a junior researcher being a middle author on too many papers. Just say no, it helps you. The positive signals are messier, but still doable. It’s been said that you can tell someone is a genius by reading one Tweet from them, and I agree with this. The written word is still an incredibly effective and underutilized communication form. One excellent blog post can signify real, rare understanding. The opposite holds true for AI slop. One AI slop blog post will kill your application. The other paths I often advise people who reach out asking how to establish a career in AI are open-source code contributions or open research groups (e.g. EleutherAI). I’ve seen many more success cases on the former, in open-source code. Still, it’s remarkably rare, because A) most people don’t have the hardware to add meaningful code to these popular LLM repositories and B) most people don’t stick with it long enough. Getting to the point of making meaningful contributions historically has been very hard. Doing open-source AI contributions could be a bit easier in the age of coding agents, as a lot of the limiting factors today are just bandwidth in implementing long todo lists of features, but standing out amid the sea of AI slop PRs and Issues will be hard. That’ll take class, creativity, humanity, and patience. 
So, to be able to run some tiny models on a $4000 DGX Spark is an investment, but it’s at least somewhat doable to iterate on meaningful code contributions to things like HuggingFace’s ML libraries (I’ve been writing and sharing a lot about how I’m using the DGX Spark to iterate on our codebases at Ai2). Back to the arc of hiring, the above focused on traits, but the final piece of the puzzle is alignment. The first question to ask is “is this person good?” The second question is, “will this person thrive here?” Every organization has different constraints, but especially in small teams, the second question defines your culture. In a startup, if you grow too fast you definitely lose control of your culture. This isn’t to say that the company won’t have a strong or useful culture, it’s to say you can’t steer it. The culture of an organization is the byproduct of how all the individuals interact. You do not want to roll the dice here. Personally, I’m working on building out a few more spots in a core post-training methods team at Ai2. Post-training recipes have gotten very complicated, and we’re working on making them easier to run while doing research on fundamentals such as post-training data mixing and scaling laws. To be a little vague, getting the post-training recipes done for both Olmo 3 and Olmo 2 was... very hard on the team. At the same time, post-training hasn’t gotten much more open, so hiring through it and doing the hard work is the only way. Ideally I would hire one engineer and one researcher, both fairly senior, meaning at least having a Ph.D. or a similar number of years working in technology. Junior engineers with some experience and the aforementioned obsession would definitely work. This callout serves as a good lesson for hiring. It is intentional that people should self-filter for this, no one likes when you way overreach on selling yourself for a job. 
I also intentionally make people find my email for this as an exercise. The art of cold emailing and approaching people in the correct pipelines is essential to getting hired. Many people you look up to in AI read their emails, the reason you don’t get a response is because you didn’t format your email correctly. The best cold emails show the recipient that they learned from it or obviously benefitted from getting it. Platitudes and compliments are of course nice to receive, but the best cold emails inspire action. Two of the most recent people I helped hire at Ai2 I learned of through these side-door job applications (i.e. not found through the pile of careers page applications). I learned of Finbarr through his blogs and online reputation. Tyler sent me an excellent cold email with high-quality blog posts relating to my obvious, current areas of interest and had meaningful open-source LLM contributions. Both have been excellent teammates (and friends), so I’m always happy to say the system works, it’s just intimidating. All together, I’m very torn on the AI job market. It’s obviously brutal for junior members of our industry, it obviously feels short sighted, it obviously comes with tons of opportunity costs, and so on. At the same time, it’s such a privilege to be able to contribute to such a meaningful, and exciting technology. My grounding for hiring is still going to be a reliance on my instincts and humanity, and not to get too tied down with all the noise. Like most things, it just takes time and effort. Other posts in my “ life thoughts ” series include the following. 
I send these to people when they ask me for career advice generally, as I don’t have time to give great individual responses: Apr 05, 2023: Behind the curtain: what it feels like to work in AI right now Oct 11, 2023: The AI research job market shit show (and my experience) Oct 30, 2024: Why I build open language models May 14, 2025: My path into AI Jun 06, 2025: How I Write Oct 25, 2025: Burning out 1 Some companies hire heavily out of Twitter, some hire from communities such as GPU Mode or NanoGPT speedrunning.
One path that replaces 50 saved tabs and 12 half-started repos towards_ai 30.01.2026 15:02 0.638
Embedding sim.: 0.7479
Entity overlap: 0.1111
Title sim.: 0.046
Time proximity: 0.8271
NLP type: other
NLP organization: Towards AI
NLP topic: large language models
NLP country:

Open original

This week, Dario Amodei’s essay put words to what many teams are quietly bumping up against: the models are maturing faster than the builders. That’s why so many LLM projects keep dying in the same spot. In 48 hours (Feb 1, 2026), we’re running a live cohort kickoff call that closes this exact gap with a production-ready plan: what to build first, what to measure, and how to ship LLM systems that actually hold up. How to join the kickoff: enroll in any Towards AI course, and the cohort link lands in your welcome email. Access the Cohort by Enrolling! If your goal is to go from fundamentals to production habits and full-stack execution, this is the most straightforward track we recommend: 10-Hour Crash Course → Expert LLM Developer (Bundle) It combines our most adopted courses with our bestselling book, and it’s sequenced like a real build path, so your effort compounds. Start the LLM Developer track (bundle + cohort access) Here’s how the bundle pulls you out of demo-land: 1) Guesswork, replaced by a mental model. 10-Hour LLM Fundamentals (video) gives you the core understanding: how LLMs behave, how to build with them, how to evaluate outputs, and how to maintain robust solutions as requirements shift. 2) Fragility, replaced by production discipline. Building LLMs for Production gives you timeless principles for building dependable systems: how to measure quality, debug failures, and iterate without rewriting the whole app every time something breaks. 3) “I can’t ship this,” replaced by full-stack skill. Full Stack AI Engineering is where you put it all together end-to-end and ship a real product: data, retrieval, prompting/agents, evaluation, and deployment. If you’ve been circling this space for months, the risk isn’t “starting and failing.” The risk is staying in demo-land while the bar for real LLM skill quietly becomes: can you ship something that holds up? Cohort kickoff is in 48 hours (Feb 1, 2026).
If you want the end-to-end framework we use in enterprise projects, start with the kickoff. Join before Feb 1 and get the cohort access!
Project Genie: Experimenting with infinite, interactive worlds deepmind 29.01.2026 17:00 0.637
Embedding sim.: 0.734
Entity overlap: 0.0833
Title sim.: 0.0455
Time proximity: 0.9583
NLP type: product_launch
NLP organization: Google DeepMind
NLP topic: generative ai
NLP country: United States

Open original

Google DeepMind Project Genie: Experimenting with infinite, interactive worlds Jan 29, 2026 Google AI Ultra subscribers in the U.S. can try out Project Genie, an experimental research prototype that lets you create and explore worlds. Diego Rivas Product Manager, Google DeepMind Elliott Breece Product Manager, Google Labs Suz Chambers Director, Google Creative Lab In August, we previewed Genie 3, a general-purpose world model capable of generating diverse, interactive environments. Even in this early form, trusted testers were able to create an impressive range of fascinating worlds and experiences, and uncovered entirely new ways to use it. 
The next step is to broaden access through a dedicated, interactive prototype focused on immersive world creation. Starting today, we're rolling out access to Project Genie for Google AI Ultra subscribers in the U.S (18+). This experimental research prototype lets users create, explore and remix their own interactive worlds. How we’re advancing world models A world model simulates the dynamics of an environment, predicting how they evolve and how actions affect them. While Google DeepMind has a history of agents for specific environments like Chess or Go , building AGI requires systems that navigate the diversity of the real world. To meet this challenge and support our AGI mission, we developed Genie 3. Unlike explorable experiences in static 3D snapshots, Genie 3 generates the path ahead in real time as you move and interact with the world. It simulates physics and interactions for dynamic worlds, while its breakthrough consistency enables the simulation of any real-world scenario — from robotics and modelling animation and fiction, to exploring locations and historical settings. Building on our model research with trusted testers from across industries and domains, we are taking the next step with an experimental research prototype: Project Genie. How Project Genie works Project Genie is a prototype web app powered by Genie 3, Nano Banana Pro and Gemini , which allows users to experiment with the immersive experiences of our world model firsthand. The experience is centred on three core capabilities: 1. World sketching Prompt with text and generated or uploaded images to create a living, expanding environment. Create your character, your world, and define how you want to explore it — from walking to riding, flying to driving, and anything beyond. For more precise control, we have integrated “World Sketching” with Nano Banana Pro. This allows you to preview what your world will look like and modify your image to fine tune your world prior to jumping in. 
You can also define your perspective for the character — such as first-person or third-person — giving you control over how you experience the scene before you enter. 2. World exploration Your world is a navigable environment that’s waiting to be explored. As you move, Project Genie generates the path ahead in real time based on the actions you take. You can also adjust the camera as you traverse through the world. 3. World remixing Remix existing worlds into new interpretations, by building on top of their prompts. You can also explore curated worlds in the gallery or by selecting the randomizer icon for inspiration, or build on top of them. And once you’re done, you can download videos of your worlds and your explorations. How we’re building responsibly Project Genie is an experimental research prototype in Google Labs, powered by Genie 3. As with all our work towards general AI systems, our mission is to build AI responsibly to benefit humanity. Since Genie 3 is an early research model, there are a few known areas for improvement: generated worlds might not look completely true-to-life or always adhere closely to prompts, images, or real-world physics; characters can sometimes be less controllable, or experience higher latency in control; and generations are limited to 60 seconds. A few of the Genie 3 model capabilities we announced in August, such as promptable events that change the world as you explore it, are not yet included in this prototype. You can find more details on model limitations and future updates on how we’re improving the experience, here. Building on the work we have been doing with trusted testers, we are excited to share this prototype with users of our most advanced AI to better understand how people will use world models in many areas of both AI research and generative media. Access to Project Genie begins rolling out today to Google AI Ultra subscribers 1 in the U.S. (18+), expanding to more territories in due course. 
We look forward to seeing the infinitely diverse worlds they create, and in time, our goal is to make these experiences and technology accessible to more users.
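The "world exploration" loop described above, in which each user action conditions the next generated frame, can be sketched abstractly. The toy class below merely stands in for a learned model; the class and method names are illustrative and are not part of any Genie API.

```python
# Conceptual sketch of an action-conditioned generation loop, the
# interaction pattern the article attributes to world models like
# Genie 3. A real model would render pixels; this toy only records
# the trajectory to show how actions condition each step.

from dataclasses import dataclass, field

@dataclass
class ToyWorldModel:
    """Stands in for a learned model that predicts the next frame."""
    history: list = field(default_factory=list)

    def next_frame(self, action: str) -> str:
        # The next output depends on the full interaction history,
        # which is what gives such models their consistency.
        self.history.append(action)
        return f"frame after {'->'.join(self.history)}"

world = ToyWorldModel()
for action in ["walk_forward", "turn_left", "walk_forward"]:
    frame = world.next_frame(action)

print(frame)  # frame after walk_forward->turn_left->walk_forward
```

The key property this loop illustrates is statefulness: unlike a static 3D snapshot, each generated step folds the user's latest action into the context for the next one.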