MuZero, AlphaZero, and AlphaDev: Optimizing computer systems
A generalist AI agent for 3D virtual environments

We present new research on a Scalable Instructable Multiworld Agent (SIMA) that can follow natural-language instructions to carry out tasks in a variety of video game settings.
Video games are a key proving ground for artificial intelligence (AI) systems. Like the real world, games are rich learning environments with responsive, real-time settings and ever-changing goals. From our early work with Atari games, through to our AlphaStar system that plays StarCraft II at human-grandmaster level, Google DeepMind has a long history in AI and games. Today, we’re announcing a new milestone: shifting our focus from individual games towards a general, instructable game-playing AI agent. In a new technical report, we introduce SIMA, short for Scalable Instructable Multiworld Agent, a generalist AI agent for 3D virtual settings. We partnered with game developers to train SIMA on a variety of video games. This research marks the first time an agent has demonstrated it can understand a broad range of gaming worlds, and follow natural-language instructions to carry out tasks within them, as a human might.
This work isn't about achieving high game scores. Learning to play even one video game is a technical feat for an AI system, but learning to follow instructions in a variety of game settings could unlock more helpful AI agents for any environment. Our research shows how we can translate the capabilities of advanced AI models into useful, real-world actions through a language interface. We hope that SIMA and other agent research can use video games as sandboxes to better understand how AI systems may become more helpful.
Learning from video games
We collaborated with eight game studios to train and test SIMA on nine different video games.
To expose SIMA to many environments, we’ve built a number of partnerships with game developers for our research. We collaborated with eight game studios to train and test SIMA on nine different video games, such as No Man’s Sky by Hello Games and Teardown by Tuxedo Labs. Each game in SIMA’s portfolio opens up a new interactive world, including a range of skills to learn, from simple navigation and menu use, to mining resources, flying a spaceship, or crafting a helmet. We also used four research environments - including a new environment we built with Unity called the Construction Lab, where agents need to build sculptures from building blocks which test their object manipulation and intuitive understanding of the physical world. By learning from different gaming worlds, SIMA captures how language ties in with game-play behavior. Our first approach was to record pairs of human players across the games in our portfolio, with one player watching and instructing the other. We also had players play freely, then rewatch what they did and record instructions that would have led to their game actions.
SIMA comprises pre-trained vision models, and a main model that includes a memory and outputs keyboard and mouse actions.
SIMA: a versatile AI agent
SIMA is an AI agent that can perceive and understand a variety of environments, then take actions to achieve an instructed goal. It comprises a model designed for precise image-language mapping and a video model that predicts what will happen next on-screen. We finetuned these models on training data specific to the 3D settings in the SIMA portfolio.
Our AI agent doesn’t need access to a game's source code, nor bespoke APIs. It requires just two inputs: the images on screen, and simple, natural-language instructions provided by the user. SIMA uses keyboard and mouse outputs to control the games’ central character to carry out these instructions. This simple interface is what humans use, meaning SIMA can potentially interact with any virtual environment.
The current version of SIMA is evaluated across 600 basic skills, spanning navigation (e.g. "turn left"), object interaction ("climb the ladder"), and menu use ("open the map"). We’ve trained SIMA to perform simple tasks that can be completed within about 10 seconds.
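To make the interface described above concrete, here is a minimal Python sketch of an agent that takes a screen image plus an instruction and returns keyboard and mouse output. Every name in it (Observation, Action, KeywordAgent, the "m" key binding, the mouse deltas) is a hypothetical illustration, and the lookup table merely stands in for SIMA's learned models, which are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Observation:
    """The two inputs described above (field names are hypothetical)."""
    frame: List[List[int]]  # screen pixels, here a toy grayscale grid
    instruction: str        # natural-language instruction from the user


@dataclass
class Action:
    """Keyboard-and-mouse output (field names are hypothetical)."""
    keys: List[str] = field(default_factory=list)
    mouse_dx: int = 0
    mouse_dy: int = 0


class KeywordAgent:
    """Placeholder policy: a lookup table instead of learned models."""

    RULES = {
        "turn left": Action(mouse_dx=-50),
        "turn right": Action(mouse_dx=50),
        "open the map": Action(keys=["m"]),  # assumed key binding
    }

    def act(self, obs: Observation) -> Action:
        # A real agent would condition on obs.frame; this stub ignores it.
        return self.RULES.get(obs.instruction.lower().strip(), Action())


obs = Observation(frame=[[0] * 4 for _ in range(3)], instruction="Turn left")
action = KeywordAgent().act(obs)
```

The point of the sketch is only the shape of the interface: because the agent sees pixels and emits key presses and mouse movement, nothing in it depends on a particular game's internals.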
SIMA was evaluated across 600 basic skills, spanning navigation, object interaction, and menu use.
We want our future agents to tackle tasks that require high-level strategic planning and multiple sub-tasks to complete, such as “Find resources and build a camp”. This is an important goal for AI in general, because while Large Language Models have given rise to powerful systems that can capture knowledge about the world and generate plans, they currently lack the ability to take actions on our behalf.
Generalizing across games and more
We show that an agent trained on many games performed better than an agent that learned how to play just one. In our evaluations, SIMA agents trained on a set of nine 3D games from our portfolio significantly outperformed all specialized agents trained solely on each individual one. What’s more, an agent trained on all but one game performed nearly as well on that unseen game, on average, as an agent trained specifically on it. Importantly, this ability to function in brand new environments highlights SIMA’s ability to generalize beyond its training. This is a promising initial result; however, more research is required for SIMA to perform at human levels in both seen and unseen games.
Our results also show that SIMA’s performance relies on language. In a control test where the agent was not given any language training or instructions, it behaved in an appropriate but aimless manner. For example, an agent may gather resources, a frequent behavior, rather than walking where it was instructed to go.
We evaluated SIMA’s ability to follow instructions to complete nearly 1500 unique in-game tasks, in part using human judges. As our baseline comparison, we use the performance of environment-specialized SIMA agents (trained and evaluated to follow instructions within a single environment). We compare this performance with three types of generalist SIMA agents, each trained across multiple environments.
Advancing AI agent research
SIMA’s results show the potential to develop a new wave of generalist, language-driven AI agents. This is early-stage research and we look forward to further building on SIMA across more training environments and incorporating more capable models. The more training worlds we expose SIMA to, the more generalizable and versatile we expect it to become. And with more advanced models, we hope to improve SIMA’s understanding and ability to act on higher-level language instructions to achieve more complex goals.
Ultimately, our research is building towards more general AI systems and agents that can understand and safely carry out a wide range of tasks in a way that is helpful to people online and in the real world.
Learn more about SIMA Read our technical analysis.
TacticAI: an AI assistant for football tactics

As part of our multi-year collaboration with Liverpool FC, we develop a full AI system that can advise coaches on corner kicks.
'Corner taken quickly… Origi!'
Liverpool FC made a historic comeback in the 2019 UEFA Champions League semi-finals. One of the most iconic moments was a corner kick by Trent Alexander-Arnold that lined up Divock Origi to score what has gone down in history as Liverpool FC’s greatest goal.
Corner kicks have high potential for goals, but devising a routine relies on a blend of human intuition and game design to identify patterns in rival teams and respond on-the-fly. Today, in Nature Communications, we introduce TacticAI: an artificial intelligence (AI) system that can provide experts with tactical insights, particularly on corner kicks, through predictive and generative AI. Despite the limited availability of gold-standard data on corner kicks, TacticAI achieves state-of-the-art results by using a geometric deep learning approach to help create more generalizable models. We developed and evaluated TacticAI together with experts from Liverpool Football Club as part of a multi-year research collaboration. TacticAI’s suggestions were preferred by human expert raters 90% of the time over tactical setups seen in practice. TacticAI demonstrates the potential of assistive AI techniques to revolutionize sports for players, coaches, and fans. Sports like football are also a dynamic domain for developing AI, as they feature real-world, multi-agent interactions, with multimodal data. Advancing AI for sports could translate into many areas on and off the field – from computer games and robotics, to traffic coordination.
TacticAI is a full AI system with combined predictive and generative models to analyze what happened in previous plays and how to make adjustments that make a particular outcome more likely.
Developing a game plan with Liverpool FC
Five years ago, we began a multi-year collaboration with Liverpool FC to advance AI for sports analytics. Our first paper, Game Plan, looked at why AI should be used in assisting football tactics, highlighting examples such as analyzing penalty kicks. In 2022, we developed Graph Imputer, which showed how AI can be used with a prototype of a predictive system for downstream tasks in football analytics. The system could predict the movements of players off-camera when no tracking data was available; otherwise, a club would need to send a scout to watch the game in person.
Now, we have developed TacticAI as a full AI system with combined predictive and generative models. Our system allows coaches to sample alternative player setups for each routine of interest, and then directly evaluate the possible outcomes of such alternatives.
TacticAI is built to address three core questions:
- For a given corner kick tactical setup, what will happen? For example, who is most likely to receive the ball, and will there be a shot attempt?
- Once a setup has been played, can we understand what happened? For example, have similar tactics worked well in the past?
- How can we adjust the tactics to make a particular outcome happen? For example, how should the defending players be repositioned to decrease the probability of shot attempts?
Predicting corner kick outcomes with geometric deep learning
A corner kick is awarded when the ball passes over the byline, after touching a player of the defending team. Predicting the outcomes of corner kicks is complex, due to the randomness in gameplay from individual players and the dynamics between them. This is also challenging for AI to model because of the limited gold-standard corner kick data available: only about 10 corner kicks are played in each match in the Premier League every season.
(A) How corner kick situations are converted to a graph representation. Each player is treated as a node in a graph. A graph neural network operates over this graph updating each node’s representation using message passing. (B) How TacticAI processes a given corner kick. All four possible combinations of reflections are applied to the corner, and fed to the core TacticAI model. They interact to compute the final player representations, which can be used to predict outcomes.
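The reflection step in panel (B) can be illustrated with a short NumPy sketch. It assumes pitch-centred coordinates, so that a horizontal or vertical flip is a sign change, and it enforces identical predictions across the four reflections by simply averaging a model's outputs over them. That averaging is a simpler stand-in for the group-equivariant architecture, not TacticAI's actual network, and toy_receiver_scores is a hypothetical, deliberately asymmetric placeholder for a learned model.

```python
import numpy as np


def reflections(positions, velocities):
    """Yield the four reflected copies: original, H-flip, V-flip, HV-flip."""
    for sx in (1.0, -1.0):
        for sy in (1.0, -1.0):
            sign = np.array([sx, sy])
            yield positions * sign, velocities * sign


def toy_receiver_scores(positions, velocities):
    """Hypothetical per-player score; NOT symmetric under reflections."""
    target = np.array([40.0, 10.0])  # arbitrary point on the pitch
    return -np.linalg.norm(positions + velocities - target, axis=1)


def symmetrized_scores(model, positions, velocities):
    """Average the model over all four reflections of the setup."""
    outputs = [model(p, v) for p, v in reflections(positions, velocities)]
    return np.mean(outputs, axis=0)


# Three players: (x, y) positions and velocities in pitch-centred coordinates.
positions = np.array([[30.0, 5.0], [35.0, -8.0], [42.0, 12.0]])
velocities = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]])
h_flip = np.array([-1.0, 1.0])

plain = toy_receiver_scores(positions, velocities)
plain_flipped = toy_receiver_scores(positions * h_flip, velocities * h_flip)
sym = symmetrized_scores(toy_receiver_scores, positions, velocities)
sym_flipped = symmetrized_scores(
    toy_receiver_scores, positions * h_flip, velocities * h_flip
)
```

Here plain and plain_flipped differ, while sym and sym_flipped agree, because flipping the input only reorders the same four reflected copies before they are averaged.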
TacticAI successfully predicts corner kick play by applying a geometric deep learning approach. First, we directly model the implicit relations between players by representing corner kick setups as graphs, in which nodes represent players (with features such as position, velocity, and height) and edges represent relations between them. Then, we exploit an approximate symmetry of the football pitch. Our geometric architecture is a variant of the Group Equivariant Convolutional Network that generates all four possible reflections of a given situation (original, H-flipped, V-flipped, HV-flipped) and forces our predictions for receivers and shot attempts to be identical across all four of them. This approach reduces the search space of possible functions our neural network can represent to ones that respect the reflection symmetry, and yields more generalizable models with less training data.
Providing constructive suggestions to human experts
By harnessing its predictive and generative models, TacticAI can assist coaches by finding similar corner kicks, and testing different tactics. Traditionally, to develop tactics and counter tactics, analysts would rewatch many videos of games to look for similar examples and study rival teams. TacticAI automatically computes the numerical representations of players, which allows experts to easily and efficiently look up relevant past routines. We further validated this intuitive observation through extensive qualitative studies with football experts, who found TacticAI’s top-1 retrievals were relevant 63% of the time, nearly double the 33% benchmark seen in approaches that suggest pairs based on directly analyzing player position similarity.
TacticAI’s generative model also allows human coaches to redesign corner kick tactics to optimize probabilities of certain outcomes, such as reducing the probability of a shot attempt for a defensive setup.
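The look-up of similar past routines described above amounts to a nearest-neighbour search over numerical representations. The sketch below is an illustration under assumptions: the source does not say which similarity measure TacticAI uses, so cosine similarity is assumed, and the tiny two-dimensional vectors are made-up stand-ins for learned embeddings of whole corner kicks.

```python
import numpy as np


def top_k_similar(query, library, k=1):
    """Return indices and cosine similarities of the k closest library rows."""
    q = query / np.linalg.norm(query)
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)
    sims = lib @ q                    # cosine similarity to every stored corner
    order = np.argsort(-sims)[:k]     # highest similarity first
    return order, sims[order]


# Hypothetical embeddings of three past corner kicks and one query corner.
library = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([2.0, 0.1])

indices, scores = top_k_similar(query, library, k=2)
```

A top-1 retrieval corresponds to k=1; the expert study quoted above judged whether that single closest corner was relevant.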
TacticAI provides tactical recommendations which adjust positions of all the players on a particular team. From these proposed adjustments, coaches can identify essential patterns, as well as key players for a tactic’s success or failure, more quickly.
(A) An example of a corner kick where there was a shot attempt in reality. (B) TacticAI can generate a counterfactual setting in which the shot probability has been reduced by adjusting the positioning and velocities of the defenders. (C) The suggested defender positions result in reduced receiver probability for attacking players 2-4. (D) The model is capable of generating multiple such scenarios and coaches can inspect the different options.
In our quantitative analysis, we showed TacticAI was accurate at predicting corner kick receivers and shot situations, and that player repositioning was similar to how real plays unfolded. We also evaluated these recommendations qualitatively in a blind case study where raters did not know which tactics were from real game play and which ones were TacticAI-generated. Human football experts from Liverpool FC found that our suggestions could not be distinguished from real corners, and were favored over their original situations 90% of the time. This demonstrates TacticAI’s predictions are not only accurate, but useful and deployable.
Examples of the strategic refinements that raters preferred to original plays, where TacticAI suggested: (A) Adjustments to four players that most raters found more favorable. (B) Better covering runs by the defenders furthest away from the corner. (C) Improved covering runs for a central group of defenders in the penalty box. (D) Substantially better tracking runs for two central defenders, along with better positioning for two other defenders in the goal area.
Advancing AI for sports
TacticAI is a full AI system that could give coaches instant, extensive, and accurate tactical insights – that are also practical on the field. With TacticAI, we have developed a capable AI assistant for football tactics and achieved a milestone in developing useful assistants in sports AI. We hope future research can help develop assistants that expand to more multimodal inputs outside of player data, and help experts in more ways.
We show how AI can be used in football, but football can also teach us a lot about AI. It’s a highly dynamic and challenging game to analyze, with many human factors from physique to psychology. It’s challenging even for experts like seasoned coaches to detect all the patterns. With TacticAI, we hope to take many lessons in developing broader assistive technologies that blend human expertise and AI analysis to help people in the real world.
Learn more about TacticAI Read our paper in Nature Communications.
MuZero, AlphaZero, and AlphaDev: Optimizing computer systems

As part of our aim to build increasingly capable and general artificial intelligence (AI) systems, we’re working to create AI tools with a broader understanding of the world. This can allow useful knowledge to be transferred between many different types of tasks. Using reinforcement learning, our AI systems AlphaZero and MuZero have achieved superhuman performance playing games. Since then, we’ve expanded their capabilities to help design advanced computer chips, alongside optimizing data centers and video compression. And our specialized version of AlphaZero, called AlphaDev, has also discovered new algorithms for accelerating software at the foundations of our digital society. Early results have shown the transformative potential of more general-purpose AI tools. Here, we explain how these advances are shaping the future of computing — and already helping billions of people and the planet.
Designing more effective computer chips
Specialized hardware is essential to making sure today's AI systems are resource-efficient for people at scale. But designing and producing new computer chips can take years of work. Our researchers have developed an AI-based approach to design more powerful and efficient circuits. By treating a circuit like a neural network, we found a way to accelerate chip design and take performance to new heights.
Neural networks are often designed to take user inputs and generate outputs, like images, text, or video. Inside the neural network, edges connect to nodes in a graph-like structure. To create a circuit design, our team proposed ‘circuit neural networks’, a new type of neural network which turns edges into wires and nodes into logic gates, and learns how to connect them together.
Animated illustration of a circuit neural network learning a circuit design. It determines which edges (wires) connect to which nodes (logic gates) to improve the overall circuit design.
We optimized the learned circuit for computational speed, energy efficiency, and size, while maintaining its functionality. Using 'simulated annealing', a classical search technique that looks one step into the future, we also tested different options to find its optimal configuration. With this technique, we won the IWLS 2023 Programming Contest — with the best solution on 82% of circuit design problems in the competition. Our team also used AlphaZero, which can look many steps into the future, to improve the circuit design by treating the challenge like a game to solve. So far, our research combining circuit neural networks with the reward function of reinforcement learning has shown very promising results for building even more advanced computer chips.
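For readers unfamiliar with simulated annealing, the following is a generic, minimal Python sketch of the technique on a toy problem. It is not the team's circuit optimizer: toy_cost is a hypothetical objective over a bit string (it counts adjacent equal bits), and the cooling schedule, step count, and seed are arbitrary assumptions chosen for illustration.

```python
import math
import random


def toy_cost(bits):
    """Hypothetical objective: number of adjacent positions with equal bits."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a == b)


def simulated_annealing(cost, initial, steps=2000, t_start=2.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    state = list(initial)
    current = cost(state)
    best, best_cost = list(state), current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(len(state))
        state[i] ^= 1                      # propose a one-step change: flip a bit
        candidate = cost(state)
        delta = candidate - current
        # Always accept improvements; accept worse moves with a
        # temperature-dependent probability.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = candidate
            if current < best_cost:
                best, best_cost = list(state), current
        else:
            state[i] ^= 1                  # reject: undo the flip
    return best, best_cost


initial = [0] * 16
best, best_cost = simulated_annealing(toy_cost, initial)
```

Each iteration evaluates a single local change, which matches the "one step into the future" description above; occasionally accepting a worse configuration while the temperature is high is what lets the search leave local optima.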
Optimising data centre resources
Data centers manage everything from delivering search results to processing datasets. Like a game of multi-dimensional Tetris, a system called Borg manages and optimizes workloads within Google’s vast data centers. To schedule tasks, Borg relies on manually-coded rules. But at Google’s scale, manually-coded rules can’t cover the variety of ever-changing workload distributions, so they are designed as a one-size-fits-all compromise.
This is where machine learning technologies like AlphaZero are especially helpful: they are able to work at scale, automatically creating individual rules that are optimally tailored for the various workload distributions. During its training, AlphaZero learned to recognise patterns in tasks coming into the data centers, and also learned to predict the best ways to manage capacity and make decisions with the best long-term outcomes. When we applied AlphaZero to Borg in experimental trials, we found we could reduce the proportion of underused hardware in the data center by up to 19%.
An animated visualization of neat, optimized data storage, versus messy and unoptimized storage.
Compressing video efficiently
Video streaming makes up the majority of internet traffic. So finding ways to make streaming more efficient, however big or small, will have a huge impact on the millions of people watching videos every day. We worked with YouTube to compress and transmit video using MuZero’s problem-solving abilities. By reducing the bitrate by 4%, MuZero enhanced the overall YouTube experience, without compromising on visual quality.
We initially applied MuZero to optimize the compression of each individual video frame. Now, we’ve expanded this work to help make decisions on how frames are grouped and referenced during encoding, leading to more bitrate savings. Results from these first two steps show great promise of MuZero’s potential to become a more generalized tool, helping find optimal solutions across the entire video compression process.
A visualization demonstrating how MuZero compresses video files. It defines groups of pictures with visual similarities for compression. A single keyframe is compressed. MuZero then compresses other frames, using the keyframe as a reference. The process repeats for the rest of the video, until compression is complete.
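The keyframe-and-reference idea in the caption above can be sketched in a few lines of NumPy. This is a deliberately simplified, lossless toy under stated assumptions: real codecs add motion compensation, transforms, and quantization, and the decisions MuZero learns to make are not modelled here. The function names and the frame data are hypothetical.

```python
import numpy as np


def encode_group(frames):
    """Store the first frame whole and every other frame as a difference from it."""
    key = frames[0]
    deltas = [f.astype(np.int16) - key.astype(np.int16) for f in frames[1:]]
    return key, deltas


def decode_group(key, deltas):
    """Rebuild all frames from the keyframe and the stored differences."""
    rebuilt = [(key.astype(np.int16) + d).astype(np.uint8) for d in deltas]
    return [key] + rebuilt


# A made-up group of three visually similar 4x4 grayscale frames.
rng = np.random.default_rng(0)
key_frame = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)
frames = [key_frame, key_frame.copy(), key_frame.copy()]
frames[1][0, 0] = 255 - frames[1][0, 0]   # small change in one pixel
frames[2][3, 3] = 255 - frames[2][3, 3]

key, deltas = encode_group(frames)
decoded = decode_group(key, deltas)
nonzero_per_delta = [int(np.count_nonzero(d)) for d in deltas]
```

When frames in a group are similar, most entries of each difference are zero, which is why grouping frames well matters for the final bitrate.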
Discovering faster algorithms
AlphaDev, a version of AlphaZero, made a novel breakthrough in computer science when it discovered faster sorting and hashing algorithms. These fundamental processes are used trillions of times a day to sort, store, and retrieve data.
AlphaDev’s sorting algorithms
Sorting algorithms help digital devices process and display information, from ranking online search results and social posts, to user recommendations. AlphaDev discovered an algorithm that increases efficiency for sorting short sequences of elements by 70%, and by a smaller margin for sequences containing more than 250,000 elements, compared to the algorithms in the C++ library. That means results generated from user queries can be sorted much faster. When used at scale, this saves huge amounts of time and energy.
AlphaDev’s hashing algorithms
Hashing algorithms are often used for data storage and retrieval, like in a customer database. They typically use a key (e.g. user name “Jane Doe”) to generate a unique hash, which corresponds to the data values that need retrieving (e.g. “order number 164335-87”). Like a librarian who uses a classification system to quickly find a specific book, with a hashing system, the computer already knows what it’s looking for and where to find it. When applied to the 9-16 bytes range of hashing functions in data centers, AlphaDev’s algorithm improved the efficiency by 30%.
The impact of these algorithms
We added the sorting algorithms to the LLVM standard C++ library, replacing sub-routines that have been used for over a decade. We also contributed AlphaDev’s hashing algorithms to the abseil library. Since then, millions of developers and companies have started using them across industries as diverse as cloud computing, online shopping, and supply chain management.
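Routines for sorting short sequences are often written as a fixed series of compare-and-swap steps, known as a sorting network. The Python sketch below shows a standard three-element network for illustration only. It is not the instruction sequence AlphaDev discovered, which operates at the assembly level, and the function names are hypothetical.

```python
from itertools import permutations


def compare_exchange(values, i, j):
    """Swap values[i] and values[j] in place if they are out of order."""
    if values[i] > values[j]:
        values[i], values[j] = values[j], values[i]


def sort3(items):
    """Sort exactly three items with a fixed sequence of three comparisons."""
    v = list(items)
    compare_exchange(v, 0, 1)
    compare_exchange(v, 1, 2)   # the largest item is now in the last slot
    compare_exchange(v, 0, 1)   # order the remaining two
    return v


# Check the network against every ordering of three distinct values.
all_sorted = all(sort3(p) == [1, 2, 3] for p in permutations([1, 2, 3]))
```

Because the sequence of comparisons is fixed regardless of the input, such routines are small enough that removing even a single instruction is a measurable gain, which is the kind of saving that matters when a routine runs trillions of times a day.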
General-purpose tools to power our digital future
Our AI tools are already saving billions of people time and energy. This is just the start. We envision a future where general-purpose AI tools can help optimize the global computing ecosystem. We’re not there yet: we still need faster, more efficient, and sustainable digital infrastructure. Many more theoretical and technological breakthroughs are needed to create fully generalized AI tools. But the potential of these tools, across technology, science, and medicine, makes us excited about what's on the horizon.
Market Impact Analysis
Market Growth Trend
2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 |
---|---|---|---|---|---|---|
23.1% | 27.8% | 29.2% | 32.4% | 34.2% | 35.2% | 35.6% |
Quarterly Growth Rate
Q1 2024 | Q2 2024 | Q3 2024 | Q4 2024 |
---|---|---|---|
32.5% | 34.8% | 36.2% | 35.6% |
Market Segments and Growth Drivers
Segment | Market Share | Growth Rate |
---|---|---|
Machine Learning | 29% | 38.4% |
Computer Vision | 18% | 35.7% |
Natural Language Processing | 24% | 41.5% |
Robotics | 15% | 22.3% |
Other AI Technologies | 14% | 31.8% |
Competitive Landscape Analysis
Company | Market Share |
---|---|
Google AI | 18.3% |
Microsoft AI | 15.7% |
IBM Watson | 11.2% |
Amazon AI | 9.8% |
OpenAI | 8.4% |
Future Outlook and Predictions
The Generalist Agent Virtual landscape is evolving rapidly, driven by technological advancements, changing threat vectors, and shifting business requirements. Based on current trends and expert analyses, we can anticipate several significant developments across different time horizons:
Technology Maturity Curve
Different technologies within the ecosystem are at varying stages of maturity, influencing adoption timelines and investment priorities:
Innovation Trigger
- Generative AI for specialized domains
- Blockchain for supply chain verification
Peak of Inflated Expectations
- Digital twins for business processes
- Quantum-resistant cryptography
Trough of Disillusionment
- Consumer AR/VR applications
- General-purpose blockchain
Slope of Enlightenment
- AI-driven analytics
- Edge computing
Plateau of Productivity
- Cloud infrastructure
- Mobile applications
Expert Perspectives
Leading experts in the AI technology sector provide diverse perspectives on how the landscape will evolve over the coming years:
"The next frontier is AI systems that can reason across modalities and domains with minimal human guidance."
— AI Researcher
"Organizations that develop effective AI governance frameworks will gain competitive advantage."
— Industry Analyst
"The AI talent gap remains a critical barrier to implementation for most enterprises."
— Chief AI Officer
Areas of Expert Consensus
- Acceleration of Innovation: The pace of technological evolution will continue to increase
- Practical Integration: Focus will shift from proof-of-concept to operational deployment
- Human-Technology Partnership: Most effective implementations will optimize human-machine collaboration
- Regulatory Influence: Regulatory frameworks will increasingly shape technology development
Short-Term Outlook (1-2 Years)
In the immediate future, organizations will focus on implementing and optimizing currently available technologies to address pressing AI challenges:
- Improved generative models
- specialized AI applications
- enhanced AI ethics frameworks
These developments will be characterized by incremental improvements to existing frameworks rather than revolutionary changes, with emphasis on practical deployment and measurable outcomes.
Mid-Term Outlook (3-5 Years)
As technologies mature and organizations adapt, more substantial transformations will emerge in how security is approached and implemented:
- AI-human collaboration systems
- multimodal AI platforms
- democratized AI development
This period will see significant changes in security architecture and operational models, with increasing automation and integration between previously siloed security functions. Organizations will shift from reactive to proactive security postures.
Long-Term Outlook (5+ Years)
Looking further ahead, more fundamental shifts will reshape how cybersecurity is conceptualized and implemented across digital ecosystems:
- General AI capabilities
- AI-driven scientific breakthroughs
- new computing paradigms
These long-term developments will likely require significant technical breakthroughs, new regulatory frameworks, and evolution in how organizations approach security as a fundamental business function rather than a technical discipline.
Alternative Future Scenarios
The evolution of technology can follow different paths depending on various factors including regulatory developments, investment trends, technological breakthroughs, and market adoption. We analyze three potential scenarios:
Optimistic Scenario
Responsible AI driving innovation while minimizing societal disruption
Key Drivers: Supportive regulatory environment, significant research breakthroughs, strong market incentives, and rapid user adoption.
Probability: 25-30%
Base Case Scenario
Incremental adoption with mixed societal impacts and ongoing ethical challenges
Key Drivers: Balanced regulatory approach, steady technological progress, and selective implementation based on clear ROI.
Probability: 50-60%
Conservative Scenario
Technical and ethical barriers creating significant implementation challenges
Key Drivers: Restrictive regulations, technical limitations, implementation challenges, and risk-averse organizational cultures.
Probability: 15-20%
Scenario Comparison Matrix
Factor | Optimistic | Base Case | Conservative |
---|---|---|---|
Implementation Timeline | Accelerated | Steady | Delayed |
Market Adoption | Widespread | Selective | Limited |
Technology Evolution | Rapid | Progressive | Incremental |
Regulatory Environment | Supportive | Balanced | Restrictive |
Business Impact | Transformative | Significant | Modest |
Transformational Impact
Redefinition of knowledge work, automation of creative processes. This evolution will necessitate significant changes in organizational structures, talent development, and strategic planning processes.
The convergence of multiple technological trends—including artificial intelligence, quantum computing, and ubiquitous connectivity—will create both unprecedented security challenges and innovative defensive capabilities.
Implementation Challenges
Ethical concerns, computing resource limitations, talent shortages. Organizations will need to develop comprehensive change management strategies to successfully navigate these transitions.
Regulatory uncertainty, particularly around emerging technologies like AI in security applications, will require flexible security architectures that can adapt to evolving compliance requirements.
Key Innovations to Watch
Multimodal learning, resource-efficient AI, transparent decision systems. Organizations should monitor these developments closely to maintain competitive advantages and effective security postures.
Strategic investments in research partnerships, technology pilots, and talent development will position forward-thinking organizations to leverage these innovations early in their development cycle.