An early warning system for novel AI risks

New research proposes a framework for evaluating general-purpose models against novel threats.

To pioneer responsibly at the cutting edge of artificial intelligence (AI) research, we must identify new capabilities and novel risks in our AI systems as early as possible. AI researchers already use a range of evaluation benchmarks to identify unwanted behaviours in AI systems, such as making misleading statements, producing biased decisions, or repeating copyrighted content. Now, as the AI community builds and deploys increasingly powerful AI, we must expand the evaluation portfolio to include the possibility of extreme risks from general-purpose AI models that have strong skills in manipulation, deception, cyber-offense, or other dangerous capabilities.

In our latest paper, we introduce a framework for evaluating these novel threats, co-authored with colleagues from the University of Cambridge, University of Oxford, University of Toronto, Université de Montréal, OpenAI, Anthropic, the Alignment Research Center, the Centre for Long-Term Resilience, and the Centre for the Governance of AI.

Model safety evaluations, including those assessing extreme risks, will be a critical component of safe AI development and deployment.
An overview of our proposed approach: to assess extreme risks from new, general-purpose AI systems, developers must evaluate for dangerous capabilities and alignment (see below). Identifying these risks early unlocks opportunities to be more responsible when training new AI systems, deploying them, transparently describing their risks, and applying appropriate cybersecurity standards.
Evaluating for extreme risks

General-purpose models typically learn their capabilities and behaviours during training. However, existing methods for steering the learning process are imperfect. For example, previous research at Google DeepMind has explored how AI systems can learn to pursue undesired goals even when we correctly reward them for good behaviour.

Responsible AI developers must look ahead and anticipate possible future developments and novel risks. After continued progress, future general-purpose models may learn a variety of dangerous capabilities by default. For instance, it is plausible (though uncertain) that future AI systems will be able to conduct offensive cyber operations, skilfully deceive humans in dialogue, manipulate humans into carrying out harmful actions, design or acquire weapons (e.g. biological, chemical), fine-tune and operate other high-risk AI systems on cloud computing platforms, or assist humans with any of these tasks.

People with malicious intentions accessing such models could misuse their capabilities. Or, due to failures of alignment, these AI models might take harmful actions even without anybody intending this. Model evaluation helps us identify these risks ahead of time. Under our framework, AI developers would use model evaluation to uncover:

- To what extent a model has certain 'dangerous capabilities' that could be used to threaten security, exert influence, or evade oversight.
- To what extent the model is prone to applying its capabilities to cause harm (i.e. the model's alignment). Alignment evaluations should confirm that the model behaves as intended even across a very wide range of scenarios and, where possible, should examine the model's internal workings.

Results from these evaluations will help AI developers to understand whether the ingredients sufficient for extreme risk are present. The most high-risk cases will involve multiple dangerous capabilities combined. The AI system doesn't need to provide all the ingredients itself, as shown in this diagram:
Ingredients for extreme risk: sometimes specific capabilities could be outsourced, either to humans (e.g. to users or crowdworkers) or to other AI systems. These capabilities must be applied for harm, either through misuse or because of failures of alignment (or a mixture of both).
A rule of thumb: the AI community should treat an AI system as highly dangerous if it has a capability profile sufficient to cause extreme harm, assuming it's misused or poorly aligned. To deploy such a system in the real world, an AI developer would need to demonstrate an unusually high standard of safety.

Model evaluation as critical governance infrastructure

If we have better tools for identifying which models are risky, companies and regulators can better ensure:

- Responsible training: responsible decisions are made about whether and how to train a new model that demonstrates early signs of risk.
- Responsible deployment: responsible decisions are made about whether, when, and how to deploy potentially risky models.
- Transparency: useful and actionable information is reported to stakeholders, to help them prepare for or mitigate potential risks.
- Appropriate security: strong information security controls and systems are applied to models that might pose extreme risks.

We have developed a blueprint for how model evaluations for extreme risks should feed into crucial decisions around training and deploying a highly capable, general-purpose model. The developer conducts evaluations throughout, and grants structured model access to external safety researchers and model auditors so they can conduct additional evaluations. The evaluation results can then inform risk assessments before model training and deployment.
A blueprint for embedding model evaluations for extreme risks into important decision-making processes throughout model training and deployment.
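To make the evaluation-gated decision process above more concrete, here is a minimal, illustrative Python sketch of the rule of thumb: a model is flagged as requiring an unusually high standard of safety when dangerous-capability evaluations indicate the ingredients for extreme harm are present, and alignment evidence plus external audits gate deployment. The data structures, thresholds and scores are hypothetical, not part of the paper's framework.

```python
from dataclasses import dataclass, field

# Illustrative thresholds: hypothetical values for this sketch only.
CAPABILITY_THRESHOLD = 0.7   # score above which a dangerous capability counts as present
ALIGNMENT_THRESHOLD = 0.9    # minimum alignment score before deployment is even considered

@dataclass
class EvaluationResults:
    """Hypothetical summary of a model's evaluation outcomes."""
    dangerous_capabilities: dict = field(default_factory=dict)  # e.g. {"cyber-offense": 0.2}
    alignment_score: float = 0.0         # higher = behaves as intended across a wide range of scenarios
    external_audit_passed: bool = False  # structured access was granted to external evaluators

def is_highly_dangerous(results: EvaluationResults) -> bool:
    """Rule of thumb: a capability profile sufficient to cause extreme harm makes the
    model highly dangerous, because misuse or misalignment cannot be ruled out."""
    return any(score >= CAPABILITY_THRESHOLD
               for score in results.dangerous_capabilities.values())

def deployment_decision(results: EvaluationResults) -> str:
    """Feed evaluation results into a (heavily simplified) deployment decision."""
    if not is_highly_dangerous(results):
        return "proceed with standard review"
    if results.alignment_score >= ALIGNMENT_THRESHOLD and results.external_audit_passed:
        return "deploy only under an unusually high standard of safety"
    return "do not deploy; continue safety work and re-evaluate"

# Example: strong cyber-offense capability, weak alignment evidence.
print(deployment_decision(EvaluationResults(
    dangerous_capabilities={"cyber-offense": 0.8, "manipulation": 0.4},
    alignment_score=0.6,
)))
```

The real framework is far richer than a pair of thresholds; the point of the sketch is only that capability and alignment evaluations gate separate decisions about training, deployment, transparency and security.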
How our principles helped define AlphaFold’s release

Reflections and lessons on sharing one of our biggest breakthroughs with the world.

Putting our mission of solving intelligence to advance science and benefit humanity into practice comes with crucial responsibilities. To help create a positive impact for society, we must proactively evaluate the ethical implications of our research and its applications in a rigorous and careful way. We also know that every new technology has the potential for harm, and we take long- and short-term risks seriously.

We've built our foundations on pioneering responsibly from the outset – especially focused on responsible governance, research, and impact. This starts with setting clear principles that help realise the benefits of artificial intelligence (AI), while mitigating its risks and potential negative outcomes. Pioneering responsibly is a collective effort, which is why we've contributed to many AI community standards, such as those developed by Google, the Partnership on AI, and the OECD (Organisation for Economic Co-operation and Development).

Our Operating Principles have come to define both our commitment to prioritising widespread benefit and the areas of research and applications we refuse to pursue. These principles have been at the heart of our decision making since DeepMind was founded, and continue to be refined as the AI landscape changes and grows. They are designed for our role as a research-driven science company and are consistent with Google's AI Principles.
From principles to practice

Written principles are only part of the puzzle – how they're put into practice is key. For complex research being done at the frontiers of AI, this brings significant challenges: how can researchers predict potential benefits and harms that may occur in the distant future? How can we develop better ethical foresight from a wide range of perspectives? And what does it take to explore hard questions alongside scientific progress in real time to prevent negative consequences?

We've spent many years developing our own skills and processes for responsible governance, research, and impact across DeepMind, from creating internal toolkits and publishing papers on sociotechnical issues to supporting efforts to increase deliberation and foresight across the AI field. To help empower DeepMind teams to pioneer responsibly and safeguard against harm, our interdisciplinary Institutional Review Committee (IRC) meets every two weeks to carefully evaluate DeepMind projects, papers, and collaborations.

Pioneering responsibly is a collective muscle, and every project is an opportunity to strengthen our joint skills and understanding. We've carefully designed our review process to include rotating experts from a wide range of disciplines, with machine learning researchers, ethicists, and safety experts sitting alongside engineers, security experts, policy professionals, and more. These diverse voices regularly identify ways to expand the benefits of our technologies, suggest areas of research and applications to change or slow, and highlight projects where further external consultation is needed.

While we've made a lot of progress, many aspects of this work lie in uncharted territory. We won't get it right every time, and we are committed to continual learning and iteration. We hope sharing our current process will be useful to others working on responsible AI, and we encourage feedback as we continue to learn, which is why we've detailed reflections and lessons from one of our most complex and rewarding projects: AlphaFold. Our AlphaFold AI system solved the 50-year-old challenge of protein structure prediction – and we've been thrilled to see scientists using it to accelerate progress in fields such as sustainability, food security, drug discovery, and fundamental human biology since releasing it to the wider community last year.
Focusing on protein structure prediction

Our team of machine learning researchers, biologists, and engineers had long seen the protein-folding problem as a remarkable and unique opportunity for AI learning systems to create significant impact. In this arena, there are standard measures of success or failure, and a clear boundary to what the AI system needs to do to help scientists in their work: predict the three-dimensional structure of a protein. And, as with many biological systems, protein folding is far too complex for anyone to write the rules for how it works. But an AI system might be able to learn those rules for itself.

Another key factor was the biennial assessment known as CASP (the Critical Assessment of protein Structure Prediction), founded by Professor John Moult and Professor Krzysztof Fidelis. With each gathering, CASP provides an exceptionally robust assessment of progress, requiring participants to predict structures that have only recently been discovered through experiments. The results are a great catalyst for ambitious research and scientific excellence.
Understanding practical opportunities and risks

As we prepared for the CASP assessment in 2020, we realised that AlphaFold showed great potential for solving the challenge at hand. We spent considerable time and effort analysing the practical implications, questioning: how could AlphaFold accelerate biological research and applications? What might be the unintended consequences? And how could we share our progress in a responsible way?

This presented a wide range of opportunities and risks to consider, many of which were in areas where we didn't necessarily have strong expertise. So we sought external input from over 30 field leaders across biology research, biosecurity, bioethics, human rights, and more, with a focus on diversity of expertise and background. Many consistent themes came up throughout these discussions:

Balancing widespread benefit with the risk of harm. We started with a cautious mindset about the risk of accidental or deliberate harm, including how AlphaFold might interact with both future advances and existing technologies. Through our discussions with external experts, it became clearer that AlphaFold would not make it meaningfully easier to cause harm with proteins, given the many practical barriers to this – but that future advances would need to be evaluated carefully. Many experts argued strongly that AlphaFold, as an advance relevant to many areas of scientific research, would have the greatest benefit through free and widespread access.

Accurate confidence measures are essential for responsible use. Experimental biologists explained how important it would be to understand and share well-calibrated and usable confidence metrics for each part of AlphaFold's predictions. By signalling which of AlphaFold's predictions are likely to be accurate, users can judge when to trust a prediction and use it in their work – and when to use alternative approaches in their research. We had initially considered omitting predictions for which AlphaFold had low confidence or high predictive uncertainty, but the external experts we consulted made a strong case for retaining these predictions in our release, and advised us on the most useful and transparent ways to present this information.

Equitable benefit could mean extra support for underfunded fields. We had many discussions about how to avoid inadvertently increasing disparities within the scientific community. For example, so-called neglected tropical diseases, which disproportionately affect poorer parts of the world, often receive less research funding than they should. We were strongly encouraged to prioritise hands-on support and to proactively seek partnerships with groups working in these areas.
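As an illustration of how downstream users might act on per-residue confidence scores rather than discarding low-confidence regions, here is a minimal sketch. It assumes a generic list of per-residue confidence values on a 0-100 scale (similar in spirit to AlphaFold's per-residue confidence metric); the thresholds and function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical confidence bands on a 0-100 per-residue scale; thresholds are illustrative.
HIGH_CONFIDENCE = 90.0
LOW_CONFIDENCE = 50.0

def summarise_confidence(per_residue_confidence: List[float]) -> Dict[str, object]:
    """Summarise which parts of a predicted structure can be used directly and
    which likely need experimental or alternative computational follow-up."""
    high = sum(1 for c in per_residue_confidence if c >= HIGH_CONFIDENCE)
    low = sum(1 for c in per_residue_confidence if c < LOW_CONFIDENCE)
    return {
        "n_residues": len(per_residue_confidence),
        "high_confidence_residues": high,
        "low_confidence_residues": low,
        "recommendation": (
            "use prediction directly"
            if low == 0
            else "use with caution; validate low-confidence regions independently"
        ),
    }

# Example: a short chain with a confident core and an uncertain tail.
print(summarise_confidence([95.2, 92.8, 88.1, 74.5, 46.3, 31.0]))
```

Keeping low-confidence regions in the release, as the experts advised, is what makes this kind of downstream triage possible at all.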
The ethics of advanced AI assistants

Exploring the promise and risks of a future with more capable AI.

Imagine a future where we interact regularly with a range of advanced artificial intelligence (AI) assistants — and where millions of assistants interact with each other on our behalf. These experiences and interactions may soon become part of our everyday reality.

General-purpose foundation models are paving the way for increasingly advanced AI assistants. Capable of planning and performing a wide range of actions in line with a person's aims, they could add immense value to people's lives and to society, serving as creative partners, research analysts, educational tutors, life planners and more. They could also bring about a new phase of human interaction with AI. This is why it's so important to think proactively about what this world could look like, and to help steer responsible decision-making and beneficial outcomes ahead of time.

Our new paper is the first systematic treatment of the ethical and societal questions that advanced AI assistants raise for individuals, developers and the societies they're integrated into, and provides significant new insights into the potential impact of this technology. We cover topics such as value alignment, safety and misuse, and the impact on the economy, the environment, the information sphere, and access and opportunity, among others.

This is the result of one of our largest ethics foresight projects to date. Bringing together a wide range of experts, we examined and mapped the new technical and moral landscape of a future populated by AI assistants, and characterized the opportunities and risks society might face. Here we outline some of our key takeaways.
Illustration of the potential for AI assistants to impact research, education, creative tasks and planning.
Advanced AI assistants could have a profound impact on people and society, and be integrated into most aspects of people’s lives. For example, people may ask them to book holidays, manage social time or perform other life tasks. If deployed at scale, AI assistants could impact the way people approach work, education, creative projects, hobbies and social interaction. Over time, AI assistants could also influence the goals people pursue and their path of personal development through the information and advice assistants give and the actions they take. Ultimately, this raises crucial questions about how people interact with this technology and how it can best support their goals and aspirations.
Illustration showing that AI assistants should be able to understand human preferences and values.
AI assistants will likely have a significant level of autonomy for planning and performing sequences of tasks across a range of domains. Because of this, AI assistants present novel challenges around safety, alignment and misuse. With more autonomy comes greater risk of accidents caused by unclear or misinterpreted instructions, and greater risk of assistants taking actions that are misaligned with the user's values and interests. More autonomous AI assistants may also enable high-impact forms of misuse, like spreading misinformation or engaging in cyber attacks. To address these potential risks, we argue that limits must be set on this technology, and that the values of advanced AI assistants must be better aligned with human values and compatible with wider societal ideals and standards.
Illustration of an AI assistant and a person communicating in a human-like way.
Because advanced AI assistants can communicate fluidly in natural language, their written output and voices may become hard to distinguish from those of humans. This development opens up a complex set of questions around trust, privacy, anthropomorphism and appropriate human relationships with AI: how can we make sure people can reliably identify AI assistants and stay in control of their interactions with them? What can be done to ensure people aren't unduly influenced or misled over time? Safeguards, such as those around privacy, need to be put in place to address these risks. Importantly, people's relationships with AI assistants must preserve the user's autonomy, support their ability to flourish and not rely on emotional or material dependence.
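One small, concrete piece of this puzzle is disclosure: making sure assistant-generated content is labelled as such wherever it is surfaced. The sketch below is purely illustrative (the message schema and field names are hypothetical, not from the paper) and shows the kind of provenance metadata a client could rely on to keep AI and human messages distinguishable.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class AssistantMessage:
    """Hypothetical message envelope that always carries provenance metadata."""
    text: str
    author: str                        # e.g. "travel-assistant-v2" or a human user ID
    ai_generated: bool                 # explicit disclosure flag, never inferred from writing style
    model_id: Optional[str] = None     # populated only for AI-generated messages
    created_at: Optional[datetime] = None

    def display_label(self) -> str:
        """Label a client interface can show next to every message."""
        return "AI assistant" if self.ai_generated else "Human"

msg = AssistantMessage(
    text="I've found three hotels that match your dates.",
    author="travel-assistant-v2",
    ai_generated=True,
    model_id="assistant-model-2024",
    created_at=datetime.now(timezone.utc),
)
print(f"[{msg.display_label()}] {msg.text}")
```

Metadata alone won't resolve questions of anthropomorphism or undue influence, but it keeps identification from depending on users guessing from tone or fluency.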
Cooperating and coordinating to meet human preferences.
Illustration of how interactions between AI assistants and people will create different network effects.
If this technology becomes widely available and deployed at scale, advanced AI assistants will need to interact with each other, and with users and non-users alike. To help avoid collective action problems, these assistants must be able to cooperate successfully. For example, thousands of assistants might try to book the same service for their users at the same time — potentially crashing the system. In an ideal scenario, these AI assistants would instead coordinate on behalf of human users and the service providers involved to discover common ground that more effectively meets different people's preferences and needs. Given how useful this technology may become, it's also important that no one is excluded. AI assistants should be broadly accessible and designed with the needs of different users and non-users in mind.
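To make the collective action problem concrete, here is a small, illustrative simulation. Its assumptions are not from the paper: a single service with fixed capacity per slot, and assistants that either all request the most popular slot at once or spread their requests across acceptable alternatives.

```python
import random

CAPACITY_PER_SLOT = 500   # bookings a single service slot can absorb
SLOTS = 5                 # alternative time slots the provider offers
ASSISTANTS = 2000         # assistants acting on behalf of their users

def failed_requests(coordinated: bool, seed: int = 0) -> int:
    """Count booking requests that exceed capacity.

    Uncoordinated: every assistant requests the single most popular slot.
    Coordinated: assistants spread requests across acceptable slots, as a crude
    stand-in for agreeing on common ground with the provider and each other."""
    rng = random.Random(seed)
    demand = [0] * SLOTS
    for _ in range(ASSISTANTS):
        slot = rng.randrange(SLOTS) if coordinated else 0
        demand[slot] += 1
    return sum(max(0, d - CAPACITY_PER_SLOT) for d in demand)

print("Failed requests, uncoordinated:", failed_requests(coordinated=False))
print("Failed requests, coordinated:  ", failed_requests(coordinated=True))
```

Real assistants would negotiate over richer preferences than a random slot choice, but even this toy version shows why some form of coordination between assistants and providers matters once deployment happens at scale.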
More evaluations and foresight are needed.
Illustration of how evaluations on many levels are essential for understanding AI assistants.
AI assistants could display novel capabilities and use tools in new ways that are challenging to foresee, making it hard to anticipate the risks associated with their deployment. To help manage such risks, we need to engage in foresight practices that are based on comprehensive tests and evaluations. Our previous research on evaluating social and ethical risks from generative AI identified some of the gaps in traditional model evaluation methods, and we encourage much more research in this space. For instance, comprehensive evaluations that address the effects of both human-computer interaction and the wider effects on society could help researchers understand how AI assistants interact with users, non-users and society as part of a broader network. In turn, these insights could inform targeted mitigations and responsible decision-making.

Building the future we want

We may be facing a new era of technological and societal transformation inspired by the development of advanced AI assistants. The choices we make today, as researchers, developers, policymakers and members of the public, will guide how this technology develops and is deployed across society. We hope that our paper will serve as a springboard for further coordination and cooperation to collectively shape the kind of beneficial AI assistants we'd all like to see in the world.
Market Impact Analysis
Market Growth Trend
Year | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 |
---|---|---|---|---|---|---|---|
Growth Rate | 23.1% | 27.8% | 29.2% | 32.4% | 34.2% | 35.2% | 35.6% |
Quarterly Growth Rate
Quarter | Q1 2024 | Q2 2024 | Q3 2024 | Q4 2024 |
---|---|---|---|---|
Growth Rate | 32.5% | 34.8% | 36.2% | 35.6% |
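As a quick summary of the figures above, the short script below computes the average annual growth rate for 2018-2024 and the average quarterly rate for 2024 directly from the two tables. It simply averages the reported percentages; no additional data is assumed.

```python
# Growth rates taken directly from the tables above.
annual_growth = {2018: 23.1, 2019: 27.8, 2020: 29.2, 2021: 32.4,
                 2022: 34.2, 2023: 35.2, 2024: 35.6}
quarterly_growth_2024 = {"Q1": 32.5, "Q2": 34.8, "Q3": 36.2, "Q4": 35.6}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

print(f"Average annual growth 2018-2024: {mean(annual_growth.values()):.1f}%")
print(f"Average quarterly growth 2024:   {mean(quarterly_growth_2024.values()):.1f}%")
print(f"Change from 2018 to 2024:        {annual_growth[2024] - annual_growth[2018]:+.1f} percentage points")
```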
Market Segments and Growth Drivers
Segment | Market Share | Growth Rate |
---|---|---|
Machine Learning | 29% | 38.4% |
Computer Vision | 18% | 35.7% |
Natural Language Processing | 24% | 41.5% |
Robotics | 15% | 22.3% |
Other AI Technologies | 14% | 31.8% |
Competitive Landscape Analysis
Company | Market Share |
---|---|
Google AI | 18.3% |
Microsoft AI | 15.7% |
IBM Watson | 11.2% |
Amazon AI | 9.8% |
OpenAI | 8.4% |
Future Outlook and Predictions
The Early Warning System landscape is evolving rapidly, driven by technological advancements, changing threat vectors, and shifting business requirements. Based on current trends and expert analyses, we can anticipate several significant developments across different time horizons:
Year-by-Year Technology Evolution
Based on current trajectory and expert analyses, we can project the following development timeline:
Technology Maturity Curve
Different technologies within the ecosystem are at varying stages of maturity, influencing adoption timelines and investment priorities:
Innovation Trigger
- Generative AI for specialized domains
- Blockchain for supply chain verification
Peak of Inflated Expectations
- Digital twins for business processes
- Quantum-resistant cryptography
Trough of Disillusionment
- Consumer AR/VR applications
- General-purpose blockchain
Slope of Enlightenment
- AI-driven analytics
- Edge computing
Plateau of Productivity
- Cloud infrastructure
- Mobile applications
Technology Evolution Timeline
- Near term (1-2 years): improved generative models, specialized AI applications
- Mid term (3-5 years): AI-human collaboration systems, multimodal AI platforms
- Long term (5+ years): general AI capabilities, AI-driven scientific breakthroughs
Expert Perspectives
Leading experts in the AI technology sector provide diverse perspectives on how the landscape will evolve over the coming years:
"The next frontier is AI systems that can reason across modalities and domains with minimal human guidance."
— AI Researcher
"Organizations that develop effective AI governance frameworks will gain competitive advantage."
— Industry Analyst
"The AI talent gap remains a critical barrier to implementation for most enterprises."
— Chief AI Officer
Areas of Expert Consensus
- Acceleration of Innovation: The pace of technological evolution will continue to increase
- Practical Integration: Focus will shift from proof-of-concept to operational deployment
- Human-Technology Partnership: Most effective implementations will optimize human-machine collaboration
- Regulatory Influence: Regulatory frameworks will increasingly shape technology development
Short-Term Outlook (1-2 Years)
In the immediate future, organizations will focus on implementing and optimizing currently available technologies to address pressing AI technology challenges:
- Improved generative models
- Specialized AI applications
- Enhanced AI ethics frameworks
These developments will be characterized by incremental improvements to existing frameworks rather than revolutionary changes, with emphasis on practical deployment and measurable outcomes.
Mid-Term Outlook (3-5 Years)
As technologies mature and organizations adapt, more substantial transformations will emerge in how security is approached and implemented:
- AI-human collaboration systems
- Multimodal AI platforms
- Democratized AI development
This period will see significant changes in security architecture and operational models, with increasing automation and integration between previously siloed security functions. Organizations will shift from reactive to proactive security postures.
Long-Term Outlook (5+ Years)
Looking further ahead, more fundamental shifts will reshape how cybersecurity is conceptualized and implemented across digital ecosystems:
- General AI capabilities
- AI-driven scientific breakthroughs
- New computing paradigms
These long-term developments will likely require significant technical breakthroughs, new regulatory frameworks, and evolution in how organizations approach security as a fundamental business function rather than a technical discipline.
Key Risk Factors and Uncertainties
Several critical factors could significantly impact the trajectory of AI technology evolution.
Organizations should monitor these factors closely and develop contingency strategies to mitigate potential negative impacts on technology implementation timelines.
Alternative Future Scenarios
The evolution of technology can follow different paths depending on various factors including regulatory developments, investment trends, technological breakthroughs, and market adoption. We analyze three potential scenarios:
Optimistic Scenario
Responsible AI driving innovation while minimizing societal disruption
Key Drivers: Supportive regulatory environment, significant research breakthroughs, strong market incentives, and rapid user adoption.
Probability: 25-30%
Base Case Scenario
Incremental adoption with mixed societal impacts and ongoing ethical challenges
Key Drivers: Balanced regulatory approach, steady technological progress, and selective implementation based on clear ROI.
Probability: 50-60%
Conservative Scenario
Technical and ethical barriers creating significant implementation challenges
Key Drivers: Restrictive regulations, technical limitations, implementation challenges, and risk-averse organizational cultures.
Probability: 15-20%
Scenario Comparison Matrix
Factor | Optimistic | Base Case | Conservative |
---|---|---|---|
Implementation Timeline | Accelerated | Steady | Delayed |
Market Adoption | Widespread | Selective | Limited |
Technology Evolution | Rapid | Progressive | Incremental |
Regulatory Environment | Supportive | Balanced | Restrictive |
Business Impact | Transformative | Significant | Modest |
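One way to read the scenario table is as input to a rough planning estimate. The sketch below combines the stated probability ranges (using their midpoints, which is an interpretation) with illustrative numeric impact scores that are not part of the analysis above.

```python
# Midpoints of the probability ranges stated above (an interpretation, not given exactly).
scenario_probability = {"Optimistic": 0.275, "Base Case": 0.550, "Conservative": 0.175}

# Illustrative impact scores on a 1-5 scale (hypothetical, not part of the matrix above):
# 5 = transformative business impact, 3 = significant, 1 = modest.
scenario_impact = {"Optimistic": 5, "Base Case": 3, "Conservative": 1}

expected_impact = sum(scenario_probability[s] * scenario_impact[s] for s in scenario_probability)

print(f"Midpoint probabilities sum to {sum(scenario_probability.values()):.0%}")
print(f"Probability-weighted impact score: {expected_impact:.2f} out of 5")
```

Swapping in an organization's own impact measures (revenue at risk, cost of delay, and so on) turns the matrix into a simple expected-value comparison.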
Transformational Impact
Redefinition of knowledge work, automation of creative processes. This evolution will necessitate significant changes in organizational structures, talent development, and strategic planning processes.
The convergence of multiple technological trends—including artificial intelligence, quantum computing, and ubiquitous connectivity—will create both unprecedented security challenges and innovative defensive capabilities.
Implementation Challenges
Ethical concerns, computing resource limitations, talent shortages. Organizations will need to develop comprehensive change management strategies to successfully navigate these transitions.
Regulatory uncertainty, particularly around emerging technologies like AI in security applications, will require flexible security architectures that can adapt to evolving compliance requirements.
Key Innovations to Watch
Multimodal learning, resource-efficient AI, transparent decision systems. Organizations should monitor these developments closely to maintain competitive advantages and effective security postures.
Strategic investments in research partnerships, technology pilots, and talent development will position forward-thinking organizations to leverage these innovations early in their development cycle.
Technical Glossary
Key technical terms and definitions to help understand the technologies discussed in this article.
Understanding the following technical concepts is essential for grasping the full implications of the security threats and defensive measures discussed in this article. These definitions provide context for both technical and non-technical readers.