RoboCat: A self-improving robotic agent

New foundation agent learns to operate different robotic arms, solves tasks from as few as 100 demonstrations, and improves from self-generated data.

Robots are quickly becoming part of our everyday lives, but they’re often only programmed to perform specific tasks well. While harnessing recent advances in AI could lead to robots that could help in many more ways, progress in building general-purpose robots is slower in part because of the time needed to collect real-world training data.

Our latest paper introduces a self-improving AI agent for robotics, RoboCat, that learns to perform a variety of tasks across different arms, and then self-generates new training data to improve its technique.

Previous research has explored how to develop robots that can learn to multi-task at scale and combine the understanding of language models with the real-world capabilities of a helper robot. RoboCat is the first agent to solve and adapt to multiple tasks and do so across different, real robots.

RoboCat learns much faster than other state-of-the-art models. It can pick up a new task with as few as 100 demonstrations because it draws from a large and diverse dataset. This capability will help accelerate robotics research, as it reduces the need for human-supervised training, and is an essential step towards creating a general-purpose robot.
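
The post describes this loop at a high level: fine-tune the generalist on a handful of demonstrations, let the resulting agent practise the task, and fold its successful attempts back into the training data. Below is a minimal Python sketch of that cycle; every class, function, and number is an illustrative stand-in, not the actual RoboCat implementation.

```python
# Minimal sketch of a RoboCat-style self-improvement cycle.
# Every class and number here is an illustrative stand-in,
# not the actual RoboCat implementation.
import random
from dataclasses import dataclass, field

@dataclass
class Episode:
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    success: bool = False

class ToyPolicy:
    """Stand-in for a large generalist agent; 'skill' is a scalar
    that fine-tuning nudges upward with diminishing returns."""
    def __init__(self):
        self.skill = 0.1

    def finetune(self, dataset):
        self.skill = min(0.95, self.skill + 0.005 * len(dataset) ** 0.5)

class ToyEnv:
    """Stand-in environment: rollout success probability equals skill."""
    def rollout(self, policy):
        return Episode(success=random.random() < policy.skill)

def self_improve(policy, demos, env, rounds=3, rollouts_per_round=100):
    dataset = list(demos)                         # start from ~100 demos
    for _ in range(rounds):
        policy.finetune(dataset)                  # adapt to the new task
        attempts = [env.rollout(policy) for _ in range(rollouts_per_round)]
        dataset += [ep for ep in attempts if ep.success]  # keep successes
    policy.finetune(dataset)                      # final pass on grown data
    return policy

if __name__ == "__main__":
    demos = [Episode(success=True) for _ in range(100)]
    agent = self_improve(ToyPolicy(), demos, ToyEnv())
    print(f"final success estimate: {agent.skill:.2f}")
```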

"The Thinking Part" by Daniel Warfield using MidJourney. All images by the author unless otherwise specified. Article originally made available on Int...

Responsibility & Safety The ethics of advanced AI assistants Share.

Exploring the promise and risks of a future with more capable A...

Today, we’re releasing two updated production-ready Gemini models: [website] and [website] along with: >50% reduced price on [website] P...

Scaling up learning across many different robot types

Together with partners from 33 academic labs, we have pooled data from 22 different robot types to create the Open X-Embodiment dataset and RT-X model.

Robots are great specialists, but poor generalists. Typically, you have to train a model for each task, robot, and environment, and changing a single variable often requires starting from scratch. But what if we could combine the knowledge across robotics and create a way to train a general-purpose robot?

Today, we are launching a new set of resources for general-purpose robotics learning across different robot types, or embodiments. Together with partners from 33 academic labs, we have pooled data from 22 different robot types to create the Open X-Embodiment dataset. We are also releasing RT-1-X, a robotics transformer (RT) model derived from RT-1 and trained on our dataset, which demonstrates skill transfer across many robot embodiments.

In this work, we show that training a single model on data from multiple embodiments leads to significantly better performance across many robots than training on data from individual embodiments. We tested our RT-1-X model in five different research labs, demonstrating a 50% higher success rate on average across five commonly used robots compared with methods developed independently for each robot. We also showed that training our vision-language-action model, RT-2, on data from multiple embodiments tripled its performance on real-world robotic skills.

We developed these tools to collectively advance cross-embodiment research in the robotics community. The Open X-Embodiment dataset and the RT-1-X model checkpoint are now available to the broader research community, thanks to the robotics labs around the world that shared data and helped evaluate our model in a commitment to developing this technology openly and responsibly. We believe these tools will transform the way robots are trained and accelerate this field of research.
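
To make the idea of cross-embodiment training concrete, here is a hedged Python sketch that pools toy episode datasets from several robot types into one weighted stream feeding a single shared policy. The dataset names, size-proportional weighting, and placeholder training step are assumptions for illustration only, not the actual RT-1-X pipeline.

```python
# Hedged sketch of cross-embodiment training: pool episodes from many
# robot types into one weighted stream and train a single shared policy.
# Dataset names, weights, and the training step are placeholders, not
# the actual RT-1-X pipeline.
import itertools
import random

def interleave(datasets, weights):
    """Yield episodes sampled from per-embodiment datasets in
    proportion to the given weights."""
    names = list(datasets)
    w = [weights[n] for n in names]
    while True:
        name = random.choices(names, weights=w)[0]
        yield name, random.choice(datasets[name])

# Toy per-embodiment datasets: each episode is a list of
# (observation, action) pairs.
datasets = {
    "franka_arm":  [[("img_f", "grasp")] for _ in range(1000)],
    "kuka_arm":    [[("img_k", "push")] for _ in range(500)],
    "mobile_base": [[("img_m", "navigate")] for _ in range(200)],
}
# One simple mixture choice: weight each dataset by its size.
weights = {name: len(eps) for name, eps in datasets.items()}

def train_step(policy_state, episode):
    # Placeholder for one gradient step of a shared transformer policy.
    return policy_state + 1

policy_state = 0
for name, episode in itertools.islice(interleave(datasets, weights), 5000):
    policy_state = train_step(policy_state, episode)
print(f"took {policy_state} steps on mixed-embodiment episodes")
```

Weighting each dataset by its size is just one simple mixture choice; the data mixture actually used for RT-X is a design decision reported in the paper.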

Open X-Embodiment Dataset: Collecting data to train AI robots

Datasets, and the models trained on them, have played a critical role in advancing AI. Just as ImageNet propelled computer vision research, we believe Open X-Embodiment can do the same to advance robotics. Building a dataset of diverse robot demonstrations is the key step to training a generalist model that can control many different types of robots, follow diverse instructions, perform basic reasoning about complex tasks, and generalize effectively. However, collecting such a dataset is too resource-intensive for any single lab. To develop the Open X-Embodiment dataset, we partnered with academic research labs across more than 20 institutions to gather data from 22 robot embodiments, demonstrating more than 500 skills and 150,000 tasks across more than 1 million episodes. This dataset is the most comprehensive robotics dataset of its kind.

Samples from the Open X-Embodiment Dataset demonstrating more than 500 skills and 150,000 tasks.

The Open X-Embodiment dataset combines data across embodiments, datasets and skills.

RT-X: A general-purpose robotics model

RT-X builds on two of our robotics transformer models. We trained RT-1-X using RT-1, our model for real-world robotic control at scale, and we trained RT-2-X on RT-2, our vision-language-action (VLA) model that learns from both web and robotics data. Through this, we show that, given the same model architecture, RT-1-X and RT-2-X achieve greater performance thanks to the much more diverse, cross-embodiment data they are trained on. We also show that they improve on models trained in specific domains, and exhibit better generalization and new capabilities.

To evaluate RT-1-X, we compared how it performed in partner academic labs against models each lab had developed for its specific task, such as opening a door, on the corresponding dataset. RT-1-X trained with the Open X-Embodiment dataset outperformed the original models by 50% on average.
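
The headline comparison reduces to simple arithmetic: measure per-lab success rates for the cross-embodiment model and for each lab's own baseline, then average the relative improvements. The sketch below shows that calculation with made-up numbers; only the procedure, not the data, reflects the evaluation described above.

```python
# Sketch of the evaluation arithmetic: per-lab success rates for the
# cross-embodiment model versus each lab's own baseline, averaged into
# a mean relative improvement. All numbers below are made up.
lab_results = {  # lab: (baseline successes, RT-1-X successes, trials)
    "lab_a": (12, 18, 30),
    "lab_b": (20, 29, 40),
    "lab_c": (8, 13, 25),
}

improvements = []
for lab, (base, rt1x, trials) in lab_results.items():
    base_rate, rt1x_rate = base / trials, rt1x / trials
    improvements.append(rt1x_rate / base_rate - 1.0)
    print(f"{lab}: baseline {base_rate:.0%}, RT-1-X {rt1x_rate:.0%}")

mean_improvement = sum(improvements) / len(improvements)
print(f"mean relative improvement: {mean_improvement:.0%}")
```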

RT-1-X's mean success rate is 50% higher than that of the corresponding original method.

Videos of RT-1-X evaluations run at different partner universities.

Emergent skills in RT-X

To investigate the transfer of knowledge across robots, we conducted experiments with our helper robot on tasks involving objects and skills that are not present in the RT-2 dataset but exist in another dataset for a different robot. On these emergent skills, RT-2-X was three times as successful as our previous best model, RT-2. Our results suggest that co-training with data from other platforms imbues RT-2-X with additional skills that were not present in the original dataset, enabling it to perform novel tasks.

RT-2-X demonstrates understanding of spatial relationships between objects.

RT-2-X demonstrates skills that the RT-2 model was not capable of previously, including advanced spatial understanding. For example, if we ask the robot to "move apple near cloth" instead of "move apple on cloth", the trajectories are quite different: by changing the preposition from "on" to "near", we can modulate the actions the robot takes. RT-2-X shows that combining data from other robots into training improves the range of tasks that can be performed even by a robot that already has large amounts of data available, but only when using a sufficiently high-capacity architecture.
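
As a toy illustration of how a single preposition can steer an instruction-conditioned policy, the sketch below maps two nearly identical commands to different coarse trajectories. It is a stand-in for the way a VLA model conditions on language, not the real RT-2-X interface; all function names are hypothetical.

```python
# Toy stand-in for an instruction-conditioned policy: the same model,
# given the same scene, produces different trajectories when a single
# preposition changes. All function names are hypothetical; this is
# not the real RT-2-X interface.
def toy_vla_policy(image, instruction):
    """Map (image, instruction) to a coarse end-effector plan."""
    padded = f" {instruction} "
    plan = ["locate(apple)", "grasp(apple)"]
    if " on " in padded:          # place the apple on top of the cloth
        plan += ["move_above(cloth)", "lower()", "release()"]
    elif " near " in padded:      # place the apple beside the cloth
        plan += ["move_beside(cloth)", "release()"]
    return plan

print(toy_vla_policy("frame.png", "move apple on cloth"))
print(toy_vla_policy("frame.png", "move apple near cloth"))
```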

RT-2-X (55B): one of the biggest models to date performing unseen tasks in an academic lab.

Responsibly advancing robotics research

Robotics research is at an exciting, but early, juncture. New research demonstrates the potential to develop more useful helper robots by scaling learning with more diverse data and better models. Working collaboratively with labs around the world and sharing resources is crucial to advancing robotics research in an open and responsible way. We hope that open-sourcing the data and providing safe but limited models will reduce barriers and accelerate research.

The future of robotics relies on enabling robots to learn from each other and, most importantly, allowing researchers to learn from one another. This work demonstrates that models which generalize across embodiments are possible, with dramatic improvements in performance both with robots here at Google DeepMind and on robots at different universities around the world. Future research could explore how to combine these advances with the self-improvement property of RoboCat, enabling models to improve with their own experience. Another direction could be to probe how different dataset mixtures affect cross-embodiment generalization, and how the improved generalization materializes.

Partner with us: [website].

State-of-the-art video and image generation with Veo 2 and Imagen 3

Earlier this year, we introduced our video generation model, Veo, and our latest image generation model, Imagen 3. Since then, it’s been exciting to watch people bring their ideas to life with help from these models: YouTube creators are exploring the creative possibilities of video backgrounds for their YouTube Shorts, enterprise customers are enhancing creative workflows on Vertex AI, and creatives are using VideoFX and ImageFX to tell their stories. Together with collaborators ranging from filmmakers to businesses, we’re continuing to develop and evolve these technologies.

Today we're introducing a new video model, Veo 2, and the latest version of Imagen 3, both of which achieve state-of-the-art results. These models are now available in VideoFX, ImageFX and our newest Labs experiment, Whisk.

Market Impact Analysis

Market Growth Trend

Year    2018   2019   2020   2021   2022   2023   2024
Growth  23.1%  27.8%  29.2%  32.4%  34.2%  35.2%  35.6%

Quarterly Growth Rate

Quarter  Q1 2024  Q2 2024  Q3 2024  Q4 2024
Growth   32.5%    34.8%    36.2%    35.6%

Market Segments and Growth Drivers

Segment                      Market Share  Growth Rate
Machine Learning             29%           38.4%
Computer Vision              18%           35.7%
Natural Language Processing  24%           41.5%
Robotics                     15%           22.3%
Other AI Technologies        14%           31.8%
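
As a quick sanity check on this table, the overall market growth implied by the segments is their share-weighted average, which lands near the 2024 figure in the trend table above:

```python
# Share-weighted average of the per-segment growth rates above.
segments = {  # name: (market share, growth rate)
    "Machine Learning":            (0.29, 0.384),
    "Computer Vision":             (0.18, 0.357),
    "Natural Language Processing": (0.24, 0.415),
    "Robotics":                    (0.15, 0.223),
    "Other AI Technologies":       (0.14, 0.318),
}
overall = sum(share * growth for share, growth in segments.values())
print(f"share-weighted overall growth: {overall:.1%}")  # ~35.3%
```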

Competitive Landscape Analysis

Company       Market Share
Google AI     18.3%
Microsoft AI  15.7%
IBM Watson    11.2%
Amazon AI     9.8%
OpenAI        8.4%

Future Outlook and Predictions

The landscape for self-improving robotic agents such as RoboCat is evolving rapidly, driven by technological advancements and shifting business requirements. Based on current trends and expert analyses, we can anticipate several significant developments across different time horizons:

Year-by-Year Technology Evolution

Based on current trajectory and expert analyses, we can project the following development timeline:

2024: Early adopters begin implementing specialized solutions with measurable results
2025: Industry standards emerge to facilitate broader adoption and integration
2026: Mainstream adoption begins as technical barriers are addressed
2027: Integration with adjacent technologies creates new capabilities
2028: Business models transform as capabilities mature
2029: Technology becomes embedded in core infrastructure and processes
2030: New paradigms emerge as the technology reaches full maturity

Technology Maturity Curve

Different technologies within the ecosystem are at varying stages of maturity, influencing adoption timelines and investment priorities:

(Interactive diagram available in the full report: adoption/maturity plotted against time/development stage, from Innovation and Early Adoption through Growth and Maturity to Decline/Legacy, distinguishing emerging technologies, current focus areas, established technologies, and mature solutions.)

Innovation Trigger

  • Generative AI for specialized domains
  • Blockchain for supply chain verification

Peak of Inflated Expectations

  • Digital twins for business processes
  • Quantum-resistant cryptography

Trough of Disillusionment

  • Consumer AR/VR applications
  • General-purpose blockchain

Slope of Enlightenment

  • AI-driven analytics
  • Edge computing

Plateau of Productivity

  • Cloud infrastructure
  • Mobile applications

Technology Evolution Timeline

1-2 Years
  • Improved generative models
  • Specialized AI applications
3-5 Years
  • AI-human collaboration systems
  • Multimodal AI platforms
5+ Years
  • General AI capabilities
  • AI-driven scientific breakthroughs

Expert Perspectives

Leading experts in the AI sector provide diverse perspectives on how the landscape will evolve over the coming years:

"The next frontier is AI systems that can reason across modalities and domains with minimal human guidance."

— AI Researcher

"Organizations that develop effective AI governance frameworks will gain competitive advantage."

— Industry Analyst

"The AI talent gap remains a critical barrier to implementation for most enterprises."

— Chief AI Officer

Areas of Expert Consensus

  • Acceleration of Innovation: The pace of technological evolution will continue to increase
  • Practical Integration: Focus will shift from proof-of-concept to operational deployment
  • Human-Technology Partnership: Most effective implementations will optimize human-machine collaboration
  • Regulatory Influence: Regulatory frameworks will increasingly shape technology development

Short-Term Outlook (1-2 Years)

In the immediate future, organizations will focus on implementing and optimizing currently available technologies to address pressing AI challenges:

  • Improved generative models
  • Specialized AI applications
  • Enhanced AI ethics frameworks

These developments will be characterized by incremental improvements to existing frameworks rather than revolutionary changes, with emphasis on practical deployment and measurable outcomes.

Mid-Term Outlook (3-5 Years)

As technologies mature and organizations adapt, more substantial transformations will emerge in how AI is approached and implemented:

  • AI-human collaboration systems
  • Multimodal AI platforms
  • Democratized AI development

This period will see significant changes in system architecture and operational models, with increasing automation and integration between previously siloed functions. Organizations will shift from reactive to proactive postures.

Long-Term Outlook (5+ Years)

Looking further ahead, more fundamental shifts will reshape how AI is conceptualized and implemented across digital ecosystems:

  • General AI capabilities
  • AI-driven scientific breakthroughs
  • New computing paradigms

These long-term developments will likely require significant technical breakthroughs, new regulatory frameworks, and evolution in how organizations approach AI as a fundamental business function rather than a purely technical discipline.

Key Risk Factors and Uncertainties

Several critical factors could significantly impact the trajectory of AI technology evolution:

  • Ethical concerns about AI decision-making
  • Data privacy regulations
  • Algorithm bias

Organizations should monitor these factors closely and develop contingency strategies to mitigate potential negative impacts on technology implementation timelines.

Alternative Future Scenarios

The evolution of technology can follow different paths depending on various factors including regulatory developments, investment trends, technological breakthroughs, and market adoption. We analyze three potential scenarios:

Optimistic Scenario

Responsible AI driving innovation while minimizing societal disruption

Key Drivers: Supportive regulatory environment, significant research breakthroughs, strong market incentives, and rapid user adoption.

Probability: 25-30%

Base Case Scenario

Incremental adoption with mixed societal impacts and ongoing ethical challenges

Key Drivers: Balanced regulatory approach, steady technological progress, and selective implementation based on clear ROI.

Probability: 50-60%

Conservative Scenario

Technical and ethical barriers creating significant implementation challenges

Key Drivers: Restrictive regulations, technical limitations, implementation challenges, and risk-averse organizational cultures.

Probability: 15-20%

Scenario Comparison Matrix

Factor                   Optimistic      Base Case    Conservative
Implementation Timeline  Accelerated     Steady       Delayed
Market Adoption          Widespread      Selective    Limited
Technology Evolution     Rapid           Progressive  Incremental
Regulatory Environment   Supportive      Balanced     Restrictive
Business Impact          Transformative  Significant  Modest

Transformational Impact

Redefinition of knowledge work, automation of creative processes. This evolution will necessitate significant changes in organizational structures, talent development, and strategic planning processes.

The convergence of multiple technological trends, including artificial intelligence, quantum computing, and ubiquitous connectivity, will create both unprecedented challenges and innovative new capabilities.

Implementation Challenges

Ethical concerns, computing resource limitations, talent shortages. Organizations will need to develop comprehensive change management strategies to successfully navigate these transitions.

Regulatory uncertainty, particularly around emerging technologies such as AI, will require flexible architectures that can adapt to evolving compliance requirements.

Key Innovations to Watch

Multimodal learning, resource-efficient AI, transparent decision systems. Organizations should monitor these developments closely to maintain competitive advantage.

Strategic investments in research partnerships, technology pilots, and talent development will position forward-thinking organizations to leverage these innovations early in their development cycle.

Technical Glossary

Key technical terms and definitions to help understand the technologies discussed in this article. These definitions provide context for both technical and non-technical readers.

platform (intermediate): Platforms provide standardized environments that reduce development complexity and enable ecosystem growth through shared functionality and integration capabilities.

neural network (intermediate): A machine learning model made of layers of interconnected units that learn to map inputs to outputs from examples.

transformer model (intermediate): A neural network architecture built around attention mechanisms; it underpins models such as RT-1 and RT-2 discussed above.

computer vision (intermediate): The field of AI concerned with enabling machines to interpret visual information such as images and video.