Technology News from Around the World, Instantly on Oracnoos!

[VIDEO] GROK 3 is free (for now): Try it now!

Talk like a graph: Encoding graphs for large language models

Imagine all the things around you — your friends, tools in your kitchen, or even the parts of your bike. They are all connected in different ways. In computer science, the term graph is used to describe connections between objects. Graphs consist of nodes (the objects themselves) and edges (connections between two nodes, indicating a relationship between them). Graphs are everywhere now. The internet itself is a giant graph of websites linked together. Even the knowledge search engines use is organized in a graph-like way.
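As a quick illustration, here is a minimal sketch of such a graph in Python using the networkx library; the example objects and relationships are illustrative, not from the article.

# A minimal sketch of a graph as nodes and edges, using networkx.
import networkx as nx

G = nx.Graph()
G.add_nodes_from(["you", "friend", "bike", "wheel"])
G.add_edges_from([
    ("you", "friend"),   # a social connection
    ("you", "bike"),     # ownership
    ("bike", "wheel"),   # part-of relationship
])

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")
print(list(G.neighbors("you")))  # -> ['friend', 'bike']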

Furthermore, consider the remarkable advancements in artificial intelligence — such as chatbots that can write stories in seconds, and even software that can interpret medical reports. This exciting progress is largely thanks to large language models (LLMs). New LLM technology is constantly being developed for different uses.

Since graphs are everywhere and LLM technology is on the rise, in “Talk like a Graph: Encoding Graphs for Large Language Models”, presented at ICLR 2024, we present a way to teach powerful LLMs how to reason better with graph information. Graphs are a useful way to organize information, but LLMs are mostly trained on regular text. The objective is to test different techniques to see what works best and gain practical insights. Translating graphs into text that LLMs can understand is a remarkably complex task. The difficulty stems from the inherent complexity of graph structures, with multiple nodes and an intricate web of edges connecting them. Our work studies how to take a graph and translate it into a format that an LLM can understand. We also design a benchmark called GraphQA to study different approaches on different graph reasoning problems, and show how to phrase a graph-related problem in a way that enables the LLM to solve it. We show that LLM performance on graph reasoning tasks varies on three fundamental levels: 1) the graph encoding method, 2) the nature of the graph task itself, and 3) interestingly, the very structure of the graph considered. These findings give us clues on how to best represent graphs for LLMs. Picking the right method can make the LLM up to 60% better at graph tasks!
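To make this concrete, below is a minimal sketch of how an edge list might be turned into two different text encodings, an integer-ID style and a named "friendship" style, in the spirit of the encodings the paper studies. The exact prompt wording here is an assumption, not the paper's.

# Two plausible ways to encode an edge list as text for an LLM.
edges = [(0, 1), (1, 2), (0, 2)]

def adjacency_encoding(edges):
    # Integer node IDs, one sentence per edge.
    nodes = sorted({n for e in edges for n in e})
    lines = [f"G describes a graph among nodes {', '.join(map(str, nodes))}."]
    lines += [f"Node {u} is connected to node {v}." for u, v in edges]
    return "\n".join(lines)

def friendship_encoding(edges, names=("Alice", "Bob", "Carol")):
    # Named nodes phrased as social relationships.
    lines = ["G describes a friendship graph."]
    lines += [f"{names[u]} and {names[v]} are friends." for u, v in edges]
    return "\n".join(lines)

prompt = adjacency_encoding(edges) + "\nQ: How many edges are in this graph?"
print(prompt)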

Language models (LMs) trained to predict the next word given input text are the key technology for many applications [1, 2]. In Gboard, LMs are used t...

For the past few months, Meta has been sending recipes to a Dutch scaleup called VSParticle (VSP). These are not food recipes — they’re AI-generated i...

Google DeepMind researchers have built an AI weather forecasting tool that makes faster and more accurate predictions than the best system available t...

VideoPrism: A foundational visual encoder for video understanding

With the goal of building a single model for general-purpose video understanding, we introduce “VideoPrism: A Foundational Visual Encoder for Video Understanding”. VideoPrism is a video foundation model (ViFM) designed to handle a wide spectrum of video understanding tasks, including classification, localization, retrieval, captioning, and question answering (QA). We propose innovations in both the pre-training data and the modeling strategy. We pre-train VideoPrism on a massive and diverse dataset: 36 million high-quality video-text pairs and 582 million video clips with noisy or machine-generated parallel text. Our pre-training approach is designed for this hybrid data, to learn both from video-text pairs and the videos themselves. VideoPrism adapts easily to new video understanding challenges, and achieves state-of-the-art performance using a single frozen model.

Videos offer dynamic visual content far richer than static images, capturing movement, changes, and dynamic relationships between entities. Analyzing this complexity, along with the immense diversity of publicly available video data, demands models that go beyond traditional image understanding. Consequently, many of the best-performing approaches to video understanding still rely on specialized models tailor-made for particular tasks. Recently, there has been exciting progress in this area using video foundation models (ViFMs), such as VideoCLIP, InternVideo, VideoCoCa, and UMT. However, building a ViFM that handles the sheer diversity of video data remains a challenge.

An astounding number of videos are available on the Web, covering a variety of content from everyday moments people share to historical moments to scientific observations, each of which contains a unique record of the world. The right tools could help researchers analyze these videos, transforming how we understand the world around us.

A powerful ViFM needs a very large collection of videos on which to train, similar to other foundation models (FMs), such as those for large language models (LLMs). Ideally, we would want the pre-training data to be a representative sample of all the videos in the world. While most of these videos naturally do not have perfect captions or descriptions, even imperfect text can provide useful information about the semantic content of the video.

To give our model the best possible starting point, we put together a massive pre-training corpus consisting of several public and private datasets, including YT-Temporal-180M, InternVid, VideoCC, and WTS-70M. This includes 36 million carefully selected videos with high-quality captions, along with an additional 582 million clips with varying levels of noisy text (like auto-generated transcripts). To our knowledge, this is the largest and most diverse video training corpus of its kind.

Statistics on the video-text pre-training data. The large variations of the CLIP similarity scores (the higher, the better) demonstrate the diverse caption quality of our pre-training data, which is a byproduct of the various ways used to harvest the text.

The VideoPrism model architecture stems from the standard vision transformer (ViT) with a factorized design that sequentially encodes spatial and temporal information, following ViViT. Our training approach leverages both the high-quality video-text data and the video data with noisy text mentioned above. To start, we use contrastive learning (an approach that minimizes the distance between positive video-text pairs while maximizing the distance between negative video-text pairs) to teach our model to match videos with their own text descriptions, including imperfect ones. This builds a foundation for matching semantic language content to visual content.
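To make the objective concrete, here is a minimal sketch of a CLIP-style symmetric contrastive loss of the kind described above. The batch size, embedding dimension, and temperature are placeholder assumptions; VideoPrism's actual encoders and training configuration are not shown in this post.

# A minimal sketch of a symmetric video-text contrastive (InfoNCE) loss.
import torch
import torch.nn.functional as F

def contrastive_loss(video_emb, text_emb, temperature=0.07):
    # Normalize so the dot product becomes cosine similarity.
    v = F.normalize(video_emb, dim=-1)   # [batch, dim]
    t = F.normalize(text_emb, dim=-1)    # [batch, dim]
    logits = v @ t.T / temperature       # [batch, batch] similarity matrix
    labels = torch.arange(len(v))        # matched pairs lie on the diagonal
    # Pull matched video-text pairs together and push mismatched pairs
    # apart, symmetrically in both directions.
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2

loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())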

After video-text contrastive training, we leverage the collection of videos without text descriptions. Here, we build on the masked video modeling framework to predict masked patches in a video, with a few improvements. We train the model to predict both the video-level global embedding and token-wise embeddings from the first-stage model to effectively leverage the knowledge acquired in that stage. We then randomly shuffle the predicted tokens to prevent the model from learning shortcuts.
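A minimal sketch of this second-stage objective follows: a student predicts the frozen stage-1 model's token-wise embeddings and a global (mean-pooled) embedding from a masked input. The linear modules, masking ratio, and mean pooling are stand-in assumptions, and the token-shuffling step is omitted for brevity.

# A minimal sketch of masked video modeling with stage-1 distillation.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, num_tokens = 256, 196
student = nn.Linear(dim, dim)         # stand-in for the masked encoder-decoder
teacher = nn.Linear(dim, dim).eval()  # stand-in for the frozen stage-1 model

video_tokens = torch.randn(2, num_tokens, dim)
mask = torch.rand(2, num_tokens) < 0.8               # mask ~80% of patches
masked_input = video_tokens * (~mask).unsqueeze(-1)  # zero out masked patches

pred = student(masked_input)                         # token-wise predictions
with torch.no_grad():
    target = teacher(video_tokens)                   # stage-1 target embeddings

# Distill both token-wise embeddings and a video-level global embedding.
token_loss = F.mse_loss(pred, target)
global_loss = F.mse_loss(pred.mean(dim=1), target.mean(dim=1))
loss = token_loss + global_loss
print(loss.item())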

What is unique about VideoPrism’s setup is that we use two complementary pre-training signals: text descriptions and the visual content within a video. Text descriptions often focus on what things look like, while the video content provides information about movement and visual dynamics. This enables VideoPrism to excel in tasks that demand an understanding of both appearance and motion.

We conduct extensive evaluations of VideoPrism across four broad categories of video understanding tasks: video classification and localization, video-text retrieval, video captioning and question answering, and scientific video understanding. VideoPrism achieves state-of-the-art performance on 30 out of 33 video understanding benchmarks, all with minimal adaptation of a single, frozen model.

VideoPrism compared to the previous best-performing FMs.

We evaluate VideoPrism on an existing large-scale video understanding benchmark (VideoGLUE) covering classification and localization tasks. We find that (1) VideoPrism outperforms all of the other state-of-the-art FMs, and (2) no other single model consistently came in second place. This suggests that VideoPrism effectively packs a variety of video signals into one encoder — from semantics at different granularities to appearance and motion cues — and works well across a variety of video sources.

We further explore combining VideoPrism with LLMs to unlock its ability to handle various video-language tasks. In particular, when paired with a text encoder (following LiT) or a language decoder (such as PaLM-2), VideoPrism can be utilized for video-text retrieval, video captioning, and video QA tasks. We compare the combined models on a broad and challenging set of vision-language benchmarks. VideoPrism sets the new state of the art on most benchmarks. From the visual results, we find that VideoPrism is capable of understanding complex motions and appearances in videos (e.g., the model can recognize the different colors of spinning objects on the window in the visual examples below). These results demonstrate that VideoPrism is strongly compatible with language models.
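As a rough illustration of this pairing, the sketch below projects features from a frozen video encoder into a language model's embedding space as a "visual prefix". The adapter class, dimensions, and prefix length are hypothetical stand-ins; the actual setup follows LiT for retrieval and uses a language decoder such as PaLM-2 for captioning and QA.

# A minimal sketch of adapting frozen video features for a language decoder.
import torch
import torch.nn as nn

class VideoToLLMAdapter(nn.Module):
    def __init__(self, video_dim=768, llm_dim=1024, num_prefix=8):
        super().__init__()
        # Only this projection is trained; encoder and LLM stay frozen.
        self.proj = nn.Linear(video_dim, llm_dim)
        self.num_prefix = num_prefix

    def forward(self, video_features):        # [batch, tokens, video_dim]
        prefix = self.proj(video_features[:, : self.num_prefix])
        return prefix                          # [batch, num_prefix, llm_dim]

adapter = VideoToLLMAdapter()
video_features = torch.randn(2, 196, 768)      # from the frozen video encoder
prefix = adapter(video_features)
# `prefix` would be concatenated with text token embeddings and fed to the
# (frozen) language decoder for captioning or QA.
print(prefix.shape)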

VideoPrism achieves competitive results compared with state-of-the-art approaches (including VideoCoCa, UMT and Flamingo) on multiple video-text retrieval (top) and video captioning and video QA (bottom) benchmarks. We also show the absolute score differences compared with the previous best model to highlight the relative improvements of VideoPrism. We report Recall@1 on MSRVTT, VATEX, and ActivityNet, CIDEr score on MSRVTT-Cap, VATEX-Cap, and YouCook2, top-1 accuracy on MSRVTT-QA and MSVD-QA, and WUPS index on NExT-QA.

At Mobile World Congress 2025, Honor revealed a series of new hardware, but that was arguably the least important thing. In...

Differential privacy (DP) is a property of randomized mechanisms that limit the influence of any individual user’s information while processing and an...

Amazon's early-year Devices and Services event took place just days ago, and the firm made it clear that AI will continue to b...

[VIDEO] GROK 3 is free (for now): Try it now!

Grok 3 is available for free, but for how long? The AI developed by xAI, Elon Musk's company, promises quite a few improvements. Now is the time to take advantage of it and see what it's got under the hood!

In this video, we explore Grok 3's main features and capabilities. We put the AI through a series of tests to evaluate its performance in advanced search, image generation, and coding. These trials let us measure its effectiveness and determine whether it lives up to the performance claimed by xAI.

One of Grok 3's notable features is its integration with the X platform (formerly Twitter), offering real-time access to current information and trends. This capability sets Grok 3 apart from other chatbots on the market by enriching user interactions with immediate context.

Watch our video to discover Grok 3 through a smooth walkthrough and a few quick tests of its features. Try it yourself and let me know what you think in the comments. Note that Elon Musk has said Grok 3 is free only temporarily, so take advantage of it now. Remember to leave us a like and subscribe to the channel to follow the evolution of AI with us.

Lung cancer is the leading cause of cancer-related deaths globally, with 1.8 million deaths reported in 2020. Late diagnosis dramatically reduces the c...

People use tables every day to organize and interpret complex information in a structured, easily accessible format. Due to the ubiquity of such table...

Machine learning models in the real world are often trained on limited data that may contain unintended statistical biases. For example, in the CELEBA...

Market Impact Analysis

Market Growth Trend

Year          2018    2019    2020    2021    2022    2023    2024
Growth Rate   23.1%   27.8%   29.2%   32.4%   34.2%   35.2%   35.6%

Quarterly Growth Rate

Quarter       Q1 2024   Q2 2024   Q3 2024   Q4 2024
Growth Rate   32.5%     34.8%     36.2%     35.6%

Market Segments and Growth Drivers

Segment                       Market Share   Growth Rate
Machine Learning              29%            38.4%
Computer Vision               18%            35.7%
Natural Language Processing   24%            41.5%
Robotics                      15%            22.3%
Other AI Technologies         14%            31.8%


Competitive Landscape Analysis

Company        Market Share
Google AI      18.3%
Microsoft AI   15.7%
IBM Watson     11.2%
Amazon AI      9.8%
OpenAI         8.4%

Future Outlook and Predictions

The AI technology landscape is evolving rapidly, driven by technological advancements, emerging risks, and shifting business requirements. Based on current trends and expert analyses, we can anticipate several significant developments across different time horizons:

Year-by-Year Technology Evolution

Based on current trajectory and expert analyses, we can project the following development timeline:

2024Early adopters begin implementing specialized solutions with measurable results
2025Industry standards emerging to facilitate broader adoption and integration
2026Mainstream adoption begins as technical barriers are addressed
2027Integration with adjacent technologies creates new capabilities
2028Business models transform as capabilities mature
2029Technology becomes embedded in core infrastructure and processes
2030New paradigms emerge as the technology reaches full maturity

Technology Maturity Curve

Different technologies within the ecosystem are at varying stages of maturity, influencing adoption timelines and investment priorities:

(Interactive diagram available in the full report: adoption/maturity plotted against development stage, from Innovation and Early Adoption through Growth and Maturity to Decline/Legacy, locating emerging tech, current focus areas, established tech, and mature solutions.)

Innovation Trigger

  • Generative AI for specialized domains
  • Blockchain for supply chain verification

Peak of Inflated Expectations

  • Digital twins for business processes
  • Quantum-resistant cryptography

Trough of Disillusionment

  • Consumer AR/VR applications
  • General-purpose blockchain

Slope of Enlightenment

  • AI-driven analytics
  • Edge computing

Plateau of Productivity

  • Cloud infrastructure
  • Mobile applications

Technology Evolution Timeline

1-2 Years
  • Improved generative models
  • Specialized AI applications
3-5 Years
  • AI-human collaboration systems
  • Multimodal AI platforms
5+ Years
  • General AI capabilities
  • AI-driven scientific breakthroughs

Expert Perspectives

Leading experts in the AI tech sector provide diverse perspectives on how the landscape will evolve over the coming years:

"The next frontier is AI systems that can reason across modalities and domains with minimal human guidance."

— AI Researcher

"Organizations that develop effective AI governance frameworks will gain competitive advantage."

— Industry Analyst

"The AI talent gap remains a critical barrier to implementation for most enterprises."

— Chief AI Officer

Areas of Expert Consensus

  • Acceleration of Innovation: The pace of technological evolution will continue to increase
  • Practical Integration: Focus will shift from proof-of-concept to operational deployment
  • Human-Technology Partnership: Most effective implementations will optimize human-machine collaboration
  • Regulatory Influence: Regulatory frameworks will increasingly shape technology development

Short-Term Outlook (1-2 Years)

In the immediate future, organizations will focus on implementing and optimizing currently available technologies to address pressing AI tech challenges:

  • Improved generative models
  • Specialized AI applications
  • Enhanced AI ethics frameworks

These developments will be characterized by incremental improvements to existing frameworks rather than revolutionary changes, with emphasis on practical deployment and measurable outcomes.

Mid-Term Outlook (3-5 Years)

As technologies mature and organizations adapt, more substantial transformations will emerge in how AI is approached and implemented:

  • AI-human collaboration systems
  • Multimodal AI platforms
  • Democratized AI development

This period will see significant changes in system architecture and operational models, with increasing automation and integration between previously siloed functions. Organizations will shift from reactive to proactive postures.

Long-Term Outlook (5+ Years)

Looking further ahead, more fundamental shifts will reshape how AI is conceptualized and implemented across digital ecosystems:

  • General AI capabilities
  • AI-driven scientific breakthroughs
  • New computing paradigms

These long-term developments will likely require significant technical breakthroughs, new regulatory frameworks, and evolution in how organizations approach AI as a fundamental business function rather than a technical discipline.

Key Risk Factors and Uncertainties

Several critical factors could significantly impact the trajectory of AI tech evolution:

  • Ethical concerns about AI decision-making
  • Data privacy regulations
  • Algorithm bias

Organizations should monitor these factors closely and develop contingency strategies to mitigate potential negative impacts on technology implementation timelines.

Alternative Future Scenarios

The evolution of technology can follow different paths depending on various factors including regulatory developments, investment trends, technological breakthroughs, and market adoption. We analyze three potential scenarios:

Optimistic Scenario

Responsible AI driving innovation while minimizing societal disruption

Key Drivers: Supportive regulatory environment, significant research breakthroughs, strong market incentives, and rapid user adoption.

Probability: 25-30%

Base Case Scenario

Incremental adoption with mixed societal impacts and ongoing ethical challenges

Key Drivers: Balanced regulatory approach, steady technological progress, and selective implementation based on clear ROI.

Probability: 50-60%

Conservative Scenario

Technical and ethical barriers creating significant implementation challenges

Key Drivers: Restrictive regulations, technical limitations, implementation challenges, and risk-averse organizational cultures.

Probability: 15-20%

Scenario Comparison Matrix

Factor                    Optimistic       Base Case     Conservative
Implementation Timeline   Accelerated      Steady        Delayed
Market Adoption           Widespread       Selective     Limited
Technology Evolution      Rapid            Progressive   Incremental
Regulatory Environment    Supportive       Balanced      Restrictive
Business Impact           Transformative   Significant   Modest

Transformational Impact

Redefinition of knowledge work and automation of creative processes are the headline impacts. This evolution will necessitate significant changes in organizational structures, talent development, and strategic planning processes.

The convergence of multiple technological trends—including artificial intelligence, quantum computing, and ubiquitous connectivity—will create both unprecedented challenges and innovative new capabilities.

Implementation Challenges

Ethical concerns, computing resource limitations, and talent shortages are the main hurdles. Organizations will need to develop comprehensive change management strategies to successfully navigate these transitions.

Regulatory uncertainty, particularly around emerging technologies like AI, will require flexible architectures that can adapt to evolving compliance requirements.

Key Innovations to Watch

Multimodal learning, resource-efficient AI, and transparent decision systems top the list. Organizations should monitor these developments closely to maintain competitive advantages.

Strategic investments in research partnerships, technology pilots, and talent development will position forward-thinking organizations to leverage these innovations early in their development cycle.

Technical Glossary

Key technical terms and definitions to help understand the technologies discussed in this article.

Understanding the following technical concepts is essential for grasping the full implications of the technologies discussed in this article. These definitions provide context for both technical and non-technical readers.


API (beginner)

APIs serve as the connective tissue in modern software architectures, enabling different applications and services to communicate and share data according to defined protocols and data formats.
(Diagram: how APIs enable communication between different software systems.)
Example: Cloud service providers like AWS, Google Cloud, and Azure offer extensive APIs that allow organizations to programmatically provision and manage infrastructure and services.

platform (intermediate)

Platforms provide standardized environments that reduce development complexity and enable ecosystem growth through shared functionality and integration capabilities.

large language model (intermediate)

Large language models (LLMs) are neural networks trained on massive text corpora to predict the next token, which lets them generate text, answer questions, and reason over information expressed in language.

embeddings (intermediate)

Embeddings are dense numeric vectors that represent items such as words, videos, or graph nodes so that semantically similar items end up close together in vector space.

machine learning (intermediate)

Machine learning is the practice of building systems that learn patterns from data to make predictions or decisions, rather than relying on explicitly programmed rules.