ScreenAI: A visual language model for UI and visually-situated language understanding - Related to machine, learning, performance, health, assessment
Google at APS 2024

Today the 2024 March Meeting of the American Physical Society (APS) kicks off in Minneapolis, MN.
A premier conference on topics ranging across physics and related fields, APS 2024 brings together researchers, students, and industry professionals to share their discoveries and build partnerships with the goal of realizing fundamental advances in physics-related sciences and technology.
Furthermore, this year. Google has a strong presence at APS with a booth hosted by the Google Quantum AI team, 50+ talks throughout the conference, and participation in conference organizing activities, special sessions and events. Attending APS 2024 in person? Come visit Google’s Quantum AI booth to learn more about the exciting work we’re doing to solve some of the field’s most interesting challenges.
You can learn more about the latest cutting edge work we are presenting at the conference along with our schedule of booth events below (Googlers listed in bold).
Akool, a startup doing AI-driven avatar content creation, introduced enhancements to Akool Streaming Avatars that connect avatars with AI models.
Amsterdam-headquartered Nebius, which builds full-stack AI infrastructure for tech firms. Has secured $700mn in a private equity deal led by Nvidia, A...
Google DeepMind spinoff Isomorphic Labs expects testing on its first AI-designed drugs to begin this year, as tech startups race to turn algorithmic m...
ScreenAI: A visual language model for UI and visually-situated language understanding

ScreenAI’s architecture is based on PaLI, composed of a multimodal encoder block and an autoregressive decoder. The PaLI encoder uses a vision transformer (ViT) that creates image embeddings and a multimodal encoder that takes the concatenation of the image and. Text embeddings as input. This flexible architecture allows ScreenAI to solve vision tasks that can be recast as text+image-to-text problems.
On top of the PaLI architecture. We employ a flexible patching strategy introduced in pix2struct. Instead of using a fixed-grid pattern, the grid dimensions are selected such that they preserve the native aspect ratio of the input image. This enables ScreenAI to work well across images of various aspect ratios.
The ScreenAI model is trained in two stages: a pre-training stage followed by a fine-tuning stage. First, self-supervised learning is applied to automatically generate data labels, which are then used to train ViT and the language model. ViT is frozen during the fine-tuning stage, where most data used is manually labeled by human raters.
Google DeepMind spinoff Isomorphic Labs expects testing on its first AI-designed drugs to begin this year, as tech startups race to turn algorithmic m...
Floods are the most common natural disaster. And are responsible for roughly $50 billion in annual financial damages worldwide. The rate of flood-rela...
Imagine all the things around you — your friends, tools in your kitchen, or even the parts of your bike. They are all connected in different ways. In ...
HEAL: A framework for health equity assessment of machine learning performance

As an illustrative case study, we applied the framework to a dermatology model. Which utilizes a convolutional neural network similar to that described in prior work. This example dermatology model was trained to classify 288 skin conditions using a development dataset of 29k cases. The input to the model consists of three photos of a skin concern along with demographic information and a brief structured medical history. The output consists of a ranked list of possible matching skin conditions.
Using the HEAL framework. We evaluated this model by assessing whether it prioritized performance with respect to pre-existing health outcomes. The model was designed to predict possible dermatologic conditions (from a list of hundreds) based on photos of a skin concern and patient metadata. Evaluation of the model is done using a top-3 agreement metric, which quantifies how often the top 3 output conditions match the most likely condition as suggested by a dermatologist panel. The HEAL metric is computed via the anticorrelation of this top-3 agreement with health outcome rankings.
We used a dataset of 5,420 teledermatology cases, enriched for diversity in age, sex and. Race/ethnicity, to retrospectively evaluate the model’s HEAL metric. The dataset consisted of “store-and-forward” cases from patients of 20 years or older from primary care providers in the USA and. Skin cancer clinics in Australia. Based on a review of the literature, we decided to explore race/ethnicity, sex and age as potential factors of inequity, and used sampling techniques to ensure that our evaluation dataset had sufficient representation of all race/ethnicity. Sex and age groups. To quantify pre-existing health outcomes for each subgroup we relied on measurements from public databases endorsed by the World Health Organization, such as Years of Life Lost (YLLs) and Disability-Adjusted Life Years (DALYs; years of life lost plus years lived with disability).
Google DeepMind spinoff Isomorphic Labs expects testing on its first AI-designed drugs to begin this year, as tech startups race to turn algorithmic m...
Furthermore, the Indian space sector, once dominated by government-led initiatives. Is seeing a surge in private players, opening up career trajectories that didn’...
UK startup PhysicsX, founded by former Formula 1 engineering whizz Robin “Dr. Rob” Tuluie, has unveiled an AI tool that could fast-track the time it t...
Market Impact Analysis
Market Growth Trend
2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 |
---|---|---|---|---|---|---|
23.1% | 27.8% | 29.2% | 32.4% | 34.2% | 35.2% | 35.6% |
Quarterly Growth Rate
Q1 2024 | Q2 2024 | Q3 2024 | Q4 2024 |
---|---|---|---|
32.5% | 34.8% | 36.2% | 35.6% |
Market Segments and Growth Drivers
Segment | Market Share | Growth Rate |
---|---|---|
Machine Learning | 29% | 38.4% |
Computer Vision | 18% | 35.7% |
Natural Language Processing | 24% | 41.5% |
Robotics | 15% | 22.3% |
Other AI Technologies | 14% | 31.8% |
Technology Maturity Curve
Different technologies within the ecosystem are at varying stages of maturity:
Competitive Landscape Analysis
Company | Market Share |
---|---|
Google AI | 18.3% |
Microsoft AI | 15.7% |
IBM Watson | 11.2% |
Amazon AI | 9.8% |
OpenAI | 8.4% |
Future Outlook and Predictions
The Language Google 2024 landscape is evolving rapidly, driven by technological advancements, changing threat vectors, and shifting business requirements. Based on current trends and expert analyses, we can anticipate several significant developments across different time horizons:
Year-by-Year Technology Evolution
Based on current trajectory and expert analyses, we can project the following development timeline:
Technology Maturity Curve
Different technologies within the ecosystem are at varying stages of maturity, influencing adoption timelines and investment priorities:
Innovation Trigger
- Generative AI for specialized domains
- Blockchain for supply chain verification
Peak of Inflated Expectations
- Digital twins for business processes
- Quantum-resistant cryptography
Trough of Disillusionment
- Consumer AR/VR applications
- General-purpose blockchain
Slope of Enlightenment
- AI-driven analytics
- Edge computing
Plateau of Productivity
- Cloud infrastructure
- Mobile applications
Technology Evolution Timeline
- Improved generative models
- specialized AI applications
- AI-human collaboration systems
- multimodal AI platforms
- General AI capabilities
- AI-driven scientific breakthroughs
Expert Perspectives
Leading experts in the ai tech sector provide diverse perspectives on how the landscape will evolve over the coming years:
"The next frontier is AI systems that can reason across modalities and domains with minimal human guidance."
— AI Researcher
"Organizations that develop effective AI governance frameworks will gain competitive advantage."
— Industry Analyst
"The AI talent gap remains a critical barrier to implementation for most enterprises."
— Chief AI Officer
Areas of Expert Consensus
- Acceleration of Innovation: The pace of technological evolution will continue to increase
- Practical Integration: Focus will shift from proof-of-concept to operational deployment
- Human-Technology Partnership: Most effective implementations will optimize human-machine collaboration
- Regulatory Influence: Regulatory frameworks will increasingly shape technology development
Short-Term Outlook (1-2 Years)
In the immediate future, organizations will focus on implementing and optimizing currently available technologies to address pressing ai tech challenges:
- Improved generative models
- specialized AI applications
- enhanced AI ethics frameworks
These developments will be characterized by incremental improvements to existing frameworks rather than revolutionary changes, with emphasis on practical deployment and measurable outcomes.
Mid-Term Outlook (3-5 Years)
As technologies mature and organizations adapt, more substantial transformations will emerge in how security is approached and implemented:
- AI-human collaboration systems
- multimodal AI platforms
- democratized AI development
This period will see significant changes in security architecture and operational models, with increasing automation and integration between previously siloed security functions. Organizations will shift from reactive to proactive security postures.
Long-Term Outlook (5+ Years)
Looking further ahead, more fundamental shifts will reshape how cybersecurity is conceptualized and implemented across digital ecosystems:
- General AI capabilities
- AI-driven scientific breakthroughs
- new computing paradigms
These long-term developments will likely require significant technical breakthroughs, new regulatory frameworks, and evolution in how organizations approach security as a fundamental business function rather than a technical discipline.
Key Risk Factors and Uncertainties
Several critical factors could significantly impact the trajectory of ai tech evolution:
Organizations should monitor these factors closely and develop contingency strategies to mitigate potential negative impacts on technology implementation timelines.
Alternative Future Scenarios
The evolution of technology can follow different paths depending on various factors including regulatory developments, investment trends, technological breakthroughs, and market adoption. We analyze three potential scenarios:
Optimistic Scenario
Responsible AI driving innovation while minimizing societal disruption
Key Drivers: Supportive regulatory environment, significant research breakthroughs, strong market incentives, and rapid user adoption.
Probability: 25-30%
Base Case Scenario
Incremental adoption with mixed societal impacts and ongoing ethical challenges
Key Drivers: Balanced regulatory approach, steady technological progress, and selective implementation based on clear ROI.
Probability: 50-60%
Conservative Scenario
Technical and ethical barriers creating significant implementation challenges
Key Drivers: Restrictive regulations, technical limitations, implementation challenges, and risk-averse organizational cultures.
Probability: 15-20%
Scenario Comparison Matrix
Factor | Optimistic | Base Case | Conservative |
---|---|---|---|
Implementation Timeline | Accelerated | Steady | Delayed |
Market Adoption | Widespread | Selective | Limited |
Technology Evolution | Rapid | Progressive | Incremental |
Regulatory Environment | Supportive | Balanced | Restrictive |
Business Impact | Transformative | Significant | Modest |
Transformational Impact
Redefinition of knowledge work, automation of creative processes. This evolution will necessitate significant changes in organizational structures, talent development, and strategic planning processes.
The convergence of multiple technological trends—including artificial intelligence, quantum computing, and ubiquitous connectivity—will create both unprecedented security challenges and innovative defensive capabilities.
Implementation Challenges
Ethical concerns, computing resource limitations, talent shortages. Organizations will need to develop comprehensive change management strategies to successfully navigate these transitions.
Regulatory uncertainty, particularly around emerging technologies like AI in security applications, will require flexible security architectures that can adapt to evolving compliance requirements.
Key Innovations to Watch
Multimodal learning, resource-efficient AI, transparent decision systems. Organizations should monitor these developments closely to maintain competitive advantages and effective security postures.
Strategic investments in research partnerships, technology pilots, and talent development will position forward-thinking organizations to leverage these innovations early in their development cycle.
Technical Glossary
Key technical terms and definitions to help understand the technologies discussed in this article.
Understanding the following technical concepts is essential for grasping the full implications of the security threats and defensive measures discussed in this article. These definitions provide context for both technical and non-technical readers.