
Microsoft’s new AI agent can control software and robots

Magma could enable AI agents to take multistep actions in the real and digital worlds.

Ars Technica

Published: Feb 21, 2025


On Wednesday, Microsoft Research introduced Magma, an integrated AI foundation model that combines visual and language processing to control software interfaces and robotic systems. If the results hold up outside of Microsoft's internal testing, it could mark a meaningful step forward for an all-purpose multimodal AI that can operate interactively in both real and digital spaces.

Microsoft claims that Magma is the first AI model that not only processes multimodal data (like text, images, and video) but can also natively act upon it—whether that’s navigating a user interface or manipulating physical objects. The project is a collaboration between researchers at Microsoft, KAIST, the University of Maryland, the University of Wisconsin-Madison, and the University of Washington.

We've seen other large language model-based robotics projects before, such as Google's PaLM-E and RT-2 or Microsoft's ChatGPT for Robotics, which use LLMs as an interface. However, unlike many prior multimodal AI systems that require separate models for perception and control, Magma integrates these abilities into a single foundation model.
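Microsoft has not published a programming interface for Magma, but the architectural difference the researchers describe can be sketched in code. The class and method names below are hypothetical stand-ins, not Magma's actual API; they contrast a pipeline of separate perception and control models with a single model that maps raw multimodal input directly to actions:

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    """One step an agent can take in a digital or physical environment."""
    kind: str                                    # e.g. "click", "type", "grasp"
    target: str                                  # UI element or physical object
    params: dict = field(default_factory=dict)   # coordinates, text, joint angles, etc.

# Pipeline approach: one model perceives, a second decides what to do.
class PerceptionModel:
    def describe(self, image: bytes) -> str:
        """Turn pixels into a text description of the scene (stub)."""
        return "a login form with a username field and a submit button"

class ControlModel:
    def plan(self, scene: str, goal: str) -> list[Action]:
        """Map a scene description plus a goal to concrete actions (stub)."""
        return [Action(kind="click", target="username field"),
                Action(kind="type", target="username field",
                       params={"text": "demo-user"})]

# Unified approach, as claimed for Magma: a single foundation model
# consumes the raw multimodal input and emits grounded actions directly,
# with no hand-off between separate perception and control stages.
class UnifiedAgentModel:
    def act(self, image: bytes, goal: str) -> list[Action]:
        """Perceive and plan in one pass (stub standing in for the model)."""
        return [Action(kind="click", target="username field")]
```

In the pipeline version, any detail the perception model leaves out of its text description is invisible to the control model; folding both stages into one network is what would let a single model ground its plans directly in what it sees.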


