Thinking Machines Lab Challenges the Sequential AI Paradigm with Full-Duplex Interaction Models

The Shift from Sequential to Simultaneous Processing

Former OpenAI CTO Mira Murati has officially entered the AI race with her new venture, Thinking Machines Lab. The startup is challenging the current standard of AI interaction by introducing 'interaction models' designed to process input and generate responses at the same time, so that a conversation feels more like a phone call than a text-based chat.

The Breakthrough in Full-Duplex AI

Traditional Large Language Models (LLMs) operate on a sequential loop: listen, wait, respond. Thinking Machines Lab says it is building models capable of 'full duplex' processing instead, meaning the model keeps taking in input while it is producing output. This allows the AI to interrupt, interject, and converse in real time, moving away from the rigid structure in which the user speaks, the AI listens, and only then does the AI reply.
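
To make the contrast concrete, here is a toy sketch in Python. It is not Thinking Machines Lab's implementation, which has not been published in this article; the function names (turn_based, full_duplex), the word-by-word "chunks," and the interject_on trigger are all hypothetical and exist only for illustration. The first function buffers the whole user turn before replying; the second runs a listener and a speaker concurrently, so the speaker can react before the user has finished.

```python
import asyncio


def turn_based(chunks):
    """Sequential loop: buffer the whole user turn, then respond once."""
    events = [f"heard:{c}" for c in chunks]        # listen to everything first
    events.append("reply:" + " ".join(chunks))     # respond only after the turn ends
    return events


async def full_duplex(chunks, interject_on):
    """Toy full-duplex loop: listening and speaking run concurrently."""
    inbox = asyncio.Queue()
    events = []

    async def listener():
        for c in chunks:
            events.append(f"heard:{c}")
            await inbox.put(c)
            await asyncio.sleep(0)   # yield so the speaker can run mid-turn
        await inbox.put(None)        # sentinel: the user has stopped talking

    async def speaker():
        # Consume input as it arrives instead of waiting for the full turn.
        while (c := await inbox.get()) is not None:
            if c == interject_on:    # hypothetical trigger for an interjection
                events.append(f"interject:{c}")

    await asyncio.gather(listener(), speaker())
    return events


if __name__ == "__main__":
    words = ["book", "a", "flight", "to", "Paris"]
    print(turn_based(words))
    print(asyncio.run(full_duplex(words, interject_on="flight")))
```

In the turn-based version the reply is always the last event. In the full-duplex version the interjection is recorded while input is still arriving, which is the behavior the 'full duplex' label describes. A real system would stream audio and run a model in place of the string comparison used here.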

Model Name: TML-Interaction-Small
Status: Research preview (limited release coming in the next few months)
Founder: Mira Murati (ex-OpenAI CTO)

Speeding Up the Conversation

The technical claims center on latency. The company states that TML-Interaction-Small responds in 0.40 seconds, which it describes as roughly the pace of natural human conversation and significantly faster than current models from OpenAI and Google.

From Text Chats to Phone Calls

If it works as described, this technology would represent a fundamental shift in user experience. With the 'wait time' between turns removed, the AI becomes a conversational partner rather than a static tool. That would push the industry toward voice-first interfaces that feel less like software and more like human communication.

The Future of Native Interactivity

While the reported benchmarks are promising, the real test will be real-world usability. If Thinking Machines Lab can deliver on this 'native interactivity,' we may see text-based chat interfaces lose ground quickly to voice-first AI assistants that can genuinely interrupt and engage dynamically.