May 10, 2026 — OpenAI has unveiled a suite of upgraded voice intelligence models in its developer API, bringing enhanced real-time audio capabilities that let developers build more natural, intelligent voice-driven applications.
The updated lineup includes three optimized real-time audio models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper.
Leading the upgrade, GPT-Realtime-2 features GPT-5-class reasoning, marking a major leap from earlier voice models.
It supports complex spoken interactions, sustained contextual conversations, and in-dialogue tool use while maintaining ultra-low latency for smooth real-time interactions.
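To make the tool-use capability concrete, here is a minimal sketch of how a developer might configure a realtime session that exposes a callable tool. The event shape follows the general pattern of OpenAI's existing Realtime API, but the model identifier, tool name, and exact field names below are illustrative assumptions, not a confirmed API surface.

```python
import json

def build_session_update(model: str = "gpt-realtime-2") -> dict:
    """Assemble a hypothetical session.update event enabling one tool.

    The model name and event schema are assumptions for illustration;
    consult the official Realtime API reference for the real contract.
    """
    return {
        "type": "session.update",
        "session": {
            "model": model,                   # assumed model identifier
            "modalities": ["audio", "text"],  # speak and transcribe
            "tools": [
                {
                    "type": "function",
                    "name": "lookup_order",   # hypothetical tool
                    "description": "Fetch an order's status by ID.",
                    "parameters": {
                        "type": "object",
                        "properties": {"order_id": {"type": "string"}},
                        "required": ["order_id"],
                    },
                }
            ],
        },
    }

event = build_session_update()
payload = json.dumps(event)  # serialized, ready to send over a WebSocket
```

In a real integration, this JSON payload would be sent over the realtime WebSocket connection at session start, after which the model can invoke the declared tool mid-conversation.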
Complementing the core conversational model, GPT-Realtime-Translate enables accurate live speech translation across dozens of languages, suited to cross-border communication and real-time multilingual services.
GPT-Realtime-Whisper delivers high-fidelity streaming transcription, optimized for accuracy and stability in continuous live audio conversion.
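Streaming transcription typically arrives as a sequence of incremental events rather than one final string. The sketch below shows how a client might accumulate partial transcript fragments into running text; the event type names (`transcript.delta`, `transcript.done`) are assumptions for illustration, not documented identifiers.

```python
def accumulate_transcript(events: list[dict]) -> str:
    """Join incremental transcript fragments from a stream of events.

    Assumes delta-style events carrying text fragments; real event
    names and payload fields may differ from this sketch.
    """
    parts = []
    for ev in events:
        if ev.get("type") == "transcript.delta":
            parts.append(ev.get("text", ""))
    return "".join(parts)

# Simulated event stream, as a client might receive it over a WebSocket.
sample = [
    {"type": "transcript.delta", "text": "Hello, "},
    {"type": "transcript.delta", "text": "world."},
    {"type": "transcript.done"},
]
print(accumulate_transcript(sample))  # → Hello, world.
```

Accumulating deltas client-side like this is what lets a live-captioning UI display text as it is spoken rather than waiting for the utterance to end.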
Tailored for developer flexibility, the new API models cater to diverse scenarios, including smart voice agents, customer support systems, live event transcription, and multilingual communication tools.
They address key limitations of legacy voice AI, such as rigid context understanding and delayed responses.
This release underscores OpenAI’s ongoing effort to make advanced voice intelligence more accessible.
By integrating high-level reasoning with real-time audio processing, the API empowers developers to build more responsive, human-like voice experiences while expanding the boundaries of practical conversational AI.

