OpenAI just announced GPT-4o, its new flagship model.
With GPT-4o, you can now engage with ChatGPT via “any combination of text, audio, and image,” according to OpenAI.
What really sets GPT-4o apart is that it was trained end-to-end on text, vision and audio together. (The “o” in its name stands for “omni.”) This unified multimodal training allows GPT-4o to grasp nuance and context that gets lost when using separate models for each input/output type, as in ChatGPT’s Voice Mode.
The model also matches the text and code capabilities of GPT-4 while greatly improving performance on non-English languages. And it blows past existing models in its ability to understand images and audio.
During its announcement event, OpenAI showed off what GPT-4o can do.
In one instance, OpenAI engineers held a live, back-and-forth voice conversation with GPT-4o with very little delay. (OpenAI says the model responds to audio inputs in as little as 232 milliseconds—about the same response time as a human in a conversation.)
During audio chats, the model was also able to display a range of tones and react naturally to being interrupted, picking back up where it left off—just like a human in a conversation would.
In another demo, the engineers streamed live video to the model in real time. One engineer filmed himself writing out a math problem, then asked GPT-4o to offer advice on how to solve it as he wrote.
The model also displayed impressive capabilities in other languages, at one point conversing in fluent Italian with CTO Mira Murati during the demonstration.
You can check out all the demos in the full launch event video below.
This seems to be a move towards a single, generally useful AI assistant that you can interact with seamlessly in your everyday life. (Think: what Siri or Google Assistant should have been.)
What’s just as exciting is that GPT-4o is now available to all ChatGPT users, not just paid users. (Though ChatGPT Plus users will get higher usage limits for the new model.)
OpenAI has been teasing a big release like this for a couple of weeks.
“I started noticing it early last week that Sam [Altman] and Greg [Brockman] were unusually active on Twitter, as well as some other OpenAI people,” Paul Roetzer, founder and CEO of Marketing AI Institute, told me on Episode 97 of The Marketing AI Show. “That’s normally a tip off of some sort.”
However, many media outlets initially got it wrong.
There were several reports, including from Reuters, that OpenAI planned to announce a search competitor to Google. (They didn’t.)
OpenAI did, however, quietly make a couple of other announcements in the lead up to GPT-4o that may also prove consequential.
Adweek got hold of a leaked OpenAI deck outlining a possible “Preferred Publisher Program,” in which OpenAI would pay publications to use and cite their content in ChatGPT results.
And OpenAI released a first draft of its Model Spec, a document that details how it wants its models to behave.