OpenAI’s newest model is GPT-4o

May 13, 2024

OpenAI is releasing a new flagship generative AI model called GPT-4o, set to roll out “iteratively” across the company’s developer and consumer-facing products over the next few weeks. The “o” in GPT-4o stands for “omni” — referring to GPT-4o’s multimodality.

OpenAI CTO Mira Murati said that GPT-4o provides “GPT-4-level” intelligence but improves on GPT-4’s capabilities across text and vision as well as audio.

“GPT-4o reasons across voice, text and vision,” Murati said during a keynote presentation at OpenAI’s offices in San Francisco on Monday. “And this is incredibly important, because we’re looking at the future of interaction between ourselves and machines.”

GPT-4 Turbo, OpenAI’s previous leading model and an enhanced version of GPT-4, was trained on a combination of images and text, and could analyze images and text to accomplish tasks like extracting text from images or even describing the content of those images. But GPT-4o adds speech to the mix.

What does this enable? A variety of things.

GPT-4o greatly improves the experience in ChatGPT, OpenAI’s viral AI-powered chatbot. ChatGPT has long offered a voice mode that reads the chatbot’s responses aloud using a text-to-speech model. But GPT-4o supercharges this, allowing users to interact with ChatGPT more like an assistant.

For example, users can ask the GPT-4o-powered ChatGPT a question and interrupt it while it’s answering. The model delivers “real time” responsiveness, OpenAI says, and can even pick up on the emotion in a user’s voice, responding with generated voices in “a range of different emotive styles.”

GPT-4o also improves ChatGPT’s vision capabilities. Given a photo or a desktop screen, ChatGPT can now quickly answer related questions, ranging from “What’s going on in this software code?” to “What brand of shirt is this person wearing?”

This will evolve and get even better in the future, Murati says. While today GPT-4o lets you do things like take a picture of a menu in a different language and translate it, in the future, it could allow ChatGPT to “watch” a live sports game and explain the rules to you, she said.

“We know that these models [are getting] more and more complex, but we want the experience of interaction to actually become more natural, easy, and for you not to focus on the UI at all, but just focus on the collaboration with [GPTs],” Murati said.

GPT-4o is available in the free tier of ChatGPT starting today, and to subscribers to OpenAI’s premium ChatGPT Plus and Team plans with “5x higher” message limits. (OpenAI notes that ChatGPT will automatically switch to GPT-3.5 when users hit the usage threshold.) OpenAI says that it’ll roll out the improved voice experience underpinned by GPT-4o in alpha to Plus users in the next month or so, alongside Enterprise options with GPT-4o.

GPT-4o is more multilingual as well, OpenAI claims, with improved performance in 50 different languages. In OpenAI’s API, GPT-4o is twice as fast as GPT-4 (specifically GPT-4 Turbo), costs half as much and has higher rate limits.

Voice isn’t a part of the GPT-4o API for all customers at present. OpenAI, citing the risk of misuse, says that it plans to first launch support for GPT-4o’s new audio capabilities to “a small group of trusted partners” in the coming weeks.

In other news, OpenAI is releasing a refreshed ChatGPT UI on the web with a new, “more conversational” home screen and message layout and a desktop version of ChatGPT for macOS, which lets users ask ChatGPT a question via a keyboard shortcut or take and discuss screenshots either by typing or speaking. (Plus users will get access first, starting today, and a Windows version of the app will arrive later this year.)

Elsewhere, access to the GPT Store, OpenAI’s library of third-party chatbots built on its AI models, is now available to users of ChatGPT’s free tier. And free users can take advantage of features that were formerly paywalled, like a memory capability that lets ChatGPT “remember” preferences for future interactions.

techcrunch.com
