Google Bard just made a stunning leap in capabilities…
It just beat GPT-4 on a top leaderboard that evaluates AI models.
The leaderboard, called Chatbot Arena, comes from the Large Model Systems Organization. It now shows Google Bard (powered by Google’s Gemini Pro model) in 2nd place in terms of performance.
The leaderboard takes into account 200,000+ human votes on which models users prefer.
It also assigns an “Elo” rating to each model, a rating system originally developed to rank players in zero-sum games like chess.
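Here’s a minimal sketch of how an Elo update works. This is illustrative only: the function names and the K-factor of 32 (a common chess default) are assumptions, and Chatbot Arena’s actual implementation may differ.

```python
K = 32  # update step size; a common chess default (assumed, not Arena's actual value)

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a: float, rating_b: float, a_won: bool) -> tuple[float, float]:
    """Return new (rating_a, rating_b) after one head-to-head result."""
    ea = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    # Winner gains, loser loses, scaled by how surprising the result was
    new_a = rating_a + K * (score_a - ea)
    new_b = rating_b + K * ((1 - score_a) - (1 - ea))
    return new_a, new_b
```

The key property: an upset win by a lower-rated model moves both ratings more than an expected win does, so ratings converge toward each model’s true strength as votes accumulate.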
Bard still trails behind GPT-4 Turbo, but now surpasses other versions of GPT-4 and other popular models like Claude and Mistral.
What should you do now that Bard is climbing the rankings?
In Episode 81 of The Marketing AI Show, I got the answer from Marketing AI Institute founder/CEO Paul Roetzer.
Here’s what you need to know…
This Is a Trustworthy Leaderboard
Chatbot Arena isn’t just a random online ranking site, says Roetzer. It’s the real deal.
It’s trusted by some of the top players in AI, including Andrej Karpathy, a leading AI researcher at OpenAI. (In fact, Karpathy says it’s one of only two evaluation sites he trusts.)
It Works By Pitting Models Against Each Other
The human evaluation component of Chatbot Arena works by having you pit two models against each other for the same prompt. (Hence the name.)
For instance, you can give Bard (powered by Gemini Pro) and GPT-4 the same prompt, get two different outputs, and rate which one is best.
When pitted against several versions of GPT-4, Bard comes out the winner. However, it still falls short when matched against GPT-4 Turbo, the latest version of OpenAI’s most advanced model.
Not to mention, Gemini Pro, which now powers Bard after a December 2023 update, isn’t even the most powerful version of Google’s new models.
Gemini Ultra is the most powerful version of Google’s family of advanced models—and Google plans to incorporate it into its services and AI tools moving forward. Which means Ultra may be an even bigger leap forward.