Google, hot on the heels of releasing Gemini Ultra 1.0, has surprised the AI world with the announcement of Gemini 1.5, including Gemini 1.5 Pro.
This isn’t just a minor update. It appears to be a powerhouse offering with significant improvements over the previous model.
Said Google CEO Sundar Pichai in the announcement blog post:
“It shows dramatic improvements across a number of dimensions and 1.5 Pro achieves comparable quality to 1.0 Ultra, while using less compute.”
What do you need to know about the announcement?
I got the breakdown from Marketing AI Institute founder/CEO Paul Roetzer on Episode 84 of The Artificial Intelligence Show.
It can handle 1 million tokens.
Gemini 1.5 delivers what Google calls a “breakthrough” in long-context understanding. It can consistently process up to 1 million tokens, the longest context window of any large-scale foundation model released so far.
That means Gemini can handle “vast amounts of information in one go” like:
- 1 hour of video…
- 11 hours of audio…
- 30,000 lines of code…
- Or 700,000+ words.
That’s a huge deal. Gemini 1.5 can learn from and use so much more material than other models.
Not to mention, Google says it successfully tested up to 10 million tokens in its research.
“The bigger the context window, the more information you can put into the prompt and the output can become more consistent, relevant, and useful,” says Roetzer.
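To put those context-window figures in perspective, here's a quick back-of-the-envelope sketch. It assumes the common rule of thumb of roughly 0.75 English words per token; the exact ratio varies by tokenizer and text, and is not something the announcement specifies.

```python
# Rough sanity check of the context-window figures above.
# WORDS_PER_TOKEN is a heuristic assumption, not a figure from Google:
# English text commonly averages ~0.75 words per token.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int, ratio: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given token budget."""
    return int(tokens * ratio)


print(tokens_to_words(1_000_000))   # ~750,000 words, in line with "700,000+ words"
print(tokens_to_words(10_000_000))  # ~7.5 million words at the 10M research limit
```

Under that heuristic, a 1 million token window comfortably covers the "700,000+ words" figure, and the 10 million tokens Google reports in research would hold an entire bookshelf of text in a single prompt.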
It has groundbreaking learning abilities.
Google says Gemini 1.5 displays in-context learning, which means it can learn new skills from information given in a long prompt without additional fine-tuning.
One example given by the company shows how Gemini 1.5 learned from information it had never seen before:
“When given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person learning from the same content.”
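The key point of in-context learning is that the "training material" lives inside the prompt itself, with no fine-tuning step. A minimal sketch of what that looks like in practice, where the function name, file contents, and instructions are all illustrative rather than anything Google has published:

```python
# Sketch of in-context learning: the reference material (a grammar manual)
# is packed directly into the prompt, so the model "learns" the task at
# inference time. No model API is called here; this only assembles the prompt.
def build_translation_prompt(manual_text: str, sentence: str) -> str:
    """Pack a grammar manual and a translation request into one long prompt."""
    return (
        "Below is a grammar manual for the Kalamang language.\n\n"
        f"{manual_text}\n\n"
        "Using only the manual above, translate this English sentence "
        f"into Kalamang: {sentence}"
    )


prompt = build_translation_prompt("<manual contents here>", "Where is the river?")
print(len(prompt) > 0)
```

With a 1 million token window, the entire manual fits in `manual_text`, which is exactly the setup Google describes in the Kalamang example.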
The implications for business are profound.
This opens up tons of valuable use cases across marketing and business. Google offers a few examples of possible use cases:
- Accurately analyze entire code bases…
- Reason, synthesize, and make comparisons across very long documents like contracts, analyst reports, research studies, or books…
- Analyze and compare content across hours of video and find specific details in footage…
- And enable chatbots to have long conversations without forgetting details.
With some models, you can already do some of these things. But traditionally, existing models become less accurate and reliable the more tokens you give them, says Roetzer.
“What it seems like Google is saying is they’re finding ways to maintain accuracy and reliability.”
If true, it has profound implications for knowledge work.
What if chatbots became totally reliable and accurate, and remembered everything you’d previously talked about? What if you could find specific details in any length of video footage instantly? What if you could get detailed, accurate, and reliable info from any type of recording, text, or code?
Any one of these use cases, and countless others, could transform knowledge work if it could be done accurately.
“When this stuff becomes truly reliable, it’s kind of crazy to step back and think about,” says Roetzer.