Reddit is going public. And, as part of its IPO filing, the company revealed it has data licensing agreements with AI companies.
The agreements, worth more than $200 million, presumably give these companies’ AI models the ability to train legally on Reddit data.
Reddit hasn’t revealed all the AI companies involved. But Reuters reported that one of the deals is with Google, to the tune of $60 million per year.
What’s going on here?
I got the answers on Episode 85 of The Artificial Intelligence Show from Marketing AI Institute founder/CEO Paul Roetzer.
The future of AI models is licensed and synthetic data
“This is going to be worth a lot of money,” says Roetzer. “The future of these models is going to be licensed and synthetic data.”
The most powerful AI models learn by ingesting enormous quantities of information. How they get that information is a source of controversy. (Some companies, like OpenAI, are being sued over their use of copyrighted material.)
But by training on licensed content, or training on AI-generated content, AI companies avoid legal issues and get unique datasets that give them an advantage over other models. That makes proprietary datasets like Reddit’s worth a lot of money to AI companies.
Companies with proprietary data are sitting on a goldmine
Access to unique data is how AI models stay competitive, says Roetzer. So it’s no wonder Google and others are inking these types of deals.
It’s also why Elon Musk restricted access to the API of X, formerly Twitter, leaving his own AI model, Grok, with privileged access to that data.
This likely affects media companies and online sites with proprietary data, says Roetzer, because that data represents a goldmine to AI companies.
“If you have a bunch of proprietary data, licensing of data is gonna be huge moving forward.”