How Much Is AI Training Data Worth?

What’s a fair price for data? Artificial intelligence (AI) training data has been the subject of plenty of discussion and a number of lawsuits.

“Nobody really exactly knows,” Marc Benioff, the Salesforce CEO who also owns Time magazine, said in an interview with Bloomberg at the World Economic Forum in Davos.

He said some companies have been stealing all the training data for AI models, noting that content from media outlets including Time and The New York Times is served up in search results.

AI companies have used intellectual property to build their technology, Benioff said, and they should standardize payments to treat content creators fairly.

“All the training data has been stolen,” Benioff said, referring to intellectual property. “That’s a pretty big thought that there’s commodity as an interface.”



In the middle, he said, are large language models (LLMs), and then there is the training data, broadly defined.

Benioff suggested building a company based on a standardized set of training data that lets all companies “play a fair game” and lets content creators get paid fairly for their work.

“That bridge has not yet been crossed and that’s a mistake by the AI companies, but it’s very easy to do,” he said.

Bloomberg also spoke with OpenAI CEO Sam Altman and OpenAI Vice President of Global Affairs Anna Adeola Makanju about the implications of AI and whether the company has concerns about another Cambridge Analytica moment.

“There is this belief held by some people that, wow, man, you need all my training data, and my training data is so valuable,” Altman told Bloomberg at Davos. “That is not the case. We do not want to train on the New York Times data, for example.”

Much of OpenAI’s research has focused on how to train on smaller amounts of high-quality data, Altman said, and the world will likely figure that out.

Altman said OpenAI would like to work with publishers to serve content in a response that attributes the news articles to a specific publisher.

Sounds like a search engine to me.

If anyone wondered why OpenAI recently removed language in its terms of service banning the use of the company’s technology in “military and warfare” applications, Makanju set the record straight. She described the decision to remove the verbiage as part of a broader policy update to accommodate new uses of ChatGPT and its other tools.

That discussion on Tuesday revealed that OpenAI is working with the Pentagon on several projects, including cybersecurity capabilities, a departure from the startup’s earlier ban on providing its artificial intelligence for military purposes.

"A lot of the policies were written before we knew what they would use our tools for," Makanju said.

She said OpenAI has maintained a ban on using its technology to develop weapons, destroy property or harm people. The company has been working with the Department of Defense on cybersecurity tools for open-source software and is exploring whether it can help prevent veterans' suicides.



1 comment about "How Much Is AI Training Data Worth?".
  1. John Grono from GAP Research, January 17, 2024 at 3:56 p.m.

    But wouldn't AI be able to tell us how much AI training data is worth?
