Attributing search results to source documents could help verify that the information is accurate.
Researchers are working toward transparent language models, which are designed to identify the source, whether Reddit, Wikipedia, or any other site from which the technology pulls information to answer a specific question. Citing that source would provide evidence for claims about the accuracy of the information.
A paper coauthored by researchers at Google and released last week outlines a potential partial solution: a framework called Attributable to Identified Sources.
VentureBeat first reported on the paper. “A language model learns the probability of how often a word occurs based on sets of example text,” according to the report. “Simpler models look at the context of a short sequence of words, whereas larger models work at the level of phrases, sentences, or paragraphs. Most commonly, language models deal with words — sometimes referred to as tokens.”
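The idea of learning word probabilities from example text can be made concrete with a minimal sketch. The function and corpus below are illustrative inventions, not code from the paper: a toy bigram model that counts how often each word follows another and converts those counts into conditional probabilities.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count how often each word follows another in the example text,
    then normalize the counts into conditional probabilities."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    # Convert raw counts to P(next word | previous word).
    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        model[prev] = {word: c / total for word, c in followers.items()}
    return model

# A tiny, made-up corpus standing in for "sets of example text".
corpus = [
    "the model reads the text",
    "the model writes the answer",
]
model = train_bigram_model(corpus)
print(model["the"])  # {'model': 0.5, 'text': 0.25, 'answer': 0.25}
```

Larger models replace these simple counts with learned parameters over much longer contexts, but the underlying objective, predicting how likely a token is given what came before, is the same.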
Larger language models learn to write human-like text by internalizing billions of examples from the public web. They pull from resources such as ebooks, Wikipedia, and social media platforms like Reddit, making inferences in near-real-time.
One major challenge is bias: language models struggle with prejudice across race, ethnicity, religion, and gender. They also do not understand language the way humans do, because they pick up on only a few keywords in a sentence. As a result, they can fail to notice when a reordering of the words changes the meaning of the sentence.
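The keyword problem is easy to demonstrate. The sketch below is an illustration of the general failure mode, not of any specific model: a representation that looks only at which words occur, ignoring their order, cannot distinguish two sentences with opposite meanings.

```python
from collections import Counter

def bag_of_words(sentence):
    """Represent a sentence only by which words occur and how often,
    discarding word order entirely."""
    return Counter(sentence.lower().split())

a = "the dog bit the man"
b = "the man bit the dog"

# The two sentences mean opposite things, but a keyword-only
# representation cannot tell them apart.
print(bag_of_words(a) == bag_of_words(b))  # True
```

Any system that reduces a sentence to a handful of keywords inherits this blind spot, which is one reason word order can silently flip a model's interpretation.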
Even interpretability tools like Google’s Language Interpretability Tool can lead to incorrect assumptions about datasets and models, including when the output is manipulated to show explanations that make no sense.
“It's what's known as automation bias — the propensity for people to favor suggestions from automated decision-making systems,” according to the report.