Google TW-BERT Demonstrates Improvements On Search

A Google research paper on Term Weighting Bidirectional Encoder Representations from Transformers (TW-BERT) describes how the new framework improves search rankings without requiring major changes to existing systems, because it integrates with existing query expansion models.

BERT is a transformer-based artificial intelligence (AI) language model. TW-BERT learns to predict a weight for each n-gram, such as a uni-gram or bi-gram, among the terms of an input query.

These inferred weights and terms can be used directly by a retrieval system to perform a search query. 
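
To make the idea of weighted n-grams concrete, here is a minimal sketch in Python. It only illustrates the shape of the output a retrieval system might receive. The function names and the weight values are hypothetical, and the lookup table merely stands in for the predictions of a trained model, which the paper does not describe in this form.

```python
from typing import Dict, List


def query_ngrams(query: str, max_n: int = 2) -> List[str]:
    """Return the n-grams of a query up to max_n words (uni-grams and bi-grams by default)."""
    tokens = query.lower().split()
    ngrams: List[str] = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[i:i + n]))
    return ngrams


# Hypothetical weights, used here in place of a trained model's predictions.
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "nike": 0.9,
    "running shoes": 0.8,
    "shoes": 0.5,
}


def weight_query(query: str, weights: Dict[str, float], default: float = 0.1) -> Dict[str, float]:
    """Attach a weight to every n-gram; n-grams without a known weight get the default."""
    return {term: weights.get(term, default) for term in query_ngrams(query)}


if __name__ == "__main__":
    for term, weight in weight_query("Nike running shoes", EXAMPLE_WEIGHTS).items():
        print(f"{term!r}: {weight}")
```

The resulting term-to-weight mapping is the kind of input a retrieval system could consume directly alongside the original query terms.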

Google has not confirmed that it is using TW-BERT, but Search Engine Journal points out that the new framework is a breakthrough that improves ranking processes and is easy to deploy.

To optimize the term weights, TW-BERT incorporates the scoring function used by the search engine, such as BM25, to score query-document pairs.
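
As a rough illustration of how per-term weights can plug into a scorer such as BM25, the sketch below multiplies each term's BM25 contribution by its weight. It is an assumption-laden toy, not the paper's implementation: it handles uni-grams only, uses one common BM25 formulation with typical default parameters (k1 = 1.2, b = 0.75), and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_term_score(tf: int, doc_len: int, avg_len: float, df: int, n_docs: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 contribution of one term to one document (one common formulation)."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return idf * norm


def weighted_bm25(query_weights: Dict[str, float], doc: List[str],
                  corpus: List[List[str]]) -> float:
    """Sum of per-term BM25 scores, each scaled by the term's query weight."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term, weight in query_weights.items():
        if tf[term] == 0:
            continue  # the term does not occur in this document
        df = sum(1 for d in corpus if term in d)  # document frequency
        score += weight * bm25_term_score(tf[term], len(doc), avg_len, df, n_docs)
    return score


if __name__ == "__main__":
    corpus = [
        ["nike", "running", "shoes"],
        ["running", "tips"],
        ["chicken", "soup", "recipe"],
    ]
    weights = {"nike": 0.9, "shoes": 0.5}  # hypothetical weights
    for doc in corpus:
        print(doc, round(weighted_bm25(weights, doc, corpus), 3))
```

Because the weights only scale the scorer's existing per-term contributions, the underlying scoring function stays untouched, which is the property the article attributes to the framework.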

Google's technology then computes a ranking loss from these matching scores and uses it to optimize the learned query term weights end to end. This helps to determine more accurately which documents are relevant to a search query.
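
The article does not say which ranking loss is used, so the following sketch assumes a listwise softmax cross-entropy loss purely for illustration. It shows how a list of matching scores and relevance labels can be reduced to a single number that training would try to minimize. The function name is hypothetical.

```python
import math
from typing import List


def softmax_cross_entropy(scores: List[float], labels: List[float]) -> float:
    """Listwise ranking loss: -sum(label_i * log(softmax(scores)_i))."""
    top = max(scores)  # subtract the maximum for numerical stability
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return -sum(label * (score - log_z) for score, label in zip(scores, labels))


if __name__ == "__main__":
    labels = [1.0, 0.0]       # the first document is the relevant one
    good_scores = [3.0, 1.0]  # the relevant document scores highest
    bad_scores = [1.0, 3.0]   # the relevant document scores lowest
    print("loss when the ranking is right:", softmax_cross_entropy(good_scores, labels))
    print("loss when the ranking is wrong:", softmax_cross_entropy(bad_scores, labels))
```

The loss is smaller when the relevant document receives the higher score, so adjusting term weights to reduce it pushes relevant documents up the ranking.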

The research paper explains that aligning TW-BERT with search engine scorers minimizes the changes needed to integrate it into existing production applications, whereas existing deep learning-based search methods would require further infrastructure optimization and additional hardware.

TW-BERT works with query expansion, a process that restates a search query or adds words to it, such as adding "soup" to "chicken" when looking for a recipe. Expansion helps to better match the search query to documents.

There are two broad approaches to search: statistics-based methods and deep learning models. The research paper discusses the benefits and challenges of both. It suggests that TW-BERT is a way to bridge the two approaches without the shortcomings of either.

The paper's conclusion describes the researchers' current work as modifying parts of an information retrieval (IR) system that are flexible by default, such as input search query weights and n-gram terms. In future work, the researchers propose softening these constraints to further improve performance.

"One direction is to consider weighing the document side terms as well," according to the paper. "Similarly, if both query and document sides were trained with expansion terms, we can perform finer-grained matching. This can be seen as SPLADE but from the perspective of retaining IR scoring functions and operating on term-level tokens. Another aspect to investigate is out-of-domain retrieval performance."
