Commentary

Verizon's Oath Open Sources Yahoo's Vespa Search Technology

Oath, the Verizon division that combines AOL and Yahoo, has released the source code of Vespa, a search engine whose technology traces back to Yahoo's acquisition of the search engine AlltheWeb. Vespa processes large volumes of data and powers Yahoo's search services. The goal is to build a community of developers around the technology.

Although Yahoo's web search is powered by Bing, the company still uses Vespa to deliver search results and to serve lists of recommended articles across Yahoo.com, Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Gemini, Flickr and other properties. Vespa handles billions of requests a day against billions of documents, answering search queries, making recommendations, and serving personalized content and advertisements.

"To deliver a search result or a list of recommended articles to a user, you need to find all the items matching the query, determine how good each item is for the particular request using a relevance/recommendation model, organize the matches to remove duplicates, add navigation aids, and then return a response to the user," Jon Bratseth, Vespa's distinguished architect, wrote in a blog post.
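To make the stages Bratseth lists more concrete, here is a toy, in-memory Python sketch of that flow: match, rank, deduplicate, add navigation aids, respond. It is an illustration under stated assumptions, not Vespa's API. The `Item` type, the `serve` and `term_frequency` functions, the rule that equal titles mark duplicates, and the use of per-category counts as "navigation aids" are all hypothetical choices. Vespa itself runs these stages as a distributed service driven by its own configuration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Item:
    doc_id: str
    title: str
    category: str
    text: str


def term_frequency(query: str, item: Item) -> float:
    """Hypothetical stand-in for a relevance model: count query-term occurrences."""
    tokens = item.text.lower().split()
    return float(sum(tokens.count(t) for t in query.lower().split()))


def serve(query: str, corpus: List[Item],
          score: Callable[[str, Item], float], hits: int = 10) -> Dict[str, object]:
    terms = query.lower().split()

    # 1. Matching: keep items whose text contains every query term.
    matches = [it for it in corpus
               if all(t in set(it.text.lower().split()) for t in terms)]

    # 2. Ranking: order matches by the relevance score, best first
    #    (Python's sort is stable, so ties keep corpus order).
    ranked = sorted(matches, key=lambda it: score(query, it), reverse=True)

    # 3. Organizing: drop duplicates, here naively defined as equal titles,
    #    keeping the highest-ranked copy.
    seen = set()
    unique: List[Item] = []
    for it in ranked:
        key = it.title.lower()
        if key not in seen:
            seen.add(key)
            unique.append(it)

    # 4. Navigation aids: per-category counts, like search facets.
    facets: Dict[str, int] = {}
    for it in unique:
        facets[it.category] = facets.get(it.category, 0) + 1

    # 5. Response: top hits plus the aggregated navigation data.
    return {"hits": [it.doc_id for it in unique[:hits]],
            "facets": facets,
            "total": len(unique)}


# Usage with a tiny hypothetical corpus.
docs = [
    Item("a", "Vespa released", "news", "vespa open source search engine"),
    Item("b", "Vespa Released", "news", "vespa search vespa"),
    Item("c", "Stock update", "finance", "markets search rally"),
    Item("d", "Sports recap", "sports", "game recap"),
]

if __name__ == "__main__":
    print(serve("search", docs, term_frequency))
    print(serve("vespa", docs, term_frequency))
```

Keeping each stage as a separate step mirrors the order in the quote. A real engine would replace the linear scan with an index and distribute the matching and ranking across many nodes.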



Vespa was developed following Yahoo's acquisition of AlltheWeb. Soon afterward, engineers began rewriting the search technology into a more general-purpose tool.

Vespa processes and serves content and ads almost 90,000 times per second, with latencies in the tens of milliseconds.
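For a sense of scale, the per-second figure quoted above converts to a daily volume as follows. This is simple arithmetic that assumes a constant rate around the clock, which real traffic does not follow:

$$
90{,}000\ \frac{\text{responses}}{\text{s}} \times 86{,}400\ \frac{\text{s}}{\text{day}} = 7.776 \times 10^{9} \approx 7.8\ \text{billion responses per day}
$$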

Since the code is now open source, any company or individual can use or modify Vespa. "By releasing Vespa, we are making it easy for anyone to build applications that can compute responses to user requests, over large datasets, at real time and at internet scale -- capabilities that up until now, have been within reach of only a few large companies," Bratseth wrote.

Bratseth explains that Oath's team has used Vespa to build a variety of applications that organize matches into data-driven pages, serve results with response times in the low milliseconds, and write data in real time at rates of thousands of writes per second per node.
