Around the Net

Google, Yahoo And The Library of Babel

Google's Tower of Babel-like goal of capturing all the world's information in a searchable database conjures up images of Borges' "Library of Babel," the story from his masterpiece Ficciones about a library that contained not only all of the books that had been written in every language, but also those that had yet to be completed. With more than half of its projects yet to be completed, that sounds something like Google, all right. But what will it take for Google (or Yahoo, Microsoft and the Open Content Alliance) to truly assemble a modern-day "Library of Babel"? According to a recent New York Times Magazine article, these projects are scanning about a million books a year, which currently amounts to 5 percent of the books in print. And what about Web search? Search Engine Watch asks. It's far from complete; there are millions of lost Web pages out there, and others that are still invisible to the search engines. How will the major engines recover this "ephemeral literature"? Recent studies suggest that the half-life of a given Web page is just under two years, after which time many of these pages disappear if they are no longer linked to. While Google and Yahoo most likely haven't thrown any pages away, the Internet Archive, the most complete publicly accessible archive of the Web, has only about 55 billion pages, a fraction of the content that's been posted to the Web. If they really wanted to, could Google and Yahoo restore all those potentially lost Web pages? It's "not unthinkable," Search Engine Watch says, but it would certainly take greater resources than either company has or is prepared to devote to such a project at present.

Read the whole story at Search Engine Watch »
