Found in Cyberspace

July 28, 2000
How many times have you searched the World Wide Web in vain, trying to find a document that you knew existed — or waded through a myriad of irrelevant search results hoping to find that one critical piece of data? The information you’ve been looking for is there, but it is buried beneath the surface of the Web, out of reach of current search technology.

The Internet has grown so large so fast that even sophisticated search engines are just scratching the surface of the Web's vast information reservoir, according to a new study released Wednesday by BrightPlanet, a South Dakota-based Internet content company.

According to this first-ever study of the "deep" Web, a massive storehouse of databases and information invisible to existing search engines, cyberspace is 500 times larger than previously thought.

Mike Bergman, BrightPlanet's chairman, said: "Frankly, what's been missed until now is the absolutely huge scale, importance and quality of information within the deep Web."

The BrightPlanet study estimates there are more than 100,000 content-rich searchable databases publicly available within the deep Web. Bergman said these sites collectively have information relevant to any need, citing as examples IBM's patent site, 10KWizard's database of SEC company filings, genome databases, the Costa Rica Supersite, genealogy records, historical sports statistics, NIH PubMed biomedical publications, and law cases and decisions.

Other findings from BrightPlanet's 41-page white paper, The "Deep" Web: Surfacing Hidden Value, include:

- The deep Web contains nearly 550 billion individual documents, compared with the 1 billion documents of the "surface" Web indexed by search engines

- The deep Web contains 7,500 terabytes of information, compared with 19 terabytes in the surface Web

- The deep Web is the fastest growing category of new information on the Internet

- The total quality content of the deep Web is at least 1,000 to 2,000 times greater than that of the surface Web

- Deep Web content is highly relevant to every information need, market and domain

- A full 95% of the deep Web is publicly accessible information, not subject to fees or subscriptions

The reason the deep Web has been hidden in plain sight is today's reliance on search engines for content discovery on the Web. Existing search engines catalog the surface Web using spiders or crawlers that follow links on static web pages, akin to ripples spreading across a pond.

The deep Web is made up of searchable databases whose results are served up dynamically, only in answer to a direct query. Search engines may point to the doorways of these databases, but they cannot find or search the contents housed inside.
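To make the distinction concrete, here is a toy Python sketch of a link-following crawler run against a tiny, entirely hypothetical site. The page names, the `LINKS` graph, the `PATENT_DB` table and the `crawl` and `query` helpers are all invented for illustration and do not model any real search engine or BrightPlanet's own tools. The crawler reaches the search form page because a static page links to it, but it never sees the database records behind the form, which exist only as answers to a query.

```python
from collections import deque

# Hypothetical toy "surface Web": each static page lists the pages it links to.
LINKS = {
    "home.html": ["news.html", "patents/search.html"],
    "news.html": ["home.html"],
    "patents/search.html": [],  # a query form: the doorway, with no outgoing links
}

# Hypothetical database behind the search form. Its records are produced
# only in response to a query; no static page links to them.
PATENT_DB = {
    "laser": ["patent-4000001", "patent-4000002"],
    "battery": ["patent-5000001"],
}


def crawl(seed):
    """Breadth-first crawl that, like a surface-Web spider, only follows links."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for link in LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


def query(term):
    """A dynamic result page: it exists only when someone submits a query."""
    return PATENT_DB.get(term, [])


indexed = crawl("home.html")
hidden = {doc for docs in PATENT_DB.values() for doc in docs}

print(sorted(indexed))   # the form page is found...
print(hidden & indexed)  # ...but none of the records behind it are
print(query("laser"))    # records surface only through a direct query
```

The crawler's reach is bounded by the link graph, so adding more records to `PATENT_DB` changes nothing it indexes; only issuing queries exposes that content.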

Thane Paulsen, BrightPlanet's general manager, likens traditional search engines to trawlers moving through the ocean with coarse nets that are wide but reach only a few feet deep.

Not surprisingly, relativ
