Commentary

Web Grepper Lets Marketers Search, Grab Hidden Data


The Web is filled with interesting data, but for most people it is not easily mined. The search engine Blekko launched a tool Wednesday that reveals coded information embedded in Web pages. It surfaces patterns or strings, such as sites that discuss and review Lady Antebellum's latest album, or the number of Google +1s, Facebook Likes or tweets related to Apple's rumored intention to spice up the MacBook for the holidays.

Vice President of Marketing Mike Markson calls the tool Web Grepper. It works by searching lines of code in Web files to identify or match domains based on specific topics and search terms.
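To make the idea concrete, here is a minimal sketch of a grep-style search over page source, written in Python. It is not Blekko's implementation, which the company has not detailed here. The function name grep_pages, the sample pages and their example.com addresses are all hypothetical, and ranking by number of matching lines is an assumption chosen only to keep the example short.

```python
import re
from urllib.parse import urlparse


def grep_pages(pages, pattern):
    """Return (domain, matching_line_count) pairs, most matches first.

    `pages` maps a URL to the raw source of that page. This is a
    hypothetical in-memory stand-in for a crawled index.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    counts = {}
    for url, source in pages.items():
        # Count the lines of page source that contain the pattern.
        hits = sum(1 for line in source.splitlines() if regex.search(line))
        if hits:
            domain = urlparse(url).netloc
            counts[domain] = counts.get(domain, 0) + hits
    # Assumption: rank by match count, then alphabetically by domain.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data; none of these pages are real.
SAMPLE_PAGES = {
    "http://music.example.com/review": (
        '<div class="fb-like"></div>\n'
        '<script src="http://connect.facebook.net/en_US/all.js"></script>'
    ),
    "http://news.example.org/story": "<p>No social widgets here.</p>",
    "http://blog.example.net/post": (
        '<script src="http://connect.facebook.net/en_US/all.js"></script>\n'
        "<script>FB.init({});</script>\n"
        '<a href="http://connect.facebook.net/">sdk</a>'
    ),
}

# Find domains whose source references Facebook's script host.
results = grep_pages(SAMPLE_PAGES, r"connect\.facebook\.net")
```

In this sketch, results lists only the domains whose source contains the pattern, so the page with no social widgets drops out. A production system would run the same kind of match across a full crawl rather than a dictionary in memory.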

Blekko engineers created the tool after getting calls from brands and ad networks asking for information, such as a list of sites using Facebook Connect in rank order or sites running a specific ad network. The decision to open the tool to the public came after the company developed a democratic process to determine which jobs the search engine would run.

While having access to more data through an open infrastructure may become the biggest benefit of the tool, there are concerns about Web Grepper being used to hack sites to obtain personal and private information. That's why Blekko put a "velvet rope" around processes, Markson said.

Someone who requests data must submit a job to the community, which votes up the request. Markson said this will help prevent Blekko from running a job for someone to obtain malicious or private data, such as social security numbers.

When I asked Markson why Google has not offered this data, especially since the search engine prides itself on providing transparency to search marketing and advertising, he pointed to a recent Webmaster video from Google's Matt Cutts. In the video, published last month, Cutts calls the data found in Web pages "regular expressions" -- and explains that we should not expect that from Google anytime soon because of a lack of requests for the feature.
