Commentary

Just an Online Minute... Inflated Traffic

It's no secret that activity from Web crawlers, robots, and spiders inflates website traffic figures. And because Web crawler activity is difficult to distinguish from human activity recorded in Web server log files, Web operators and advertisers often don't get accurate site traffic data from log file reports, which results in lots of wasted time and advertising dollars.

According to website auditor ABC Interactive, robots account for an average of 7% of page requests on a website, and the figure can range higher than 30% depending on the site.

WebSideStory, a company that specializes in website visitor behavior analysis, recently published a somewhat self-serving white paper on this very subject, entitled "When Web Traffic Statistics Do Not Compute." WebSideStory is the maker of HitBox Enterprise (www.hitboxenterprise.com), a real-time visitor analysis service I've praised on these pages in the past, which could be a solution to the inflated traffic figure problem.

The paper does a good job of explaining that Web crawlers, notably spiders and robots, are software tools that automatically retrieve data, often on behalf of a search engine, by "crawling" multiple Web servers looking for content to include in a database index. Server log files, which many companies rely on for gathering Web visitor data, count requests from robots and spiders as page views.

When these reports are not properly filtered or audited, the information does not accurately reflect human activity. Software that identifies and discounts activity from specified individual robots is available, but to be effective, it must be properly configured, run, and updated on an ongoing basis.
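To make that filtering step concrete, here is a minimal sketch in Python. It assumes the server writes logs in the common "combined" format, where the user agent is the last quoted field. The signature list and the sample log lines are hypothetical, invented for illustration, and a real list would need the ongoing updates described above.

```python
import re

# Hypothetical robot signatures (assumption, not a vetted list).
# A production list has to be maintained as new robots appear.
ROBOT_SIGNATURES = ("bot", "crawler", "spider", "slurp")

# Combined log format ends with: "request" status bytes "referer" "user agent"
LOG_LINE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def is_robot(user_agent):
    """Return True if the user agent contains a known robot signature."""
    agent = user_agent.lower()
    return any(signature in agent for signature in ROBOT_SIGNATURES)


def count_page_views(lines):
    """Return (human, robot) request counts; unparseable lines are skipped."""
    human = robot = 0
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if match is None:
            continue
        if is_robot(match.group("agent")):
            robot += 1
        else:
            human += 1
    return human, robot


# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" '
    '200 2326 "-" "Mozilla/4.08 [en] (Win98; I)"',
    '192.0.2.7 - - [10/Oct/2000:13:55:40 -0700] "GET /index.html HTTP/1.0" '
    '200 2326 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '192.0.2.9 - - [10/Oct/2000:13:56:02 -0700] "GET /about.html HTTP/1.0" '
    '200 1200 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
]

if __name__ == "__main__":
    humans, robots = count_page_views(SAMPLE_LOG)
    print(f"human requests: {humans}, robot requests: {robots}")
```

A robot that sends a browser-like user agent slips straight through a filter like this, which is one reason signature lists alone tend to undercount robot traffic.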

WebSideStory says its HitBox Enterprise collects data directly from the browsers of visitors to a site. Most robots - of which there are about 750 - harvest only textual content and do not download images or execute JavaScript code. By relying on JavaScript and an image request, neither of which most robots trigger, HitBox Enterprise filters out robots with a fairly high success rate.
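The general idea can be sketched in a few lines of Python. This is not WebSideStory's implementation, only an illustration of the principle: if each page carries a script that requests a small tracking image, then clients that fetch pages but never request the image are probably not browsers. The image path `/beacon.gif`, the client labels, and the sample requests are all hypothetical.

```python
from collections import Counter

# Hypothetical URL of the tracking image that the page's script requests.
BEACON_PATH = "/beacon.gif"


def summarize(requests):
    """Compare raw page requests with tracking-image requests.

    `requests` is an iterable of (client_id, path) tuples. Returns
    (raw_page_requests, tagged_page_views, suspected_robots), where
    suspected_robots lists clients that fetched pages but never the image.
    """
    page_requests = Counter()
    beacon_hits = Counter()
    for client_id, path in requests:
        if path == BEACON_PATH:
            beacon_hits[client_id] += 1
        elif path.endswith(".html"):
            page_requests[client_id] += 1
    raw = sum(page_requests.values())
    tagged = sum(beacon_hits.values())
    suspected = sorted(c for c in page_requests if beacon_hits[c] == 0)
    return raw, tagged, suspected


# Made-up traffic: two browsers run the script, one crawler does not.
SAMPLE_REQUESTS = [
    ("browser-a", "/index.html"),
    ("browser-a", "/beacon.gif"),
    ("crawler-x", "/index.html"),
    ("crawler-x", "/about.html"),
    ("browser-b", "/about.html"),
    ("browser-b", "/beacon.gif"),
]

if __name__ == "__main__":
    raw, tagged, suspected = summarize(SAMPLE_REQUESTS)
    print(f"raw page requests: {raw}")
    print(f"page views confirmed by image request: {tagged}")
    print(f"clients that never requested the image: {suspected}")
```

In this made-up sample the log file would report four page requests, while only two are confirmed by an image request. Human visitors who browse with images or JavaScript turned off would also be missed, so the approach is not perfect either.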

Bottom line here? I recommend you request HitBox numbers from the websites where you buy ads, or risk paying for ad impressions seen by robots.
