Online Media Daily, Friday, Aug 20, 2010

IBM Builds Search Engine, Redefines Analytics Tools

by Laurie Sullivan, Aug 19, 2010, 6:12 PM

IBM has created a search tool that allows North Carolina State University to crawl through massive amounts of Web data on blogs, forums, reports, industry-related news portals and government Web sites. Much like a search engine bot, the tool gathers data in response to a query and produces a short list of potential investors for projects.

NC State's Office of Technology Transfer manages more than 3,000 technologies invented by students, faculty and staff. The seven-member staff typically searches the Internet manually, looking for potential investors who can help bring those technologies to market.

"The analytics and the language tools take a user defined set of criteria and searches the Web," explains Billy Houghteling, director of NC State's Office of Technology Transfer. "For both pilots, we identified the sites and resources the tools needed to crawl. Both searched more than 1.4 million sites to find contacts. I can't fathom how long it would take for a member of my staff to do that type of exercise."

Historically, it would take between two and four months to identify a short list of potential investors. IBM's newly defined analytics "search engine" cuts that down to between 10 days and a couple of weeks. While the analytics tools validated the process, they also identified many new possible partners.

Developed in IBM Labs, the analytics technology used in the pilot (BigSheets and Content Analyzer in the IBM Cognos analytics suite) crawls the Web and mines large amounts of unstructured data. The analysis, based on factors such as business relevancy, government policies, market needs and trends, shortens a time-intensive and inefficient process.

While BigSheets, built on Hadoop technology, supports high-level ad hoc exploration of very large data sets, Content Analyzer provides sophisticated data analysis. Both tools offer a Web-based interface, but BigSheets provides a visualization feature highlighting the relationships between the data. Simply put, BigSheets keeps the data in its original format; Content Analyzer creates a data index while it scans the information.

Those using BigSheets would point the tool toward the Web sites or data sources they want to mine and allow the application to collect the information. They could then explore the data much as they would a spreadsheet. Both tools crawl Web sites to find and collect data, but Content Analyzer indexes the data, while BigSheets parses it and stores the bits and bytes in their original form.
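The difference between keeping data in its original form and indexing it while scanning can be sketched in a few lines of Python. This is only an illustration of the two general approaches, not IBM's implementation; the page URLs, the sample text and every function name here are hypothetical.

```python
from collections import defaultdict

# Hypothetical crawled pages; the URLs and text are made up for illustration.
PAGES = {
    "http://example.com/a": "Biotech investors fund new sensor technology",
    "http://example.com/b": "State policy report on sensor markets",
}

def store_raw(pages):
    """Keep every document exactly as collected (the 'original format' approach)."""
    return dict(pages)

def scan_raw(store, term):
    """Ad hoc exploration: read through every stored document at query time."""
    term = term.lower()
    return sorted(url for url, text in store.items()
                  if term in text.lower().split())

def build_index(pages):
    """Build an inverted index (word -> set of URLs) while scanning the pages."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def lookup(index, term):
    """Answer a query from the index without re-reading the documents."""
    return sorted(index.get(term.lower(), set()))

if __name__ == "__main__":
    print(scan_raw(store_raw(PAGES), "sensor"))
    print(lookup(build_index(PAGES), "investors"))
```

The raw store leaves every question open at the cost of rescanning the data for each query, while the index answers the questions it was built for quickly.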

Chris Spencer, emerging technology strategist at IBM, made it clear that both tools follow the search and index guidelines set out in each site's robots.txt directives, so the tools are "friendly" crawlers that follow the rules set by Web site owners.
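How a crawler honors robots.txt can be shown with Python's standard `urllib.robotparser` module. This is a minimal sketch of the general technique, not the code IBM's tools run; the robots.txt content, the `example-bot` user agent and the URLs are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed(robots_txt, user_agent, url):
    """Return True if the site's rules permit this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for url in ("http://example.com/reports/2010.html",
                "http://example.com/private/contacts.html"):
        verdict = "fetch" if allowed(ROBOTS_TXT, "example-bot", url) else "skip"
        print(verdict, url)
```

A "friendly" crawler runs a check like this before every request and skips any URL the site owner has disallowed.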

NC State has full use of the tool while the university evaluates it, Spencer says. At the request of NC State, IBM continues to work with the university to determine other uses beyond the requirements of the Office of Technology Transfer.




1 comment on "IBM Builds Search Engine, Redefines Analytics Tools".

  1. Ian Gilyeat from I.R. Gilyeat & Company
    commented on: August 20, 2010 at 8:06 p.m.

    This looks like a piece of technology that could be useful to those doing family history research, criminal search or defense. Crawl the web for names, looking for contextual information that relates to those names.
