Can Rare Words Identify Search Index Size?
With a table of rare words and plenty of context, Bill Slawski works through the process of approximating the size of a search engine's index by querying it for rare words. He writes that knowing the size of the index allows him to compare Bing with Google, for example.
Slawski explains that the rare word count works better than looking at the most frequently appearing words. To get started, he takes English words from the Phrontistery's Compendium of Lost Words and searches for each term on Google Caffeine, Google, Yahoo, Bing, Ask and Cuil, looking for words that return fewer than 1,000 search results.
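
As a rough illustration of the comparison step, the sketch below keeps only words that return fewer than 1,000 results on every engine and then takes the median ratio of one engine's counts to another's. This is not Slawski's own calculation, which this summary does not spell out; the median-ratio approach, the function names, the placeholder words and all of the counts are assumptions made up for the example. It also yields only a relative comparison between two engines, not an absolute index size.

```python
from statistics import median

# Hypothetical result counts per engine. The words and numbers are
# placeholders invented for illustration, not data from the article.
SAMPLE_COUNTS = {
    "word_a": {"Google": 400, "Bing": 100},
    "word_b": {"Google": 200, "Bing": 100},
    "word_c": {"Google": 5000, "Bing": 900},  # too common on Google
    "word_d": {"Google": 800, "Bing": 800},
}


def rare_words(counts, threshold=1000):
    """Return words whose result count is below `threshold` on every engine."""
    return [
        word
        for word, by_engine in counts.items()
        if all(count < threshold for count in by_engine.values())
    ]


def relative_size(counts, engine, baseline, threshold=1000):
    """Median ratio of `engine` counts to `baseline` counts over rare words.

    Words with a zero baseline count are skipped to avoid dividing by zero.
    Returns None when no usable word remains.
    """
    ratios = [
        counts[word][engine] / counts[word][baseline]
        for word in rare_words(counts, threshold)
        if counts[word][baseline] > 0
    ]
    return median(ratios) if ratios else None


if __name__ == "__main__":
    print(rare_words(SAMPLE_COUNTS))
    print(relative_size(SAMPLE_COUNTS, "Bing", "Google"))
```

The median is used here instead of the mean so that a single word with an unusual count on one engine does not dominate the estimate. Result counts reported by search engines are themselves estimates, so any ratio derived this way should be read as approximate.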