The first state governments that Google will partner with include Arizona, California, Utah and Virginia, and the charge is to increase the crawlability of their respective Web sites and invisible databases for all major search engines - not just Google - via the Sitemaps.org protocol. The company also plans to educate government CTOs and CIOs on basic crawlability principles.
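For readers unfamiliar with the Sitemaps.org protocol: a sitemap is simply an XML file listing the URLs a site wants crawled, which is how records buried behind a database search form can be surfaced to search engines. Below is a minimal sketch, in Python, of how an agency might generate one. The `build_sitemap` function, the `records.example.gov` URLs, and the dates are hypothetical and for illustration only; they do not describe any state's actual setup or Google's own tooling.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a sitemap XML string.

    entries: iterable of (url, lastmod) pairs, where lastmod is a
    YYYY-MM-DD string or None. This helper is a hypothetical example.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL for us.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical database-backed record pages an agency wants crawled.
records = [
    ("https://records.example.gov/license?id=1001&type=broker", "2007-04-30"),
    ("https://records.example.gov/license?id=1002&type=broker", None),
]

print(build_sitemap(records))
```

The resulting file would typically be placed on the agency's Web server and referenced from its robots.txt file, so that any search engine supporting the protocol, not just Google, can discover the listed pages.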
In effect, Google's efforts will result in the wider broadcasting of public data into the results pages of the major search engines, representing one of the biggest steps in the accessibility and distribution of public information since governments began making data accessible on Web sites.
J.L. Needham, manager of the Public Sector initiative, elaborated on the types of data the company is looking for: "Google is not seeking to make any specific type of information in a public government database accessible to search engine users -- we're intent on making all such information accessible," said Needham. "So, this includes property records, court records, reports from a department of education on school performance, a health department's licensing records from medical practitioners, RFPs from a housing authority, a workforce services agency's job postings service, and on down [the] list of the dozens of governmental information services. If it's public information and not in a search engine, it's the target of Google's initiative."
Needham also pointed out that Google is only seeking data intended by the government to be truly public. He says, "Where the laws of a jurisdiction or procedures of a given agency render a particular type of information inaccessible to the public (such as California's law on driver's license data), such information falls outside the scope of this effort."
The implications are tremendous on a variety of fronts, including the potential for access to previously obscured databases, increased transparency of government data, and the provision of stiff competition from search engines to companies that profit from fee-based public data access. There is also potential for citizens to collaboratively assist the government in helping to scrub incorrect information, or remove data that is not intended to be public, such as Social Security numbers.
While it is every citizen's right to have open access to public data, the idea of having certain types of information exposed on the Web will likely make many feel a bit squeamish, especially when it is tied into common-name- and corporate-identity-related searches.
I ran a few sample queries related to some of the public records that have been made available, and found many to rank highly for proper name and corporate name queries. For one person who was a real estate broker, a search of his proper name yielded a #1 result in Google, suggesting a "consumer complaint," a "disciplinary action," and a two-month "suspension of broker license" -- all included in the snippet of text in the SERP.
There are two basic dynamics to this example search. The first is that if I happened to be researching this person in the interest of listing my house for sale, the information would be invaluable in deciding whether or not to do business with him. Before search engines, retrieving this type of information might have taken days or weeks, or required a trip down to the public records building. But instead, as Needham put it, "[we would like to] ensure that citizens are just one search away from the information in public databases."
The second dynamic is of the real estate broker's perception of this results set. Upon becoming aware of his listing in the search results, he may believe that the sum of his identity, as inferred by a #1 ranking in Google, is unfair and not representative of his life and work. As this result is now a threat to his livelihood, he has a strong motivation to have this listing removed, or at least pushed down. The same might go for anyone else who did not like the perception of the way the results are stacked against their name, whether it is perceived to be a privacy violation, incorrect data or confusion of identity.
If the idea of Google and other search engines making this data easier to access seems outrageous, stop and take a deep breath, and remember: it's all public data. Open accessibility to public data is what the U.S. Government fully intended at its conception, partially in order to maintain the honesty and integrity of those who govern.
A generation ago, the main ways of accessing government documents were snail mail, phone, or a trip down to your local, state or federal archives - the last of which cost more than average citizens could afford in order to exercise their rights. As Needham also said, "We should expect all information made public by our government to be truly public, which means accessibility through search engines."
The goals and scope of Google's initiative are clear, and there's no question that the SERPs are about to be silently shaken up a bit. And it's going to be a wild ride for reputation management, accountability, open accessibility and search engine marketers alike.
Correction: Monday's Search Insider cited eBay's Google Search ad spend as $26 million monthly (prior to pulling out of Google Search advertising). The correct amount was roughly $25 million, quarterly.