February 09, 2011

Autocomplete Part I: Mapping the World of Autocomplete

Building on the recent fascination with the United States of Autocomplete map, we thought we'd expand its premise to look at the entire world. In short, we'd type the name of every country into Google and record the top-ranked autocomplete suggestion, i.e., Google's guess at what you are looking for. Once we started working, it quickly became apparent that the results we were getting in the U.S. sometimes differed dramatically from the results we found in the United Kingdom.

Suddenly what had been a simple mapping exercise became an exciting means of better understanding the geographic differences in search patterns. Cool! You gotta love it when stuff like that happens.

Because it's hard to fit so much data in a static map, we've created a mashup that you can download as a KMZ file and view in Google Earth. (By the way, we hope you like the iconography. We've been looking for a good excuse to use it.) As the map is a bit complicated, a few words of explanation are in order.
  • We used a list of countries maintained by the CIA World Factbook. Obviously this exercise can be replicated with any other list of place names.
  • We conducted the searches in January 2011.
  • The icons are generally centered over the capital city of a country.
  • The blue icons represent the autocomplete results obtained in the U.S. (specifically Lexington, KY) and the red icons (offset a bit for readability) represent the results from Oxford, UK.
  • The label for each icon contains the search term, the location of the search and the top-ranked autocomplete result. For example, the label "India (UK): indian visa" indicates that the first autocomplete entry in Oxford was "indian visa".
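
For readers who want to replicate the map with their own list of places, here is a minimal sketch of how the labels and offset icons described above could be put together. It is an illustration, not the script we used: the function names, the approximate capital coordinates and the size of the offset are all assumptions made for the example, and icon colors and styling are left out. The suggestions themselves are ones quoted in this post.

```python
from xml.sax.saxutils import escape

# Top-ranked suggestions quoted in this post, keyed by (search term, location).
RESULTS = {
    ("India", "US"): "Indianapolis Colts",
    ("India", "UK"): "indian visa",
    ("Panama", "US"): "Panama City Beach",
    ("Panama", "UK"): "canal",
}

# Assumed values for illustration only: approximate capital coordinates
# as (longitude, latitude), and an arbitrary offset for the UK icon.
CAPITALS = {"India": (77.21, 28.61), "Panama": (-79.52, 8.98)}
UK_OFFSET_DEG = 0.5


def make_label(term, location, suggestion):
    """Build a label in the form 'India (UK): indian visa'."""
    return f"{term} ({location}): {suggestion}"


def make_placemark(term, location, suggestion):
    """Return a bare KML Placemark centered on the capital city.

    The UK icon is nudged east so it does not sit on top of the US icon.
    """
    lon, lat = CAPITALS[term]
    if location == "UK":
        lon += UK_OFFSET_DEG
    name = escape(make_label(term, location, suggestion))
    return (
        "<Placemark>"
        f"<name>{name}</name>"
        f"<Point><coordinates>{lon:.2f},{lat:.2f},0</coordinates></Point>"
        "</Placemark>"
    )


def differing_terms(results):
    """List the search terms whose top suggestion differs between US and UK."""
    terms = sorted({term for term, _ in results})
    return [
        t for t in terms
        if results[(t, "US")].lower() != results[(t, "UK")].lower()
    ]


if __name__ == "__main__":
    for (term, location), suggestion in sorted(RESULTS.items()):
        print(make_placemark(term, location, suggestion))
    print(differing_terms(RESULTS))
```

The placemarks would still need to be wrapped in a KML document and zipped to produce a KMZ file; the point here is only the label format and the side-by-side comparison of the two locations.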



Take it out for a spin and see what you find. What we've noticed from this exercise is that the location of the searcher clearly matters. We're not exactly sure how Google decides what other searches to include in its autocomplete (nor do we think they will tell us), but the differences in our results provide some clues.

  1. Google autocomplete is incorporating geocoded data. The best example of this is that in Lexington, searches on the terms China and Nicaragua return "China Star Lexington KY" and "Nicaraguan Grill Lexington KY", two local restaurants in the city (by the way, the Nicaraguan Grill makes a great Nacatamale). This same geocoded effect does not show up in Oxford, but the Lexington results show that there is a blending of regular search and spatial search.
  2. Second, the autocomplete suggestions appear to be shaped (in part) by the makeup of other user searches in geographic proximity. The example of the restaurants above supports this idea, as do the results for India in the U.K. and U.S. Whereas "indian visa" is the first suggestion in the U.K. (reflecting the long colonial and migration connections), the first suggestion in Lexington is "Indianapolis Colts", a football team based only a few hundred miles away. Likewise, a search for Panama in Lexington results in "Panama City Beach" (located in Florida) rather than "canal", as found in the U.K.
  3. Third, and perhaps most intriguing, is the way these differences illuminate the varying ways in which countries are conceived of (at least in terms of search queries) in separate locations. For example, in Lexington, both Kazakhstan and Bulgaria generate the suggestion of "adoption" (decidedly different from the U.K. results), perhaps linking these countries in the minds of searchers near Lexington with international adoption. While these countries are not the largest sources of adopted children (China and Russia are 1 and 2), Bulgaria and Kazakhstan (in particular) are connected to the U.S. via adoption and, moreover, are less likely to have other competing searches. Hence adoption is the first suggestion. In a similar vein, a search for "British Indian Ocean Territory" in Lexington suggests "flag", while in Oxford "holiday" is the top result.
  4. There is also a clear element of temporal closeness. The search for North Korea results in "bombs South Korea", which was an important news story during our searches.
  5. It is also clear that correctly interpreting a user's intent based on limited input remains a challenge. A search for Turkey results in the suggestions of "brine" and "cooking time".
  6. Finally, it seems that autocomplete suggestions can be dominated by the strong presence of commercial/business representations online. For example, "tractor parts" is the top result for a search on the term Belarus in Oxford, most likely because the domain Belarus.com is for a tractor manufacturer. Again, the low level of Belarusian references online is likely also contributing to this.

While these results are really enlightening, getting a larger sample of searches from a range of locations is important to help explore this phenomenon. And this is where you, dear reader, come in. Stay tuned for the next post when we work on crowdsourcing the geography of autocomplete.

4 comments:

  1. Belarus has a large and famous tractor factory, that isn't spam.

  2. I've been playing with the same thought, great to see some real investigation! It would be interesting to try with contentious search terms such as 'climate change' in several languages across several countries, that is, looking not at how a specific country "views" other countries, but comparing how different countries view a specific issue.

  3. Alph, point well taken. I edited the content to reflect this. We were most interested in highlighting how the strong presence of one entity could dominate the autocomplete results for a country. Not spam but also a somewhat surprising representation.

    Andreas, this is an excellent idea. We'll see if we can pursue this as well.

  4. On a much less granular level I was interested to see this week that Google India had a special logo and click-through for International Women's Day that .co.uk, .com and .co.jp were not using. Is this based on hits or input from their local staff?
