December 24, 2009

Happy Holidays from

With good tidings from all of us, the Floatingsheep team wishes you all a very happy holiday season, no matter your religious preference. With Christmas coming soon, we don't want you to go out chasing Old Saint Nick, as we're still a bit unsure of his whereabouts. We are sure, however, that our investigative work on Christmassy geographies is featured in the December 24th edition of the Lexington Herald-Leader. From the article:
Deep in the bowels of the geography department earlier this month, while Zook was engaged in his real work on how people use spatially based Internet data, he thought he'd come up with what passes for academic humor. He wondered how he could locate the exact whereabouts of Santa (because, really, who doesn't want to know this?) and torture his graduate students (and, really, who doesn't want to do this?) at the same time.
Enjoy the write-up by Amy Wilson and your holiday celebration, no matter where you may be. And if you're really that concerned about where Santa is, you can always track him on Google Maps throughout Christmas Eve. HO HO HO!

December 20, 2009

Searching for Santa: Locating the most Christmassy Points in the World

A question asked by children and adults for generations has been, "Where does Santa live?" While some may scoff that there is an obvious answer to this ("The North Pole"), any rational thinker easily sees why that simply cannot be. The lack of a suitable landmass to construct the necessary castle and workshops, the deficit of a robust power grid and the complete absence of basic raw materials like wood, plastic or sugared plums make the North Pole a poor location for any sort of industrial- or craft-style production. Moreover, the modern obsession with planting flags (both above and beneath the ice) guarantees a steady stream of unwanted (and potentially naughty) visitors.

It is far more reasonable to suppose that Santa has utilized a combination of locational analysis, centrography, transportation topographies and central place theory to select an optimal site for his headquarters. However, since access to his list of priorities (including secrecy) and model specifications is closely guarded, replicating Santa's thinking process is simply not possible.

Instead, the Anglo-American research team decided to leverage the power of Web 2.0 technologies (user-produced services and content) to triangulate Santa's location. After all, the collective knowledge of the Internet is clearly more than any one of us alone. Right? Right?

Using the patented approach we searched for references to "santa" and "reindeer" in user generated placemarks indexed by Google Maps. After all, Santa and Reindeer go together almost as well as that classic cinema couple, Turner and Hooch. Unfortunately for lovers of folk tales, the polar projections below illustrate that there is a decided dearth of references to Santa at the North Pole.

Polar Projection of Santa
Instead we see that the entire Nordic region of Europe is covered in a virtual "duvet of Santa"! North America needs to be content with a much lighter "blanketing of St. Nick". If one assumes that Santa needs to be located as close to the pole as possible, then a few other extreme northern locations also emerge, such as the "coverlet-ing of Father Christmas" on Svalbard and the "quilt-ing of Pere Noel" on the Severnaya Zemlya archipelago.

Polar Projection of Reindeer
Reindeer are much less prevalent than Santa (which is hard to understand given the 8:1 ratio) but the Nordic region, Svalbard and Alaska are all looking like strong contenders.

However, it is only when we amalgamate Santa and Reindeer together in some kind of googlistic geo-genetic goo that we are able to zero in on the exact locations of Santa's global enterprise. (And they called us MAD! We'll show them!) We will of course not reveal the exact locations (we're hoping for more than coal in our stockings) but will highlight the general areas.

The MegaChristmas Index – Global View

The MegaChristmas Index – Polar View
In retrospect it seems so obvious, but the most Christmassy points in the world are Los Angeles (measured in raw Christmasness) and near the town of Kittilä, Finland (measured in Christmasness per capita). Clearly in the 21st century, Santa has recognized the value of geographical diversification in order to leverage the competitive advantages of each location. Los Angeles offers access to the creative talent of show business and the technological innovation of a world class manufacturing milieu. Kittilä offers... Trees? Moss? Rare lichen? Since we are less familiar with Northern Finland than befits someone in today's networked society, the locational advantages of Kittilä must wait until another posting. Any Kittilä-ites (-onians? –ese? –ians?) are welcome to address this issue as well.

We were at first stymied by the strong showing of Angola for reindeer but upon reflection we theorize that this is a likely location of Santa's post-December vacation. According to this theory, Santa flies his reindeer team for several well deserved weeks of R&R incognito. Since reindeer, however, are not indigenous to tropical climates, their presence does not go unnoted. Likewise, trips to the Falkland Islands, New Zealand, Australia and Florida seem highly probable as well. It should be noted that this is simply a theory and unlike the rigorous analysis on the location of Santa's workshop, further research on this topic is needed.

Likewise we plan on taking a closer look at the sub-national networks of Santa's enterprise. The U.S. maps below confirm Southern California's Santaness but show some highly suspicious clusters of reindeerness in Texas and Missouri. Do these represent regional distribution centers? R&D centers? Back office customer support? Only further research will tell.

Santa Normalized in the U.S.
Reindeer Normalized in the U.S.

So. Age old question answered through the judicious use of technology.

We just hope we don't end up on the naughty list for this.

December 16, 2009

User-Created Geographies of Religion: Allah, Buddha, Hindu, Jesus

Are there distinct geographies to religious references in user-created content indexed by Google? The following maps will demonstrate that there undoubtedly are.

User Generated References to Allah

User Generated References to Buddha

User Generated References to Hindu

User Generated References to Jesus

High rankings (in terms of specialization and absolute references) are often found in the most likely regions. For example, the Middle East, North Africa, and Muslim parts of South and Southeast Asia are all characterised by a significant amount of specialization and a large number of references to "Allah."

References to "Buddha" are similarly clustered in East and Southeast Asia, the Himalayas and Sri Lanka. The geography of references to "Hindu" is even more clustered. Here, the Indian Subcontinent, Afghanistan, Angkor Wat, Bali, Singapore and Kuala Lumpur (two cities with large Indian populations) have a large number of references.

References to "Jesus" are more broadly distributed than any of the other three terms, but still show an incredible degree of concentration. The Americas, Western Europe and the Philippines are blanketed by references to Jesus.

Unlike user generated references to sex and business, religious search terms tend to display a geographic concentration in both absolute and relative terms. Or in other words, it is interesting that sex and business are far more global in scope than even these four very global religions.

December 14, 2009

Peer produced business and sex

One of the real advantages of user generated placemarks is that there are no restrictions on the type of references that can be made. Historical references, pop culture icons and everyday minutiae are all potential topics for placemarks. With this breadth in mind, we wanted to see how the common global memes of "business" and "sex" become evident via the geoweb.

In each of the following maps, the size of the black circles indicates the absolute number of references to either "sex" or "business" in user-created Google placemarks. The shading of each map represents the specialization in references to each term (each term was compared to an index of all other user-generated content).

User Generated References to Business
Sex and business clearly have distinct albeit related geographies. Not surprisingly, the developed world has the largest concentration of both types of placemarks, consistent with the information inequality we've already noted.

North America, Japan and much of Europe are largely blanketed by references to business, while most of the rest of the world is characterized by far fewer virtual references. The UK and North America also have a high degree of specialization in terms of references to business, but high values are also present in non-Western countries that have strong ties to global business networks. As the largest low cost manufacturer, China shows a high degree of business specialization, as does much of Central America, which recently entered the Central America Free Trade Agreement (CAFTA). The two largest economies of sub-Saharan Africa (Nigeria and South Africa) are specialized in business, as is the U.A.E. (where Dubai is located). Other countries such as Indonesia and Hungary are highly specialized as well.

User Generated References to Sex
Interestingly, references to business are much more geographically dispersed than references to sex. Again, in absolute terms, the United States, Northern Europe and Japan have by far the most references to sex. However, when looking at specialization, intriguing patterns emerge. The United States and parts of Northern Europe (particularly the UK, Sweden, Germany, the Netherlands, Iceland, and for some reason the Norwegian island of Svalbard) continue to be ranked highly.

Yet it is large parts of Africa that contain the highest degree of specialization. Or, in other words, user-generated content in countries like Nigeria, Kenya and Tunisia is far more likely to contain references to sex than user-generated content in most other places. While one would expect to see a degree of specialization in countries like the Netherlands (due to the well-known sex industry of Amsterdam), the amount of specialization in places like Mauritania, Zambia and Lesotho is truly surprising. It could simply be a spurious result based on the generally low number of user generated placemarks in those locations. Alternatively, it suggests that "sex" may be one of the first topics on which people comment about a place and it is only later that more mainstream foci appear.

December 11, 2009

Finding a Restaurant

Finding a restaurant can be one of the most vexing tasks in modern life and an extremely useful application of Google Maps is getting help locating nearby establishments. The map below shows the number of user-generated placemarks containing the word "restaurant". The density of restaurant references corresponds closely with the distribution of population in the United States and Canada. In particular, the densely populated Northeast is blanketed with New York City containing the largest concentration.
When user generated placemarks are compared to regular Google Maps directory listings one sees essentially the same pattern of clusters, albeit at a higher density. For example, the largest number of directory listings of restaurants (again in New York City) is about 25 percent higher than user generated ones. Moreover, more rural areas (see the eastern U.S.) clearly have a high number of directory listings relative to user generated ones.
This suggests that user generated placemarks are biased towards urban areas, where early technology adopters are most likely to dwell and to use these tools.

December 10, 2009

Swine flu: a user-generated pandemic?

In a recent post, Nate Silver delves into mapping the spatio-temporal diffusion of swine flu in the US, via Google Flu Trends. Drawing from queries referencing swine flu, the map below shows the approximate date at which state-wide searches for "swine flu" crossed a particular threshold, potentially signifying the onset of what has become a swine flu pandemic. According to Silver, the date at which the relative number of searches reaches the indexed value of 5000 serves as a proxy for measuring the diffusion of the year's most talked about genetic mix-up.
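As a rough illustration of the threshold idea described above, the sketch below finds the first date on which a state's indexed search value reaches 5000. The function name `first_crossing`, the state labels and all of the numbers are made up for illustration; this is not Silver's code or data.

```python
from datetime import date

THRESHOLD = 5000  # indexed search value mentioned in the post


def first_crossing(series, threshold=THRESHOLD):
    """Return the earliest date whose value reaches the threshold, else None.

    `series` maps a date to an indexed search value (hypothetical structure).
    """
    for day, value in sorted(series.items()):
        if value >= threshold:
            return day
    return None


# Invented example values, one time series per (hypothetical) state.
searches = {
    "State A": {
        date(2009, 4, 26): 1200,
        date(2009, 5, 3): 5600,
        date(2009, 5, 10): 4100,
    },
    "State B": {
        date(2009, 4, 26): 300,
        date(2009, 5, 3): 900,
    },
}

# Proxy "onset" date per state; None means the threshold was never reached.
onset = {state: first_crossing(s) for state, s in searches.items()}
```

Mapping the resulting dates state by state would give a picture of diffusion over time, with the caveat that search interest need not track actual cases.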
So we know when and where people were looking for information about swine flu, but what about geo-references to the virus? How does the geography of swine flu differ between Google Flu Trends and user-generated Google Maps placemarks? How do Google's multiple representations compare to the actual number of cases of swine flu in the United States?

Although the CDC has stopped collecting data on the outbreak of swine flu on a state-by-state basis, the regional-level data in the map above shows the concentration of swine flu cases. The upper Midwest, for example, which has the highest number of swine flu infections in the country, only recently surpassed the 5000 point mark on Google Flu Trends. Clearly the act of searching for information on swine flu need not closely correspond to the number of cases. And while this region shows significant clustering in user-generated Google Maps placemarks, the values fail to approach the maximums for the nation as a whole. The peer produced geography of swine flu also seems to support CDC statistics for the southeastern US (showing a relatively high infection rate), while the Flu Trends data fails to match accordingly both there and along the US-Mexico border.
The greatest number of mentions of swine flu in user-generated placemarks is located in Baltimore, Maryland - part of Region 3, which is home to the second-most cases of swine flu in the US. However, as one moves up the DC-Philadelphia-NYC-Boston metropolitan corridor there is an increasing disconnection between the online representations and material reality of swine flu. Although the absolute and population-adjusted number of actual swine flu cases in Regions 1 and 2 (home to Boston and New York respectively) are relatively low compared to other regions, they are highly visible in terms of user generated placemark references to H1N1 or swine flu.
The population-adjusted map does, however, give a much clearer picture of the swine flu landscape in the US. Both the west coast and upper midwest, despite having the highest incidence of swine flu in the country, were previously overshadowed by the population centers of the east coast. Normalized by population, the placemark density comes to mirror much more closely the actual diffusion of swine flu across the country.

December 08, 2009

Toronto and Cape Cod are the "funnest" places in North America

These maps illustrate the distribution of "fun" in North America as defined by user generated placemarks containing the term. Luckily for society, fun seems to be well dispersed and corresponds with the distribution of population. In other words, where there are people there is also fun. But one can also see concentrations and specializations in fun.

For example, Toronto has a massive (dare we say strategic?) reserve of fun clustered around it. Who knew? I have fond memories of my trips to Toronto but had no idea. The film festival is great, the neighborhoods are fantastic and the underground walkways keep you warm in the winter but how does it all come together to make this mother lode of fun? Jane Jacobs clearly had it right. Perhaps this will become the next invisible export for the region's economy.

Also the Northwest is suspiciously fun. How does that work with all the rain?

Clearly, some means of standardizing "fun" needs to be done to separate the large concentrations from the places that truly specialize in fun. When we use population, i.e., fun per capita, it turns out that Cape Cod, a place outside of Ogden, Utah and Cancun, Mexico have the most fun per person in North America. But before you start planning a vacation to the Great Salt Lake, remember that the high showing outside of Ogden was largely due to a very small population figure.
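To make the small-population caveat concrete, here is a minimal sketch of a per capita calculation. The `per_capita` helper, the `min_population` cutoff and the numbers are all hypothetical and are not the values behind the maps.

```python
def per_capita(hits, population, per=1000, min_population=0):
    """Return placemark hits per `per` residents.

    Returns None when the population is zero or below `min_population`,
    an illustrative cutoff for guarding against tiny denominators.
    """
    if population <= 0 or population < min_population:
        return None
    return hits * per / population


# Invented figures: (placemarks mentioning "fun", population).
cells = {
    "big city": (5000, 2_500_000),
    "small place": (40, 500),
}

# Without a cutoff, the tiny place looks wildly "funner" per person.
rates = {name: per_capita(h, p) for name, (h, p) in cells.items()}

# With a cutoff of 1,000 residents, the tiny place is set aside.
filtered = {
    name: per_capita(h, p, min_population=1000)
    for name, (h, p) in cells.items()
}
```

In this toy case the small place scores 80 hits per 1,000 people against 2 for the big city, despite having only 40 placemarks in total.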

December 03, 2009

November 30, 2009

Baptists, bibliophiles, and bibles, Oh My!

Two powerful and often opposing forces within society are faith and reason. Regardless of the extent to which a cultural war exists, the balance between the two (e.g., teaching evolution in the schools, etc.) is a prominent feature of popular socio-political discourse in the United States. Thus, the topic makes a perfect subject for a map and leads us to ask which parts of the country prefer bookstores to bibles? What's the ratio of Baptists to bibliophiles?
Using the number of Google Maps directory listings[1] for "bookstores" and "churches" as proxy values, this visualization maps the spectrum of the faith and reason conflict. As there are an overwhelmingly larger number of churches than bookstores nationwide, it is important to index each of these variables before comparison. The technique used in this map was to divide the number of churches (or bookstores) at a location by the national average of churches or bookstores. If a location had twice the number of churches as the national average it would receive an indexed value of 2. Similarly, having only 50 percent of the national average of bookstores would produce an indexed value of 0.5. The church index was then divided by the bookstore index to see each location's relative balance of churches to bookstores. If each of the indexed values were the same, the faith-reason index would be equal to 1. But as in the case of the example above (church index = 2, bookstore index = 0.5) the faith-reason index would be 4. This indicates that this particular location has a much higher relative number of churches to bookstores. In order to exclude places that had approximately equal number of churches and bookstores, this map only includes locations where the faith-reason index was skewed more than 20 percent in either direction (i.e., values greater than 1.2).
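The indexing steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: the "national average" is taken as the mean across the locations supplied, the lower cutoff for the bookstore direction is assumed to be the reciprocal of 1.2 (the post only states the 1.2 value), and the counts are invented.

```python
def indexed(counts):
    """Divide each location's count by the mean count across all locations."""
    mean = sum(counts.values()) / len(counts)
    return {loc: n / mean for loc, n in counts.items()}


def faith_reason_index(churches, bookstores, skew=0.2):
    """Church index divided by bookstore index, keeping only skewed locations.

    Assumption: "skewed in either direction" means a ratio above 1 + skew
    or below 1 / (1 + skew). Locations without bookstores are skipped.
    """
    church_idx = indexed(churches)
    book_idx = indexed(bookstores)
    result = {}
    for loc, c in church_idx.items():
        b = book_idx.get(loc, 0)
        if b == 0:
            continue  # ratio undefined without any bookstores
        ratio = c / b
        if ratio > 1 + skew or ratio < 1 / (1 + skew):
            result[loc] = ratio
    return result


# Invented counts for three hypothetical locations.
churches = {"A": 40, "B": 10, "C": 10}
bookstores = {"A": 2, "B": 8, "C": 2}

index = faith_reason_index(churches, bookstores)
```

Location A reproduces the worked example in the text (church index 2, bookstore index 0.5, faith-reason index 4), B leans toward bookstores at 0.25, and C is dropped because its two indexed values are equal.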

For the most part, the relative prevalence of bookstores occurs in and around the big cities - Los Angeles, California is the site of the highest indexed value, and is joined by the megalopolis of the eastern seaboard as having the highest concentrations in favor of bookstores. Even cities such as Atlanta, nestled in the Bible Belt of the American southeast, tend towards a relatively large number of bookstores. Conversely, other large cities like Dallas, San Antonio and Houston continue to favor churches, with New Orleans (the largest city in Louisiana) having the highest relative concentration of churches in the nation. Suburban areas surrounding large population centers also show a near-universal favoritism for churches.

So while there appears to be no single variable determining the local trends toward faith or reason, it is evident that even some of the most common assumptions regarding the geographies of faith and reason have proven to be more complicated; not all large cities are necessarily bookish, but neither is the bible belt a homogeneous geographic unit.

[1] Google Maps directories are drawn from a range of sources such as yellow page listings. This category is distinct from and excludes user generated placemarks that we use in other maps.

November 17, 2009

Mapping Wikipedia

The following maps are the first of a series that will be made in order to map out the distinct geographies of Wikipedia. Many Wikipedia articles (about half a million) are either about a place or an event that occurred within a place, and most of these geographic articles handily contain a set of coordinates that can be imported into mapping software.

The map below displays the total number of Wikipedia articles tagged to each country. The country with the most articles is the United States (almost 90,000 articles), while most small island nations and city states have fewer than 100 articles. However, it is not just microstates that are characterised by extremely low levels of wiki representation. Almost all of Africa is poorly represented in Wikipedia. Remarkably there are more Wikipedia articles written about Antarctica than all but one of the fifty-three countries in Africa (or perhaps even more amazingly, there are more Wikipedia articles written about the fictional places of Middle Earth and Discworld than about many countries in Africa, the Americas and Asia).

When examining the data normalised by area, an entirely different pattern is evident. Central and Western Europe, Japan and Israel have the most articles per square kilometre, while large countries like Russia and Canada have low ratios of Wikipedia articles per area.

Finally, the data were also mapped out against population. Here countries with small populations and large landmasses rise to the top of the rankings. Canada, Australia and Greenland all have extremely high levels of articles per every 100,000 people. Smaller nations with many noteworthy features or geotaggable events also appear high in the rankings (e.g. Pitcairn or Iceland).

Presences and absences play a fundamental role in shaping how we interpret and interact with the world. The fact that the geographies of Wikipedia content are so uneven therefore leads to worrying conclusions. As we increasingly rely on peer produced information, large parts of the world remain a digital 'terra incognita' (in a similar manner to the ways in which many of those same places were represented on European maps before the 19th Century).

More maps examining the distribution of content in specific languages, and looking in more detail at specific regions will be uploaded soon.

November 16, 2009

Visualizing the abortion debate

Abortion is a hotly contested political issue in the United States, as it has been since even before the Supreme Court's decision in Roe v. Wade in 1973. Regardless of one's position on the matter, the ongoing debate often lends itself to hyperbole, obscuring the observable facts. In this visualization, the difference between the number of abortion alternatives and abortion providers listed in the Google Maps directory is mapped across the US in quarter degree intervals. The greatest difference in favor of abortion providers is found in New York City, with Los Angeles and Seattle representing a similarly disproportionate number of abortion providers. Similar to some previous maps we've published, this concentration of abortion providers has a strong urban bias. However, there are many cities such as Atlanta, Dallas and Cincinnati which have more abortion alternatives than providers while some rural areas such as upstate New York and Maine have more providers.

Overall, the blue coverage across the United States shows that, in a vast majority of the country, abortion alternatives are much easier to find than abortion providers. So while the "pro-life" camp ended up on the wrong side of the 1973 Supreme Court ruling legalizing abortion, they have built a significant organizational infrastructure which can be leveraged to promote their cause, while "pro-choice" advocates remain concentrated primarily in the nation's more politically progressive urban centers.

November 07, 2009

Where in the world is Barack Obama? (and John McCain, too!)

To follow up on our previous map showing the difference in the number of mentions between Barack Obama and John McCain in user-generated Google Maps content prior to the 2008 US Presidential Election, we figured an alternative visualization might be beneficial. The following maps represent the absolute number of mentions of Obama and McCain, respectively, in user-generated placemarks, a disaggregation of the map in our previous post.
This map, much like the previous iteration, shows the vast concentration of user-generated placemarks mentioning Obama in the nation's urban centers. The nation's largest cities - New York City, Los Angeles and Chicago - all appear prominently in this map. Although many of the notable points in both the Obama and McCain maps can be attributed to the large populations (and thus, presumably, a greater level of connectedness), a number of other explanations remain necessary. In addition to being the 3rd largest city in the United States, Chicago is also the home of Barack Obama, and it houses the highest concentration of placemarks that mention his name. Significant events also seem to assert their presence spatially, as Denver, Colorado, the site of the 2008 Democratic National Convention, is another relatively well-represented area, along with Portland, Oregon, where 70,000+ rallied for Obama in May 2008.
Mirroring the already established pattern of urban primacy, much of McCain's presence is concentrated in the nation's urban centers, again including both New York City and the Washington, DC metro area (where McCain has the highest concentration). Unlike Obama, the places McCain is best represented in Google Maps were not necessarily the places he fared the best during either the primary or general election. For example, both Iowa and Michigan, in which McCain receives a nearly uniform number of mentions across the state, voted against him in both the primary and general elections.

Despite some of these patterns of user-generated content merely confirming the primacy of urban areas in virtual representations of the material world, others depart significantly from the predicted spatial clustering. Some areas that voted for McCain feature more prominently in the user-generated representations for Barack Obama, and vice versa, with the number of mentions for Barack Obama being more than double the number of mentions for John McCain. Although not all of the patterns displayed can be easily attributed to a particular causal factor, they only further complicate the relational geographies of the virtual and material world.

October 17, 2009

Google Mapping the 2008 US Presidential Election

Despite being highly contentious, the 2008 US Presidential Election resulted in an overwhelming electoral college victory by President Barack Obama. This map shows the difference in the number of mentions of Barack Obama and Republican candidate John McCain in user-generated placemarks indexed by Google. This peer-produced representation is remarkably similar to more official cartographic representations of the final election results, with a couple of notable exceptions.

Because placemark concentration is correlated with large urban populations, even the states that overwhelmingly voted for Senator McCain seem to favor Obama. This concentration of placemarks in urban areas shows a significant advantage for Obama, mirroring his successes during the election. Another anomaly is the red clustering in New Hampshire, a state in which Obama defeated McCain 54%-45%. However, this cluster can be explained by McCain's momentum-building primary win in the Granite State, which eventually propelled him on to the GOP nomination.

Following J.B. Harley (1988), we should also take interest in the silences of this map. Here the primarily rural areas contain either no user-generated placemark information or an equal number of mentions for both Obama and McCain, but nonetheless appear uniformly devoid of content.

July 10, 2009

The Virtual ‘Bible Belt’

The size of the dots in this map represents the relative number of mentions of the word “church” in placemarks uploaded to Google. Results for the word “church” have been divided by the "0" and "1" baseline measure (see the last two blog posts), thus highlighting the parts of North America in which mentions of the word “church” are over- and under-represented. Interestingly, while the “bible belt” in the physical world is often talked about as being synonymous with the American South, the virtual “bible-belt” additionally incorporates large parts of the Midwest. Less surprising is the fact that the Northeast and the West have relatively low scores. The GeoWeb is in many ways a mirror (albeit a distorted one) of the physical places that it represents.
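A minimal sketch of dividing keyword hits by the baseline is shown below. One step is an assumption added for illustration rather than something stated in the post: each location's ratio is also compared with the overall ratio, so that values above 1 read as over-represented and values below 1 as under-represented. The region names and counts are invented.

```python
def relative_representation(term_hits, baseline_hits):
    """Compare each location's term/baseline ratio with the overall ratio.

    Returns a value above 1 where the term is over-represented relative to
    the whole sample and below 1 where it is under-represented. Locations
    with no baseline hits are skipped.
    """
    total_term = sum(term_hits.values())
    total_base = sum(baseline_hits.values())
    result = {}
    for loc, base in baseline_hits.items():
        if base == 0:
            continue  # no baseline content to compare against
        term = term_hits.get(loc, 0)
        # (term / base) / (total_term / total_base), done as one division
        result[loc] = (term * total_base) / (base * total_term)
    return result


# Invented counts: hits for "church" and for the "0"/"1" baseline.
church = {"south": 300, "midwest": 200, "northeast": 100}
baseline = {"south": 1000, "midwest": 1000, "northeast": 2000}

scores = relative_representation(church, baseline)
```

In this toy sample the south and midwest come out above 1 and the northeast below it, which is the kind of contrast the dot sizes are meant to convey.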

June 22, 2009

Information Inequality

Following on from the last post, here are some examples of Google placemark inequality:

First of all, China offers perhaps one of the most striking examples of regional disparities. Beijing, Shanghai, and the Pearl River Delta Region all are characterized by heavy information densities. In other words, a lot of information has been created and uploaded about these places. However, much of the rest of the country has very little cyber-presence within the Google Geoweb. In the map below, the height of each bar is an indicator of the number of placemarks in each location.

The U.S.-Mexico border along the Rio Grande river offers a similarly striking contrast between high and low information densities.

The border between North and South Korea offers another example of placemark density not being correlated to population density. For obvious reasons, very little information is being created and uploaded about North Korea. In the map below (top), each dot represents 100+ placemarks. Interestingly, there are strong similarities between the map of placemarks on the Korean Peninsula, and satellite maps of lights visible from the Peninsula at night (bottom).

Information inequalities are clearly a defining characteristic of the Geoweb. Some places are highly visible, while others remain a virtual terra incognita. In particular, Africa, South America, and large parts of Asia are being left out of the flurry of mapping that is happening online (e.g. the Tokyo/Yokohama metro region has almost three times as many 0/1 placemark hits (923,034) as the entire continent of Africa (311,770)). Some of the geographical implications of cyber-visibility and invisibility have been examined in part (e.g. here and here), but there is clearly a lot more to be discussed. In particular, because Google allows any keyword to be searched for (not only "0" and "1"), we are able to explore not only the raw amounts of information attached to each place, but also the contents of that information.

June 15, 2009

Global Placemark Intensity

The following map shows the intensity of Google placemarks on a global scale. Using custom-designed software, a dataset was created based on a 1/4 degree grid of all the land mass in the world (roughly 250,000 points). For each point a search was run on the numbers “0” and “1” in order to create a baseline measure of the amount of online geo-referenced content in each place. In the map below, every place with more than 100 placemarks is highlighted with a yellow dot.
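For readers curious what a quarter-degree sampling grid looks like in practice, here is a small sketch. It is not the custom software mentioned above: the `is_land` test is a hypothetical stand-in (a real one would need a coastline or land-mask dataset), and the toy box used here exists only to keep the example self-contained.

```python
def quarter_degree_grid(is_land, step=0.25):
    """Yield (lat, lon) points on a regular grid, keeping those on land.

    `is_land` is a caller-supplied function (hypothetical here) that
    returns True when a coordinate falls on land.
    """
    n_lat = int(180 / step) + 1  # -90 to 90 inclusive
    n_lon = int(360 / step)      # -180 up to, but not including, 180
    for i in range(n_lat):
        lat = -90 + i * step
        for j in range(n_lon):
            lon = -180 + j * step
            if is_land(lat, lon):
                yield (lat, lon)


def toy_is_land(lat, lon):
    """Stand-in land test: a one-degree box, purely for illustration."""
    return 0 <= lat <= 1 and 0 <= lon <= 1


points = list(quarter_degree_grid(toy_is_land))
```

Each point kept by the land test would then be the location for one "0" and "1" baseline search. The toy box yields 25 points (five latitudes by five longitudes), whereas a real land mask would yield a count on the order of the figure quoted above.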

The same method was used to create a map that highlights every place on the globe containing more than 1000 placemark hits:

When compared to a map of population density (see the map below), the distinct geographies of placemarks become apparent.

Image source: NASA

These maps suggest that the GeoWeb is far from being a simple mirror of population density or human activity. Online representations of the physical world are highly concentrated in North America, Western Europe, and the more affluent parts of East Asia and Australasia. Maps displaying placemark density on regional and local scales will be explored in more detail in the next post.

June 09, 2009


Over the last few years a range of terms dealing with the links between cyberspaces and physical spaces have been popularised. The Geoweb, Volunteered Geographic Information, Maps 2.0, Neogeography, Code/Space, and Digiplace all symbolise some aspect of the changes that are occurring to the ways that Geography is both represented and experienced.

The increasing availability of spatial data and the rise of Web 2.0 applications have helped produce a previously unimaginable collection of online spatial information. Hundreds of thousands of people from around the world are creating, uploading, and sharing reviews, guides, images, videos, stories, and descriptions that have one thing in common: they have been tagged to some point on the Earth’s surface.

User-created geographic data represent more than just a simple online database of maps. They instead create another layer to the physical world; they become a component of our understandings of place and, as such, influence the ways in which we move through and interact with the rest of the world.

Our research project therefore focuses on understanding how the Geoweb is structured, and the effects that it has on the offline world. Some examples of questions that we hope to explore both empirically and theoretically include:

  • What kind of information is being provided?
  • Who is writing this information?
  • How accurate/reliable is this information?
  • How do we get access to it?
  • How is it filtered and ranked?
  • How are the resulting Cyberspaces/DigiPlaces being used?
  • What places are being annotated?
  • How does the Geoweb vary by scale and topic?

This blog will serve as a sketch pad for these, and other, questions. New findings will be posted, discussed, and left open for comment. More soon....

Matthew Zook and Mark Graham