floatingsheep: oii

June 28, 2013

The Geography of #StandWithWendy Tweets

The filibuster by Texas State Senator Wendy Davis on June 25th to block a new piece of legislation that would have resulted in many more restrictions on abortion in Texas brought a lot of attention to the Lone Star state this week. Day-long filibusters, parliamentary machinations, vocal protesters, and changing the time stamps on votes all make for great political theater, even more so as it involves a highly contentious issue and inter-party fights. From our perspective, one of the most compelling elements of this story was the strong response within social media (including Twitter) that this event engendered. In the course of a few hours, tens (or even hundreds) of thousands of tweets were sent using the hashtag #standwithwendy to show support for the senator's efforts.

We collected all geocoded tweets from June 25th and 26th that contained the text "standwithwendy", resulting in a dataset of 3,702 tweets. Although we are primarily interested in the spatial dimension of tweeting activity, the way this event played out over time is particularly interesting. Using our dataset, one can see how this event - or at least its reflection in Twitterspace - started building around 8pm on the 25th and peaked around midnight as the deadline for the special session neared, though it maintained momentum well into the early hours of the 26th when the legislative session was officially declared over and the bill defeated.
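For readers who want to reproduce this kind of timeline, here is a minimal sketch of hourly binning. It is not the code behind the figure: the function names are our own for illustration, and it assumes each tweet's timestamp is already available as a Python datetime in local time.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Count tweets per clock hour by truncating each timestamp to the hour."""
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )


def peak_hour(timestamps):
    """Return (hour, count) for the busiest hour in the sample."""
    counts = hourly_counts(timestamps)
    return max(counts.items(), key=lambda item: item[1])


# Toy timestamps, invented purely for illustration.
sample = [
    datetime(2013, 6, 25, 20, 5),
    datetime(2013, 6, 25, 23, 50),
    datetime(2013, 6, 25, 23, 59),
    datetime(2013, 6, 26, 0, 10),
]
busiest_hour, busiest_count = peak_hour(sample)
print(busiest_hour, busiest_count)
```

Dividing each hourly count by the total number of tweets sent in that hour would give a normalized version of the same series.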

Temporal Distribution of Relative Frequency of Tweets Containing #StandWithWendy at the County Level

Blue = relatively more #StandWithWendy Tweets; Red = relatively fewer

Source: DOLLY, n = 3702 #StandWithWendy tweets on June 25th and 26th, 2013; Normalized by the total number of tweets sent during the same time period; The peak is at ~700 tweets right around midnight

Returning to our primary interest in the spatial distribution of tweets, it should come as no surprise that Texas had by far the most tweets, around a thousand in all, or 28.7% of all tweets with the aforementioned hashtag. While Texas is home to six of the twenty largest cities in the US, and thus is likely to have a significant number of tweets based on its population alone, the state is over-represented in the corpus of #StandWithWendy tweets by ~3.5x, relative to its share of the total US population (Texas constitutes around 8% of the country's population), so there is an obvious localizing effect that comes with being the epicenter of this debate. But the phenomenon was far from limited to Texas, with many tweets coming from around the country, though the rest of these tweets much more closely resemble the distribution of population.
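The over-representation figure is simple arithmetic, sketched below with the rounded numbers quoted in this post. The function name is ours, and the 8% population share is the approximate value given above rather than an exact census figure.

```python
def overrepresentation(tweet_share, population_share):
    """Ratio of a place's share of tweets to its share of population."""
    if population_share <= 0:
        raise ValueError("population_share must be positive")
    return tweet_share / population_share


TOTAL_TWEETS = 3702            # size of the #StandWithWendy dataset
TEXAS_TWEET_SHARE = 0.287      # 28.7% of the tweets
TEXAS_POPULATION_SHARE = 0.08  # "around 8%" (rounded, so the result is approximate)

texas_tweets = round(TOTAL_TWEETS * TEXAS_TWEET_SHARE)
ratio = overrepresentation(TEXAS_TWEET_SHARE, TEXAS_POPULATION_SHARE)
print(texas_tweets, ratio)
```

With these rounded inputs the ratio lands a little above 3.5; a slightly larger population share pulls it toward the ~3.5x quoted above.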

Percentage of Tweets by State (blue text) & Location of Each Tweet (pink dot)

Source: DOLLY, n = 3702 #StandWithWendy tweets on June 25th and 26th, 2013; Darker shading indicates greater intensity

The spatial differences are particularly telling when one looks not just at the raw number of tweets, but rather a value normalized by the total number of tweets sent during this time. Doing so allows us to avoid simply highlighting those places with a large number of people by comparing a given place's production of #StandWithWendy tweets relative to its 'usual' tweet output.

The map below shows this normalized distribution. Darker shaded states have relatively more tweets containing #StandWithWendy than the national average, and lighter states have relatively fewer tweets. The darker the shading the greater the intensity. That Texas remains shaded dark grey in this map is further indication of the above point that its high volume of tweets in this case goes beyond simply its mass of population, while it becomes evident that the large amount of tweeting in California and New York is more dependent on their populations than on any unusual interest in the issue by users in those states. South Carolina and Kentucky were the biggest standouts in terms of having relatively few tweets on the subject.

Geographic Distribution of Relative Frequency of Tweets Containing #StandWithWendy at the State Level

Source: DOLLY, n = 3702 #StandWithWendy tweets on June 25th and 26th, 2013; Darker shading indicates greater intensity; Normalized by the total number of tweets sent during the same time period

Overall, there seems to be a general pattern of more tweets in the Northeast, Upper Great Plains and West Coast, while states in the Southeast, Mid-Atlantic, Midwest and Southwest have relatively fewer. But as this is a quick analysis, we'd caution against reading too much into this.

We can also look into the relative amount of tweets at the county level. The map below shows a small section of the country from Texas to South Carolina. One can see that Austin, the location of the state capitol and Senator Davis' filibuster, is heavily over-represented in the number of tweets, as are many other places in Texas. One of the most interesting patterns is within larger metropolitan areas in which the level of tweeting activity around #StandWithWendy varies widely between neighboring counties, as in Atlanta.

Geographic Distribution of Relative Frequency of Tweets Containing #StandWithWendy at the County Level

Blue = relatively more #StandWithWendy Tweets; Red = relatively fewer; White = no tweets; Darker shading indicates greater intensity

Source: DOLLY, n = 3702 #StandWithWendy tweets on June 25th and 26th, 2013; Normalized by the total number of tweets sent during the same time period

January 11, 2013

Premier League teams on Twitter (or why Liverpool wins the league and the Queen might support West Ham)

Have you ever wondered where Premier League football teams draw most of their support from? Or what the geography of fandom is? We have too, and set about to better understand how Premiership teams are reflected in Twitter usage across the UK.

The Floatingsheep team, with the help of two researchers from the Oxford Internet Institute - Joshua Melville and Scott A. Hale (both of whom did most of the work) - have created a neat interactive map for you to both explore the geography of Twitter mentions of specific teams, and let you explore the patterns of five key rivalries. Click on the screenshot below to be brought to the full interactive map.

The data used include all geotagged tweets mentioning any of the Premiership football teams and their associated hashtags (e.g., #MUFC or #YNWA) that were sent between August 18 and December 19, 2012. We have only included one tweet per user to prevent 'loud' fans from skewing the results. The users were then aggregated to postcode districts in order to see a fairly fine-grained geography of results. The number of tweeters per district is normalized by the total 'population' of Twitter users based on a 0.25% random sample of all tweets within the UK.
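The three steps described here (one tweet per user, aggregation to postcode districts, normalisation by the local Twitter 'population') can be sketched roughly as follows. This is not the code behind the interactive map: the record layout, the function name, and the decision to de-duplicate per user and team are all assumptions made for illustration.

```python
from collections import defaultdict


def tweeters_per_district(tweets, baseline_users):
    """Count distinct tweeters per team and district, then normalise.

    tweets: iterable of (user_id, district, team) tuples (hypothetical layout).
    baseline_users: dict mapping district -> number of Twitter users seen
        in a random sample of all tweets for that district.
    """
    seen = set()
    counts = defaultdict(lambda: defaultdict(int))
    for user_id, district, team in tweets:
        # Assumption: a user counts once per team, in the first district seen.
        if (user_id, team) in seen:
            continue
        seen.add((user_id, team))
        counts[district][team] += 1

    rates = {}
    for district, team_counts in counts.items():
        base = baseline_users.get(district, 0)
        if base == 0:
            continue  # no baseline, so no meaningful rate for this district
        rates[district] = {team: n / base for team, n in team_counts.items()}
    return rates


# Toy records, invented for illustration only.
toy_tweets = [
    ("a", "M1", "MUFC"),
    ("a", "M1", "MUFC"),  # 'loud' fan: the second tweet is ignored
    ("b", "M1", "MCFC"),
    ("c", "L4", "LFC"),
]
toy_rates = tweeters_per_district(toy_tweets, {"M1": 4, "L4": 2})
print(toy_rates)
```

Districts with no baseline users are dropped rather than assigned an infinite rate, which is one of several reasonable choices.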

What do the data show us, you ask? In Manchester, for instance, there is the oft-repeated stereotype that Manchester City are the 'real' local team, while Manchester United attract support from further afield. Our map doesn't really support that idea though. There are only a few parts of Greater Manchester in which we see significantly more tweets mentioning Manchester City than their local rivals. We also, strangely, see more mentions of Manchester City in Scotland and Merseyside, and more support for Manchester United in Northern Ireland.

The Merseyside rivalry (Liverpool vs. Everton) is another interesting one to map. There we see that Liverpool have the slight edge in the postcode that is home to both teams' stadiums. However, there is no clear winner in the rest of the region, with most postcodes having a fairly close split between the two teams. Interestingly, many postcodes in Scotland seem to have more mentions of Everton, while many in Northern Ireland have more mentions of Liverpool.

We can also zoom into particular postcodes and see which teams are most mentioned there. The academics in Oxford (for some strange reason) mention Manchester City more than any other team. Central Edinburgh (when not focusing on Hearts or Hibs) has more mentions of Everton than any other Club. And the Queen's home of SW1A goes for West Ham.

What about maybe the most important question of all? Who wins the league based on total number of Tweets sent from anywhere in the UK? The answer is Liverpool (a team that hasn't won the actual league since 1990). Manchester United are a somewhat distant second, joined by Everton and Tottenham in the Champions League spots. We also find out that Fulham, Swansea, and Wigan are the three teams that get relegated due to their quite abysmal scores. Apparently just not that many people want to tweet about Wigan.

There is no doubt that using Tweets as a proxy for fandom is messy and not always reliable. In other words, we are mapping mentions and not measuring sentiment. But the data do give us a rough sense of who is interested in (or at least talking about) what, and where they are doing it from. It allows us to begin to counter myths (e.g. that Mancunians don't support Manchester United), develop new insights about places that we don't necessarily have good data about, and most importantly, have some guesses as to which team the Queen might support.

See also:
A broader take on how information augments place (a second paper on the topic can be accessed here)
Other examples of our Twitter mapping (racism, flooding, earthquakes)
The code behind this visualisation (made freely [CC-BY-NC-SA] available on Github)

November 28, 2012

Digital Data Trails of the UK Floods

What do data scraped from the Internet tell us about a range of social, economic, political, and even environmental processes and practices? As ever more people take to social media to share and communicate, we are seeing that the data shadows of any particular story or event become increasingly well defined.

The ongoing UK floods offer a useful example of some of the links between digital data trails and the phenomena they represent. In the graphics below, we mapped every geocoded tweet between Nov 20 and Nov 27, 2012 that mentioned the word "flood" (or variations like "flooded" or "flooding").
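As an illustration of what matching "flood" and its variations might look like, here is a small sketch using a regular expression. The exact matching rule used for the maps is not stated in this post, so the pattern and the function name below are assumptions.

```python
import re

# Matches "flood", "floods", "flooded" and "flooding" as whole words,
# in any letter case. The list of suffixes is an assumption.
FLOOD_PATTERN = re.compile(r"\bflood(?:s|ed|ing)?\b", re.IGNORECASE)


def mentions_flood(text):
    """Return True if the tweet text contains one of the flood variants."""
    return FLOOD_PATTERN.search(text) is not None


# Invented example tweets.
examples = [
    "Flooding on the A40 again",
    "#flood alert for the Severn",
    "New floodlights at the stadium",
]
print([mentions_flood(text) for text in examples])
```

A looser rule, such as matching any text that merely contains "flood", would also catch words like "floodlights", which is why the sketch uses word boundaries.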

Unlike many maps of online phenomena (relevant XKCD), careful analysis and mapping of Twitter data does NOT simply mirror population densities. Instead, concentrations of Twitter activity (in this case tweets containing the keyword flood) seem to closely reflect the actual locations of floods and flood alerts, even when we simply look at the total counts. This pattern becomes even clearer when we normalise the map: the second map is a location quotient, where every value greater than 1 indicates that there are more tweets related to flooding than one would expect based on normal Twitter usage in that area, and there the data even more closely mirror the UK Environment Agency's flooding map.
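A location quotient of this kind can be sketched in a few lines. The version below compares each area's share of topic tweets with the share across all areas combined; the function name and the toy numbers are ours, and the real maps may have been computed with different areal units or baselines.

```python
def location_quotient(topic_counts, total_counts):
    """Location quotient per area.

    topic_counts: dict area -> tweets about the topic (e.g. flooding).
    total_counts: dict area -> all tweets sent from that area.
    Values above 1 mean the topic is over-represented in that area.
    """
    topic_total = sum(topic_counts.values())
    grand_total = sum(total_counts.values())
    if topic_total == 0 or grand_total == 0:
        raise ValueError("need at least one topic tweet and one tweet overall")
    overall_rate = topic_total / grand_total

    quotients = {}
    for area, total in total_counts.items():
        if total == 0:
            continue  # no baseline activity, so the quotient is undefined
        local_rate = topic_counts.get(area, 0) / total
        quotients[area] = local_rate / overall_rate
    return quotients


# Toy numbers, invented for illustration only.
toy_lq = location_quotient(
    topic_counts={"A": 30, "B": 10},
    total_counts={"A": 100, "B": 300},
)
print(toy_lq)
```

In the toy data, area A sends fewer tweets overall than area B but three times as many about the topic, so its quotient is well above 1 while B's falls below 1.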

As we demonstrated with our maps of Hurricane Sandy, it is important to approach these sorts of maps with caution. At least in the information-dense Western world, they are often able to reflect the broad contours of large phenomena. But, because we are still necessarily measuring subsets of subsets, our big data shadows start to become quite small and unrepresentative at more local levels. This is particularly an issue when the use of the relevant technology is unevenly distributed across demographic sectors such as was the case in post-Katrina New Orleans.

Nonetheless, with every new large event, movement, and phenomenon, we are undoubtedly going to see much more research into both the potentials and limitations of mapping and measuring digital data shadows. This is because physical phenomena like hurricanes and floods don't just leave physical trails, but create digital ones as well.

July 04, 2012

Church or Beer? Americans on Twitter

In honor of the anniversary when American colonists kicked out the oppressive British (apologies to Mark and other oppressive Brits) today is the birthday of the United States. Traditionally it is celebrated by attempting to blow up or burn a small part of it with fireworks, and given the dry conditions at the moment, we may very well succeed at this beyond our wildest expectations.

But until #badideaswithfireworks becomes a trending hash tag, we thought we'd use Twitter to explore some of the regional differences that ~~are rending the fabric of society~~ make America great. It also gives us a chance to showcase some of the potential of our nascent DOLLY project (feel free to visit the Knight News Challenge website and comment positively!), which integrates and maps geographic social media and official data sources. DOLLY is still not quite ready for general use, but the backend database is all set which makes it really easy to pull out user generated geocoded data, in this case from Twitter.

So in honor of the 4th of July, we selected all geotagged tweets^[1] sent within the continental US between June 22 and June 28 (about 10 million in total) and extracted all tweets containing the word "church" (17,686 tweets of which half originated on Sunday) or "beer" (14,405 tweets which are much more evenly distributed throughout the week). See below for more technical details^[2] or just go straight to the map below to see the relative distribution of the tweets in the U.S.

Relative Number of Tweets containing the terms "church" or "beer" aggregated to the county level, June 22-28, 2012

This map clearly illustrates some fairly big regional divides (more on that in a bit) but it is worth drilling down a bit to see how this plays out at the local level. San Francisco has the largest margin in favor of "beer" tweets (191 compared to 46 for "church") with Boston (Suffolk county) running a close second. Los Angeles has the distinction of containing the most tweets overall (busy, busy thumbs in Southern California). In contrast, Dallas, Texas wins the FloatingSheep award for most geotagged tweets about "church" with 178 compared to only 83 about "beer."

Of course, since these are tweets, the content is decidedly less spiritual than one might expect given the focus on beer and church. For example, the most common example of a "church" tweet was simply a report such as "I am at _______ church". More amusing are what we characterize as "competitive church going" when one person replaces another as the Foursquare "mayor" of a church. "I just ousted Jef N. as the mayor of Dallas Bible Church on @foursquare! 4sq.com/5hNW6x"

This of course echoes the Sermon on the Mount and the famous verse, "Blessed are those who check in for they shall inherit the badges of righteousness." Another common category was politically related tweets such as "#ICantDateYou If You Dont Go To Church" or "@____ you're right. It's like separation of church and state. But they really shouldn't be separated. #twitterpolitics".

Given the cultural content of the "church" tweets, the clustering of relatively more "church" than "beer" content in the southeast relative to the north-east suggests that this could be a good way to identify the contours of regional difference. In order to quantify these splits, we ran a Moran's I test for spatial auto-correlation which proved to be highly significant as well.^[3] Without going into too much detail, this test shows which counties with high numbers of church tweets are surrounded by counties with similar patterns (marked in red) and which counties with many beer tweets are surrounded by like-tweeting counties (marked in blue). Intriguingly there is a clear regional (largely north-south) split in tweeting topics which highlights the enduring nature of local cultural practices even when using the latest technologies for communication.
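For the curious, a global Moran's I with inverse-distance weights can be sketched in plain Python as below. This is an illustration of the statistic, not the software used for this post: we assume the 2.34 decimal degrees mentioned in the footnote acts as a distance cutoff, the function names are our own, and the significance test (the z-score) is not reproduced here.

```python
import math


def idw_weights(points, cutoff):
    """Inverse-distance weights between points, zero beyond the cutoff."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            distance = math.dist(points[i], points[j])  # Euclidean distance
            if 0 < distance <= cutoff:
                weights[i][j] = 1.0 / distance
    return weights


def morans_i(points, values, cutoff):
    """Global Moran's I for values observed at the given points."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    weights = idw_weights(points, cutoff)
    weight_sum = sum(sum(row) for row in weights)
    variance_sum = sum(d * d for d in deviations)
    if weight_sum == 0 or variance_sum == 0:
        raise ValueError("need some neighbouring points and non-constant values")
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / weight_sum) * (cross_products / variance_sum)


# Four toy locations on a line, one unit apart (invented for illustration).
toy_points = [(0, 0), (1, 0), (2, 0), (3, 0)]
clustered = morans_i(toy_points, [1, 2, 3, 4], cutoff=1.5)
alternating = morans_i(toy_points, [1, 4, 1, 4], cutoff=1.5)
print(clustered, alternating)
```

Values that rise smoothly along the line give a positive I (similar neighbours), while alternating values give a negative I (dissimilar neighbours).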

We also note that this map strongly aligns with the famous 'red state'/'blue state' map from the 2000, 2004, and 2008 elections with a strong "religious right" component in the Southeastern United States (see also The Virtual 'Bible Belt') and a more liberal, or at least beer-tweeting, Northeast and upper Midwest (see also The Beer Belly of America).

In any case, happy 4th of July to our American readership. We hope you enjoy your beer in the north, or your church service if you are tweeting from the south.

----------------------
[1] It is important to note that geotagged tweets are somewhat of an oddity among tweets, as only one to three percent of tweets (depending on the country) are geotagged. Still a small percentage of a very large number (the total number of tweets) results in a LOT of data.

[2] There are a number of technical issues tied to the validity and scale of geography associated with tweets which we won't go into here but it is worth mentioning that we are NOT using user profile locations. This data is limited to geographic information associated with each tweet, often drawn from a GPS capable device. While the relevant scale at which analysis can be done differs between tweets, about 90 percent of the tweets in this sample are accurate on the city level or lower which works well for this analysis.

[3] Based on IDW matrix for 2.34 decimal degrees (Euclidean distance), this test achieved a z-score of 14.34, implying there is a less than 1% likelihood that this high-clustered pattern could be the result of random chance.

March 26, 2012

Augmented Realities and Uneven Geographies: Exploring the Geo-linguistic Contours of the Web

Mark and Matt have just had a paper accepted to Environment and Planning A (Augmented Realities and Uneven Geographies: Exploring the Geo-linguistic Contours of the Web). The paper is concerned with the ways in which augmented inclusions and exclusions, visibilities and invisibilities will shape the way that places become defined, imagined, and experienced.

The maps above are all taken from an earlier draft of the paper. They visualise the layers of information indexed by Google and segment the data by language in order to map some of the geo-linguistic contours of the Web. Have a glance through the paper, and let us know if you have any comments or questions. The publication date of the full paper should be some time in early 2013.

December 13, 2011

Mapping Wikipedia Article Quality in North America

The maps of Wikipedia previously posted on the blog offer useful insights into the geographies of one of the world's largest platforms for user-generated content. They, along with similar visualizations, reiterated some of the massive inequalities in the layers of information that augment our planet.

But not all articles are created equally, and those maps didn't give us much of a sense of the quality of articles. "Quality" is obviously a slippery word and there are infinite ways of measuring it, but for the purposes of this post, we'll crudely use the term to refer to article length (future maps will employ a variety of other metrics).

The maps below visualize this measure of quality within Wikipedia entries -- yellow dots represent the location of relatively short articles in the English version of Wikipedia (e.g. the article on "Bandana, Kentucky"), while red dots indicate the location of relatively long articles (e.g. the article on the "Republic of Molossia").

The map below displays the same data, but with smaller dots, making it easier to see some of the patterns if you expand the image.

Interestingly, the states with the highest average word counts are New Jersey (966) and Michigan (914). The states with the lowest averages are Delaware (534) and West Virginia (492). The reasons for these rather large differences are unclear.

Are Wikipedians from New Jersey that much more loquacious than their West Virginian counterparts? Or does it just take more words to describe the many dazzling wonders of New Jersey? Or is it something else entirely?

Apart from the obvious and increasingly evident urban bias in these information geographies, we'd certainly welcome your thoughts in explaining some of these patterns.

November 16, 2011

The tea party, hipsters, and the methodological limitations of Internet mapping.

America traditionally likes to party. Well, at least the throwing-tea-off-ships-into-harbors, annoy-the-English kind of party. And let's face it, who doesn't enjoy annoying the English now and again?

Arguably poking fun at the English is the only activity that the two groups we are comparing today -- the new "tea party" movement and "hipsters" -- may share in common. Or not. Both groups probably enjoy a party and the occasional beer. Of course, "tea partiers" will be complaining about taxes on alcohol and "hipsters" will be drinking the beer ironically whilst watching other people party and discussing bands you've probably never heard of.

Unfortunately, parties really have very little (OK, nothing) to do with this post, but they are a (rather forced) way of introducing our comparison of online references to 'hipsters' and the 'tea party'.

Interestingly, America is covered by far more references to the tea party than to hipsters. There are a few pockets of expected hipsterdom: San Francisco, Los Angeles, New York City, and of course Seattle. But otherwise the country is characterized by far more online attention given to the tea partiers. But we need to ask why that is. Is it because tea partiers have an identity they like to flaunt, whereas hipsters might tend to see the essence of hipster identity in others rather than themselves? After all, who really ever admits to being a hipster?

Or perhaps the technology itself is an explanatory factor here. Whilst (hipsters love to use the term whilst) we are somewhat shocked that the tea party, or at least people who talk about them, are harnessing the power of a technology designed and sponsored by the American/socialist/fascist/Kenyan government that is the subject of so much of their ire, they nonetheless maintain an impressive web presence.

Hipsters, on the other hand, undoubtedly are proficient social media users, but we doubt they are using the word "hipster" on their sleek tumblr pages. Perhaps it would simply not be ironic enough? Other varieties of hipster, like their tea-throwing brethren of yesteryear, might eschew modern technology altogether and communicate using handwritten notes or retro typewriters or cool early 20th century printing presses requiring months of careful restoration. In any case, in contrast to tea partiers, hipsters' general lack of self-professed identity means that they are less likely to create digital traces explicitly referencing themselves online.

And, ultimately, this is the point of this post. Mapping keywords in Google is often an incredibly useful exercise, but it can take hipsters and tea partiers to demonstrate some of the significant methodological limitations of such an exercise.

November 14, 2011

Mapping Wikipedia Globally

Wikipedia is an incredibly impressive coming-together of human labour on a scale that the world rarely sees. Over the last few years, we've also seen a few maps of the encyclopedia (including some work on this blog) which have shown that the project is far from complete (whatever that might mean).

That doesn't mean we should stop mapping the project though, and as part of a multi-year project to study Wikipedia in the Middle East, North Africa, and East Africa, we present these global-scale maps of every article in the November 2011 version of the English Wikipedia.

The English encyclopedia is by far the largest, and currently hosts almost 700,000 geotagged articles (click on the image for a larger and more detailed version):

Each one of these yellow dots represents human effort that has gone into describing some aspect of a place. The density of this layer of information over some parts of the world is astounding. Some of our future posts will look more closely at measures of inequality in Wikipedia, but it is still hard not to be awed by this cloud of information about hundreds of thousands of events and places around the globe.

What we can also do is compare the English Wikipedia to the Arabic, French, Hebrew, and Swahili versions (these languages are chosen because they are the subject of the research project mentioned above).

This map should be interpreted with caution for a few reasons. First, it only displays content from six Wikipedias (there are currently 282 of them). Second, many articles in multiple languages appear in the same place. The reason for this is that they are articles about the same feature, event, or place, albeit in different languages. This means that when mapping those features, the dots in each language will show up on the map in exactly the same place. As such, we get a lot of overlapping dots. And dots that are higher up in the legend will then necessarily show up on top of others.

The map still remains useful to show some of the different geographical foci of different linguistic groups. In Iran, for instance, there are more articles in Persian than in any other language in our sample. We see more articles about Quebec and parts of North Africa in French, and then a complicated mix of Arabic, Hebrew, English and French in the Levant.

Nonetheless it remains that there are far more English language articles than articles in any other language. As such, if your primary free source of information about the world is the Persian or Arabic or Hebrew Wikipedia, then the world inevitably looks very different to you than if you were accessing knowledge through the English Wikipedia. There are far more absences and many parts of the world simply don't exist in the representations that are available to you.

September 19, 2011

All the tea in China

As part of our ongoing research into the online geographies of mind-altering substances, we present in this post an analysis of caffeinated beverages in China and Taiwan. In these first two maps we compare references to tea and coffee in both Chinese and English. You see that mainland China is almost entirely dominated by references to tea in both languages. Taiwan, in contrast, is mirrored by many more references to coffee in both languages. Somewhat surprising is the fact that there are some small pockets of coffee references on the mainland in English (notably Shanghai and a few in Beijing).

Is this an indicator of a move towards more Western types of consumption (i.e. coffee) in Taiwan and Shanghai? Or perhaps it reflects areas with larger populations of coffee-drinking expatriates? Or -- in a completely unwarranted, speculative and highly alarmist vein -- it could be a sign of an impending trade war (reminiscent of the Opium Wars) in which Caramel Frappuccinos are used to balance the western (largely American) trade deficit.

The answers are unclear, but we can explore the data further by comparing the relative visibility of references to "tea" and "coffee" in English and Chinese:

What we see here is that while there is unsurprisingly more Chinese content referencing both tea and coffee in most parts of the region, there are interestingly more references to tea in English in large parts of rural China. Many of these blue blotches of English-language references are actually layered over tea plantations in the south of the country. It is possible though that the English-language references to tea tell us more about Internet content in China than hot green or black beverages. In almost all of the cases in which there is more English-language content, it is because the English word ("tea") receives only one hit while the Chinese word ("茶") receives none. So, the explanation for these differences could simply be that Google has just not got around to indexing local content throughout much of rural China.
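The "one hit versus none" problem suggests a simple safeguard when comparing counts like these: refuse to declare a winner where the totals are tiny. The sketch below illustrates the idea only; the function name and the minimum-total threshold are hypothetical and were not part of the analysis behind the maps.

```python
def dominant_language(english_hits, chinese_hits, min_total=10):
    """Say which language has more hits, unless there is too little data.

    min_total is a hypothetical threshold chosen for illustration.
    """
    total = english_hits + chinese_hits
    if total < min_total:
        return "insufficient data"
    if english_hits > chinese_hits:
        return "english"
    if chinese_hits > english_hits:
        return "chinese"
    return "tie"


# Invented (english_hits, chinese_hits) pairs for three map cells.
cells = [(1, 0), (40, 100), (12, 3)]
print([dominant_language(en, zh) for en, zh in cells])
```

Under this rule a rural cell with a single English hit would be left blank instead of being coloured as English-dominated.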

In any case, next time you hear someone use the phrase, "not for all the tea in China" point them to these maps so they have a better sense of what they're giving up.

Thanks to Han-Teng Liao for the help with this post.

August 30, 2011

Data Shadows of an Underground Economy

Following on from our "Price of Weed" maps featured in the September issue of Wired, we would like to make available the draft report that the maps came from. The full title of the paper is "Data Shadows of an Underground Economy: Volunteered Geographic Information and the Economic Geographies of Marijuana."

Please note that we are still working on the paper (so excuse any lack of polish), but would certainly appreciate any comments and critiques on the draft before we submit it for peer-review.

June 23, 2011

Preparing for the Zombie Apocalypse, Part II: Brains or Salads?

The following is pulled from the cutting room floor of our upcoming chapter in the edited collection Zombies in the Academy: Living Death in Higher Education.

Following our first post on understanding the coming zombie apocalypse, we thought it pertinent to pose, and attempt to answer, another set of questions using the collective wisdom of the geoweb. While our earlier post sought to understand the spatial dynamics of the zombie apocalypse by finding where there were relative concentrations of references to "zombies" and "old people", we all know that a healthy supply of food is an important factor to consider when staking out places to hide from zombies.

We all know that zombies eat human brains. It's almost so widely accepted that I just wasted 15 seconds typing these two sentences. So what is the opposite of the human brain? The exact thing that zombies would not, under any circumstances, have any interest in consuming? The answer: salad. Why on earth would zombies want to eat salad? It makes no sense. Presuming this fact to be true, we can measure the relative concentrations of references to "brains" and "salads" in order to know where might be good places to avoid, and where might be good places to hide out in the case of the zombie apocalypse. Or, in a less dire scenario, where there might be lots of vegetarian restaurants to eat at next weekend.

References to Brains and Salads Worldwide

At the global scale, the distribution of brains and salads appears to heavily favor salads. This is especially good news for most of the United States, South Africa, Australia and New Zealand, and some pockets of Europe. Japan and China, however, appear to have much higher concentrations of brains, making them ripe for future zombie attacks. Who knew that all of that focus on education could end up being a bad thing? However, because continental Europe displays such variability, it is important to take a closer look at how brains and salads are distributed.

References to Brains and Salads in Europe

There appear to be a few patterns worth mentioning. First, coastal areas appear to be more secure given the prevalence of salads in coastal areas across most of Europe. While there is no clear cause for this, it seems plausible that it could be because coastal areas present more opportunity for fleeing from zombies, since water poses a considerable obstacle to the undead.

Second, there appears to be a clustering of brains in the Normandy region of France and in parts of Germany. While this contradicts our earlier finding that France and Germany are seemingly safe from the zombie apocalypse given the large number of old people, perhaps our shot in the dark about zombies masking their presence in these places was more founded in evidence than we suspected. Probably best to avoid anyone over the age of 25, just to be safe.

Given the smattering of brains across the continent, it appears that no place is entirely safe from the zombies. It's also worth noting that just because there are more salads than brains in a location does not mean there are no brains worth consuming. Just because zombies don't receive as much payoff in these areas doesn't mean they can't make their way there should all other tasty cerebral resources be exhausted.

And you thought Mad Cow disease was bad...

June 20, 2011

Preparing for the Zombie Apocalypse, Part I: Zombies or Old People?

The following is pulled from the cutting room floor of our upcoming chapter in the edited collection Zombies in the Academy: Living Death in Higher Education.

With all the recent talk of the zombie apocalypse, including our own forthcoming book chapter on a similar topic, we've been worried that the older and slightly disheveled population has been put at greater risk of personal injury due to their being confused for the undead [1]. Always eager to lend a helping hand, the Floatingsheep collective has turned to the infinite wisdom of the collective internet to map the relative prevalence of zombies and old people. It is our hope that this guide will help lower the level of zombie-hunter-on-senior-citizen violence that has plagued humankind for generations [2].

Zombies and Old People in Europe

Europe, for example, presents a quite clear picture of the spatial variation in the zombie and elderly populations. Word to the wise for our transatlantic zombie-hunting compatriots: hold your fire in France and Germany. Though we have no idea why there are so many old people, do make note that these are innocent citizens. Unless, however, the zombies have established a colony in these countries and have just effectively been able to hide their presence under the guise of retirement homes [3].

Do, however, be on the lookout in the low countries, as zombies appear to be rampant in the Netherlands as well as much of Belgium [4].

Zombies and Old People in the USA

When looking only at the United States, however, there is no such easily discernible spatial pattern. Though much of the eastern seaboard appears to be dominated by zombies, this corpse cluster is bookended by small concentrations of the merely elderly in both Washington, D.C. and Cape Cod, Massachusetts.

Given our earlier finding of Cape Cod as being the highest concentration of "fun" in the United States, we're not sure if this should be surprising. While many may not consider shuffleboard and iced tea to be the most fun things in the world, I believe we can find some general agreement on the fact that a zombie apocalypse is most certainly NOT fun. If it is a choice between spending a weekend with the undead or the old-fashioned, I think we're all going to pick grandma and grandpa.

Ultimately it appears as if zombie hunters in the United States will be forced to use their best judgment, rather than the tools of spatial visualization, to determine who needs to be taken out in the event of the zombie apocalypse.

------------------

[1] No senior citizens were harmed in the making of this crass, terrible attempt at humor. Plus it was Mark's idea. We also thought "confused people" might be mistaken for zombies as well but despite our expectations -- and considerable evidence all around us in the material world -- searches for references to the phrase "confused people" in the geoweb did not produce many results.

[2] And it came to pass, when Israel had made an end of slaying all the zombies of Ai in the field, in the wilderness wherein they chased them, and when they were all fallen on the edge of the sword, until they were consumed, that all the Israelites returned unto Ai, and smote it with the edge of the sword. And so it was, that all that fell that day, both of male and female zombies, were twelve thousand. For Joshua drew not his hand back, wherewith he stretched out the spear, until he had utterly destroyed all the zombies of Ai and a good deal of the older and slower moving people as well. Book of Joshua, Chapter 8, verses 24-26

[3] The lack of cognitive abilities on the part of zombies does make this theory somewhat less plausible. But when you're talking about the zombie apocalypse, you can never be too careful.

[4] Or, perhaps if we had also done a search on "stoned people", the pattern would be different.

June 13, 2011

Distribution of References to Food in Arabic and Hebrew

Continuing our look at the distribution of language in the geoweb, the map below shows the pattern of references to the word "food" in Arabic and Hebrew. The locations marked in gray are places in which neither language had more references, which usually meant both languages had zero. Locations in white are either indicative of water (e.g., the Black Sea) or are places without any placemark references.

The dominance of Hebrew in the Israel/Palestine area corresponds to some of our earlier findings. Continental Europe shows some interesting clusters, with much of Italy, Belgium, the Netherlands and parts of Germany containing more Arabic references, while Switzerland, Austria and parts of Germany have more references to food in Hebrew.

References to the word "food" in Arabic and Hebrew, Data from 2010

(Green=more references in Arabic; Red=more references in Hebrew)
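The coloring rule described above can be written down as a small function. This is only a sketch of the rule as stated in the text, not the code used to make the map; the function name, its parameters, and the `has_placemarks` flag are assumptions introduced for illustration.

```python
def map_color(arabic_refs: int, hebrew_refs: int, has_placemarks: bool = True) -> str:
    """Pick a map color for one location from its two reference counts.

    Hypothetical helper: the names and the has_placemarks flag are
    illustrative assumptions, not part of the original analysis.
    """
    if not has_placemarks:
        return "white"  # water, or a place with no placemark references at all
    if arabic_refs > hebrew_refs:
        return "green"  # more references in Arabic
    if hebrew_refs > arabic_refs:
        return "red"    # more references in Hebrew
    return "gray"       # neither language ahead; usually both counts are zero
```

Note that a tie with non-zero counts also comes out gray, which matches the "neither language had more references" wording.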

June 09, 2011

Geography of Beer by Language

With the summer months upon us, the FloatingSheep Collective is busy with travel and paper-writing and as a result, we've not been posting as much.

This will be changing over the next weeks as we are working on topics ranging from zombies to augmented reality to marijuana pricing to the interaction between material and virtual flows in the economy. We'll be pushing some of this material out later in June and July.

We're also continuing to work on the languages of the geoweb with specific case studies in a range of locations such as Belgium, the corridor between Toronto and Quebec, Kenya, the UAE, France, and Spain. This will likely start coming out in August and September. But to give an initial sense of what we're finding, we offer the following look at languages in Europe...

We searched for the term "beer" in about 70 different languages -- some native to Europe, others from around the world -- to see what kind of patterns we could see. The map below shows the distribution of six languages that we selected to highlight the tight ties between online use of language and offline patterns.

The clustering of references corresponds very closely with the distribution of the speakers of each language, even languages that exist within a state with another dominant language. For example, Welsh appears within Wales but in few other places within the United Kingdom and Catalan is concentrated around Barcelona within Spain. The other interesting finding is that most languages have a micro-cluster of references to beer within Brussels. Whether this is due to the high quality of Belgian beer or the fact that the E.U. is headquartered there remains to be seen.

The Geography of Beer References by Language
(Red=Estonian; Orange=Welsh; Purple=Czech; Black=Italian;
Blue=Castilian/Spanish; Yellowish Green = Catalan)

Note: The sizes of the circles are consistent within a language but should not be compared between languages. For example, there are many fewer references to beer (or anything) in Welsh than in Italian.

Search Terms for Beer Used in the Map Above

April 04, 2011

What's up with Montana? Comparing Google and Wikipedia in the US

As mentioned in an earlier post we're starting to have some fun with cartogram representations of geoweb data. For those who have forgotten, cartograms distort geographical areas based on the proportional value of some characteristic.

In the two cartograms below the characteristics used to determine size are (1) the total number of Google Maps placemarks and (2) the total number of geotagged Wikipedia articles. The distortion was done at the county level and includes the lower 48 (contiguous) U.S. states. The coloration represents the relative number of geotags/placemarks by population. This gives a better understanding of the distribution of geotags/placemarks both by population and by area.

While many of the results are expected -- California is bursting with geoweb goodness no matter what the measure -- there are some intriguing differences between the distribution of Wikipedia and Google Maps placemarks.
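To make the two quantities concrete: each county contributes one number that drives its size (its share of all placemarks or geotags) and another that drives its color (placemarks or geotags relative to population). The following is a minimal sketch under that reading, with hypothetical county names and counts; it is not the procedure or software used to build the cartograms, and the per-1,000-residents scaling is an assumption.

```python
def cartogram_inputs(placemarks: dict, population: dict, per: int = 1000) -> dict:
    """Compute, per county, a size weight and a per-capita color value.

    Assumes every county in `placemarks` also appears in `population`
    with a non-zero value, and that the total placemark count is non-zero.
    """
    total = sum(placemarks.values())
    result = {}
    for county, count in placemarks.items():
        result[county] = {
            # Share of all placemarks: this is what would distort the area.
            "size_share": count / total,
            # Placemarks per `per` residents: this is what would set the color.
            "per_capita": count * per / population[county],
        }
    return result


# Hypothetical counties and numbers, for illustration only.
example = cartogram_inputs(
    placemarks={"County A": 300, "County B": 100},
    population={"County A": 1000, "County B": 2000},
)
```

A county can therefore look large because it holds many placemarks in absolute terms while still being colored as sparse once population is taken into account.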

Cartogram depicting the distribution of Google Maps Placemarks

Cartogram depicting the distribution of geotagged Wikipedia articles

For example, Texas, Florida and North Carolina are bulging with placemarks but slim tremendously when you consider Wikipedia entries. In contrast, New York and Vermont seem to have proportionally more Wikipedia entries than Google Maps placemarks.

But the biggest contrast between these measures is Montana whose size balloons tremendously when you move from placemarks to wikipedia entries. We're really not sure what's going on with Montana and so invite folks to take a closer look. We suspect it has to do with someone (or perhaps some automated bots) who were/are extremely dedicated to documenting EVERYTHING in Montana. Interestingly this dedication does not extend to the neighboring states of North and South Dakota or to creating placemark entries for use in Google Maps.

Wikipedia Entries in Google Maps

In any case, these cartograms and the case of Montana highlight how diverse each digital layer within any place's cyberscape can be.

UPDATE: Thanks to commenter Mongo for pointing us to the page for the WikiProject Montana, where questions emanating from this blog post have uncovered that a couple of diligent Wikipedians (one of them being Mongo) have been geotagging all kinds of stuff out in the Big Sky country. So thanks for passing the info along and proving our hypothesis about the bots to be wrong!

March 22, 2011

Heatmap of Wikipedia articles: the concentrated geographies of history

Gareth Lloyd has put together a brilliant visualisation of all geotagged Wikipedia articles.

Even more fascinating is this video, showing the data mapped out over time and space:

There are, unsurprisingly enough, quite similar patterns to those found in the maps that we made of Wikipedia biographies mapped out by century. Early concentrations in the Mediterranean, and then an explosion of interest in the rest of the world in the last few centuries. This data gives us a fascinating insight into just how spatially concentrated our knowledge of history is.

March 15, 2011

A Gravity Sink in Wyoming? A Cartogram of Google Placemarks in the U.S.

One of the visualization techniques that we're beginning to work with is the cartogram (thanks to Monica), which distorts the size of an area based on some characteristic. We decided to do this with the number of Google Maps placemarks in the image below (we strongly recommend clicking on it to get the bigger version).

This cartogram helps to visualize the density of the geoweb within the U.S. although other measures such as Wikipedia entries produce fairly different images. Not all geoweb data is created equal.

This cartogram was created using the total number of placemarks at the county level so the distortion is at that scale rather than the scale of the state. This is very clear for the area of Illinois around Chicago which bulges out relative to the rest of the state. The west coastal region is another good example as is the area around Boston.

At the other end of the spectrum is the contraction in the upper mountain west and great plains. Although we recognize the power of labels and are loath to characterize regions solely based on our maps, there really seems to be a bit of informational gravity sink (aka black hole) in the center of Wyoming. Perhaps it would be best for those in the region to strap down their iPhones lest they be drawn into it.

February 18, 2011

The Ephemerality of Search

Google announced yesterday that search was becoming more social. We won’t go into the technical details in this post (the NYT provides a useful overview), but the basic point behind the tweaking of their interface was to allow results to incorporate information that your friends and contacts find relevant and share on platforms like Twitter, LinkedIn and Facebook.

This seems to be Google’s final move to ensure the ephemerality of the search experience. Google has already made search a highly personalised experience in both space and time.

Search results have always been temporally unfixed (a search for the same topic last month, yesterday, today, and tomorrow all can yield different results). However, this trend is speeding up to the point that Google will maintain a real-time index of the Web. What is important here is that both the algorithms used and the information that they harvest and rank are constantly changing.

More recently the geography of results has also become unfixed. Our work analysing Google’s autocomplete in different locations tries to highlight some of these differences. The same search at the same time from two different locations can yield dissimilar results.

The search experience has also been personalised, not only through the memory of links that we highlight or star, but by triangulating results with other personal information that Google knows about us. The happy birthday doodle below is just one example of how this sort of personalisation is enacted.

And now, not only are results individually, temporally and geographically targeted, but also socially specific. My results are now no longer just dependent on my positionality in time and space, but also the time and space positionalities of my entire social network.

This is important due to the powerful links between representation and repetition. We are served information, we act on it, and we thus reproduce and reinforce those representations. This cycle opens up possibilities for a path-dependence of the powerful to be enacted and re-enacted.

Google has received (often warranted) criticism over the ways that it represents, ranks, structures and sorts. Yet despite its general opacity, it had a knowable presence of sorts. Its actions could be observed, and thus criticised and challenged.

However, it is now increasingly difficult to know how Google is “organizing the world’s information.” How do we map and measure, study and critique this increasingly ephemeral tool that so many of us rely on for our informational needs? This will be an increasingly central question for those of us concerned about representations, rankings, and our ability to recreate and challenge them.

See also:

- Ethan Zuckerman on "Listening to Global Voices"
- Zook and Graham on "Google and the Privatization of Cyberspace and DigiPlace."
- Thanks to Monica Stephens for the link to the story.

February 09, 2011

Autocomplete Part I: Mapping the World of Autocomplete

Building on the recent fascination with the United States of Autocomplete map, we thought we'd expand its premise to look at the entire world. In short, we'd type the name of every country into Google and record the top ranked autocomplete, i.e., Google's guess on what you are looking for. Once we started working, it quickly became apparent that the results we were getting in the U.S. sometimes differed dramatically from the results we found in the United Kingdom.

Suddenly what had been a simple mapping exercise became an exciting means of better understanding the geographic differences in search patterns. Cool! You gotta love it when stuff like that happens.

Because it's hard to fit so much data in a static map, we've created a mashup that you can download as a KMZ file and view in Google Earth. (By the way, we hope you like the iconography. We've been looking for a good excuse to use it). As the map is a bit complicated, a few words of explanation are in order.

- We used a list of countries maintained by the CIA World Factbook. Obviously this exercise can be replicated with any other list of place names.
- We conducted the searches in January 2011.
- The icons are generally centered over the capital city of a country.
- The blue icons represent the autocomplete results obtained in the U.S. (specifically Lexington, KY) and the red icons (offset a bit for readability) represent the results from Oxford, UK.
- The label for each icon contains the search term, the location of the search and the top ranked autocomplete result. For example, the label "India (UK): indian visa" indicates that the first autocomplete entry in Oxford was "Indian visa".
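The label format and the two-location comparison can be sketched in a few lines. This is an illustration only: it assumes the top-ranked suggestions have already been recorded by hand for each location (no Google API is called), the function names are hypothetical, and "US" as the location tag is an assumption, since only the "(UK)" form is shown in the example label.

```python
def make_label(search_term: str, location: str, suggestion: str) -> str:
    """Build an icon label like 'India (UK): indian visa'."""
    return f"{search_term} ({location}): {suggestion}"


def differing_terms(results_a: dict, results_b: dict) -> list:
    """Return search terms whose top suggestion differs between two locations.

    Comparison is case-insensitive; terms missing from either dict are skipped.
    """
    shared = results_a.keys() & results_b.keys()
    return sorted(
        term for term in shared
        if results_a[term].lower() != results_b[term].lower()
    )


# Hand-recorded top suggestions. The India entries are the ones reported in
# this post; "Exampleland" is a made-up placeholder.
lexington = {"India": "Indianapolis Colts", "Exampleland": "map"}
oxford = {"India": "indian visa", "Exampleland": "map"}

labels = [make_label(term, "UK", s) for term, s in sorted(oxford.items())]
changed = differing_terms(lexington, oxford)
```

Keeping the recorded suggestions in plain dictionaries makes it easy to rerun the same comparison with any other list of place names or any other pair of locations.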

Take it out for a spin and see what you find. What we've noticed from this exercise is that the location of the searcher clearly matters. We're not exactly sure how Google decides what other searches to include in its autocomplete (nor do we think they will tell us), but the differences in our results provide some clues.

First, Google autocomplete is incorporating geocoded data. The best example of this is that in Lexington, searches on the terms China and Nicaragua return "China Star Lexington KY" and "Nicaraguan Grill Lexington KY", two local restaurants in the city (by the way, the Nicaraguan Grill makes a great Nacatamale). This same geocoded effect does not show up in Oxford, but the Lexington results show that there is a blending of regular search and spatial search.
Second, the autocomplete suggestions appear to be shaped (in part) by the makeup of other user searches in geographic proximity. The example of the restaurants above supports this idea, as do the results for India in the U.K. and U.S. Whereas "indian visa" is the first suggestion in the U.K. (reflecting the long colonial and migration connections), the first suggestion in Lexington is "Indianapolis Colts", a football team based only a few hundred miles away. Likewise, a search for Panama in Lexington results in "Panama City Beach" (located in Florida) rather than "canal" as found in the U.K.
Third, and perhaps most intriguing, is the way these differences illuminate the varying ways in which countries are conceived of (at least in terms of search queries) in separate locations. For example, in Lexington, both Kazakhstan and Bulgaria generate the suggestion of "adoption" (decidedly different than the U.K. results), perhaps linking these countries in the minds of near-Lexington based searchers with international adoption. While these countries are not the largest source of adopted children (China and Russia are 1 and 2), Bulgaria and Kazakhstan (in particular) are connected to the U.S. via adoption and moreover are less likely to have other competing searches. Hence adoption is the first suggestion. In a similar vein, a search for "British Indian Ocean Territory" in Lexington suggests "flag" while in Oxford "holiday" is the top result.
There is also a clear element of temporal closeness. The search for North Korea results in "bombs South Korea" which was an important news story during our searches.
It is also clear that correctly interpreting a user's intent based on limited input remains a challenge. A search for Turkey results in the suggestions of "brine" and "cooking time".
Finally, it seems that autocomplete suggestions are susceptible to ~~spamming efforts~~ the strong presence of commercial/business representations online. For example, "tractor parts" is the top result for a search on the term Belarus in Oxford, most likely because the domain Belarus.com is for a tractor manufacturer. Again, the low level of Belarusian references online is likely also contributing to this.

While these results are really enlightening, getting a larger sample of searches from a range of locations is important to help explore this phenomenon. And this is where you, dear reader, come in. Stay tuned for the next post when we work on crowdsourcing the geography of autocomplete.

February 03, 2011

Wikipedia Demographics

We've written a fair amount about the geographic and linguistic clusters of Wikipedia authors but were reminded today (via the New York Times "Room for Debate" forum) that there are plenty of other clusters along social and economic dimensions. Last year a survey of Wikipedia users was conducted which highlights some interesting fissures within the user group.

One of the most provocative findings (and the one highlighted by the New York Times forum) is that less than 15 percent of the regular contributors to Wikipedia are women. This really grabs one's attention but a closer look at the data report (see also here and here) makes us wonder if this figure accurately reflects the Wikipedia community. Some of the questions are:

- What was the sampling method used? Nothing is listed in the reports.
- What is the bias in the sample? For example, Russia and Russian speakers are the largest language and country groups represented in the survey even though the Russian section of Wikipedia is only the 8th largest linguistic group. (English, German, French, Italian, Polish, Japanese and Spanish are all larger).
- Did women have a lower participation rate than men in the survey? There were three times as many male respondents as female respondents. Does this accurately reflect the makeup of the Wikipedia audience? Given the unexpected results for language and country, it is not clear if there might be gender bias as well.

All this said, we find the question of an imbalance in gender participation very intriguing and important. We just don't know if the survey methods used are such that we can be confident in the magnitude of the highlighted differences. Anyone who can shed some light on this would be more than welcome to comment.
