December 31, 2014

The Best of Floatingsheep in 2014

With yet another year coming to a close, we thought it a good time to reflect upon yet another year of sheepish maps and blogposts, recounting what we have accomplished, perhaps mostly so that we don't dare attempt such goofiness again. And so we give you the Top 10 Floatingsheep posts of 2014, ranked according to the number of page views each received. Feast on these last remnants of 2014, and a happy new year to all!

#1 The Drama of Llamas vs. the Gloating of the Goats 
What was thought to be something of a throwaway post came from the shadows to become 2014's most viewed blogpost, largely thanks to some Redditors who took the map a bit too seriously, if we do say so ourselves.

#2 New Book Chapter on the Geographies of Beer on Twitter 
Based on some great work by Matt and Ate, the map below (and others from the same book chapter) has become a staple of Vox's explanations of alcohol this year... see here, here and here.

#3 Mapping Ferguson Tweets, or more maps that won't change your mind about racism in America 
The product of the Inaugural #IronWilson Map-a-Thon, this map and post were our attempt to counter some problematic uses of geotagged Twitter data in relation to the then-nascent protests in Ferguson, Missouri, and highlight the persistent limitations of this sort of work when dealing with issues as complex and fraught as violence and structural racism.

#4 Mapping the Seven Dirty Words 
One of the biggest missed opportunities from the 2014 IronSheep dataset, our series of maps of George Carlin's infamous seven dirty words didn't yield a whole lot except for excrement.

#5 Hashtags and Haggis: Mapping the Scottish Referendum
While the Scottish ultimately decided to remain a part of the United Kingdom, some of our maps helped to demonstrate persistent cultural divides between the English and the Scottish, and the fact that "the Scottish referendum [was] not just simply about 'yes' or 'no' but seemingly touche[d] on much more fundamental questions of ovis-based cuisine, men's wear and mythological creatures". Indeed.

#6 Artists, Bankers, Hipsters and the "Bro-ughnut" of New York: Mapping Cultural-Economic Identities on Twitter 
Some more work by Ate and Matt for a journal article yielded the discovery of what will surely be recognized in time as one of the most fundamental geographical phenomena known to humankind: the 'Bro-ughnut' of New York.

#7 Hey Y'all! Geographies of a Colloquialism
There are few places as distinct as the American South when it comes to cultural patterns expressed through geotagged tweets, as our mapping of references to "y'all" helped to confirm.

#8 Crowdsourcing Cake or Death?
While the choice between cake or death seems like an obvious one, our maps of references to these terms yielded a much different -- and troubling -- result.

#9 Are there really more juggalos than polar bears?
"As our analysis has shown, there is more to the story of juggalos and polar bears than meets the eye. Clearly, there are more references to polar bears than to juggalos, both globally and in the United States. But the relationship between these two is considerably more complex and contradictory than is assumed by David Cross and his ilk. Obviously more research is required as ten-second gifs are not up to conveying the complexity of the juggalo-polar bear ecosystem."

#10 The Epic Tweet Fight of Bronies and Juggalos
Despite Lexington, Kentucky being at the center of an online controversy around a Bronies vs. Juggalos street fight, the Floatingsheep home base didn't have much online activity around these two subcultures. In fact, when taking the epic street fight online and evaluating the epic tweet fight, we couldn't help but declare it a draw.

December 25, 2014

Are we more interested in XXX or Xmas?

This holiday season we decided to ask the questions that really matter. As people celebrate Christmas, we wanted to know how people around the world are mentioning the holiday. And, perhaps more importantly and interestingly, how mentions of Christmas stack up against mentions of a more sexual and consumption-oriented nature.

So, we decided to compare mentions of 'Xmas', 'XXX', and 'Xbox'.

The formula that we used (for XXX tweets for example):  

(XXX tweets in square / XXX tweets globally) / (XXX+Xbox+Xmas tweets in square / XXX+Xbox+Xmas tweets globally)
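
As a rough illustration of the formula above, here is a minimal Python sketch. The function name and all of the counts are made up for the example; they are not the data behind the map. A score above 1 means a square holds a larger share of that term's global tweets than it does of all three terms combined.

```python
def relative_share(term_in_square, term_global, all_in_square, all_global):
    """Share of a term's global tweets found in one square, divided by the
    share of all three terms' global tweets found in that same square."""
    if term_global == 0 or all_in_square == 0 or all_global == 0:
        return None  # ratio is undefined without any tweets to compare
    return (term_in_square / term_global) / (all_in_square / all_global)


# Hypothetical counts for a single grid square and for the whole world.
square = {"xxx": 30, "xbox": 50, "xmas": 20}
world = {"xxx": 3000, "xbox": 2000, "xmas": 5000}

scores = {
    term: relative_share(
        square[term], world[term], sum(square.values()), sum(world.values())
    )
    for term in square
}

# The term with the highest score is the one this square is "most interested" in.
dominant = max(scores, key=scores.get)
print(scores, dominant)
```

With these invented numbers the square is over-represented for 'Xbox' and under-represented for 'Xmas', even though 'Xmas' has the most tweets globally.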

We see some important global differences. Americans (as well as the French and Spanish) are most interested in Xboxes. Strangely, the Japanese and Nigerians seem to be most fixated on Christmas. And the British, Dutch, and Italians are more interested in X-rated content, giving a whole different meaning to reflections on who has been naughty and who has been nice.

December 18, 2014

Deconstructing the (most detailed tweet) map (ever)

If you’re the kind of person who visits our blog with any regularity, you’re almost certainly also the kind of person who would have seen some version of the map below in the last couple of weeks. Created by Eric Fischer of Mapbox, this map was released along with a blogpost entitled “Making the most detailed tweet map ever”, discussing some of the data cleaning and visualization methods necessary to produce such a striking map. The map is undoubtedly interesting and has sparked a great deal of interest from all corners of the internet, but there’s just something about the framing that rubs us the wrong way. While Eric’s post emphasizes the making part of the equation, the internet hype cycle around it has caused us to read the title a bit more along the lines of:

"Making THE MOST DETAILED tweet map EVARRRR!!!!"

That is to say, for all of the admittedly really great detail about what went into making this map, the framing of this map as not only a detailed map of six billion or so geotagged tweets, but as the most detailed tweet map ever, raises more questions than it answers. For example, what constitutes ‘detail’ in tweet maps? What do competing definitions of ‘detail’ reveal about what we value in this kind of analysis? What do these particular ideas of ‘detail’ foreclose in terms of other possibilities for analysis?

These are important questions, regardless of whether they’re applied to this particular map or any other one. The issue in this case, however, seems to be that the answers to some of these questions conflict with one another, or with the ways the project is itself described. The detail that seems to be valued here is of the “every tweet ever” variety, or, put simply “more = better”, the fetish for bigger data at the expense of all else.

But more data isn’t necessarily better, and it certainly doesn’t mean that there’s more detail, especially when the only bit of detail you're concerned with in each of these six billion points is the latitude and longitude coordinates. Each of these individual tweets contains a wealth of other interesting information, from information about the user and the way they describe themselves, to the time the tweet was created, to the text of the tweet itself, which might contain hashtags that link up with bigger conversations, or @-mentions to other Twitter users that might be used to understand social networks and interactions. All of these bits of information represent a kind of detail that is not included in this, the most detailed tweet map ever.

As we’ve been arguing for the past two years or so, there are a range of social and spatial processes represented in geotagged tweets that we can’t get at if all we’re concerned with is the latitude and longitude coordinates. So to say that this represents the most detailed tweet map ever serves to reify what we see as two of the most problematic assumptions of contemporary big data/social media research: (1) that more data is equivalent to better data, and (2) that the only important aspect of the data is the geographic coordinates attached to it. There's lots of interesting stuff that can be done with this kind of data, and we can do better than simply plotting points on a map and calling it a day [1].

Even if one were inclined to accept the argument that more tweets equals more detail, how should we interpret the fact that this map only visualizes about 9% of all geotagged tweets, due to the design decisions necessary in order to make the map nice and pretty [2]? Due to the existence of exact or near-duplicate coordinates that would make points indistinguishable from one another, this, the most detailed tweet map ever, actually eliminates about 91% of the detail that it seems to value most (i.e., the presence or absence of points on the map). The Gizmodo headline about the map reads, “The Most Detailed Tweet Map Ever Includes 6,341,973,478 Tweets”... except that, you know, it doesn’t [3].

Of course, there’s also a good bit of imprecision in the locational accuracy associated with geotagged tweeting; our iPhones don’t come with military grade GPS units installed in them. So while Mapbox CEO Eric Gunderson was marveling at the detailed micro-geographies of an airport gate seen in the map, he was ignoring both the fact that all of those folks on the jet bridge could just as well have been 40 feet away, and that a number of tweets might have been eliminated from the initial dataset due to a lack of precision in the geotagging process. Take all of that together and a lot of the detail that’s being celebrated here starts to give way to fuzziness. This map is more art than science, though the striking visuals and discursive framing give the illusion of precision and absolute insight.

To be clear, there’s no problem with fuzziness. It’s something we all live with every day, and it’s something we academics may embrace from time to time through the use of overly obtuse language. But taking all of this fuzziness and then repackaging it as the most detailed tweet map ever comes off a bit wrong to us. These initial misgivings were only amplified when brought down to a more local level, when we saw a post from a local urbanist blog in Louisville wondering “What we can learn from where people in Louisville are using Twitter”. While relatively mundane, and certainly not nearly as celebratory, the blog’s ultimate conclusion was that “These locations [with the highest concentrations of tweets] make sense as they are places where people gather and are often held captive by events.”

This, in general, is true, but also a bit… how do we put it? Meh. More fundamentally, people tweet where people are. It comes as no surprise to anyone with even the vaguest familiarity with Louisville that people tweet in larger numbers from downtown (including 4th Street Live!), the University of Louisville campus, Bardstown Road and the St. Matthews / Oxmoor Mall area than anywhere else in the city. These are (some of) the primary gathering points on a day-to-day basis within the city.

But just identifying these locations doesn’t really help us to ‘learn’ anything beyond the fact that those are, indeed, the places with the highest concentrations of geotagged tweets in Louisville [4]. In fact, the map doesn’t even really show us actual concentrations of tweeting activity, but rather concentrations of unique tweeting locations. Take, say, two hypothetical city squares, one of just 50 x 50 meters, and another much larger one of 500 x 500 meters, both the originating point of one million geotagged tweets spread randomly over the squares. In Fischer’s method, these two squares would not 'glow' in equal amounts, but rather the larger square would show up as much more visually prominent because it has many more unique tweeting locations while many of the tweets from the smaller square would be filtered out due to a duplication of coordinates.
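
The two-squares thought experiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: we don't reproduce Fischer's actual filtering here, so the sketch simply assumes that tweets are de-duplicated by snapping coordinates to a 1 meter grid (the `precision_m` parameter is our invention).

```python
import random


def unique_locations(side_m, n_tweets, precision_m=1.0, seed=0):
    """Scatter n_tweets uniformly over a side_m x side_m square and count the
    distinct grid cells they land in after snapping to precision_m."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(n_tweets):
        x = rng.uniform(0, side_m)
        y = rng.uniform(0, side_m)
        # Tweets falling in the same grid cell count as duplicate coordinates.
        seen.add((int(x // precision_m), int(y // precision_m)))
    return len(seen)


# One million tweets in each square, as in the example above.
small = unique_locations(50, 1_000_000)
large = unique_locations(500, 1_000_000)
print(small, large)
```

Under this assumption the small square can never show more than 2,500 distinct dots, while the large one has room for up to 250,000, despite both squares containing exactly the same number of tweets.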

Further, from a data collection standpoint, all of these tweets in Louisville reveal little that isn't revealed by mapping a random sample of tweets (say 1% of tweets from 2013, see map below). If all we’re really concerned about is the question of where people are tweeting from, there isn’t much that looking at all the tweets reveals that couldn't also be found from a smaller subset, and it’s much easier to collect or analyze a few hundred thousand tweets than it is to collect 6,341,973,478 of them. But even still, all we can ‘learn’ from these kinds of maps is where people have created geotagged tweets and, to some extent, where they have not [5].

But if that’s all we can learn from this map, again, why call it the most detailed tweet map ever? Again, there are any number of details that are excluded from analysis by only looking at the locations of geotagged tweets. What if we instead took a different approach to this data, such as examining the use history of individual Twitter users, or even collectives of Twitter users based on some kind of shared experience or identity, such as association with particular neighborhoods or other places?

OK, you're right. This particular question is a bit self-serving, as this is precisely the kind of thing we've been working on for some time now. And so rather than just offering a critique of someone else's work, we really want to see if we can push this kind of analysis in more productive directions. So we offer up the map below, which comes from a paper we currently have under review, that attempts to demonstrate how geotagged tweets can help us to better understand urban socio-spatial inequality beyond simply identifying the presence or absence of tweets in a given area, as is so often done.

Using Louisville and the now-common ‘9th Street Divide’ trope as a starting point, we set out to understand how people from different parts of the city used and moved around the city in different ways. So in a manner not unlike some other things Eric Fischer has done previously, we identified a number of Twitter users as belonging to one of two groups, those with close ties to either the West End (traditionally a poorer and predominantly African-American part of the city) or the East End (a more affluent and largely white part of the city), and collected all of the geotagged tweets from those users [6]. We then compared the spatial footprint of these groups' tweeting activity via an odds-ratio measure. On the map, purple hexagons represent places with greater-than-usual levels of West End user tweeting activity, while orange hexagons represent places where East End users were relatively more dominant than expected. Those places which demonstrate roughly equivalent or expected levels of tweeting are signified by those hexagons with hashes.

This map, in short, represents those places in the city of Louisville which are more socially heterogeneous and homogeneous, dominated either by West End or East End residents, or characterized by a relative mix of people from both parts of the city. Though it’s evident that there is indeed a kind of divide between the West End and the rest of the city, this map also shows that West End residents are incredibly spatially mobile within the city, while East End residents tend to be much more spatially constrained, sticking to their own parts of town.

While there are certainly a lot of underlying factors driving this process, suffice it to say that this map provides an alternative way of understanding socio-spatial inequality than simply identifying those places that do or do not have significant concentrations of geotagged tweets [7]. Through our analysis, we also learned that contrary to the kind of assumptions often made about this kind of informational inequality, West End users actually produce a significantly greater number of geotagged tweets than their East End counterparts, it's just that many of these tweets are created in other parts of the city. This is, of course, an important kind of detail that we can draw from the mapping and analysis of geotagged tweets and one that, in many ways, is more detailed than the most detailed tweet map ever.

There is, of course, a whole lot more detail in the paper that this one map and blog post can’t capture, just as is the case with Eric Fischer’s map. Just to be clear, we think Eric Fischer does some fantastic and beautiful work with geotagged social media data, and commend him for openly discussing and sharing his methods. And yet, we can’t help but feel like the characterization of his map as being the most detailed tweet map ever is at best a half-truth, and helps to reproduce some of the most common problems with the analysis of geotagged social media data. But the more we think about it, we’re not so sure that a single most detailed tweet map could exist, or that it’s even desirable to have such a thing. Instead, we should be striving to create any number of highly-detailed, geographically-situated tweet maps, that collectively contribute to better understandings of the complex social and spatial processes that are represented and reproduced through this kind of data. 

[1] That’s the royal we. 
[2] Which it most certainly is.
[3] As Fischer notes, there are actually no more than about 590 million dots on the map due to his filtering process. When one zooms all the way out on the map so that the entire globe is represented in a single map tile, there are only 1,586 visible tweets, a far cry from the 6 billion number that seems so, well… big.
[4] #tautology
[5] This is qualified in this way because, as Kenneth Field pointed out in a Twitter exchange with Eric Fischer about these maps, geotagged tweets that he has consciously created from his house do not appear on the map. So while we know that all of the tweets on the map were created in that place, we can't say definitively that tweets were not also created in places where they do not appear on the map.
[6] In order to do this classification, we collected all geotagged tweets created within the defined boundaries of these two areas, and then identified those users with more than 40 tweets within either area, where those 40+ tweets represented greater than 50% of their overall geotagged tweeting activity. This concentration of activity indicates that users had a strong association with, and presence within, either area, while also making sure that no users were identified as belonging to both areas.
[7] We also see this map as complicating the conventional narrative in Louisville of 9th Street as representing a kind of impenetrable barrier within the city. But since this is less directly relevant to our argument here, we'll make you wait to hear more about that particular line of reasoning.
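
For the curious, the classification rule described in note [6] can be sketched roughly as follows. This is a minimal illustration rather than the code used for the paper: the function name, the input format and the area labels are all assumptions made for the example.

```python
from collections import Counter


def classify_users(tweets, min_tweets=40, min_share=0.5):
    """tweets: iterable of (user_id, area) pairs, where area is 'west',
    'east', or None for a geotagged tweet sent from anywhere else."""
    totals = Counter()   # all geotagged tweets per user
    by_area = Counter()  # tweets per (user, area) inside the two areas
    for user, area in tweets:
        totals[user] += 1
        if area is not None:
            by_area[(user, area)] += 1

    labels = {}
    for (user, area), n in by_area.items():
        # More than 40 tweets in the area AND more than half of the user's
        # overall activity; the 50% rule means no user can match both areas.
        if n > min_tweets and n / totals[user] > min_share:
            labels[user] = area
    return labels


# Hypothetical users: 'a' is mostly West End, 'b' tweets mostly elsewhere,
# and 'c' has too few tweets to classify.
sample = (
    [("a", "west")] * 60 + [("a", None)] * 20
    + [("b", "east")] * 45 + [("b", None)] * 50
    + [("c", "east")] * 30
)
labels = classify_users(sample)
print(labels)
```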

November 12, 2014

The (Rust) Belt of Basic-ness? Mapping the Pumpkin Spice Latte

As fall gives way to winter, we're all left clinging to the best vestiges of the passing season: the changing leaves, college football, temperatures above freezing and, for many of the most basic amongst us, the pumpkin spice latte. Debuted by Starbucks in 2004, and featuring no actual pumpkin content, the pumpkin spice latte has become a staple of fall, with Ugg boots and yoga pants-wearing women practically crawling out of the woodwork to get their hands on the thing. And while Starbucks touts that over 29,000 tweets have mentioned #pumpkinspice since 2012, we suspected there was much more to the story of the pumpkin spice latte [1]. Despite the fervor, we noticed that there's been no definitive tracking of the geographical expansion of the pumpkin spice latte as it seeks to colonize the world of regular, everyday people drinking plain ol' coffee.

Searching only for the latest manifestation of the pumpkin spice phenomenon, we collected all geotagged tweets in the continental United States for September and October 2014 with references to either "pumpkin spice" or "#psl", yielding a total of 19,537 tweets. But rather than simply mapping the basic distribution of these tweets, we've instead normalized this data by tweets referencing "coffee" during the same period. Using a 25% sample of all of these coffee-related tweets -- totaling 42,696 tweets -- aggregated to hexagonal cells, we calculated the odds ratio at the lower bound of the 95% confidence interval in order to provide a bit more context and account for any number of biases within the data. Using this measure, we've identified those places with greater-than-usual numbers of pumpkin spice latte tweets relative to those tweets referencing coffee (orange), and vice versa (purple), as seen in the map below.
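
To give a sense of what taking the lower bound of the confidence interval does, here is a small Python sketch. The post doesn't spell out the exact formula, so this assumes one common approximation: a Wald-style interval on the log of the ratio, with the standard error taken as the square root of the summed reciprocals of the four counts. The per-cell counts are invented; only the two totals come from the text above.

```python
import math


def odds_ratio_lower_bound(p_i, p_total, r_i, r_total, z=1.96):
    """Lower end of an approximate 95% interval for the ratio
    (p_i / p_total) / (r_i / r_total) in a single hexagonal cell."""
    if p_i == 0 or r_i == 0:
        return None  # the log of the ratio is undefined for empty cells
    ratio = (p_i / p_total) / (r_i / r_total)
    # Assumed standard error of log(ratio); small counts inflate it.
    se = math.sqrt(1 / p_i + 1 / p_total + 1 / r_i + 1 / r_total)
    return math.exp(math.log(ratio) - z * se)


PSL_TOTAL = 19_537     # pumpkin spice tweets, from the text
COFFEE_TOTAL = 42_696  # sampled coffee tweets, from the text

# Two hypothetical cells with similar raw ratios but very different counts.
busy_cell = odds_ratio_lower_bound(40, PSL_TOTAL, 30, COFFEE_TOTAL)
quiet_cell = odds_ratio_lower_bound(3, PSL_TOTAL, 2, COFFEE_TOTAL)
print(busy_cell, quiet_cell)
```

Both cells have a raw ratio of around 3, but only the busy one stays above 1 once the uncertainty is subtracted. That is the point of using the lower bound: a handful of tweets in a sparsely populated hexagon isn't enough to color it orange.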

References to Pumpkin Spice Lattes relative to References to Coffee [2]

Based on our binary classification, it's evident that the vast majority of the country has stuck with their preference for coffee, even during the PSL's peak season. But given that our interest is in mapping the prevalence of the PSL in particular, we want to pay closer attention to the smattering of orange hexagons in the map. While there are no definitive clusters of PSL-related tweeting, if you squint your eyes you can just barely visualize a belt of pro-PSL places stretching from St. Louis up to Chicago, and from Cincinnati up to Toledo and Detroit, and from Cleveland to Pittsburgh, what we've termed "the basic belt". While this belt roughly corresponds to the vernacular region of the Rust Belt, Ohio in particular sits at the center of this pumpkin spice-loving portion of the country, representing the buckle on our belt [3]. Given this clustering of PSLs, we suspect that the Buckeye State might well be on its way to becoming the Pumpkin Spice State.
[1] Well, actually, Renee Kaufmann had this hunch. All credit for the idea behind this post goes to her.
[2] Sorry about the Web Mercator projection, y'all.
[3] Can one still wear a belt with yoga pants?

October 27, 2014

Geographies of Grits

Throughout the history of this blog, we’ve mapped any number of geographically-specific social phenomena. But oftentimes, we’ve been drawn to mapping things associated with the American South, whether because it’s arguably the most distinctive cultural region in the United States or because all of us have lived on its outskirts for some time or another, we’re not quite sure. But if you had to choose just one region to map using social media data, the South is probably a good place to start.

Continuing this persistent obsession, we decided to map one of the South’s most prominent culinary traditions: grits. As such, we collected all geotagged tweets in the United States from June 2012 to September 2014 mentioning the strange (read: awesome) ground-corn porridge-like dish, totaling around 64,000 tweets. Keeping in mind that geotagged tweets still represent only around 2-3% of all tweets, this figure represents a breakfast table conversation of several thousand tweets per day, and highlights the ability of this kind of social media data to provide insight into a particular cultural phenomenon that is relatively more difficult (though certainly still possible) to measure through more conventional means.

The map below represents a normalized visualization of grits-related tweeting throughout the continental United States. Using a grid of hexagonal cells, the number of grits tweets was normalized using an odds ratio against a random sample of tweets from the same time period. In this measure, a value of 1 signifies that there are exactly as many grits tweets in a given location as one would expect according to the baseline measures of tweeting, with values greater than 1 indicating that there is a greater predominance of grits tweets than one would otherwise expect. In effect, this analysis cuts out the potential for these maps to simply reproduce maps of population density, homing in on the actual phenomenon at hand.

Mapping the 'Grits Belt'

Indeed, here you can see that while the South as a whole demonstrates a stronger preference for grits than the rest of the country, it is actually a relatively small number of coastal localities in the low country that have the strongest connection to grits through social media. While New Orleans represents something of an outlier in the far corner of the South, there is also a consistent band of concentrated grits tweeting stretching from just north of Charleston, South Carolina down through Beaufort (though seemingly skipping over Hilton Head, a popular tourist destination that might be understood as relationally disconnected from much of the rest of the distinctly southern culture surrounding it) and Savannah, Georgia, all the way to Brunswick.

This map demonstrates the potential of this kind of method to locate geographically-specific cultural practices in space, as well as the notion that these kinds of maps can reinforce the persistent connectedness between virtual representations of the world and people’s everyday lives and material practices. But there is more that we can do with this data by putting it into relation with other datasets. The map below does just that, by comparing our existing dataset of tweets mentioning ‘grits’ with all geotagged tweets during the same time period that mention ‘oats’. We again employ the odds ratio measure, but rather than comparing using a baseline population of tweets, we use the oats-related tweets to normalize our values. In this analysis, values less than 1 signify a preference for oats, while values greater than 1 represent a tendency towards grits. Not only does this comparison continue to affirm the identification of a ‘grits belt’ in the South, but it also highlights other areas of the country – an ‘oats oval’ stretching from the Northeast to the Midwest – that stand in stark contrast to the southeast in terms of digital porridge discourse.

Grits vs. Oats: The Emergence of the 'Oats Oval'

Thus a key avenue for analysis of digital social datasets is examining the relationships between individual users or individual messages. It is also possible to identify relationships between places, based on visits or tweets made by the same person in these different places. While we’ve already identified the South as the key locus of grits-related tweeting in the United States, it’s important to not simply ignore all of the other data points available to us that are just not quite as spatially clustered. Indeed, given the strong connections between the cultural practice of grits preparation and consumption and the vernacular region of the South, we might hypothesize that even those people tweeting about grits outside of the South are likely to have some kind of connection to the South, perhaps as a kind of diasporic community now living in other parts of the country, or even just traveling for a short period of time.

Relational 'Gritspace'

To examine this relationship, we begin by looking for users in our original dataset who have tweeted about grits more than once – yielding a total of 8,958 users – and then draw a line connecting each user’s tweet locations in chronological order. The resulting map below clearly shows that there is a strong relational connection with the South for those who tweet about grits from other places, even for cities like Los Angeles that are quite distant in absolute space, as well as in terms of cultural identity. Indeed, the gravity of grits appears quite strong, as of the users tweeting about grits from outside the South, approximately 55% of these also sent tweets from inside the cluster identified in the first map in this post. So even for those grits-obsessed tweeters outside of the South, the pull of porridge remains strong… and, we would expect, even stronger when you throw a bit of cheese and jalapenos in, too.
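
The two steps described here, connecting each repeat tweeter's locations in time order and then asking what share of the outside-the-South users also tweeted from inside the cluster, might look something like the sketch below. The input format, the function names and the crude bounding box standing in for the grits cluster are all assumptions made for illustration, as are the coordinates.

```python
from collections import defaultdict


def user_paths(tweets):
    """tweets: iterable of (user_id, timestamp, lon, lat). Returns, for each
    user with more than one tweet, their locations in chronological order."""
    by_user = defaultdict(list)
    for user, ts, lon, lat in tweets:
        by_user[user].append((ts, lon, lat))
    paths = {}
    for user, rows in by_user.items():
        if len(rows) < 2:
            continue  # a single tweet gives us no line to draw
        rows.sort()   # tuples sort by timestamp first
        paths[user] = [(lon, lat) for _, lon, lat in rows]
    return paths


def share_with_ties(paths, in_cluster):
    """Of the users with at least one tweet outside the cluster, the
    fraction who also have at least one tweet inside it."""
    outside = [u for u, pts in paths.items() if any(not in_cluster(p) for p in pts)]
    if not outside:
        return None
    tied = [u for u in outside if any(in_cluster(p) for p in paths[u])]
    return len(tied) / len(outside)


def in_cluster(point):
    # Hypothetical bounding box roughly covering the coastal low country.
    lon, lat = point
    return -82.0 <= lon <= -79.0 and 31.0 <= lat <= 34.0


# Made-up tweets: u1 moves from the coast to the West Coast, u2 never
# visits the cluster, and u3 only tweeted once.
tweets = [
    ("u1", 3, -118.2, 34.1),
    ("u1", 1, -79.9, 32.8),
    ("u2", 1, -87.6, 41.9),
    ("u2", 2, -122.3, 47.6),
    ("u3", 1, -81.1, 32.1),
]
paths = user_paths(tweets)
share = share_with_ties(paths, in_cluster)
print(paths, share)
```

Each entry in `paths` is the sequence of points that would be joined into a line on the map.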

If you’re interested in reading more about the methods used to make these maps, and about the utility of mapping geotagged social media data more generally, you can check out the following pre-publication version of a forthcoming book chapter from which this work was drawn:

Poorthuis, A., M. Zook, T. Shelton, M. Graham and M. Stephens. Forthcoming. "Using Geotagged Digital Social Data in Geographic Research". In Key Methods in Geography, eds. N. Clifford, S. French, M. Cope and T. Gillespie. London: Sage.
Abstract: This chapter outlines how one might utilize the massive amounts of web-based, geographically-referenced digital social data for geographical research. Because much of these data are user-generated and produced through social media platforms, we also focus on the pitfalls associated with such sources and the benefits of a mixed methods approach to these data. Not only can digital social data be mapped for visual analysis, it is also useful to use a range of quantitative methods to understand relationships between different subsets of the data. In addition, closer, systematic readings via qualitative methods of social data provides insights of particular people’s perceptions and experiences of the world around them. Thus, while making maps is often the starting point for geographers working with this kind of research, it is rarely the end point.

September 17, 2014

Hashtags and Haggis: Mapping the Scottish Referendum

The past weeks have been quite eventful in Scotland as a monumental election unfolds. Everyone wants to know, which way will the Scots vote? While we here at Floatingsheep certainly don't have the answer or power to predict the referendum, we thought it might be interesting to see the geographic dimension of how Scots (and the rest of the world) are tweeting about a fundamentally geographic decision [1].

We pulled data from DOLLY from the last month and a half for a number of hashtags and terms that we thought might be helpful in taking the pulse of Twitter discussion around the independence referendum. Most obviously, we collected the hashtags #VoteYes and #YesBecause due to their association with the pro-independence movement, and the hashtag #NoThanks because of its association with anti-pro-independence sentiment [2].

We started by comparing the prevalence of 'no' (i.e., pro-union) hashtags versus 'yes' (i.e., pro-independence) hashtags at the global level. In the map below, orange indicates a greater prevalence of 'yes' tweets and purple indicates that there are more 'no' tweets. Perhaps the most interesting thing here is that we can see the United Kingdom swing towards a 'yes' vote, which has, for the most part, appeared to be the underdog in more conventional polling leading up to the referendum. Then again, most of Western Europe, along with Thailand and Australia, also have a general preference for 'yes' tweets. Oddly enough, the United States is the staunchest defender of the union, based solely on its massive preference for 'no' tweets. Strange for a country that yearly celebrates its breaking away from Mother England.

Comparing 'Yes' vs. 'No' Tweets at the Global Scale

Looking closer at the UK, we can see that much of Scotland has a roughly equal number of tweets in support of both the 'yes' and 'no' positions -- reflecting the contentious and hotly-contested nature of this referendum. But the Central Belt in particular -- where a lot of actual votes will be coming from, as it is the most densely populated part of the nation -- swings heavily towards 'yes'. The English, on the other hand, seem very much inclined towards pro-union or anti-separation tweeting.

Comparing 'Yes' vs. 'No' Tweets in the United Kingdom

To take an alternative look at support for the different positions, we mapped the percentage of each of the three hashtags that originates in each of the administrative sub-regions of both Scotland and the UK as a whole. The Highlands and parts of the Central Belt again show up as strong bastions of 'yes' votes.

Percentage of Referendum-Related Tweets from Different Regions

But seeing as we're interested in doing more than just mapping distributions, the next question is how are we to put all of this into context? The only proper place to start is, of course, with the Queen. The map below illustrates those places which tend to have higher-than-normal levels of tweeting about the Queen (in orange) and those places that are tweeting less about the Queen than might usually be expected (in purple), based on a baseline measure of tweeting activity. Sadly, the whole country seems to be ignoring her. Apart from Glasgow, that is. In the interests of not upsetting an 88-year-old lady, we have chosen not to explore these tweets in any more detail.

Tweets referencing "Queen"

Building on this, we also explored the geography of references (using the same method described above) to something inherent in most people's definitions of Britishness: tea and crumpets.

We see an all-around tea-depression; hardly anywhere is particularly pro-tea at the moment, truly a shocking state of affairs. The British are clearly not being their usual selves, and for their sake we're glad the referendum will be over soon, regardless of the outcome. Scotland, in particular, has average tea counts that are low by historical standards.

Tweets referencing "tea and crumpets"

This analysis would, of course, all be meaningless unless we mapped the geographies of a range of uniquely Scottish phenomena: haggis [3], kilts and Nessie. Still using the same method as above, the map below shows without a shadow of a doubt that Scotland is destined to become its own nation.

Tweets referencing "haggis", "kilts" or "Nessie" 

The Scots are tweeting about these topics at a greater-than-usual rate, while their southern neighbors remain distinctly uninterested. If ever there were an indication that these nations are divided by more than just a line on a map, we see that manifested in the topic of people's Twitter conversations. In short, the Scottish referendum is not just simply about "yes" or "no" but seemingly touches on much more fundamental questions of ovis-based cuisine, men's wear and mythological creatures.

So even if the 'no' votes win out and the Kingdom remains united, the geographies of haggis-related tweeting (along with a few other things) have revealed that these are two very different nations, indeed.

UPDATE (9/18/14 @ 12:45pm):
We've added another map to our analysis below, which shows the relative prevalence of #VoteYes and #NoThanks tweets throughout Great Britain, at the level of administrative sub-regions, rather than the hexagons used above. This map makes for a stark contrast between the English (and Welsh) and the Scottish... while there are a few areas of Scotland that show relative parity between 'yes' and 'no' tweets, most of the nation demonstrates a relatively strong preference for 'yes', while much of England demonstrates at least a slight preference for 'no'.

[1] In case you don't know what Twitter is, we refer you to the Scots Wikipedia page on the subject, which states: "Twitter is an online social networkin service an microbloggin service that enables its uisers tae send an read text-based messages o up tae 140 characters, kent as 'tweets'".
[2] Perhaps we could have simplified this phrasing, but then we would have lost the chance to type "anti-pro-independence", which is a lot of fun. Anti-pro-independence. Anti-pro-independence.
[3] Normally the Floatingsheep collective avoids conversation about sheep heart, liver, and lungs that are boiled in a sheep stomach. But we made an exception this time.

September 10, 2014

Mapping #RussiaInvadedUkraine

From snarky exchanges between official Canadian and Russian Twitter accounts, conflicting representations of Crimea in Google Maps and OpenStreetMap, and a recent piece by Peter Pomerantsev in The Atlantic on how Vladimir Putin is revolutionizing information warfare, the ongoing conflict in the Ukraine has been widely reflected online. One particular manifestation of this conflict on social media was the #RussiaInvadedUkraine hashtag, which emerged at the end of August as Russian troops appeared in Eastern Ukraine. The hashtag has served as a social media rallying point for supporters of Ukraine, with the New York Times reporting that in the first day of its existence, over 500,000 tweets using the hashtag were sent.

Wondering what the spatial distribution of this hashtag looked like across Europe, we fired up DOLLY and collected all geotagged tweets containing the hashtag sent from European countries between August 27th and September 7th, 2014, resulting in approximately 4,500 tweets. To control for the effect of single, very active, individuals sending many tweets -- and to better represent aggregate rather than individual actions -- we only included the first five tweets from any single user, resulting in a total of about 2,100 geotagged tweets. These tweets were aggregated to the country level and then normalized by the total number of tweets sent during this same time period, resulting in a location quotient for each country. The location quotient indicates the relative prevalence of tweets containing the #RussiaInvadedUkraine hashtag compared to the overall level of Twitter activity during this same time. Values greater than one indicate that people in a given country contributed a greater number of tweets about this topic than would be expected based on usual tweeting levels, while values less than one indicate that the country contributed fewer tweets using this hashtag than one might usually expect.
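As a rough illustration of the two steps described above (capping each user's contribution, then computing a location quotient), here is a minimal Python sketch. The field names `user` and `country`, the function names and the toy counts are all hypothetical, and the share-over-share formula is one common way to define a location quotient rather than necessarily the exact calculation behind the map.

```python
from collections import Counter


def cap_per_user(tweets, max_per_user=5):
    """Keep at most `max_per_user` tweets per user.

    Assumes `tweets` is already sorted by time, so the ones kept
    are each user's first tweets.
    """
    seen = Counter()
    kept = []
    for tweet in tweets:
        if seen[tweet["user"]] < max_per_user:
            kept.append(tweet)
            seen[tweet["user"]] += 1
    return kept


def location_quotients(topic_counts, baseline_counts):
    """Share of topic tweets divided by share of all tweets, per country."""
    topic_total = sum(topic_counts.values())
    baseline_total = sum(baseline_counts.values())
    if topic_total == 0 or baseline_total == 0:
        return {}
    quotients = {}
    for country, baseline in baseline_counts.items():
        if baseline == 0:
            continue  # no baseline activity, so the quotient is undefined
        topic_share = topic_counts.get(country, 0) / topic_total
        baseline_share = baseline / baseline_total
        quotients[country] = topic_share / baseline_share
    return quotients


# Toy data, invented for illustration (not the DOLLY sample).
tweets = [{"user": "a", "country": "UA"}] * 7 + [{"user": "b", "country": "DE"}] * 2
capped = cap_per_user(tweets)
topic_counts = Counter(t["country"] for t in capped)
baseline_counts = {"UA": 10, "DE": 90}
print(location_quotients(topic_counts, baseline_counts))
```

In this toy case the prolific user "a" is cut from seven tweets to five, and the country with a small baseline but many hashtag tweets ends up with a quotient well above one.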

The map above illustrates a strong concentration of the #RussiaInvadedUkraine hashtag in countries that are nearer to the Russian border. In short, this is a classic example of a distance decay function, in which distance from a phenomenon is inversely related to attention to, or the presence of, that phenomenon. In general, most countries within Eastern Europe -- including Russia itself -- show a higher level of Twitter activity around this hashtag, with some exceptions such as Moldova, Slovakia and Romania. In particular, however, the Ukraine and its neighbor Belarus show an extremely high level of activity around this issue, with the Ukraine alone contributing roughly 48x more of the tweets using this hashtag than it did to the baseline sample used for normalization. Conversely, as one moves westward, the level of participation in this social media meme drops considerably. While Germany, which is both geographically and relationally more proximate to Eastern Europe, has a location quotient of just 0.81, the Netherlands and Italy have scores of 0.25 and 0.22, with the UK and France having extremely low location quotient values of just 0.05.

Of course, Twitter is not an unproblematic representation of the population, and tweets containing this hashtag can express a range of sentiments from both sides of the conflict [1]. Quite clearly, the interest and official response from western states (and their militaries) is not tied to the level of popular participation in social media activism. Instead, as we showed in the case of tweeting related to Ferguson, Missouri and the protests around the shooting death of an unarmed black teenager at the hands of a police officer, geography matters when it comes to directing our attention to news and current events, with people more directly connected to these issues having a much greater level of interest and concern [2]... even when it involves the invasion of military forces from one country into their neighbor.

[1] Although a quick review of the text reveals that these tweets are primarily critical of Russia's actions.
[2] Of course, this shouldn't be particularly surprising. But for some people, it is.

August 18, 2014

Mapping Ferguson Tweets, or more maps that won't change your mind about racism in America

This post is the culmination of the Inaugural #IronWilson Map-a-Thon, held on Saturday, August 16th, and is the result of a collaboration between Matthew Wilson, Eric Huntley, Ryan Cooper and Taylor Shelton. 

A little over a week ago, the streets of Ferguson, Missouri, a suburb of St. Louis, were disrupted by the shooting of 18-year-old Michael Brown by Ferguson Police Officer Darren Wilson. While the details have been slow to emerge, the reaction to the killing of yet another unarmed young black man has been anything but -- whether in the form of street protests in Ferguson, or the online reaction to the news as seen on Twitter. The following graphic, produced by Twitter and published online by the Washington Post, demonstrates a typical representation of what we might call ‘#hashtag frenzy’, as people around the country take to Twitter to react to and comment upon the news. 

While certainly flashy and eye-catching, these so-called ‘animated ectoplasm maps’ tend to be short on meaningful insights. These visualizations show little more than population density in the US, and are remarkably similar from one trending topic on Twitter to the next. There is no attempt to normalize the data by population or overall levels of tweeting in a given place, thus obscuring both more detailed spatial patterns and broader social meanings that might be drawn out of such data. Still, maps such as this are useful in demonstrating the waxing and waning attention span toward issues of social importance, including the registering of yet another gun-related and police-initiated violent event, something that this blog post itself contributes to; therefore, in full admission of ‘yet another Twitter map of racial violence’...

We collected all geotagged tweets referencing a series of keywords -- ‘Ferguson’, ‘handsup’, ‘mikebrown’, ‘dontshoot’ and ‘handsupdontshoot’ -- from Saturday, August 9th when the shooting occurred through the morning of Friday, August 15th, in an effort to provide a bit more resolution and, hopefully, insight into the ways and places people were tweeting about the protests. Starting with the first geotagged tweet referencing the shooting, we collected a total of 38,450 tweets. That first tweet came from user Johnny__Tapia at 3:11pm Central Time on Saturday, saying “Ferguson police just shot a kid in the head in the middle of the street. 17 yrs old. Ain’t nobody saying what he did” [1].

In the several days following the shooting, news spread quickly over Twitter, with social media providing a key source of updates and information in lieu of any official reports or communication from the Ferguson Police Department. The map below, made by Eric Huntley, aggregates all the tweets in this dataset to hexagonal cells across the continental United States, and normalizes them relative to the overall amount of tweeting in that location at the same time. In other words, it shows the relative focus of tweeting related to Michael Brown’s shooting (and the subsequent protests and police crackdown) compared to overall tweeting activity by location. In this map, lighter shades indicate relatively more tweets about Ferguson than the national average.
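For readers curious how this kind of hexagonal aggregation and normalization might work, here is a minimal Python sketch. It is not the code behind the map: it treats coordinates as already projected onto a flat plane, uses a standard pointy-top axial hex grid with cube rounding, and all function names and sample points are hypothetical. It also assumes the topic tweets are a subset of the full set of tweets.

```python
import math
from collections import Counter


def _cube_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute whichever coordinate had the largest rounding error
    # so that q + r + s stays equal to zero.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return (rq, rr)


def hex_cell(x, y, size):
    """Axial (q, r) of the pointy-top hexagon containing planar point (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return _cube_round(q, r)


def normalized_by_cell(topic_points, all_points, size):
    """Per-cell topic share divided by the overall (national) topic share."""
    topic = Counter(hex_cell(x, y, size) for x, y in topic_points)
    total = Counter(hex_cell(x, y, size) for x, y in all_points)
    if not topic or not total:
        return {}
    national = sum(topic.values()) / sum(total.values())
    return {cell: (topic.get(cell, 0) / n) / national for cell, n in total.items()}


# Toy points, invented for illustration: two neighboring cells.
east = (math.sqrt(3), 0.0)
all_points = [(0.0, 0.0)] * 10 + [east] * 10
topic_points = [(0.0, 0.0)] * 4 + [east] * 1
print(normalized_by_cell(topic_points, all_points, size=1.0))
```

Values above one mark cells with relatively more topic tweets than the overall average. Cells with very few total tweets can produce misleadingly high values, so a real analysis would likely want a minimum-count threshold.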

The national and international media coverage of the story in Ferguson points toward the notion that this event transcends the local; there is something about it that speaks to people from any number of places and walks of life. For example, the aforementioned WaPo article ends with the relatively meaningless maxim, “People are watching from as far away as Fiji and Ghana. That's the world we live in now.” While discussions of increased police militarization and the persistent legacy of racism have certainly resonated strongly with a national audience, it is evident from our more-than-just-dots-on-a-map approach that the tweeting around this event is actually most prevalent in the general vicinity of where the shooting occurred: the St. Louis metropolitan area. The proportion of tweets on the topic is higher in and around St. Louis than anywhere else in the country, while other cities around the country have largely continued about their business, with lower levels of Ferguson-related tweeting relative to overall levels of Twitter activity [2]. While a few scattered and isolated areas throughout the country demonstrate a relatively high amount of tweeting about Ferguson -- mostly as a result of low overall levels of tweeting -- the St. Louis region is really the only place that demonstrates a particularly concentrated and significant interest in the matter. In other words, "the world we live in now" is one in which spatial proximity and social connectedness remain incredibly important, even if people in Fiji and Ghana can follow along, too.

This lies in contrast to the aftermath of George Zimmerman receiving a ‘not guilty’ verdict last summer in his trial for the shooting death of Trayvon Martin, which is arguably the best parallel in terms of the public outcry and attention to the current Ferguson situation. Following Zimmerman’s verdict, large portions of the American South demonstrated a greater likelihood to use the #JusticeForTrayvon hashtag than other parts of the country, which we interpreted as indicative of Twitter users making connections between the events in Sanford, Florida and the broader legacies of racialized violence throughout the American South. Whether the different geographies of Twitter's reactions to these events are the result of different temporal evolutions (the immediate aftermath of the shooting vs. the trial verdict a year and a half later), of divergent experiences, or perceptions, of racism between the South and Midwest [3], or of something else entirely, is left to some level of speculation.

Despite the overall concentration of tweets in the St. Louis region, it is also important to recognize that spatial unevenness exists at multiple scales, with respect to practically any phenomenon. Indeed, the ability to examine such phenomena at a variety of scales is one of the major advantages to aggregating these points -- or individual tweets in this case -- to a uniform grid of hexagonal cells, as opposed to the more conventional, and largely arbitrary, Census-defined areal units. In the GIF below, created by Matt Wilson, you can see the spatial distribution of raw (i.e., non-normalized) tweets -- using the same dataset -- in the St. Louis metro area over time, beginning on Saturday when the shooting occurred, through the end of Thursday, August 14th [4].

Given our lack of first-hand knowledge of St. Louis and its environs, we’re hesitant to draw too many conclusions from this data, though we certainly welcome any potential explanations from our readers. Because each of these snapshots is classified in the same way, we can see the diffusion of the news and growth in interest over time, becoming much more pronounced beginning on Monday. Tuesday is interesting insofar as it seems to demonstrate a much stronger clustering around Ferguson itself (the cluster of three dark blue hexagons north of downtown St. Louis), with the rest of the city actually seeing a decrease in tweeting about the event. This interest, especially in downtown St. Louis, ramps back up on Wednesday and Thursday, around the time of growing protests and the increasingly violent response from the Ferguson Police.

Ultimately, despite the centrality of social media to the protests and our ability to come together and reflect on the social problems at the root of Michael Brown's shooting, these maps, and the kind of data used to create them, can’t tell us much about the deep-seated issues that have led to the killing of yet another unarmed young black man in our country [5]. And they almost certainly won't change anyone's mind about racism in America. They can, instead, help us to better understand how these events have been reflected on social media, and how even purportedly global news stories are always connected to particular places in specific ways.

[1] It appears that this user has since deleted all of his tweets back to July 10.
[2] There is still a significant absolute amount of tweeting in these places; there just also happens to be a generally massive level of tweeting about other topics, as well.
[3] Of course, St. Louis, like pretty much everywhere in the United States, has its own important legacies of racism. For example, please see: Deep Tensions Rise to Surface After Ferguson Shooting, The Most Racist City In America: St. Louis?, and The Century-Old Urban Policy That Divides St. Louis.
[4] These maps also use a somewhat smaller sample of tweets that have only exact latitude and longitude coordinates, so as to avoid using those tweets tagged to place names, such as ‘St. Louis’, which might give the impression that there were large contingents of tweeters at the geographic center of the city.
[5] Though data about racial profiling, as Ryan Cooper analyzed for us here, can point towards some potential explanations.

August 17, 2014

Mapping the #LouisvillePurge

The only way to introduce this post is to say that yes, a bunch of really naive and/or, in the case of the local television news media, willfully idiotic, people thought that there was going to be a 'purge' -- a 12 hour period where all crime is made legal -- in Louisville, Kentucky on the night of Friday, August 15th, 2014. Starting with a single tweet from a local high school student, things quickly grew out of control, with #LouisvillePurge becoming a trending topic nationally by the time things were all said and done. While the best tweets referencing the purge made light of the phenomenon, there were many, many more expressing confusion, fear, bewilderment and a desire to save the poor souls who might have been convinced to participate in such an event. But for all the attention given to the role of social media in spreading the hysteria [1], there's been no attempt to look at where some of these tweets were coming from, and how the news spread over space and time.

While the tweet that kicked the whole ordeal off was created at 8:32pm on Sunday, August 10th, the first geotagged tweet with the #LouisvillePurge hashtag didn't show up for another couple of days, at 11:33pm on Wednesday, August 13th. Beginning with that tweet, we collected all geotagged tweets with the hashtag through noon on Saturday, August 16th, at which point things were dying down.

The map below shows the overall distribution of these 4,351 geotagged tweets, aggregated to hexagonal cells across the continental United States. While Louisville and the surrounding areas clearly have the highest concentrations, the discussion of the Louisville Purge was truly trans-local, with less than 25% of the total number of geotagged tweets coming from the Louisville Metro area. Of areas further away from Louisville in absolute distance, Houston, Dallas and Los Angeles represent some of the highest concentrations of tweeting about the (non-)event.

All #LouisvillePurge Tweets thru August 16th at 12pm EDT

But perhaps more interesting than just the overall spatial distribution is how this distribution evolved over time, from the first geotagged tweet all the way through the cycle of hype and hysteria that led the Louisville Purge to be featured on any number of national news websites. In the series of maps below, we have divided all of the tweets in our dataset into a series of (more-or-less arbitrary) time frames that give a good idea of when and where the news spread to other parts of the country [2].

The lead up to the purge demonstrates a relatively localized phenomenon within Louisville, though it's interesting that there is some extra-local tweeting from the very beginning, with a very small number of tweets coming from outside the state in West Virginia, Kansas, Texas and Florida. There were only a total of 182 geotagged tweets referencing #LouisvillePurge in this 44-hour aggregate time span, with tweets originating in Metro Louisville representing 55%, 66% and 60% of the total number of tweets with the hashtag during the three periods, respectively. In other words, talk of the purge spread quite slowly over the course of the week.

Time #1: 42 tweets
From August 13th at 11:30pm to August 15th at 6am

Time #2: 36 tweets
From August 15th at 6am to 4pm

Time #3: 104 tweets
From August 15th at 4pm to 8pm 

The number of tweets with the hashtag exploded right around 8pm on Friday night, the 'official' start time of the purge. This four-hour time period represents the peak of tweeting activity around #LouisvillePurge, attributed largely to the fact that this is when the event started to diffuse outward beyond the city's boundaries to places both near and far. One can see a significant increase in the number of tweets both across Kentucky and in far-off cities like Los Angeles, Milwaukee, D.C., Philadelphia and New York City. From 8pm to 12am, the 757 tweets from Metro Louisville represent only 30% of the 2,533 tweets across the country, further highlighting the spatial diffusion of information about, and interest in, the purge. In fact, this measure of locally-concentrated tweeting drops even lower to less than 10% from the hours of midnight to 6am (when most Louisvillians would be asleep), though it again rebounds a bit higher to 23% during our final time span of 6am to noon on Saturday the 16th, after the purge has 'officially' ended.

Time #4: 2,533 tweets
From August 15th at 8pm to August 16th at 12am

Time #5: 1,420 tweets
From August 16th at 12am to 6am

Time #6: 216 tweets
From August 16th at 6am to 12pm
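The time frames listed above, and the share of each that came from Metro Louisville, could be tallied along the following lines. This is only a sketch: the window boundaries come from the captions above, but treating them as half-open intervals in naive local time is an assumption, and the `created_at` and `in_louisville` fields are hypothetical names.

```python
from collections import Counter
from datetime import datetime

# Window boundaries as listed in the map captions (local time assumed).
WINDOWS = [
    ("Time #1", datetime(2014, 8, 13, 23, 30), datetime(2014, 8, 15, 6, 0)),
    ("Time #2", datetime(2014, 8, 15, 6, 0), datetime(2014, 8, 15, 16, 0)),
    ("Time #3", datetime(2014, 8, 15, 16, 0), datetime(2014, 8, 15, 20, 0)),
    ("Time #4", datetime(2014, 8, 15, 20, 0), datetime(2014, 8, 16, 0, 0)),
    ("Time #5", datetime(2014, 8, 16, 0, 0), datetime(2014, 8, 16, 6, 0)),
    ("Time #6", datetime(2014, 8, 16, 6, 0), datetime(2014, 8, 16, 12, 0)),
]


def window_for(timestamp):
    """Return the label of the window containing `timestamp`, or None."""
    for label, start, end in WINDOWS:
        if start <= timestamp < end:  # half-open: start included, end excluded
            return label
    return None


def local_share_by_window(tweets):
    """Fraction of tweets in each window that were sent from Louisville."""
    totals = Counter()
    local = Counter()
    for tweet in tweets:
        label = window_for(tweet["created_at"])
        if label is None:
            continue  # outside the study period
        totals[label] += 1
        if tweet["in_louisville"]:
            local[label] += 1
    return {label: local[label] / n for label, n in totals.items()}


# The 8pm-to-midnight figure quoted in the text: 757 of 2,533 tweets.
print(round(100 * 757 / 2533))

# Toy records, invented for illustration.
sample = [
    {"created_at": datetime(2014, 8, 15, 21, 0), "in_louisville": True},
    {"created_at": datetime(2014, 8, 15, 22, 0), "in_louisville": False},
    {"created_at": datetime(2014, 8, 16, 1, 0), "in_louisville": False},
]
print(local_share_by_window(sample))
```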

Like our earlier research on #LexingtonPoliceScanner in the wake of the 2012 Kentucky Wildcats basketball championship, we can clearly see an ebb and flow in the way the event originates in a fairly localized area before gaining a larger following and eventually slowing down and becoming more localized again as many users reflect upon the aftermath. But unlike the attention paid to the #LexingtonPoliceScanner in large cities around the country, and especially the South, the interest in the #LouisvillePurge tended to be somewhat more diffuse, without any single location outside of the city or state paying a disproportionate amount of attention to the events.

In the end, we're happy to report that all of the Floatingsheep emerged from the purge unscathed and thoroughly amused, and we hope the same can be said for all of you and your loved ones. And do remember, don't trust everything you read on Twitter [3, 4]!

[1] Again, it's probably worth noting -- somewhat ironically, I suppose -- that despite the rumor originating and being passed around via social media, it was the traditional local television news networks whose willingness to believe and highlight the rumor drove further attention to the situation, which was almost obviously a farce from the very beginning.
[2] You can also access an animated GIF version of this time series map here.
[3] Especially if you are supposed to be a "real journalist"!
[4] For that matter, don't trust everything you see on the television news, either!

July 22, 2014

How Many Hobbits Could Chuck Norris Take In a Fight?

Inspired by the (relatively) recent Buzzfeed quiz, "How Many Five Year Old Children Can You Take In a Fight?" [1], we have been wondering about other potential battle royale matchups: Juggalos vs. Bronies, Juggalos vs. polar bears, Justin Bieber vs. Miley Cyrus and even goats vs. llamas.

Perhaps our favorite attempt at recreating this kind of scenario is asking: how many hobbits could Chuck Norris take in a fight? The analysis was quite complex as we had to first set rules on the engagement (e.g., what kind of weapons? is mithril armor allowed or not? etc.) and decide which version of Chuck Norris (Walker, Texas Ranger Chuck Norris? Actual current Chuck Norris? Perhaps Delta Force Chuck Norris?) and what kind of hobbits (after all, are we talking Brandybucks or Tooks? are these typical Shire hobbits or have they been abroad? etc.) we are talking about here.

As you might suspect, there was a lot to sort out. After much discussion and analysis we have come up with a clear answer, but sadly, as the actual question has nothing to do with this blog, we've been forced to bury it in the footnotes [2]. What we can do, however, for the purposes of this blog is compare the distribution of references to hobbits, as opposed to references to Chuck Norris, in geotagged tweets. Starting from a 10% sample of all global geotagged tweets from July 2012 through March 2014, we collected all references to "hobbit*" and "Chuck Norris" to enable our comparison.

Hobbits vs. Chuck Norris, July 2012-March 2014

At the global level, there are actually quite comparable numbers of references to hobbits and Chuck Norris, thus making the location and scale of our hypothetical battle all the more important. There are 27,527 references to the man on Superman's pajamas, and 24,145 references to those short little guys with hairy feet.

What is evident, however, is that Chuck Norris isn't particularly popular anywhere but in the United States, as nearly half of the global references to him come from the USA, giving him a nearly 9000 tweet advantage over hobbits. Perhaps not everyone else in the world finds quite as much humor in the many Chuck Norris Facts as Americans do? Or perhaps other countries have their own Chuck Norris-like cult heroes to look up to [3]? The next closest country in terms of Chuck Norris appreciation is France, with just 250 more Chuck Norris tweets than hobbit tweets, followed up by South Africa, Nigeria and Puerto Rico in the top 5 countries favoring the man who predicted 1000 years of darkness were Barack Obama to be re-elected President of the United States.

Meanwhile, the top 5 countries favoring hobbits are Indonesia - where they hold a 2,141 tweet advantage - Turkey, Mexico, Spain and Malaysia, each of which has a greater than 500 tweet advantage for hobbits over Chuck Norris. A total of eleven countries have more than 100 more references to hobbits than Chuck Norris, a considerable feat given that only the top 3 Chuck Norris countries have a more than 100 tweet advantage.

In many ways, the pattern in this map is a replication of that from our recent map comparing references to Bieber and Miley; just as the only places with a real preference for Miley Cyrus were the USA and a smattering of African countries, so too are these the only places with a significant preference for Chuck Norris. Does this mean there is some sort of Chuck-Miley conspiracy afoot? Or that Bieber has taken command of an army of hobbits in his quest for world domination? We'll leave it to you to find out...

[1] See also: How many Justin Biebers could you take in a fight? How many 90 year olds could you take in a fight? How many hipsters could you take in a fight?
[2] The answer is zero.  Because hobbits are actually just fictional characters and Chuck Norris is a real living person. See? Sometimes there are clear and easy answers to tough questions.
[3] Ironically, of course, Kenya seems to display a slight preference for Chuck Norris over hobbits, despite Makmende's imposing presence.

July 08, 2014

A Quick Look at Global Language Patterns on Twitter

Today's post is derived from some testing we were doing within our data on language, and since the results were interesting, we thought we'd share. This is a first step in a longer process of comparing language use at the global scale, so much remains to be done.

Starting from a 10% sample of all global geotagged tweets from the calendar year 2013, we collected tweets that used a variety of non-Latin characters as a proxy for linguistic prevalence (see the map titles below for the list of characters searched). Using composite counts of what we found to be the five most commonly used characters in each of the given languages, we mapped normalized values at the country level in order to understand where these languages are most dominant. In other words, these maps represent the relative level of tweets containing non-Latin characters compared to all tweets; the US has plenty of tweets with Arabic, Chinese and Korean characters but these numbers are small compared to the overall number of tweets within the country.  
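To make the character-proxy idea concrete, here is a minimal Python sketch that flags tweets containing any of the marker strings shown in the map titles below and computes the share of matching tweets per country. Counting a tweet once if it contains any marker is an assumption (the composite counts described above may have been built differently), and the `country` and `text` fields and the sample tweets are hypothetical.

```python
from collections import Counter

# Marker strings copied from the map titles in this post.
MARKERS = {
    "Arabic": ["ل", "ن", "م", "ي", "ا"],
    "Chinese": ["的", "一", "是", "不", "了"],
    "Korean": ["뭐", "그", "안", "근데", "거"],
}


def contains_marker(text, language):
    """True if the text contains at least one marker for the language."""
    return any(marker in text for marker in MARKERS[language])


def marker_share_by_country(tweets, language):
    """Share of each country's tweets that contain a marker for `language`."""
    totals = Counter()
    hits = Counter()
    for tweet in tweets:
        totals[tweet["country"]] += 1
        if contains_marker(tweet["text"], language):
            hits[tweet["country"]] += 1
    return {country: hits[country] / n for country, n in totals.items()}


# Toy tweets, invented for illustration.
tweets = [
    {"country": "CN", "text": "这是一个测试"},
    {"country": "CN", "text": "hello"},
    {"country": "US", "text": "hello"},
]
print(marker_share_by_country(tweets, "Chinese"))
```

Because the result is a share of all tweets in each country, a place with very few tweets overall can score highly on the strength of a handful of matches.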

There are some issues with the data we collected -- for instance, we relied on non-definitive sources for our list of the most commonly used characters, and the constraints of the way we've structured our data (how we treat boolean queries and computing constraints) make our data somewhat incomplete. But still, the initial results provide a reasonable snapshot of where Twitter is being used by people who don't speak languages which can be easily expressed in Latin characters.

Arabic Characters:   ل   ن   م   ي   ا      

The spatial pattern of Arabic-language tweeting is interesting in that it seems to mimic a conventional distance decay effect. Saudi Arabia is the undoubted center of Arabic tweeting; its immediate neighbors have relatively lower amounts, their immediate neighbors have even lower concentrations, and there are practically no discernible differences once you reach Sub-Saharan Africa to the south, India to the east, or Europe to the north and west.

Chinese Characters:   的   一   是   不   了

While Japan has the highest absolute number of tweets containing Chinese characters, due to the fact that the Japanese language relies on written Chinese characters, the relative measure shows China to, quite unsurprisingly, be the center of Chinese-language tweeting. The territory of Greenland shows up as well, mainly because of the relatively low number of total tweets making the few tweets with Chinese characters relatively more frequent. We could, of course, account for this by requiring certain thresholds but for this initial look, we left it in. Given the increasing dominance of China within the global economy, it's somewhat interesting to see that there is very little Chinese-language tweeting happening in other parts of the world.

Korean Characters:   뭐   그   안   근데   거

The final language we explored was Korean, and while it is not surprising that South Korea has by far the most Korean tweeting, it is interesting to note that North Korea, despite its almost complete disconnection from the global system, also appears on the map. Again, it seems that the scattering of relatively high scores for places such as Greenland and Somalia has more to do with the relatively low level of overall tweeting in these places than with some previously unknown concentration of Korean-speakers.

While there's not much definitive here, we believe this to be a useful, if incredibly brief, look at how online spaces such as Twitter remain connected to conventional, offline geographies, such as those of language and culture. And given the recent emergence of domain names in non-Latin characters, these maps might offer clues into the evolving geography of domain names, while also offering some potential for future research using such data.