January 09, 2015

Mapping the Twitter Reaction to the Charlie Hebdo Attack

Following the attack on the offices of the French satirical magazine Charlie Hebdo, Twitter -- and those who make maps of it -- were all aflame with discussions, speculations and conclusions. In order to process the geographic extent of the reaction to the Charlie Hebdo attacks, we collected approximately 73,000 geotagged tweets created in a roughly 36-hour period from January 7th to noon (EST) on January 8th, that contained either of the hashtags: #charliehebdo OR #jesuischarlie (English translation: 'I am Charlie').

We then aggregated these tweets to the country level and normalized these tweets by a random sample of tweets in each country during the same time period [1]. We excluded countries that did not meet a minimum threshold of activity (15 tweets) to exclude places with extremely low levels of engagement. The map below was created by Rich Donohue, a post-doc at the University of Kentucky Department of Geography, whose cartography will be showing up on the blog more in the near future. The interactive version of the map allows you to pan, zoom and select specific hashtags to reveal differences in tweeting patterns between them.

Normalized Distribution of Geotagged Tweets 
containing either #CharlieHebdo or #JeSuisCharlie
Click here for an interactive version of this map.

Those countries shaded in orange demonstrate a greater level of Charlie Hebdo-related tweeting than one would expect based on typical levels of tweeting [2], while those countries shaded in blue demonstrate a lower amount of tweeting than one might expect [3]. Countries shaded in grey failed to meet the minimum threshold of tweeting activity to be included, while the handful of countries in red -- France, Belgium and French Guiana -- have the highest relative number of Charlie Hebdo-related tweets.

As expected, the reaction to the Charlie Hebdo attack has mostly captured the public's attention in Europe, especially (and unsurprisingly) in France and Belgium, with a seeming distance decay effect as one moves away from Paris. But outside of Europe, one can see greater levels of tweeting about the attack in countries with historical -- often colonial -- ties to France, such as Algeria, Tunisia, Senegal and Canada, as well as French Guiana, which has significantly more tweeting about the attacks than one would expect based on usual levels of tweeting [4]. Other countries, such as Australia, India and Pakistan, also demonstrate significant levels of tweeting about the attacks, but don't have the same kinds of historical connections to France that might explain such heightened awareness.

Countries with the Greatest Relative Number of Tweets 
containing either #CharlieHebdo or #JeSuisCharlie 
Note: A location quotient greater than 1 indicates a relatively higher level of tweets with hashtags compared to the normal amount of tweeting taking place. A location quotient less than 1 indicates a relatively lower level of tweets with hashtags compared to the normal amount of tweeting taking place.

In addition, there are a number of noteworthy patterns that we wish to highlight, although we are not prepared to explain them at this time.

While such patterns are fairly obvious and could easily be predicted, the data leave us with a number of lingering questions that we don't have ready answers for. For instance, why is there a greater level of attention to the attacks in India and Pakistan than in Turkey or Egypt, which are both nearer in absolute distance and, in some ways, social distance to the attacks in Paris? Why are Canadians more focused on the issue than people in the United States? Why are people in the United States roughly 15x more interested in the Charlie Hebdo attacks than in the attempted bombing of an NAACP branch in Colorado?

It's also interesting to explore the differences in how each hashtag is used, and how this affects the spatial distribution of the tweets. Is the use of #charliehebdo a simple indicator of attention to the event, while #jesuischarlie indicates solidarity with the magazine? For example, the UK has a relatively low amount of #charliehebdo tweeting (LQ = 0.84) but a much higher level of #jesuischarlie activity (LQ = 1.35). In contrast, other nearby countries such as Spain, Portugal, Algeria, Morocco and Tunisia have relatively more #charliehebdo than #jesuischarlie activity, perhaps connected to a more fraught relationship with local populations and the satire contained within Charlie Hebdo cartoons. To be clear, the causes behind the observed patterns require much more in-depth work than we can provide here and now.

Moreover, as always it's important to think about what kinds of discussions aren't captured in this particular dataset, such as discussions of the attacks in Arabic-speaking countries such as Saudi Arabia or Egypt, which use entirely different alphabets than we used in our search. While we don't want to read too much into these differences without further research, these issues do represent potentially interesting differences in the use of social media, both across space and different social groups.

It is also useful to track the distribution of tweets over time, which began shortly before noon Paris time and peaked approximately ten hours later.

Number of Geotagged Tweets Over Time (in ten-minute blocks)

While we have surely raised more questions than we have answered in this post, hopefully this early attempt at mapping the response to the attacks provides some further food for thought for those wishing to delve deep into understanding the nature of the attacks and the response to them via social media.

-----------------
[1] We used the following formula (location quotient) to normalize the data:

(# of tweets with hashtags in country / # of total tweets in country)
--------------------------------------------------------------------------------
(# of tweets with hashtags globally / # of total tweets globally)

[2] With a location quotient greater than 1.
[3] With a location quotient less than 1.
[4] There were a number of Francophone African countries that had high location quotients but were excluded from this map because they did not meet the threshold of 15 tweets. This includes Côte d'Ivoire, Gabon, Burundi, Benin, Togo and Congo.  Other countries with strong ties to France -- New Caledonia, Fiji, and Saint Martin -- exhibited similar patterns.
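
For readers who want to reproduce the normalization in note [1], here is a minimal Python sketch of the location quotient together with the 15-tweet cutoff. The function name, the dictionary inputs and the toy counts are our own illustrative assumptions, and we assume the threshold applies to the number of hashtag tweets per country, which the post does not state explicitly.

```python
def location_quotients(hashtag_counts, total_counts, min_tweets=15):
    """Return {country: location quotient} for countries above the threshold.

    hashtag_counts: {country: number of tweets with the hashtags}
    total_counts:   {country: number of tweets in the baseline sample}
    Assumption: the threshold is applied to the hashtag tweet count.
    """
    # Global share uses every country, including those later excluded.
    global_share = sum(hashtag_counts.values()) / sum(total_counts.values())
    quotients = {}
    for country, n_hashtag in hashtag_counts.items():
        if n_hashtag < min_tweets:
            continue  # too little activity to report (grey on the map)
        local_share = n_hashtag / total_counts[country]
        quotients[country] = local_share / global_share
    return quotients


# Invented counts, purely for illustration.
hashtag = {"A": 300, "B": 100, "C": 5}
baseline = {"A": 1000, "B": 2000, "C": 10}
lq = location_quotients(hashtag, baseline)
```

With these toy numbers, country A comes out above 1 (orange), country B below 1 (blue), and country C is dropped for falling under the threshold even though its local share is high, which is the situation described in note [4].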

December 31, 2014

The Best of Floatingsheep in 2014

With yet another year coming to a close, we thought it a good time to reflect upon yet another year of sheepish maps and blogposts, recounting what we have accomplished, perhaps mostly so that we don't dare attempt such goofiness again. And so we give you the Top 10 Floatingsheep posts of 2014, ranked according to the number of page views each received. Feast on these last remnants of 2014, and a happy new year to all!

#1 The Drama of Llamas vs. the Gloating of the Goats 
What was thought to be something of a throwaway post came from the shadows to become 2014's most viewed blogpost, largely thanks to some Redditors who took the map a bit too seriously, if we do say so ourselves.


#2 New Book Chapter on the Geographies of Beer on Twitter 
Based on some great work by Matt and Ate, the map below (and others from the same book chapter) has become a staple of Vox's explanations of alcohol this year... see here, here and here.


#3 Mapping Ferguson Tweets, or more maps that won't change your mind about racism in America 
The product of the Inaugural #IronWilson Map-a-Thon, this map and post was our attempt to counter some problematic uses of geotagged Twitter data in relation to the then-nascent protests in Ferguson, Missouri, and highlight the persistent limitations of this sort of work when dealing with issues as complex and fraught as violence and structural racism.


#4 Mapping the Seven Dirty Words 
One of the biggest missed opportunities from the 2014 IronSheep dataset, our series of maps of George Carlin's infamous seven dirty words didn't yield a whole lot except for excrement.


#5 Hashtags and Haggis: Mapping the Scottish Referendum
While the Scottish ultimately decided to remain a part of the United Kingdom, some of our maps helped to demonstrate persistent cultural divides between the English and the Scottish, and the fact that "the Scottish referendum [was] not just simply about 'yes' or 'no' but seemingly touche[d] on much more fundamental questions of ovis-based cuisine, men's wear and mythological creatures". Indeed.


#6 Artists, Bankers, Hipsters and the "Bro-ughnut" of New York: Mapping Cultural-Economic Identities on Twitter 
Some more work by Ate and Matt for a journal article yielded the discovery of what will surely be recognized in time as one of the most fundamental geographical phenomena known to humankind: the 'Bro-ughnut' of New York.


#7 Hey Y'all! Geographies of a Colloquialism
There are few places as distinct as the American South when it comes to cultural patterns expressed through geotagged tweets, as our mapping of references to "y'all" helped to confirm.

#8 Crowdsourcing Cake or Death?
While the choice between cake or death seems like an obvious one, our maps of references to these terms yielded a much different -- and troubling -- result.


#9 Are there really more juggalos than polar bears?
"As our analysis has shown, there is more to the story of juggalos and polar bears than meets the eye. Clearly, there are more references to polar bears than to juggalos, both globally and in the United States. But the relationship between these two is considerably more complex and contradictory than is assumed by David Cross and his ilk. Obviously more research is required as ten-second gifs are not up to conveying the complexity of the juggalo-polar bear ecosystem."


#10 The Epic Tweet Fight of Bronies and Juggalos
Despite Lexington, Kentucky being at the center of an online controversy around a Bronies vs. Juggalos street fight, the Floatingsheep home base didn't have much online activity around these two subcultures. In fact, when taking the epic street fight online and evaluating the epic tweet fight, we couldn't help but declare it a draw.

December 25, 2014

Are we more interested in XXX or Xmas?

This holiday season we decided to ask the questions that really matter. As people celebrate Christmas, we wanted to know how people around the world are mentioning the holiday. And, perhaps more importantly and interestingly, how mentions of Christmas stack up against mentions of a more sexual and consumption-oriented nature.

So, we decided to compare mentions of 'Xmas', 'XXX', and 'Xbox'.




The formula that we used (for XXX tweets for example):  


(Sum of XXX tweets in square / Sum of XXX tweets globally)
---------------------------------------------------------------------------------
( sum XXX+Xbox+xmas in square /  sum XXX+Xbox+xmas globally)

We see some important global differences. Americans (as well as the French and Spanish) are most interested in Xboxes. Strangely, the Japanese and Nigerians seem to be most fixated on Christmas. And the British, Dutch, and Italians are more interested in X-rated content, giving a whole different meaning to reflections on who has been naughty and who has been nice.

December 18, 2014

Deconstructing the (most detailed tweet) map (ever)

If you’re the kind of person who visits our blog with any regularity, you’re almost certainly also the kind of person who would have seen some version of the map below in the last couple of weeks. Created by Eric Fischer of Mapbox, this map was released along with a blogpost entitled “Making the most detailed tweet map ever”, discussing some of the data cleaning and visualization methods necessary to produce such a striking map. The map is undoubtedly interesting and has sparked a great deal of interest from all corners of the internet, but there’s just something about the framing that rubs us the wrong way. While Eric’s post emphasizes the making part of the equation, the internet hype cycle around it has caused us to read the title a bit more along the lines of:

"Making THE MOST DETAILED tweet map EVARRRR!!!!"

That is to say, for all of the admittedly really great detail about what went into making this map, the framing of this map as not only a detailed map of six billion or so geotagged tweets, but as the most detailed tweet map ever, raises more questions than it answers. For example, what constitutes ‘detail’ in tweet maps? What do competing definitions of ‘detail’ reveal about what we value in this kind of analysis? What do these particular ideas of ‘detail’ foreclose in terms of other possibilities for analysis?

These are important questions, regardless of whether they’re applied to this particular map or any other one. The issue in this case, however, seems to be that the answers to some of these questions conflict with one another, or with the ways the project is itself described. The detail that seems to be valued here is of the “every tweet ever” variety, or, put simply “more = better”, the fetish for bigger data at the expense of all else.

But more data isn’t necessarily better, and it certainly doesn’t mean that there’s more detail, especially when the only bit of detail you're concerned with in each of these six billion points is the latitude and longitude coordinates. Each of these individual tweets contains a wealth of other interesting information, from information about the user and the way they describe themselves, to the time the tweet was created, to the text of the tweet itself, which might contain hashtags that link up with bigger conversations, or @-mentions to other Twitter users that might be used to understand social networks and interactions. All of these bits of information represent a kind of detail that is not included in this, the most detailed tweet map ever.

As we’ve been arguing for the past two years or so, there is a range of social and spatial processes represented in geotagged tweets that we can’t get at if all we’re concerned with is the latitude and longitude coordinates. So to say that this represents the most detailed tweet map ever serves to reify what we see as two of the most problematic assumptions of contemporary big data/social media research: (1) that more data is equivalent to better data, and (2) that the only important aspect of the data is the geographic coordinates attached to it. There's lots of interesting stuff that can be done with this kind of data, and we can do better than simply plotting points on a map and calling it a day [1].

Even if one were inclined to accept the argument that more tweets equals more detail, how should we interpret the fact that this map only visualizes about 9% of all geotagged tweets, due to the design decisions necessary in order to make the map nice and pretty [2]? Due to the existence of exact or near-duplicate coordinates that would make points indistinguishable from one another, this, the most detailed tweet map ever, actually eliminates about 91% of the detail that it seems to value most (i.e., the presence or absence of points on the map). The Gizmodo headline about the map reads, “The Most Detailed Tweet Map Ever Includes 6,341,973,478 Tweets”... except that, you know, it doesn’t [3].

Of course, there’s also a good bit of imprecision in the locational accuracy associated with geotagged tweeting; our iPhones don’t come with military grade GPS units installed in them. So while Mapbox CEO Eric Gundersen was marveling at the detailed micro-geographies of an airport gate seen in the map, he was ignoring both the fact that all of those folks on the jet bridge could just as well have been 40 feet away, and that a number of tweets might have been eliminated from the initial dataset due to a lack of precision in the geotagging process. Take all of that together and a lot of the detail that’s being celebrated here starts to give way to fuzziness. This map is more art than science, though the striking visuals and discursive framing give the illusion of precision and absolute insight.

To be clear, there’s no problem with fuzziness. It’s something we all live with every day, and it’s something we academics may embrace from time to time through the use of overly obtuse language. But taking all of this fuzziness and then repackaging it as the most detailed tweet map ever comes off a bit wrong to us. These initial misgivings were only amplified when brought down to a more local level, when we saw a post from a local urbanist blog in Louisville wondering “What we can learn from where people in Louisville are using Twitter”. While relatively mundane, and certainly not nearly as celebratory, the blog’s ultimate conclusion was that “These locations [with the highest concentrations of tweets] make sense as they are places where people gather and are often held captive by events.”


This, in general, is true, but also a bit… how do we put it? Meh. More fundamentally, people tweet where people are. It comes as no surprise to anyone with even the vaguest familiarity with Louisville that people tweet in larger numbers from downtown (including 4th Street Live!), the University of Louisville campus, Bardstown Road and the St. Matthews / Oxmoor Mall area than anywhere else in the city. These are (some of) the primary gathering points on a day-to-day basis within the city.

But just identifying these locations doesn’t really help us to ‘learn’ anything beyond the fact that those are, indeed, the places with the highest concentrations of geotagged tweets in Louisville [4]. In fact, the map doesn’t even really show us actual concentrations of tweeting activity, but rather concentrations of unique tweeting locations. Take, say, two hypothetical city squares, one of just 50 x 50 meters, and another much larger one of 500 x 500 meters, both the originating point of one million geotagged tweets spread randomly over the squares. In Fischer’s method, these two squares would not 'glow' in equal amounts, but rather the larger square would show up as much more visually prominent because it has many more unique tweeting locations while many of the tweets from the smaller square would be filtered out due to a duplication of coordinates.
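
To make the two-squares thought experiment concrete, here is a small Python simulation. It is a deliberately simplified stand-in for the actual filtering: we assume coordinates are snapped to a hypothetical 1-meter grid and that only one tweet per grid cell survives, and we scale the example down to 100,000 tweets per square so it runs quickly. The function name and parameters are ours.

```python
import random


def unique_locations(side_m, n_tweets, cell_m=1.0, seed=0):
    """Scatter n_tweets uniformly over a side_m x side_m square and count how
    many distinct grid cells (of size cell_m) contain at least one tweet."""
    rng = random.Random(seed)
    occupied = set()
    for _ in range(n_tweets):
        x = rng.random() * side_m
        y = rng.random() * side_m
        # Snap each tweet to its grid cell; duplicates collapse in the set.
        occupied.add((int(x // cell_m), int(y // cell_m)))
    return len(occupied)


small_square = unique_locations(50, 100_000)    # at most 50 * 50 = 2,500 cells
large_square = unique_locations(500, 100_000)   # up to 500 * 500 = 250,000 cells
```

Under these assumptions the small square can never contribute more than 2,500 dots, however many tweets originate there, while the large square keeps a far greater number of them, so equal tweeting activity ends up looking very unequal on the map.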

Further, from a data collection standpoint, all of these tweets in Louisville reveal little that isn't revealed by mapping a random sample of tweets (say 1% of tweets from 2013, see map below). If all we’re really concerned about is the question of where people are tweeting from, there isn’t much that looking at all the tweets reveals that couldn't also be found from a smaller subset, and it’s much easier to collect or analyze a few hundred thousand tweets than it is to collect 6,341,973,478 of them. But even still, all we can ‘learn’ from these kinds of maps is where people have created geotagged tweets and, to some extent, where they have not [5].


But if that’s all we can learn from this map, again, why call it the most detailed tweet map ever? Again, there are any number of details that are excluded from analysis by only looking at the locations of geotagged tweets. What if we instead took a different approach to this data, such as examining the use history of individual Twitter users, or even collectives of Twitter users based on some kind of shared experience or identity, such as association with particular neighborhoods or other places?

OK, you're right. This particular question is a bit self-serving, as this is precisely the kind of thing we've been working on for some time now. And so rather than just offering a critique of someone else's work, we really want to see if we can push this kind of analysis in more productive directions. So we offer up the map below, which comes from a paper we currently have under review, that attempts to demonstrate how geotagged tweets can help us to better understand urban socio-spatial inequality beyond simply identifying the presence or absence of tweets in a given area, as is so often done.


Using Louisville and the now-common ‘9th Street Divide’ trope as a starting point, we set out to understand how people from different parts of the city used and moved around the city in different ways. So in a manner not unlike some other things Eric Fischer has done previously, we identified a number of Twitter users as belonging to one of two groups, those with close ties to either the West End (traditionally a poorer and predominantly African-American part of the city) or the East End (a more affluent and largely white part of the city), and collected all of the geotagged tweets from those users [6]. We then compared the spatial footprint of these groups' tweeting activity via an odds-ratio measure. On the map, purple hexagons represent places with greater-than-usual levels of West End user tweeting activity, while orange hexagons represent places where East End users were relatively more dominant than expected. Those places which demonstrate roughly equivalent or expected levels of tweeting are signified by those hexagons with hashes.

This map, in short, represents those places in the city of Louisville which are more socially heterogeneous and homogeneous, dominated either by West End or East End residents, or characterized by a relative mix of people from both parts of the city. Though it’s evident that there is indeed a kind of divide between the West End and the rest of the city, this map also shows that West End residents are incredibly spatially mobile within the city, while East End residents tend to be much more spatially constrained, sticking to their own parts of town.

While there are certainly a lot of underlying factors driving this process, suffice it to say that this map provides an alternative way of understanding socio-spatial inequality than simply identifying those places that do or do not have significant concentrations of geotagged tweets [7]. Through our analysis, we also learned that contrary to the kind of assumptions often made about this kind of informational inequality, West End users actually produce a significantly greater number of geotagged tweets than their East End counterparts; it's just that many of these tweets are created in other parts of the city. This is, of course, an important kind of detail that we can draw from the mapping and analysis of geotagged tweets and one that, in many ways, is more detailed than the most detailed tweet map ever.

There is, of course, a whole lot more detail in the paper that this one map and blog post can’t capture, just as is the case with Eric Fischer’s map. Just to be clear, we think Eric Fischer does some fantastic and beautiful work with geotagged social media data, and commend him for openly discussing and sharing his methods. And yet, we can’t help but feel like the characterization of his map as being the most detailed tweet map ever is at best a half-truth, and helps to reproduce some of the most common problems with the analysis of geotagged social media data. But the more we think about it, we’re not so sure that a single most detailed tweet map could exist, or that it’s even desirable to have such a thing. Instead, we should be striving to create any number of highly-detailed, geographically-situated tweet maps, that collectively contribute to better understandings of the complex social and spatial processes that are represented and reproduced through this kind of data. 

----------------
[1] That’s the royal we. 
[2] Which it most certainly is.
[3] As Fischer notes, there are actually no more than about 590 million dots on the map due to his filtering process. When one zooms all the way out on the map so that the entire globe is represented in a single map tile, there are only 1,586 visible tweets, a far cry from the 6 billion number that seems so, well… big.
[4] #tautology
[5] This is qualified in this way because, as Kenneth Field pointed out in a Twitter exchange with Eric Fischer about these maps, geotagged tweets that he has consciously created from his house do not appear on the map. So while we know that all of the tweets on the map were created in that place, we can't say definitively that tweets were not also created in places where they do not appear on the map.
[6] In order to do this classification, we collected all geotagged tweets created within the defined boundaries of these two areas, and then identified those users with more than 40 tweets within either area, where those 40+ tweets represented greater than 50% of their overall geotagged tweeting activity. This concentration of activity indicates that users had a strong association with, and presence within, either area, while also making sure that no users were identified as belonging to both areas.
[7] We also see this map as complicating the conventional narrative in Louisville of 9th Street as representing a kind of impenetrable barrier within the city. But since this is less directly relevant to our argument here, we'll make you wait to hear more about that particular line of reasoning.
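
The classification rule in note [6] can be sketched in a few lines of Python. This is only an illustration of the rule as described there (more than 40 tweets in an area, making up more than 50% of a user's geotagged tweets); the function name, argument names and example counts are hypothetical, and the step of deciding which tweets fall inside each area's boundary is left out.

```python
def classify_user(west_tweets, east_tweets, total_tweets,
                  min_tweets=40, min_share=0.5):
    """Return "West End", "East End" or None for a single user.

    west_tweets / east_tweets: geotagged tweets sent from inside each area
    total_tweets: all of the user's geotagged tweets, anywhere
    """
    for label, n_in_area in (("West End", west_tweets), ("East End", east_tweets)):
        # Both conditions must hold: enough tweets, and a majority of activity.
        if n_in_area > min_tweets and n_in_area / total_tweets > min_share:
            return label
    return None


# Invented users for illustration.
examples = {
    "user_a": classify_user(60, 10, 100),   # majority in the West End
    "user_b": classify_user(41, 0, 100),    # enough tweets, but only 41% of activity
    "user_c": classify_user(5, 80, 100),    # majority in the East End
}
```

Because each area must account for more than half of a user's geotagged tweets, no user can satisfy the rule for both areas at once, which is the property the note points out.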

November 12, 2014

The (Rust) Belt of Basic-ness? Mapping the Pumpkin Spice Latte

As fall gives way to winter, we're all left clinging to the best vestiges of the passing season: the changing leaves, college football, temperatures above freezing and, for many of the most basic amongst us, the pumpkin spice latte. Debuted by Starbucks in 2004, and featuring no actual pumpkin content, the pumpkin spice latte has become a staple of fall, with Ugg boots and yoga pants-wearing women practically crawling out of the woodwork to get their hands on the thing. And while Starbucks touts that over 29,000 tweets have mentioned #pumpkinspice since 2012, we suspected there was much more to the story of the pumpkin spice latte [1]. Despite the fervor, we noticed that there's been no definitive tracking of the geographical expansion of the pumpkin spice latte as it seeks to colonize the world of regular, everyday people drinking plain ol' coffee.

Searching only for the latest manifestation of the pumpkin spice phenomenon, we collected all geotagged tweets in the continental United States for September and October 2014 with references to either "pumpkin spice" or "#psl", yielding a total of 19,537 tweets. But rather than simply mapping the basic distribution of these tweets, we've instead normalized this data by tweets referencing "coffee" during the same period. Using a 25% sample of all of these coffee-related tweets -- totaling 42,696 tweets -- aggregated to hexagonal cells, we calculated the odds ratio at the lower bound of the 95% confidence interval in order to provide a bit more context and account for any number of biases within the data. Using this measure, we've identified those places with greater-than-usual numbers of pumpkin spice latte tweets relative to those tweets referencing coffee (orange), and vice versa (purple), as seen in the map below.
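
For the curious, here is one way the "odds ratio at the lower bound of the 95% confidence interval" could be computed for a single hexagon. This sketch uses the textbook 2x2 odds ratio with a Wald interval on the log scale; we are assuming that formulation for illustration, and the exact variant behind the map may differ. The per-cell counts are invented, while the two totals are the ones quoted above.

```python
import math


def odds_ratio_lower_bound(psl_in_cell, psl_total, coffee_in_cell, coffee_total, z=1.96):
    """Lower bound of the 95% CI of the odds ratio for one cell.

    Builds a 2x2 table (PSL vs. coffee, inside vs. outside the cell) and
    applies the standard log-odds-ratio standard error.
    """
    a = psl_in_cell
    b = psl_total - psl_in_cell        # PSL tweets outside the cell
    c = coffee_in_cell
    d = coffee_total - coffee_in_cell  # coffee tweets outside the cell
    if min(a, b, c, d) <= 0:
        return None  # undefined when any table entry is empty
    log_or = math.log((a * d) / (b * c))
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * std_err)


# Hypothetical cells: same rough ratio, very different amounts of evidence.
busy_cell = odds_ratio_lower_bound(40, 19537, 30, 42696)
sparse_cell = odds_ratio_lower_bound(3, 19537, 2, 42696)
```

The point of taking the lower bound is visible in the two examples: the busy cell stays above 1 and would count as pro-PSL, while the sparse cell, despite a similar raw ratio, falls below 1 because a handful of tweets is not enough evidence.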

References to Pumpkin Spice Lattes relative to References to Coffee [2]

Based on our binary classification, it's evident that the vast majority of the country has stuck with their preference for coffee, even during the PSL's peak season. But given that our interest is in mapping the prevalence of the PSL in particular, we want to pay closer attention to the smattering of orange hexagons in the map. While there are no definitive clusters of PSL-related tweeting, if you squint your eyes you can just barely visualize a belt of pro-PSL places stretching from St. Louis up to Chicago, and from Cincinnati up to Toledo and Detroit, and from Cleveland to Pittsburgh, what we've termed "the basic belt". While this belt roughly corresponds to the vernacular region of the Rust Belt, Ohio in particular sits at the center of this pumpkin spice-loving portion of the country, representing the buckle on our belt [3]. Given this clustering of PSLs, we suspect that the Buckeye State might well be on its way to becoming the Pumpkin Spice State.
----
[1] Well, actually, Renee Kaufmann had this hunch. All credit for the idea behind this post goes to her.
[2] Sorry about the Web Mercator projection, y'all.
[3] Can one still wear a belt with yoga pants?

October 27, 2014

Geographies of Grits

Throughout the history of this blog, we’ve mapped any number of geographically-specific social phenomena. But oftentimes, we’ve been drawn to mapping things associated with the American South, whether because it’s arguably the most distinctive cultural region in the United States or because all of us have lived on its outskirts at some time or another, we’re not quite sure. But if you had to choose just one region to map using social media data, the South is probably a good place to start.

Continuing this persistent obsession, we decided to map one of the South’s most prominent culinary traditions: grits. As such, we collected all geotagged tweets in the United States from June 2012 to September 2014 mentioning the strange (read: awesome) ground-corn porridge-like dish, totaling around 64,000 tweets. Keeping in mind that geotagged tweets still represent only around 2-3% of all tweets, this figure implies a breakfast table conversation of several thousand tweets per day once non-geotagged tweets are counted, and highlights the ability of this kind of social media data to provide insight into a particular cultural phenomenon that is relatively more difficult (though certainly still possible) to measure through more conventional means.

The map below represents a normalized visualization of grits-related tweeting throughout the continental United States. Using a grid of hexagonal cells, the number of grits tweets was normalized using an odds ratio by a random sample of tweets from the same time period. In this measure, a value of 1 signifies that there are exactly as many grits tweets in a given location as one would expect according to the baseline measures of tweeting, with values greater than 1 indicating that there is a greater predominance of grits tweets than one would otherwise expect. In effect, this analysis cuts out the potential for these maps to simply reproduce maps of population density, homing in on the actual phenomenon at hand.

Mapping the 'Grits Belt'

Indeed, here you can see that while the South as a whole demonstrates a stronger preference for grits than the rest of the country, it is actually a relatively small number of coastal localities in the low country that have the strongest connection to grits through social media. While New Orleans represents something of an outlier in the far corner of the South, there is also a consistent band of concentrated grits tweeting stretching from just north of Charleston, South Carolina down through Beaufort (though seemingly skipping over Hilton Head, a popular tourist destination that might be understood as relationally disconnected from much of the rest of the distinctly southern culture surrounding it) and Savannah, Georgia, all the way to Brunswick.

This map demonstrates the broader potential of this kind of method to locate geographically-specific cultural practices in space, as well as the notion that these kinds of maps can reinforce the persistent connectedness between virtual representations of the world and people’s everyday lives and material practices. But there is more that we can do with this data by putting it into relation with other datasets. The map below does just that, by comparing our existing dataset of tweets mentioning ‘grits’ with all geotagged tweets during the same time period that mention ‘oats’. We again employ the odds ratio measure, but rather than comparing using a baseline population of tweets, we use the oats-related tweets to normalize our values. In this analysis, values less than 1 signify a preference for oats, while values greater than 1 represent a tendency towards grits. Not only does this comparison continue to affirm the identification of a ‘grits belt’ in the South, but it also highlights other areas of the country – an ‘oats oval’ stretching from the Northeast to the Midwest – that stand in stark contrast to the southeast in terms of digital porridge discourse.

Grits vs. Oats: The Emergence of the 'Oats Oval'

Thus a key avenue for analysis of digital social datasets is examining the relationships between individual users or individual messages. It is also possible to identify relationships between places, based on visits or tweets made by the same person in these different places. While we’ve already identified the South as the key locus of grits-related tweeting in the United States, it’s important to not simply ignore all of the other data points available to us that are just not quite as spatially clustered. Indeed, given the strong connections between the cultural practice of grits preparation and consumption and the vernacular region of the South, we might hypothesize that even those people tweeting about grits outside of the South are likely to have some kind of connection to the South, perhaps as a kind of diasporic community now living in other parts of the country, or even just traveling for a short period of time.

Relational 'Gritspace'

To examine this relationship, we begin by looking for users in our original dataset who have tweeted about grits more than once – yielding a total of 8,958 users – and then draw lines connecting each user's tweet locations in chronological order. The resulting map below clearly shows that there is a strong relational connection with the South for those who tweet about grits from other places, even for cities like Los Angeles that are quite distant in absolute space, as well as in terms of cultural identity. Indeed, the gravity of grits appears quite strong: of the users tweeting about grits from outside the South, approximately 55% also sent tweets from inside the cluster identified in the first map in this post. So even for those grits-obsessed tweeters outside of the South, the pull of porridge remains strong… and, we would expect, even stronger when you throw a bit of cheese and jalapenos in, too.
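The line-drawing step can be sketched roughly as follows. This is an illustration under assumptions rather than the code used for the map: each tweet is assumed to be a (user, timestamp, longitude, latitude) tuple, and the function name and sample records are hypothetical.

```python
from collections import defaultdict


def chronological_segments(tweets):
    """Link each user's tweet locations in time order.

    tweets: iterable of (user, timestamp, lon, lat) tuples.
    Returns a list of (user, (lon, lat), (lon, lat)) segments, one for each
    pair of consecutive tweets by the same user. Users with a single tweet
    produce no segments.
    """
    by_user = defaultdict(list)
    for user, timestamp, lon, lat in tweets:
        by_user[user].append((timestamp, lon, lat))

    segments = []
    for user, rows in by_user.items():
        if len(rows) < 2:
            continue  # only users who tweeted more than once get a line
        rows.sort()  # tuples sort by timestamp first
        for (_, lon0, lat0), (_, lon1, lat1) in zip(rows, rows[1:]):
            segments.append((user, (lon0, lat0), (lon1, lat1)))
    return segments


# Hypothetical records: one user tweeting first near Savannah, later near
# Los Angeles, and a second user with only a single tweet.
example = [
    ("user_1", 3, -118.2, 34.1),
    ("user_1", 1, -81.1, 32.1),
    ("user_2", 2, -90.1, 29.9),
]
segments = chronological_segments(example)
```

Each resulting segment can then be handed to whatever mapping tool is at hand and drawn as a line between its two points.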

-----
If you’re interested in reading more about the methods used to make these maps, and about the utility of mapping geotagged social media data more generally, you can check out the following pre-publication version of a forthcoming book chapter from which this work was drawn:

Poorthuis, A., M. Zook, T. Shelton, M. Graham and M. Stephens. Forthcoming. "Using Geotagged Digital Social Data in Geographic Research". In Key Methods in Geography, eds. N. Clifford, S. French, M. Cope and T. Gillespie. London: Sage.
Abstract: This chapter outlines how one might utilize the massive amounts of web-based, geographically-referenced digital social data for geographical research. Because much of these data are user-generated and produced through social media platforms, we also focus on the pitfalls associated with such sources and the benefits of a mixed methods approach to these data. Not only can digital social data be mapped for visual analysis, it is also useful to use a range of quantitative methods to understand relationships between different subsets of the data. In addition, closer, systematic readings via qualitative methods of social data provides insights of particular people’s perceptions and experiences of the world around them. Thus, while making maps is often the starting point for geographers working with this kind of research, it is rarely the end point.

September 17, 2014

Hashtags and Haggis: Mapping the Scottish Referendum

The past weeks have been quite eventful in Scotland as a monumental election unfolds. Everyone wants to know: which way will the Scots vote? While we here at Floatingsheep certainly don't have the answer or the power to predict the referendum, we thought it might be interesting to see the geographic dimension of how Scots (and the rest of the world) are tweeting about a fundamentally geographic decision [1].

We pulled data from DOLLY from the last month and a half for a number of hashtags and terms that we thought might be helpful in taking the pulse of Twitter discussion around the independence referendum. Most obviously, we collected the hashtags #VoteYes and #YesBecause due to their association with the pro-independence movement, and the hashtag #NoThanks because of its association with anti-pro-independence sentiment [2].

We started by comparing the prevalence of 'no' (i.e., pro-union) hashtags versus 'yes' (i.e., pro-independence) hashtags at the global level. In the map below, orange indicates a greater prevalence of 'yes' tweets and purple indicates that there are more 'no' tweets. Perhaps the most interesting thing here is that we can see the United Kingdom swing towards a 'yes' vote, which has, for the most part, appeared to be the underdog in more conventional polling leading up to the referendum. Then again, most of Western Europe, along with Thailand and Australia, also has a general preference for 'yes' tweets. Oddly enough, the United States is the staunchest defender of the union, based solely on its massive preference for 'no' tweets. Strange for a country that yearly celebrates its breaking away from Mother England.

Comparing 'Yes' vs. 'No' Tweets at the Global Scale

Looking closer at the UK, we can see that much of Scotland has a roughly equal number of tweets in support of both the 'yes' and 'no' positions -- reflecting the contentious and hotly-contested nature of this referendum. But the Central Belt in particular -- where a lot of actual votes will be coming from, as it is the most densely populated part of the nation -- swings heavily towards 'yes'. The English, on the other hand, seem very much inclined towards pro-union or anti-separation tweeting.

Comparing 'Yes' vs. 'No' Tweets in the United Kingdom

To take an alternative look at support for the different positions, we mapped the percentage of each of the three hashtags that originates in each of the administrative sub-regions of both Scotland and the UK as a whole. The Highlands and parts of the Central Belt again show up as strong bastions of 'yes' votes.
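As a rough sketch of that calculation: for each hashtag, count its tweets in every region and divide by the hashtag's total. The function name, the (hashtag, region) input format and the sample records are assumptions made for illustration, not a description of the actual processing pipeline.

```python
from collections import Counter, defaultdict


def regional_shares(tweets):
    """Percentage of each hashtag's tweets that originates in each region.

    tweets: iterable of (hashtag, region) pairs.
    Returns {hashtag: {region: percent}}; each hashtag's percentages
    sum to 100.
    """
    counts = defaultdict(Counter)
    for hashtag, region in tweets:
        counts[hashtag][region] += 1

    shares = {}
    for hashtag, by_region in counts.items():
        total = sum(by_region.values())
        shares[hashtag] = {
            region: 100.0 * n / total for region, n in by_region.items()
        }
    return shares


# Hypothetical records (not real data).
example = [
    ("#VoteYes", "Highland"),
    ("#VoteYes", "Glasgow City"),
    ("#VoteYes", "Glasgow City"),
    ("#VoteYes", "Greater London"),
    ("#NoThanks", "Greater London"),
    ("#NoThanks", "Glasgow City"),
]
shares = regional_shares(example)
```

Note that these are shares of each hashtag's own total, so a large region can score highly for every hashtag simply by tweeting a lot.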

Percentage of Referendum-Related Tweets from Different Regions

But seeing as we're interested in doing more than just mapping distributions, the next question is how are we to put all of this into context? The only proper place to start is, of course, with the Queen. The map below illustrates those places which tend to have higher-than-normal levels of tweeting about the Queen (in orange) and those places that are tweeting less about the Queen than might usually be expected (in purple), based on a baseline measure of tweeting activity. Sadly, the whole country seems to be ignoring her. Apart from Glasgow, that is. In the interests of not upsetting an 88-year-old lady, we have chosen not to explore these tweets in any more detail.

Tweets referencing "Queen"

Building on this, we also explored the geography of references (using the same method described above) to something inherent in most people's definitions of Britishness: tea and crumpets.

We see an all-around tea-depression; hardly anywhere is particularly pro-tea at the moment, truly a shocking state of affairs. The British are clearly not being their usual selves, and for their sake we're glad the referendum will be over soon, regardless of the outcome. Scotland, in particular, has average tea counts that are low by historical standards.

Tweets referencing "tea and crumpets"

This analysis would, of course, all be meaningless unless we mapped the geographies of a range of uniquely Scottish phenomena: haggis [3], kilts and Nessie. Still using the same method as above, the map below shows without a shadow of a doubt that Scotland is destined to become its own nation.

Tweets referencing "haggis", "kilts" or "Nessie" 

The Scots are tweeting about these topics at a greater-than-usual rate, while their southern neighbors remain distinctly uninterested. If ever there were an indication that these nations are divided by more than just a line on a map, we see that manifested in the topics of people's Twitter conversations. In short, the Scottish referendum is not simply about "yes" or "no" but seemingly touches on much more fundamental questions of ovis-based cuisine, menswear and mythological creatures.

So even if the 'no' votes win out and the Kingdom remains united, the geographies of haggis-related tweeting (along with a few other things) have revealed that these are two very different nations, indeed.

UPDATE (9/18/14 @ 12:45pm):
We've added another map to our analysis below, which shows the relative prevalence of #VoteYes and #NoThanks tweets throughout Great Britain, at the level of administrative sub-regions, rather than the hexagons used above. This map makes for a stark contrast between the English (and Welsh) and the Scottish... while there are a few areas of Scotland that show relative parity between 'yes' and 'no' tweets, most of the nation demonstrates a relatively strong preference for 'yes', while much of England demonstrates at least a slight preference for 'no'.



--------------
[1] In case you don't know what Twitter is, we refer you to the Scots Wikipedia page on the subject, which states: "Twitter is an online social networkin service an microbloggin service that enables its uisers tae send an read text-based messages o up tae 140 characters, kent as 'tweets'".
[2] Perhaps we could have simplified this phrasing, but then we would have lost the chance to type "anti-pro-independence", which is a lot of fun. Anti-pro-independence. Anti-pro-independence.
[3] Normally the Floatingsheep collective avoids conversation about sheep heart, liver, and lungs that are boiled in a sheep stomach. But we made an exception this time.