December 31, 2014

The Best of Floatingsheep in 2014

With another year coming to a close, we thought it a good time to reflect upon yet another year of sheepish maps and blogposts, recounting what we have accomplished, perhaps mostly so that we don't dare attempt such goofiness again. And so we give you the Top 10 Floatingsheep posts of 2014, ranked according to the number of page views each received. Feast on these last remnants of 2014, and a happy new year to all!

#1 The Drama of Llamas vs. the Gloating of the Goats 
What was thought to be something of a throwaway post came from the shadows to become 2014's most viewed blogpost, largely thanks to some Redditors who took the map a bit too seriously, if we do say so ourselves.

#2 New Book Chapter on the Geographies of Beer on Twitter 
Based on some great work by Matt and Ate, the map below (and others from the same book chapter) has become a staple of Vox's explanations of alcohol this year... see here, here and here.

#3 Mapping Ferguson Tweets, or more maps that won't change your mind about racism in America 
The product of the Inaugural #IronWilson Map-a-Thon, this map and post were our attempt to counter some problematic uses of geotagged Twitter data in relation to the then-nascent protests in Ferguson, Missouri, and to highlight the persistent limitations of this sort of work when dealing with issues as complex and fraught as violence and structural racism.

#4 Mapping the Seven Dirty Words 
One of the biggest missed opportunities from the 2014 IronSheep dataset, our series of maps of George Carlin's infamous seven dirty words didn't yield a whole lot except for excrement.

#5 Hashtags and Haggis: Mapping the Scottish Referendum
While Scotland ultimately voted to remain part of the United Kingdom, some of our maps helped to demonstrate persistent cultural divides between the English and the Scottish, and the fact that "the Scottish referendum [was] not just simply about 'yes' or 'no' but seemingly touche[d] on much more fundamental questions of ovis-based cuisine, men's wear and mythological creatures". Indeed.

#6 Artists, Bankers, Hipsters and the "Bro-ughnut" of New York: Mapping Cultural-Economic Identities on Twitter 
Some more work by Ate and Matt for a journal article yielded the discovery of what will surely be recognized in time as one of the most fundamental geographical phenomena known to humankind: the 'Bro-ughnut' of New York.

#7 Hey Y'all! Geographies of a Colloquialism
There are few places as distinct as the American South when it comes to cultural patterns expressed through geotagged tweets, as our mapping of references to "y'all" helped to confirm.

#8 Crowdsourcing Cake or Death?
While the choice between cake or death seems like an obvious one, our maps of references to these terms yielded a much different, and troubling, result.

#9 Are there really more juggalos than polar bears?
"As our analysis has shown, there is more to the story of juggalos and polar bears than meets the eye. Clearly, there are more references to polar bears than to juggalos, both globally and in the United States. But the relationship between these two is considerably more complex and contradictory than is assumed by David Cross and his ilk. Obviously more research is required as ten-second gifs are not up to conveying the complexity of the juggalo-polar bear ecosystem."

#10 The Epic Tweet Fight of Bronies and Juggalos
Despite Lexington, Kentucky being at the center of an online controversy around a Bronies vs. Juggalos street fight, the Floatingsheep home base didn't have much online activity around these two subcultures. In fact, when taking the epic street fight online and evaluating the epic tweet fight, we couldn't help but declare it a draw.

December 25, 2014

Are we more interested in XXX or Xmas?

This holiday season we decided to ask the questions that really matter. As people celebrate Christmas, we wanted to know how people around the world are mentioning the holiday. And, perhaps more importantly and interestingly, how mentions of Christmas stack up against mentions of a more sexual and consumption-oriented nature. 

So, we decided to compare mentions of 'Xmas', 'XXX', and 'Xbox'. 

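The formula that we used (for XXX tweets, for example):

$$\text{ratio}_{\text{XXX}} = \frac{\text{XXX}_{\text{square}} \,/\, \text{XXX}_{\text{global}}}{\left(\text{XXX}+\text{Xbox}+\text{Xmas}\right)_{\text{square}} \,/\, \left(\text{XXX}+\text{Xbox}+\text{Xmas}\right)_{\text{global}}}$$

where each term is the sum of tweets mentioning that word, either within a given grid square or globally.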
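To make the formula concrete, here is a minimal Python sketch. It assumes per-square and global tweet counts are already in hand; the counts below are invented for illustration, not taken from our data, and the rule of shading each square by whichever term has the highest ratio is our assumption about how such a map could be colored. Structurally, the measure is what geographers call a location quotient: a value above 1 means the term makes up a bigger share of the three terms in that square than it does globally.

```python
TERMS = ("XXX", "Xbox", "Xmas")


def term_ratio(term, square, world):
    """(term in square / term globally) / (all three terms in square / all three globally).

    Assumes every term has at least one tweet globally (no zero denominators).
    """
    term_share = square[term] / world[term]
    square_total = sum(square[t] for t in TERMS)
    world_total = sum(world[t] for t in TERMS)
    return term_share / (square_total / world_total)


def dominant_term(square, world):
    """Hypothetical shading rule: the term most over-represented in this square."""
    return max(TERMS, key=lambda t: term_ratio(t, square, world))


# Invented counts, for illustration only.
world = {"XXX": 1000, "Xbox": 2000, "Xmas": 7000}
square = {"XXX": 30, "Xbox": 20, "Xmas": 50}

for t in TERMS:
    print(t, round(term_ratio(t, square, world), 2))
print("dominant:", dominant_term(square, world))
```

With these made-up counts, XXX comes out at about 3.0, Xbox at 1.0 and Xmas at about 0.71, so the square would be shaded as XXX even though Xmas has the most raw mentions there. Normalizing by global totals is what keeps a globally popular term like Xmas from winning everywhere.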

We see some important global differences. Americans (as well as the French and Spanish) are most interested in Xboxes. Strangely, the Japanese and Nigerians seem to be most fixated on Christmas. And the British, Dutch, and Italians seem more interested in X-rated content, giving a whole different meaning to reflections on who has been naughty and who has been nice. 

December 18, 2014

Deconstructing the (most detailed tweet) map (ever)

If you’re the kind of person who visits our blog with any regularity, you’re almost certainly also the kind of person who would have seen some version of the map below in the last couple of weeks. Created by Eric Fischer of Mapbox, this map was released along with a blogpost entitled “Making the most detailed tweet map ever”, discussing some of the data cleaning and visualization methods necessary to produce such a striking map. The map is undoubtedly interesting and has drawn attention from all corners of the internet, but there’s just something about the framing that rubs us the wrong way. While Eric’s post emphasizes the making part of the equation, the internet hype cycle around it has caused us to read the title a bit more along the lines of:

"Making THE MOST DETAILED tweet map EVARRRR!!!!"

That is to say, for all of the admittedly really great detail about what went into making this map, the framing of this map as not only a detailed map of six billion or so geotagged tweets, but as the most detailed tweet map ever, raises more questions than it answers. For example, what constitutes ‘detail’ in tweet maps? What do competing definitions of ‘detail’ reveal about what we value in this kind of analysis? What do these particular ideas of ‘detail’ foreclose in terms of other possibilities for analysis?

These are important questions, regardless of whether they’re applied to this particular map or any other one. The issue in this case, however, seems to be that the answers to some of these questions conflict with one another, or with the ways the project is itself described. The detail that seems to be valued here is of the “every tweet ever” variety, or, put simply, “more = better”, the fetish for bigger data at the expense of all else.

But more data isn’t necessarily better, and it certainly doesn’t mean that there’s more detail, especially when the only bit of detail you're concerned with in each of these six billion points is the latitude and longitude coordinates. Each of these individual tweets contains a wealth of other interesting information, ranging from information about the user and the way they describe themselves, to the time the tweet was created, to the text of the tweet itself, which might contain hashtags that link up with bigger conversations, or @-mentions of other Twitter users that might be used to understand social networks and interactions. All of these bits of information represent a kind of detail that is not included in this, the most detailed tweet map ever.
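To illustrate how much a points-only map leaves out, here is a small Python sketch using an invented, heavily simplified tweet record. The field names (`created_at`, `text`, `user.description`, `coordinates`, `entities.hashtags`, `entities.user_mentions`) follow the Twitter v1.1 tweet JSON as we understand it, but the values are made up and a real record carries many more fields. A map built only from coordinates needs nothing beyond what `point_only` returns; everything `other_detail` pulls out is discarded.

```python
# An invented, heavily simplified tweet record. Field names follow the
# Twitter v1.1 tweet JSON as we understand it; the values are made up.
tweet = {
    "created_at": "Thu Dec 18 14:02:11 +0000 2014",
    "text": "Stuck at the gate again #travel @a_friend",
    "user": {"screen_name": "example_user", "description": "Geographer. Coffee."},
    "coordinates": {"type": "Point", "coordinates": [-85.7585, 38.2527]},
    "entities": {
        "hashtags": [{"text": "travel"}],
        "user_mentions": [{"screen_name": "a_friend"}],
    },
}


def point_only(t):
    """All that a points-only map keeps: one (lat, lon) pair."""
    lon, lat = t["coordinates"]["coordinates"]  # GeoJSON order is lon, lat
    return lat, lon


def other_detail(t):
    """Some of what a points-only map throws away."""
    return {
        "created_at": t["created_at"],
        "self_description": t["user"]["description"],
        "hashtags": [h["text"] for h in t["entities"]["hashtags"]],
        "mentions": [m["screen_name"] for m in t["entities"]["user_mentions"]],
        "text": t["text"],
    }


print(point_only(tweet))
print(other_detail(tweet))
```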

As we’ve been arguing for the past two years or so, there are a range of social and spatial processes represented in geotagged tweets that we can’t get at if all we’re concerned with is the latitude and longitude coordinates. So to say that this represents the most detailed tweet map ever serves to reify what we see as two of the most problematic assumptions of contemporary big data/social media research: (1) that more data is equivalent to better data, and (2) that the only important aspect of the data is the geographic coordinates attached to it. There's lots of interesting stuff that can be done with this kind of data, and we can do better than simply plotting points on a map and calling it a day [1].

Even if one were inclined to accept the argument that more tweets equals more detail, how should we interpret the fact that this map only visualizes about 9% of all geotagged tweets, due to the design decisions necessary in order to make the map nice and pretty [2]? Due to the existence of exact or near-duplicate coordinates that would make points indistinguishable from one another, this, the most detailed tweet map ever, actually eliminates about 91% of the detail that it seems to value most (i.e., the presence or absence of points on the map). The Gizmodo headline about the map reads, “The Most Detailed Tweet Map Ever Includes 6,341,973,478 Tweets”... except that, you know, it doesn’t [3].
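The 9% figure is easy to check from the two numbers quoted in this post: the Gizmodo headline's 6,341,973,478 tweets and Fischer's upper bound of about 590 million dots, mentioned in the footnotes. A couple of lines of Python will do:

```python
total_tweets = 6_341_973_478  # count in the Gizmodo headline
dots_on_map = 590_000_000     # Fischer's "no more than about 590 million"

shown = dots_on_map / total_tweets
print(f"shown: {shown:.1%}, dropped: {1 - shown:.1%}")
# shown: 9.3%, dropped: 90.7%
```

Because 590 million is an upper bound, roughly 9.3% is the most the map could be showing.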

Of course, there’s also a good bit of imprecision in the locational accuracy associated with geotagged tweeting; our iPhones don’t come with military-grade GPS units installed in them. So while Mapbox CEO Eric Gunderson was marveling at the detailed micro-geographies of an airport gate seen in the map, he was ignoring both the fact that all of those folks on the jet bridge could just as well have been 40 feet away, and that a number of tweets might have been eliminated from the initial dataset due to a lack of precision in the geotagging process. Take all of that together and a lot of the detail that’s being celebrated here starts to give way to fuzziness. This map is more art than science, though the striking visuals and discursive framing give the illusion of precision and absolute insight. 

To be clear, there’s no problem with fuzziness. It’s something we all live with every day, and something we academics may embrace from time to time through the use of overly obtuse language. But taking all of this fuzziness and then repackaging it as the most detailed tweet map ever comes off a bit wrong to us. These initial misgivings were only amplified when we brought things down to a more local level and saw a post from a local urbanist blog in Louisville wondering “What we can learn from where people in Louisville are using Twitter”. While relatively mundane, and certainly not nearly as celebratory, the blog’s ultimate conclusion was that “These locations [with the highest concentrations of tweets] make sense as they are places where people gather and are often held captive by events.”

This, in general, is true, but also a bit… how do we put it? Meh. More fundamentally, people tweet where people are. It comes as no surprise to anyone with even the vaguest familiarity with Louisville that people tweet in larger numbers from downtown (including 4th Street Live!), the University of Louisville campus, Bardstown Road and the St. Matthews / Oxmoor Mall area than anywhere else in the city. These are (some of) the primary gathering points on a day-to-day basis within the city.

But just identifying these locations doesn’t really help us to ‘learn’ anything beyond the fact that those are, indeed, the places with the highest concentrations of geotagged tweets in Louisville [4]. In fact, the map doesn’t even really show us actual concentrations of tweeting activity, but rather concentrations of unique tweeting locations. Take, say, two hypothetical city squares, one of just 50 x 50 meters, and another much larger one of 500 x 500 meters, both the originating point of one million geotagged tweets spread randomly over the squares. In Fischer’s method, these two squares would not 'glow' in equal amounts, but rather the larger square would show up as much more visually prominent because it has many more unique tweeting locations while many of the tweets from the smaller square would be filtered out due to a duplication of coordinates.
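The two-squares thought experiment can be simulated in a few lines of Python. This is a minimal sketch, not Fischer's actual pipeline: we assume, hypothetically, that deduplication amounts to snapping each coordinate to a 1-meter grid and keeping one point per grid cell, and we work in local meters rather than latitude and longitude.

```python
import random


def unique_locations(n_tweets, side_m, precision_m=1.0, seed=0):
    """Scatter n_tweets uniformly over a side_m x side_m square and count how
    many distinct points survive after snapping coordinates to a
    precision_m grid (a hypothetical stand-in for duplicate filtering)."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(n_tweets):
        x = rng.uniform(0, side_m)
        y = rng.uniform(0, side_m)
        # Tweets landing in the same grid cell collapse into one visible point.
        seen.add((int(x // precision_m), int(y // precision_m)))
    return len(seen)


small = unique_locations(1_000_000, 50)   # 50 x 50 m square
large = unique_locations(1_000_000, 500)  # 500 x 500 m square
print(small, large)
```

Under this assumption the small square can never show more than 2,500 distinct points (50 x 50 one-meter cells), however many tweets it produces, while the large one has room for 250,000. With a million tweets we would expect the large square to fill roughly 98% of its cells (1 - e^-4), around 245,000 points, so it "glows" about a hundred times as brightly despite identical tweet counts.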

Further, from a data collection standpoint, all of these tweets in Louisville reveal little that isn't revealed by mapping a random sample of tweets (say 1% of tweets from 2013, see map below). If all we’re really concerned about is the question of where people are tweeting from, there isn’t much that looking at all the tweets reveals that couldn't also be found from a smaller subset, and it’s much easier to collect or analyze a few hundred thousand tweets than it is to collect 6,341,973,478 of them. But even still, all we can ‘learn’ from these kinds of maps is where people have created geotagged tweets and, to some extent, where they have not [5].
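The claim that a 1% sample tells much the same story about where people tweet can be illustrated with a toy simulation in Python. The places come from the paragraphs above, but their weights are invented purely for illustration; they are not measurements from Louisville.

```python
import random
from collections import Counter

# Invented shares of tweets by place; illustration only, not real data.
WEIGHTS = {
    "Downtown": 0.30,
    "Elsewhere": 0.25,
    "UofL campus": 0.20,
    "Bardstown Road": 0.15,
    "St. Matthews / Oxmoor": 0.10,
}


def simulate(n, seed=1):
    """Draw n tweet locations according to WEIGHTS."""
    rng = random.Random(seed)
    places = list(WEIGHTS)
    return rng.choices(places, weights=[WEIGHTS[p] for p in places], k=n)


def shares(tweets):
    """Fraction of tweets falling in each place."""
    counts = Counter(tweets)
    return {place: n / len(tweets) for place, n in counts.items()}


def ranking(tweets):
    """Places ordered from most to least tweeted."""
    return [place for place, _ in Counter(tweets).most_common()]


full = simulate(500_000)
rng = random.Random(2)
sample = [t for t in full if rng.random() < 0.01]  # roughly 1% random sample

full_shares, sample_shares = shares(full), shares(sample)
print(len(full), len(sample), ranking(full) == ranking(sample))
for place in WEIGHTS:
    print(place, round(full_shares[place], 3), round(sample_shares[place], 3))
```

With a few thousand sampled tweets, the ordering of places should match the full set and the shares should typically land within about a percentage point. Neither version, of course, says anything beyond where geotagged tweets were created.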

But if that’s all we can learn from this map, again, why call it the most detailed tweet map ever? There are any number of details that are excluded from analysis by only looking at the locations of geotagged tweets. What if we instead took a different approach to this data, such as examining the use history of individual Twitter users, or even collectives of Twitter users based on some kind of shared experience or identity, such as association with particular neighborhoods or other places?

OK, you're right. This particular question is a bit self-serving, as this is precisely the kind of thing we've been working on for some time now. And so rather than just offering a critique of someone else's work, we really want to see if we can push this kind of analysis in more productive directions. So we offer up the map below, which comes from a paper we currently have under review, that attempts to demonstrate how geotagged tweets can help us to better understand urban socio-spatial inequality beyond simply identifying the presence or absence of tweets in a given area, as is so often done.

Using Louisville and the now-common ‘9th Street Divide’ trope as a starting point, we set out to understand how people from different parts of the city used and moved around the city in different ways. So, in a manner not unlike some of Eric Fischer's previous work, we identified a number of Twitter users as belonging to one of two groups, those with close ties to either the West End (traditionally a poorer and predominantly African-American part of the city) or the East End (a more affluent and largely white part of the city), and collected all of the geotagged tweets from those users [6]. We then compared the spatial footprint of these groups' tweeting activity via an odds-ratio measure. On the map, purple hexagons represent places with greater-than-expected levels of West End user tweeting activity, while orange hexagons represent places where East End users were relatively more dominant than expected. Places with roughly equivalent or expected levels of tweeting from both groups are marked by hatched hexagons.
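The post doesn't spell out the exact odds-ratio formulation used in the paper, so what follows is only a Python sketch of one standard version: for each hexagon, the odds that a West End user's tweet falls there, divided by the odds that an East End user's tweet does. The 0.5 added to each cell (the Haldane-Anscombe correction, to avoid dividing by zero), the 1.5 cutoff for "roughly equivalent", and the hexagon IDs and counts are all our assumptions, not the paper's parameters or data.

```python
def odds_ratio(west_h, east_h, west_total, east_total, pseudo=0.5):
    """Odds that a West End tweet falls in hexagon h divided by the odds
    that an East End tweet does; pseudo is added to every cell of the
    2x2 table so empty hexagons don't cause division by zero."""
    west_odds = (west_h + pseudo) / (west_total - west_h + pseudo)
    east_odds = (east_h + pseudo) / (east_total - east_h + pseudo)
    return west_odds / east_odds


def classify_hexagons(west, east, cutoff=1.5):
    """Label each hexagon 'west', 'east', or 'mixed' (hypothetical cutoff)."""
    west_total, east_total = sum(west.values()), sum(east.values())
    labels = {}
    for h in set(west) | set(east):
        r = odds_ratio(west.get(h, 0), east.get(h, 0), west_total, east_total)
        if r > cutoff:
            labels[h] = "west"   # purple on the map
        elif r < 1 / cutoff:
            labels[h] = "east"   # orange
        else:
            labels[h] = "mixed"  # hatched
    return labels


# Invented tweet counts per (hypothetical) hexagon ID.
west = {"h1": 90, "h2": 10, "h3": 50}
east = {"h1": 10, "h2": 90, "h3": 50}
print(sorted(classify_hexagons(west, east).items()))
# [('h1', 'west'), ('h2', 'east'), ('h3', 'mixed')]
```

Because each group's counts are divided by that group's own total, a hexagon reads as "mixed" when both groups send the same share of their tweets there, regardless of which group tweets more overall.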

This map, in short, shows which places in Louisville are more socially homogeneous, dominated by either West End or East End residents, and which are more heterogeneous, characterized by a relative mix of people from different parts of the city. Though it’s evident that there is indeed a kind of divide between the West End and the rest of the city, this map also shows that West End residents are incredibly spatially mobile within the city, while East End residents tend to be much more spatially constrained, sticking to their own parts of town.

While there are certainly a lot of underlying factors driving this process, suffice it to say that this map offers a way of understanding socio-spatial inequality that goes beyond simply identifying those places that do or do not have significant concentrations of geotagged tweets [7]. Through our analysis, we also learned that, contrary to the assumptions often made about this kind of informational inequality, West End users actually produce a significantly greater number of geotagged tweets than their East End counterparts; it's just that many of these tweets are created in other parts of the city. This is, of course, an important kind of detail that we can draw from the mapping and analysis of geotagged tweets and one that, in many ways, is more detailed than the most detailed tweet map ever.

There is, of course, a whole lot more detail in the paper that this one map and blog post can’t capture, just as is the case with Eric Fischer’s map. Just to be clear, we think Eric Fischer does some fantastic and beautiful work with geotagged social media data, and commend him for openly discussing and sharing his methods. And yet, we can’t help but feel like the characterization of his map as being the most detailed tweet map ever is at best a half-truth, and helps to reproduce some of the most common problems with the analysis of geotagged social media data. But the more we think about it, the less sure we are that a single most detailed tweet map could exist, or that it’s even desirable to have such a thing. Instead, we should be striving to create any number of highly detailed, geographically situated tweet maps that collectively contribute to better understandings of the complex social and spatial processes that are represented and reproduced through this kind of data. 

[1] That’s the royal we. 
[2] Which it most certainly is.
[3] As Fischer notes, there are actually no more than about 590 million dots on the map due to his filtering process. When one zooms all the way out on the map so that the entire globe is represented in a single map tile, there are only 1,586 visible tweets, a far cry from the 6 billion number that seems so, well… big.
[4] #tautology
[5] This is qualified in this way because, as Kenneth Field pointed out in a Twitter exchange with Eric Fischer about these maps, geotagged tweets that Field has consciously created from his house do not appear on the map. So while we know that each tweet on the map was created in the place where it appears, we can't say definitively that tweets were not also created in places where none appear on the map.
[6] In order to do this classification, we collected all geotagged tweets created within the defined boundaries of these two areas, and then identified those users with more than 40 tweets within either area, where those 40+ tweets represented greater than 50% of their overall geotagged tweeting activity. This concentration of activity indicates that users had a strong association with, and presence within, either area, while also making sure that no users were identified as belonging to both areas.
[7] We also see this map as complicating the conventional narrative in Louisville of 9th Street as representing a kind of impenetrable barrier within the city. But since this is less directly relevant to our argument here, we'll make you wait to hear more about that particular line of reasoning.
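The user classification described in footnote [6] can be sketched in Python as follows. It assumes each tweet has already been tagged with the study area it falls in (or `None` if neither), and it reads the threshold as strictly more than 40 tweets in an area, following the footnote's first phrasing; the exact boundary case, the function name and the toy users are our assumptions. Because a user needs more than half of all their geotagged tweets in one area, no one can qualify for both.

```python
from collections import Counter


def classify_users(tweets, min_tweets=40, min_share=0.5):
    """tweets: iterable of (user_id, area) pairs, where area is "west",
    "east", or None for tweets outside both study areas.

    A user is assigned to an area if they have more than min_tweets tweets
    there AND those tweets are more than min_share of all their geotagged
    tweets. Since two areas can't each hold more than half of one user's
    tweets, no user can be assigned to both."""
    per_area = Counter()
    totals = Counter()
    for user, area in tweets:
        totals[user] += 1
        if area is not None:
            per_area[(user, area)] += 1
    groups = {}
    for (user, area), n in per_area.items():
        if n > min_tweets and n / totals[user] > min_share:
            groups[user] = area
    return groups


# Invented users: "a" and "d" qualify; "b" falls short on share, "c" on count.
tweets = (
    [("a", "west")] * 45 + [("a", "east")] * 5 + [("a", None)] * 10
    + [("b", "east")] * 41 + [("b", None)] * 50
    + [("c", "west")] * 40
    + [("d", "east")] * 60 + [("d", "west")] * 10
)
print(classify_users(tweets))
```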