December 18, 2014

Deconstructing the (most detailed tweet) map (ever)

If you’re the kind of person who visits our blog with any regularity, you’re almost certainly also the kind of person who would have seen some version of the map below in the last couple of weeks. Created by Eric Fischer of Mapbox, this map was released along with a blog post entitled “Making the most detailed tweet map ever”, discussing some of the data cleaning and visualization methods necessary to produce such a striking map. The map is undoubtedly compelling and has sparked a great deal of interest from all corners of the internet, but there’s just something about the framing that rubs us the wrong way. While Eric’s post emphasizes the making part of the equation, the internet hype cycle around it has led us to read the title a bit more along the lines of:

"Making THE MOST DETAILED tweet map EVARRRR!!!!"

That is to say, for all of the admittedly really great detail about what went into making this map, the framing of this map as not only a detailed map of six billion or so geotagged tweets, but as the most detailed tweet map ever, raises more questions than it answers. For example, what constitutes ‘detail’ in tweet maps? What do competing definitions of ‘detail’ reveal about what we value in this kind of analysis? What do these particular ideas of ‘detail’ foreclose in terms of other possibilities for analysis?

These are important questions, whether they’re applied to this particular map or to any other. The issue in this case, however, seems to be that the answers to some of these questions conflict with one another, or with the ways the project itself is described. The detail that seems to be valued here is of the “every tweet ever” variety, or, put simply, “more = better”: the fetish for bigger data at the expense of all else.

But more data isn’t necessarily better, and it certainly doesn’t mean that there’s more detail, especially when the only bit of detail you're concerned with in each of these six billion points is the latitude and longitude coordinates. Each of these individual tweets contains a wealth of other interesting information, from information about the user and the way they describe themselves, to the time the tweet was created, to the text of the tweet itself, which might contain hashtags that link up with bigger conversations, or @-mentions of other Twitter users that might be used to understand social networks and interactions. All of these bits of information represent a kind of detail that is not included in this, the most detailed tweet map ever.

As we’ve been arguing for the past two years or so, there are a range of social and spatial processes represented in geotagged tweets that we can’t get at if all we’re concerned with is the latitude and longitude coordinates. So to say that this represents the most detailed tweet map ever serves to reify what we see as two of the most problematic assumptions of contemporary big data/social media research: (1) that more data is equivalent to better data, and (2) that the only important aspect of the data is the geographic coordinates attached to it. There's lots of interesting stuff that can be done with this kind of data, and we can do better than simply plotting points on a map and calling it a day [1].

Even if one were inclined to accept the argument that more tweets equals more detail, how should we interpret the fact that this map visualizes only about 9% of all geotagged tweets, due to the design decisions necessary to make the map nice and pretty [2]? Because exact or near-duplicate coordinates would make points indistinguishable from one another, this, the most detailed tweet map ever, actually eliminates about 91% of the detail that it seems to value most (i.e., the presence or absence of points on the map). The Gizmodo headline about the map reads, “The Most Detailed Tweet Map Ever Includes 6,341,973,478 Tweets”... except that, you know, it doesn’t [3].

Of course, there’s also a good bit of imprecision in the locational accuracy associated with geotagged tweeting; our iPhones don’t come with military-grade GPS units installed in them. So while Mapbox CEO Eric Gunderson was marveling at the detailed micro-geographies of an airport gate seen in the map, he was ignoring both the fact that all of those folks on the jet bridge could just as well have been 40 feet away, and the fact that a number of tweets might have been eliminated from the initial dataset due to a lack of precision in the geotagging process. Take all of that together and a lot of the detail that’s being celebrated here starts to give way to fuzziness. This map is more art than science, though the striking visuals and discursive framing give the illusion of precision and absolute insight.

To be clear, there’s no problem with fuzziness. It’s something we all live with every day, and something we academics may embrace from time to time through the use of overly obtuse language. But taking all of this fuzziness and then repackaging it as the most detailed tweet map ever comes off a bit wrong to us. These initial misgivings were only amplified when we brought things down to a more local level and saw a post from a local urbanist blog in Louisville wondering “What we can learn from where people in Louisville are using Twitter”. While relatively mundane, and certainly not nearly as celebratory, the blog’s ultimate conclusion was that “These locations [with the highest concentrations of tweets] make sense as they are places where people gather and are often held captive by events.”


This, in general, is true, but also a bit… how do we put it? Meh. More fundamentally, people tweet where people are. It comes as no surprise to anyone with even the vaguest familiarity with Louisville that people tweet in larger numbers from downtown (including 4th Street Live!), the University of Louisville campus, Bardstown Road and the St. Matthews / Oxmoor Mall area than anywhere else in the city. These are (some of) the primary gathering points on a day-to-day basis within the city.

But just identifying these locations doesn’t really help us to ‘learn’ anything beyond the fact that those are, indeed, the places with the highest concentrations of geotagged tweets in Louisville [4]. In fact, the map doesn’t even really show us actual concentrations of tweeting activity, but rather concentrations of unique tweeting locations. Take, say, two hypothetical city squares, one of just 50 x 50 meters and another much larger one of 500 x 500 meters, each the origin of one million geotagged tweets spread randomly across it. In Fischer’s method, these two squares would not 'glow' equally. Rather, the larger square would appear much more visually prominent because it has many more unique tweeting locations, while many of the tweets from the smaller square would be filtered out as duplicate coordinates.
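
To make the thought experiment concrete, here is a minimal Python sketch. It assumes, purely for illustration, that duplicate filtering works by snapping coordinates to a 1-meter grid and keeping one dot per occupied cell; that grid size and the `unique_locations` helper are hypothetical and are not taken from Fischer's actual pipeline. It simply counts how many distinct dots each square could show.

```python
import random


def unique_locations(n_tweets: int, side_m: float,
                     cell_m: float = 1.0, seed: int = 0) -> int:
    """Scatter n_tweets uniformly over a side_m x side_m square and count the
    distinct locations left after snapping coordinates to a cell_m grid.

    The grid snapping is a hypothetical stand-in for dropping exact or
    near-duplicate coordinates.
    """
    rng = random.Random(seed)
    seen = set()
    for _ in range(n_tweets):
        x = rng.random() * side_m
        y = rng.random() * side_m
        # Tweets landing in the same grid cell collapse into a single dot.
        seen.add((int(x // cell_m), int(y // cell_m)))
    return len(seen)


small = unique_locations(1_000_000, side_m=50)    # 50 x 50 m square
large = unique_locations(1_000_000, side_m=500)   # 500 x 500 m square
print(small, large)
```

Under these assumptions the 50 x 50 meter square can never show more than 2,500 dots (one per grid cell), while the 500 x 500 meter square keeps roughly a hundred times as many, even though both squares produced exactly the same number of tweets.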

Further, from a data collection standpoint, all of these tweets in Louisville reveal little that isn't revealed by mapping a random sample of tweets (say 1% of tweets from 2013, see map below). If all we’re really concerned about is the question of where people are tweeting from, there isn’t much that looking at all the tweets reveals that couldn't also be found from a smaller subset, and it’s much easier to collect or analyze a few hundred thousand tweets than it is to collect 6,341,973,478 of them. But even still, all we can ‘learn’ from these kinds of maps is where people have created geotagged tweets and, to some extent, where they have not [5].


But if that’s all we can learn from this map, why call it the most detailed tweet map ever? Again, there are any number of details that are excluded from analysis by only looking at the locations of geotagged tweets. What if we instead took a different approach to this data, such as examining the use history of individual Twitter users, or even collectives of Twitter users based on some kind of shared experience or identity, such as association with particular neighborhoods or other places?

OK, you're right. This particular question is a bit self-serving, as this is precisely the kind of thing we've been working on for some time now. And so rather than just offering a critique of someone else's work, we really want to see if we can push this kind of analysis in more productive directions. So we offer up the map below, which comes from a paper we currently have under review, that attempts to demonstrate how geotagged tweets can help us to better understand urban socio-spatial inequality beyond simply identifying the presence or absence of tweets in a given area, as is so often done.


Using Louisville and the now-common ‘9th Street Divide’ trope as a starting point, we set out to understand how people from different parts of the city used and moved around the city in different ways. So, in a manner similar to some of Eric Fischer's previous work, we identified a number of Twitter users as belonging to one of two groups, those with close ties to either the West End (traditionally a poorer and predominantly African-American part of the city) or the East End (a more affluent and largely white part of the city), and collected all of the geotagged tweets from those users [6]. We then compared the spatial footprint of these groups' tweeting activity via an odds-ratio measure. On the map, purple hexagons represent places with greater-than-expected levels of West End user tweeting activity, while orange hexagons represent places where East End users were relatively more dominant than expected. Places with roughly the expected balance of tweeting between the two groups are shown as hexagons with hashes.
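
For readers curious about the mechanics, the odds-ratio comparison can be sketched as below. This is a hedged illustration, not the code behind our map: the cell counts and citywide totals are made up, and the `tolerance` cutoff separating purple, orange and hashed hexagons is a hypothetical choice. A ratio of 1 means a hexagon's West End/East End split matches the citywide split between the two groups, not that the two counts are equal.

```python
from math import log


def odds_ratio(west_cell: int, east_cell: int,
               west_total: int, east_total: int) -> float:
    """Compare a cell's West/East split with the citywide West/East split.

    1.0 means the cell mirrors the overall balance between the two groups.
    """
    if min(west_cell, east_cell, west_total, east_total) <= 0:
        # Cells where one group is absent would need separate handling.
        raise ValueError("all counts must be positive")
    return (west_cell / east_cell) / (west_total / east_total)


def classify(ratio: float, tolerance: float = 0.25) -> str:
    """Assign a map class; the log-scale tolerance is a hypothetical cutoff."""
    log_ratio = log(ratio)
    if log_ratio > tolerance:
        return "purple"   # more West End activity than expected
    if log_ratio < -tolerance:
        return "orange"   # more East End activity than expected
    return "hashed"       # roughly the expected balance


WEST_TOTAL, EAST_TOTAL = 60_000, 40_000   # hypothetical citywide totals
for west, east in [(30, 20), (90, 10), (5, 45)]:
    r = odds_ratio(west, east, WEST_TOTAL, EAST_TOTAL)
    print(west, east, round(r, 3), classify(r))
```

With these hypothetical totals, a hexagon with 30 West End and 20 East End tweets has a ratio of exactly 1 (hashed), while one with 90 and 10 has a ratio of 6 (purple). Working on a log scale keeps the cutoffs symmetric, so a cell twice as West End-heavy as expected is treated the same way as one twice as East End-heavy.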

This map, in short, shows which places in Louisville are more socially homogeneous, dominated by either West End or East End residents, and which are more heterogeneous, characterized by a relative mix of people from different parts of the city. Though it’s evident that there is indeed a kind of divide between the West End and the rest of the city, this map also shows that West End residents are incredibly spatially mobile within the city, while East End residents tend to be much more spatially constrained, sticking to their own parts of town.

While there are certainly a lot of underlying factors driving this process, suffice it to say that this map provides a way of understanding socio-spatial inequality other than simply identifying those places that do or do not have significant concentrations of geotagged tweets [7]. Through our analysis, we also learned that, contrary to the kind of assumptions often made about this kind of informational inequality, West End users actually produce a significantly greater number of geotagged tweets than their East End counterparts; it's just that many of these tweets are created in other parts of the city. This is, of course, an important kind of detail that we can draw from the mapping and analysis of geotagged tweets and one that, in many ways, is more detailed than the most detailed tweet map ever.

There is, of course, a whole lot more detail in the paper that this one map and blog post can’t capture, just as is the case with Eric Fischer’s map. Just to be clear, we think Eric Fischer does some fantastic and beautiful work with geotagged social media data, and commend him for openly discussing and sharing his methods. And yet, we can’t help but feel like the characterization of his map as being the most detailed tweet map ever is at best a half-truth, and helps to reproduce some of the most common problems with the analysis of geotagged social media data. But the more we think about it, the less sure we are that a single most detailed tweet map could exist, or that it’s even desirable to have such a thing. Instead, we should be striving to create any number of highly detailed, geographically situated tweet maps that collectively contribute to better understandings of the complex social and spatial processes that are represented and reproduced through this kind of data.

----------------
[1] That’s the royal we. 
[2] Which it most certainly is.
[3] As Fischer notes, there are actually no more than about 590 million dots on the map due to his filtering process. When one zooms all the way out on the map so that the entire globe is represented in a single map tile, there are only 1,586 visible tweets, a far cry from the 6 billion number that seems so, well… big.
[4] #tautology
[5] This is qualified in this way because, as Kenneth Field pointed out in a Twitter exchange with Eric Fischer about these maps, geotagged tweets that Field consciously created from his house do not appear on the map. So while we know that all of the tweets on the map were created in the places where they appear, we can't say definitively that tweets were not also created in places where none appear on the map.
[6] In order to do this classification, we collected all geotagged tweets created within the defined boundaries of these two areas, and then identified those users with more than 40 tweets within either area, where those 40+ tweets represented greater than 50% of their overall geotagged tweeting activity. This concentration of activity indicates that users had a strong association with, and presence within, either area, while also making sure that no users were identified as belonging to both areas.
[7] We also see this map as complicating the conventional narrative in Louisville of 9th Street as representing a kind of impenetrable barrier within the city. But since this is less directly relevant to our argument here, we'll make you wait to hear more about that particular line of reasoning.
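
The classification rule in footnote [6] amounts to a short filter, sketched below in Python. The input format (one area label per geotagged tweet, with `None` for tweets created elsewhere) and the `classify_user` name are assumptions made for illustration, not our actual processing code. Because a user must have more than half of all their geotagged tweets in one area, no user can qualify for both areas at once.

```python
from collections import Counter
from typing import Iterable, Optional


def classify_user(tweet_areas: Iterable[Optional[str]],
                  min_tweets: int = 40,
                  min_share: float = 0.5) -> Optional[str]:
    """Return 'west' or 'east' if a user meets the footnote [6] rule, else None.

    tweet_areas holds one label per geotagged tweet from the user:
    'west', 'east', or None for tweets created anywhere else.
    """
    areas = list(tweet_areas)
    total = len(areas)
    in_area = Counter(a for a in areas if a is not None)
    for area, n in in_area.items():
        # More than 40 tweets in the area AND more than 50% of all their tweets.
        if n > min_tweets and n / total > min_share:
            return area
    return None


print(classify_user(["west"] * 41 + [None] * 30))  # 41 of 71 tweets -> west
print(classify_user(["west"] * 40 + [None] * 10))  # only 40 tweets -> None
print(classify_user(["east"] * 50 + [None] * 60))  # under 50% -> None
```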

10 comments:

  1. Query: What's your definition of "expected" tweets? Thx.

    1. I assume you're referring to the discussion of the odds-ratio measure here... In effect, the odds-ratio compares the relative number of tweets between the two groups in a given hexagon relative to the overall balance of tweets between the two groups. So by "expected", we mean the overall distribution of West End vs. East End tweets, which is compared to a more micro-level distribution within certain areas. So a value of 1 signifies that the relative number of tweets in a given area is equivalent to the overall distribution between the two groups, rather than meaning that the two numbers are exactly equal. Does that make sense?

    2. Thanks. Let me play this back and make sure I got it. Are you saying that you calculated a West End/East End total tweet split, and then compared the ratios of tweets in each individual cell relative to that overall ratio and used that to assign the shading?

  2. Thanks for putting so much attention into analyzing this map and what it might mean.

    As I said on Twitter, the reason I called it "the most detailed ever" is because it does have more points on it and lets you zoom in further than any other tweet map I have ever seen. Twitter's own archives must have more points, but those archives aren't accessible.

    And you're right: it doesn't tell anyone very much that they didn't already know. That's why I framed the blog post as a tutorial on how to make dot maps, rather than on anything particularly special in this one.

    The things I think are genuinely interesting about Twitter data are that it shows where the non-residential concentrations of people are, and how those places relate to each other by travel and by communication. I still struggle with how to show those relationships between places on a map, but even showing the concentrations at arbitrary scale is challenging enough that it's worth talking about.

    At the same time, I do think there is value in simply making things like this available to the public. The reaction has demonstrated that there are a lot of people who are interested in scrolling around and seeing what the activity patterns look like in places that they know or are interested in knowing. When you have this sort of data available at your own disposal to answer your own questions, it's easy to forget that not everyone has it until you expose it to them.

    I'm glad that your local knowledge of Louisville lets you go further into what the tweets there mean. My hope is that someday I will be able to form a general theory of friction between nearby places and how those gaps might be able to be bridged.

    1. @Eric: Thanks for the thoughtful reply and further clarification of what you meant by 'detail'. Of course, the bigger question of what that particular definition of detail does, and how it was interpreted and reframed by the world-at-large when writing about your map, remains...

      One point you raise, which we didn’t touch on in our blog post, is access to this kind of data, and what it means that many people are unable to access it, either because they lack the necessary technical skills to collect it themselves, or because the Twitter Terms of Service bars those of us with these large databases at our fingertips from sharing them widely, all because Twitter stands to make some money from selling the data through its partnership with GNIP.

      There’s definitely something powerful about people being able to explore this data for themselves — I think the same is even more true of your earlier work on tourists vs. locals and on the use of iPhones vs. Android phones — and it’s easy to forget that there are plenty of folks who *should* have access to this data who don’t, and that this actually keeps people from asking and answering more complex questions with this kind of data. And perhaps it’s because of this lack of access to experiment and do more with the data that people tend to be, I would argue, overly fascinated by the map itself, and not more critical or inquisitive about what other things we could be doing with data.

      Anyways, thanks again for engaging! Hopefully we can keep this conversation going in some way, shape or form!

  3. Terrific critique. Well argued. I posted something similar for one of Eric's previous maps in 2013 (http://cartonerd.blogspot.com/2013/06/3-billion-tweets-on-map.html) and at the time said much the same. The problem isn't so much to do with the map per se, it's to do with the hype and the rhetoric that accompanies it (and others, this isn't specific to Eric's map).

    There's nothing intrinsically wrong with making a map of a load of latitude/longitude pairs. A lot of people are doing the same. Where Eric goes beyond most is making the map really beautiful. Could you imagine it with emoji symbols instead? But I see many maps these days have titles that simply don't match the work. They generate interest and capture people's imagination (and get blogged, liked and retweeted) but unfortunately those same people aren't able to assess the map in the same way as presented here. They treat it at face value. In truth, maps have always lied and this is not a new problem but I fear unless we begin to cast our maps in a more objective, less sensational fashion we risk damaging trust beyond repair.

    I looked at Eric's map and immediately saw flaws because I know something of the data and the impact of his choice of methods. I liked it to look at and I could largely ignore the title because it makes the map out to be something it isn't. I think it's time to respect the readers of our maps a little more and not be so quick with the grandiose claims.

    Experimental mapping is great. Great-looking maps are great. We need to develop a healthier way of telling people what it is they're looking at. Making a map of latitudes and longitudes is fine, but they don't tell us this, or that, or whatever other fanciful assumption we might make from combining the image with a sensationalist title. The campaign for modest map titles starts here!

  4. Never heard the saying "Brevity is the soul of wit"? Or was this an homage to the days of being paid by word and using the most possible words to tell a story?

    1. Sorry that not everything can fit in a tweet, Terry.

