floatingsheep: The Geography of Hate

May 10, 2013

UPDATE (5/13/13 @ 10:45pm): We have written and published a FAQ to respond to some of the questions and concerns raised in the comments here and elsewhere. Please review our comments there before commenting or emailing.

Following the 2012 US Presidential election, we created a map of tweets that referred to President Obama using a variety of racist slurs. In the wake of that map, we received a number of criticisms, some constructive and others not, about how we measured what we judged to be racist sentiment. In that work, we showed that the states with the highest relative amount of racist content referencing President Obama, Mississippi and Alabama, were notable not only for their starkly anti-Obama voting patterns but also for their problematic histories of racism. That is, even a fairly crude and cursory analysis can show how contemporary expressions of racism on social media can be tied to a range of contextual factors that help explain their persistence.

The prominence of debates around online bullying and the censorship of hate speech prompted us to examine how social media has become an important conduit for hate speech, and how particular terminology used to degrade a given minority group is expressed geographically. As we’ve documented in a variety of cases, the virtual spaces of social media are intensely tied to particular socio-spatial contexts in the offline world, and as this work shows, the geography of online hate speech is no different.

Rather than focusing just on hate directed towards a single individual at a single point in time, we wanted to analyze a broader swath of discriminatory speech in social media, including the usage of racist, homophobic and ableist slurs.

Using DOLLY to search all geotagged tweets in North America between June 2012 and April 2013, we found 41,306 tweets containing the word ‘nigger’ and 95,123 containing ‘homo’, among other terms. To address one of the earlier criticisms of our map of racism directed at Obama, students at Humboldt State manually read each tweet and coded its sentiment to determine whether the given word was used in a positive, negative or neutral manner. This allowed us to avoid algorithmic sentiment analysis and natural language processing altogether, since many algorithms would simply have classified a tweet as ‘negative’ even when the word was used in a neutral or positive way. For example, the word ‘dyke’, while often negative when referring to an individual person, was also used in positive ways (e.g. “dykes on bikes #SFPride”). The students were able to discern which uses were negative, neutral, or positive. Only tweets that used a term in an explicitly negative way are included in the map.

Tweets negatively referring to "Dyke"

Altogether, the students judged more than 150,000 geotagged tweets containing a hateful slur to be negative. Hateful tweets were aggregated to the county level and then normalized by the total number of tweets in each county. The result compares places with disproportionately high amounts of a particular hate word relative to all tweeting activity. For example, Orange County, California has the highest absolute number of tweets mentioning many of the slurs, but because its overall Twitter activity is so large, those hateful tweets make up a small share of the total and do not appear prominently on our map. So when viewing the map at a broad scale, it is best for a place not to be covered by the blue smog of hate at all, since even the lower end of the scale indicates that hateful tweets are present.
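As a rough illustration of the normalization described above, the sketch below divides each county's count of negatively coded tweets by its total tweet count. It is a minimal sketch under stated assumptions, not the project's actual code: the function name, the county labels and all counts are hypothetical.

```python
def hate_rate_by_county(hateful_counts, total_counts):
    """Return {county: hateful tweets / all tweets} for every county with tweets.

    hateful_counts: {county: number of tweets coded as negative uses of a slur}
    total_counts:   {county: number of all geotagged tweets}
    Both inputs and the county labels are hypothetical placeholders.
    """
    return {
        county: hateful_counts.get(county, 0) / total
        for county, total in total_counts.items()
        if total > 0  # a county with no tweets cannot be normalized
    }


# Hypothetical counts: the large county has 25 times as many hateful tweets,
# but a far larger overall volume of tweets.
hateful = {"large_county": 500, "small_county": 20}
total = {"large_county": 1_000_000, "small_county": 10_000}

rates = hate_rate_by_county(hateful, total)
print(rates)  # {'large_county': 0.0005, 'small_county': 0.002}
```

This is why a county with heavy overall Twitter activity, such as Orange County, can lead in absolute counts yet appear less prominently once normalized.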

Even when normalized, many of the slurs included in our analysis do not display a meaningful spatial pattern. For example, tweets referencing ‘nigger’ are not concentrated in any single place or region in the United States; instead, quite depressingly, a number of scattered pockets show heavy usage of the word. In addition to looking at the density of hateful words, we also examined how many unique users were tweeting them. For example, in the Quad Cities (East Iowa), 31 unique Twitter users tweeted the word “nigger” in a hateful way 41 times. There are two likely reasons for the higher proportion of such slurs in rural areas: demographic differences and differing social practices with regard to the use of Twitter. We will test the clusters of hate speech against the demographic composition of each area in a later phase of this project.
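Counting distinct users alongside tweets, as in the Quad Cities figures, needs only one pass over the coded tweets. The sketch below assumes each negatively coded tweet is available as a hypothetical (county, user_id) pair; the synthetic sample merely reproduces the 41-tweets-from-31-users totals quoted above and does not reflect the real per-user breakdown.

```python
from collections import defaultdict


def tweets_and_users_by_county(coded_tweets):
    """Return {county: (tweet_count, unique_user_count)}.

    coded_tweets: iterable of (county, user_id) pairs, one per tweet coded
    as a negative use of a slur (a hypothetical input format).
    """
    tweet_counts = defaultdict(int)
    users = defaultdict(set)
    for county, user_id in coded_tweets:
        tweet_counts[county] += 1
        users[county].add(user_id)  # a set ignores repeat tweets by one user
    return {county: (tweet_counts[county], len(users[county])) for county in tweet_counts}


# Synthetic sample: 31 users tweet once, and 10 of them tweet a second time.
sample = [("quad_cities", f"user{i}") for i in range(31)]
sample += [("quad_cities", f"user{i}") for i in range(10)]

counts = tweets_and_users_by_county(sample)
print(counts)  # {'quad_cities': (41, 31)}
tweets, users = counts["quad_cities"]
print(round(tweets / users, 2))  # 1.32 hateful tweets per user
```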

Hotspots for "wetback" Tweets

Perhaps the most interesting concentration comes from references to ‘wetback’, a slur meant to degrade Latino immigrants to the US by tying them to ‘illegal’ immigration. This term is used most in several areas of Texas, showing the state’s centrality to debates about immigration in the US. But the areas with significant concentrations are not necessarily close to the border, and other border states that feature prominently in debates about immigration do not contain significant concentrations.

Ultimately, some of the slurs included in our analysis might not have particularly revealing spatial distributions. Unfortunately, however, they do show the significant persistence of hatred in the United States and the ways that the open platforms of social media have been adopted and appropriated to propagate these ideas.

Funding for this map was provided by the University Research and Creative Activities Fellowship at HSU. Geography students Amelia Egle, Miles Ross and Matthew Eiben at Humboldt State University coded tweets and created this map.

The full interactive map is available here: http://users.humboldt.edu/mstephens/hate/hate_map.html

113 comments:

Persy, 10 May 2013 at 17:51
If I may ask, why weren't misogynist/sexist tweets included in this map?

Unknown, 10 May 2013 at 17:58
I'm curious to know if you included alternate spellings or misspellings in your search. There's more than one way to misspell the six-letter word used to defame gay men, for example.

Also, why only "cripple" for ableist slurs? What about "retard" or "gimp"?

Matthew Zook, 10 May 2013 at 18:00
This is the first iteration of a larger project, other terms/topics will be posted as they are processed.

Brendan O'Connor, 10 May 2013 at 18:05
Hi, how are you doing the normalization to compute the frequencies? The main map looks similar to the overall density of Twitter messages in the USA, so I was wondering if it had been normalized.

jpa, 10 May 2013 at 18:44
This is great, but it mostly shows highly populated areas. Are you controlling for the number of Twitter users in some way? You will see that there are lots of racists in an area, but that's just because there are more people.

Matthew Zook, 10 May 2013 at 20:12
Normalization is based on hateful tweets / total tweets.

It clearly is not a map of population density. See California, lots of people but relatively few hateful tweets when compared to the total volume of tweets.

CHT, 10 May 2013 at 20:54
This isn't very scientific. It lists pejoratives spoken by only one group. Since it's impossible that only one group can be racist, and the project does not include the epithets used against every group, this project isn't something you could really take seriously.

Tom, 11 May 2013 at 00:14
I second the suggestion to check for alternate spellings. Also, some who use offensive language online will use alternate spellings to avoid certain filters common on websites. Typing "fa99ot" or "ni99er", for example, could become a habit that would transfer to tweeting.

Edward, 11 May 2013 at 04:05
Why have you omitted derogatory words directed against Whites such as: honkey, peckerwood, white trash, white boy and cracker?

Unknown, 11 May 2013 at 12:07
Is it possible to download the 150,000 offensive tweets along with the geodata in a KMZ for our own analysis? I work for a software company that specializes in Big Data and I am intrigued by how the information would be displayed on a heatmap using a linear distribution as opposed to the percentile distribution you use here.

raph, 11 May 2013 at 14:24
This is fascinating, but I'm concerned that you might not have processed enough data to draw meaningful conclusions.

You (appropriately!) normalize by the number of tweets in each area to counter the effects of varying population density, which ought to lead to a relatively continuously varying heat map. This is the case when zoomed out, but when one zooms in, it becomes clear that individual hot spots are disproportionately affecting the average. For example, if I look at the "racist" heat map at a high level (level 2), it looks like central Minnesota, from the Twin Cities westwards, is unusually racist. However, when I zoom in, there's just one county (Hutchinson) in central MN that's quite hot.

Frankly it might just even be one individual, and there are so few overall tweets in the area that the normalization results in him or her affecting how the entire state of Minnesota appears.

Perhaps with more data a less "chunky" heat map could be produced. Or perhaps a different normalization algorithm would be more appropriate, that mitigates the effect of individual discontinuities affecting an entire region?
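One possible version of the different normalization raph suggests is to shrink each county's rate toward the overall rate, so that a county with few tweets cannot swing far on a handful of messages. The sketch below uses the posterior mean of a simple beta-binomial model, which amounts to adding `prior_strength` pseudo-tweets at the overall rate. This technique is swapped in for illustration and is not the method used for the map; `prior_strength` and all counts are hypothetical.

```python
def smoothed_rates(hateful_counts, total_counts, prior_strength=1000.0):
    """Shrink each county's hateful/total rate toward the overall rate.

    Equivalent to the posterior mean under a Beta prior whose mean is the
    overall rate and whose weight is `prior_strength` (a hypothetical value).
    """
    overall = sum(hateful_counts.values()) / sum(total_counts.values())
    return {
        county: (hateful_counts.get(county, 0) + prior_strength * overall)
        / (total + prior_strength)
        for county, total in total_counts.items()
    }


# Hypothetical counts: a tiny county where 3 of 50 tweets are hateful.
hateful = {"tiny_county": 3, "big_county": 200}
total = {"tiny_county": 50, "big_county": 100_000}

raw = {county: hateful[county] / total[county] for county in total}
smooth = smoothed_rates(hateful, total)
print(raw["tiny_county"], round(smooth["tiny_county"], 4))  # 0.06 0.0048
```

The tiny county's rate drops from 0.06 to about 0.0048, while the big county's rate barely moves; a larger `prior_strength` smooths more aggressively.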

Unknown, 11 May 2013 at 14:46
I agree with Raph in that it seems there are a few problems with this analysis. You are assuming an unbiased distribution of people who geotag their tweets. Perhaps people who geotag are more likely to express racist sentiment? Normalizing hateful tweets per twitter subscribers should control for some effects of population density, but not all. Can you perform a geographically weighted regression, with pop density as an influencing variable?

P.B., 11 May 2013 at 16:08
I am a bit upset that there is no religious "hate" and where is the "black racists" this is really starting to smell a bit bad, the anti christian, anti conservative, pro-constitution, pro-gun, anti-NRA does not seem to be counted, why?

Roger, 11 May 2013 at 18:06
HEAT MAPS are probably the wrong visualization choice for this data. They are misleading, since hate generally appears to be driven by urban vs. rural living.

A better mapping might be color filled counties, with no overlapping bleed, similar to voting district maps. Other possibilities include target circles.

The current map makes it look like the entire Midwest/East Coast is hate-driven, yet when you zoom in ANYWHERE, you see that the hate-driving, low-population counties have bled over into major, more accepting cities like Chicago/Nashville/etc.

Hate is often fueled by fear and lack of understanding of other cultures. In smaller communities, you are less likely to have friends of each culture so you are less likely to feel for them. In those small communities, the minority hides or moves away because of a lack of community support, increasing the problem. With lack of support, it only takes 1 person to spread hate and seed it in the community.

An appropriate graphic illustration should highlight the local areas so that it becomes apparent what drives the hate.

Roger, 11 May 2013 at 19:04
I would also be curious if the map looks different if you "grouped" all tweets to individuals. In other words, instead of normalizing around the number of tweets, determine if individuals send any hate tweets, and then normalize around the individual tweeters.

You used an example that 31 individuals sent 41 hate tweets. This could lead to isolated hot spots from angry individuals.

Often you see an individual angry Facebook user who can't seem to send a single response without hate. There may be 100 benign comments from other users, but 30 hate comments were all from the same user, who was particularly active that day.

This approach would isolate the "percent of the twitter population" that sends hate, vs the "percent of hate" among the tweets.

Both would be meaningful in a media setting.
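Roger's two measures can be compared directly. The sketch below, assuming a hypothetical list of (user_id, is_hateful) pairs for one area, returns both the share of tweets that are hateful and the share of distinct users who sent at least one hateful tweet. The sample mimics his example of one prolific user; the split of the benign messages across 20 users is invented for illustration.

```python
def tweet_share_and_user_share(tweets):
    """Return (hateful tweets / all tweets,
               users with >= 1 hateful tweet / all users) for one area.

    tweets: list of (user_id, is_hateful) pairs (hypothetical format).
    """
    if not tweets:
        return 0.0, 0.0
    hateful_tweets = sum(1 for _, is_hateful in tweets if is_hateful)
    all_users = {user for user, _ in tweets}
    hateful_users = {user for user, is_hateful in tweets if is_hateful}
    return hateful_tweets / len(tweets), len(hateful_users) / len(all_users)


# 100 benign messages from 20 users (5 each) plus 30 hateful ones from a single user.
sample = [(f"user{i}", False) for i in range(20) for _ in range(5)]
sample += [("angry_user", True)] * 30

tweet_share, user_share = tweet_share_and_user_share(sample)
print(round(tweet_share, 3), round(user_share, 3))  # 0.231 0.048
```

By tweet volume the area looks heavily hateful, while only one of its 21 users sent any hateful message.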

Unknown, 13 May 2013 at 02:10
Hatred of women doesn't count? This is incredible. You build a complex application to map hate in the United States, and you leave out the single largest category: misogyny. Frankly the lacuna just reinforces the generalized misogyny of our culture, that hating women is so routine you don't even think to track it. For those of us who are women and the constant lifelong objects of that hatred, however, it's stunning.

Unknown, 13 May 2013 at 07:55
Did you find any hateful tweets toward people of Middle Eastern descent? Or would it have been too difficult to distinguish racial hatred from religious/ideological hatred?

rafael calsaverini, 13 May 2013 at 08:23
Is this raw data available in some way? Just the raw text of the tweets, with their geo-data and offensive/non-offensive tagging?

If there's some way to anonymize the author, this information would be nice to have too.

Jenny Reiswig, 13 May 2013 at 10:35
Would really like to see this without the normalizing based on volume of tweets. What would it look like normalized for population? I bet California wouldn't look like this hate-free paradise.
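To see why the choice of denominator matters, the sketch below computes both the per-tweet rate used for the map and the per-capita rate asked about here. The two counties and all their figures are hypothetical; the point is only that the two rankings can disagree.

```python
def per_tweet_and_per_capita(hateful, total_tweets, population):
    """Return (hateful tweets per tweet, hateful tweets per 100,000 residents)."""
    return hateful / total_tweets, hateful * 100_000 / population


# Hypothetical counties: (hateful tweets, all tweets, population)
counties = {
    "tweet_heavy_county": (600, 2_000_000, 1_000_000),
    "tweet_light_county": (20, 20_000, 100_000),
}

for name, figures in counties.items():
    per_tweet, per_capita = per_tweet_and_per_capita(*figures)
    print(name, per_tweet, per_capita)
# tweet_heavy_county 0.0003 60.0
# tweet_light_county 0.001 20.0
```

The tweet-light county ranks higher per tweet, while the tweet-heavy county ranks higher per resident, the kind of reversal suspected here for California.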

W.P. McNeill, 13 May 2013 at 10:48
I am going to venture an answer to the question of why the authors of the study didn't track Tweets hostile to women/white people/Christians/conservatives/etc.

They can't do everything. It is impossible to draw a map of all the forms of group hostility in America that get expressed on Twitter, so they have chosen to focus on a few that everybody knows about and generally agrees are bad. That doesn't mean that other groups don't get attacked on Twitter, perhaps unfairly, perhaps in ways that would also make for an interesting map. This study is not the final word on hate. The omission of your preferred group is not itself an attack.


transfigure, 13 May 2013 at 12:34
Agree w/ W.P. McNeill. I'd suggest that the reason for the relatively small selection of hate-speech types and words comes down to resources. Consider that the creators of the study used human volunteers to pore through 150,000 tweets manually. If they had unlimited resources, they could increase the scope of the study dramatically. As it is, they never claimed that it is comprehensive.

jessicap!, 13 May 2013 at 13:47
Just saw this floating around the Tumblr-sphere, and have to say I'm so happy to see Humboldt in the URL, my own alma mater! Thank you for doing this kind of great work (and for putting up with the nutters in the comments, too).

Mike Craycraft, 13 May 2013 at 14:03
This is very interesting and thanks for the great work. It would be interesting if you could do some validation, like comparing cancer tweets with SEER data on cancer incidence to see if the county-level data matches. Or, perhaps more importantly, if you could identify anti-American sentiments and map them to identify areas that may contain concentrations of anti-American groups.

Unknown, 13 May 2013 at 17:05
Are many of the same people issuing homophobic and racist tweets? Looks like that might be occurring.
bob mcconnaughey

Unknown, 13 May 2013 at 18:23
This is very interesting, but not very surprising. It validates what I have always thought in regard to racism in the United States.
I am surprised, however, that a criterion for MUSLIM or ARAB was not used in this study. I suspect that the HATE for this group is at an all-time high.

Josh, 13 May 2013 at 22:01
can you tell us more about your methodology, both for discerning which tweets are really negative and for normalization? would you mind sharing your county-level data--not the tweets themselves but the counts for each slur geocoded by county?

Anonymous, 13 May 2013 at 22:45
The "hover" feature (that should show the underlying data at each particular county) does not work for me in either IE or Firefox. Is it broken? Do I need to use a different browser?

ret3, 14 May 2013 at 10:03
I'll point out that the regrettable popularity of that particular slur as unique to Texas is probably a function of the riparian border Texas shares with Mexico (and hence the assumption that most undocumented immigrants in Texas must have waded across), while the other three border states have only surveyed lines in the desert to mark the international boundary.

Magister Calvert, 14 May 2013 at 10:57
The methodology behind this map was far from rigorous. First problem is they break the data down by counties however represent each county as a data point, not using the actual boundaries of the county. This causes wildly inaccurate results when you zoom in or out. Second problem is that the map doesn't account for population density or total internet presence, it only counts total number of incidences. 1000 hateful tweets from Harris County, Texas (pop over 4 million) would be represented identically to 1000 hateful tweets from Loving County, Texas (pop less than 100). That is wholly unrepresentative of reality.

Fix the design problems with the map and you will certainly get results that are more accurate to what is actually being measured.

TheFountainHead, 14 May 2013 at 12:11
I have two comments about this.
The first is that there are MANY fake accounts set up by sock puppets and Google Monkies that could easily skew things to create public sentiment.

Second, it's easy to tell who hates Arabs, Asians, immigrants, etc. Look at how many friends one person has with different races and nationalities. For example, one white person can have friends who are black, Indian, Jewish, Islamic, Ethiopian, Greek, Asian, Persian, the whole 9 yards and all that jazz, right?

The other white person has NOBODY else on their account except for the same white people who have each other on their other accounts. With connections from high school, work, church, etc.

Sometimes there might be 10 minorities in a school of 500. If that minority is popular, everyone would have them on their account.

I've seen comments and they don't fit a "criteria" so negative assessments do go ignored.

Nobody needed to do a study for me, I was able to figure it out.

D, 14 May 2013 at 12:33
It would be interesting to see statistics of # tweets / state population. The East Coast is much more densely populated and may be skewing the picture (away from the central states or the West Coast).

Tom, 14 May 2013 at 12:53
is it a percent of negative tweets/total tweets or just # of tweets for region. what was the sampling distribution? thanks! interesting discussion topic

Throbert, 16 May 2013 at 18:31
Why was this study published under the title "The Geography of Hate", rather than something more neutral like "The Geography of Various Taboo Pejoratives" or simply "The Geography of Rudeness"?

I don't see any point in "dumbing down" the useful and important word hate by sloppily attaching it to ALL negative or prejudicial use of language. Speaking as a 41-year-old gay man, I have a suspicion that when a sensational label like HATE SPEECH!! is widely applied to a word as pedestrian as "faggot", it tends to inculcate a Princess-and-the-Pea mentality in young LGBT people, and undermines the lesson in Eleanor Roosevelt's adage "No one can make you feel inferior without your consent."

I can also attest that, for example, gay Republicans and gay Democrats will sometimes call each other "faggot" in an unmistakably hostile way -- and that in some cases, they really do despise each other's politics, and may even hate each other as individuals. But it's not clear to me that their use of the slur "faggot" is necessarily an expression of anti-homosexual hatred.

P.S. By the way, I understand that you were using the colors traditionally associated with "heat" and "coolness." Even so, it might have been better to do the mapping with varying shades of orange, green, and purple -- since in recent years, Americans have been accustomed to think that "blue = Democrats, red = Republicans."

Anonymous, 16 May 2013 at 21:15
I don't really understand why people are attacking each other in comments. Hate is hate no matter who it is directed at. Hatred only breeds more hatred. So yeah, some groups are excluded from this study because of limited scope and resources. Stop complaining that the map doesn't show how hated your favorite group is and focus on something that isn't hate.

Unknown, 17 May 2013 at 06:03
I'm sorry, but... http://xkcd.com/1138/

Emma, 17 May 2013 at 11:07
I read through all of the sections you suggested for those with questions about your methodology and I'm still left with one question that I don't think was covered substantively. I'm a Vermonter, and while I'm more than willing to acknowledge homophobia in my home state, and indeed, have experienced it firsthand, I was surprised to see a high concentration of homophobia around Burlington, the largest city in the state. It seems a much more likely explanation for the high concentration of "homophobic" tweets in Burlington, is the reclamation of the word "queer," as Burlington has a number of self-proclaimed queer activist organizations doing a lot of progressive work. How did your methodology account for positive usage of the word "queer?"

Throbert, 17 May 2013 at 16:21
@Smeezer:

First, I'm pretty sure, Throbert, that if I called you a "fucking nigger" because of your clearly ignorant response to this topic that any person of reasonably average intelligence would both discern the "hostility" and "deep or emotional dislike"

True enough, but if you instead called me a "stupid motherfucking shithead", that would ALSO express hostility and dislike, yet would not have been flagged on the map, simply because it didn't include a particular keyword!

Conversely, if someone tweeted "Dan Savage is a disgusting faggot," I assume it would've been counted as "negative" on the map. But could we actually be sure that the tweeter's motivation was in fact hatred for all gay people as a category? Maybe. But it's also possible that the tweeter has a specific animus towards Savage as an individual, not towards LGBT people generally, and chose the word "faggot" primarily because it nowadays has greater shock value than "motherfucker" does.

In short, the title "Geography of Hate" ascribes a particular type of motivation to the use of these taboo words, yet the students who were "scoring" the tweets were -- at best -- relying on guesswork about what the motivation was.

Incidentally, I agree that physical violence against gay people is a serious problem and clearly an expression of hate, but I'm not convinced that a proliferation of vulgar anti-gay slurs in Twitter is predictive of such violence.

hlots11, 18 May 2013 at 02:17
The overall map of hate looks like a wireless coverage map. Since people tend to tweet from their smartphones and their tablets, this isn't really a surprise. I think what was proven here is that most people will show how hateful they really are when provided a way to instantly broadcast their stream of consciousness without the filters that "polite society" demands they use.

Ernesto Molinas, 19 May 2013 at 04:31
I think there are two main problems with the maps: first, the choice to use county centroids instead of county areas; second, spot sizes are proportional to intensity.
This biases the maps by county size and county density.

Western counties usually have larger areas than eastern ones, so the number of counties in the west is smaller. If spot sizes were all equal, or if data were aggregated by county area, the bias would be greatly reduced.

In your results different observational scales give rise to different interpretations, and for me that is a serious problem.

Unknown, 30 May 2013 at 12:10
The map with "Dyke", I would like to add if you put in common misspellings like Dike, there is a town close to the hottest part of the map in Iowa called Dike, so that one might (I stress MIGHT) be a little hotter for a reason.

Brian Meadows, 31 May 2013 at 22:04
So far no one's asked about indicators of Jew hating. Surely that ancient scourge still merits attention, no? What about tweets containing the words, 'k**e', 's***ny','Jewboy', 'yid' or 'hebe'? Those last two are sometimes used among Jews as some blacks use the term 'n****r' so that would need to be discounted. In any case, Jew hating can't have sunk to a level THAT insignificant, even here.

Brian Meadows, 31 May 2013 at 22:09
Or, for that matter, use of 'Zionist' as a pejorative. I'll second Dr. King's take on 'anti-Zionism'--with which I believe you are familiar.

music parent, 1 June 2013 at 14:53
There is a reasonable explanation for the use of the term "wetback" in Texas. "Wetback" is a reference to wading, fording, or swimming across the river as opposed to more formal ways of entering another country. The entire border between Texas and Mexico is the Rio Grande River. As a former Texan, referencing the Rio Grande is almost synonymous with referencing the international boundary. In other parts of the Southwest (New Mexico, Arizona, California), it would be linguistically and culturally odd to use the term "wetback" because there is no major river forming the international boundary. Perhaps there isn't an equivalent kind of pejorative term in other geographic regions that don't mentally reference a river as an international boundary.