June 03, 2014

Mapping the Seven Dirty Words

As many of our regular readers know, each year at the Annual Meetings of the Association of American Geographers we hold a map hacking event that we call IronSheep. Modeled after the Iron Chef television show, we provide the 'secret sauce' of a dataset to the teams of contestants who must then concoct a 'tasty map' for the crowd to consume. When putting together the dataset for this year, we consciously embedded the potential for a few different concepts to be explored, but without telling the contestants about these possibilities.

One of these unrealized possibilities, which we bring you today, was a comparison of George Carlin's infamous "seven dirty words". For those of you who are unacquainted, the genesis of Carlin's bit was that saying these words in any context could get one in trouble with the law -- especially if uttered on a television or radio broadcast. But since we're talking about the internet here, pretty much anything goes, as can be seen in the sheer number of times these words are referenced in geotagged tweets around the United States. And while we could technically get away with saying these words on this medium, we like to run a family-friendly website and so we'll be using euphemisms for each. Our apologies if you're offended by these words, but this is, after all, science. And for those who absolutely have to see the terms that Carlin referred to as bad, dirty, filthy, foul, vile, vulgar, coarse, in poor taste and unseemly (among many other things), we have included them in the footnotes, with a few selectively redacted letters to lessen the shock [1].

Like the rest of this year's IronSheep dataset, this data is culled from our database of all geotagged tweets from July 2012 through March 2014. In order to stay as true as possible to Carlin's seven dirty words, we didn't include references to derivative words outside the original seven [2]. Even with this restriction we ended up with a total of 43,086,300 references to the seven dirty words, which shows how Twitter users are just a *un** of foul-mouthed, **x****, ***q**, *****ff** flaming ***d*!!! The list below shows the true magnitude of foul, unholy geotagged tweets (or FUGTs) generated in the United States, with an average of:
  • 2,051,728.6 FUGTs per month
  • 67,533.4 FUGTs per day
  • 2,813.9 FUGTs per hour
  • 46.9 FUGTs per minute
  • 0.78 FUGTs per second  
One of the seven dirty words gets tweeted out nearly every second? We truly are number one [3]! But in order to get a better sense of the spatial distribution of this collection of twisted bilge masquerading as discourse and social commentary, we aggregated this complete pile of **** to the county level and normalized it by the total number of tweets in each county. And yes, there were indeed some non-profane tweets so this normalization exercise actually means something.
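For readers who want to check the arithmetic, here is a minimal sketch of how the averages above and the county-level normalization could be computed. The per-word counts are the n= values reported in this post; the divisors of 21 months and 638 days are assumptions inferred from the published averages rather than figures we stated, and the county names and counts in the example are made up purely for illustration.

```python
# Per-word counts as reported in this post (the n= values under each heading).
WORD_COUNTS = [22_630_879, 645_100, 19_125_640, 263_959, 6_625, 159_786, 254_311]

# Assumed divisors: inferred from the published averages, not stated in the post.
MONTHS = 21
DAYS = 638


def average_rates(total, months=MONTHS, days=DAYS):
    """Return average tweet counts per month, day, hour, minute and second."""
    per_day = total / days
    return {
        "month": total / months,
        "day": per_day,
        "hour": per_day / 24,
        "minute": per_day / (24 * 60),
        "second": per_day / (24 * 60 * 60),
    }


def normalize_by_county(profane, all_tweets):
    """Return, per county, the share of all tweets that contain a dirty word.

    Counties with no tweets at all are skipped to avoid dividing by zero.
    """
    return {
        county: profane.get(county, 0) / total
        for county, total in all_tweets.items()
        if total > 0
    }


total = sum(WORD_COUNTS)
rates = average_rates(total)

# Hypothetical counties and counts, for illustration only.
example_profane = {"county_a": 50, "county_b": 5}
example_all = {"county_a": 1000, "county_b": 50, "county_c": 0}
shares = normalize_by_county(example_profane, example_all)
```

Dividing by each county's total tweet volume is what keeps the maps from simply showing where the most people tweet: in the made-up example, county_b has far fewer profane tweets than county_a but a higher share of them.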

Bodily Waste (solid); the Act of Evacuation; Pretense/Lies; Expressing Amazement, Incredulity or Annoyance; Something Inferior; Something Superior (the ____)
(vulgar, noun, verb, interjection, n=22,630,879)

The first word in Carlin's sequence, another word for excrement, is by far the most popular of the seven. It accounts for over half of the total number of references in our dataset, with more than 22 million tweets. This word also presents arguably the most interesting finding of our study, in that references to this word are overwhelmingly concentrated in the American South. While our previous research has shown the South to be unique in its interest in church, racial issues and referring to groups of two or more people as "y'all", it is apparently also unique in its unabashed love for excremental exclamations [4].

Bodily Waste (liquid); the Act of Evacuation; Drunk (____ed); Angry (____ed); Request to leave (____ off)
(vulgar, noun, verb, interjection, n=645,100)

...perhaps we should qualify that last statement, based on our map of the second of Carlin's dirty words, as the geography of liquid excrement seems to be somewhat reversed from our previous map. While much of the South falls back into the lower values, one can also observe a greater concentration of references in central Appalachia and throughout the Rust Belt to its north. Even much of the west coast seems averse to the word, seemingly showing that it is largely the Midwest that is awash in this term.

To Engage in Carnal Congress; To mistreat (____ over) or meddle (____ with); Expressing Disgust, Anger or Rejection (____ you or ____ off); To ruin (____ up); To be concerned, usually negated (give a ____)
(vulgar, verb, noun, interjection, n=19,125,640)

Hopefully it is little more than a coincidence that another word used to refer to carnal congress -- itself the second most popular of the seven dirty words, with nearly half of the total number of hits in our dataset -- in many ways mimics the geography of the most popular of the seven mentioned above, albeit with a less pronounced concentration in the American South. Instead, this word seems to have solid clusters in the northeast and west coast, though the counties with the highest relative values seem more scattered throughout the mountain west and Great Plains while much of the rest of the country doesn't appear to give a ____.

Lady Bits; Pejorative Characterization of Individual (generally women)
(vulgar, noun, n=263,959)

Arguably the most derogatory word of the seven given that it's commonly used as a tool of misogyny, this term has no real significant clustering anywhere in the continental US. It is interesting, however, that as much as we've found Southerners to love certain four-letter words (see Map #1), there is a distinctly below average frequency of references to this decidedly uncouth term.

A purveyor of oral invigoration towards a male recipient; An offensive individual
(vulgar, noun, n=6,625)

We're almost heartened by the fact that another word used to refer to a purveyor of oral invigoration towards a male recipient has by far the fewest mentions of any of the seven dirty words, perhaps due to a declining societal acceptance of homophobia, which is arguably the most common use of this particular term. References here are scattered at best, though most seem to be in the Midwest and Great Plains, with some lesser concentrations in the northeast, with rural areas tending towards higher relative frequency of use.

An individual engaged in carnal congress with another who has the status, function or authority associated with female parenting derived via biological reproduction, adoption or legal guardianship; a despicable person 
(vulgar, noun, adjective, n=159,786)

This twelve-letter word, which might be used literally to refer to someone who has engaged in carnal congress with another who has the status, function or authority associated with female parenting derived via biological reproduction, adoption or legal guardianship, is definitely the most universal of the seven dirty words, with near uniform usage across the United States. While parts of the northeast and rural Great Plains have higher concentrations, this is pretty much the word you can be sure to hear no matter where you are in the good ole US of A.

Paired glands secreting matter (which is neither gaseous nor solid) for nourishment for progeniture; Given in retaliation (singular form, ___ for tat)
(vulgar, noun (plural), n=254,311)

The last of the seven dirty words, another word referring to paired glands secreting matter (which is neither gaseous nor solid) for nourishment for progeniture, has relatively few references within the South, while a handful of counties in the Great Plains states seem to have a fairly significant number of mentions relative to their overall tweeting.

In conclusion, it is evident that while Carlin saw these words as being united around their prohibition, they remain divided in both their general levels of use and acceptability, as well as in their spatial distribution. While the first and third dirty words in the sequence are much more prevalent than, say, the fifth, their spatial distributions are remarkably different, as we have shown with this series of maps. So even if we've all got stuff to be *i*s*ed off about, we all express it in our special ways. Now **c* off.

[1]  *h**, **s*, ****, **n*, ************, **********e*, and **t*. 
[2] Imagine adding -ing or -ty, among other things, to the end of some of these words. 
[3] Yay?
[4] Band name.

  1. As a southern Appalachian, I'm not surprised to see "Bodily Waste (solid)" is most frequently tweeted from the southern states. We have mastered the use of that word so that it can be used in any sentence as a noun, verb, adjective, adverb, and numeral.


