July 22, 2014

How Many Hobbits Could Chuck Norris Take In a Fight?

Inspired by the (relatively) recent Buzzfeed quiz, "How Many Five Year Old Children Can You Take In a Fight?" [1], we have been wondering about other potential battle royale matchups: Juggalos vs. Bronies, Juggalos vs. polar bears, Justin Bieber vs. Miley Cyrus and even goats vs. llamas.

Perhaps our favorite attempt at recreating this kind of scenario is asking: how many hobbits could Chuck Norris take in a fight? The analysis was quite complex, as we had to first set rules of engagement (e.g., what kind of weapons? is mithril armor allowed or not? etc.) and decide which version of Chuck Norris (Walker, Texas Ranger Chuck Norris? Actual current Chuck Norris? Perhaps Delta Force Chuck Norris?) and what kind of hobbits (after all, are we talking Brandybucks or Tooks? are these typical Shire hobbits or have they been abroad? etc.) we are talking about here.

As you might suspect, there was a lot to sort out. After much discussion and analysis we have come up with a clear answer, but sadly, as the actual question has nothing to do with this blog, we've been forced to bury it in the footnotes [2]. What we can do, however, for the purposes of this blog is compare the distribution of references to hobbits, as opposed to references to Chuck Norris, in geotagged tweets. Starting from a 10% sample of all global geotagged tweets from July 2012 through March 2014, we collected all references to "hobbit*" and "Chuck Norris" to enable our comparison.

Hobbits vs. Chuck Norris, July 2012-March 2014

At the global level, there are actually quite comparable numbers of references to hobbits and Chuck Norris, thus making the location and scale of our hypothetical battle all the more important. There are 27,527 references to the man on Superman's pajamas, and 24,145 references to those short little guys with hairy feet.

What is evident, however, is that Chuck Norris isn't particularly popular anywhere but in the United States, as nearly half of the global references to him come from the USA, giving him a nearly 9,000-tweet advantage over hobbits there. Perhaps not everyone else in the world finds quite as much humor in the many Chuck Norris Facts as Americans do? Or perhaps other countries have their own Chuck Norris-like cult heroes to look up to [3]? The next closest country in terms of Chuck Norris appreciation is France, with just 250 more Chuck Norris tweets than hobbit tweets, followed by South Africa, Nigeria and Puerto Rico in the top 5 countries favoring the man who predicted 1000 years of darkness were Barack Obama to be re-elected President of the United States.

Meanwhile, the top 5 countries favoring hobbits are Indonesia - where they hold a 2,141 tweet advantage - Turkey, Mexico, Spain and Malaysia, each of which has a greater than 500 tweet advantage for hobbits over Chuck Norris. A total of eleven countries have more than 100 more references to hobbits than Chuck Norris, a considerable feat given that only the top 3 Chuck Norris countries have a more than 100 tweet advantage.

In many ways, the pattern in this map is a replication of that from our recent map comparing references to Bieber and Miley; just as the only places with a real preference for Miley Cyrus were the USA and a smattering of African countries, so too are these the only places with a significant preference for Chuck Norris. Does this mean there is some sort of Chuck-Miley conspiracy afoot? Or that Bieber has taken command of an army of hobbits in his quest for world domination? We'll leave it to you to find out...

[1] See also: How many Justin Biebers could you take in a fight? How many 90 year olds could you take in a fight? How many hipsters could you take in a fight?
[2] The answer is zero.  Because hobbits are actually just fictional characters and Chuck Norris is a real living person. See? Sometimes there are clear and easy answers to tough questions.
[3] Ironically, of course, Kenya seems to display a slight preference for Chuck Norris over hobbits, despite Makmende's imposing presence.

July 08, 2014

A Quick Look at Global Language Patterns on Twitter

Today's post is derived from some testing we were doing within our data on language, and since the results were interesting, we thought we'd share. This is a first step of a longer process of comparing language use at the global scale, so much remains to be done.

Starting from a 10% sample of all global geotagged tweets from the calendar year 2013, we collected tweets that used a variety of non-Latin characters as a proxy for linguistic prevalence (see the map titles below for the list of characters searched). Using composite counts of what we found to be the five most commonly used characters in each of the given languages, we mapped normalized values at the country level in order to understand where these languages are most dominant. In other words, these maps represent the relative level of tweets containing non-Latin characters compared to all tweets; the US has plenty of tweets with Arabic, Chinese and Korean characters but these numbers are small compared to the overall number of tweets within the country.  
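For readers who want to try something similar, here is a minimal sketch of this kind of relative measure. It assumes tweets arrive as (country code, text) pairs and counts a tweet as a hit if it contains any of the five characters; both the input format and the "any of the five" rule are assumptions for illustration and may differ from how the composite counts behind the maps were built. The function names are hypothetical.

```python
from collections import Counter

# The five Arabic characters listed in the Arabic map title in this post.
ARABIC_CHARS = {"ل", "ن", "م", "ي", "ا"}

def contains_any(text, chars):
    """True if the tweet text contains at least one of the given characters."""
    return any(ch in chars for ch in text)

def relative_share(tweets, chars):
    """Map country -> share of that country's tweets containing any of chars.

    `tweets` is an iterable of (country_code, text) pairs (assumed format).
    """
    totals = Counter()  # all tweets per country
    hits = Counter()    # tweets per country with at least one target character
    for country, text in tweets:
        totals[country] += 1
        if contains_any(text, chars):
            hits[country] += 1
    return {country: hits[country] / totals[country] for country in totals}

# Tiny made-up sample, purely for illustration.
sample = [("SA", "من"), ("SA", "hello"), ("US", "hello"), ("US", "hi")]
shares = relative_share(sample, ARABIC_CHARS)
```

Dividing by each country's total is what keeps a high-volume country like the US from dominating the map simply because it tweets a lot.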

There are some issues with the data we collected -- for instance, we relied on non-definitive sources for our list of the most commonly used characters, and the constraints of the way we've structured our data (how we treat boolean queries and computing constraints) make our data somewhat incomplete. But still the initial results provide a reasonable snapshot of where Twitter is being used by people who don't speak languages which can be easily expressed in Latin characters.

Arabic Characters:   ل   ن   م   ي   ا      

The spatial pattern of Arabic-language tweeting is interesting in that it seems to mimic a conventional distance decay effect. Saudi Arabia is the undoubted center of Arabic tweeting, with its immediate neighbors having relatively lower amounts, with their immediate neighbors having even lower concentrations, with practically no discernible differences once you reach Sub-Saharan Africa to the south, India to the east, or Europe to the north and west.

Chinese Characters:   的   一   是   不   了

While Japan has the highest absolute number of tweets containing Chinese characters, due to the fact that the Japanese language relies on written Chinese characters, the relative measure shows China to, quite unsurprisingly, be the center of Chinese-language tweeting. The territory of Greenland shows up as well, mainly because of the relatively low number of total tweets making the few tweets with Chinese characters relatively more frequent. We could, of course, account for this by requiring certain thresholds but for this initial look, we left it in. Given the increasing dominance of China within the global economy, it's somewhat interesting to see that there is very little Chinese-language tweeting happening in other parts of the world.
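The threshold idea mentioned above could be sketched roughly as follows: drop any country whose total tweet count falls below some minimum before computing the relative measure. The function name, the cutoff value and the counts are all made up for illustration; we did not apply any such filter to the maps in this post.

```python
def shares_with_threshold(hits, totals, min_total):
    """Return country -> hits/total, skipping countries below min_total tweets."""
    return {
        country: hits.get(country, 0) / total
        for country, total in totals.items()
        if total >= min_total
    }

# Made-up counts for illustration only (not from our dataset).
hits = {"Small Place": 3, "Large Place": 900}
totals = {"Small Place": 20, "Large Place": 100_000}

unfiltered = shares_with_threshold(hits, totals, min_total=1)
filtered = shares_with_threshold(hits, totals, min_total=1_000)
```

Without the cutoff, the small place scores far higher than the large one on the strength of three tweets; with it, the small place simply drops off the map.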

Korean Characters:   뭐   그   안   근데   거

The final language we explored was Korean, and while it is not surprising that South Korea has by far the most Korean tweeting, it is interesting to note that North Korea, despite its almost complete disconnection from the global system, also appears on the map. Again, it seems that the scattering of relatively high scores for places such as Greenland and Somalia has more to do with the relatively low level of overall tweeting in these places than with some previously unknown concentration of Korean-speakers.

While there's not much definitive here, we believe this to be a useful, if incredibly brief, look at how online spaces such as Twitter remain connected to conventional, offline geographies, such as those of language and culture. And given the recent emergence of domain names in non-Latin characters, these maps might offer clues into the evolving geography of domain names, while also offering some potential for future research using such data.

July 01, 2014

The Drama of Llamas vs. the Gloating of the Goats

It should be no surprise to anyone that we're interested in sheep. But today we want to continue to mine the possibilities of our IronSheep 2014 dataset to bring you an alternative geography of animals as they are discussed and represented in social media [1]. Focusing on the global level and using a 10% sample of all geotagged tweets created between July 2012 and March 2014, we set out to understand the global distribution of goats as opposed to llamas.

Because, you know, it's important. Or perhaps because we're a bit bored.

While goats and llamas don't carry the same inherent antagonism as, say, bronies and juggalos [2], we thought it might be interesting to see how the two compare across the world since they are both major competitors to our favored sheep in the world of livestock [3]. At the most general level, llamas are absolutely dominant, with nearly triple the number of tweets: 63,606 references to llamas against 24,322 references to goats. Of course, one does wonder what all this llama/goat discourse is about. Are people extolling the virtue of their animal, or mentioning a chance sighting, or perhaps talking about what's on for dinner? Or perhaps someone has finally invented a hoof-accessible mobile device and the animals are taking to the net?

In any case, these raw numbers certainly don't tell the whole story, although arguably llamas are much cooler and more interesting than goats, so as to warrant significantly greater tweeting about them.

Global References to Goats and Llamas, July 2012-March 2014

Indeed, by mapping the concentrations of each term relative to the other, we can see that while llamas are dominant overall, their spatial distribution is much more concentrated, while goats, though in smaller numbers, are much more widely dispersed throughout the globe. 

Llamas dominate livestock-related tweeting in Latin America. While that is perhaps unsurprising given the animal's offline presence throughout South America, Spain and Mexico actually have the highest numbers of both absolute and relative references to llamas, despite neither being a native habitat for the animal. Further, only two countries in the top 20 for relative references to llamas are not predominantly Spanish-speaking: Brazil has 1,189 more references to llamas based on our 10% sample, good for 8th most, while France has 82 more references to llamas, making it the 20th-most llama-est country in the world. Also interesting is the fact that the only three countries in Latin America and the Caribbean which do not favor llamas over goats are not Spanish-speaking: Guyana, Suriname and Haiti.

Meanwhile, the United States and United Kingdom are the only countries worldwide to display significant preference for goats over llamas, with over 10,000 and 3,000 more references respectively, while Nigeria, Canada and Australia all show some moderate preference for goats. The fact that the US also has the fifth-most absolute number of references to llamas just goes to show how much people in the US love their goats. I mean, who doesn't love goats, especially when they sound like humans? Plus, they can eat all of your leftover beer cans!

While much of Africa's preference for goats is also largely unsurprising given that it has some of the highest levels of global goat production next to China and India (which are likely lower on the goat rankings due to linguistic differences), we are somewhat baffled as to why most of Europe has a preference for llamas. But then again, after watching the goat screaming video for a while it all seems to make sense.

[1] But definitely not an animal geography.
[2] A quick Google search for "goats and llamas" will likely return a number of results for how farmers can use llamas to protect their goat herds. Should these results not show up for you, blame Google and their never-ending drive to collect massive amounts of personal data about you in order to create a personalized experience of the internet for you that never exposes you to such oddities or anything else you might find unseemly.
[3] The less said about cows the better.

June 24, 2014

To Bieb or Not to Bieb? The Geographies of Bieber and Miley Fandom

In our continuing effort to use the massive amount of social data available to us in order to uncover unforeseen, unusual and sometimes uninteresting facts about the world around us, we turn today to a question that has long troubled our world (or at least the part of it consisting of fourteen-year-old girls): Bieber or Miley?

While the once (sort of?) innocent teen pop stars have long since grown up, getting any number of ridiculous and ill-advised tattoos, twerking across your television screen and maybe even romancing one another, Justin Bieber and Miley Cyrus remain inextricably tied in the imaginations of those of us who mostly don't really know what's going on with the kids these days [1]. But by firing up DOLLY and looking at the global distribution of tweets referencing one or the other of these music icons, we can see that the two couldn't be more different in their geographic reach.

Our comparison is based on a 10% random sample of all global geotagged tweets between July 2012 and March 2014, which yielded a total of 165,406 tweets referencing "Bieber" and 99,146 tweets mentioning "Miley".

The first thing that's evident from this map is that Justin Bieber is truly "All Around the World", garnering more references to his name than Miley Cyrus' in most of the world's countries. And while Bieber's dominance starts in his native Canada and extends south throughout the Americas from there, Miley Cyrus comes in like a "Wrecking Ball" to have a real "Party in the USA", where she has a nearly 10,000 tweet advantage over the Bieber. Unfortunately for Miley, however, the US is really the only place where she is more popular than Bieber. Indeed, she only has any advantage whatsoever in 45 countries around the world, with most of these clustered in Africa and the Caribbean. Then again, maybe she's just getting "The Best of Both Worlds"?

And while Bieber's advantage extends through Europe and much of Asia, his dominance is actually most deeply rooted in Latin America. The country with the biggest difference favoring Bieber tweets is Brazil, with over 22,000 more Bieber tweets than Miley tweets, even in our limited dataset. This is likely due to Bieber's well-documented risqué escapades in the country. In addition to his absolute dominance in Brazil, Bieber has an advantage of over 1,000 tweets in 18 other countries around the world, from Indonesia, Mexico, Turkey and Argentina at the top of the list, to Sweden, Denmark and Paraguay at the bottom.

Forty countries have no geotagged tweets referencing Bieber or Miley, though many of these are small island nations with very little tweeting activity to begin with. We suspect that there is probably a development grant that these places could apply for to help make them Beliebers.

The most interesting thing is that no country with any significant amount of tweeting about these pop stars displays parity between the two. This leads us to posit that there has been a significant Balkanization of the Biebersphere [2], with no reconciliation between the two opposing poles of over-sexualized, tabloid headline-gracing teen pop stars who are now more known for their distasteful appropriations of other cultural traditions than for actually making music anyone wants to hear. Then again, if you want to get dialectical about it, there's really nothing oppositional about them. Hell, they even twerk together! And by making this map, we've now probably set society back at least a good couple weeks in our arduous process of learning to ignore them. Our apologies. Sometimes, "We Can't Stop" ourselves.

OK, seriously, we're done now [3].

[1] Seriously, turn that music down! And get off of our (virtual) lawn!
[2] If you're wondering why we suddenly decided to invent the term 'Biebersphere' to refer to Twitter, look no further than the fact that Justin Bieber remains arguably the largest single topic of conversation on Twitter. It's frankly sort of amazing how many people tweet about him on a regular basis. And yes, this does utterly depress us about the state of humanity.
[3] Although, "Never Say Never".

June 10, 2014

Crowdsourcing Cake or Death?

Following up on our recent trend of finding inspiration for our maps in various oppositions that we've encountered in our day-to-day lives, we turn today to the seemingly obvious question posed by Eddie Izzard: cake or death? 

While this should be a no-brainer for us, we thought we'd crowdsource the answer to this question, turning to the collective wisdom of the geographically-referenced tweet machine. We draw on a dataset of all geotagged tweets mentioning "cake" or "death" between July 2012 and March 2014 [1]. Given that cake is so much more pleasurable than death, we expected Twitter references to show a similar preference. But the results might surprise you. 

Humans apparently share a similar fondness for talking about cake and death. Extrapolating from our 10% random sample of global tweets, there are approximately 1,302,310 mentions of cake during this time, as opposed to 1,314,880 mentions of death.

Global Geotagged Twitter References to Cake or Death, July 2012-March 2014

The death-loving nations of the United States, Nigeria, Canada, South Africa, and India clearly stand out on the map. Cake, on the other hand, is a much more frequent topic of conversation in the UK and a handful of Southeast Asian countries including Indonesia, Malaysia, the Philippines, and Thailand.

Among countries with a significant number of references to both cake and death, the Mediterranean countries of Lebanon and Greece, along with the Caribbean nations of Trinidad and Tobago and Barbados are the only ones that could be said to have found a nice balance between cake and death.

The real question here is, why do some countries prefer death over cake? It is understandable that Canadians are locked in a deep cake-less existential crisis (we would be too if we lived there), while South Africa has one of the world's highest murder rates. But why is the US so infatuated with death?

Geotagged Twitter References to Cake or Death in the USA, July 2012-March 2014

If we zoom into the world's most death-loving country, death is, well, pretty much everywhere around you. Death to everyone, indeed. In absolute terms, there are a total of 162,205 mentions of death in the US, as opposed to 845,923 mentions of cake, but the geographic distribution of these references is all the more stark and, dare we say it, troubling. If you happen to live in or, god forbid, be passing through the post-industrial towns of Michigan, Ohio or Pennsylvania, or the BosWash megalopolis, death is really everywhere around you. From the frozen tundra of the north to the sunny retirement hotspots of southern California, Arizona and Florida, you can't really escape it.

That is, unless you live in one of a handful of cities or towns smattered throughout the south and Great Plains. If, by choice or extreme luck, you happen to live in Atlanta or in one of several Texas cities -- from Dallas to Waco, down to Houston and all the way to Brownsville in the southern portion of the state -- you may be able to revel in the joy of boundless cake. Given the widespread dominance of death in other places, it is only natural to assume that cake will essentially become so abundant as to be given away for free at all restaurants and grocery stores. May we all be so lucky! [2]

[1] Yes, this is another missed opportunity from IronSheep 2014!
[2] This, of course, doesn't account for the fact that too much cake consumption will likely lead to obesity and then, yes, death.

June 03, 2014

Mapping the Seven Dirty Words

As many of our regular readers know, each year at the Annual Meetings of the Association of American Geographers we hold a map hacking event that we call IronSheep. Modeled after the Iron Chef television show, we provide the 'secret sauce' of a dataset to the teams of contestants who must then concoct a 'tasty map' for the crowd to consume. When putting together the dataset for this year, we consciously embedded the potential for a few different concepts to be explored, but without telling the contestants about these possibilities.

One of these unrealized possibilities, which we bring you today, was a comparison of George Carlin's infamous "seven dirty words". For those of you who are unacquainted, the genesis of Carlin's bit was that saying these words in any context could get one in trouble with the law -- especially if uttered on a television or radio broadcast. But since we're talking about the internet here, pretty much anything goes, as can be seen in the sheer numbers of times these words are referenced in geotagged tweets around the United States. And while we could technically get away with saying these words on this medium, we like to run a family-friendly website and so we'll be using euphemisms for each. Our apologies if you're offended by these words, but this is, after all, science. And for those who absolutely have to see the terms that Carlin referred to as bad, dirty, filthy, foul, vile, vulgar, coarse, in poor taste and unseemly (among many other things), we have included them in the footnotes, with a few selectively redacted letters to lessen the shock [1].

Like the rest of this year's IronSheep dataset, this data is culled from our database of all geotagged tweets from July 2012 through March 2014. In order to stay as true as possible to Carlin's seven dirty words, we didn't include references to derivative words outside the original seven [2]. Even with this restriction we ended up with a total of 43,086,300 references to the seven dirty words, which shows how Twitter users are just a *un** of foul-mouthed, **x****, ***q**, *****ff** flaming ***d*!!! The list below shows the true magnitude of foul, unholy geotagged tweets (or FUGTs) generated in the United States, with an average of:
  • 2,051,728.6 FUGTs per month
  • 67,533.4 FUGTs per day
  • 2,813.9 FUGTs per hour
  • 46.9 FUGTs per minute
  • 0.78 FUGTs per second  
One of the seven dirty words gets tweeted out nearly every second? We truly are number one [3]! But in order to get a better sense of the spatial distribution of this collection of twisted bilge masquerading as discourse and social commentary, we aggregated this complete pile of **** to the county level and normalized it by the total number of tweets in each county. And yes, there were indeed some non-profane tweets so this normalization exercise actually means something.
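For the arithmetically inclined, the averages listed above can be reproduced with a few lines of Python. The total and the per-word counts come straight from this post; the 21 months are simply July 2012 through March 2014 counted inclusively, while the 638-day figure is an assumption, chosen because it is the day count that reproduces the per-day number quoted above.

```python
TOTAL_FUGTS = 43_086_300  # total references, from the post
MONTHS = 21               # July 2012 through March 2014, inclusive
DAYS = 638                # assumption: day count that matches the per-day figure

per_month = TOTAL_FUGTS / MONTHS
per_day = TOTAL_FUGTS / DAYS
per_hour = per_day / 24
per_minute = per_hour / 60
per_second = per_minute / 60

# Per-word counts (the n= values given with each map), in Carlin's order.
word_counts = [22_630_879, 645_100, 19_125_640, 263_959, 6_625, 159_786, 254_311]
counts_match_total = sum(word_counts) == TOTAL_FUGTS

print(f"{per_month:,.1f} per month")
print(f"{per_day:,.1f} per day")
print(f"{per_hour:,.1f} per hour")
print(f"{per_minute:.1f} per minute")
print(f"{per_second:.2f} per second")
```

The seven per-word counts sum exactly to the stated total, which is a handy sanity check on the numbers reported with each map.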

Bodily Waste (solid); the Act of Evacuation; Pretense/Lies; Expressing Amazement, Incredulity or Annoyance; Something Inferior; Something Superior (the ____)
(vulgar, noun, verb, interjection, n=22,630,879)

The first word in Carlin's sequence, another word for excrement, is by far the most popular of the seven. It accounts for over half of the total number of references in our dataset, with more than 22 million tweets. This word also presents arguably the most interesting finding of our study, in that references to this word are overwhelmingly concentrated in the American South. While our previous research has shown the South to be unique in its interest in church, racial issues and referring to groups of two or more people as "y'all", it is apparently also unique in its unabashed love for excremental exclamations [4].

Bodily Waste (liquid); the Act of Evacuation; Drunk (____ed); Angry (____ed); Request to leave (____ off)
(vulgar, noun, verb, interjection, n=645,100)

...perhaps we should qualify that last statement, based on our map of the second of Carlin's dirty words, as the geography of liquid excrement seems to be somewhat reversed from our previous map. While much of the South falls back into the lower values, one can also observe a greater concentration of references in central Appalachia and throughout the Rust Belt to its north. Even much of the west coast seems averse to the word, seemingly showing that it is largely the Midwest that is awash in this term.

To Engage in Carnal Congress; To mistreat (____ over) or meddle (____ with); Expressing Disgust, Anger or Rejection (____ you or ____ off); To ruin (____ up); To be concerned, usually negated (give a ____)
(vulgar, verb, noun, interjection, n=19,125,640)

Hopefully it is little more than a coincidence that another word used to refer to carnal congress -- itself the second most popular of the seven dirty words, with nearly half of the total number of hits in our dataset -- in many ways mimics the geography of the most popular of the seven mentioned above, albeit with a less pronounced concentration in the American South. Instead, this word seems to have solid clusters in the northeast and west coast, though the counties with the highest relative values seem more scattered throughout the mountain west and Great Plains while much of the rest of the country doesn't appear to give a ____.

Lady Bits; Pejorative Characterization of Individual (generally women)
(vulgar, noun, n=263,959)

Arguably the most derogatory word of the seven given that it's commonly used as a tool of misogyny, this term has no real significant clustering anywhere in the continental US. It is interesting, however, that as much as we've found Southerners to love certain four letter words (see Map #1) there is a distinctly below average frequency of references to this decidedly uncouth term.

A purveyor of oral invigoration towards a male recipient; An offensive individual
(vulgar, noun, n=6,625)

We're almost heartened by the fact that another word used to refer to a purveyor of oral invigoration towards a male recipient has by far the fewest mentions of any of the seven dirty words, perhaps due to a declining societal acceptance of homophobia, which is arguably the most common use of this particular term. References here are scattered at best, though most seem to be in the Midwest and Great Plains, with some lesser concentrations in the northeast, and with rural areas tending towards higher relative frequency of use.

An individual engaged in carnal congress with another who has the status, function or authority associated with female parenting derived via biological reproduction, adoption or legal guardianship; a despicable person 
(vulgar, noun, adjective, n=159,786)

This twelve letter word, which might be used literally to refer to someone who has engaged in carnal congress with another who has the status, function or authority associated with female parenting derived via biological reproduction, adoption or legal guardianship, is definitely the most universal of the seven dirty words, with near uniform usage across the United States. While parts of the northeast and rural Great Plains have higher concentrations, this is pretty much the word you can be sure to hear no matter where you are in the good ole US of A.

Paired glands secreting matter (which is neither gaseous nor solid) for nourishment for progeniture; Given in retaliation (singular form, ___ for tat)
(vulgar, noun (plural), n=254,311)

The last of the seven dirty words, another word referring to paired glands secreting matter (which is neither gaseous nor solid) for nourishment for progeniture, has relatively few references within the South, while a handful of counties in the Great Plains states seem to have a fairly significant number of mentions relative to their overall tweeting.

In conclusion, it is evident that while Carlin saw these words as being united around their prohibition, they remain divided in both their general levels of use and acceptability, as well as in their spatial distribution. While the first and third dirty words in the sequence are much more prevalent than, say, the fifth, their spatial distributions are remarkably different, as we have shown with this series of maps. So even if we've all got stuff to be *i*s*ed off about, we all express it in our special ways. Now **c* off.

[1]  *h**, **s*, ****, **n*, ************, **********e*, and **t*. 
[2] Imagine adding -ing or -ty, among other things, to the end of some of these words. 
[3] Yay?
[4] Band name.

May 27, 2014

Hey Y'all! Geographies of a Colloquialism

Here at Floatingsheep, we've spent the last several years trying to demonstrate the potentials, as well as the pitfalls, of using user-generated internet content for geographic research [1]. A key focus has been how the online world of social media at times reflects, and at other times distorts, our understandings of the offline, material world. 

With all the recent hoopla around the geographies of language, we wanted to return to this topic, using a relatively straightforward example: the geography of y'all. No, not the geography of each and every one of you, the geography of the word "y'all" (see definition below). 

Rather than conducting a survey to measure the term's usage, we decided (after careful thought and rigorous debate) to do something new and use geotagged tweets [2]. Searching all of the geotagged tweets in the United States from July 2012 through March 2014 for variations of "yall" (the most commonly used y'all, as well as ya'll and yall to capture typos or alternative spellings), we found a total of 1,870,687 tweets using this folksy second-person plural pronoun, more than enough to make some definitive conclusions (or at least some maps).

Using only the absolute number of tweets with references to y'all to begin, this is clearly a geographically-specific phenomenon. While some places are extremely saturated with references (we'll get to these in just a sec!), there are 250 counties in the United States with no y'all tweets whatsoever, and approximately 60% of the country's 3,143 counties had fewer than 100 y'all tweets in the twenty-one-month period from which our data originates.

Still using only these absolute numbers of tweets referencing y'all, Texas, Georgia, Florida, North Carolina and California make up the Top 5 states, while the cities of Dallas, Houston, Chicago, Philadelphia and Los Angeles make up the Top 5 metro areas. And while not exactly mimicking population distribution, there is something clearly suspect about believing that folks in Los Angeles, Chicago and Philadelphia say y'all more than good old fashioned Southerners do. So, to make the map below, we instead normalized the county-level data by the total number of tweets originating in those counties during the same time period.
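A rough sketch of that normalization step, for anyone following along at home: divide each county's y'all count by its total tweet count, then rank on the resulting share rather than on the raw count. The county names and numbers below are made up for illustration, and the function name is hypothetical; none of this is drawn from our actual dataset.

```python
def normalize(term_counts, total_counts):
    """Return county -> share of that county's tweets that mention the term."""
    shares = {}
    for county, total in total_counts.items():
        if total > 0:  # counties with no tweets at all cannot be normalized
            shares[county] = term_counts.get(county, 0) / total
    return shares

# Made-up counts, purely for illustration (not from our dataset).
yall_counts = {"Big City": 5000, "Small Town": 400, "Quiet County": 0}
all_counts = {"Big City": 1_000_000, "Small Town": 10_000, "Quiet County": 2_000}

shares = normalize(yall_counts, all_counts)
by_raw_count = sorted(yall_counts, key=yall_counts.get, reverse=True)
by_share = sorted(shares, key=shares.get, reverse=True)
```

In this toy example the big city leads on raw counts, but the small town comes out on top once its far smaller overall tweet volume is taken into account.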

Geotagged Tweets Referencing Y'all, July 2012 - March 2014

On the broadest level, all suspicions and previous research on the matter are confirmed using our normalized tweet dataset: y'all is much more likely to be uttered (or tweeted) in the South than in any other part of the United States... or even the world, for that matter, as there are approximately sixteen times more references to the term in the USA than in the rest of the world combined [3]! But even still, there are some interesting anomalies worth commenting on...

Using these normalized values, we can see a new hierarchy emerge at the state level, with Louisiana, Alabama and Georgia having the highest relative number of tweets, much more in line with what one would expect. At the county level, 97 of the top 100 normalized values are located within the south (by practically any definition). The only three counties outside of this region in the top 100 are Boundary County, Idaho, Dawson County, Montana and Goshen County, Wyoming, the first two of which surprisingly rank #2 and #3 overall in these normalized rankings, led only by Talbot County, Georgia, the epicenter of y'all-related tweeting [4]. But even the South isn't homogenous when it comes to the usage of y'all, as the central Appalachian region of eastern Kentucky, West Virginia and southwest Virginia remains relatively untouched by Twitter references to y'all, despite being more-or-less surrounded by them. Indeed, Kentucky (spiritual homeland of Floatingsheep) is relatively sparse in references to y'all, despite selling these extremely expensive sweatshirts that attempt to capitalize on the state's southern charm.

Apart from some of these slight anomalies, much of this should come as no surprise to anyone who has spent much time in -- or even knows somebody from -- the South. So we thought it might be interesting to compare our own map to a handful of similar maps that have been circulating around the internet recently.

Some Other Maps of Y'all

The first map shows a stark north/south divide between the places that say "you guys" and those that say "y'all" (and, well, Pennsylvania, the western portion of which is also known for its use of "yinz"). The second map, taken from the New York Times interactive dialect quiz developed by Joshua Katz, largely resembles our own map, but seems to place the epicenter of y'all much further west than our own, in southeast Louisiana, bleeding over somewhat into Mississippi.

So while there is some general agreement that Louisiana, Mississippi, Alabama, Georgia and Texas form the territorial heart of y'all, our work, along with the data from the Times' dialect survey, disputes the cut-and-dried story told by the first map. While that map shows significant portions, if not all, of Missouri, Oklahoma, Arkansas, Kentucky, West Virginia and Tennessee, among others, firmly in y'all country, the dividing line appears to be both quite a bit further south, and quite a bit more squiggly [5] in nature. While some conventionally Southern states have only relatively confined pockets of references to y'all in our dataset (as well as in the Times' data), it's equally important to recognize that there are pockets of y'all densely concentrated in some more far flung areas of the country as well.

But ultimately as long as you have a group of friends worth using a second-person plural pronoun -- contracted or otherwise -- in reference to, we imagine you're doing just fine. 

Y'all come back now, y'hear?!
[1] Wait, wait, wait... there are pitfalls to this?!?!?!?!
[2] This was also the most convenient data to use, since we had them lying around.
[3] The Bahamas and South Africa come in at #2 and #3 globally in references to y'all.
[4] We suspect that Talbot County, Georgia is the epicenter of exactly nothing else. Although we fully expect that someone from there will angrily correct us very shortly.
[5] That's a technical cartographic term.