July 22, 2014

How Many Hobbits Could Chuck Norris Take In a Fight?

Inspired by the (relatively) recent Buzzfeed quiz, "How Many Five Year Old Children Can You Take In a Fight?" [1], we have been wondering about other potential battle royale matchups: Juggalos vs. Bronies, Juggalos vs. polar bears, Justin Bieber vs. Miley Cyrus and even goats vs. llamas.

Perhaps our favorite attempt at recreating this kind of scenario is asking: how many hobbits could Chuck Norris take in a fight? The analysis was quite complex, as we first had to set the rules of engagement (e.g., what kind of weapons? is mithril armor allowed or not? etc.), decide which version of Chuck Norris (Walker, Texas Ranger Chuck Norris? Actual current Chuck Norris? Perhaps Delta Force Chuck Norris?) and settle on what kind of hobbits (after all, are we talking Brandybucks or Tooks? are these typical Shire hobbits or have they been abroad? etc.) we are talking about here.

As you might suspect, there was a lot to sort out. After much discussion and analysis we have come up with a clear answer, but sadly, as the actual question has nothing to do with this blog, we've been forced to bury it in the footnotes [2]. What we can do, however, for the purposes of this blog is compare the distribution of references to hobbits, as opposed to references to Chuck Norris, in geotagged tweets. Starting from a 10% sample of all global geotagged tweets from July 2012 through March 2014, we collected all references to "hobbit*" and "Chuck Norris" to enable our comparison.
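For the curious, the matching step might look something like the sketch below. This is not our actual pipeline: the regular expressions, the case-insensitive matching and the `classify` function are all assumptions made for illustration, with "hobbit*" read as a simple prefix wildcard.

```python
import re

# Assumed patterns: "hobbit*" treated as a prefix wildcard, matched case-insensitively.
HOBBIT = re.compile(r"\bhobbit\w*", re.IGNORECASE)
CHUCK = re.compile(r"\bchuck norris\b", re.IGNORECASE)


def classify(text):
    """Return the set of terms (hypothetical labels) that a tweet's text refers to."""
    found = set()
    if HOBBIT.search(text):
        found.add("hobbit")
    if CHUCK.search(text):
        found.add("chuck norris")
    return found
```

A tweet mentioning both terms would count toward both totals under this sketch, which is one of several reasonable choices.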

Hobbits vs. Chuck Norris, July 2012-March 2014

At the global level, there are actually quite comparable numbers of references to hobbits and Chuck Norris, thus making the location and scale of our hypothetical battle all the more important. There are 27,527 references to the man on Superman's pajamas, and 24,145 references to those short little guys with hairy feet.

What is evident, however, is that Chuck Norris isn't particularly popular anywhere but in the United States, as nearly half of the global references to him come from the USA, giving him a nearly 9,000 tweet advantage over hobbits there. Perhaps not everyone else in the world finds quite as much humor in the many Chuck Norris Facts as Americans do? Or perhaps other countries have their own Chuck Norris-like cult heroes to look up to [3]? The next closest country in terms of Chuck Norris appreciation is France, with just 250 more Chuck Norris tweets than hobbit tweets, followed by South Africa, Nigeria and Puerto Rico in the top 5 countries favoring the man who predicted 1,000 years of darkness were Barack Obama to be re-elected President of the United States.

Meanwhile, the top 5 countries favoring hobbits are Indonesia - where they hold a 2,141 tweet advantage - Turkey, Mexico, Spain and Malaysia, each of which has a greater than 500 tweet advantage for hobbits over Chuck Norris. A total of eleven countries have more than 100 more references to hobbits than to Chuck Norris, a considerable feat given that only the top 3 Chuck Norris countries have a more than 100 tweet advantage.
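The "tweet advantage" used above is just a per-country difference in counts. A minimal sketch of that arithmetic follows, assuming tweets have already been reduced to (country, term) pairs; the function names and the input format are hypothetical, not a description of our actual code.

```python
from collections import Counter


def advantage_by_country(tweets):
    """tweets: iterable of (country, term) pairs.

    Returns {country: Chuck Norris count minus hobbit count}, so positive
    values favor Chuck Norris and negative values favor hobbits.
    """
    diff = Counter()
    for country, term in tweets:
        if term == "chuck norris":
            diff[country] += 1
        elif term == "hobbit":
            diff[country] -= 1
    return dict(diff)


def top_countries(diff, n=5, favoring="chuck norris"):
    """Return up to n (country, advantage) pairs for the favored term."""
    sign = 1 if favoring == "chuck norris" else -1
    ranked = sorted(diff.items(), key=lambda kv: sign * kv[1], reverse=True)
    return [(c, sign * d) for c, d in ranked[:n] if sign * d > 0]
```

Calling `top_countries(diff, favoring="hobbit")` would give the hobbit-leaning ranking, with each advantage reported as a positive number.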

In many ways, the pattern in this map is a replication of that from our recent map comparing references to Bieber and Miley; just as the only places with a real preference for Miley Cyrus were the USA and a smattering of African countries, so too are these the only places with a significant preference for Chuck Norris. Does this mean there is some sort of Chuck-Miley conspiracy afoot? Or that Bieber has taken command of an army of hobbits in his quest for world domination? We'll leave it to you to find out...

[1] See also: How many Justin Biebers could you take in a fight? How many 90 year olds could you take in a fight? How many hipsters could you take in a fight?
[2] The answer is zero.  Because hobbits are actually just fictional characters and Chuck Norris is a real living person. See? Sometimes there are clear and easy answers to tough questions.
[3] Ironically, of course, Kenya seems to display a slight preference for Chuck Norris over hobbits, despite Makmende's imposing presence.

July 08, 2014

A Quick Look at Global Language Patterns on Twitter

Today's post is derived from some testing we were doing within our data on language, and since the results were interesting, we thought we'd share. This is a first step in a longer process of comparing language use at the global scale, so much remains to be done.

Starting from a 10% sample of all global geotagged tweets from the calendar year 2013, we collected tweets that used a variety of non-Latin characters as a proxy for linguistic prevalence (see the map titles below for the list of characters searched). Using composite counts of what we found to be the five most commonly used characters in each of the given languages, we mapped normalized values at the country level in order to understand where these languages are most dominant. In other words, these maps represent the relative level of tweets containing non-Latin characters compared to all tweets; the US has plenty of tweets with Arabic, Chinese and Korean characters but these numbers are small compared to the overall number of tweets within the country.  
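The normalization described above can be sketched as a simple share calculation. The version below assumes tweets arrive as (country, text) pairs and counts a tweet once if it contains any of the five characters; both of those choices, along with the function name, are our assumptions for this sketch and may not match how the composite counts were actually built.

```python
from collections import defaultdict

# The five characters listed for the Chinese map.
CHINESE_CHARS = set("的一是不了")


def share_by_country(tweets, chars):
    """tweets: iterable of (country, text) pairs.

    Returns {country: tweets containing any of chars / all tweets}.
    """
    total = defaultdict(int)
    matching = defaultdict(int)
    for country, text in tweets:
        total[country] += 1
        # Assumption: a tweet counts once, however many target characters it has.
        if any(ch in chars for ch in text):
            matching[country] += 1
    return {c: matching[c] / total[c] for c in total}
```

Dividing by each country's total is what keeps places like the US, with many such tweets but far more tweets overall, from dominating the map.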

There are some issues with the data we collected -- for instance, we relied on non-definitive sources for our list of the most commonly used characters, and the way we've structured our data (how we treat boolean queries, along with computing constraints) makes our data somewhat incomplete. But still, the initial results provide a reasonable snapshot of where Twitter is being used by people who don't speak languages which can be easily expressed in Latin characters.

Arabic Characters:   ل   ن   م   ي   ا      

The spatial pattern of Arabic-language tweeting is interesting in that it seems to mimic a conventional distance decay effect. Saudi Arabia is the undoubted center of Arabic tweeting. Its immediate neighbors have relatively lower amounts, and their immediate neighbors have even lower concentrations, with practically no discernible differences once you reach Sub-Saharan Africa to the south, India to the east, or Europe to the north and west.

Chinese Characters:   的   一   是   不   了

While Japan has the highest absolute number of tweets containing Chinese characters, due to the fact that the Japanese language relies on written Chinese characters, the relative measure shows China to, quite unsurprisingly, be the center of Chinese-language tweeting. The territory of Greenland shows up as well, mainly because its total number of tweets is so low that the few tweets with Chinese characters make up a relatively large share. We could, of course, account for this by requiring certain thresholds, but for this initial look, we left it in. Given the increasing dominance of China within the global economy, it's somewhat interesting to see that there is very little Chinese-language tweeting happening in other parts of the world.
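Such a threshold could be as simple as dropping countries with too few tweets overall before computing shares. The sketch below is illustrative only: the function name, the dictionary inputs and the default cutoff of 1,000 tweets are all hypothetical, and we have not settled on any particular value.

```python
def apply_threshold(matching, total, min_total=1000):
    """Return shares only for countries with at least min_total tweets overall.

    matching: {country: tweets containing the target characters}
    total:    {country: all tweets}
    min_total is an arbitrary, hypothetical cutoff.
    """
    return {
        country: matching.get(country, 0) / n
        for country, n in total.items()
        if n >= min_total
    }
```

With made-up counts, a place with 4 matching tweets out of 40 would be dropped, while one with 3,000 out of 5,000 would stay on the map.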

Korean Characters:   뭐   그   안   근데   거

The final language we explored was Korean, and while it is not surprising that South Korea has by far the most Korean tweeting, it is interesting to note that North Korea, despite its almost complete disconnection from the global system, also appears on the map. Again, it seems that the scattering of relatively high scores for places such as Greenland and Somalia has more to do with the relatively low level of overall tweeting in these places than with some previously unknown concentration of Korean-speakers.

While there's not much definitive here, we believe this to be a useful, if incredibly brief, look at how online spaces such as Twitter remain connected to conventional, offline geographies, such as those of language and culture. And given the recent emergence of domain names in non-Latin characters, these maps might offer clues into the evolving geography of domain names, while also offering some potential for future research using such data.

July 01, 2014

The Drama of Llamas vs. the Gloating of the Goats

It should be no surprise to anyone that we're interested in sheep. But today we want to continue to mine the possibilities of our IronSheep 2014 dataset to bring you an alternative geography of animals as they are discussed and represented in social media [1]. Focusing on the global level and using a 10% sample of all geotagged tweets created between July 2012 and March 2014, we set out to understand the global distribution of goats as opposed to llamas.

Because, you know, it's important. Or perhaps because we're a bit bored.

While goats and llamas don't carry the same inherent antagonism as, say, bronies and juggalos [2], we thought it might be interesting to see how the two compare across the world since they are both major competitors to our favored sheep in the world of livestock [3]. At the most general level, llamas are absolutely dominant, with roughly 2.6 times as many tweets as goats: 63,606 references to llamas and 24,322 references to goats. Of course, one does wonder what all this llama/goat discourse is about. Are people extolling the virtue of their animal, or mentioning a chance sighting, or perhaps talking about what's on for dinner? Or perhaps someone has finally invented a hoof-accessible mobile device and the animals are taking to the net?

In any case, these raw numbers certainly don't tell the whole story, although arguably llamas are much cooler and more interesting than goats, so as to warrant significantly greater tweeting about them.

Global References to Goats and Llamas, July 2012-March 2014

Indeed, by mapping the concentrations of each term relative to the other, we can see that while llamas are dominant overall, their spatial distribution is much more concentrated, while goats, though in smaller numbers, are much more widely dispersed throughout the globe. 

Llamas dominate livestock-related tweeting in Latin America. While that is perhaps unsurprising given their offline presence throughout South America, Spain and Mexico actually have the highest numbers of both absolute and relative references to llamas, despite neither being a native habitat for the animal. Further, only two countries in the top 20 for relative references to llamas are not predominantly Spanish-speaking: Brazil has 1,189 more references to llamas than to goats based on our 10% sample, good for 8th most, while France has 82 more references to llamas, making it the 20th-most llama-est country in the world. Also interesting is the fact that the only three countries in Latin America and the Caribbean which do not favor llamas over goats are not Spanish-speaking: Guyana, Suriname and Haiti.

Meanwhile, the United States and United Kingdom are the only countries worldwide to display a significant preference for goats over llamas, with over 10,000 and 3,000 more references respectively, while Nigeria, Canada and Australia all show some moderate preference for goats. The fact that the US also has the fifth-highest absolute number of references to llamas just goes to show how much people in the US love their goats. I mean, who doesn't love goats, especially when they sound like humans? Plus, they can eat all of your leftover beer cans!

While much of Africa's preference for goats is largely unsurprising given that it has some of the highest levels of global goat production next to China and India (which are likely lower on the goat rankings due to linguistic differences), we are somewhat baffled as to why most of Europe has a preference for llamas. But then again, after watching the goat screaming video for a while it all seems to make sense.

[1] But definitely not an animal geography.
[2] A quick Google search for "goats and llamas" will likely return a number of results for how farmers can use llamas to protect their goat herds. Should these results not show up for you, blame Google and their never-ending drive to collect massive amounts of personal data about you in order to create a personalized experience of the internet for you that never exposes you to such oddities or anything else you might find unseemly.
[3] The less said about cows the better.