Showing posts with label 4th of july. Show all posts

July 03, 2013

Welcome to 'Merica (or is it 'Murica?)

"Chicken & waffle flavored lays? #Murica."
On this day (well, technically the day before) in which we celebrate our independence from those limey redcoats and their tea-guzzling ways [1], it's time we take on one of the truly great debates tearing at the fabric of our country... 'Merica? or 'Murica?

When dropping the first letter of America (either sarcastically or to preserve our limited supply of vowels), is it more correct to (a) continue as if it were still there and use the term 'Merica? or (b) produce an altogether different word, 'Murica, to express our facetiousness and/or lack of spelling ability?

For instance, the emotionally incensed Twitter user below makes a compelling argument for 'Merica:
"P.s. please stop spelling it #murica or #mericuh or any other variation. It's #MERICA. #northerngirlprobs"
In contrast, this erudite tweeter prefers the more guttural 'Murica spelling:
"I don't know Harry, I heard the French are assholes" true statement. Elated to be back in 'Murica" 
But sadly, there is no consensus around this important issue, which if left unchecked (or at least unmapped) could threaten to undermine the very foundation of the nation. Even more tragic is that someone [2] was so unthoughtful as to bring up this topic on the day in which all 'Mericans/'Muricans should join together in our hatred of everyone who doesn't acknowledge that we're so totally superior to them. As such, we dutifully bring you an investigation of this debate that you may not have even been aware of. You're welcome.

In this endeavor, we collected all geotagged tweets referencing "murica" or "merica" in the United States from July 1, 2012 to June 30, 2013, producing 12,407 references to "murica" and 80,344 references to "merica". If you believe that absolute numbers solve the debate, read no further, as we should obviously err on the side of 'Merica. But if you believe that, you must also believe that "On dit que Dieu est toujours pour les gros bataillons" [3], which we must point out is in FRENCH, and hence your opinion on this day can easily be ignored. Again, you're welcome.

Seeing as there is such a significant preference for 'Merica, we created a normalized measure at the county level to allow for geographic comparison in spite of the massive difference in usage of the terms.  Thus, the maps below illustrate counties' share of tweets for each of the two terms.

For example, Cook County, Illinois had the most tweets in absolute terms for either term, with 201 for "murica" and 782 for "merica". But because its 201 tweets represented 1.6% of all tweets referencing 'Murica, and its 782 were only 0.97% of the tweets referencing 'Merica, it was determined to have a relatively greater usage of 'Murica, and is shaded as such on the map. So in this first map, the areas in the darkest shade of red are the places that produce a significantly greater share of all tweets referencing 'Murica than of all tweets referencing 'Merica. Confused? You're welcome.
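For the still-confused, the Cook County arithmetic can be sketched in a few lines of Python. The totals and county counts are the ones quoted in this post; the function names are ours and purely illustrative, and this is a sketch of the comparison rather than the code actually used to shade the map.

```python
# National totals and Cook County counts as quoted in the post.
TOTAL_MURICA = 12_407
TOTAL_MERICA = 80_344


def term_share(county_count, national_count):
    """Fraction of all tweets for one spelling that come from one county."""
    return county_count / national_count


def leaning(murica_count, merica_count):
    """Say which spelling a county over-produces relative to the other."""
    murica_share = term_share(murica_count, TOTAL_MURICA)
    merica_share = term_share(merica_count, TOTAL_MERICA)
    if murica_share > merica_share:
        return "murica"
    if merica_share > murica_share:
        return "merica"
    return "even"


cook_murica_share = term_share(201, TOTAL_MURICA)
cook_merica_share = term_share(782, TOTAL_MERICA)
print(f"{cook_murica_share:.2%} vs {cook_merica_share:.2%}")  # 1.62% vs 0.97%
print(leaning(201, 782))  # murica
```

Even though Cook County has nearly four times as many 'Merica tweets, its slice of the much smaller 'Murica pie is bigger, which is all the shading reflects.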

The Misspellings of America

While it might be remarked that this unusual methodology unfairly tilts the linguistic playing field in favor of the much less used 'Murica, we would respond with: who cares? This is our map and we can do what we want with it. Also, we're academics (aka commies) and are totally OK with doing things like changing the rules to benefit the less well-off. Also, note the holiday appropriate color ramp of blues to white to reds. Clever, yes? You're welcome.

As you can see, use of 'Murica tends to be associated with the east and west coasts, with fairly little usage of the term, even by relative measures, in the interior of the United States. So it appears that while those living in "flyover country" tend to prefer the simpler 'Merica, the coastal elite like to step up their sarcasm an extra notch by exchanging an 'e' for a 'u'.

While some of the country's biggest cities -- Los Angeles, New York City, Chicago, Boston, Phoenix, Minneapolis, Seattle and D.C. -- have a relatively greater amount of 'Murica-ness (or should that be 'Murica-lity), the divide between the two spellings doesn't break down along clear urban/rural lines. Oklahoma City and Indianapolis are two of the biggest users of 'Merica, while parts of the Charlotte and Atlanta metropolitan regions are also on the list of counties who believe that it's spelled 'Merica, not 'Murica.

Indeed, if you further normalize by creating a location quotient -- in effect controlling for absolute size -- a similar picture emerges, albeit one which tends to emphasize the large urban areas much less, regardless of whether they see themselves (or others) as 'Mericans or 'Muricans.

The Misspellings of America (by Location Quotient)

Unlike in the previous map, the most red end of the spectrum here actually shows the places where there is the most parity between the usage of the two spellings, even if there are still a greater number of absolute references to 'Merica than to 'Murica. So less populous counties, or those with many fewer Twitter users, such as Piscataquis County, Maine, with fewer than five or ten overall references to either term, will generally tend to be more red.
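The post does not spell out the exact location quotient formula, so the sketch below assumes the textbook form: a county's local share of 'Murica (out of both spellings) divided by the national share. The function name and the low-volume county are hypothetical; only the national totals and the Cook County counts come from the post.

```python
# National totals as quoted in the post.
TOTAL_MURICA = 12_407
TOTAL_MERICA = 80_344


def murica_location_quotient(murica_count, merica_count):
    """Assumed LQ: local 'Murica share divided by national 'Murica share.

    Values above 1 mean 'Murica is over-represented locally.
    """
    local_total = murica_count + merica_count
    if local_total == 0:
        return None  # no tweets for either spelling, so no quotient
    local_share = murica_count / local_total
    national_share = TOTAL_MURICA / (TOTAL_MURICA + TOTAL_MERICA)
    return local_share / national_share


cook = murica_location_quotient(201, 782)  # Cook County, Illinois
tiny = murica_location_quotient(2, 2)      # hypothetical low-volume county
print(round(cook, 2), round(tiny, 2))
```

Under this assumed formula, a county with only four tweets split evenly scores far higher than Cook County, which is why sparsely tweeting places drift toward the red end of the ramp.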

But perhaps one of the most interesting (and actually rather methodologically valid) ways of examining the data is to simply look at a ranked list of the top ten counties for each term. One sees here that the top ten counties for 'Merica are almost exclusively in the South, while the top ten counties for 'Murica are outside the South and within large metropolitan areas. So, our working hypothesis (which we suggest you discuss over beer and burgers on this fine day) is that 'Murica is likely a derivative of 'Merica, used ironically by slow-pour-coffee-drinking, skinny-jean-wearing hipsters in big cities. Our extensive examination of hipsters (n=1) confirms this hypothesis and places the epicenter of this plague somewhere in the Greater Boston area. But you can probably spell it however you'd like.

Happy 4th of July everyone!
-----
[1] No offense intended. Verily, some of the FloatingSheep collective members are British and have yet to make the move to the promised land of 'Merica/'Murica.
[2] That would be us.
[3] "It is said that God is always on the side of the big battalions." -Voltaire

July 04, 2012

Church or Beer? Americans on Twitter

In honor of the anniversary when American colonists kicked out the oppressive British (apologies to Mark and other oppressive Brits) today is the birthday of the United States. Traditionally it is celebrated by attempting to blow up or burn a small part of it with fireworks, and given the dry conditions at the moment, we may very well succeed at this beyond our wildest expectations.

But until #badideaswithfireworks becomes a trending hashtag, we thought we'd use Twitter to explore some of the regional differences that are rending the fabric of society, er, make America great. It also gives us a chance to showcase some of the potential of our nascent DOLLY project (feel free to visit the Knight News Challenge website and comment positively!), which integrates and maps geographic social media and official data sources. DOLLY is still not quite ready for general use, but the backend database is all set, which makes it really easy to pull out user-generated geocoded data, in this case from Twitter.

So in honor of the 4th of July, we selected all geotagged tweets[1] sent within the continental US between June 22 and June 28 (about 10 million in total) and extracted all tweets containing the word "church" (17,686 tweets, of which half originated on Sunday) or "beer" (14,405 tweets, which are much more evenly distributed throughout the week). See below for more technical details[2] or just go straight to the map below to see the relative distribution of the tweets in the U.S.
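The extraction step amounts to a keyword filter. The sketch below is not the DOLLY query itself (which is not shown here); it is a minimal stand-in that assumes case-insensitive, whole-word matching, and the sample tweets are made up for illustration.

```python
import re
from collections import Counter

KEYWORDS = ("church", "beer")
# Assumption: whole-word, case-insensitive matching ("churches" would not count).
WORD_PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def count_keyword_tweets(tweets):
    """Count how many tweets mention each keyword at least once."""
    counts = Counter()
    for text in tweets:
        # A set ensures a tweet is counted once per keyword, however often it repeats it.
        found = {match.lower() for match in WORD_PATTERN.findall(text)}
        counts.update(found)
    return counts


# Hypothetical sample tweets, not drawn from the real data set.
sample_tweets = [
    "I am at First Baptist Church",
    "Cold beer on the porch",
    "Beer after church, obviously",
    "Fireworks tonight",
]
keyword_counts = count_keyword_tweets(sample_tweets)
print(keyword_counts["church"], keyword_counts["beer"])  # 2 2
```

A tweet mentioning both words counts toward both totals here; whether the original analysis treated such tweets the same way is not stated.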

Relative Number of Tweets containing the terms "church" or "beer" aggregated to the county level, June 22-28, 2012

This map clearly illustrates some fairly big regional divides (more on that in a bit) but it is worth drilling down a bit to see how this plays out at the local level.  San Francisco has the largest margin in favor of "beer" tweets (191 compared to 46 for "church") with Boston (Suffolk county) running a close second. Los Angeles has the distinction of containing the most tweets overall (busy, busy thumbs in Southern California). In contrast, Dallas, Texas wins the FloatingSheep award for most geotagged tweets about "church" with 178 compared to only 83 about "beer."

Of course, since these are tweets, the content is decidedly less spiritual than one might expect given the focus on beer and church.  For example, the most common example of a "church" tweet was simply a report such as "I am at _______ church".  More amusing are what we characterize as "competitive church going" when one person replaces another as the Foursquare "mayor" of a church. "I just ousted Jef N. as the mayor of Dallas Bible Church on @foursquare! 4sq.com/5hNW6x" 

This of course echoes the Sermon on the Mount and the famous verse, "Blessed are those who check in for they shall inherit the badges of righteousness."  Another common category was politically related tweets such as "#ICantDateYou If You Dont Go To Church" or "@____ you're right. It's like separation of church and state. But they really shouldn't be separated. #twitterpolitics".

Given the cultural content of the "church" tweets, the clustering of relatively more "church" than "beer" content in the southeast relative to the northeast suggests that this could be a good way to identify the contours of regional difference. In order to quantify these splits, we ran a Moran's I test for spatial auto-correlation, which proved to be highly significant.[3] Without going into too much detail, this test shows which counties with high numbers of church tweets are surrounded by counties with similar patterns (marked in red) and which counties with many beer tweets are surrounded by like-tweeting counties (marked in blue).  Intriguingly, there is a clear regional (largely north-south) split in tweeting topics, which highlights the enduring nature of local cultural practices even when using the latest technologies for communication.
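For readers who would like slightly more detail after all, here is a minimal sketch of a global Moran's I with inverse-distance weights inside a distance band (the analysis used a band of 2.34 decimal degrees, per footnote [3]). It is a toy illustration rather than the GIS tool behind the map, it does not produce the per-county red and blue clusters, and the four points and values are invented.

```python
import math


def inverse_distance_weights(points, max_distance):
    """Weight each pair of points by 1/distance if within max_distance, else 0."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a location is not its own neighbor
            distance = math.dist(points[i], points[j])
            if 0 < distance <= max_distance:
                weights[i][j] = 1.0 / distance
    return weights


def morans_i(values, weights):
    """Global Moran's I: positive when similar values sit near each other."""
    n = len(values)
    mean = sum(values) / n
    deviations = [value - mean for value in values]
    total_weight = sum(sum(row) for row in weights)
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    squared_deviations = sum(d * d for d in deviations)
    return (n / total_weight) * (cross_products / squared_deviations)


# Hypothetical centroids along a line; a band of 1.5 links only adjacent points.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
weights = inverse_distance_weights(points, max_distance=1.5)

clustered = morans_i([1, 1, 9, 9], weights)    # like values side by side
alternating = morans_i([1, 9, 1, 9], weights)  # like values kept apart
print(clustered > 0, alternating < 0)  # True True
```

Positive values indicate that like-tweeting places sit next to each other, which is the pattern the church and beer map shows at a regional scale.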

We also note that this map strongly aligns with the famous 'red state'/'blue state' map from the 2000, 2004, and 2008 elections with a strong "religious right" component in the Southeastern United States (see also The Virtual 'Bible Belt') and a more liberal, or at least beer-tweeting, Northeast and upper Midwest (see also The Beer Belly of America).

In any case, happy 4th of July to our American readership. We hope you enjoy your beer in the north, or your church service if you are tweeting from the south.
----------------------
[1] It is important to note that geotagged tweets are somewhat of an oddity among tweets, as only one to three percent of tweets (depending on the country) are geotagged.  Still, a small percentage of a very large number (the total number of tweets) results in a LOT of data.
 
[2] There are a number of technical issues tied to the validity and scale of geography associated with tweets which we won't go into here, but it is worth mentioning that we are NOT using user profile locations.  This data is limited to geographic information associated with each tweet, often drawn from a GPS-capable device.  While the relevant scale at which analysis can be done differs between tweets, about 90 percent of the tweets in this sample are accurate at the city level or finer, which works well for this analysis.
 
[3] Based on an IDW matrix for 2.34 decimal degrees (Euclidean distance), this test achieved a z-score of 14.34, implying there is a less than 1% likelihood that this highly clustered pattern could be the result of random chance.
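As a rough check on that last claim, a z-score can be converted to a p-value under the usual assumption that the test statistic is approximately normally distributed. The helper name below is ours, and the two-sided form is an assumption about how the test was read.

```python
import math


def two_sided_p_value(z_score):
    """Two-sided p-value for a z-score, assuming a standard normal distribution."""
    return math.erfc(abs(z_score) / math.sqrt(2))


p_value = two_sided_p_value(14.34)
print(p_value < 0.01)  # True
```

A z-score of 14.34 lands so far out in the tail that "less than 1%" is a considerable understatement.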