July 04, 2012

Church or Beer? Americans on Twitter

Today is the birthday of the United States, the anniversary of the day American colonists kicked out the oppressive British (apologies to Mark and other oppressive Brits). Traditionally it is celebrated by attempting to blow up or burn a small part of the country with fireworks, and given the dry conditions at the moment, we may very well succeed at this beyond our wildest expectations.

But until #badideaswithfireworks becomes a trending hashtag, we thought we'd use Twitter to explore some of the regional differences that are rending the fabric of society, er, make America great. It also gives us a chance to showcase some of the potential of our nascent DOLLY project (feel free to visit the Knight News Challenge website and comment positively!), which integrates and maps geographic social media and official data sources. DOLLY is still not quite ready for general use, but the backend database is all set, which makes it really easy to pull out user-generated geocoded data, in this case from Twitter.

So in honor of the 4th of July, we selected all geotagged tweets[1] sent within the continental US between June 22 and June 28 (about 10 million in total) and extracted all tweets containing the word "church" (17,686 tweets, of which half originated on Sunday) or "beer" (14,405 tweets, which are much more evenly distributed throughout the week). See the footnotes for more technical details[2] or just go straight to the map below to see the relative distribution of the tweets in the U.S.

Relative Number of Tweets containing the terms "church" or "beer" aggregated to the county level, June 22-28, 2012

This map clearly illustrates some fairly big regional divides (more on that in a bit), but it is worth drilling down to see how this plays out at the local level. San Francisco has the largest margin in favor of "beer" tweets (191 compared to 46 for "church") with Boston (Suffolk County) running a close second. Los Angeles has the distinction of containing the most tweets overall (busy, busy thumbs in Southern California). In contrast, Dallas, Texas, wins the FloatingSheep award for most geotagged tweets about "church" with 178 compared to only 83 about "beer."
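For readers who want to try something similar, here is a minimal sketch of the counting step. It is not the DOLLY code: the whole-word matching rule, the `count_terms` and `beer_share` helpers, and the "beer share" formula are all assumptions made for illustration (the post only says the map shows relative numbers per county).

```python
import re

# Whole-word, case-insensitive matching. This is an assumption: the post does
# not say whether plurals or longer words containing a term were counted.
TERMS = {
    "church": re.compile(r"\bchurch\b", re.IGNORECASE),
    "beer": re.compile(r"\bbeer\b", re.IGNORECASE),
}


def count_terms(tweets):
    """Count tweets mentioning each term, keyed by county.

    tweets: iterable of (county_name, tweet_text) pairs (hypothetical format).
    """
    counts = {}
    for county, text in tweets:
        row = counts.setdefault(county, {"church": 0, "beer": 0})
        for term, pattern in TERMS.items():
            if pattern.search(text):
                row[term] += 1
    return counts


def beer_share(beer, church):
    """Fraction of beer tweets among beer + church tweets (0 to 1).

    Returns None for a county with no matching tweets (a "No Data" county).
    """
    total = beer + church
    return beer / total if total else None


sample = [
    ("Dallas", "I am at Dallas Bible Church"),
    ("San Francisco", "Cold beer after work"),
    ("San Francisco", "Beer, then church tomorrow"),
]
counts = count_terms(sample)
print(counts)

# Using the counts quoted in the post:
print(round(beer_share(191, 46), 3))   # San Francisco -> 0.806
print(round(beer_share(83, 178), 3))   # Dallas -> 0.318
```

Under this assumed measure, San Francisco sits near the "beer" end of the scale and Dallas toward the "church" end, which matches the counts above. Note that a tweet mentioning both words is counted once for each term.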

Of course, since these are tweets, the content is decidedly less spiritual than one might expect given the focus on beer and church. For example, the most common type of "church" tweet was simply a check-in report such as "I am at _______ church". More amusing is what we characterize as "competitive churchgoing," when one person replaces another as the Foursquare "mayor" of a church: "I just ousted Jef N. as the mayor of Dallas Bible Church on @foursquare! 4sq.com/5hNW6x"

This of course echoes the Sermon on the Mount and the famous verse, "Blessed are those who check in for they shall inherit the badges of righteousness." Another common category was politically related tweets such as "#ICantDateYou If You Dont Go To Church" or "@____ you're right. It's like separation of church and state. But they really shouldn't be separated. #twitterpolitics".

Given the cultural content of the "church" tweets, the clustering of relatively more "church" than "beer" content in the southeast compared with the northeast suggests that this could be a good way to identify the contours of regional difference. In order to quantify these splits, we ran a Moran's I test for spatial autocorrelation, which proved to be highly significant.[3] Without going into too much detail, this test shows which counties with high numbers of church tweets are surrounded by counties with similar patterns (marked in red) and which counties with many beer tweets are surrounded by like-tweeting counties (marked in blue). Intriguingly, there is a clear regional (largely north-south) split in tweeting topics, which highlights the enduring nature of local cultural practices even when using the latest technologies for communication.
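For the curious, the global Moran's I statistic can be sketched in a few lines. This is an illustration under stated assumptions, not the workflow behind the map: the function names, the toy points, and the toy values are hypothetical, and the weights simply follow the description in footnote [3] (inverse distance, Euclidean, cut off at a threshold). The county-by-county red and blue clusters presumably come from a local variant of the statistic, which is not shown here.

```python
import math


def inverse_distance_weights(points, threshold):
    """Build an n x n weight matrix with w_ij = 1 / d_ij for neighbors
    within `threshold` (Euclidean distance), and 0 otherwise."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(points[i], points[j])
                if 0 < d <= threshold:
                    w[i][j] = 1.0 / d
    return w


def morans_i(values, w):
    """Global Moran's I: (n / S0) * sum_ij w_ij * z_i * z_j / sum_i z_i^2,
    where z_i is the deviation of value i from the mean."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)        # sum of all weights
    den = sum(d * d for d in dev)
    if s0 == 0 or den == 0:
        raise ValueError("need at least one neighbor pair and non-constant values")
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * (num / den)


# Toy example: four "counties" in a row, one unit apart.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
w = inverse_distance_weights(points, threshold=1.0)

print(round(morans_i([1, 1, 0, 0], w), 3))  # similar values side by side -> 0.333
print(round(morans_i([1, 0, 1, 0], w), 3))  # alternating values -> -1.0
```

Positive values mean that neighbors tend to resemble each other (clustering), and negative values mean that neighbors tend to differ. Turning the statistic into a z-score requires its expected value and variance under the null hypothesis, which this sketch leaves out.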

We also note that this map strongly aligns with the famous 'red state'/'blue state' map from the 2000, 2004, and 2008 elections with a strong "religious right" component in the Southeastern United States (see also The Virtual 'Bible Belt') and a more liberal, or at least beer-tweeting, Northeast and upper Midwest (see also The Beer Belly of America).

In any case, happy 4th of July to our American readership. We hope you enjoy your beer in the north, or your church service if you are tweeting from the south.
----------------------
[1] It is important to note that geotagged tweets are somewhat of an oddity among tweets, as only one to three percent of tweets (depending on the country) are geotagged. Still, a small percentage of a very large number (the total number of tweets) results in a LOT of data.
 
[2] There are a number of technical issues tied to the validity and scale of geography associated with tweets which we won't go into here, but it is worth mentioning that we are NOT using user profile locations. This data is limited to geographic information associated with each tweet, often drawn from a GPS-capable device. While the relevant scale at which analysis can be done differs between tweets, about 90 percent of the tweets in this sample are accurate at the city level or finer, which works well for this analysis.
 
[3] Based on an inverse distance weighted (IDW) matrix with a threshold of 2.34 decimal degrees (Euclidean distance), this test achieved a z-score of 14.34, implying there is a less than 1% likelihood that this highly clustered pattern could be the result of random chance.
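As a rough check on footnote [3], a z-score can be converted to a tail probability if we assume the test statistic is approximately standard normal under the null hypothesis of no spatial pattern. That normality assumption, and the `upper_tail_p` helper, are ours for illustration and are not taken from the software output.

```python
import math


def upper_tail_p(z):
    """P(Z >= z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


print(upper_tail_p(1.96))         # roughly 0.025, the familiar 5% two-sided cutoff
print(upper_tail_p(14.34) < 0.01)  # True: far below the 1% level
```

A z-score of 14.34 is so far into the tail that "less than 1%" is a very conservative way of putting it.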

40 comments:

  1. 1) This is an awesome map
    2) Though the data on a regional basis seems to make sense (correlating with red/blue states from elections, the Bible Belt vs. the Beer Belly), on a smaller level it doesn't seem to make much sense. Largely secular places like DC and Philly have "Much more church" whereas my conservative and religious hometown area of Lancaster County, PA has more beer tweets. It just doesn't seem to make sense....

    Replies
    1. DC and Philly are heavily African-American, are they not? And African-Americans don't tend to be "largely secular" at all...

    2. To follow up on my other reply, it's not just DC and Philly either ... Cook County (Chicago), Cleveland, Cincinnati - all places with a high African-American population and "much more church" tweets.

      You can sort of see the "black belt" (running through the Carolinas, central Georgia, central Alabama and curving up into northern Mississippi) show up with a reddish emphasis too.

      So that does kind of make sense ... you just have to replace "Beltway pundit" as stereotypical DC resident in your mind's eye with the more demographically correct image of a lower-income black family.

    3. You should also keep in mind that the data reflects those who take to social media. A particular place may be largely secular, or largely religious, but that may not necessarily reflect on those users who are also on twitter, talking about it.

    4. Philly is actually shown there in blue. That big red spot just southwest of Philly is Delaware County. In contrast to Philly, Delco is by vast majority white and a traditional conservative stronghold.

  2. You may be assuming that the population in DC is both homogeneous and similar to our stereotype of the DC dweller (beltway types). This group may not be the majority of Church-tweeting Twitter users. Additionally, beltway types may self-censor and limit Twitter mentions of Beer, while enhancing their public profile with respect to Church.
    Conversely, in Lancaster County, it may be just the Beer types who feel they have to brag about it.

    Replies
    1. Larry,

      But if there is perceived cultural/social pressure regarding the content of tweets, isn't that a pretty good indicator of some kind of cultural gap? Hence it seems your argument reinforces the notions of the cultural contour laid out in the second graph.

  3. Do references to Juke Joints count towards "beer?"

  4. Cool idea, but I've never understood people making maps of things like tweets or internet votes on the US county level. If some of the biggest cities in the US (SF & Dallas) only had ~250 tweets in total, how few did all the hundreds of smaller counties have?? As an example, how many tweets did Allen Parish Louisiana have? With a population of less than 30,000, I seriously doubt you had enough tweets that you can confidently say that county is predominantly "more beer" than church.

    Another similar issue - I don't like quantitative-based choropleth maps of the US by county either. Why should counties that are huge area-wise but tiny population-wise get a bigger representation on the map? For example, why should Loving County, TX, with a population of 82, be shown as 30x as large as New York County (Manhattan), when Manhattan has almost 20,000x the population? I think it would be better to at least show a cartogram as an alternative representation.

    Replies
    1. Good point about the sample size.

      As for the quantitative-choropleth problem, this data is not count data but ratio data, as it is number of beer tweets/number of church tweets. Although choropleth maps, as a general rule, are not supposed to be used for count data because they can be misleading exactly for the reason you described (despite the fact that you, an informed viewer of the map, are aware of the problem), this map doesn't fit into that category.

  5. Have a question on the Moran's I test: It seems like the test is fairly useless at the county level once you go west of the Rockies and have counties the size of small states with lower population. Would it be possible to run the same analysis by zip code, or do the tweets only resolve to the county level?

    Replies
    1. At the zip code level, sample sizes may become too small to statistically infer anything. But I hear you about the Moran's-I... the sizes of the counties themselves are spatially autocorrelated. It would be better to use a method that is based on contiguity rather than distances between centroids.

  6. Although it isn't called out in the legend, one might assume the gray color in the Moran's I map is being used for counties with No Data, as in the first map.
    However, there seem to be more gray counties in the Moran's map. How come?

  7. What limiting distance did you use for your analysis? If you used ArcMap and left the threshold variable blank, it uses a distance that guarantees each point has a neighbor. This biases the analysis towards clustering (where clustering may not exist).

    Data collected at the local (person) level should be analyzed at the local level. Aggregation averages out the variance, which I'm sure you know. If you must aggregate, a smaller areal unit (nearest neighbor distance x nearest neighbor distance?) would provide a better understanding of the clustering.

    Also, comparing the tweet locations to all of America, where tweeting doesn't occur, gives a lot of background noise to this analysis. If a tweet never occurs at location X, is it really viable to compare it with locations that do tweet?

    Still, I think it's a great concept to identify political/social issues using social media messages and content analysis.

  8. Brilliant post. If I might ask, what program did you use to create the maps? I have some similar data I have been experimenting with mapping on Google Maps, but this format is much more appealing.

  9. How much work at providing context for the quotes was there? Do quotes about Eric Church count? You may have mistaken country music fans for churchgoers.

  10. Hello!

    I am the Watercooler editor at Before It's News (beforeitsnews.com). Our site is a rapidly growing people-powered news platform currently serving over 3 million visitors a month. We like to call ourselves the "YouTube of news."

    We'd love to republish your RSS feed on our site, with a link back to yours. Our visitors would enjoy your content and getting to know you.

    It's a great opportunity to spread the word about your work and reach new fans. Posting on Before It's News is 100% free.

    Looking forward to hearing from you!

    Best regards,
    Sebastian Clouth
    SClouth@beforeitsnews.com

  11. Hey, could you guys display this data in an area cartogram?

  12. Hi all,

    Thanks for your comments. This is simply an aggregate of tweets that include the words "beer" and "church" normalized to a 0-1 scale without any contextual indicators. So, if somebody tweeted the word "brewski" instead of "beer," the tweet was not counted, and if they used the word "church" to refer to a musician, they were counted for "church." Similarly it's likely that Burlington, Vermont shows more "church" because of tweets referring to the "Church Street Marketplace" in downtown Burlington, having nothing to do with churchgoers.

    As for the spatial statistics questions--the Moran's-I is a less useful measure when there are giant gaps between areal units, and using zip codes would have resulted in many 0-values for zip codes with no tweets. The size of counties is spatially autocorrelated, as is population density in the United States. As there were not very many tweets in the western United States during the one-week sample, limiting analysis to a spatial regime of the east coast would be more explanatory. The Moran's-I used a threshold of 2.34 decimal degrees (roughly 75 miles in mid-latitudes) to determine clustering.

    ESRI ArcMap, Adobe Illustrator, and GeoDa were used for this analysis.

    And yes, we could display this data in a cartogram, thanks for the suggestion @captainentropy.

    I hope that helps, please comment if you have more questions!

    -Monica

  13. Fantastic stuff, Monica. You and your flock have made some righteous maps.

    Good on you for using a week's worth of data. It'd also be interesting to see the temporal component in an animated map. Do Southerners tweet about beer on Friday night, I wonder, or is it all Jack Daniels & coke? Do Bostonians, with so much good beer available, tweet about church, too, on Sundays?

    Keep up the good work.

  14. What happened to Hawaii and Alaska?

  15. Twitter is banned there.

  16. So this is up on CNN right now.

    http://religion.blogs.cnn.com/2012/07/09/study-people-tweet-more-about-church-than-beer/?hpt=hp_t2

  17. It seems a bit odd that "church," a general term for a place of worship traversing a few religions such that it might be used by non-Christians, is compared to "beer," a specific term for one particular kind of alcohol. Regional variations may persist in types of alcohol and may provide further input in the alcohol versus place of worship mentions - q.v. http://www-958.ibm.com/software/data/cognos/manyeyes/datasets/3871bf4c231d11e095f3000255111976/versions/1 regarding stark differences in beer and wine consumptions by state (of course, consumption may not correlate well with mentions).

    Also in some regions brand names for beer may be much higher and may account for some important numbers - e.g., "Bud" may be more common in the south and midwest, for all we know, and might boost "beer" mentions dramatically.

    It's a fun little exercise but the terms used are not really directly comparable in their "width" and what we can surmise is therefore rather limited, if at all useful.

  18. Being a lifelong Geographer (U of TN/Knoxville 78), I stumbled across your site and have really enjoyed some of the stuff you have done to make maps fun, which they should be.

  19. What do LL, LH, HL, and HH stand for?

  20. We have a "Churchkey Beer" Microbrewery in Ontario...how would that register?? Cool concept, though...

    http://www.churchkeybrewing.com/

  21. The southern beer drinkers must not be using Geotagging because they are afraid of being caught by the church-goers.

  22. ► Very interesting study...
    We have republished the Merica Dan's post on CNN and redirected this post to our friends. Congratulations!

    ► http://www.adventistreport.com/2012/07/study-people-tweet-more-about-church.html

  23. As a pastor of a Lutheran "church", and knowing that Lutherans as a whole are OK with beer (hey, Luther's wife Katie ran the brewery), I can't help but wonder...

    How many tweets mentioned both church AND beer, if any?
    Silly curiosity, at least!

  24. In defense of Dallas....there is a nightclub called the "Church".

  25. The best beers in the world are made by churches!

  26. What do LL, LH, HL, and HH stand for?

  27. Interesting note: 2 of the most liberal areas in the country on here are red - Burlington, VT area and Boulder, CO area. They both have 'church streets' which are where a ton of bars and restaurants are.

  28. And what about those of us who worship at the church of beer??

  29. I live in Burlington, VT, a town with a lot of college kids and a good number of bars per capita in a state with some good microbrews and a lot of nonreligious liberal types. Made me wonder why its county is red.
    Then I realized that a lot of these bars in Burlington are on Church Street. ;)

  30. Very nice map and analysis. The only thing I would suggest is that any spatial analysis that uses distance should use a projection with a linear unit (i.e. feet or meters) since, as you note, the ground distance covered by a decimal degree changes with latitude. It probably wouldn't affect these results too much, but it can matter if you are using Euclidean distance in your analysis.

  31. Something interesting...not sure if you have heard this yet, but my county (Chittenden, VT, home of Burlington) has more church mentions because the main drag in town is Church St., not for religious reasons!

  32. I just hope the beer tweets were more for microbrews, IPAs, pilsners, etc rather than the generic types (Bud, Miller, Coors)....that would be progress! - can you do further research and publish that graph please?

  33. I imagine that the results might have been quite different for Napa County, CA if the comparison had been "wine vs. church" instead.
