November 05, 2012

Can Twitter Predict the US Presidential Election?

Can Twitter predict the outcome of tomorrow's US presidential election? If the results of our preliminary analysis are anything to go by, then Barack Obama will be easily re-elected. The data presented below, including all geocoded tweets referencing Obama or Romney between October 1st and November 1st, out of a sample of about 30 million, give some insight into the visibility of each of the candidates on Twitter.


We see that if the election were decided purely based on Twitter mentions, then Obama would be re-elected quite handily. In fact, the only states in the electoral college that Romney would win are Maine, Massachusetts, New Mexico, Oregon, Pennsylvania, Utah, and Vermont. Romney also wins in the District of Columbia, and we unfortunately didn't collect data on Alaska or Hawaii. Some of the results seem to be interesting reflections of social and political characteristics of particular places. It makes sense that Romney has captured more of the public imagination in Utah, likely due to the state's considerable conservatism and large Mormon population, and in Massachusetts, the state that he governed not all that long ago.

However, this drubbing that Romney receives in the Twitter electoral college belies the close nature of the final popular (Twitter) vote, re-raising the issue of whether the electoral college is the most suitable means of deciding the country's political future. There are a total of 132,771 tweets mentioning Obama and 120,637 mentioning Romney, giving Obama only 52.4% of the total and Romney 47.6%, a breakdown that is remarkably similar to current opinion polls, though not reflected when looking at the state-level aggregations in absolute terms. If you want to explore the data in more detail, please play around with the interactive map below:


We can also visualize the data using a sliding scale, so as to see how close the margin of victory is for each candidate in a given state.


Romney's largest margins of victory are in Pennsylvania and Massachusetts, while Obama's largest victories are in California and, strangely, Texas. The cases of Massachusetts and Texas, not to mention large portions of the South and the Plains states, likely reflect the fact that many references to a candidate on Twitter are negative.

It is also worth noting that we compared Twitter mentions of both Vice-Presidential candidates: Biden and Ryan. Ryan, interestingly, wins the head-to-head competition in every single state. This makes for a rather boring map, so we decided to instead compare references to Ryan and Romney in the map below (Romney shaded in grey for his ebullient personality, and Ryan in pink as a result of his staunch support for gay rights).


As might be expected, there are more references to Romney in most states (Kansas, Michigan, North Dakota, Rhode Island, South Dakota, and Vermont being the exceptions here). However, when looking at total references, we again don't see a large gap between the two men. Ryan has 94,707 tweets compared to Romney's 120,637.
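As a quick check on the arithmetic, the national shares quoted in this post can be reproduced from the raw counts. The sketch below is illustrative only: the counts are the figures reported here, while the `MENTIONS` dictionary and the `head_to_head` helper are names made up for this example and are not part of our actual analysis.

```python
# National tweet counts as reported in this post.
MENTIONS = {"Obama": 132_771, "Romney": 120_637, "Ryan": 94_707}


def head_to_head(a, b, counts=MENTIONS):
    """Return each name's percentage share of the two-way total (hypothetical helper)."""
    total = counts[a] + counts[b]
    return {name: round(100 * counts[name] / total, 1) for name in (a, b)}


print(head_to_head("Obama", "Romney"))
print(head_to_head("Romney", "Ryan"))
```

Note that these are shares of mentions, not of support: a tweet criticizing a candidate counts exactly the same as one praising him.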

What do these data really tell us? Ultimately, I doubt that they will accurately predict the election, as Obama's seeming victory in Texas or Romney's in Massachusetts will almost certainly not come to pass. But they do certainly reveal that many internet users in California, Texas, and much of the rest of the country for that matter, tend to talk more about Obama than Romney. And, of course, in order to truly equate tweets with votes, we would need to employ sentiment analysis or manually read a large number of the election-related tweets in order to figure out whether we are seeing messages of support or more critical posts, as has been done in a couple of interesting projects by Twitter available here and here and another project by Esri available here.

Maybe the most revealing aspect of these data is that the 'popular vote' is split between the two candidates. While the social and political data shadows that we are picking up may not accurately tell us much about the electoral college results, when aggregated across the country they may be a rough indicator of tomorrow's outcome, pointing to the more-or-less equal and evenly divided nature of the American two-party political system. While this work may seem like a contemporary attempt at soothsaying, something we tend to shy away from, the data more appropriately serve as a useful benchmark in order to allow us to analyze what social media data shadows might actually reflect, as no matter the level of participation, they remain distorted mirrors on the offline material world.

4 comments:

  1. Just curious about how you gathered the Paul Ryan tweets. Was there a concerted effort to narrow "Ryan" tweets to those referencing Paul Ryan, or did you just straight up search for tweets with "Ryan" in them? I only ask because, and this is speaking as a Ryan, it's a fairly common name.

    Nonetheless, cool stuff!

  2. One of the interesting trends we've seen in similar data is that citizens like to bad-mouth the candidate they dislike more than they support the candidate they like, especially in polarized states like Texas and Massachusetts. So Obama gets mentioned heavily in Texas because there is a lot of hating going on, and the same goes for Romney in Massachusetts. Sentiment is super tricky to do well but important in this particular application of Twitter data. That said, the only good approach I've seen is using humans to score data via Mechanical Turk and then basing an algorithm, like a naive Bayesian classifier, on those scores to do sentiment well.

  3. This is so flawed! Vermont was the first state to give its electoral votes to Obama, and it gave Obama a higher share (67%) than any other state. (DC was in the 90s, but it is not a state.) So what does this information really tell us?

    Replies
    1. I don't think it's any good as a predictor, but it at least serves as an indicator of who uses Twitter.
