floatingsheep: Mapping the Eastern Kentucky Earthquake

Last week's post on racist tweets in the wake of the US presidential election received much more attention than we ever expected. A number of questions about and critiques of our method were raised, which we attempted to address in a special FAQ accompanying the post (the first time we have had to do that). Nonetheless, we thought it might be useful to apply our technique to a less controversial subject, demonstrating how a relatively small number of geocoded tweets can be leveraged to understand particular offline phenomena, and perhaps assuaging some concerns about such an approach.

The magnitude 4.3 earthquake that occurred on Saturday, November 10th, at around 12:08pm EST, about eight miles west of Whitesburg, Kentucky, provides just such an example. Given our own connections to Kentucky, and the significant number of friends and family who tweeted or updated their statuses about the earthquake, we were naturally interested in what we might be able to bring to such an analysis.

But before showing our own results, it is worth noting that the US Geological Survey also collects user-generated data on earthquakes through its "Did You Feel It?" reporting system, in which individuals contribute their location and their experience of the quake. The USGS then aggregates these reports into a crowdsourced map like the one below to visualize an approximation of how the earthquake was experienced in different locations.

Rather than use such a direct system of user-generated data collection, we fired up DOLLY to gather geocoded tweets referencing the earthquake in its immediate aftermath. We collected 795 geotagged tweets referencing "earthquake" from 12:08pm (when the first tweet we uncovered, from near Hyden in Leslie County, KY, simply said "EARTHQUAKE HOLY SHAT") until around 4:05pm, in an area comprising most of central and eastern Kentucky, southern Ohio, West Virginia, southwest Virginia, western North Carolina and east Tennessee (we limited our query to a bounding box drawn around the epicenter of the quake).
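The collection itself relied on DOLLY, whose interface is not reproduced here. As a rough, hypothetical sketch of the three filters the query applies (keyword, time window, bounding box), assuming tweets are already available as records with text, coordinates and a local timestamp, the selection might look like this. The `Tweet` and `BoundingBox` types, the function name, the box coordinates and the sample tweets are all made up for illustration; they are not the box or data used in the post.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    text: str
    lat: float
    lon: float
    created_at: datetime  # assumed already converted to local (EST) time


@dataclass
class BoundingBox:
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def select_event_tweets(tweets, keyword, bbox, start, end):
    """Keep tweets whose text contains `keyword` (case-insensitive), whose
    coordinates fall inside `bbox`, and whose timestamp lies in [start, end]."""
    kw = keyword.lower()
    return [
        t for t in tweets
        if kw in t.text.lower()
        and bbox.contains(t.lat, t.lon)
        and start <= t.created_at <= end
    ]


# Illustrative usage: the box and the tweets below are hypothetical.
box = BoundingBox(min_lon=-86.0, min_lat=35.0, max_lon=-79.5, max_lat=39.5)
sample = [
    Tweet("Was that an EARTHQUAKE?!", 37.16, -83.37, datetime(2012, 11, 10, 12, 9)),
    Tweet("Lunch time", 37.16, -83.37, datetime(2012, 11, 10, 12, 30)),
    Tweet("Earthquake felt in Chicago?", 41.88, -87.63, datetime(2012, 11, 10, 12, 30)),
    Tweet("Still thinking about the earthquake", 38.04, -84.50, datetime(2012, 11, 11, 9, 0)),
]
selected = select_event_tweets(
    sample, "earthquake", box,
    start=datetime(2012, 11, 10, 12, 8), end=datetime(2012, 11, 10, 16, 5),
)
```

Only the first sample tweet survives: the second lacks the keyword, the third falls outside the box, and the fourth is outside the time window. Note that a plain substring match also catches variants such as "earthquakes".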

This area includes several cities, such as Louisville and Lexington in Kentucky and Knoxville, TN, as well as many more rural areas. As much of our earlier work has shown, population centers typically have more online activity simply by virtue of their population size, so it was important to look beyond the raw counts of earthquake-related tweets. To normalize the data, we therefore also collected a 1% sample of all geotagged tweets from October within the same area. This sample totaled 30,699 tweets, which we used to normalize the earthquake tweets and construct a location quotient in exactly the same way as in the racist tweet analysis [1]. We again aggregated individual tweets to a larger areal unit, in this case counties.
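The county-level normalization described above can be sketched as follows, using the location quotient formula from footnote [1]. This assumes each tweet has already been assigned to a county (the point-in-polygon step is left out), and the function name and toy county labels are hypothetical.

```python
from collections import Counter


def location_quotients(event_counties, reference_counties):
    """Location quotient per county.

    Both arguments are iterables with one county label per tweet. Returns
    {county: LQ} with LQ = (E_c / E) / (R_c / R). Counties with no reference
    tweets are skipped because their LQ is undefined; an empty event set
    raises ZeroDivisionError.
    """
    event = Counter(event_counties)
    ref = Counter(reference_counties)
    n_event = sum(event.values())
    n_ref = sum(ref.values())
    # Cross-multiplied form (E_c * R) / (R_c * E) avoids chained float division.
    return {
        county: (event[county] * n_ref) / (r_c * n_event)
        for county, r_c in ref.items()
    }


# Toy example: ten event tweets and ten reference tweets in two hypothetical counties.
event = ["A"] * 6 + ["B"] * 4
ref = ["A"] * 2 + ["B"] * 8
lq = location_quotients(event, ref)  # {"A": 3.0, "B": 0.5}
```

County A holds 60% of the event tweets but only 20% of the reference tweets, so its LQ is 3; an LQ below 1, as for B, means the event is under-represented relative to normal activity.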

First and foremost, though we did not use an entirely contiguous area, our map roughly conforms to the USGS map of crowdsourced reports, generally confirming the relevance of a relatively small set of user-generated data for understanding such an event.

Second, the blue dots representing individual tweets show concentrations within the counties containing the largest cities in the search area. These include Knox Co., TN (Knoxville), Jefferson Co., KY (Louisville), Fayette Co., KY (Lexington), Madison Co., KY (Richmond), and Cabell Co., WV (Huntington). None of these localities is particularly close to the epicenter of the quake in eastern Kentucky; the concentrations are more likely a product of these cities' larger populations (which increase the likelihood that Twitter users would feel the quake and take to Twitter to report it), as well as their importance as regional centers with close social and economic connections to eastern Kentucky.

Third, and interestingly enough, there were only six counties with more earthquake tweets than tweets in the 1% October sample [2]. Leading this group is Letcher County, where the earthquake epicenter was located. Letcher County also has a location quotient of nearly 100, meaning that its share of all earthquake tweets was nearly 100 times its share of the October reference tweets. Each of the other counties, though it had many fewer tweets in both the earthquake and reference datasets, is also located in close proximity to Letcher County and the epicenter of the earthquake. These include Bath Co., KY, Leslie Co., KY, Polk Co., TN, Johnson Co., TN and Rockingham Co., VA.
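Picking out counties like these amounts to comparing raw counts per county and then ranking the survivors by location quotient. A minimal sketch, again assuming tweets are already labeled by county; the function name and the four-county data are hypothetical.

```python
import math
from collections import Counter


def counties_exceeding_reference(event_counties, reference_counties):
    """Return (county, E_c, R_c, LQ) for every county with more event tweets
    than reference tweets, sorted by LQ from highest to lowest.

    A county with event tweets but no reference tweets has an undefined LQ;
    it is reported as math.inf here so that it sorts to the top.
    """
    event = Counter(event_counties)
    ref = Counter(reference_counties)
    n_event, n_ref = sum(event.values()), sum(ref.values())
    rows = []
    for county, e_c in event.items():
        r_c = ref[county]  # Counter returns 0 for missing keys
        if e_c > r_c:
            lq = (e_c * n_ref) / (r_c * n_event) if r_c else math.inf
            rows.append((county, e_c, r_c, lq))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Made-up counts for four hypothetical counties.
event = ["A"] * 5 + ["B"] * 3 + ["C"] * 2 + ["D"]
ref = ["A"] + ["B"] * 10 + ["C"]
rows = counties_exceeding_reference(event, ref)
# County order: D (LQ undefined, shown as inf), A (about 5.45), C (about 2.18); B is excluded.
```

Because the reference set is only a 1% sample, "more event tweets than reference tweets" is a fairly low bar in counties with little Twitter activity, which is why the location quotient, not the raw comparison, does the ranking.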

We can also look at patterns of tweets without aggregating to an administrative unit. In this case, we estimate the intensity of the earthquake tweet pattern across the region (again normalized by what would be expected from a random sample of tweets) using Gaussian kernel smoothing. Interestingly, the 'epicenter' of earthquake tweets is only 6.7 miles from the actual epicenter of the earthquake (indicated by the red star). Not coincidentally, the center of intensity on our tweet map is located in the nearby town of Hazard, KY, which has a higher population density (and therefore more Twitter users) than the more rural town of Whitesburg, near which the USGS located the epicenter.
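The post does not spell out how the smoothed surface was computed, so the following is only one plausible sketch: the ratio of two Gaussian kernel density surfaces (earthquake tweets over reference tweets) evaluated on a latitude/longitude grid, with the peak of that ratio treated as the tweet "epicenter" and its offset from the real epicenter measured with the haversine formula. The equirectangular projection, the 30 km bandwidth, the grid spacing and the small floor on the denominator are all assumptions, the function names are hypothetical, and the data in the usage lines are synthetic.

```python
import math

EARTH_RADIUS_KM = 6371.0
EARTH_RADIUS_MILES = 3958.8


def to_planar_km(lat, lon, lat0):
    """Equirectangular projection centred on latitude lat0 (x east, y north, km).
    A reasonable approximation over a region a few hundred km across."""
    x = math.radians(lon) * EARTH_RADIUS_KM * math.cos(math.radians(lat0))
    y = math.radians(lat) * EARTH_RADIUS_KM
    return x, y


def kernel_density(points_km, x, y, bandwidth_km):
    """Mean of isotropic Gaussian kernels centred on each point, evaluated at (x, y).
    The constant 1 / (2*pi*h^2) is omitted because it cancels in the ratio below."""
    two_h2 = 2.0 * bandwidth_km ** 2
    total = sum(math.exp(-((px - x) ** 2 + (py - y) ** 2) / two_h2)
                for px, py in points_km)
    return total / len(points_km)


def peak_of_smoothed_ratio(event_pts, ref_pts, grid_lats, grid_lons,
                           bandwidth_km, floor=1e-12):
    """Evaluate event density / reference density at every grid cell and return
    ((lat, lon), ratio) for the cell where the ratio is largest."""
    lat0 = sum(lat for lat, _ in ref_pts) / len(ref_pts)
    ev = [to_planar_km(lat, lon, lat0) for lat, lon in event_pts]
    rf = [to_planar_km(lat, lon, lat0) for lat, lon in ref_pts]
    best_cell, best_ratio = None, -1.0
    for lat in grid_lats:
        for lon in grid_lons:
            x, y = to_planar_km(lat, lon, lat0)
            ratio = (kernel_density(ev, x, y, bandwidth_km)
                     / max(kernel_density(rf, x, y, bandwidth_km), floor))
            if ratio > best_ratio:
                best_cell, best_ratio = (lat, lon), ratio
    return best_cell, best_ratio


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Synthetic check: evenly spread reference tweets, event tweets clustered at (37.0, -83.5).
ref_pts = [(36.0 + 0.25 * i, -85.0 + 0.25 * j) for i in range(9) for j in range(13)]
event_pts = [(37.0, -83.5), (37.01, -83.5), (36.99, -83.5), (37.0, -83.49), (37.0, -83.51)]
grid_lats = [36.0 + 0.1 * i for i in range(21)]
grid_lons = [-85.0 + 0.1 * j for j in range(31)]
(peak_lat, peak_lon), _ = peak_of_smoothed_ratio(
    event_pts, ref_pts, grid_lats, grid_lons, bandwidth_km=30.0)
offset = haversine_miles(peak_lat, peak_lon, 37.0, -83.5)  # made-up "true epicenter"
```

The bandwidth matters a great deal: too small and the surface simply reproduces individual tweets, too large and the peak is smeared across several counties. The floor on the reference density guards against dividing by (near) zero far from any reference tweets.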

Ultimately, these results are not particularly surprising. The higher location quotients near the epicenter reflect the highly localized nature of a phenomenon like reporting an earthquake, while the clustering of earthquake tweets in cities quite distant from the quake's origin reflects the essentially networked, internet-mediated nature of such phenomena.

From a methodological standpoint, this example shows that the fairly simple technique of calculating location quotients, or the more involved technique of Gaussian kernel smoothing, can be a powerful way of uncovering the spatial dimensions of online reflections of essentially offline phenomena.

We hope that this example, which uses about the same number of tweets as our racist tweets map (particularly relative to the number of administrative units), will help alleviate some of the methodological concerns raised in our previous post.
---------
[1] The equation used to calculate the location quotient is as follows:

\[ \mathrm{LQ}_c = \frac{E_c / E}{R_c / R} \]

where E_c is the number of tweets referencing "earthquake" in county c, E is the total number of tweets referencing "earthquake", R_c is the number of reference tweets in county c, and R is the total number of reference tweets.
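As a purely hypothetical illustration using the two totals reported above (795 earthquake tweets and 30,699 reference tweets), a county with 3 earthquake tweets and 2 reference tweets would have:

```latex
\[
\mathrm{LQ} = \frac{3 / 795}{2 / 30{,}699}
            = \frac{3 \times 30{,}699}{2 \times 795}
            = \frac{92{,}097}{1{,}590}
            \approx 57.9
\]
```

An LQ of 1 means a county's share of earthquake tweets matches its share of ordinary October tweets; values above 1 mean the earthquake was over-represented there.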

[2] We should note that this does not mean there were more earthquake-related tweets in the given period on Saturday than total tweets in the entire month of October. Rather, because the reference set is only a 1% sample, this comparison simply indicates how many earthquake-related tweets there were relative to the expected amount of content in that place.