The Guardian just published a short post by Mark that looks at the discourses surrounding 'big data.'
In it he argues that:
Gender, geography, race, income, and a range of other social and economic factors all play a role in how information is produced and reproduced. People from different places and different backgrounds tend to produce different sorts of information. And so we risk ignoring a lot of important nuance if relying on big data as a social/economic/political mirror.
We can of course account for such bias by segmenting our data. Take the case of using Twitter to gain insights into last summer's London riots. About a third of all UK Internet users have a Twitter profile; a subset of that group are the active tweeters who produce the bulk of content; and then a tiny subset of that group (about 1%) geocode their tweets (essential information if you want to know where your information is coming from).
Despite the fact that we have a database of tens of millions of data points, we are necessarily working with subsets of subsets of subsets. Big data no longer seems so big. Such data thus serves to amplify the information produced by a small minority (a point repeatedly made by UCL's Muki Haklay), and skew, or even render invisible, ideas, trends, people, and patterns that aren't mirrored or represented in the datasets that we work with.
Big data is undoubtedly useful for addressing and overcoming many important issues faced by society. But we need to ensure that we aren't seduced by the promises of big data to render theory unnecessary.
We may one day get to the point where sufficient quantities of big data can be harvested to answer all of the social questions that most concern us. I doubt it though. There will always be digital divides; always be uneven data shadows; and always be biases in how information and technology are used and produced.
And so we shouldn't forget the important role of specialists to contextualise and offer insights into what our data do and, maybe more importantly, don't tell us.
You can check out the full piece here.
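To make the "subsets of subsets of subsets" point concrete, here is a rough sketch of the arithmetic. Only two of the figures come from the excerpt above ("about a third" and "about 1%"); the starting population and the share of active tweeters are hypothetical placeholders, since the piece gives no numbers for them, and the function name `funnel` is just an illustrative choice.

```python
def funnel(total, shares):
    """Apply successive retention fractions and return the count after each stage."""
    counts = [total]
    for share in shares:
        counts.append(counts[-1] * share)
    return counts


# Illustrative inputs only; not real measurements.
internet_users = 1_000_000  # hypothetical population, NOT an actual UK figure
has_profile = 1 / 3         # "about a third" (from the excerpt)
active_share = 0.25         # assumed; the excerpt gives no figure for active tweeters
geocoded_share = 0.01       # "about 1%" (from the excerpt)

labels = ["internet users", "with a Twitter profile", "active tweeters", "geocoding tweets"]
stages = funnel(internet_users, [has_profile, active_share, geocoded_share])

for label, count in zip(labels, stages):
    print(f"{label:>24}: {count:>12,.0f}  ({count / internet_users:.4%} of the starting group)")
```

Whatever value is plugged in for the assumed middle step, the final group is a fraction of a percent of the starting population, which is the sense in which big data "no longer seems so big."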