The map below is a visualisation of references to places within 423,846 biography articles in the English version of Wikipedia. The key terms (references to places and biography articles) and the methodology used to obtain these data are discussed in more detail below.
Now compare this map to the map of actual population density below.
Source: Wikimedia Commons
The differences are quite astonishing. What one sees is that articles about people in Wikipedia are highly likely to reference particular parts of the world (the US and Western Europe). This is a geography of people that is in no way reflective of the actual distribution of population on our planet.
Of course, because the data only include biography articles in the English version of Wikipedia, they are biased towards English-speaking countries. This fact helps explain the concentration of articles that reference the US and the UK. However, language alone does not explain why countries where English is widely used (e.g. India) have a smaller presence than non-English-speaking countries in Western Europe.
Most importantly, it is clear that Wikipedia has not yet attained its goal of storing "the sum of all human knowledge." Wikipedia guidelines specify that biographies should only be about notable people, and this map suggests that there are more notable people in Europe and North America (at least in the eyes of Wikipedians). Not to knock our home continents, but it seems likely (especially after looking at some of the people deemed "notable") that Wikipedia is simply reflecting its user base, who are disproportionately from these places.
In any case it shows that there are likely still a lot of possibilities out there for new Wikipedia articles (despite claims that Wikipedians are running out of new topics to write about).
And in the big picture it again raises questions about who participates in online discussions and what is discussed and documented in these conversations.
The data used to create these maps were collected by Adrian Popescu and are available here for anyone interested in playing with them. The data were actually collected through a rather complicated process that we'll explain below.
First of all, we need to define biography articles; basically, any article about a person in Wikipedia (e.g. Angela Merkel, Ron Jeremy or Gary Brolsma). A list of biographies was created using data harvested from the list of occupations.
We then geolocated each biography article. This was done by counting the number of references to place names in each person's biography and then mapping only the most mentioned place in each article. Place names were ranked using not only the English version of the article but also its equivalents in other languages, for up to seven in total (English, German, French, Dutch, Spanish, Italian and Portuguese). The thinking behind this method of ranking is simple: the more language versions of an article mention a given location, the more relevant that location is to the concept. We have, however, also done some analysis with the 2nd, 3rd etc. most mentioned places in each article and will be publishing a post on this work soon, along with analysis of Wikipedia data by century and the geography of specific occupations (e.g. artists, politicians and footballers) within the encyclopaedia.
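For anyone who wants to play with the idea, here is a minimal Python sketch of the ranking step described above. It is only an illustration under our own assumptions, not the actual pipeline used to build the data: the input format, the function names rank_places and top_place, and the tie-breaking rule (total mentions, then alphabetical order) are all hypothetical, and the example place lists are invented rather than taken from any real article.

```python
from collections import Counter


def rank_places(mentions_by_language):
    """Rank place names for one biography article.

    mentions_by_language maps a language code to the list of place names
    found in that language version (a hypothetical input format).
    """
    versions = Counter()  # how many language versions mention each place
    totals = Counter()    # total mentions summed over all versions
    for places in mentions_by_language.values():
        totals.update(places)
        versions.update(set(places))  # count each place once per version
    # Assumed ordering: more versions first, then more total mentions,
    # then alphabetical order so the result is deterministic.
    return sorted(versions, key=lambda p: (-versions[p], -totals[p], p))


def top_place(mentions_by_language):
    """Return the single most mentioned place, or None if there are none."""
    ranked = rank_places(mentions_by_language)
    return ranked[0] if ranked else None


# Invented example data for a single biography in three language versions.
example = {
    "en": ["Hamburg", "Berlin", "Berlin", "Leipzig"],
    "de": ["Berlin", "Hamburg", "Templin"],
    "fr": ["Berlin"],
}

print(top_place(example))
print(rank_places(example))
```

Only the first entry of the ranking would end up on a map like the one above; the rest of the list corresponds to the 2nd, 3rd etc. most mentioned places.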
It is clear that this method favors European locations at the expense of places in the rest of the world. Japanese and Arabic Wikipedias, for example, probably have a very different geography (something we are also working on mapping). The fact remains though, that the English language Wikipedia offers us a very particular worldview rather than access to "the sum of all human knowledge" (for the time being at least).
Hmmm....that reminds us, we should start up a Floatingsheep page at Wikipedia some time soon.
Adrian's analysis of Wikipedia: Adrian Popescu and Gregory Grefenstette, "Spatiotemporal Mapping of Wikipedia Concepts," JCDL 2010, June 21-25, Brisbane, Australia
...and some of our previous work on mapping Wikipedia here.