November 14, 2011

Mapping Wikipedia Globally

Wikipedia is an incredibly impressive coming-together of human labour on a scale that the world rarely sees. Over the last few years, we've also seen a few maps of the encyclopedia (including some work on this blog) which have shown that the project is far from complete (whatever that might mean).

That doesn't mean we should stop mapping the project, though. As part of a multi-year project to study Wikipedia in the Middle East, North Africa, and East Africa, we present a global-scale map of every geotagged article in the November 2011 version of the English Wikipedia. The English encyclopedia is by far the largest, and currently hosts almost 700,000 geotagged articles:

[Map: every geotagged article in the English Wikipedia, November 2011]

Each one of these yellow dots represents human effort that has gone into describing some aspect of a place. The density of this layer of information over some parts of the world is astounding. Some of our future posts will look more closely at measures of inequality in Wikipedia, but it is still hard not to be awed by this cloud of information about hundreds of thousands of events and places around the globe.

We can also compare the English Wikipedia to the Arabic, French, Hebrew, Persian, and Swahili versions (these languages were chosen because they are the subject of the research project mentioned above):

[Map: geotagged articles in the Arabic, English, French, Hebrew, Persian, and Swahili Wikipedias]

This map should be interpreted with caution for a few reasons. First, it displays content from only six Wikipedias (there are currently 282 of them). Second, many articles in different languages appear in the same place, because they describe the same feature, event, or place, albeit in different languages. When we map those features, the dots for each language land in exactly the same spot, so many dots overlap, and dots for languages higher up in the legend necessarily end up on top of the others.
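
To make the overlap problem concrete, here is a minimal Python sketch. It assumes that each language is drawn as one layer, in a fixed order, and that a later layer completely covers an earlier one wherever two articles share exactly the same coordinates. The function names and the coordinates are hypothetical and made up for illustration; this is not the data or the code behind these maps.

```python
# Hypothetical sketch: which language's dot stays visible when layers overlap?
# Assumes each language is drawn as one layer, in draw_order, and that a
# later layer fully covers an earlier one at exactly the same coordinate.

def visible_layer(points_by_lang, draw_order):
    """Map each (lat, lon) to the language whose dot ends up on top."""
    top = {}
    for lang in draw_order:
        for coord in points_by_lang.get(lang, []):
            top[coord] = lang  # a later layer overwrites an earlier one
    return top


def hidden_counts(points_by_lang, draw_order):
    """Count, per language, the dots completely covered by a later layer."""
    top = visible_layer(points_by_lang, draw_order)
    return {
        lang: sum(1 for coord in set(points_by_lang.get(lang, [])) if top[coord] != lang)
        for lang in draw_order
    }


# Made-up coordinates: two of the three places are also described in another language.
points = {
    "en": [(35.7, 51.4), (46.8, -71.2), (29.8, -95.4)],
    "fa": [(35.7, 51.4)],
    "fr": [(46.8, -71.2)],
}

print(hidden_counts(points, ["en", "fa", "fr"]))  # {'en': 2, 'fa': 0, 'fr': 0}
print(hidden_counts(points, ["fr", "fa", "en"]))  # {'fr': 1, 'fa': 1, 'en': 0}
```

Drawing the largest layer first, so that it sits at the bottom, keeps the rarer languages' dots visible; the cost is that the bottom layer's dots vanish wherever another language shares the exact spot, so any draw order hides something.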

The map is still useful for showing the different geographical foci of different linguistic groups. In Iran, for instance, there are more articles in Persian than in any other language in our sample. We see more articles about Quebec and parts of North Africa in French, and a complicated mix of Arabic, Hebrew, English, and French in the Levant.

Nonetheless, there are far more English-language articles than articles in any other language. As a result, if your primary free source of information about the world is the Persian, Arabic, or Hebrew Wikipedia, the world inevitably looks very different to you than if you were accessing knowledge through the English Wikipedia. There are far more absences, and many parts of the world simply don't exist in the representations available to you.


  1. I am wondering why there seem to be a lot of Persian articles in the Texas area.

  2. It seems to me that the English language articles should be plotted on the bottom to not obscure the less-frequent dots of the other colors...

  3. Also, can we someday get an interactive map where the dots link to the articles? I'd like to know what the dots scattered throughout the world's oceans are... they're not all islands, are they?

  4. Great work, but why not bind the points to country (or lower) level boundaries and compare the number of articles? I did this on a national level for three languages (German, Portuguese, Spanish) for the same data dumps, and I think it makes the pattern of self-bias very clear. It doesn't look nearly as cool as your map, however.
    Keep up the great work.

