February 03, 2011

Wikipedia Demographics

We've written a fair amount about the geographic and linguistic clusters of Wikipedia authors but were reminded today (via the New York Times "Room for Debate" forum) that there are plenty of other clusters along social and economic dimensions. A survey of Wikipedia users conducted last year highlights some interesting fissures within the user group.

One of the most provocative findings (and the one highlighted by the New York Times forum) is that less than 15 percent of the regular contributors to Wikipedia are women. This really grabs one's attention, but a closer look at the data report (see also here and here) makes us wonder whether this figure accurately reflects the Wikipedia community. Some of our questions are:

  • What was the sampling method used? Nothing is listed in the reports.
  • What is the bias in the sample? For example, Russia and Russian speakers are the largest country and language groups represented in the survey, even though the Russian section of Wikipedia is only the 8th largest linguistic group (English, German, French, Italian, Polish, Japanese and Spanish are all larger).
  • Did women have a lower participation rate than men in the survey? There were three times as many male respondents as female respondents. Does this accurately reflect the makeup of the Wikipedia audience? Given the unexpected results for language and country, it is not clear whether there might be gender bias as well.
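
To make the sampling concern concrete, here is a minimal sketch of how over-representing one group in a sample can shift a headline percentage. Every number below is invented purely for illustration; none of it comes from the Wikipedia survey. The group names, the field names and the two helper functions are our own assumptions. The sketch compares the raw share of women among respondents with a post-stratified share that reweights each group to its (hypothetical) true size.

```python
# All figures are HYPOTHETICAL and chosen only to illustrate the arithmetic.
# They are not taken from the Wikipedia survey discussed in this post.
HYPOTHETICAL_GROUPS = {
    # Group A is over-represented: 60% of respondents, 20% of the population.
    "group_a": {"respondents": 600, "share_women": 0.10, "population_share": 0.20},
    # Group B is under-represented: 40% of respondents, 80% of the population.
    "group_b": {"respondents": 400, "share_women": 0.20, "population_share": 0.80},
}


def raw_share(groups):
    """Share of women among all respondents, ignoring who was over-sampled."""
    total = sum(g["respondents"] for g in groups.values())
    women = sum(g["respondents"] * g["share_women"] for g in groups.values())
    return women / total


def poststratified_share(groups):
    """Share of women after reweighting each group to its population share."""
    total_share = sum(g["population_share"] for g in groups.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(g["population_share"] * g["share_women"] for g in groups.values())


if __name__ == "__main__":
    print(f"raw estimate:             {raw_share(HYPOTHETICAL_GROUPS):.2f}")
    print(f"post-stratified estimate: {poststratified_share(HYPOTHETICAL_GROUPS):.2f}")
```

With these made-up inputs the raw estimate is 0.14 and the reweighted estimate is 0.18. The point is only that the two can differ when a group with an atypical gender balance is over-sampled; a correction like this is possible only if the true group sizes are known, which is exactly the information the reports do not give us.
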
All this said, we find the question of an imbalance in gender participation very intriguing and important. We just don't know if the survey methods used are such that we can be confident in the magnitude of the highlighted differences. Anyone who can shed some light on this would be more than welcome to comment.


  1. The methodology question is huge. Without understanding how they arrived at their conclusions, you almost can't trust the results. I remember that at one point they did a similar study, and my recollection is that they used surveys to arrive at their conclusions. If that's the case, they likely had self-selection problems that would need to be controlled for.

    To a certain degree, I wonder if they couldn't almost use a methodology like I used at http://ozziesport.com/2010/10/expanded-profile-of-australian-en-wp-users/ to begin to derive an answer, as such a methodology involves the use of public data and would involve getting repeatable results.

  2. This is a fascinating debate about gender and contributions to the internet. While the methods are questionable, I think the results are plausible. What are the demographics of floatingsheep contributors? How often are women represented in contributions to online arguments on other sites or listservs? Perhaps women just don’t feel the need for affirmation of our intellect through responding to open-source calls for information….

  3. Had an Andy Gray thought, then pushed it away. There are tons of other sites where most of the community is female ... my guess is that they contribute to other open-source projects -

  4. Laura - thanks for the link to the method. It seems robust, especially given the limits of Wikipedia data. I just wish we had a better sense of how the "official" survey worked.

    Monica - point well taken! If only the FS collective could hold off on our need for affirmation!

    OLLI - I think it is a good question. I'd love to see more work like Laura's on these kinds of questions.

  5. It does help in a very definitive way to have such survey results circulated. They provide a fine insight into the workings that take place.


