Showing posts with label density. Show all posts

May 11, 2011

The Density of GeoNames

GeoNames is a crowdsourced database of geographical names for places: essentially a crowdsourced gazetteer that is available for free. It is a very cool resource, but what really attracted our attention was the map of the density of the features in the database. The dark spots in the map below show places that have very few place names. These are largely uninhabited areas.

But GeoNames also notes that "India (0.0114) and China (0.0113) have surprisingly few GeoNames features compared to their high population density."

Yet another example of the varying density of geo-tagged information on the Internet.




April 12, 2010

Mapping Wikipedia biographies by region

Following on from our last post on mapping Wikipedia biographies, here is a quick overview of biography articles visualised by country. Again, Wikipedia biographies are intended to be entries about notable people; you can read more about this definition in our previous post.

The US comes out on top of the list here, with more than half of all biography articles prominently mentioning the country or a location within the country. At the other end of the scale, there are about fifty countries and territories (examples include Bhutan, Gambia and the Maldives) which are prominently mentioned in fewer than ten biographies. It is also interesting to examine the data normalised by the populations of each country (see below).

At the top of the list is the Vatican City. There are 242 articles in which the Vatican City is prominently mentioned for every 1000 people in the country. This isn't really that surprising given the tiny population of the country and the long list of famous people associated with the Vatican. If we remove city-states and island nations from the list, we see that Iceland, Ireland and the UK have the most per capita mentions in Wikipedia biographies (all have roughly three articles in which the country is prominently mentioned for every 1000 people in each country). Also interesting is the fact that Australia, Canada and all of the Nordic countries have very high numbers of Wikipedia articles mentioning those places for every 1000 people in those countries.

At the other end of the list we have Guinea, which has only two mentions in Wikipedia biographies for every one million people in the country (most African countries have similar per capita numbers). China also has an extremely low number of biography articles per capita, although this owes much to the enormous population of China rather than a low total number of mentions in Wikipedia biographies.
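
For readers who want to reproduce this kind of normalisation, here is a minimal sketch in Python. The function name per_capita_rate and the numbers in the usage lines are hypothetical illustrations, not values from the dataset; the only thing taken from the post is the idea of expressing mentions per 1000 (or per one million) people.

```python
def per_capita_rate(mentions: int, population: int, per: int = 1000) -> float:
    """Return the number of biography mentions per `per` inhabitants.

    `mentions` and `population` are whatever counts you have for a country;
    this helper is a hypothetical illustration, not the original tooling.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep round numbers exact.
    return mentions * per / population


# Hypothetical country: 300 mentions, 100,000 inhabitants.
rate_per_thousand = per_capita_rate(300, 100_000)

# The same helper expressed per one million people (hypothetical counts).
rate_per_million = per_capita_rate(2, 1_000_000, per=1_000_000)
```

Switching the per argument is all that is needed to move between the "per 1000" figures used for the top of the list and the "per one million" figure used for the bottom.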

These figures are likely skewed by the fact that we are measuring biographies of people from all centuries. This would seem to give higher numbers of articles to parts of the world with thousands of years of recorded history (e.g. the Vatican) rather than places with a smaller amount of recorded history (e.g. New Zealand). That being said, it is interesting that many of the countries in the "New World" (e.g. the US and Canada) are far better represented in Wikipedia than countries like Greece, Iran and Egypt which are home to some of the earliest recorded histories of humankind.

We nonetheless plan to post some more detailed versions of the maps by century in the near future.



See also:

Our post on Mapping Wikipedia Biographies for the methodology that we used to create these maps.

The data used to create these maps were collected by Adrian Popescu and are available here.


April 09, 2010

Mapping Wikipedia Biographies

The map below is a visualisation of references to places within 423,846 biography articles in the English version of Wikipedia. The definition of these bolded terms and the methodology used to obtain these data are discussed in more detail below.


Now compare this map to the below map of actual population density.


The differences are quite astonishing. What one sees is that articles about people in Wikipedia are highly likely to reference particular parts of the world (the US and Western Europe). This is a geography of people that is in no way reflective of the actual distribution of population on our planet.

Of course, because the data only includes biography articles in the English version of Wikipedia it is biased towards English speaking countries. This fact helps explain the concentration of articles that reference the US and the UK. However, language alone does not explain why countries where English is widely used (e.g. India) have a smaller presence than non-English speaking countries in Western Europe.

Most importantly, it is clear that Wikipedia has not yet attained its goal of storing "the sum of all human knowledge." Wikipedia guidelines specify that biographies should only be about notable people and this map suggests that there are more notable people in Europe and North America (at least in the eyes of Wikipedians). Not to knock our home continents, but it seems likely (especially after looking at some of the people deemed "notable") that Wikipedia is simply reflecting its user base, who are disproportionately from these places.

In any case it shows that there are likely still a lot of possibilities out there for new Wikipedia articles (despite claims that Wikipedians are running out of new topics to write about).

And in the big picture it again raises questions about who participates in online discussions and what is discussed and documented in these conversations.

The data used to create these maps were collected by Adrian Popescu and are available here for anyone interested in playing with them. The data were actually collected through a rather complicated process that we'll explain below.

First of all, we need to define biography articles; basically, any article about a person in Wikipedia (e.g. Angela Merkel, Ron Jeremy or Gary Brolsma). A list of biographies was created using data harvested from the list of occupations.

We then geolocated each biography article. This was done by counting the number of references to place names in each person's biography and then mapping only the most mentioned place in each article. Ranking of placenames was conducted not only using the English version of the article, but also using the equivalent in up to seven languages (English, German, French, Dutch, Spanish, Italian and Portuguese). The thinking behind this method of ranking is simple: the more article versions mention a given location, the more relevant that location is for the concept. We have, however, also done some analysis with the 2nd, 3rd etc. most mentioned places in each article and will be publishing a post on this work soon (along with analysis of Wikipedia data by century and the geography of specific occupations (e.g. artists, politicians and footballers) within the encyclopaedia).
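
The ranking idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to build the dataset: the function name top_place, the input layout (a dictionary of per-language place counts) and the tie-breaking rule (language versions first, then total mentions, then name) are all our own guesses at one reasonable reading of the description above.

```python
from collections import Counter
from typing import Dict, Optional


def top_place(mentions_by_language: Dict[str, Dict[str, int]]) -> Optional[str]:
    """Pick the single place to map for one biography.

    `mentions_by_language` maps a language code to {place name: mention count}
    for that language version of the article (hypothetical input format).
    """
    versions = Counter()  # how many language versions mention each place
    totals = Counter()    # total mentions of each place across versions
    for counts in mentions_by_language.values():
        for place, n in counts.items():
            if n > 0:
                versions[place] += 1
                totals[place] += n
    if not versions:
        return None
    # Assumed ordering: most versions, then most mentions, then name.
    return min(versions, key=lambda p: (-versions[p], -totals[p], p))


# Hypothetical counts for one article in three language versions.
example = {
    "en": {"Berlin": 5, "Hamburg": 7},
    "de": {"Berlin": 4},
    "fr": {"Berlin": 1},
}
best = top_place(example)
```

In the hypothetical example, Hamburg has the most mentions in the English version alone, but Berlin appears in all three versions, so Berlin is the place that would be mapped under this reading of the method.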

It is clear that this method favors European locations at the expense of places in the rest of the world. Japanese and Arabic Wikipedias, for example, probably have a very different geography (something we are also working on mapping). The fact remains though, that the English language Wikipedia offers us a very particular worldview rather than access to "the sum of all human knowledge" (for the time being at least).

Hmmm....that reminds us, we should start up a Floatingsheep page at Wikipedia some time soon.

See also:

Adrian's analysis of Wikipedia: Adrian Popescu, Gregory Grefenstette Spatiotemporal Mapping of Wikipedia Concepts, JCDL 2010, June 21 - 25, Brisbane, Australia

...and some of our previous work on mapping Wikipedia here.

July 10, 2009

The Virtual ‘Bible Belt’

The size of the dots in this map represents the relative number of mentions of the word “church” in placemarks uploaded to Google. Results for the word “church” have been divided by the "0" and "1" baseline measure (see the last two blog posts), thus highlighting the parts of North America in which mentions of the word “church” are over- and under-represented. Interestingly, while the “bible belt” in the physical world is often talked about as being synonymous with the American South, the virtual “bible belt” additionally incorporates large parts of the Midwest. Less surprising is the fact that the Northeast and the West have relatively low scores. The GeoWeb is in many ways a mirror (albeit a distorted one) of the physical places that it represents.
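
The normalisation step described here is just a ratio of keyword hits to baseline hits at each location. A minimal Python sketch, assuming both sets of counts are held in dictionaries keyed by place, might look like the following; the function name keyword_ratio and the sample counts are hypothetical and only illustrate the arithmetic.

```python
from typing import Dict, Hashable


def keyword_ratio(
    keyword_hits: Dict[Hashable, int],
    baseline_hits: Dict[Hashable, int],
) -> Dict[Hashable, float]:
    """Divide keyword hits by the "0"/"1" baseline hits for each place.

    Places with no baseline hits are skipped, since the ratio is undefined there.
    """
    ratios = {}
    for place, base in baseline_hits.items():
        if base > 0:
            ratios[place] = keyword_hits.get(place, 0) / base
    return ratios


# Hypothetical counts for three grid points.
church = {"A": 50, "B": 10}
baseline = {"A": 200, "B": 400, "C": 0}
ratios = keyword_ratio(church, baseline)
```

A place with many placemarks overall but few "church" mentions gets a low ratio, which is what lets the map show over- and under-representation rather than raw volume.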

June 22, 2009

Information Inequality

Following on from the last post, here are some examples of Google placemark inequality:

First of all, China offers perhaps one of the most striking examples of regional disparities. Beijing, Shanghai, and the Pearl River Delta Region are all characterized by heavy information densities. In other words, a lot of information has been created and uploaded about these places. However, much of the rest of the country has very little cyber-presence within the Google Geoweb. In the map below, the height of each bar is an indicator of the number of placemarks in each location.


The U.S.-Mexico border along the Rio Grande river offers a similarly striking contrast between high and low information densities.


The border between North and South Korea offers another example of placemark density not being correlated with population density. For obvious reasons, very little information is being created and uploaded about North Korea. In the map below (top), each dot represents 100+ placemarks. Interestingly, there are strong similarities between the map of placemarks on the Korean Peninsula and satellite maps of lights visible from the Peninsula at night (bottom).


image source: globalsecurity.org

Information inequalities are clearly a defining characteristic of the Geoweb. Some places are highly visible, while others remain a virtual terra incognita. In particular, Africa, South America, and large parts of Asia are being left out of the flurry of mapping that is happening online (e.g. the Tokyo/Yokohama metro region has almost three times as many 0/1 placemark hits (923,034) as the entire continent of Africa (311,770)). Some of the geographical implications of cyber-visibility and invisibility have been examined in part (e.g. here and here), but there is clearly a lot more to be discussed. In particular, because Google allows any keyword to be searched for (not only "0" and "1"), we are able to explore not only the raw amounts of information attached to each place, but also the contents of that information.

June 15, 2009

Global Placemark Intensity

The following map shows the intensity of Google placemarks on a global scale. Using custom-designed software, a dataset was created based on a 1/4 degree grid of all the land mass in the world (roughly 250,000 points). For each point a search was run on the numbers “0” and “1” in order to create a baseline measure of the amount of online geo-referenced content in each place. In the map below, every place with more than 100 placemarks is highlighted with a yellow dot.
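
To give a feel for the grid and the threshold, here is a small Python sketch. It is not the custom software mentioned above: the names quarter_degree_grid and highlight are made up for illustration, and the is_land test is an assumed placeholder that a real version would back with a land/sea mask. The search step itself is left out, so the hit counts passed to highlight are hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def quarter_degree_grid(
    is_land: Callable[[float, float], bool], step: float = 0.25
) -> Iterator[Point]:
    """Yield grid points spaced `step` degrees apart that pass `is_land`."""
    n_lat = int(round(180 / step))
    n_lon = int(round(360 / step))
    for i in range(n_lat + 1):       # -90 .. 90 inclusive
        lat = -90 + i * step
        for j in range(n_lon):       # -180 .. just below 180
            lon = -180 + j * step
            if is_land(lat, lon):
                yield (lat, lon)


def highlight(hits: Dict[Point, int], threshold: int = 100) -> List[Point]:
    """Return the points whose placemark count exceeds the threshold."""
    return [point for point, count in hits.items() if count > threshold]


# Toy "land mask": a one-degree box, standing in for real land data.
def toy_box(lat: float, lon: float) -> bool:
    return 0 <= lat <= 1 and 0 <= lon <= 1


points = list(quarter_degree_grid(toy_box))
dots = highlight({(0.0, 0.0): 150, (0.25, 0.0): 100})
```

Raising the threshold argument from 100 to 1000 gives the second kind of map described below, using the same grid and the same counts.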


The same method was used to create a map that highlights every place on the globe containing more than 1000 placemark hits:


When compared to a map of population density (see the map below), the distinct geographies of placemarks become apparent.

Image source: NASA

These maps suggest that the GeoWeb is far from being a simple mirror of population density or human activity. Online representations of the physical world are highly concentrated in North America, Western Europe, and the more affluent parts of East Asia and Australasia. Maps displaying placemark density on regional and local scales will be explored in more detail in the next post.