Thursday, January 21, 2010

My own review of Wikipedia's accuracy

The study in the journal Nature proclaiming Wikipedia's accuracy has been a favorite tool of Wikipedia apologists. Many have pointed out that the study was handicapped in Wikipedia's favor from the start. Since Wikipedia's accuracy, or rather its lack thereof, remains unsettled in the minds of many, more studies are needed.

That is why I have decided to do my own study of Wikipedia's accuracy. I hope that by publishing which articles I intend to use for the purpose of my study and clearly stating my methodology, others can see for themselves whether my conclusions are supported by the data or if I have just shaped the data to fit my conclusion.

My first instinct was to pick articles with the "Random article" link provided on all Wikipedia pages. The problem is that the resulting selection might not be as representative as I would like. My next idea was to use "ancient" pages, but the Ancient Pages report has not been refreshed in a long time.

So what I'm going to do is this: on the Main Page, there are eight "portals" listed: Arts, Biography, Geography, History, Mathematics, Science, Society, Technology, and also a link to a list of all the portals. A Wikipedia portal looks much like the Main Page, but all its content is dedicated to a given topic. I don't know if the eight portals I've just listed are the portals that are always listed, but they're the portals listed as of today. From each portal, I'm going to choose one of three kinds of article: an article that has barely been edited in a year; or, in the case of the "Did you know..." and "On this day..." boxes, an article that had very few edits in the year before it was chosen for those features; or an article that has been nominated for one of those features and has similarly lain unedited for a long time.

It could be argued that this is biased against Wikipedia from the start, since it would enable me to blow mistakes out of proportion by saying that they stood uncorrected for months. Maybe so. To compensate, I will completely ignore spelling and grammar errors if they have no impact on the factual accuracy of the article. For example, "Martin Luther King, Jr. was asasssinated in 1968" would be acceptable, since anyone reading that would understand what is meant regardless of whether or not they notice the misspelled word. By contrast, "The researcher was injured when a stalagmite that had hung from the ceiling of the cave for centuries fell down" would not be acceptable, because by misspelling "stalactite" a factual error has been introduced into the text. (I'm sure stalagmites can fall down, too, but it would be a very different danger.)

This is the list of articles I will use for this review:

1. From the Arts Portal: Sculptural Ensemble of Constantin Brâncuşi in Târgu Jiu (two edits in 2009, one of them by a robot).
2. From the Biography Portal: Regimental Sergeant Major (five edits in 2009, none so far this year).
3. From the Geography Portal: Tulsa Port of Catoosa (four edits in 2009, none so far this year).
4. From the History Portal: Jus exclusivae (five edits in 2009, none so far this year).
5. From the Mathematics Portal: Permutohedron (only one edit in 2009! Could it possibly be any good?)
6. From the Science Portal: Jon Lomberg (five edits in 2009, two so far this year).
7. From the Society Portal: Benjamin Franklin Burch (article created March 5 of last year, edited five more times over the next two months and not again since).
8. From the Technology Portal: GWR 1076 Class (three edits in 2009, two of which were by robots!)

These are all articles I am fairly certain I had never read prior to embarking on this review. I plan to read one of these each week and examine its factual accuracy, while abstaining from commenting on its quality as literature. For each of these I will use the version of the article as it was on the date of this post. It would be just fine if people decided to improve these articles before I got to them in my review. However, it might be possible to defeat my study by falsifying the edit histories for these articles, that is, by dating major improvements prior to the date of this post.
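
For anyone who wants to check the edit counts quoted in the list above, here is a minimal Python sketch of the tally. It is not how I actually counted (I simply read the history pages); it assumes you have already copied an article's revision timestamps into a list, written in the ISO 8601 style I understand the MediaWiki API to use. The function names, the five-edit threshold, and the sample timestamps are all made up for illustration and do not describe any of the eight articles.

```python
from datetime import datetime

# Assumed timestamp layout, e.g. "2009-03-05T09:30:00Z".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def edits_in_year(timestamps, year):
    """Count the revisions whose timestamp falls in the given calendar year."""
    return sum(
        1
        for ts in timestamps
        if datetime.strptime(ts, TIMESTAMP_FORMAT).year == year
    )


def barely_edited(timestamps, year, max_edits=5):
    """True if the article had at most max_edits revisions in that year.

    The default of 5 is a hypothetical cutoff, chosen only because no
    article in the list above had more than five edits in 2009.
    """
    return edits_in_year(timestamps, year) <= max_edits


# Hypothetical edit history, invented purely to exercise the functions.
sample_history = [
    "2008-11-02T14:05:00Z",
    "2009-03-05T09:30:00Z",
    "2009-04-17T21:12:00Z",
    "2010-01-10T08:00:00Z",
]

print(edits_in_year(sample_history, 2009))
print(barely_edited(sample_history, 2009))
```

Note that this counts every revision alike. Telling robot edits from human ones would need the editor's name for each revision as well, which this sketch does not attempt.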

