Wikipedia’s best content is mainly where its readers aren’t. For instance, the article about weddings is seen thousands of times every day, yet the community labels it “quite incomplete”, its prose “distinctly unencyclopedic”, and a call for additional sources to verify its content has been featured prominently at the top of the article for over four years. It turns out that this is not uncommon; each month Wikipedia’s articles are viewed billions of times, and over 40% of these views are to articles that would be of significantly higher quality if the encyclopaedia’s contributors followed their readers.
In our research paper, soon to appear at the 9th International AAAI Conference on Web and Social Media, we studied article popularity across four of Wikipedia’s large and successful language editions: English, French, Russian, and Portuguese. We found a huge difference between the interest of Wikipedia’s readers and where the encyclopaedia’s contributors produce high-quality content. This is, perhaps accidentally, exemplified by the English Wikipedia’s main page, shown in the image at top right. The right column (in blue) offers encyclopaedic context to current and topical news, while the left column (in green) is dominated by a Featured Article, an article that has gone through an intensive review process in which experienced Wikipedia contributors ensure that it is of the highest quality. There are around 4,500 such Featured Articles in the English Wikipedia. Yet only 218 of them (5%) were also among the most popular articles. This small overlap is visualised in the Venn diagram on the right.
You might expect to find Featured Articles on some of the more popular topics, such as Email, the Vietnam War, or the article we introduced earlier, Wedding. However, none of these articles is labelled as more than average quality. Instead, almost half of all the Featured Articles are not particularly popular, making Featured Articles about niche topics a common occurrence in the encyclopaedia. In the chart below we attempt to map out how this disconnect impacts Wikipedia’s readers. Wikipedia’s most popular articles are incredibly popular: the 4,500 most popular English articles get on average over 4,000 views/day, making them over 1,000 times more popular than the bottom two million articles, which average 3 views/day. The result is that articles of significantly lower quality than their readership suggests (the rightmost column, in purple) account for only 2.3% of all articles, but 42.7% of all article views. This amounts to 2 billion potentially-improved article views every month.
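The figures in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are ours, and the "implied total" assumes that the 2 billion monthly views correspond exactly to the 42.7% share, which the post does not state outright. Since the post gives "over 4,000" and "3" as rounded averages, the ratio is approximate.

```python
# Figures quoted in the post (rounded values, so results are approximate).
featured_total = 4500          # Featured Articles in the English Wikipedia
featured_popular = 218         # Featured Articles also among the most popular
top_views_per_day = 4000       # average for the 4,500 most popular articles
bottom_views_per_day = 3       # average for the bottom two million articles
share_of_articles = 0.023      # articles with quality below what readership suggests
share_of_views = 0.427         # share of all views going to those articles
monthly_views_affected = 2e9   # "2 billion" views every month

# Overlap between Featured Articles and the most popular articles, in percent.
overlap_pct = 100 * featured_popular / featured_total

# How many times more popular the top articles are than the bottom ones.
popularity_ratio = top_views_per_day / bottom_views_per_day

# How over-represented the under-served articles are among views.
concentration = share_of_views / share_of_articles

# Assumption: 2 billion views is exactly 42.7% of the monthly total.
implied_total_views = monthly_views_affected / share_of_views

print(f"Overlap: {overlap_pct:.1f}%")
print(f"Popularity ratio: about {popularity_ratio:.0f}x")
print(f"View share vs. article share: about {concentration:.1f}x")
print(f"Implied monthly total: about {implied_total_views / 1e9:.1f} billion views")
```

In other words, a group of articles making up about one in forty of all articles draws well over a third of all views, which is why the disconnect matters so much to readers.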
Low-popularity Featured Articles land on the opposite side of the spectrum in the diagram, Excessive Quality (leftmost, in green). As we can see from the diagram, they only account for 2.3% of all article views. In the middle column are articles where readership and quality are balanced. Two articles that fall into this category are Barack Obama, which is both high-quality and very popular, and Speech balloon, a medium-quality and medium-popularity article. We can also see that this category makes up the majority of all articles (64.2%, in parentheses below the category name), but it is dominated by the huge numbers of low-quality articles found in all four language editions (e.g. stubs, those articles which are “too short to provide encyclopedic coverage of a subject”) with correspondingly low popularity.
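To make the three columns concrete, here is a minimal sketch of how an article could be sorted into them. It is not the method used in our paper: the three-level scale, the `classify` function, and the label "needs improvement" for the rightmost column are all assumptions made for illustration, and the quality levels assigned to the example articles are simplified readings of the descriptions above.

```python
# Hypothetical three-level scale shared by quality and popularity.
LEVELS = {"low": 0, "medium": 1, "high": 2}


def classify(quality: str, popularity: str) -> str:
    """Compare an article's quality level with its popularity level."""
    gap = LEVELS[quality] - LEVELS[popularity]
    if gap < 0:
        # Quality is lower than readership suggests.
        return "needs improvement"
    if gap > 0:
        # Quality is higher than readership suggests.
        return "excessive quality"
    return "balanced"


# Illustrative (quality, popularity) pairs based on the examples in the text.
examples = {
    "Barack Obama": ("high", "high"),
    "Speech balloon": ("medium", "medium"),
    "Wedding": ("medium", "high"),
    "Low-popularity Featured Article": ("high", "low"),
    "Rarely viewed stub": ("low", "low"),
}

for title, (quality, popularity) in examples.items():
    print(f"{title}: {classify(quality, popularity)}")
```

Note that a rarely viewed stub counts as balanced under this scheme, which is why the middle column can hold the majority of articles without implying that most articles are good.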
Popularity changes over time, sometimes very quickly. In our study we found that this affects many of the most popular articles, but not the majority of them. We mentioned previously that Wikipedia’s main page devotes a column to current news, offering contextualized links. It is perhaps this column that contributed to the sudden rise in popularity of the article for the film “The World According to Garp,” as Robin Williams’ death occurred during the course of our study. As might be expected, the article for “The World According to Garp” was more than 50 times more popular the day after Williams’ death than the day before. However, a slight majority of articles considered popular (53.6%, to be exact) are always popular, that is, in a stable state of high demand. This means that Wikipedia contributors face two challenges if they want to provide quality where their readers are. The first challenge is to ensure that those always-popular articles are also of the highest quality. The second challenge is to keep track of popularity spikes such as that of “The World According to Garp” and to have contributor resources available that can quickly produce quality content (research by Brian Keegan suggests that some Wikipedia contributors are “ambulance chasers”, but a more organised effort could be necessary).
While our study argues that Wikipedia contributors should produce content where their readers are, it is worth noting that popularity does not equal encyclopaedic importance. There are other perspectives that should be considered as well, and Wikipedia contributors have several different approaches to defining what is important. One project works on creating published versions of the encyclopaedia, and has a specific set of criteria for selecting articles. Another project is working on defining what the core set of encyclopaedic articles should be. Lastly, there are WikiProjects, topic-based volunteer projects with Wikipedia contributors as members. Each of these WikiProjects has its own list of which articles are important within that topic. What our work has shown is that overall, over 40% of Wikipedia’s enormous audience is presented with content of insufficient quality, and Wikipedia’s contributors have an opportunity to make a significant impact by focusing their attention on the same things as their readers.
For more details about our study and its results, please see our paper, to be published in the proceedings of the 9th International AAAI Conference on Web and Social Media (ICWSM) and presented at the conference in Oxford, England, May 26-29, 2015.
This blog post has been written by Morten Warncke-Wang (firstname.lastname@example.org, nettrom@twitter), in collaboration with Daina Davenport-Fey.