Technology Review on Wikipedia’s decline


[Figure: the number of active editors over time on the English-language Wikipedia]

Tom Simonite at Technology Review just published a great piece covering “The Decline of Wikipedia” that cites my work (published in American Behavioral Scientist; see also the free preprint) with Geiger, Morgan, and Riedl exploring potential reasons for Wikipedia’s declining pool of editors (see figure above). In that work, we manually categorized newcomers to Wikipedia by the quality of their edits and built a set of models to predict which high-quality newcomers would continue editing and which ones would leave the project. We showed that the reason for the decline is not the quality of newcomers but rather the reception they receive: newcomers whose work is immediately rejected and who are sent warning messages about their behavior don’t come back. The dramatic change in 2007 appears to correspond to the introduction of counter-vandalism robots and automated tools in Wikipedia that were used to reject newcomers’ edits.


Similarity Functions for User-User Collaborative Filtering


Typically, user-user collaborative filtering has used Pearson correlation to compare users. Early work tried Spearman correlation and (raw) cosine similarity, but found Pearson to work better, and the issue wasn’t revisited for quite some time.

When I was revisiting some of these algorithmic decisions for the LensKit paper, I tried cosine similarity on mean-centered vectors (sometimes called ‘Adjusted Cosine’) and found it to work better (on our offline evaluation metrics) than Pearson correlation, even without any significance weighting. So now my recommendation is to use cosine similarity over mean-centered data. But why the change, and why does it work?
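To make the comparison concrete, here is a minimal sketch (not LensKit’s actual implementation) of the two similarity functions over a pair of rating vectors, with NaN marking unrated items:

```python
import numpy as np

def pearson(u, v):
    """Pearson correlation over the items both users have rated.
    u, v: rating vectors with np.nan marking unrated items."""
    both = ~np.isnan(u) & ~np.isnan(v)
    if both.sum() < 2:
        return 0.0
    uc = u[both] - u[both].mean()
    vc = v[both] - v[both].mean()
    denom = np.linalg.norm(uc) * np.linalg.norm(vc)
    return float(uc @ vc / denom) if denom > 0 else 0.0

def mean_centered_cosine(u, v):
    """Cosine similarity after subtracting each user's mean rating.
    Unrated items become 0 after centering, so the denominator covers
    each user's full rating profile, not just the co-rated items."""
    uc = np.where(np.isnan(u), 0.0, u - np.nanmean(u))
    vc = np.where(np.isnan(v), 0.0, v - np.nanmean(v))
    denom = np.linalg.norm(uc) * np.linalg.norm(vc)
    return float(uc @ vc / denom) if denom > 0 else 0.0

# Hypothetical example: two users rating five items
alice = np.array([4.0, np.nan, 3.0, 5.0, np.nan])
bob   = np.array([5.0, 2.0,    2.0, np.nan, 4.0])
print(pearson(alice, bob), mean_centered_cosine(alice, bob))
```

The structural difference is in the denominator: Pearson normalizes over just the co-rated items, while mean-centered cosine normalizes over each user’s full rating profile, which tends to shrink the similarity of users with little overlap, an effect much like significance weighting.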


GroupLens to have five papers at CSCW 2014


We’re happy to announce that GroupLens had five papers accepted at ACM CSCW 2014, a high-profile social computing conference:

  • “Specialization, Homophily, and Gender in a Social Curation Site: Findings from Pinterest” – Shuo Chang (GroupLens), Vikas Kumar (GroupLens), Eric Gilbert (Georgia Tech), Loren Terveen (GroupLens)
  • “Managing Political Differences in Social Media” – Catherine Grevet (Georgia Tech), Loren Terveen (GroupLens), Eric Gilbert (Georgia Tech)
  • “Leveraging the Contributory Potential of User Feedback” – Mikhil Masli (GroupLens), Loren Terveen (GroupLens)
  • “Capturing Quality: Retaining Provenance for Curated Volunteer Monitoring Data” – S. Andrew Sheppard (GroupLens), Andrea Wiggins (Cornell University), Loren Terveen (GroupLens)
  • “To Search or to Ask: The Routing of Information Needs Between Traditional Search Engines and Social Networks” – Anne Oeldorf-Hirsch (Northwestern University), Brent Hecht (GroupLens), Merrie Morris (Microsoft Research), Jaime Teevan (Microsoft Research), Darren Gergle (Northwestern University)

Special thanks to our collaborators at Georgia Tech’s comp.social lab, Northwestern’s CollabLab, DataONE, and Microsoft Research. Stay tuned for preprints and blog posts on each paper!

Tell Me More: An Actionable Quality Model for Wikipedia


Say you want to contribute to Wikipedia. You sit down in front of your computer after dinner with a nice cup of coffee and wonder what you can do to help. English Wikipedia has an extensive set of cleanup templates that can help you find articles requiring specific improvements. WikiProjects or a tool like SuggestBot can help you find articles related to your interests. We want to combine these and give you a list of interesting articles, while at the same time showing whether there’s an opportunity for contribution and suggesting specific tasks for improving each article.

Some of the recent research examining what makes articles high quality in Wikipedia has taken an editor-based approach, looking for instance at diversity and coordination (Wilkinson and Huberman, 2007; Kittur and Kraut, 2008) and editor reputation (Adler and de Alfaro, 2007; Halfaker et al., 2009). While these models of article quality are informative about work practices, they can’t give you straightforward suggestions of what you can do to help on any particular article. Instead we prefer actionable features: those which easily lend themselves to being acted upon by a contributor. Research has shown that the amount of article content has a strong relationship with article quality (see for instance Blumenstock, 2008). This means that “add more” is a reasonable suggestion for improving quality, but we prefer suggestions that cover a variety of tasks: what specific kinds of “more” does this article need?
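As a rough illustration of what we mean by actionable, the hypothetical sketch below counts a few surface features of an article’s wikitext that a contributor can act on directly and turns low counts into task suggestions. The feature names and thresholds here are invented for illustration; this is not the feature set or model from the paper.

```python
import re

def actionable_features(wikitext):
    """Count a few surface features of an article's wikitext that a
    contributor can act on directly. Illustrative only."""
    return {
        "length": len(wikitext),
        "references": len(re.findall(r"<ref[\s>/]", wikitext)),
        "headings": len(re.findall(r"(?m)^==+[^=].*?==+\s*$", wikitext)),
        "images": len(re.findall(r"\[\[(?:File|Image):", wikitext)),
    }

def suggest_tasks(features):
    """Turn low feature counts into concrete task suggestions
    (thresholds are made up for this example)."""
    tasks = []
    if features["references"] < 5:
        tasks.append("add citations to reliable sources")
    if features["headings"] < 3:
        tasks.append("organize the content into sections")
    if features["images"] == 0:
        tasks.append("add a relevant image")
    return tasks
```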


Kids these days: the quality of new Wikipedia editors over time


I just posted an entry on Wikimedia’s blog explaining part of a study I’m working on with some Wikimedians (Wikipedians working at the Wikimedia Foundation). In response to speculation that the English Wikipedia’s editor decline could be the result of a general decrease in the quality of newcomers to the site, we performed a hand-coded evaluation of the first few edits performed by editors over time.

Overall, we found that the quality of newcomers has not substantially decreased since 2006. However, the rate at which these good newcomers have their contributions reverted or deleted has been rising over time, and the survival rate of good new editors has been falling. This supports our working hypothesis that the increased rate of rejection of new editors is causally related to the decline in their survival.

See the full report here: http://meta.wikimedia.org/wiki/Research:Newcomer_quality

This analysis is part of a larger contribution in submission to a special issue of American Behavioral Scientist on Wikis. Stay tuned.