MovieLens Datasets: Context and History

The MovieLens datasets describe how people rate movies. As it turns out, these datasets have been useful to lots of folks, from recommender systems researchers to the readers of popular-press programming books. Though it is difficult to measure the full extent of the datasets’ impact, we see that they were downloaded more than 140,000 times in 2014, and that the keyword “movielens” currently returns over 8,900 results in Google Scholar.

It is tempting to view these collections of ratings as a cohesive whole. However, the datasets are the product of 17 years of member activity on a web site that has seen its fair share of changes and experimental features. Given how much attention, research and otherwise, these datasets have received, it seems worth exploring the relationship between the system and the resulting data.

YouthTube: Youth Video Authorship on YouTube and Vine

It’s 2015: do you know what your kids are posting online? Children and teenagers use public video platforms like YouTube and Vine to share their stories. Knowing more about what and how they share could help us design tools that encourage creativity and self-expression while helping young people reflect on online safety and privacy. To find out more about what youth video authors do online, we conducted a study of over 300 recently shared, youth-authored videos.

Putting Users in Control of their Recommendations

The music services that I subscribe to don’t understand me very well. Pandora, which puts together personalized radio stations, seems to think that I only like the most popular music, which I don’t. Spotify, which offers a new personalized playlist for me each week, seems to think that I only like quite obscure music. Neither of them gets it right, and I wish that I could tell them to change.

[Image: screenshots of Pandora and Spotify]
Pandora and Spotify are “black box” recommenders, where it is difficult to know how to act to repair bad recommendations.

Wasted Effort and Missed Opportunities: Content Production and Reader Interest in Wikipedia

Wikipedia’s best content is mainly where its readers aren’t. For instance, the article about weddings is seen thousands of times every day, yet the community labels it “quite incomplete” and its prose “distinctly unencyclopedic”, and a call for additional sources to verify its content has been featured prominently at the top of the article for over four years. It turns out that this is not uncommon: each month Wikipedia’s articles are viewed billions of times, and over 40% of these views are to articles that would be of significantly higher quality if the encyclopaedia’s contributors followed their readers.
