New MovieLens data set available


A new MovieLens data set was made available today.  Known internally as the 10M100K data set, it contains 10,000,000 movie ratings and 100,000 tags.  Like every previous MovieLens data set, it contains user ratings data, but it is ten times as large as the last one.

This new release also contains, for the first time, tag data.  Tags are small bits of user-generated metadata about movies.  MovieLens first added tagging features two years ago, in January 2006, and an active movie-tagging community has grown up around them since.

Also included in the release is a tool for splitting the ratings data into subsets for cross-validation of prediction algorithms.
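
For readers unfamiliar with the idea, here is a minimal sketch of what splitting ratings for cross-validation can look like.  It is not the tool shipped with the release.  The function name k_fold_splits, the (user, movie, rating) tuple layout, and the toy data are all assumptions made for illustration.

```python
import random


def k_fold_splits(ratings, k=5, seed=0):
    """Yield (train, test) pairs for k-fold cross-validation.

    `ratings` is any sequence of rating records. Here each record is a
    hypothetical (user_id, movie_id, rating) tuple, not the release's format.
    """
    shuffled = list(ratings)
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled ratings into k folds of near-equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test


# Toy data: 4 users x 5 movies = 20 made-up ratings.
ratings = [(u, m, float((u + m) % 5 + 1)) for u in range(1, 5) for m in range(1, 6)]
splits = list(k_fold_splits(ratings, k=5))
```

Each rating lands in exactly one test fold, so a prediction algorithm can be trained on the other folds and scored on ratings it has never seen.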

The read-me file and the data are available for download on the MovieLens Data Sets page.



An article in Information Week discusses a call from the DoD for proposals to create “virtual parents” that would talk to the young children of service men and women who are deployed and unable to talk in person.  Though I enjoyed Diamond Age as much as the next person, this idea seems completely bonkers.  Given how limited our understanding of AI and child psychology is, it seems more likely that we’ll do real damage than that we’ll create a positive experience for the child.  This seems to me a great example of the type of research that professionals should just refuse to do.

For the most part I’m a supporter of the view that knowledge for its own sake is valuable and should be pursued.  Further, it’s not implausible that eventually we’ll be ready to build applications such as the proposed one.  However, as Catherine Caldwell-Harris, the thoughtful critic quoted in the article, points out, there are plenty of other directions that researchers interested in this problem could pursue in the short term, many of which are likely to bring near-term benefits while moving the science forward.  For instance, a researcher might develop a system for teaching a foreign language to a young child.  Simulating a parent seems flat-out dangerous, though!

What do you think?