MovieLens 1B Synthetic Dataset

MovieLens 1B is a synthetic dataset expanded from the 20 million real-world ratings in ML-20M and distributed in support of MLPerf. Note that the data are distributed as .npz files, which you must read using Python and NumPy.
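Below is a minimal sketch of reading one of these files after extracting the tarball. It assumes only that each file is a standard NumPy `.npz` archive. The array names stored inside are not documented on this page, so the helper returns whatever keys it finds. The function name `load_npz_arrays` is hypothetical.

```python
from pathlib import Path

import numpy as np


def load_npz_arrays(path, allow_pickle=False):
    """Load every array stored in an .npz archive into a plain dict.

    Hypothetical helper. allow_pickle defaults to False for safety; enable
    it only for trusted files that actually contain object arrays.
    """
    # For .npz input, np.load returns an NpzFile. The context manager
    # closes the underlying zip file once the arrays have been read.
    with np.load(Path(path), allow_pickle=allow_pickle) as data:
        # data.files lists the names of the arrays stored in the archive.
        return {name: data[name] for name in data.files}
```

For example, `load_npz_arrays("path/to/file.npz")` returns a dict that maps array names to arrays. The path is a placeholder. The helper reads every array eagerly. For very large files, indexing the `NpzFile` directly inside the `with` block loads only the arrays you ask for.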

README
ml-20mx16x32.tar (3.1 GB)
ml-20mx16x32.tar.md5
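The .md5 file lets you check the 3.1 GB download before extracting it. The sketch below makes one assumption about that file's layout: the 32-character hex digest appears somewhere in it, as in `md5sum` ("<hash>  <filename>") or BSD `md5` ("MD5 (<filename>) = <hash>") output. The helper names are hypothetical.

```python
import hashlib
import re
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read 1 MiB at a time so a multi-GB tarball never sits in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_path):
    """Extract the expected digest from a checksum file.

    Assumption: the file contains a single 32-hex-character MD5 digest.
    """
    match = re.search(r"\b[0-9a-fA-F]{32}\b", Path(md5_path).read_text())
    if match is None:
        raise ValueError(f"no MD5 digest found in {md5_path}")
    return match.group(0).lower()


def verify_download(tar_path, md5_path):
    """Return True if the file's MD5 matches the published checksum."""
    return md5_of_file(tar_path) == expected_md5(md5_path)
```

With both files in the current directory, `verify_download("ml-20mx16x32.tar", "ml-20mx16x32.tar.md5")` should return `True` for an intact download.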

The code for the expansion algorithm is available here: https://github.com/mlperf/training/tree/master/data_generation

To create the dataset above, we ran the algorithm (using commit 1c6ae725a81d15437a2b2df05cac0673fde5c3a4) as described in the README under the section “Running instructions for the recommendation benchmark”.

Permalink: https://grouplens.org/datasets/movielens/movielens-1b/
