MovieLens 1B is a synthetic dataset expanded from the 20 million real-world ratings in ML-20M and distributed in support of MLPerf. Note that these data are distributed as .npz files, which you must read using Python and NumPy.
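
As a minimal sketch of reading an .npz file with NumPy: the snippet below lists the arrays stored in an archive along with their shapes and dtypes. The file name `example_shard.npz`, the `ratings` key, and the helper `summarize_npz` are hypothetical stand-ins for illustration. The actual file names and array keys in the MovieLens 1B download are not stated here, so inspect `archive.files` on a real file to find them.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz file."""
    summary = {}
    # np.load on an .npz returns a dict-like NpzFile; each array is read on access.
    with np.load(path) as archive:
        for name in archive.files:
            array = archive[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary


# Stand-in file so the sketch runs without the real download.
# The file name and the "ratings" key are hypothetical, not the dataset's layout.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "example_shard.npz")
np.savez(demo_path, ratings=np.array([[0, 10], [0, 42], [1, 7]], dtype=np.int32))

summary = summarize_npz(demo_path)
print(summary)
```

To try this on the real data, replace `demo_path` with the path to one of the downloaded .npz files. Because the archives may be large, it is probably worth checking the reported shapes before loading several files at once.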

The code for the expansion algorithm is available here: https://github.com/mlperf/training/tree/master/data_generation

To create the dataset above, we ran the algorithm (using commit 1c6ae725a81d15437a2b2df05cac0673fde5c3a4) as described in the README under the section “Running instructions for the recommendation benchmark”.

Permalink: https://grouplens.org/datasets/movielens/movielens-1b/