Similarity Functions for User-User Collaborative Filtering
By Michael Ekstrand on
User-user collaborative filtering has traditionally used Pearson correlation to compare users. Early work tried Spearman correlation and (raw) cosine similarity, but found Pearson to work better, and the issue wasn’t revisited for quite some time.
When I was revisiting some of these algorithmic decisions for the LensKit paper, I tried cosine similarity on mean-centered vectors (sometimes called ‘Adjusted Cosine’) and found it to work better (on our offline evaluation metrics) than Pearson correlation, even without any significance weighting. So now my recommendation is to use cosine similarity over mean-centered data. But why the change, and why does it work?
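To make the comparison concrete, here is a minimal sketch of the two measures in plain Python. It rests on assumptions that are mine for illustration, not a description of LensKit's code: ratings are stored as `{item: rating}` dicts, Pearson is computed only over the items both users rated, and the mean-centered cosine subtracts each user's mean over all of their ratings and treats unrated items as 0 after centering. The function names and the toy users `alice` and `bob` are hypothetical.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation over the items both users rated (assumed variant)."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0
    # Means and deviations use only the co-rated items.
    mu = sum(u[i] for i in common) / len(common)
    mv = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu) * (v[i] - mv) for i in common)
    du = sqrt(sum((u[i] - mu) ** 2 for i in common))
    dv = sqrt(sum((v[i] - mv) ** 2 for i in common))
    if du == 0 or dv == 0:
        return 0.0
    return num / (du * dv)

def centered_cosine(u, v):
    """Cosine similarity over mean-centered rating vectors (assumed variant)."""
    # Each user's mean is taken over everything that user rated.
    mu = sum(u.values()) / len(u)
    mv = sum(v.values()) / len(v)
    cu = {i: r - mu for i, r in u.items()}
    cv = {i: r - mv for i, r in v.items()}
    # Unrated items count as 0 after centering, so only common items
    # contribute to the dot product, but every rating contributes to the norms.
    num = sum(cu[i] * cv[i] for i in set(cu) & set(cv))
    du = sqrt(sum(x * x for x in cu.values()))
    dv = sqrt(sum(x * x for x in cv.values()))
    if du == 0 or dv == 0:
        return 0.0
    return num / (du * dv)

# Toy data: two users with five ratings each, only two items in common.
alice = {"a": 5, "b": 3, "c": 4, "d": 1, "e": 2}
bob = {"a": 4, "b": 2, "f": 5, "g": 1, "h": 3}

print(round(pearson(alice, bob), 3))          # 1.0
print(round(centered_cosine(alice, bob), 3))  # 0.2
```

On this toy data, Pearson reports perfect agreement on the strength of two shared items, while the mean-centered cosine stays low because the norms include every rating in each profile. That is only an illustration under the assumptions above, not evidence about performance on real data.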