GroupLens would like to congratulate Ed Chi (a Ph.D. graduate from our lab) and Patrick Baudisch (a former visiting graduate student to our lab) on being named ACM Distinguished Scientists! We wish them all the best and are very proud of their continued accomplishments!
GroupLens students and alumni successfully interview at Google on a regular basis. Several current GroupLens students have interned at the company, and our alumni have become Google research scientists and software engineers. I collected the following technical interview preparation tips from Google recruiters and engineers. Please ask your recruiter if you need confirmation of anything below, as the interview structure changes over time. This advice also applies to other companies that work on similar problems or hire the same types of engineers. If you’re interviewing for an internship instead of a full-time job, there will be a different standard: not necessarily harder or easier, just different, based on the type of student you are and your other interests.
Technical interviews are each about 45 minutes long. There is no dress code. You will code on a whiteboard, showing the interviewer your thought process by talking through decisions and assumptions. Occasionally a video chat and collaborative document allow you to interview from a distance, or a piece of paper substitutes for the whiteboard during in-person interviews. Interview topics may cover anything on your résumé, especially where you claim expertise. Fundamental computer science knowledge is required for all engineering roles at Google and will form the basis for almost all interview questions. Google wants to see if you can take a hard, big problem for which you don’t know an obvious solution, and break it down into manageable, solvable parts for which you can provide reasonable runtime and space bounds.
For years MovieLens has required users to enter 15 ratings before they are allowed to get personalized recommendations. This design makes sense: how can we make recommendations for a user we know nothing about? That said, we don’t know if this provides users with the best experience. Why should users have to enter 15 ratings? Why not ten or five? What would happen if we let users into the system without any ratings? To answer these questions, we need to understand how our algorithms behave for users with very few ratings.
To understand how algorithms behave for users just joining the system, we looked at historic MovieLens ratings. We trained three popular recommender algorithms, including SVD, on this rating data. While training, we limited some users to only a small number of ratings. We used the ratings that were withheld from the algorithm to measure several things:
- How accurate are the predictions? Can the algorithm accurately predict the user’s future ratings?
- How good are the recommendations? Does the algorithm suggest movies for the user that the user would like?
- What type of recommendations does the algorithm generate? Is there a good diversity of movies? Are the movies popular, or more obscure?
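The hold-out procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`split_new_user`, `item_mean_predictor`, `rmse`) and the toy ratings are hypothetical, and the item-mean predictor is a simple stand-in baseline, not one of the algorithms used in the study. It covers only the first question (prediction accuracy), measured here as RMSE on the withheld ratings.

```python
import math
from collections import defaultdict

def split_new_user(ratings, user, n_keep):
    """Keep the first n_keep ratings of `user` for training; hold out the rest."""
    train, held_out = [], []
    seen = 0
    for u, item, r in ratings:
        if u == user and seen >= n_keep:
            held_out.append((u, item, r))
        else:
            if u == user:
                seen += 1
            train.append((u, item, r))
    return train, held_out

def item_mean_predictor(train):
    """Stand-in baseline (an assumption): predict each item's mean training rating."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _, item, r in train:
        sums[item] += r
        counts[item] += 1
    global_mean = sum(sums.values()) / sum(counts.values())

    def predict(user, item):
        # Fall back to the global mean for items never seen in training.
        if counts.get(item, 0):
            return sums[item] / counts[item]
        return global_mean

    return predict

def rmse(predict, held_out):
    """Root mean squared error of predictions against the withheld ratings."""
    errs = [(predict(u, i) - r) ** 2 for u, i, r in held_out]
    return math.sqrt(sum(errs) / len(errs))

# Toy data (made up for illustration): (user, item, rating) triples.
ratings = [
    ("a", "m1", 4.0), ("a", "m2", 2.0), ("a", "m3", 5.0),
    ("b", "m1", 5.0), ("b", "m2", 3.0), ("b", "m3", 4.0),
    ("c", "m1", 3.0), ("c", "m3", 3.0),
]

# Pretend user "a" has just joined and has entered only one rating.
train, held_out = split_new_user(ratings, "a", n_keep=1)
predict = item_mean_predictor(train)
error = rmse(predict, held_out)
print(f"RMSE for user 'a' with 1 training rating: {error:.3f}")
```

Repeating this with different values of `n_keep` shows how the error changes as a simulated new user enters more ratings.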
In this article, we show how to use LensKit to evaluate a recommender written in Python. We wrote this article to help people who want to use LensKit’s built-in evaluation capabilities and comparison algorithms, but don’t want to implement their own algorithms in Java. Evaluating an external recommender, whether in R, Python, or MATLAB, involves three primary steps:
- Writing the recommender. We will need a simple recommender written in a language other than Java (Python in this case) that can take test data to build up a simple model and generate recommendations for a given list of test users.
- Setting up a shim class. We will need to write a small class that teaches LensKit how to use our external algorithm.
- Setting up LensKit evaluation. Finally, we show how we set up an experiment using the shim class in a LensKit eval script to evaluate the external recommender.
Note that the data we will use to test this recommender is a MovieLens rating dataset. The data consists of movie ratings, with each row being <userId,itemId,rating>. You can read more about the dataset here.
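To make the first step more concrete, here is a minimal sketch of what a simple Python recommender over <userId,itemId,rating> rows might look like. It is not the recommender from the article: the popularity-based model, the function names (`load_ratings`, `build_model`, `recommend`), and the sample rows are all assumptions chosen for illustration.

```python
import csv
from collections import defaultdict

def load_ratings(lines):
    """Parse rows of the form userId,itemId,rating into (user, item, rating) tuples."""
    return [(u, i, float(r)) for u, i, r in csv.reader(lines)]

def build_model(ratings):
    """Build a popularity model: rating counts per item, plus each user's rated items."""
    counts = defaultdict(int)
    seen = defaultdict(set)
    for user, item, _ in ratings:
        counts[item] += 1
        seen[user].add(item)
    return counts, seen

def recommend(model, user, n=10):
    """Return the n most-rated items the user has not rated yet."""
    counts, seen = model
    already_rated = seen.get(user, set())
    candidates = [item for item in counts if item not in already_rated]
    # Most-rated first; ties broken by item ID so the output is deterministic.
    candidates.sort(key=lambda item: (-counts[item], item))
    return candidates[:n]

# Hypothetical sample rows; a real run would read them from the dataset file.
rows = ["1,10,4.0", "1,20,3.5", "2,10,5.0", "2,30,2.0", "3,10,4.5", "3,30,4.0"]
model = build_model(load_ratings(rows))

# Generate recommendations for a list of test users (user "4" has no ratings).
recs = {user: recommend(model, user, n=2) for user in ["1", "2", "4"]}
for user, items in recs.items():
    print(user, ",".join(items))
```

A popularity model keeps the sketch short. Any model that can be built from the rating rows and queried per user would fit the same shape.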