An analysis of the 2014 RecSys Challenge

Authors: Daniele Loiacono, Andreas Lommatzsch, and Roberto Turrin
Source: Proceedings of the 2014 Recommender Systems Challenge

The 2014 RecSys Challenge focuses on the engagement generated by tweets posted by users of the IMDb smartphone application. This engagement depends on attributes of the user who posts the message (e.g., their role in the social network), of the tweet content (e.g., the rating), and of the movie the tweet refers to (e.g., the movie's popularity). In this work we analyze the dataset and the task to help participants better understand the challenge, and we propose a baseline prediction algorithm. Our analysis is split into three stages: (i) data enrichment, (ii) knowledge extraction, and (iii) engagement prediction.
First, we enriched the dataset with additional movie attributes extracted from IMDb and Freebase. Next, we analyzed the statistics of the main tweet attributes and applied machine learning techniques to extract knowledge from the data. Finally, we defined a predictor based on the main outcomes of these analyses: a linear regression model over attributes such as the user's rating score, the presence of mentions, and whether the tweet is a retweet or has already been retweeted. This predictor achieves an nDCG@10 of 0.835187.
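The abstract names the model family (linear regression) and the metric (nDCG@10) but not the feature encoding, the fitting procedure, or the exact evaluation protocol. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the field names (`user`, `rating`, `has_mention`, `is_retweet`, `was_retweeted`, `engagement`) and the helpers `features`, `fit_ols`, and `mean_ndcg_per_user` are hypothetical; the model is fit by ordinary least squares with a tiny ridge term for numerical stability; and nDCG uses the common log2 discount with linear gains, averaged over users whose tweets are ranked separately, skipping users with zero total engagement.

```python
import math
from collections import defaultdict


def features(tweet):
    # Hypothetical encoding of the attributes named in the abstract.
    return [
        1.0,                                      # intercept term
        float(tweet["rating"]),                   # user's rating score
        1.0 if tweet["has_mention"] else 0.0,     # tweet contains a mention
        1.0 if tweet["is_retweet"] else 0.0,      # tweet is itself a retweet
        1.0 if tweet["was_retweeted"] else 0.0,   # tweet already retweeted
    ]


def fit_ols(X, y, ridge=1e-6):
    """Solve (X^T X + ridge*I) w = X^T y by Gauss-Jordan elimination."""
    d = len(X[0])
    # Augmented normal-equation matrix [X^T X + ridge*I | X^T y].
    A = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0)
          for j in range(d)] + [sum(r[i] * t for r, t in zip(X, y))]
         for i in range(d)]
    for col in range(d):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        # Eliminate this column from every other row.
        for r in range(d):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][d] for i in range(d)]


def predict(w, tweet):
    # Linear score: dot product of weights and features.
    return sum(wi * xi for wi, xi in zip(w, features(tweet)))


def ndcg_at_k(scores, gains, k=10):
    """nDCG@k of items ranked by `scores`, using `gains` as relevance."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    dcg = sum(gains[i] / math.log2(pos + 2) for pos, i in enumerate(order[:k]))
    ideal = sorted(gains, reverse=True)[:k]
    idcg = sum(g / math.log2(pos + 2) for pos, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def mean_ndcg_per_user(tweets, w, k=10):
    """Average nDCG@k over users, ranking each user's tweets separately."""
    by_user = defaultdict(list)
    for t in tweets:
        by_user[t["user"]].append(t)
    vals = []
    for ts in by_user.values():
        gains = [t["engagement"] for t in ts]
        if sum(gains) == 0:
            continue  # assumption: users with no engagement are skipped
        vals.append(ndcg_at_k([predict(w, t) for t in ts], gains, k))
    return sum(vals) / len(vals) if vals else 0.0
```

A usage example on made-up toy data (not from the challenge dataset): build `X = [features(t) for t in tweets]` and `y = [t["engagement"] for t in tweets]`, call `w = fit_ols(X, y)`, then score with `mean_ndcg_per_user(tweets, w)`. In practice the model would be fit on a training split and evaluated on held-out tweets; the pure-Python solver is only meant to keep the sketch dependency-free.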