# Recommender Systems

Prem Melville, Machine Learning, IBM T.J. Watson Research Center, USA.

Vikas Sindhwani, Machine Learning, IBM T.J. Watson Research Center, USA.

# Bibtex

@incollection{melville:2010,
author    = {Prem Melville and Vikas Sindhwani},
title     = {Recommender Systems},
booktitle = {Encyclopedia of Machine Learning},
year      = {2010},
pages     = {829-838},
ee        = {http://dx.doi.org/10.1007/978-0-387-30164-8_705},
crossref  = {DBLP:reference/ml/2010},
bibsource = {DBLP, http://dblp.uni-trier.de}
}


# Contents

1. Definition
2. Motivation and Background
3. Structure of Learning System
    1. Collaborative Filtering
    2. Neighborhood-based Collaborative Filtering
    3. Item-based Collaborative Filtering
    4. Significance Weighting
    5. Default Voting
    6. Inverse User Frequency
    7. Case Amplification
    8. Model-based Collaborative Filtering
    9. Content-based Recommending
    10. Hybrid Approaches
    11. Evaluation Metrics
    12. Challenges and Limitations

# Notes

## Definition

The goal of a recommender system is to generate meaningful recommendations to a collection of users for items or products that might interest them. The design of such recommendation engines depends on the domain and the particular characteristics of the data available.

Collaborative filtering systems analyse historical interactions alone, while content-based filtering systems are based on profile attributes. In content filtering, recommendations are not "collaborative" in the sense that suggestions made to a user do not explicitly utilise information across the entire user base.

## Structure of Learning System

A typical case is shown in the figure below:

[Figure: an n × m user-item rating matrix X, with one row per user and one column per item; most entries are missing, and the entry of the active user a for one item is marked "?".]

In this figure, the task is to predict the missing rating $r_{a,i}$ for the active user a.

• Collaborative filtering (CF): a user is recommended items based on the past ratings of all users collectively.
• Content-based recommending: recommend items that are similar in content to items the user has liked in the past, or matched to predefined attributes of the user.
• Hybrid approaches: combine both collaborative and content-based approaches.

### Collaborative Filtering

Collaborative filtering methods collect user feedback in the form of ratings for items and exploit similarities in rating behaviour across users. CF methods can be subdivided into neighborhood-based (or memory-based) and model-based approaches.

### Neighborhood-based collaborative filtering

The typical scenario is the following:

1. Assign a weight to all users with respect to similarity with the active user.
2. Select k users that have the highest similarity with the active user - commonly called the neighborhood.
3. Compute a prediction from a weighted combination of the selected neighbors' ratings.

$w_{a,u}$ is a measure of similarity between the user u and the active user a.

The most common measure of similarity is the Pearson correlation coefficient:

$w_{a, u} = \frac{ \sum_{i \in I} ( r_{a, i} - \overline{r}_{a}) (r_{u, i} - \overline{r}_{u})}{ \sqrt{ \sum_{i \in I} (r_{a, i} - \overline{r}_{a})^2 \sum_{i \in I} (r_{u, i} - \overline{r}_{u})^2 } }$

In step 3, predictions are computed as a weighted average of deviations from the neighbors' means, as in:

$p_{a, i} = \overline{r}_a + \frac{ \sum_{u \in K} ( r_{u, i} - \overline{r}_{u}) \times w_{a, u} }{ \sum_{u \in K} |w_{a, u}| }$

Where $p_{a,i}$ is the prediction for the active user a for item i, $w_{a,u}$ is the similarity between users a and u, and K is the neighborhood or set of most similar users.
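The three steps above can be sketched in a few lines of Python. The ratings dictionary is hypothetical toy data; the denominator uses the absolute weights so that negative correlations do not inflate predictions:

```python
import math

# Toy ratings: user -> {item: rating}; all values are hypothetical.
ratings = {
    "a":  {1: 5, 2: 3, 4: 4},          # the active user
    "u1": {1: 4, 2: 2, 3: 5, 4: 5},
    "u2": {1: 2, 2: 4, 3: 1},
}

def mean(u):
    return sum(ratings[u].values()) / len(ratings[u])

def pearson(a, u):
    """Pearson correlation over the items co-rated by a and u."""
    common = set(ratings[a]) & set(ratings[u])
    if len(common) < 2:                 # correlation undefined on < 2 items
        return 0.0
    ra, ru = mean(a), mean(u)
    num = sum((ratings[a][i] - ra) * (ratings[u][i] - ru) for i in common)
    den = math.sqrt(sum((ratings[a][i] - ra) ** 2 for i in common) *
                    sum((ratings[u][i] - ru) ** 2 for i in common))
    return num / den if den else 0.0

def predict(a, item, k=2):
    """Steps 1-3: weight all users, keep the k nearest, combine deviations."""
    neighbors = sorted((u for u in ratings if u != a and item in ratings[u]),
                       key=lambda u: pearson(a, u), reverse=True)[:k]
    num = sum((ratings[u][item] - mean(u)) * pearson(a, u) for u in neighbors)
    den = sum(abs(pearson(a, u)) for u in neighbors)
    return mean(a) + num / den if den else mean(a)
```

Note that when the active user has no usable neighbors for an item, the sketch falls back to the user's own mean rating.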

Another commonly used measure is the cosine similarity, obtained by treating the ratings of two users as vectors in an m-dimensional space:

$w_{a, u} = \cos (\vec{r}_a, \vec{r}_u) = \frac{ \vec{r}_a \cdot \vec{r}_u }{ \| \vec{r}_a \| _2 \times \| \vec{r}_u \| _2}$

$w_{a, u} = \frac{ \sum_{i=1}^m r_{a, i} r_{u, i} }{ \sqrt{\sum_{i=1}^m r_{a, i}^2} \sqrt{\sum_{i=1}^m r_{u, i}^2} }$
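A minimal sketch of the cosine measure, treating unrated items as zeros; the example vectors are hypothetical:

```python
import math

def cosine(ra, ru):
    """Cosine of the angle between two m-dimensional rating vectors;
    unrated items are represented as zeros."""
    dot = sum(x * y for x, y in zip(ra, ru))
    na = math.sqrt(sum(x * x for x in ra))
    nu = math.sqrt(sum(y * y for y in ru))
    return dot / (na * nu) if na and nu else 0.0

# Hypothetical rating vectors over m = 4 items (0 = unrated):
print(cosine([5, 3, 0, 4], [4, 2, 5, 5]))
```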

Other similarity measures include: Spearman rank correlation, Kendall's τ correlation, mean squared differences, entropy, and adjusted cosine similarity[1][2].

### Item-based collaborative filtering

When applied to large volumes of data, the previous algorithms do not scale well. Linden et al.[3] and Sarwar et al.[4] proposed item-to-item collaborative filtering, where rather than matching similar users, a user's rated items are matched to similar items.

In this approach, similarities between pairs of items i and j are computed off-line using Pearson correlation, given by:

$w_{i, j} = \frac{ \sum_{u \in U} (r_{u, i} - \overline{r}_i) (r_{u, j} - \overline{r}_j) }{ \sqrt{\sum_{u \in U} (r_{u, i} - \overline{r}_i)^2} \sqrt{\sum_{u \in U} (r_{u, j} - \overline{r}_j)^2} }$

Where U is the set of all users who have rated both items, $r_{u,i}$ is the rating of user u for item i, and $\overline{r}_i$ is the average rating of the ith item across users.

The rating for item i for user a can then be predicted using a simple weighted average, as in:

$p_{a, i} = \frac{ \sum_{j \in K} r_{a, j} w_{i, j} }{ \sum_{j \in K} |w_{i, j}| }$

Where K is the neighborhood set of the k items rated by a that are most similar to i.
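A sketch of the prediction step, assuming the item-item similarities have already been computed off-line; the `sims` dictionary and the ratings are hypothetical:

```python
def item_based_predict(user_ratings, sims, item, k=2):
    """Weighted average over the k rated items most similar to `item`.
    user_ratings: {item: rating} for the active user a.
    sims: precomputed item-item similarities, {(i, j): w_ij}."""
    rated = [(j, sims.get((item, j), sims.get((j, item), 0.0)))
             for j in user_ratings]
    neighborhood = sorted(rated, key=lambda t: abs(t[1]), reverse=True)[:k]
    num = sum(user_ratings[j] * w for j, w in neighborhood)
    den = sum(abs(w) for _, w in neighborhood)
    return num / den if den else 0.0

# Hypothetical similarities and user ratings:
print(item_based_predict({2: 5, 3: 2}, {(1, 2): 0.9, (1, 3): 0.4}, item=1))
```

Because the similarities are computed off-line, only the final weighted average has to be evaluated at recommendation time, which is what makes this approach scale.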

### Significance weighting

If the active user has few co-rated items with other users, these neighbors may be poor predictors. One approach is to multiply the similarity weight by a significance weighting factor, which devalues correlations based on few co-rated items[1].
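A minimal sketch of such a factor; the cutoff of 50 co-rated items follows the value reported by Herlocker et al.[1], but should be treated as a tunable parameter:

```python
def significance_weight(w, n_corated, cutoff=50):
    """Linearly devalue correlations built on fewer than `cutoff`
    co-rated items; above the cutoff the weight is unchanged."""
    return w * min(n_corated, cutoff) / cutoff
```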

### Default voting

Another approach to dealing with correlations based on very few co-rated items is to assign a default value to non-rated items[5].

### Inverse User Frequency

Items that have been rated by all users (and universally liked or disliked) are not as useful as less common items. Breese et al.[5] introduced the notion of inverse user frequency, which is computed as:

$f_i = \log \frac{n}{n_i}$

where $n_i$ is the number of users who have rated item i out of the total number of n users. The original rating for item i is then multiplied by $f_i$.
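As a one-line sketch, note that an item rated by every user receives a weight of zero, while rarer items are boosted:

```python
import math

def inverse_user_frequency(n, n_i):
    """f_i = log(n / n_i): an item rated by every user (n_i = n)
    gets f_i = 0 and so carries no weight."""
    return math.log(n / n_i)
```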

### Case amplification

In order to favor users with high similarity to the active user, Breese et al.[5] transformed the original weights as:

$w'_{a,u} = w_{a,u} \cdot | w_{a,u} |^{\rho - 1}$

where ρ is the amplification factor, and $\rho \geq 1$.
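The transformation is a one-liner; writing it with $|w_{a,u}|$ preserves the sign of negative correlations while shrinking weak ones (the default ρ here is an illustrative choice, not a prescribed value):

```python
def amplify(w, rho=2.5):
    """w' = w * |w|^(rho - 1): keeps the sign, shrinks weak correlations."""
    return w * abs(w) ** (rho - 1)
```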

### Model-based collaborative filtering

Model-based techniques provide recommendations by estimating parameters of statistical models for user ratings. Billsus and Pazzani[6] build a classifier for each active user.

### Latent factor and matrix factorization models[7]

Matrix factorization techniques are a class of widely successful latent factor models where users and items are simultaneously represented as unknown feature vectors $w_u, h_i \in \Re^k$ along k latent dimensions. These feature vectors are learnt so that the inner products $w_u^T h_i$ approximate the known preference ratings $r_{u,i}$ with respect to some loss measure, e.g., squared loss:

$J(W, H) = \sum_{(u, i) \in L} (r_{u, i} - w_u^T h_i ) ^2$

Where $W = [w_1 \dots w_n]^T$ is an n × k matrix, $H = [h_1 \dots h_m]$ is a k × m matrix, and L is the set of user-item pairs for which the ratings are known.

Standard optimization procedures include gradient-based techniques, or procedures like alternating least squares [...]. Fixing either W or H turns the problem of estimating the other into a weighted linear regression task. Overfitting is controlled by minimizing the regularized objective function:

$J(W, H) + \gamma \|W\|^2 + \lambda \|H\|^2$

Once W and H are learnt, the product WH provides an approximate reconstruction of the rating matrix, from which recommendations can be directly read off.
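As an illustration, here is a pure-Python sketch that minimizes the regularized squared loss by stochastic gradient descent (a common alternative to alternating least squares); the data, dimensions, and hyperparameters are hypothetical toy values:

```python
import random

def factorize(ratings, n_users, n_items, k=2, steps=5000,
              lr=0.01, reg=0.05, seed=0):
    """Minimize sum (r_ui - w_u^T h_i)^2 + reg*(||W||^2 + ||H||^2)
    by stochastic gradient descent over the observed (u, i, r) triples."""
    rng = random.Random(seed)
    W = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    H = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        u, i, r = rng.choice(ratings)
        err = r - sum(W[u][f] * H[i][f] for f in range(k))
        for f in range(k):
            wu, hi = W[u][f], H[i][f]
            W[u][f] += lr * (err * hi - reg * wu)
            H[i][f] += lr * (err * wu - reg * hi)
    return W, H

# A tiny 2 x 2 rating matrix with one unobserved entry r_{1,1}:
R = [(0, 0, 5), (0, 1, 3), (1, 0, 4)]
W, H = factorize(R, n_users=2, n_items=2)
missing = sum(W[1][f] * H[1][f] for f in range(2))  # read prediction off WH
```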

Recent advances in the field include:

• The use of additional user-specific and item-specific parameters to account for systematic biases in the ratings such as popular movies receiving higher ratings on average.
• Incorporating temporal dynamics of rating behaviour by introducing time-dependent variables.

The nature of the data has a great impact on recommender systems: explicit like/dislike feedback gives more knowledge than purchase statistics, for example, as orders do not reflect what people disliked.

### Content-based Recommending

Pure collaborative filtering recommenders utilize the user rating matrix, either directly, or to induce a collaborative model, without regard to the specifics of individual users or items.

Content-based recommenders, in contrast, provide recommendations by comparing representations of the content describing an item to representations of the content that interests the user.

Many approaches have treated the problem as an information retrieval task, where the user's attributes (preferences) serve as the query.

An alternative is to treat recommending as a classification task: each example represents the content of an item, and past ratings are used as labels for these examples.

Mooney and Roy[8] applied a naïve Bayes classifier to this task. Other classification algorithms that have been used include k-nearest neighbor, decision trees, and neural networks (Pazzani & Billsus, 1997[9]).
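A minimal multinomial naïve Bayes content recommender in the spirit of this classification view, treating past ratings as class labels over bag-of-words item descriptions; the documents and vocabulary here are hypothetical toys:

```python
import math
from collections import Counter

# Hypothetical item descriptions, labeled by the user's past ratings.
liked    = ["space opera epic adventure", "alien invasion space battle"]
disliked = ["romantic comedy wedding", "romantic drama"]

vocab = {w for doc in liked + disliked for w in doc.split()}

def train(docs):
    """Word counts and total word count for one class."""
    counts = Counter(w for doc in docs for w in doc.split())
    return counts, sum(counts.values())

def log_prob(doc, model, prior):
    """Multinomial naive Bayes log-probability with Laplace smoothing."""
    counts, total = model
    return math.log(prior) + sum(
        math.log((counts[w] + 1) / (total + len(vocab))) for w in doc.split())

like_model, dislike_model = train(liked), train(disliked)

def recommend(doc):
    """True if the item's description looks more like the 'liked' class."""
    n = len(liked) + len(disliked)
    return (log_prob(doc, like_model, len(liked) / n) >
            log_prob(doc, dislike_model, len(disliked) / n))

print(recommend("space adventure"))   # → True
```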

### Hybrid approaches

To leverage the strengths of content-based and collaborative recommenders, both methods can be mixed:

• Allow both filtering methods to produce separate ranked lists of recommendations, and then merge their results[10], optionally with an adaptive weighted average combination[11].

Melville et al.[12] proposed a general framework for content-boosted collaborative filtering, where content-based predictions are used to convert a sparse user ratings matrix into a full ratings matrix, and a CF method is then applied to provide recommendations.

Su et al.[13] improved this method further by using a stronger content predictor, TAN-ELR, and unweighted Pearson collaborative filtering.

Several hybrid approaches treat recommending as a classification task, and incorporate collaborative elements in this task. Basu et al.[14] use a rule induction system to learn a function that predicts if a user will like or dislike a movie. They combine collaborative and content information to produce features such as users who liked movies of genre X.

Popescul et al.[15] incorporate three-way co-occurrence data among users, items, and item content. Schein et al.[16] focus on making recommendations for items that have not been rated by any user (the cold-start problem).

### Evaluation metrics

The quality of a recommender system can be evaluated by comparing recommendations to a test set of known user ratings. The most commonly used metric in the literature is Mean Absolute Error (MAE) defined as the average absolute difference between predicted ratings and actual ratings, given by:

$MAE = \frac{ \sum_{\{u, i\}} | p_{u, i} - r_{u, i} | }{ N }$

Where $p_{u,i}$ is the predicted rating for user u on item i, $r_{u,i}$ is the actual rating, and N is the total number of ratings in the test set.

Another commonly used metric is the Root Mean Squared Error (RMSE):

$RMSE = \sqrt{\frac{ \sum_{\{u, i\}} (p_{u, i} - r_{u, i})^2 }{ N }}$
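Both metrics are straightforward to compute; a sketch with hypothetical predictions and actual ratings:

```python
import math

def mae(preds, actuals):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(preds, actuals)) / len(actuals)

def rmse(preds, actuals):
    """Root mean squared error; penalizes large errors more than MAE."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(preds, actuals))
                     / len(actuals))

p, r = [4.0, 2.0, 5.0], [3.0, 2.0, 1.0]
print(mae(p, r))    # 5/3 ≈ 1.67
print(rmse(p, r))   # sqrt(17/3) ≈ 2.38
```

The single large error on the third rating dominates RMSE but not MAE, which is why RMSE is preferred when large mistakes are especially costly.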

Other predictive accuracy metrics include:

• Precision,
• Recall,
• Area Under the ROC Curve (AUC),
• F1-measure,
• Pearson's product-moment correlation,
• Kendall's τ,
• Mean Average Precision,
• half-life utility, and
• the normalized distance-based performance measure[17].

# References

1. Herlocker, J., Konstan, J., Borchers, A., & Riedl, J.T. (1999). An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd international ACM SIGIR conference on research and development in information retrieval, Berkeley, California (pp. 230-237). New York: ACM.
2. Su, X., & Khoshgoftaar, T.M. (2009). A survey of collaborative filtering techniques. Advances in Artificial Intelligence, 2009, 1-20.
3. Linden, G., Smith, B., & York, J. (2003). Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7 (1), 76-80.
4. Sarwar, B., Karypis, G., Konstan, J., & Reidl, J. (2001). Item-based collaborative filtering recommendation algorithms. In WWW '01: Proceedings of the tenth international conference on World Wide Web (pp. 285-295). New York: ACM. Hong Kong.
5. Breese, J.S., Heckerman, D., & Kadie, C. (July 1998). Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the fourteenth conference on uncertainty in artificial intelligence, Madison, Wisconsin.
6. Billsus, D., & Pazzani, M.J. (1998). Learning collaborative information filters. In Proceedings of the fifteenth international conference on machine learning (ICML-98), Madison, Wisconsin (pp. 46-54). San Francisco: Morgan Kaufmann.
7. Bell, R., Koren, Y., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. IEEE Computer 42(8): 30-37.
8. Mooney, R. J. & Roy, L. (June 2000). Content-based book recommending using learning for text categorization. In Proceedings of the fifth ACM Conference on digital libraries, San Antonio, Texas, (pp. 195-204).
9. Pazzani, M.J., & Billsus, D. (1997). Learning and revising user profiles: The identification of interesting web sites. Machine Learning, 27(3), 313-331.
10. Cotter, P., & Smyth, B. (2000). PTV: Intelligent personalized TV guides. In Twelfth conference on innovative applications of artificial intelligence, Austin, Texas (pp. 957-964).
11. Claypool, M., Gokhale, A., & Miranda, T. (1999). Combining content-based and collaborative filters in an online newspaper. In Proceedings of the SIGIR-99 workshop on recommender systems: algorithms and evaluation.
12. Melville, P., Mooney, R.J., & Nagarajan, R. (2002). Content-boosted collaborative filtering for improved recommendations. In Proceedings of the eighteenth national conference on artificial intelligence (AAAI-02), Edmonton, Alberta (pp. 187-192).
13. Su, X., Greiner, R., Khoshgoftaar, T.M. & Zhu, X. (2007). Hybrid collaborative filtering algorithms using a mixture of experts. In Web Intelligence (pp. 645-649).
14. Basu, C., Hirsh, H., & Cohen, W. (July 1998). Recommendation as classification: Using social and content-based information in recommendation. In Proceedings of the fifteenth national conference on artificial intelligence (AAAI-98), Madison, Wisconsin (pp. 714-720).
15. Popescul, A., Ungar, L., Pennock, D.M., & Lawrence, S. (2001). Probabilistic models for unified collaborative and content-based recommendation in sparse-data environments. In Proceedings of the seventeenth conference on uncertainty in artificial intelligence. University of Washington, Seattle.
16. Schein, A.I., Popescul, A., Ungar, L.H., & Pennock, D.M. (2002). Methods and metrics for cold-start recommendations. In SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on research and development in information retrieval (pp. 253-260). New York: ACM. Tampere, Finland.
17. Herlocker, J.L., Konstan, J.A., Terveen, L.G., & Riedl, J.T. (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1), 5-53.