Recommender System

methods:

Collaborative filtering: find a group of users presenting similar behavior and use group's behavior to predict. NMF, nearest neighbors, etc.
Content-Based
Hybrid

Modeling

https://datajobs.com/data-science-repo/recommender-systems-%5bnetflix%5d.pdf
user $u$ rated item $i$ with a rating $r_{ui}$, and we assume $r_{u,i}=q^T_ip_u$. $q_i p_u$ are typically short vectors representing items/users in latent space.
therefore the cost function $L = \sum_{u,i} \{(r_{u,i} - q^T_ip_u)^2 + \lambda(||q_i||^2 + ||p_u||^2)\}$ for all known (u,i) pairs.

SGD Optimizing

take the derivative of $L$ wrt $q_i$: $\frac{dL}{dq_i} = -p_u(r_{u,i} - q^T_ip_u) + \lambda q_i$, so the updating rule for $q_i$ is $q_i = q_i +\gamma(p_u(r_{u,i} - q^T_ip_u)-\lambda q_i)$.
We have the error term $e_{u,i} = r_{u,i} - q^T_ip_u$
and the updating rules are $q_i = q_i +\gamma(p_ue_{u,i}-\lambda q_i)$, and $p_u = p_u +\gamma(q_ie_{u,i}-\lambda p_u)$

Alternative Least Square

Since matrix is extremely sparse, when structing the data, only ratings (as well as its user/item) should be stored in memory.

Some results I got on Movielens 1M data (used Java. all took only a few seconds to finish):

plain NMF: initial value for P/Q: 1.0. gamma = 0.005, lambda = 0.00001 #iters: 100 K=20 rmse: 0.8399908248055877
plain NMF: initial value for P/Q: 1.0. gamma = 0.005, lambda = 0.00001 #iters: 100 K=40 rmse: 0.865461767601043
plain NMF: initial value for P/Q: 0.1. gamma = 0.005, lambda = 0.00001 #iters: 100 K=20 rmse: 0.831252749229948
plain NMF: initial value for P/Q: 0.1. gamma = 0.0005, lambda = 0.000001 #iters: 100 K=20 rmse: 0.8208844660311445
plain NMF: initial value for P/Q: 0.1. gamma = 0.0005, lambda = 0.000001 #iters: 200 K=20 rmse: 0.8182223706268318
plain NMF: initial value for P/Q: 0.1. gamma = 0.0005, lambda = 0.000001 #iters: 1000 K=20 rmse: 0.8168452801151522
plain NMF: initial value for P/Q: 0.1. gamma = 0.00005, lambda = 0.0000001 #iters: 1000 K=20 rmse: 0.8184550998312972
plain NMF: initial value for P/Q: 0.1. gamma = 0.00005, lambda = 0.0000001 #iters: 10000 K=20 rmse: 0.8144844568711578

http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html : recommender system with Spark
http://sifter.org/~simon/journal/20061211.html SGD implementation