❓ Questions & Help
Details
How does DLRM compare to models like XGBoost in predictive performance for ranking in recommender systems? Are deep learning methods consistently superior? I was able to replicate DLRM's AUROC on Criteo, getting around 0.79. I then trained the model on my employer's proprietary dataset and got an even better AUROC of 0.88. However, NDCG@5 on the internal dataset shows that DLRM (NDCG@5 of 0.04) does substantially worse than XGBoost (NDCG@5 of 0.19), and also worse than our basic manual heuristics ranker (NDCG@5 of 0.17). Our internal dataset uses features similar to those found in Criteo: user ID embeddings, product ID embeddings, dense features, etc.
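For concreteness, this is roughly how I compute the listwise metric, in case the discrepancy comes from evaluation rather than the model itself. It's a sketch: `session_id` and `clicked` are stand-ins for our internal column names, and the model scores are assumed to already be in the dataframe.

```python
import numpy as np
import pandas as pd

def ndcg_at_k(relevance_in_ranked_order, k=5):
    """NDCG@k for one session: DCG of the top-k items as ranked by the model,
    normalized by the DCG of the ideal (relevance-sorted) ordering."""
    rel = np.asarray(relevance_in_ranked_order, dtype=float)[:k]
    if rel.sum() == 0:
        return 0.0
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float((rel * discounts).sum())
    ideal = np.sort(np.asarray(relevance_in_ranked_order, dtype=float))[::-1][:k]
    idcg = float((ideal * discounts[: ideal.size]).sum())
    return dcg / idcg

def mean_ndcg_at_5(df, score_col, label_col="clicked", group_col="session_id"):
    """Average NDCG@5 over sessions, ranking items within each session by model score."""
    per_session = []
    for _, group in df.groupby(group_col):
        ranked_labels = group.sort_values(score_col, ascending=False)[label_col].to_numpy()
        per_session.append(ndcg_at_k(ranked_labels, k=5))
    return float(np.mean(per_session))

# mean_ndcg_at_5(eval_df, score_col="dlrm_score") vs. mean_ndcg_at_5(eval_df, score_col="xgb_score")
```

AUROC is computed globally over all impressions, while NDCG@5 only cares about the ordering of the top few items within each session, which is why the two numbers can diverge so sharply.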
Is this more common across the industry than I realized? I would have expected DLRM to do substantially better given that it uses deep learning. However, after doing more research, it seems that recent recommender-system challenges on Kaggle are mostly won by gradient boosting models. Nvidia researchers seem to have noticed a similar trend here and here. Is this still the case?