
[QST] DLRM predictive performance vs older models (e.g., XGBoost) #1247

@RyanZotti

❓ Questions & Help

Details

How does DLRM compare to models like XGBoost in terms of predictive performance for ranking in recommender systems? Are deep learning methods consistently superior? I've been able to replicate DLRM's AUROC on Criteo, getting around 0.79. I then trained the model on my employer's proprietary dataset and got an even better AUROC of 0.88. However, NDCG@5 on the internal dataset shows that DLRM (0.04) does substantially worse than both XGBoost (0.19) and our basic manual heuristics ranker (0.17). Our internal dataset uses features similar to those found in Criteo: user ID embeddings, product ID embeddings, dense features, etc.
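
For context on how these two metrics can diverge, here is a minimal sketch of computing a pooled AUROC alongside a per-query NDCG@5 with scikit-learn. The helper `ndcg_at_k_by_query`, the toy labels, and the query grouping are purely illustrative assumptions, not how the internal numbers above were produced:

```python
import numpy as np
from sklearn.metrics import ndcg_score, roc_auc_score

def ndcg_at_k_by_query(y_true, y_score, query_ids, k=5):
    """Mean NDCG@k over query groups (hypothetical helper, for illustration only)."""
    per_query = []
    for qid in np.unique(query_ids):
        mask = query_ids == qid
        if mask.sum() < 2:
            continue  # NDCG needs at least two items to rank
        # ndcg_score expects 2D inputs: one row per "sample" (here, per query)
        per_query.append(ndcg_score([y_true[mask]], [y_score[mask]], k=k))
    return float(np.mean(per_query))

# Toy data: two queries with binary relevance labels and model scores.
y_true = np.array([1, 0, 0, 0, 1, 0])
y_score = np.array([0.2, 0.9, 0.1, 0.8, 0.7, 0.3])
query_ids = np.array([0, 0, 0, 1, 1, 1])

# AUROC pools all examples globally...
print("AUROC (pooled):", roc_auc_score(y_true, y_score))
# ...while NDCG@5 is computed within each query and then averaged,
# so a model can look strong on one metric and weak on the other.
print("NDCG@5 (per-query mean):", ndcg_at_k_by_query(y_true, y_score, query_ids, k=5))
```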

Is this more common across the industry than I realized? I would have expected DLRM to do substantially better given that it uses deep learning. However, after doing more research, it seems that recent recommender-system challenges on Kaggle mostly feature gradient boosting models among the winners. NVIDIA researchers seem to have noticed a similar trend here and here. Is this still the case?
