❓ Questions & Help
Details
How does DLRM compare to models like XGBoost in predictive performance for ranking in recommender systems? Are deep learning methods consistently superior? I was able to replicate DLRM's AUROC on Criteo, getting around 0.79. I then trained the model on my employer's proprietary dataset and got an even better AUROC of 0.88. However, NDCG@5 on the internal dataset shows that DLRM (NDCG@5 of 0.04) does substantially worse than XGBoost (NDCG@5 of 0.19), and also worse than our basic manual heuristics ranker (NDCG@5 of 0.17). Our internal dataset uses features similar to those found in Criteo: user ID embeddings, product ID embeddings, dense features, etc.
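For concreteness, this is roughly how I compute the listwise metric, in case the discrepancy comes from evaluation rather than the model itself. It's a sketch: `session_id` and `clicked` are stand-ins for our internal column names, and the model scores are assumed to already be in the dataframe.

```python
import numpy as np
import pandas as pd

def ndcg_at_k(relevance_in_ranked_order, k=5):
    """NDCG@k for one session: DCG of the top-k items as ranked by the model,
    normalized by the DCG of the ideal (relevance-sorted) ordering."""
    rel = np.asarray(relevance_in_ranked_order, dtype=float)[:k]
    if rel.sum() == 0:
        return 0.0
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float((rel * discounts).sum())
    ideal = np.sort(np.asarray(relevance_in_ranked_order, dtype=float))[::-1][:k]
    idcg = float((ideal * discounts[: ideal.size]).sum())
    return dcg / idcg

def mean_ndcg_at_5(df, score_col, label_col="clicked", group_col="session_id"):
    """Average NDCG@5 over sessions, ranking items within each session by model score."""
    per_session = []
    for _, group in df.groupby(group_col):
        ranked_labels = group.sort_values(score_col, ascending=False)[label_col].to_numpy()
        per_session.append(ndcg_at_k(ranked_labels, k=5))
    return float(np.mean(per_session))

# mean_ndcg_at_5(eval_df, score_col="dlrm_score") vs. mean_ndcg_at_5(eval_df, score_col="xgb_score")
```

AUROC is computed globally over all impressions, while NDCG@5 only cares about the ordering of the top few items within each session, which is why the two numbers can diverge so sharply.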
Is this more common across the industry than I realized? I would have expected DLRM to do substantially better given that it uses deep learning. However, after doing more research, it seems that recent recommender-system challenges on Kaggle are mostly won by gradient boosting models. Nvidia researchers seem to have noticed a similar trend here and here. Is this still the case?