In this paper we challenge the standard ways in which text-based recommender systems are trained and evaluated. We highlight the necessity of focusing on long-tail users and items, as these are the cases where text-based prediction can potentially win over collaborative filtering methods. We also raise concerns about the choice of datasets and about data preparation for recommender training and evaluation. Finally, we reconsider how recommenders are evaluated, and propose drilling down into different groups of users and items, as well as search-based evaluation as an alternative to solely measuring a global metric over context-free test points.
Ghazaleh Haratinezhad Torbati, Anna Tigunova, Gerhard Weikum. Unveiling Challenging Cases in Text-based Recommender Systems. In Perspectives on the Evaluation of Recommender Systems Workshop (PERSPECTIVES 2023), September 19th, 2023, co-located with the 17th ACM Conference on Recommender Systems, Singapore, Singapore.
Recommender systems have achieved impressive results on benchmark datasets. However, the numbers are often influenced by assumptions made about the data and the evaluation mode. This work questions and revises these assumptions, to study and improve the quality, particularly for the difficult case of search-based recommendations. Users start with a personally liked item as a query and look for similar items that match their tastes. User satisfaction requires discovering truly unknown items: new authors of books rather than merely more books of known writers. We propose a unified system architecture that combines interaction-based and content-based signals and leverages language models for Transformer-powered predictions. We present new techniques for selecting negative training samples, and investigate their performance in the underexplored search-based evaluation mode.
Ghazaleh Haratinezhad Torbati, Andrew Yates, Gerhard Weikum. Search-based Recommendation: the Case for Difficult Predictions. In WWW '23 Companion: Companion Proceedings of the ACM Web Conference 2023, pages 318–321, April 2023.
The poster can be downloaded here.
The experiments for this work are carried out on Amazon reviews for the book domain. The original dataset is obtained from the UCSD repository at https://nijianmo.github.io/amazon/index.html, based on the work of Jianmo Ni, Jiacheng Li, Julian McAuley: Justifying recommendations using distantly-labeled reviews and fine-grained aspects, EMNLP 2019.
Here we share the pre-processed user-item sets and their train-dev-test splits under the constraint that train and test samples have disjoint authors for each user. Further, we document the test-time selection of positive and negative datapoints, for three different modes of evaluation (standard, profile-based, and search-based).
Note that we avoid redundant data sharing, and publish only user and item identifiers that link to the original dataset where the full information is given. User-item reviews are in the "reviews" file, and item metadata (including title, category, and description) are in the "metadata" file in the original dataset repository.
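The disjoint-author constraint described above can be sketched as follows. This is a minimal illustration, not the released preprocessing code: the tuple layout and function name are hypothetical, and the actual splits are provided in the files below.

```python
# Hypothetical sketch of a per-user split where train and test items
# have disjoint authors. Not the released code; field names are assumed.
import random
from collections import defaultdict

def split_disjoint_authors(interactions, test_ratio=0.2, seed=42):
    """interactions: list of (user_id, item_id, author) tuples.
    Returns (train, test) lists of (user_id, item_id) pairs such that,
    for each user, no author appears in both train and test."""
    rng = random.Random(seed)
    # Group each user's items by author.
    by_user = defaultdict(lambda: defaultdict(list))
    for user, item, author in interactions:
        by_user[user][author].append(item)

    train, test = [], []
    for user, authors in by_user.items():
        names = list(authors)
        rng.shuffle(names)
        # Hold out a fraction of the user's authors (at least one,
        # unless the user has only a single author).
        n_test = max(1, int(len(names) * test_ratio)) if len(names) > 1 else 0
        test_authors = set(names[:n_test])
        for author, items in authors.items():
            bucket = test if author in test_authors else train
            bucket.extend((user, it) for it in items)
    return train, test
```

The split is done at the author level rather than the item level, so a test item's author never appears in that user's training data.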
Data split with disjoint authors per user:
Negative test points for different evaluation modes:
Code repository: https://github.com/ghazalehnt/SBR-Framework-difficult-case-study
Prior work on personalized recommendations has focused on exploiting explicit signals from user-specific queries, clicks, likes and ratings. This paper investigates tapping into a different source of implicit signals of interests and tastes: online chats between users. The paper develops an expressive model and effective methods for personalizing search-based entity recommendations. User models derived from chats augment different methods for re-ranking entity answers for medium-grained queries. The paper presents specific techniques to enhance the user models by capturing domain-specific vocabularies and by entity-based expansion. Experiments are based on a collection of online chats from a controlled user study covering three domains: books, travel, food. We evaluate different configurations and compare chat-based user models against concise user profiles from questionnaires. Overall, these two variants perform on par in terms of NDCG@20, but each has advantages on certain domains.
Ghazaleh Haratinezhad Torbati, Andrew Yates, Gerhard Weikum. You Get What You Chat: Using Conversations to Personalize Search-based Recommendations. In Proceedings of the 43rd European Conference on Information Retrieval (ECIR 2021), pages 207–223, March 2021.
The dataset, containing user data (filled questionnaires, chats, assessments), is downloadable here:
YGWYC_dataset_012021.zip (released in January 2021)
Please refer to the README for more details.
This data is licensed under Creative Commons BY-NC 4.0.
Prior work on personalizing web search results has focused on considering query-and-click logs to capture users' individual interests. For product search, extensive user histories about purchases and ratings have been exploited. However, for general entity search, such as for books on specific topics or travel destinations with certain features, personalization is largely underexplored. In this paper, we address personalization of book search, as an exemplary case of entity search, by exploiting sparse user profiles obtained through online questionnaires. We devise and compare a variety of re-ranking methods based on language models or neural learning. Our experiments show that even very sparse information about individuals can enhance the effectiveness of the search results.
Ghazaleh Haratinezhad Torbati, Andrew Yates, Gerhard Weikum. Personalized Entity Search by Sparse and Scrutable User Profiles. In Proceedings of the Fifth ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR 2020), pages 427–431, March 2020.