In this paper we challenge the standard ways in which text-based recommender systems are trained and evaluated. We highlight the necessity of focusing on long-tail users and items, as these are the cases where text-based prediction can potentially win over collaborative filtering methods. We also raise concerns about the choice of datasets and about data preparation for recommender training and evaluation. Finally, we reconsider how recommenders are evaluated, and propose drilling down into different groups of users and items, as well as search-based evaluation as an alternative to solely measuring a global metric over context-free test points.
Ghazaleh Haratinezhad Torbati, Anna Tigunova, Gerhard Weikum. Unveiling Challenging Cases in Text-based Recommender Systems. In Perspectives on the Evaluation of Recommender Systems Workshop (PERSPECTIVES 2023), September 19th, 2023, co-located with the 17th ACM Conference on Recommender Systems, Singapore, Singapore.
Recommender systems have achieved impressive results on benchmark datasets. However, the numbers are often influenced by assumptions made about the data and the evaluation mode. This work questions and revises these assumptions, to study and improve the quality, particularly for the difficult case of search-based recommendations. Users start with a personally liked item as a query and look for similar items that match their tastes. User satisfaction requires discovering truly unknown items: new authors of books rather than merely more books of known writers. We propose a unified system architecture that combines interaction-based and content-based signals and leverages language models for Transformer-powered predictions. We present new techniques for selecting negative training samples, and investigate their performance in the underexplored search-based evaluation mode.
Ghazaleh Haratinezhad Torbati, Andrew Yates, Gerhard Weikum. Search-based Recommendation: the Case for Difficult Predictions. In WWW '23 Companion: Companion Proceedings of the ACM Web Conference 2023, pages 318–321, April 2023.
The poster can be downloaded here.
The experiments for this work are carried out on Amazon reviews for the book domain. The original dataset is obtained from the UCSD repository at https://nijianmo.github.io/amazon/index.html, based on the work of Jianmo Ni, Jiacheng Li, Julian McAuley: Justifying recommendations using distantly-labeled reviews and fine-grained aspects, EMNLP 2019.
Here we share the pre-processed user-item sets and their train-dev-test splits under the constraint that train and test samples have disjoint authors for each user. Further, we document the test-time selection of positive and negative datapoints, for three different modes of evaluation (standard, profile-based, and search-based).
Note that we avoid redundant data sharing, and publish only user and item identifiers that link to the original dataset where the full information is given. User-item reviews are in the "reviews" file, and item metadata (including title, category, and description) are in the "metadata" file in the original dataset repository.
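The disjoint-author constraint described above can be sketched as follows. This is a minimal illustration, not the released preprocessing code: the tuple layout and function name are hypothetical, and the actual splits are provided in the files below.

```python
# Hypothetical sketch of a per-user split where train and test items
# have disjoint authors. Not the released code; field names are assumed.
import random
from collections import defaultdict

def split_disjoint_authors(interactions, test_ratio=0.2, seed=42):
    """interactions: list of (user_id, item_id, author) tuples.
    Returns (train, test) lists of (user_id, item_id) pairs such that,
    for each user, no author appears in both train and test."""
    rng = random.Random(seed)
    # Group each user's items by author.
    by_user = defaultdict(lambda: defaultdict(list))
    for user, item, author in interactions:
        by_user[user][author].append(item)

    train, test = [], []
    for user, authors in by_user.items():
        names = list(authors)
        rng.shuffle(names)
        # Hold out a fraction of the user's authors (at least one,
        # unless the user has only a single author).
        n_test = max(1, int(len(names) * test_ratio)) if len(names) > 1 else 0
        test_authors = set(names[:n_test])
        for author, items in authors.items():
            bucket = test if author in test_authors else train
            bucket.extend((user, it) for it in items)
    return train, test
```

The split is done at the author level rather than the item level, so a test item's author never appears in that user's training data.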
Data split with disjoint authors per user:
Negative test points for different evaluation modes:
Code repository: https://github.com/ghazalehnt/SBR-Framework-difficult-case-study
Prior work on personalized recommendations has focused on exploiting explicit signals from user-specific queries, clicks, likes and ratings. This paper investigates tapping into a different source of implicit signals of interests and tastes: online chats between users. The paper develops an expressive model and effective methods for personalizing search-based entity recommendations. User models derived from chats augment different methods for re-ranking entity answers for medium-grained queries. The paper presents specific techniques to enhance the user models by capturing domain-specific vocabularies and by entity-based expansion. Experiments are based on a collection of online chats from a controlled user study covering three domains: books, travel, food. We evaluate different configurations and compare chat-based user models against concise user profiles from questionnaires. Overall, these two variants perform on par in terms of NDCG@20, but each has advantages on certain domains.
Ghazaleh Haratinezhad Torbati, Andrew Yates, Gerhard Weikum. You Get What You Chat: Using Conversations to Personalize Search-based Recommendations. In Proceedings of the 43rd European Conference on Information Retrieval (ECIR 2021), pages 207–223, March 2021.
The dataset, containing user data (filled questionnaires, chats, assessments), is downloadable here:
YGWYC_dataset_012021.zip (released in January 2021)
Please refer to the README for more details.
This data is licensed under Creative Commons BY-NC 4.0.
Prior work on personalizing web search results has focused on considering query-and-click logs to capture users' individual interests. For product search, extensive user histories about purchases and ratings have been exploited. However, for general entity search, such as for books on specific topics or travel destinations with certain features, personalization is largely underexplored. In this paper, we address personalization of book search, as an exemplary case of entity search, by exploiting sparse user profiles obtained through online questionnaires. We devise and compare a variety of re-ranking methods based on language models or neural learning. Our experiments show that even very sparse information about individuals can enhance the effectiveness of the search results.
Ghazaleh Haratinezhad Torbati, Andrew Yates, Gerhard Weikum. Personalized Entity Search by Sparse and Scrutable User Profiles. In Proceedings of the Fifth ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR 2020), pages 427–431, March 2020.