SBR Data

Amazon Books Review Dataset

The experiments for this work are carried out on Amazon reviews from the book domain. The original dataset is available from the UCSD repository at https://nijianmo.github.io/amazon/index.html and is described in Jianmo Ni, Jiacheng Li, and Julian McAuley: Justifying Recommendations using Distantly-Labeled Reviews and Fine-Grained Aspects, EMNLP 2019.

Here we share the pre-processed user-item sets and their train-dev-test splits, built under the constraint that, for each user, the items in the train and test samples have disjoint authors. We also document the test-time selection of positive and negative data points for three modes of evaluation (standard, profile-based, and search-based).
To avoid redundant data sharing, we publish only the user and item identifiers, which link to the original dataset where the full information is available. User-item reviews are in the "reviews" file, and item metadata (including title, category, and description) is in the "metadata" file of the original dataset repository.
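
Linking the shared identifiers back to the original reviews could look like the sketch below. It assumes the "reviews" file is read as JSON lines whose records carry the `reviewerID` and `asin` fields used by the original dataset, and that each shared row reduces to a (user id, item id) pair. The helper names `index_reviews` and `attach_reviews` and the toy records are hypothetical, and the column layout of the shared CSV files should be checked before adapting this.

```python
import json


def index_reviews(json_lines):
    """Index review records by (reviewerID, asin) from JSON-lines input."""
    index = {}
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # A later review of the same item by the same user overwrites an earlier one.
        index[(record["reviewerID"], record["asin"])] = record
    return index


def attach_reviews(pairs, review_index):
    """Return the review records for the (user, item) pairs present in the index."""
    return [review_index[pair] for pair in pairs if pair in review_index]


# Toy stand-ins (assumed values) for the original reviews file and one split file.
sample_lines = [
    '{"reviewerID": "U1", "asin": "B001", "overall": 5.0}',
    '{"reviewerID": "U1", "asin": "B002", "overall": 3.0}',
    '{"reviewerID": "U2", "asin": "B001", "overall": 4.0}',
]
split_pairs = [("U1", "B002"), ("U2", "B001"), ("U3", "B009")]

linked = attach_reviews(split_pairs, index_reviews(sample_lines))
print([record["overall"] for record in linked])
```

Pairs without a matching review are silently dropped here. Counting them instead may be a useful sanity check when the identifiers are joined against the full dataset.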

Data split with disjoint authors per user:

train_ids.csv

validation.csv

test.csv
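
The disjoint-author constraint on the splits above can be checked with a sketch like the following. It assumes the reader has already built a mapping from item identifier to author from the original metadata. That mapping, the function names, and the toy identifiers are all hypothetical and are not part of the shared files.

```python
from collections import defaultdict


def authors_per_user(pairs, item_to_author):
    """Collect the set of authors for each user from (user, item) pairs."""
    result = defaultdict(set)
    for user, item in pairs:
        author = item_to_author.get(item)
        if author is not None:  # items with unknown author are ignored
            result[user].add(author)
    return result


def users_with_shared_authors(train_pairs, test_pairs, item_to_author):
    """Return {user: shared authors} for users whose train and test authors overlap."""
    train_authors = authors_per_user(train_pairs, item_to_author)
    test_authors = authors_per_user(test_pairs, item_to_author)
    violations = {}
    for user, authors in test_authors.items():
        shared = authors & train_authors.get(user, set())
        if shared:
            violations[user] = shared
    return violations


# Toy data (assumed values): two users, three books, two authors.
item_to_author = {"B001": "author_a", "B002": "author_b", "B003": "author_a"}
train_pairs = [("U1", "B001"), ("U2", "B002")]
test_pairs = [("U1", "B002"), ("U2", "B003")]

print(users_with_shared_authors(train_pairs, test_pairs, item_to_author))
```

An empty result means that no user has an author appearing in both their train and test items, which is what the constraint requires.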

Negative test points for different evaluation modes:

test_negatives_standard_evaluation_100.csv (14G)

test_negatives_profile_based_evaluation_100.csv (14G)

test_negatives_search_based_evaluation_100.csv (20G)
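
Given the sizes listed above (14G to 20G), reading these files in batches is likely more practical than loading them whole. The sketch below uses only the Python standard library. The function name `iter_row_batches` is hypothetical, and the sketch makes no assumption about the column layout or about whether the files start with a header row, so both should be inspected first.

```python
import csv
import io
from itertools import islice


def iter_row_batches(file_obj, batch_size):
    """Yield lists of up to batch_size CSV rows without loading the whole file."""
    reader = csv.reader(file_obj)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


# In-memory stand-in (assumed values) for one of the large negatives files.
sample_file = io.StringIO("u1,i1\nu1,i2\nu2,i3\n")
batches = list(iter_row_batches(sample_file, batch_size=2))
print(len(batches))
```

For a real file, the same function can be given a handle opened with `open(path, newline="")`, where `path` points to one of the files listed above.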

Code repository: https://github.com/ghazalehnt/SBR-Framework-difficult-case-study