Amazon Dataset

Date Updated: 10/22/2014 (Please check back regularly over the next 3-4 days for updates to the data)

OVERVIEW

The data provided with this set contains Amazon product details for approximately 400,000 products spanning ~40,000 categories:

https://s3.amazonaws.com/stanford-project/amazon_products.gz

The following file contains all Amazon product hierarchies (it can be used as a reference):

https://s3.amazonaws.com/stanford-project/p1/AmazonHeirarchy.json

On a Linux machine, you can uncompress the .gz file with the 'gunzip' command. Mac users can double-click the file to open it.
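The file can also be read without uncompressing it first. The following is a minimal sketch in Python; it assumes the archive holds one JSON product object per line, which this document does not state, so adjust the parsing if the layout turns out to be different. The function name iter_products is only illustrative.

```python
import gzip
import json


def iter_products(path):
    """Yield one product dict per non-empty line of a gzip-compressed file.

    Assumption: the file stores one JSON object per line.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, after downloading the file, next(iter_products("amazon_products.gz")) would return the first product as a dict, provided the one-object-per-line assumption holds.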

Note: We are currently working with SNAP to convert their bigger Amazon dataset (2.5M products) and label it hierarchically so that we have more testing/training data. This is expected to be done before this weekend.

DATA DESCRIPTION

{
  "PrunedEditorialReviews": [
    {
      "Content": "GE's NSF-certified MWF replacement refrigerator water filter is the improved version of the GWF model, providing you and your loved ones cleaner, healthier, and better-tasting drinking water at home. By reducing contaminants like mercury, toxaphene, p-dichlorobenzene, carbofuran, alachlor, benzene, lead, cryptosporidium, and giardia, the MWF is a safe and affordable way to contribute to a healthy lifestyle. Newly tested and verified to filter 5 trace pharmaceuticals including ibuprofen, progesterone, atenolol, trimethoprim, and fluoxetine ( The contaminants or other substances removed or reduced by this water filter are not necessarily in all users' water)\"",
      "Source": "Product Description",
      "IsLinkSuppressed": "0"
    },
    {...}
  ],
  "ASIN": "B000AST3AK",
  "ItemAttributes": {...},
  "PrunedReviews": [
    {
      "Content": "Shipped immediately, took it out of the box, ignored all the instructions and warnings as men do, unscrewed the other one, put this one in, filled up 2 big glasses of water to \"flush it\" and that was it. Maybe 5mins worth of effort? Not sure what else to say in a review of this item, worked great in our GE fridge."
    },
    {...},
    {...},
    {...},
    {...},
    {...},
    {...},
    {...},
    {...},
    {...}
  ],
  "BrowseNodes": {
    "BrowseNode": [
      {
        "Ancestors": {
          "BrowseNode": {
            "Name": "Home & Kitchen",
            "BrowseNodeId": 1055398
          }
        },
        "Name": "Kitchen & Dining Features",
        "Children": {
          "BrowseNode": {
            "Name": "Featured Categories",
            "BrowseNodeId": 51552011
          }
        },
        "BrowseNodeId": 13900821
      },
      {...},
      {...},
      {...},
      {...}
    ]
  }
}

The example above is the JSON object for one product. The important fields are shown expanded; for arrays, one item is expanded to show its inner fields and the remaining items are elided as {...}. Each product can belong to multiple Amazon categories, hence the BrowseNode array inside BrowseNodes. Each element of that array describes one category together with the hierarchy of its parent categories (Ancestors) and its subcategories (Children). You can also get this information from the hierarchy file, given each BrowseNodeId for a product.

More details on various elements:

ASIN: A token distributed by Amazon that uniquely identifies an item.
BrowseNodeId: A positive integer that uniquely identifies a product category.
Feature: An item's feature.
PrunedReviews: Reviews written by editors.
PrunedEditorialReviews: Description/review written by Amazon and trusted parties.

For further details on attributes, refer to the following URL: http://docs.aws.amazon.com/AWSECommerceService/latest/DG/CHAP_response_elements.
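The nested Ancestors entries can be flattened into a readable category path. The sketch below is one possible way to do this in Python, not a prescribed method. The helper names (as_list, category_path, product_paths) are hypothetical, and the code assumes that a "BrowseNode" value may be either a single object or an array, since both shapes appear in the example.

```python
def as_list(value):
    """Wrap a single dict in a list so both observed shapes can be iterated."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def category_path(node):
    """Return [(BrowseNodeId, Name), ...] from the topmost ancestor down to node."""
    path = []
    while node is not None:
        path.append((node.get("BrowseNodeId"), node.get("Name")))
        # Assumption: each node has at most one parent under Ancestors.
        ancestors = as_list(node.get("Ancestors", {}).get("BrowseNode"))
        node = ancestors[0] if ancestors else None
    path.reverse()
    return path


def product_paths(product):
    """Return one category path per entry in the product's BrowseNodes."""
    nodes = as_list(product.get("BrowseNodes", {}).get("BrowseNode"))
    return [category_path(n) for n in nodes]


# Trimmed copy of the product example, keeping only the fields used here.
sample = {
    "ASIN": "B000AST3AK",
    "BrowseNodes": {"BrowseNode": [{
        "Ancestors": {"BrowseNode": {"Name": "Home & Kitchen",
                                     "BrowseNodeId": 1055398}},
        "Name": "Kitchen & Dining Features",
        "BrowseNodeId": 13900821,
    }]},
}
paths = product_paths(sample)
```

Here paths holds a single path that starts at "Home & Kitchen" and ends at "Kitchen & Dining Features". Children are ignored because they point downward from the category rather than toward the root.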

Social Conversations Dataset

OVERVIEW

The data (Social_Conversations_AmazonLabel) provided with this set consists of approximately 100,000 social conversations (content extracted from Twitter and eBay and masked appropriately to comply with their TOS), each with a manually assigned Amazon label. In the first phase of the project, we will mostly be operating on the text. Other relevant metadata will be exposed on an as-needed basis.

DATA DESCRIPTION

{"conversation_id": "523961719358750721",
  "conversation_src": "tw",
  "conversation_text": "RT @authorwhillman: She had it all, wealth, looks & prosperity He had a dog-named Harold. She needed a pretend husband & he needed a life h\u2026",
  "Amazon_Browsenodes": {"BrowseNode": [{"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Name": "Books", "BrowseNodeId": 283155}}, "Name": "Subjects", "IsCategoryRoot": 1, "BrowseNodeId": 1000}}, "Name": "Literature & Fiction", "BrowseNodeId": 17}}, "Name": "Action & Adventure", "BrowseNodeId": 720360}}, "Name": "Mystery, Thriller & Suspense", "BrowseNodeId": 10159268011}}, "Name": "Mystery", "BrowseNodeId": 10159270011}, {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Name": "Books", "BrowseNodeId": 283155}}, "Name": "Subjects", "IsCategoryRoot": 1, "BrowseNodeId": 1000}}, "Name": "Literature & Fiction", "BrowseNodeId": 17}}, "Name": "Action & Adventure", "BrowseNodeId": 720360}}, "Name": "Mystery, Thriller & Suspense", "BrowseNodeId": 10159268011}}, "Name": "Thriller & Suspense", "BrowseNodeId": 10159271011}, {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Name": "Books", "BrowseNodeId": 283155}}, "Name": "Subjects", "IsCategoryRoot": 1, "BrowseNodeId": 1000}}, "Name": "Literature & Fiction", "BrowseNodeId": 17}}, "Name": "Action & Adventure", "BrowseNodeId": 720360}}, "Name": "Romance", "BrowseNodeId": 10159272011}, {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Name": "Books", "BrowseNodeId": 283155}}, "Name": "Subjects", "IsCategoryRoot": 1, "BrowseNodeId": 1000}}, "Name": "Romance", "BrowseNodeId": 23}}, "Name": "Romantic Suspense", "BrowseNodeId": 13389}, {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Name": "Books", "BrowseNodeId": 283155}}, "Name": "Subjects", "IsCategoryRoot": 1, "BrowseNodeId": 1000}}, "Name": "Romance", "BrowseNodeId": 23}}, "Name": "Romantic Comedy", "BrowseNodeId": 9059885011}, 
{"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Name": "Kindle Store", "BrowseNodeId": 133140011}}, "Name": "Categories", "IsCategoryRoot": 1, "BrowseNodeId": 133141011}}, "Name": "Kindle eBooks", "BrowseNodeId": 154606011}}, "Name": "Literature & Fiction", "BrowseNodeId": 157028011}}, "Name": "Action & Adventure", "BrowseNodeId": 157055011}}, "Name": "Mystery, Thriller & Suspense", "BrowseNodeId": 7588731011}}, "Name": "Mystery", "BrowseNodeId": 7588733011}, {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Name": "Kindle Store", "BrowseNodeId": 133140011}}, "Name": "Categories", "IsCategoryRoot": 1, "BrowseNodeId": 133141011}}, "Name": "Kindle eBooks", "BrowseNodeId": 154606011}}, "Name": "Literature & Fiction", "BrowseNodeId": 157028011}}, "Name": "Action & Adventure", "BrowseNodeId": 157055011}}, "Name": "Mystery, Thriller & Suspense", "BrowseNodeId": 7588731011}}, "Name": "Suspense", "BrowseNodeId": 7588734011}, {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Name": "Kindle Store", "BrowseNodeId": 133140011}}, "Name": "Categories", "IsCategoryRoot": 1, "BrowseNodeId": 133141011}}, "Name": "Kindle eBooks", "BrowseNodeId": 154606011}}, "Name": "Literature & Fiction", "BrowseNodeId": 157028011}}, "Name": "Action & Adventure", "BrowseNodeId": 157055011}}, "Name": "Romance", "BrowseNodeId": 7588736011}, {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Name": "Kindle Store", "BrowseNodeId": 133140011}}, "Name": "Categories", "IsCategoryRoot": 1, "BrowseNodeId": 133141011}}, "Name": "Kindle eBooks", "BrowseNodeId": 
154606011}}, "Name": "Romance", "BrowseNodeId": 158566011}}, "Name": "Mystery & Suspense", "BrowseNodeId": 6487839011}}, "Name": "Suspense", "BrowseNodeId": 158574011}, {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Ancestors": {"BrowseNode": {"Name": "Kindle Store", "BrowseNodeId": 133140011}}, "Name": "Categories", "IsCategoryRoot": 1, "BrowseNodeId": 133141011}}, "Name": "Kindle eBooks", "BrowseNodeId": 154606011}}, "Name": "Romance", "BrowseNodeId": 158566011}}, "Name": "Romantic Comedy", "BrowseNodeId": 6487841011}]}}

conversation_id: Unique conversation id
conversation_src: tw/ebay
conversation_text: Actual text of review or tweet
Amazon_Browsenodes: List of Amazon categories it belongs to
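Since the first phase works mostly on the text, a record can be reduced to a (text, labels) pair. The following Python sketch shows one way to do that under stated assumptions: each record is available as a JSON string, and the top-level entries of Amazon_Browsenodes are taken as the labels. The function name to_training_pair is hypothetical, and the sample record is a shortened stand-in with placeholder text, not actual data.

```python
import json


def to_training_pair(raw_record):
    """Return (conversation_text, [BrowseNodeId, ...]) for one JSON record."""
    record = json.loads(raw_record)
    nodes = record.get("Amazon_Browsenodes", {}).get("BrowseNode", [])
    if isinstance(nodes, dict):  # tolerate a single node instead of an array
        nodes = [nodes]
    # Only the top-level node ids are kept; their Ancestors are ignored here.
    labels = [node["BrowseNodeId"] for node in nodes]
    return record["conversation_text"], labels


# Shortened, illustrative record (category ids taken from the example above).
raw = json.dumps({
    "conversation_id": "523961719358750721",
    "conversation_src": "tw",
    "conversation_text": "placeholder text",
    "Amazon_Browsenodes": {"BrowseNode": [
        {"Name": "Romantic Suspense", "BrowseNodeId": 13389},
        {"Name": "Romantic Comedy", "BrowseNodeId": 9059885011},
    ]},
})
text, labels = to_training_pair(raw)
```

Keeping only the top-level ids is a simplification. If coarser labels are preferred, the Ancestors chain of each node could be walked instead.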

Reference Papers (Will be updated frequently)