Amazon Recommender System Case Study

Presented on 31st October 2021 by Jihyun

Recommender System
1. What Is Recommender System?
2. Type of Recommendations
How Amazon Uses Recommender System

Recommender System

What Is Recommender System?

Recommendation System predicts a user’s preference for a product and serves personalized recommendations based on the prediction.

Relationships Among Users and Products

User-Product Relationship - Does this user have a preference for this product?
Product-Product Relationship - Is product A similar to product B?
User-User Relationship - Is user A similar to user B?

Relationships Provide Insight

User-Product Relationship - What books will an user like?
Product-Product Relationship - If an user buys a phone, what else will they buy?
User-User Relationship - If Heath knows Taeim, and Taeim knows Jung, will Heath know Jung?

Insight Is Monetizable

When a service recommends more personalized offers, users are more likely to make a purchase. This is how many businesses use the insignt for monetization. In fact, 35% of Amazon’s transactions come from recommendation algorithm.

As the service acquires more users, they gain a larger dataset and therefore a better insight.
The recommendations become more accurate.
Users are satisfied with the recommendations, and this attracts more users.

Type of Recommendations

Content Based Filtering

Finding products with similar attributes

User purchases item A.
Find items that have similar attributes with item A.
Recommend these items to the user.

Collaborative Filtering

Find products liked by users with similar interests

Jung purchases item A and B.
Taeim purchases item A, B and C.
Machine learning model learns that Jung and Taeim have common interests.
Jung might also like item C. Recommend item C to Jung.

Association Rules Learning

Find complementary products

Many people who buy phones also buy phone cases.
Machine learning model learns the association rule between phones and phone cases.
Recommend phone cases to users who purchase phones.

How Amazon Uses Recommender System

Step 1: Data Processing

Two Types of Data Processing

Batch Processing
- Processes large volume of data all at once (periodically)
- Use cases: Payroll, billing, orders from customers
- Google Dataflow, Amazon Kinesis, Azure Stream Analytics
Stream Processing
- Processes large volume of data all at once (periodically)
- Use cases: Payroll, billing, orders from customers
- Google Dataflow, Amazon Kinesis, Azure Stream Analytics

Gathering Basic Data - Batch Processing

Batch processing is mainly used to gather basic data that don’t change often or don’t need to be updated in real time.

User delivery address
Item price, brand, color, size, weight, material, etc.
Availability based on location
Special deals

Gathering Behavioral Data - Streaming

Streaming is mainly used to gather behavioral data that need to be updated in real time.

Browsing history
Purchase history

Step 2: Data Transformation

Gathered data are transformed into appropriate format (e.g. JSON, CSV, etc.) and provided to the machine learning model.

Step 3: Machine Learning Model Renewal

Machine learning model is trained from the data provided in step 2.
Model accuracy is evaluated based on user feedback (e.g. User purchased 2 itmes out of 10 recommendations).
Item scoring is used to rank the recommendations. Scoring can depend on many factors, examples include:
- Given an item is displayed to a user, how likely is the user to click the item?
- Given a user clicked an item, how likely is the user to purchase the item?
- Given a user purchased an item, how likely is the user to be satisfied with the item?

The Big Picture

Data Pipeline regularly copies the full contents of a DynamoDB(NoSQL DB) table as JSON into an S3.
Exported JSON files are converted to CSV format to use as a data source for SageMaker(AI platform).
SageMaker renews the model artifact.
The converted CSV is available for ad hoc queries with Amazon Athena(data warehouse).
Data Pipeline repeats the cycle based on the schedule defined by customer requirements.

Resources

AWS Big Data Blog