Extreme data sparsity and class imbalance (most ads are not clicked).
For high-scale systems (like YouTube or Instagram feeds), running a full model over millions of items on every request is infeasible under strict real-time latency limits. The standard industry pattern splits this into two stages:
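The two stages are not spelled out at this point in the text, so the sketch below assumes the commonly used split: a cheap retrieval pass that narrows the full catalog to a shortlist, followed by a more expensive ranking pass over that shortlist only. Every name here (`retrieve_candidates`, `rank_candidates`, `item_quality`, the embedding size) is hypothetical and chosen for illustration, and the brute-force dot product stands in for whatever approximate nearest-neighbor index a real system would likely use.

```python
import heapq
import random


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_candidates(user_vec, item_vecs, k):
    """Stage 1 (assumed): cheap similarity over the whole catalog, keep the top k.

    Brute force here for clarity; a production system would typically
    replace this scan with an approximate nearest-neighbor lookup.
    """
    return heapq.nlargest(
        k, item_vecs, key=lambda item_id: dot(user_vec, item_vecs[item_id])
    )


def rank_candidates(user_vec, candidates, item_vecs, item_quality, n):
    """Stage 2 (assumed): costlier scoring, applied to the shortlist only.

    The score below is a stand-in for a heavier ranking model.
    """
    def score(item_id):
        return dot(user_vec, item_vecs[item_id]) + item_quality[item_id]

    return sorted(candidates, key=score, reverse=True)[:n]


# Synthetic data, purely illustrative.
random.seed(0)
DIM = 8
item_vecs = {i: [random.gauss(0, 1) for _ in range(DIM)] for i in range(10_000)}
item_quality = {i: random.random() for i in item_vecs}
user_vec = [random.gauss(0, 1) for _ in range(DIM)]

candidates = retrieve_candidates(user_vec, item_vecs, k=200)  # 10,000 -> 200
feed = rank_candidates(user_vec, candidates, item_vecs, item_quality, n=10)  # 200 -> 10
```

The design point is that the expensive scorer runs on 200 items per request instead of 10,000, so the latency budget is spent where it affects the final ordering most.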
How do we measure success? (e.g., CTR, conversion rate, latency)
Scale: How many users? How much data?

Step 2: High-Level System Design
Do we have labeled data? What is the volume of data available?

Step 2: High-Level Architecture (The Data and ML Pipelines)