AI-Assisted Vertical Video Recommender: End-to-End Example with Training and Serving
Build a pragmatic, low-latency vertical-video recommender: simple features, FAISS recall, compact ranker, ClickHouse analytics, and A/B testing tips for 2026.
Stop reinventing vertical-video ranking: build a fast, pragmatic recommender you can ship
Short-form, vertical-video platforms force a narrow set of constraints: users decide in seconds; on-device latency matters; metrics move fast. If you’re an engineering manager or ML engineer tired of heavyweight recommender stacks, this guide gives a compact, production-ready pattern: simple feature engineering, clear ranking logic, a low-latency serving layer, and an evaluation + A/B testing plan that plugs into ClickHouse analytics. By the end you’ll have runnable code and a checklist to scale safely in 2026.
Executive summary — what you’ll build and why it matters in 2026
Goal: A two-stage recommender for vertical videos: (1) candidate retrieval (fast, recall-focused) and (2) ranking (real-time scoring with lightweight model + business rules). The stack targets 50–200 ms tail latency for mobile clients.
Why now? Mobile-first vertical streaming exploded through 2024–2025 (see industry moves like Holywater expanding AI-driven vertical content). OLAP systems such as ClickHouse are now mainstream in 2026 for high-throughput analytics. Feature stores, vector search, and lightweight model formats (ONNX/treelite) make low-latency MLOps practical.
Architecture overview
- Offline pipeline: event ingestion → ClickHouse OLAP for analytics & aggregation → feature materialization (Parquet / feature store)
- Candidate retrieval: tag/genre filters + vector ANN (FAISS) for semantic recall
- Ranker: compact model (LightGBM exported to ONNX or treelite) + business rules
- Serving: low-latency scoring microservice (FastAPI / Go) with Redis caching and gRPC for edge
- Evaluation & A/B: instrumentation to ClickHouse, offline replay, and online A/B with significance testing
Key design decisions and trade-offs
- Simplicity over novelty: prioritize explainable, small models for latency and debuggability.
- Two-stage pipeline: avoids heavy ranking on all candidates — recall first, rank second.
- Use ClickHouse for analytics: fast, cost-effective event aggregation and user cohort queries for 2026-scale workloads.
- Feature freshness: combine online features (Redis) and offline materialized features (Parquet/feature store).
Step 1 — Feature engineering you can ship today
Keep features compact. For vertical-video, the most predictive signals in 2026 remain:
- Short-term consumption: last 1h, 6h, 24h watch counts/CTR
- User preferences: weighted genre/tag affinities (decay by time)
- Content freshness: upload recency
- Engagement signals: average watch percent, completion rate
- Context: device OS, network type, time of day
- Semantic relevance: vector embedding similarity between user history and candidate
Example: materialize a tag-affinity feature
Illustrative Python logic for aggregating tag affinities from a ClickHouse events table. This runs in the offline pipeline and writes features to Parquet.
# python: tag affinity aggregation (pseudo runnable)
import pandas as pd
from clickhouse_driver import Client

client = Client('clickhouse-host')
# Event schema: user_id, video_id, tags (array), event_type, ts
q = """
SELECT user_id, arrayJoin(tags) AS tag, sum(if(event_type='watch',1,0)) AS watch_count
FROM video_events
WHERE ts >= now() - INTERVAL 7 DAY
GROUP BY user_id, tag
"""
rows = client.execute(q)
df = pd.DataFrame(rows, columns=['user_id', 'tag', 'watch_count'])
# normalize per-user counts to affinity scores that sum to 1
df['affinity'] = df.groupby('user_id')['watch_count'].transform(lambda x: x / x.sum())
# write one Parquet file per user for the feature store
for uid, g in df.groupby('user_id'):
    g[['tag', 'affinity']].to_parquet(f"/features/user_tag_aff/{uid}.parquet")
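The bullet list above calls for tag affinities decayed by time, which the aggregation snippet does not implement. A minimal sketch of exponential decay weighting, assuming per-event timestamps and a hypothetical 3-day half-life (tune per product):

```python
import math
from datetime import datetime, timedelta

HALF_LIFE_HOURS = 72.0  # hypothetical 3-day half-life; an assumption, not a recipe

def decayed_tag_affinity(events, now=None):
    """events: iterable of (tag, ts) watch events; returns tag -> normalized affinity."""
    now = now or datetime.utcnow()
    weights = {}
    for tag, ts in events:
        age_hours = (now - ts).total_seconds() / 3600.0
        # each event contributes 2^(-age / half_life): recent watches count more
        weights[tag] = weights.get(tag, 0.0) + math.pow(2.0, -age_hours / HALF_LIFE_HOURS)
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()} if total else {}

# usage: a 1-hour-old 'comedy' watch outweighs a 6-day-old 'drama' watch
now = datetime(2026, 1, 10)
aff = decayed_tag_affinity(
    [('comedy', now - timedelta(hours=1)), ('drama', now - timedelta(hours=144))],
    now=now,
)
```

The same weighting drops into the ClickHouse query above as a `sum(exp2(-age_hours / 72))` aggregate instead of a raw count.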
Step 2 — Candidate retrieval (recall-focused)
Combine cheap filters with vector ANN. In practice:
- Apply business filters: availability, region, age rating
- Boolean recall: top N by tag overlap using precomputed inverted index
- Semantic recall: FAISS ANN on video embeddings for last-k watched videos
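The boolean-recall step can be sketched with a plain in-memory inverted index; `INVERTED_INDEX` and its contents are illustrative, and in production the index would live in Redis as described later:

```python
from collections import Counter

# illustrative inverted index: tag -> set of video ids (precomputed offline)
INVERTED_INDEX = {
    'comedy': {1, 2, 3},
    'sports': {2, 4},
    'music':  {3, 4, 5},
}

def boolean_recall(user_tags, top_n=200):
    """Return up to top_n video ids ranked by how many of the user's tags they match."""
    overlap = Counter()
    for tag in user_tags:
        for vid in INVERTED_INDEX.get(tag, ()):
            overlap[vid] += 1
    return [vid for vid, _ in overlap.most_common(top_n)]

cands = boolean_recall(['comedy', 'music'])  # video 3 matches both tags
```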
FAISS snippet (Python) to retrieve 200 candidates
import faiss
import numpy as np

# load the ANN index (built offline)
index = faiss.read_index('video_embeddings.index')
# user_embedding: aggregate of last watched video embeddings
# FAISS expects a 2-D float32 array of shape (n_queries, dim)
query = user_embedding.astype('float32').reshape(1, -1)
D, I = index.search(query, 200)  # distances, ids
candidates = I[0].tolist()
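The snippet above assumes `user_embedding` already exists. One common construction is a recency-weighted mean of the last-k watched video embeddings, L2-normalized so inner-product search behaves like cosine similarity. A sketch, where the geometric decay weights are an assumption to tune:

```python
import numpy as np

def build_user_embedding(recent_video_embeddings, decay=0.8):
    """recent_video_embeddings: (k, dim) array, most recent row first.
    Returns an L2-normalized, recency-weighted mean embedding."""
    k = len(recent_video_embeddings)
    weights = np.array([decay ** i for i in range(k)])  # most recent gets weight 1.0
    emb = (weights[:, None] * recent_video_embeddings).sum(axis=0) / weights.sum()
    norm = np.linalg.norm(emb)
    return emb / norm if norm > 0 else emb

history = np.array([[1.0, 0.0], [0.0, 1.0]])  # two watched-video embeddings
user_embedding = build_user_embedding(history)  # leans toward the recent watch
```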
Step 3 — Ranking model: small, fast, explainable
For vertical video, a compact gradient boosted tree or logistic regression often hits a sweet spot. Train on per-impression labeled data (label = click or watch >= 50%). Use features from offline materialization + online state.
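The label definition above (click, or watch >= 50%) can be materialized per impression. A sketch with pandas; the column names are assumptions for this illustration:

```python
import pandas as pd

# illustrative impression log; column names are assumed for this sketch
impressions = pd.DataFrame({
    'user_id':   [1, 1, 2],
    'video_id':  [10, 11, 10],
    'clicked':   [1, 0, 0],
    'watch_pct': [0.1, 0.8, 0.2],
})

# positive label = clicked OR watched at least half the video
impressions['label'] = (
    (impressions['clicked'] == 1) | (impressions['watch_pct'] >= 0.5)
).astype(int)

labels = impressions['label'].tolist()
```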
Training example with LightGBM (simplified)
import lightgbm as lgb
import pandas as pd

# load training data from Parquet/ClickHouse export
train = pd.read_parquet('training_data.parquet')
feature_cols = [c for c in train.columns if c != 'label']
X = train[feature_cols]
y = train['label']
dtrain = lgb.Dataset(X, label=y)
params = {'objective': 'binary', 'metric': 'auc', 'num_leaves': 31, 'learning_rate': 0.05}
model = lgb.train(params, dtrain, num_boost_round=200)
model.save_model('ranker.txt')
Export to a low-latency format. Options in 2026:
- Treelite for CPU-optimized compiled predictors
- ONNX if you need cross-language scoring
- TorchScript only when using small NN models and you accept slightly higher latency
Scoring function (pseudo)
# on the server: assemble features for the candidate and call the compiled predictor
def score_candidate(user_features, candidate_features):
    feature_vector = assemble(user_features, candidate_features)
    score = predictor.predict(feature_vector)
    # final business rule: penalize stale content
    if candidate_features['age_hours'] > 72:
        score *= 0.9
    return score
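The `assemble` helper above is left abstract. A minimal sketch flattens the user and candidate dicts into a fixed-order vector, with the column order pinned so online scoring matches training; the feature names here are illustrative:

```python
# pinned feature order: must match the column order used at training time
FEATURE_ORDER = ['tag_affinity', 'watch_ctr_24h', 'age_hours', 'completion_rate']

def assemble(user_features, candidate_features, default=0.0):
    """Merge user + candidate dicts into a fixed-order vector; missing -> default."""
    merged = {**user_features, **candidate_features}
    return [float(merged.get(name, default)) for name in FEATURE_ORDER]

vec = assemble({'tag_affinity': 0.7, 'watch_ctr_24h': 0.05},
               {'age_hours': 12, 'completion_rate': 0.6})
```

Pinning the order in one shared constant (or a versioned schema file) is what prevents the online/offline feature-mismatch pitfall discussed later.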
Step 4 — Low-latency serving layer
Requirements: 95th/99th percentile latency under 200 ms; throughput sized for peak request rate. Simple architecture:
- Edge gateway (gRPC or HTTP/2)
- Stats service for A/B labels + routing
- Candidate service: queries FAISS (local shard) + inverted index in Redis
- Ranker service: compiled model loaded in memory, assembles features from Redis + materialized store
- Response cache: Redis near client to serve frequent requests
FastAPI example: synchronous, under 100 ms for scoring
from fastapi import FastAPI
import uvicorn

app = FastAPI()

@app.post('/recommend')
def recommend(req: dict):
    user_id = req['user_id']
    user_features = redis_get_user_features(user_id)
    candidates = retrieve_candidates(user_features)
    scored = [(c, score_candidate(user_features, c)) for c in candidates]
    scored.sort(key=lambda x: x[1], reverse=True)
    return {'items': [c[0] for c in scored[:10]]}

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8080)
Notes: In production, use a faster runtime (PyPy) or Go for the retrieval layer, keep the model in native code via treelite, and pre-join features to avoid network roundtrips.
Instrumentation & ClickHouse analytics
ClickHouse has become a de-facto standard in 2026 for high-volume event analytics (see ClickHouse funding momentum). Use ClickHouse for:
- Aggregating impression/click/watch events in real time
- Computing offline training labels and cohort metrics
- Running A/B test analysis with fast group-bys
Example ClickHouse schema & query
-- schema: video_events
CREATE TABLE video_events (
    ts DateTime,
    user_id UInt64,
    video_id UInt64,
    event_type String,  -- 'impression','click','watch'
    watch_pct Float32,
    device String,
    country String,
    placement String
) ENGINE = MergeTree() ORDER BY (ts);

-- 24h CTR per video (guard against videos with zero impressions)
SELECT video_id,
       countIf(event_type='click') / countIf(event_type='impression') AS ctr
FROM video_events
WHERE ts >= now() - INTERVAL 1 DAY
GROUP BY video_id
HAVING countIf(event_type='impression') > 0
ORDER BY ctr DESC
LIMIT 100;
Evaluation: offline and online
Build a reproducible evaluation pipeline:
- Offline replay: compute predicted scores on historical impressions and measure AUC/precision@k
- Counterfactual analysis: inverse propensity scoring if serving policy changed
- Online A/B: expose variants via edge routing and log events to ClickHouse
- Statistical tests: sequential testing with pre-specified metrics (watch time, retention)
Offline replay example (pseudo)
from sklearn.metrics import roc_auc_score

# X_test, y_test: a time-based holdout of historical impressions
preds = model.predict(X_test)
auc = roc_auc_score(y_test, preds)
print('AUC', auc)
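Alongside AUC, precision@k better reflects a top-10 feed, since only the head of the ranking is shown. A self-contained sketch:

```python
def precision_at_k(labels_by_score, k=10):
    """labels_by_score: list of (score, label) for one request's candidates.
    Returns the fraction of the top-k (by score) that are positives."""
    top = sorted(labels_by_score, key=lambda x: x[0], reverse=True)[:k]
    return sum(label for _, label in top) / len(top) if top else 0.0

# two of the top-3 scored candidates are positives -> precision@3 = 2/3
p = precision_at_k([(0.9, 1), (0.8, 0), (0.7, 1), (0.1, 1)], k=3)
```

In replay, average this per-request metric over a held-out day of impressions.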
Designing A/B tests for vertical video
- Primary metric: incremental 7-day watch time per DAU
- Secondary metrics: CTR, completion rate, retention (D1/D7)
- Safety guardrails: no >5% drop in retention or content moderation flags
- Use ClickHouse to compute daily cohorts and running significance (p-values and MDE)
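The significance check on CTR can be sketched as a two-proportion z-test using only the standard library. Sequential testing would adjust the thresholds over time, but the core computation, on counts pulled from ClickHouse, looks like:

```python
import math

def two_proportion_ztest(clicks_a, imps_a, clicks_b, imps_b):
    """Two-sided z-test for a difference in CTR between variants A and B.
    Returns (z, p_value) under a pooled-proportion normal approximation."""
    p_a, p_b = clicks_a / imps_a, clicks_b / imps_b
    p_pool = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / imps_a + 1 / imps_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# illustrative counts: variant B lifts CTR from 5% to 6% over 10k impressions each
z, p = two_proportion_ztest(clicks_a=500, imps_a=10_000, clicks_b=600, imps_b=10_000)
```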
Feature store considerations
In 2026, mature feature stores (Feast, Tecton) simplify online/offline feature parity. If you cannot run a full-feature store, implement a hybrid approach:
- Materialize features to Parquet/ClickHouse daily for training
- Load hot user features into Redis for online joins
- Use a consistent feature schema and versioning
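The schema-and-versioning bullet can be enforced with a tiny check at serving time; a sketch using only the standard library, where the schema contents are illustrative:

```python
# version the schema together with the model artifact it was trained against
FEATURE_SCHEMA = {
    'version': 3,
    'columns': ['tag_affinity', 'watch_ctr_24h', 'age_hours'],
}

def validate_features(features, schema=FEATURE_SCHEMA):
    """Fail fast when online features drift from the training-time schema."""
    missing = [c for c in schema['columns'] if c not in features]
    if missing:
        raise ValueError(f"schema v{schema['version']}: missing features {missing}")
    return True

ok = validate_features({'tag_affinity': 0.7, 'watch_ctr_24h': 0.05, 'age_hours': 12})
```

Running this in both the training exporter and the ranker service is a cheap substitute for full feature-store parity guarantees.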
Scaling: from single-region to global
Key scaling levers and operational notes:
- Caching: use short TTL response caches near your CDN/edge to cut repeated rank calls.
- Sharding: shard FAISS/embedding index by user segment or popularity to reduce search cost.
- Autoscaling: pre-warm ranker instances during predictable spikes (evenings) rather than pure reactive autoscaling.
- Vector search at scale: an HNSW FAISS index with quantization or approximate search reduces memory and latency.
- Model updates: adopt rolling deploys and use shadow traffic to validate new models at scale before full rollout.
- Data governance: enforce feature lineage and privacy-preserving aggregations (DP noise for small cohorts) — regulatory attention is higher in 2026.
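The short-TTL response cache in the first lever can be sketched in-process with the standard library; in production this would be Redis with an expiring key, but the semantics are the same:

```python
import time

class TTLCache:
    """Minimal TTL cache: entries expire ttl_seconds after insertion."""
    def __init__(self, ttl_seconds=30.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy eviction on read
            return None
        return value

cache = TTLCache(ttl_seconds=0.05)
cache.set('feed:user42', [10, 11, 12])
hit = cache.get('feed:user42')   # fresh entry: cache hit
time.sleep(0.06)
miss = cache.get('feed:user42')  # expired entry: cache miss
```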
Operational checklist
- Stream events into ClickHouse and verify event completeness
- Materialize user and content features daily; load hot features into Redis
- Build offline training pipeline, version models, export to optimized runtime
- Deploy candidate + ranker services with 95/99 latency SLAs and caching
- Instrument all impressions/clicks/watches with A/B labels & log to ClickHouse
- Run daily A/B analyses and safety checks (content quality, filter integrity)
Security, privacy and licensing notes
Protect PII and follow modern privacy requirements (consent, data deletion). When using third-party embeddings or pre-trained models, track licenses and attribution. In 2026, regulators and platforms expect explicit logging of algorithmic decisions for auditability.
Advanced strategies and 2026 trends
Consider these advanced moves as your platform matures:
- Multimodal signals: combine video frame embeddings + audio + text (captions) to improve cold-start recall. Lightweight cross-modal encoders are practical in 2026 when quantized.
- Personalized exploration budgets: learn per-user exploration rates to maximize discovery without hurting retention.
- On-device ranking: push tiny models to client when network is unreliable; sync updated embeddings periodically.
- Real-time policy controls: dynamic business rules via a feature flag system integrated with the serving layer.
- Observability: use model explainability tools to monitor feature drift and emergent biases.
Practical rule: start with what you can measure reliably. In 2026, teams that ship simple, observable recommenders beat those that perfect opaque models.
Case study sketch: incremental rollout timeline (6 weeks)
- Week 1: Event pipeline + ClickHouse baseline metrics
- Week 2: Offline feature engineering and initial model (LR/LightGBM)
- Week 3: Candidate retrieval + FAISS index build
- Week 4: Ranker service and local load testing (p95/p99)
- Week 5: Canary and shadow traffic to production for 1% of traffic
- Week 6: A/B test, monitor, roll to 100% if safe
Common pitfalls and how to avoid them
- Too-large models: increase latency and complicate debugging. Use compact models first.
- Feature mismatch: ensure online/offline parity with schema enforcement and unit tests.
- Confounded A/B tests: avoid routing users repeatedly between variants; use hard allocation or user-level bucketing.
- Ignoring freshness: vertical video thrives on novelty—surface recency as an explicit feature.
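User-level bucketing, as recommended in the A/B pitfall above, means the variant is a deterministic function of the user id and experiment name, so a user never flips between variants mid-experiment. A sketch with hashlib:

```python
import hashlib

def assign_variant(user_id, experiment, variants=('control', 'treatment')):
    """Deterministically map (user, experiment) to a variant via a stable hash."""
    key = f"{experiment}:{user_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]

v1 = assign_variant(42, 'ranker_v2')
v2 = assign_variant(42, 'ranker_v2')  # same inputs always yield the same variant
```

Salting the hash with the experiment name keeps bucket assignments independent across concurrent experiments.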
Actionable takeaways
- Ship a two-stage recommender: recall (FAISS + filters) → rank (small GBDT/logistic) for low latency.
- Use ClickHouse for event analytics and A/B evaluation — it's fast and cost-effective at scale in 2026.
- Materialize features offline and keep hot features in Redis for online joins to meet 100–200 ms SLAs.
- Instrument everything and run reproducible offline replays before online experiments.
- Plan for scaling: caching, sharding, compiled model formats, and strict feature schemas.
Next steps and resources
Start with a minimal pipeline: stream impressions into ClickHouse, compute simple tag affinities, train a LightGBM ranker, and deploy a FastAPI ranker that loads features from Redis. Add FAISS retrieval when you need semantic recall. For feature stores, evaluate Feast or Tecton if you need large-scale feature parity.
Closing call-to-action
If you want the runnable starter kit that mirrors this architecture — including Dockerfiles for FastAPI, a small LightGBM training script, and ClickHouse ingestion SQL — download the repo and a one-page ops checklist. Jumpstart your vertical-video recommender with a pragmatic stack optimized for 2026 realities.