AI-Assisted Vertical Video Recommender: End-to-End Example with Training and Serving


2026-03-05

Build a pragmatic, low-latency vertical-video recommender: simple features, FAISS recall, compact ranker, ClickHouse analytics, and A/B testing tips for 2026.

Hook: Stop reinventing vertical-video ranking—build a fast, pragmatic recommender you can ship

Short-form, vertical-video platforms force a narrow set of constraints: users decide in seconds; on-device latency matters; metrics move fast. If you’re an engineering manager or ML engineer tired of heavyweight recommender stacks, this guide gives a compact, production-ready pattern: simple feature engineering, clear ranking logic, a low-latency serving layer, and an evaluation + A/B testing plan that plugs into ClickHouse analytics. By the end you’ll have runnable code and a checklist to scale safely in 2026.

Executive summary — what you’ll build and why it matters in 2026

Goal: A two-stage recommender for vertical videos: (1) candidate retrieval (fast, recall-focused) and (2) ranking (real-time scoring with lightweight model + business rules). The stack targets 50–200 ms tail latency for mobile clients.

Why now? Mobile-first vertical streaming exploded through 2024–2025 (see industry moves like Holywater expanding AI-driven vertical content). OLAP systems such as ClickHouse are now mainstream in 2026 for high-throughput analytics. Feature stores, vector search, and lightweight model formats (ONNX/treelite) make low-latency MLOps practical.

Architecture overview

  1. Offline pipeline: event ingestion → ClickHouse OLAP for analytics & aggregation → feature materialization (Parquet / feature store)
  2. Candidate retrieval: tag/genre filters + vector ANN (FAISS) for semantic recall
  3. Ranker: compact model (LightGBM exported to ONNX or treelite) + business rules
  4. Serving: low-latency scoring microservice (FastAPI / Go) with Redis caching and gRPC for edge
  5. Evaluation & A/B: instrumentation to ClickHouse, offline replay, and online A/B with significance testing

Key design decisions and trade-offs

  • Simplicity over novelty: prioritize explainable, small models for latency and debuggability.
  • Two-stage pipeline: avoids heavy ranking on all candidates — recall first, rank second.
  • Use ClickHouse for analytics: fast, cost-effective event aggregation and user cohort queries for 2026-scale workloads.
  • Feature freshness: combine online features (Redis) and offline materialized features (Parquet/feature store).
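The freshness split in the last bullet can be sketched as a small merge helper. This is an illustrative sketch, not a prescribed schema: `redis_client`, the `user_features:{id}` key layout, and the offline dict are all assumptions; online values win because they are fresher.

```python
import json

def merged_features(user_id, redis_client, offline_store):
    """Overlay hot online features (Redis) on top of offline materialized ones.

    offline_store: dict-like mapping user_id -> feature dict (e.g. loaded from Parquet).
    """
    features = dict(offline_store.get(user_id, {}))
    raw = redis_client.get(f"user_features:{user_id}")  # assumed key layout
    if raw:
        # online features overwrite stale offline values
        features.update(json.loads(raw))
    return features
```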

Step 1 — Feature engineering you can ship today

Keep features compact. For vertical-video, the most predictive signals in 2026 remain:

  • Short-term consumption: last 1h, 6h, 24h watch counts/CTR
  • User preferences: weighted genre/tag affinities (decay by time)
  • Content freshness: upload recency
  • Engagement signals: average watch percent, completion rate
  • Context: device OS, network type, time of day
  • Semantic relevance: vector embedding similarity between user history and candidate
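The time-decayed affinity in the list above can be sketched with an exponential half-life weight. The 24-hour half-life is an illustrative choice, not a recommendation; tune it against your own engagement decay.

```python
import math
from collections import defaultdict

def tag_affinities(events, now_ts, half_life_hours=24.0):
    """events: iterable of (tag, event_ts_seconds) watch events.

    Each watch contributes a weight that halves every `half_life_hours`;
    returns a tag -> affinity dict normalized to sum to 1.
    """
    weights = defaultdict(float)
    for tag, ts in events:
        age_h = (now_ts - ts) / 3600.0
        weights[tag] += 0.5 ** (age_h / half_life_hours)
    total = sum(weights.values()) or 1.0
    return {tag: w / total for tag, w in weights.items()}
```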

Example: materialize a tag-affinity feature

Illustrative Python logic for aggregating tag affinities from a ClickHouse events table. This runs in the offline pipeline and writes features to Parquet.

# python: tag affinity aggregation (pseudo runnable)
import pandas as pd
from clickhouse_driver import Client

client = Client('clickhouse-host')

# Event schema: user_id, video_id, tags (array), event_type, ts
q = """
SELECT user_id, arrayJoin(tags) AS tag, countIf(event_type = 'watch') AS watch_count
FROM video_events
WHERE ts >= now() - INTERVAL 7 DAY
GROUP BY user_id, tag
"""
rows = client.execute(q)
df = pd.DataFrame(rows, columns=['user_id','tag','watch_count'])

# normalize to affinity score
df['affinity'] = df.groupby('user_id')['watch_count'].transform(lambda x: x / x.sum())

# write Parquet for the feature store (one file per user keeps the example simple;
# at scale, write a single Parquet dataset partitioned by user_id instead)
for uid, g in df.groupby('user_id'):
    g[['tag', 'affinity']].to_parquet(f"/features/user_tag_aff/{uid}.parquet")

Step 2 — Candidate retrieval (recall-focused)

Combine cheap filters with vector ANN. In practice:

  • Apply business filters: availability, region, age rating
  • Boolean recall: top N by tag overlap using precomputed inverted index
  • Semantic recall: FAISS ANN on video embeddings for last-k watched videos
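The boolean-recall bullet can be sketched with an in-memory inverted index; in production you would precompute this and keep it in Redis, so treat the dict-based version below as an illustration of the lookup logic only.

```python
from collections import Counter, defaultdict

def build_inverted_index(video_tags):
    """video_tags: dict of video_id -> list of tags; returns tag -> set of video ids."""
    index = defaultdict(set)
    for vid, tags in video_tags.items():
        for tag in tags:
            index[tag].add(vid)
    return index

def tag_overlap_recall(index, user_tags, top_n=200):
    """Return up to top_n video ids ranked by number of overlapping tags."""
    counts = Counter()
    for tag in user_tags:
        for vid in index.get(tag, ()):
            counts[vid] += 1
    return [vid for vid, _ in counts.most_common(top_n)]
```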

FAISS snippet (Python) to retrieve 200 candidates

import faiss
import numpy as np

# load index (built offline)
index = faiss.read_index('video_embeddings.index')

# user_embedding: 1-D aggregate of the user's last watched video embeddings;
# FAISS expects a 2-D float32 batch, hence the reshape
query = user_embedding.reshape(1, -1).astype('float32')
D, I = index.search(query, 200)  # distances, ids
candidates = I[0].tolist()

Step 3 — Ranking model: small, fast, explainable

For vertical video, a compact gradient boosted tree or logistic regression often hits a sweet spot. Train on per-impression labeled data (label = click or watch >= 50%). Use features from offline materialization + online state.
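The labeling rule above (click, or watch at least 50%) can be sketched as a small function applied per impression; the event-dict shape mirrors the `video_events` schema used throughout this guide.

```python
def impression_label(events):
    """events: list of event dicts for one impression, each with 'event_type'
    and, for watch events, 'watch_pct' in [0, 1].

    Positive label if the user clicked or watched at least half the video.
    """
    for e in events:
        if e["event_type"] == "click":
            return 1
        if e["event_type"] == "watch" and e.get("watch_pct", 0.0) >= 0.5:
            return 1
    return 0
```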

Training example with LightGBM (simplified)

import lightgbm as lgb
import pandas as pd

# load training data from Parquet/ClickHouse export
train = pd.read_parquet('training_data.parquet')
feature_cols = [c for c in train.columns if c != 'label']  # or an explicit, versioned list
X = train[feature_cols]
y = train['label']

dtrain = lgb.Dataset(X, label=y)
params = {'objective': 'binary', 'metric': 'auc', 'num_leaves': 31, 'learning_rate': 0.05}
model = lgb.train(params, dtrain, num_boost_round=200)
model.save_model('ranker.txt')

Export to a low-latency format. Options in 2026:

  • Treelite for CPU-optimized compiled predictors
  • ONNX if you need cross-language scoring
  • TorchScript only when using small NN models and you accept slightly higher latency

Scoring function (pseudo)

# on the server: assemble features for a candidate and call the compiled predictor
# (assemble() and predictor are placeholders for your feature join and compiled-model runtime)
def score_candidate(user_features, candidate_features):
    feature_vector = assemble(user_features, candidate_features)
    score = predictor.predict(feature_vector)
    # final business rule: penalize stale content
    if candidate_features['age_hours'] > 72:
        score *= 0.9
    return score

Step 4 — Low-latency serving layer

Requirements: 95th/99th percentile latency under 200 ms; throughput sized for peak traffic. Simple architecture:

  • Edge gateway (gRPC or HTTP/2)
  • Stats service for A/B labels + routing
  • Candidate service: queries FAISS (local shard) + inverted index in Redis
  • Ranker service: compiled model loaded in memory, assembles features from Redis + materialized store
  • Response cache: Redis near client to serve frequent requests

FastAPI example: a synchronous scoring endpoint (target: under 100 ms)

from fastapi import FastAPI
import uvicorn

app = FastAPI()

# redis_get_user_features, retrieve_candidates and score_candidate are the helper
# functions sketched earlier in this guide, assumed importable here
@app.post('/recommend')
def recommend(req: dict):
    user_id = req['user_id']
    user_features = redis_get_user_features(user_id)
    candidates = retrieve_candidates(user_features)
    scored = [(c, score_candidate(user_features, c)) for c in candidates]
    scored.sort(key=lambda x: x[1], reverse=True)
    return {'items': [c[0] for c in scored[:10]]}

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8080)

Notes: in production, consider a faster runtime (PyPy) or Go for the retrieval layer, keep model inference in native code via treelite, and pre-join features to avoid network round trips.

Instrumentation & ClickHouse analytics

ClickHouse has become a de-facto standard in 2026 for high-volume event analytics (see ClickHouse funding momentum). Use ClickHouse for:

  • Aggregating impression/click/watch events in real time
  • Computing offline training labels and cohort metrics
  • Running A/B test analysis with fast group-bys

Example ClickHouse schema & query

-- schema: video_events
CREATE TABLE video_events (
  ts DateTime,
  user_id UInt64,
  video_id UInt64,
  event_type String, -- 'impression','click','watch'
  watch_pct Float32,
  device String,
  country String,
  placement String
) ENGINE = MergeTree()
PARTITION BY toDate(ts)
ORDER BY (video_id, ts);

-- 24h CTR per video
SELECT video_id, 
       countIf(event_type='click')/countIf(event_type='impression') AS ctr
FROM video_events
WHERE ts >= now() - INTERVAL 1 DAY
GROUP BY video_id
ORDER BY ctr DESC
LIMIT 100;

Evaluation: offline and online

Build a reproducible evaluation pipeline:

  1. Offline replay: compute predicted scores on historical impressions and measure AUC/precision@k
  2. Counterfactual analysis: inverse propensity scoring if serving policy changed
  3. Online A/B: expose variants via edge routing and log events to ClickHouse
  4. Statistical tests: sequential testing with pre-specified metrics (watch time, retention)

Offline replay example (pseudo)

from sklearn.metrics import roc_auc_score

# X_test / y_test: a held-out slice of historical impressions
preds = model.predict(X_test)
auc = roc_auc_score(y_test, preds)
print('AUC', auc)
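Alongside AUC, the replay step can report precision@k, which tracks the quality of the short list the client actually sees. A minimal sketch:

```python
def precision_at_k(scored_items, relevant, k=10):
    """scored_items: list of (item_id, score); relevant: set of positively-labeled ids.

    Fraction of the top-k ranked items that the user actually engaged with.
    """
    top = sorted(scored_items, key=lambda x: x[1], reverse=True)[:k]
    hits = sum(1 for item_id, _ in top if item_id in relevant)
    return hits / k
```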

Designing A/B tests for vertical video

  • Primary metric: incremental 7-day watch time per DAU
  • Secondary metrics: CTR, completion rate, retention (D1/D7)
  • Safety guardrails: abort the test if retention drops more than 5% or content-moderation flags spike
  • Use ClickHouse to compute daily cohorts and running significance (p-values and MDE)
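A running significance check on daily ClickHouse cohorts can use a two-proportion z-test on CTR; a stdlib-only sketch (the sequential-testing corrections mentioned above would sit on top of this, and watch-time metrics need a different test):

```python
import math

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a CTR difference between variants A and B.

    Returns (z, p_value) using a pooled standard error and the normal CDF via erf.
    """
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value
```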

Feature store considerations

In 2026, mature feature stores (Feast, Tecton) simplify online/offline feature parity. If you cannot run a full-feature store, implement a hybrid approach:

  • Materialize features to Parquet/ClickHouse daily for training
  • Load hot user features into Redis for online joins
  • Use a consistent feature schema and versioning

Scaling: from single-region to global

Key scaling levers and operational notes:

  • Caching: use short TTL response caches near your CDN/edge to cut repeated rank calls.
  • Sharding: shard FAISS/embedding index by user segment or popularity to reduce search cost.
  • Autoscaling: pre-warm ranker instances during predictable spikes (evenings) rather than pure reactive autoscaling.
  • Vector search at scale: an HNSW FAISS index with quantization or approximate search reduces memory and latency.
  • Model updates: adopt rolling deploys and use shadow traffic to validate new models at scale before full rollout.
  • Data governance: enforce feature lineage and privacy-preserving aggregations (DP noise for small cohorts) — regulatory attention is higher in 2026.
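The caching lever above can be sketched as a short-TTL in-process cache; in production this lives in Redis near the edge (e.g. via `SETEX`), so treat this as an illustration of the TTL semantics only.

```python
import time

class TTLCache:
    """Response cache where entries expire after a fixed TTL.

    `clock` is injectable so the expiry logic is testable without sleeping.
    """
    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.store[key]  # lazy eviction on read
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, self.clock() + self.ttl)
```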

Operational checklist

  1. Stream events into ClickHouse and verify event completeness
  2. Materialize user and content features daily; load hot features into Redis
  3. Build offline training pipeline, version models, export to optimized runtime
  4. Deploy candidate + ranker services with 95/99 latency SLAs and caching
  5. Instrument all impressions/clicks/watches with A/B labels & log to ClickHouse
  6. Run daily A/B analyses and safety checks (content quality, filter integrity)

Security, privacy and licensing notes

Protect PII and follow modern privacy requirements (consent, data deletion). When using third-party embeddings or pre-trained models, track licenses and attribution. In 2026, regulators and platforms expect explicit logging of algorithmic decisions for auditability.

Advanced moves

Consider these advanced moves as your platform matures:

  • Multimodal signals: combine video frame embeddings + audio + text (captions) to improve cold-start recall. Lightweight cross-modal encoders are practical in 2026 when quantized.
  • Personalized exploration budgets: learn per-user exploration rates to maximize discovery without hurting retention.
  • On-device ranking: push tiny models to client when network is unreliable; sync updated embeddings periodically.
  • Real-time policy controls: dynamic business rules via a feature flag system integrated with the serving layer.
  • Observability: use model explainability tools to monitor feature drift and emergent biases.

Practical rule: start with what you can measure reliably. In 2026, teams that ship simple, observable recommenders beat those that perfect opaque models.

Case study sketch: incremental rollout timeline (6 weeks)

  1. Week 1: Event pipeline + ClickHouse baseline metrics
  2. Week 2: Offline feature engineering and initial model (LR/LightGBM)
  3. Week 3: Candidate retrieval + FAISS index build
  4. Week 4: Ranker service and local load testing (p95/p99)
  5. Week 5: Canary and shadow traffic to production for 1% of traffic
  6. Week 6: A/B test, monitor, roll to 100% if safe

Common pitfalls and how to avoid them

  • Too-large models: increase latency and complicate debugging. Use compact models first.
  • Feature mismatch: ensure online/offline parity with schema enforcement and unit tests.
  • Confounded A/B tests: avoid routing users repeatedly between variants; use hard allocation or user-level bucketing.
  • Ignoring freshness: vertical video thrives on novelty—surface recency as an explicit feature.
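The feature-mismatch pitfall can be guarded with a schema parity assertion run both in CI and at serving time. `EXPECTED_SCHEMA` here is illustrative; the point is that one versioned schema module is shared by the training pipeline and the ranker service.

```python
EXPECTED_SCHEMA = {  # illustrative; keep in one shared, versioned module
    "ctr_24h": float,
    "tag_affinity": float,
    "age_hours": float,
}

def check_feature_parity(feature_vector, schema=EXPECTED_SCHEMA):
    """Raise if an online feature vector drifts from the training-time schema."""
    missing = set(schema) - set(feature_vector)
    extra = set(feature_vector) - set(schema)
    if missing or extra:
        raise ValueError(f"feature mismatch: missing={missing} extra={extra}")
    for name, typ in schema.items():
        if not isinstance(feature_vector[name], typ):
            raise TypeError(f"{name} should be {typ.__name__}")
```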

Actionable takeaways

  • Ship a two-stage recommender: recall (FAISS + filters) → rank (small GBDT/logistic) for low latency.
  • Use ClickHouse for event analytics and A/B evaluation — it's fast and cost-effective at scale in 2026.
  • Materialize features offline and keep hot features in Redis for online joins to meet 100–200 ms SLAs.
  • Instrument everything and run reproducible offline replays before online experiments.
  • Plan for scaling: caching, sharding, compiled model formats, and strict feature schemas.

Next steps and resources

Start with a minimal pipeline: stream impressions into ClickHouse, compute simple tag affinities, train a LightGBM ranker, and deploy a FastAPI ranker that loads features from Redis. Add FAISS retrieval when you need semantic recall. For feature stores, evaluate Feast or Tecton if you need large-scale feature parity.

Closing call-to-action

If you want the runnable starter kit that mirrors this architecture — including Dockerfiles for FastAPI, a small LightGBM training script, and ClickHouse ingestion SQL — download the repo and a one-page ops checklist. Jumpstart your vertical-video recommender with a pragmatic stack optimized for 2026 realities.
