AI-Assisted Vertical Video Recommender: End-to-End Example with Training and Serving


2026-03-05

Build a pragmatic, low-latency vertical-video recommender: simple features, FAISS recall, compact ranker, ClickHouse analytics, and A/B testing tips for 2026.

Hook: Stop reinventing vertical-video ranking—build a fast, pragmatic recommender you can ship

Short-form, vertical-video platforms force a narrow set of constraints: users decide in seconds; on-device latency matters; metrics move fast. If you’re an engineering manager or ML engineer tired of heavyweight recommender stacks, this guide gives a compact, production-ready pattern: simple feature engineering, clear ranking logic, a low-latency serving layer, and an evaluation + A/B testing plan that plugs into ClickHouse analytics. By the end you’ll have runnable code and a checklist to scale safely in 2026.

Executive summary — what you’ll build and why it matters in 2026

Goal: A two-stage recommender for vertical videos: (1) candidate retrieval (fast, recall-focused) and (2) ranking (real-time scoring with lightweight model + business rules). The stack targets 50–200 ms tail latency for mobile clients.

Why now? Mobile-first vertical streaming exploded through 2024–2025 (see industry moves like Holywater expanding AI-driven vertical content). OLAP systems such as ClickHouse are now mainstream in 2026 for high-throughput analytics. Feature stores, vector search, and lightweight model formats (ONNX/treelite) make low-latency MLOps practical.

Architecture overview

  1. Offline pipeline: event ingestion → ClickHouse OLAP for analytics & aggregation → feature materialization (Parquet / feature store)
  2. Candidate retrieval: tag/genre filters + vector ANN (FAISS) for semantic recall
  3. Ranker: compact model (LightGBM exported to ONNX or treelite) + business rules
  4. Serving: low-latency scoring microservice (FastAPI / Go) with Redis caching and gRPC for edge
  5. Evaluation & A/B: instrumentation to ClickHouse, offline replay, and online A/B with significance testing

Key design decisions and trade-offs

  • Simplicity over novelty: prioritize explainable, small models for latency and debuggability.
  • Two-stage pipeline: avoids heavy ranking on all candidates — recall first, rank second.
  • Use ClickHouse for analytics: fast, cost-effective event aggregation and user cohort queries for 2026-scale workloads.
  • Feature freshness: combine online features (Redis) and offline materialized features (Parquet/feature store).
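The freshness split in the last bullet can be sketched as a small merge helper. This is an illustrative sketch, not a prescribed schema: `redis_client`, the `user_features:{id}` key layout, and the offline dict are all assumptions; online values win because they are fresher.

```python
import json

def merged_features(user_id, redis_client, offline_store):
    """Overlay hot online features (Redis) on top of offline materialized ones.

    offline_store: dict-like mapping user_id -> feature dict (e.g. loaded from Parquet).
    """
    features = dict(offline_store.get(user_id, {}))
    raw = redis_client.get(f"user_features:{user_id}")  # assumed key layout
    if raw:
        # online features overwrite stale offline values
        features.update(json.loads(raw))
    return features
```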

Step 1 — Feature engineering you can ship today

Keep features compact. For vertical-video, the most predictive signals in 2026 remain:

  • Short-term consumption: last 1h, 6h, 24h watch counts/CTR
  • User preferences: weighted genre/tag affinities (decay by time)
  • Content freshness: upload recency
  • Engagement signals: average watch percent, completion rate
  • Context: device OS, network type, time of day
  • Semantic relevance: vector embedding similarity between user history and candidate
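The time-decayed affinity in the list above can be sketched with an exponential half-life weight. The 24-hour half-life is an illustrative choice, not a recommendation; tune it against your own engagement decay.

```python
import math
from collections import defaultdict

def tag_affinities(events, now_ts, half_life_hours=24.0):
    """events: iterable of (tag, event_ts_seconds) watch events.

    Each watch contributes a weight that halves every `half_life_hours`;
    returns a tag -> affinity dict normalized to sum to 1.
    """
    weights = defaultdict(float)
    for tag, ts in events:
        age_h = (now_ts - ts) / 3600.0
        weights[tag] += 0.5 ** (age_h / half_life_hours)
    total = sum(weights.values()) or 1.0
    return {tag: w / total for tag, w in weights.items()}
```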

Example: materialize a tag-affinity feature

Illustrative Python logic for aggregating tag affinities from a ClickHouse events table. This runs in the offline pipeline and writes features to Parquet.

# python: tag affinity aggregation (pseudo runnable)
import pandas as pd
from clickhouse_driver import Client

client = Client('clickhouse-host')

# Event schema: user_id, video_id, tags (array), event_type, ts
q = """
SELECT user_id, arrayJoin(tags) AS tag, countIf(event_type = 'watch') AS watch_count
FROM video_events
WHERE ts >= now() - INTERVAL 7 DAY
GROUP BY user_id, tag
"""
rows = client.execute(q)
df = pd.DataFrame(rows, columns=['user_id','tag','watch_count'])

# normalize to affinity score
df['affinity'] = df.groupby('user_id')['watch_count'].transform(lambda x: x / x.sum())

# write Parquet for the feature store (one file per user keeps the example simple;
# at scale, write a single Parquet dataset partitioned by user_id instead)
for uid, g in df.groupby('user_id'):
    g[['tag', 'affinity']].to_parquet(f"/features/user_tag_aff/{uid}.parquet")

Step 2 — Candidate retrieval (recall-focused)

Combine cheap filters with vector ANN. In practice:

  • Apply business filters: availability, region, age rating
  • Boolean recall: top N by tag overlap using precomputed inverted index
  • Semantic recall: FAISS ANN on video embeddings for last-k watched videos
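The boolean-recall bullet can be sketched with an in-memory inverted index; in production you would precompute this and keep it in Redis, so treat the dict-based version below as an illustration of the lookup logic only.

```python
from collections import Counter, defaultdict

def build_inverted_index(video_tags):
    """video_tags: dict of video_id -> list of tags; returns tag -> set of video ids."""
    index = defaultdict(set)
    for vid, tags in video_tags.items():
        for tag in tags:
            index[tag].add(vid)
    return index

def tag_overlap_recall(index, user_tags, top_n=200):
    """Return up to top_n video ids ranked by number of overlapping tags."""
    counts = Counter()
    for tag in user_tags:
        for vid in index.get(tag, ()):
            counts[vid] += 1
    return [vid for vid, _ in counts.most_common(top_n)]
```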

FAISS snippet (Python) to retrieve 200 candidates

import faiss
import numpy as np

# load index (built offline)
index = faiss.read_index('video_embeddings.index')

# user_embedding: 1-D aggregate of the user's last watched video embeddings;
# FAISS expects a 2-D float32 batch, hence the reshape
query = user_embedding.reshape(1, -1).astype('float32')
D, I = index.search(query, 200)  # distances, ids
candidates = I[0].tolist()

Step 3 — Ranking model: small, fast, explainable

For vertical video, a compact gradient boosted tree or logistic regression often hits a sweet spot. Train on per-impression labeled data (label = click or watch >= 50%). Use features from offline materialization + online state.
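The labeling rule above (click, or watch at least 50%) can be sketched as a small function applied per impression; the event-dict shape mirrors the `video_events` schema used throughout this guide.

```python
def impression_label(events):
    """events: list of event dicts for one impression, each with 'event_type'
    and, for watch events, 'watch_pct' in [0, 1].

    Positive label if the user clicked or watched at least half the video.
    """
    for e in events:
        if e["event_type"] == "click":
            return 1
        if e["event_type"] == "watch" and e.get("watch_pct", 0.0) >= 0.5:
            return 1
    return 0
```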

Training example with LightGBM (simplified)

import lightgbm as lgb
import pandas as pd

# load training data from Parquet/ClickHouse export
train = pd.read_parquet('training_data.parquet')
feature_cols = [c for c in train.columns if c != 'label']  # or an explicit, versioned list
X = train[feature_cols]
y = train['label']

dtrain = lgb.Dataset(X, label=y)
params = {'objective': 'binary', 'metric': 'auc', 'num_leaves': 31, 'learning_rate': 0.05}
model = lgb.train(params, dtrain, num_boost_round=200)
model.save_model('ranker.txt')

Export to a low-latency format. Options in 2026:

  • Treelite for CPU-optimized compiled predictors
  • ONNX if you need cross-language scoring
  • TorchScript only when using small NN models and you accept slightly higher latency

Scoring function (pseudo)

# on the server: assemble features for a candidate and call the compiled predictor
# (assemble() and predictor are placeholders for your feature join and compiled-model runtime)
def score_candidate(user_features, candidate_features):
    feature_vector = assemble(user_features, candidate_features)
    score = predictor.predict(feature_vector)
    # final business rule: penalize stale content
    if candidate_features['age_hours'] > 72:
        score *= 0.9
    return score

Step 4 — Low-latency serving layer

Requirements: 95th/99th percentile latency under 200 ms; throughput sized for peak traffic. Simple architecture:

  • Edge gateway (gRPC or HTTP/2)
  • Stats service for A/B labels + routing
  • Candidate service: queries FAISS (local shard) + inverted index in Redis
  • Ranker service: compiled model loaded in memory, assembles features from Redis + materialized store
  • Response cache: Redis near client to serve frequent requests

FastAPI example: a synchronous scoring endpoint (target: under 100 ms)

from fastapi import FastAPI
import uvicorn

app = FastAPI()

# redis_get_user_features, retrieve_candidates and score_candidate are the helper
# functions sketched earlier in this guide, assumed importable here
@app.post('/recommend')
def recommend(req: dict):
    user_id = req['user_id']
    user_features = redis_get_user_features(user_id)
    candidates = retrieve_candidates(user_features)
    scored = [(c, score_candidate(user_features, c)) for c in candidates]
    scored.sort(key=lambda x: x[1], reverse=True)
    return {'items': [c[0] for c in scored[:10]]}

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8080)

Notes: in production, consider a faster runtime (PyPy) or Go for the retrieval layer, keep model inference in native code via treelite, and pre-join features to avoid network round trips.

Instrumentation & ClickHouse analytics

ClickHouse has become a de-facto standard in 2026 for high-volume event analytics (see ClickHouse funding momentum). Use ClickHouse for:

  • Aggregating impression/click/watch events in real time
  • Computing offline training labels and cohort metrics
  • Running A/B test analysis with fast group-bys

Example ClickHouse schema & query

-- schema: video_events
CREATE TABLE video_events (
  ts DateTime,
  user_id UInt64,
  video_id UInt64,
  event_type String, -- 'impression','click','watch'
  watch_pct Float32,
  device String,
  country String,
  placement String
) ENGINE = MergeTree()
PARTITION BY toDate(ts)
ORDER BY (video_id, ts);

-- 24h CTR per video
SELECT video_id, 
       countIf(event_type='click')/countIf(event_type='impression') AS ctr
FROM video_events
WHERE ts >= now() - INTERVAL 1 DAY
GROUP BY video_id
ORDER BY ctr DESC
LIMIT 100;

Evaluation: offline and online

Build a reproducible evaluation pipeline:

  1. Offline replay: compute predicted scores on historical impressions and measure AUC/precision@k
  2. Counterfactual analysis: inverse propensity scoring if serving policy changed
  3. Online A/B: expose variants via edge routing and log events to ClickHouse
  4. Statistical tests: sequential testing with pre-specified metrics (watch time, retention)

Offline replay example (pseudo)

from sklearn.metrics import roc_auc_score

# X_test / y_test: a held-out slice of historical impressions
preds = model.predict(X_test)
auc = roc_auc_score(y_test, preds)
print('AUC', auc)
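Alongside AUC, the replay step can report precision@k, which tracks the quality of the short list the client actually sees. A minimal sketch:

```python
def precision_at_k(scored_items, relevant, k=10):
    """scored_items: list of (item_id, score); relevant: set of positively-labeled ids.

    Fraction of the top-k ranked items that the user actually engaged with.
    """
    top = sorted(scored_items, key=lambda x: x[1], reverse=True)[:k]
    hits = sum(1 for item_id, _ in top if item_id in relevant)
    return hits / k
```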

Designing A/B tests for vertical video

  • Primary metric: incremental 7-day watch time per DAU
  • Secondary metrics: CTR, completion rate, retention (D1/D7)
  • Safety guardrails: abort the test if retention drops more than 5% or content-moderation flags spike
  • Use ClickHouse to compute daily cohorts and running significance (p-values and MDE)
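A running significance check on daily ClickHouse cohorts can use a two-proportion z-test on CTR; a stdlib-only sketch (the sequential-testing corrections mentioned above would sit on top of this, and watch-time metrics need a different test):

```python
import math

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a CTR difference between variants A and B.

    Returns (z, p_value) using a pooled standard error and the normal CDF via erf.
    """
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value
```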

Feature store considerations

In 2026, mature feature stores (Feast, Tecton) simplify online/offline feature parity. If you cannot run a full-feature store, implement a hybrid approach:

  • Materialize features to Parquet/ClickHouse daily for training
  • Load hot user features into Redis for online joins
  • Use a consistent feature schema and versioning

Scaling: from single-region to global

Key scaling levers and operational notes:

  • Caching: use short TTL response caches near your CDN/edge to cut repeated rank calls.
  • Sharding: shard FAISS/embedding index by user segment or popularity to reduce search cost.
  • Autoscaling: pre-warm ranker instances during predictable spikes (evenings) rather than pure reactive autoscaling.
  • Vector search at scale: an HNSW FAISS index with quantization or approximate search reduces memory and latency.
  • Model updates: adopt rolling deploys and use shadow traffic to validate new models at scale before full rollout.
  • Data governance: enforce feature lineage and privacy-preserving aggregations (DP noise for small cohorts) — regulatory attention is higher in 2026.
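The caching lever above can be sketched as a short-TTL in-process cache; in production this lives in Redis near the edge (e.g. via `SETEX`), so treat this as an illustration of the TTL semantics only.

```python
import time

class TTLCache:
    """Response cache where entries expire after a fixed TTL.

    `clock` is injectable so the expiry logic is testable without sleeping.
    """
    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.store[key]  # lazy eviction on read
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, self.clock() + self.ttl)
```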

Operational checklist

  1. Stream events into ClickHouse and verify event completeness
  2. Materialize user and content features daily; load hot features into Redis
  3. Build offline training pipeline, version models, export to optimized runtime
  4. Deploy candidate + ranker services with 95/99 latency SLAs and caching
  5. Instrument all impressions/clicks/watches with A/B labels & log to ClickHouse
  6. Run daily A/B analyses and safety checks (content quality, filter integrity)

Security, privacy and licensing notes

Protect PII and follow modern privacy requirements (consent, data deletion). When using third-party embeddings or pre-trained models, track licenses and attribution. In 2026, regulators and platforms expect explicit logging of algorithmic decisions for auditability.

Advanced moves

Consider these advanced moves as your platform matures:

  • Multimodal signals: combine video frame embeddings + audio + text (captions) to improve cold-start recall. Lightweight cross-modal encoders are practical in 2026 when quantized.
  • Personalized exploration budgets: learn per-user exploration rates to maximize discovery without hurting retention.
  • On-device ranking: push tiny models to client when network is unreliable; sync updated embeddings periodically.
  • Real-time policy controls: dynamic business rules via a feature flag system integrated with the serving layer.
  • Observability: use model explainability tools to monitor feature drift and emergent biases.

Practical rule: start with what you can measure reliably. In 2026, teams that ship simple, observable recommenders beat those that perfect opaque models.

Case study sketch: incremental rollout timeline (6 weeks)

  1. Week 1: Event pipeline + ClickHouse baseline metrics
  2. Week 2: Offline feature engineering and initial model (LR/LightGBM)
  3. Week 3: Candidate retrieval + FAISS index build
  4. Week 4: Ranker service and local load testing (p95/p99)
  5. Week 5: Canary and shadow traffic to production for 1% of traffic
  6. Week 6: A/B test, monitor, roll to 100% if safe

Common pitfalls and how to avoid them

  • Too-large models: increase latency and complicate debugging. Use compact models first.
  • Feature mismatch: ensure online/offline parity with schema enforcement and unit tests.
  • Confounded A/B tests: avoid routing users repeatedly between variants; use hard allocation or user-level bucketing.
  • Ignoring freshness: vertical video thrives on novelty—surface recency as an explicit feature.
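The feature-mismatch pitfall can be guarded with a schema parity assertion run both in CI and at serving time. `EXPECTED_SCHEMA` here is illustrative; the point is that one versioned schema module is shared by the training pipeline and the ranker service.

```python
EXPECTED_SCHEMA = {  # illustrative; keep in one shared, versioned module
    "ctr_24h": float,
    "tag_affinity": float,
    "age_hours": float,
}

def check_feature_parity(feature_vector, schema=EXPECTED_SCHEMA):
    """Raise if an online feature vector drifts from the training-time schema."""
    missing = set(schema) - set(feature_vector)
    extra = set(feature_vector) - set(schema)
    if missing or extra:
        raise ValueError(f"feature mismatch: missing={missing} extra={extra}")
    for name, typ in schema.items():
        if not isinstance(feature_vector[name], typ):
            raise TypeError(f"{name} should be {typ.__name__}")
```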

Actionable takeaways

  • Ship a two-stage recommender: recall (FAISS + filters) → rank (small GBDT/logistic) for low latency.
  • Use ClickHouse for event analytics and A/B evaluation — it's fast and cost-effective at scale in 2026.
  • Materialize features offline and keep hot features in Redis for online joins to meet 100–200 ms SLAs.
  • Instrument everything and run reproducible offline replays before online experiments.
  • Plan for scaling: caching, sharding, compiled model formats, and strict feature schemas.

Next steps and resources

Start with a minimal pipeline: stream impressions into ClickHouse, compute simple tag affinities, train a LightGBM ranker, and deploy a FastAPI ranker that loads features from Redis. Add FAISS retrieval when you need semantic recall. For feature stores, evaluate Feast or Tecton if you need large-scale feature parity.

Closing call-to-action

If you want the runnable starter kit that mirrors this architecture — including Dockerfiles for FastAPI, a small LightGBM training script, and ClickHouse ingestion SQL — download the repo and a one-page ops checklist. Jumpstart your vertical-video recommender with a pragmatic stack optimized for 2026 realities.
