Demos
Data pipeline • scraping → summarization
News Analyzer
I built this pipeline to turn raw local news into a concise daily briefing. It scrapes, extracts, indexes, and summarizes—then ships the result through my Kubernetes stack.
[Pipeline visualization: the particles represent batches moving between stages.]
Under the Hood
Automated Scraping
Uses Playwright for auth + navigation on dynamic e-editions. Downloads source documents and keeps sessions stable across runs.
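A minimal sketch of what this login-and-download flow can look like with Playwright's sync API. The URL, selectors, and credential environment variables are illustrative placeholders, not the actual source; persisting `storage_state` is one way to keep sessions stable across runs.

```python
# Sketch of the scraping stage using Playwright's sync API.
# The URL, selectors, and credential env vars are hypothetical placeholders.
import os
from pathlib import Path
from playwright.sync_api import sync_playwright

EEDITION_URL = "https://example.com/e-edition"   # placeholder, not the real source
STATE_FILE = Path("auth_state.json")             # reused cookies keep sessions stable

def fetch_edition(download_dir: Path) -> Path:
    download_dir.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            storage_state=str(STATE_FILE) if STATE_FILE.exists() else None,
            accept_downloads=True,
        )
        page = context.new_page()
        page.goto(EEDITION_URL)

        # Log in only if the saved session has expired.
        if page.locator("input[name='username']").count() > 0:
            page.fill("input[name='username']", os.environ["NEWS_USER"])
            page.fill("input[name='password']", os.environ["NEWS_PASS"])
            page.click("button[type='submit']")
            page.wait_for_load_state("networkidle")
            context.storage_state(path=str(STATE_FILE))  # persist for the next run

        # Download today's edition (selector is illustrative).
        with page.expect_download() as dl:
            page.click("a.download-edition")
        path = download_dir / dl.value.suggested_filename
        dl.value.save_as(path)
        browser.close()
        return path
```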
Index + Retrieval
Extracted text gets chunked + embedded, then written to Qdrant / Weaviate for retrieval and dedupe across editions.
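A sketch of the chunk-embed-upsert step against Qdrant. The collection name, embedding model, and chunk sizes are assumptions; the Weaviate path is analogous. Deterministic point IDs derived from chunk content make re-runs overwrite rather than duplicate.

```python
# Sketch of the chunk -> embed -> upsert step. Collection name, model choice,
# and chunk size are assumptions; the Weaviate variant looks similar.
import uuid
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

COLLECTION = "news_chunks"                       # hypothetical collection name
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
client = QdrantClient(url="http://localhost:6333")

def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Naive fixed-size chunking with overlap."""
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

def index_article(article_id: str, text: str, edition: str) -> None:
    if not client.collection_exists(COLLECTION):
        client.create_collection(
            COLLECTION,
            vectors_config=VectorParams(size=384, distance=Distance.COSINE),
        )
    points = []
    for piece in chunk(text):
        # Deterministic ID from content: re-running the pipeline overwrites
        # the same points instead of creating duplicates, which also dedupes
        # identical chunks across editions.
        point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, piece))
        points.append(PointStruct(
            id=point_id,
            vector=model.encode(piece).tolist(),
            payload={"article_id": article_id, "edition": edition, "text": piece},
        ))
    client.upsert(collection_name=COLLECTION, points=points)
```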
AI Summarization
Uses LLMs through LiteLLM to generate a compact, scannable briefing and extract key entities for follow-up queries.
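Roughly what the summarization call can look like through LiteLLM. The model name and prompt are assumptions; the point of routing through LiteLLM is that the provider can be swapped without changing this call.

```python
# Sketch of the summarization stage. Model name and prompt are assumptions.
from litellm import completion

SYSTEM_PROMPT = (
    "You are a local-news editor. Summarize the provided articles into a "
    "briefing of at most 10 bullet points, then list key entities (people, "
    "places, organizations) on separate lines prefixed with 'ENTITY:'."
)

def summarize(chunks: list[str]) -> str:
    response = completion(
        model="gpt-4o-mini",   # any LiteLLM-supported model id works here
        temperature=0,         # pinned for stable, same-shape output
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "\n\n---\n\n".join(chunks)},
        ],
    )
    return response.choices[0].message.content
```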
What I optimize for
- Idempotent runs (safe to retry; see the sketch after this list)
- Dedupe + provenance (know what changed)
- Stable prompts (same input → same shape output)
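One way to get the first two properties is a run manifest: hash each downloaded document, reprocess only what changed, and record what was ingested. This is a sketch under that assumption; the manifest path and schema are illustrative, not taken from the project.

```python
# Sketch of idempotent runs + provenance via content hashes. The manifest
# path and schema are illustrative.
import hashlib
import json
from pathlib import Path

MANIFEST = Path("manifest.json")  # hypothetical location

def load_manifest() -> dict:
    return json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_documents(downloads: list[Path]) -> list[Path]:
    """Return only documents whose content changed since the last run."""
    manifest = load_manifest()
    return [p for p in downloads if manifest.get(p.name) != sha256(p)]

def record_run(processed: list[Path]) -> None:
    """Update the manifest after a successful run, so retries are safe."""
    manifest = load_manifest()
    manifest.update({p.name: sha256(p) for p in processed})
    MANIFEST.write_text(json.dumps(manifest, indent=2))
```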
Common failure modes
The “hard” part isn’t the model—it’s everything around it: logins, flaky sources, and making sure partial data doesn’t poison the index.
- Session expiry
- PDF layout drift
- Duplicate editions
- Rate limits
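Two small guards cover most of these: retry transient failures (rate limits, expired sessions) with backoff, and stage results so a partial run never reaches the index. The names and broad exception handling below are illustrative.

```python
# Sketch of two guards for the failure modes above: exponential backoff for
# transient errors, and staging so partial data never poisons the index.
import time

def with_backoff(fn, attempts: int = 5, base_delay: float = 2.0):
    """Retry fn on transient errors (e.g. HTTP 429) with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

def run_stage(articles, process, commit):
    """Process everything into a staging list first; commit only if all
    succeed, so a mid-run failure can't leave partial data in the index."""
    staged = [with_backoff(lambda a=a: process(a)) for a in articles]
    commit(staged)
```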