ML Projects for Experienced Professionals
Production-grade project ideas with architecture, datasets, evaluation, and business impact — chosen to stretch senior ML engineers.
What "experienced-level" means here
- End-to-end ownership: data → training → serving → monitoring.
- Production constraints: latency, cost, drift, reproducibility.
- Business framing: pick a metric the business actually cares about.
- Failure modes: explicit error analysis, not just leaderboard scores.
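To make the drift constraint above concrete, here is a minimal sketch of a Population Stability Index (PSI) check between a reference sample (for example, training-time feature values) and a current production sample. The function name `psi`, the equal-width binning, and the epsilon floor are illustrative assumptions, not something this guide prescribes; tools such as Evidently provide ready-made drift reports.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float], n_bins: int = 10) -> float:
    """Population Stability Index between a reference and a current sample.

    Bins are equal-width over the reference range (an assumption of this
    sketch; quantile bins are another common choice).
    """
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / n_bins

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        # Floor empty bins at a small epsilon so the log below is defined.
        eps = 1e-6
        return [max(c / len(values), eps) for c in counts]

    ref = bin_fractions(reference)
    cur = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


if __name__ == "__main__":
    reference = [i / 100 for i in range(100)]
    shifted = [0.5 + i / 200 for i in range(100)]
    print("same distribution:", psi(reference, reference))
    print("shifted distribution:", psi(reference, shifted))
```

A common rule of thumb treats PSI above roughly 0.25 as a significant shift, but the right alert threshold depends on the feature and should be tuned per project.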
Project catalogue
- Detect fraudulent credit-card transactions in <50 ms with extreme class imbalance (~0.17% positive) and concept drift over time.
  Stack: Kafka, Feast, LightGBM, FastAPI, MLflow, Evidently, Docker, Kubernetes
- Build a Retrieval-Augmented Generation system that answers questions over a private corpus (10k+ PDFs) with citations and zero hallucination tolerance.
  Stack: LangGraph, Qdrant / pgvector, BGE embeddings, vLLM, Ragas, LangSmith
- Forecast 14-day demand for 50k SKU-store pairs with promotions, holidays, and intermittent demand.
  Stack: LightGBM, statsforecast, mlforecast, Optuna, Prefect, DuckDB
- Given a user-uploaded photo, return visually + semantically similar products from a catalog of 5M items.
  Stack: PyTorch, OpenCLIP, FAISS / Qdrant, ONNX Runtime, Triton
- Recommend items to 10M+ users with millions of items, including handling new users/items appearing every minute.
  Stack: TensorFlow Recommenders / PyTorch, ScaNN, BigQuery, Vertex AI, Airflow
- Build a self-serve platform where data scientists can train, register, deploy, and monitor models with full reproducibility and CI/CD.
  Stack: Kubernetes, ArgoCD, MLflow, Kubeflow, KServe, Prometheus, Terraform
- Identify which customers should receive a promotion to maximize *incremental* revenue, not just predicted purchase.
  Stack: EconML, DoWhy, scikit-learn, CausalML, Pandas
Portfolio packaging tips
- One repo per project with a clean README: problem, results, architecture diagram, how to reproduce.
- Show the metric curve, not just the final number — and a baseline you beat.
- Write a 1-page case study in business language; engineers read code, hiring managers read prose.
- Deploy something live — a Streamlit / Gradio demo or a small HTTP API beats a static notebook.
- Talk about trade-offs: what you tried, what failed, what you'd do with more time/data.
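As one way to show a metric curve alongside a baseline, the sketch below computes precision-recall points and a step-wise average precision for a model's scores, then does the same for a constant-score baseline. The function names and the toy labels and scores are hypothetical and only illustrate the shape of the comparison. On heavily imbalanced problems, the constant-score baseline collapses to the positive rate, which is the number a real model has to beat.

```python
from typing import List, Sequence, Tuple


def pr_curve(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (recall, precision) points, one per distinct score threshold."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points: List[Tuple[float, float]] = []
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Tied scores share one threshold, so emit a point only after the tie ends.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((tp / total_pos, tp / (tp + fp)))
    return points


def average_precision(points: Sequence[Tuple[float, float]]) -> float:
    """Step-wise area under the precision-recall points."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 0]                 # toy data, not real results
    model_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
    baseline_scores = [0.5] * len(labels)    # constant-score baseline
    print("model AP:   ", average_precision(pr_curve(labels, model_scores)))
    print("baseline AP:", average_precision(pr_curve(labels, baseline_scores)))
```

Plotting the `(recall, precision)` points for both runs on one chart gives the reader the curve and the baseline in a single figure.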
Suggested 90-day plan
- Weeks 1–2: scope, data acquisition, baseline model, eval harness.
- Weeks 3–6: iterate on modeling, ablations, error analysis.
- Weeks 7–9: serving, monitoring, drift, CI/CD.
- Weeks 10–12: write-up, demo video, blog post, polish.
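For the eval harness in weeks 1–2, one possible starting point is a rolling-origin splitter that always trains on the past and tests on the future. This matters for projects with drift or forecasting horizons. The function name and parameters are hypothetical, and the sketch assumes the rows are already sorted by time.

```python
from typing import Iterator, List, Tuple


def rolling_origin_splits(
    n_samples: int, n_folds: int, test_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs where training rows precede test rows.

    Assumes rows are ordered by time; each fold extends the training window
    by one test block (an expanding window).
    """
    first_test_start = n_samples - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        start = first_test_start + fold * test_size
        yield list(range(start)), list(range(start, start + test_size))


if __name__ == "__main__":
    for train_idx, test_idx in rolling_origin_splits(n_samples=10, n_folds=2, test_size=3):
        print("train:", train_idx, "test:", test_idx)
```

Evaluating the baseline and every later model on the same folds keeps the comparisons in weeks 3–6 honest. scikit-learn's `TimeSeriesSplit` offers a similar scheme if you prefer a library implementation.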