End-to-end Data Science
Planning → data contracts → Airflow/dbt/Spark features → training/validation (scikit-learn • XGBoost • PyTorch) → packaging & CI/CD → monitored deploy with MLflow.
I’m a Data Scientist with 3.5+ years of professional experience building end-to-end data science products, from problem scoping and data contracts to feature pipelines, model training/validation, CI/CD, and monitored deployment. I’ve shipped forecasting, anomaly detection, and decision intelligence solutions used by business leaders.
I love tackling hard, ambiguous problems and turning them into usable, data-driven products. My goal is to leverage my skills to help companies empower users and confidently go after what’s next.
Hierarchical/time-series forecasting (Prophet, ARIMA), change-point & anomaly detection, uplift modeling and causal inference (DoWhy) with backtesting & guardrails.
Executive dashboards in Power BI with curated DAX, drill-through, RLS, and a maintainable semantic layer aligned to business metrics.
RAG architectures (chunking, embeddings, vector DBs), tool-use AI agents, eval harnesses, prompt safety, and observability for production quality.
Production-style pipeline for imbalanced classification: cost-sensitive training, threshold tuning, PR-AUC monitoring, and analyst workflow integration.
Joined FAA/ASOS weather data with flight histories; engineered lag features, evaluated with ROC-AUC across seasonal splits, and designed reproducible preprocessing.
End-to-end MLOps with experiment tracking, a model registry, packaged inference, and deployment patterns for batch and near-real-time prediction.
Content-based retrieval with vectorization and cosine similarity; fast prototyping, explainable results, and a clean offline evaluation notebook.
Real-time feature updates drive a calibrated win probability; showcases feature selection, calibration, and intuitive model outputs.
Executive KPIs for Patients, LOS, and Cost/Stay with a semantic model, curated DAX measures, and row-level security.
Statistically sound experimentation: Z-tests, lift and CIs, sanity checks, outlier handling, and a reproducible workflow for marketing teams.
Historical Olympics EDA with tidy joins, feature discovery, and clear visual storytelling for non-technical stakeholders.
Clean visualization of exoplanet catalogs covering scales, outliers, and correlations with disciplined chart design.
Market KPIs for revenue, genres, and ROI; robust CSV cleaning and reusable visualization helpers.
Cohort and margin drill-downs; optimized extracts and pragmatic fact/dim semantics for speed.
LLM-assisted EDA and charting for CSVs with guardrails; prompts → insights + visuals for faster exploration.
RAG agent with tool use; strong observability, error-handled boundaries, and clean tracing for debugging.
Multi-model agent (Groq + OpenAI) with optional web search; clean API boundary and helpful UI.
Query & visualize airline mix, busiest airports, and daily frequencies with portable DB helpers.
End-to-end pipeline: transcription, chunking, ranked summaries, and a clean “summary UX”.
Streaming discovery helper using resilient scraping + parsing; lightweight CLI/app patterns.
Email: mayurdalvi.5@gmail.com
Location: Denver, CO (Open to relocate)