Data scientist and researcher with a strong statistical foundation (probability, Bayesian methods, time-series). I work across the full model lifecycle, from data pipelines and feature engineering to model training, rigorous validation, and deployment, spanning machine learning, deep learning, and LLMs. Recent work includes commodity-return forecasting, urban-mobility analytics, and domain-specific Spanish language models.
- 🔭 Currently: applying ML/DL to real problems and shipping reproducible, documented projects.
- 🌱 Comfortable across the stack, from classic ML & statistics to LLM fine-tuning and RAG.
- 🗣️ Spanish (native) · English (intermediate, conversational).
- Machine Learning & Deep Learning — predictive modeling, gradient boosting, neural nets.
- Time Series & Forecasting — walk-forward validation, backtesting, high-frequency data.
- Statistical Modeling — inference, Bayesian methods, uncertainty quantification.
- LLMs / NLP — fine-tuning (LoRA/QLoRA), retrieval-augmented generation (RAG).
- Applied Data Science — data pipelines, dashboards, and clear communication of results.
Languages
ML / Deep Learning
LLMs / NLP
Data & Tooling
Quantitative research: do weather & climate-risk features improve commodity return
forecasts? Walk-forward validation, XGBoost/LightGBM, Bayesian methods, cost-aware
backtesting (Sharpe, IC, drawdown).
Python · XGBoost · LightGBM · PyMC · time-series · backtesting
Reproducible pipeline + interactive dashboard for Mexico City mobility (GTFS, ECOBICI
GBFS, C5). Ingestion, data-quality reports, KPIs, and 7-day demand forecasting.
Python · Streamlit · data-pipeline · XGBoost / LightGBM / CatBoost
Domain-specific Spanish LLM pipeline: dataset cleaning/deduplication, QLoRA
fine-tuning (Hugging Face + TRL + PEFT), and RAG on PostgreSQL + pgvector with an
evaluation suite.
PyTorch · Hugging Face · QLoRA · RAG · pgvector
“Transforming data into actionable insights is not just my profession; it's my passion.”


