GaiskaSalomon/README.md

👋 Hi, I'm Gaiska Salomón

🎓 Ph.D. Candidate in Statistics & Data Science — Machine Learning · Time Series · LLMs



🚀 About Me

Data scientist and researcher with a strong statistical foundation (probability, Bayesian methods, time series). I work across the full project lifecycle, from data pipelines and feature engineering to model training, rigorous validation, and deployment, spanning machine learning, deep learning, and LLMs. Recent work includes commodity-return forecasting, urban-mobility analytics, and domain-specific Spanish language models.

  • 🔭 Currently: applying ML/DL to real problems and shipping reproducible, documented projects.
  • 🌱 Comfortable from classic ML & statistics to LLM fine-tuning + RAG.
  • 🗣️ Spanish (native) · English (intermediate, conversational).

🔬 Focus Areas

  • Machine Learning & Deep Learning — predictive modeling, gradient boosting, neural nets.
  • Time Series & Forecasting — walk-forward validation, backtesting, high-frequency data.
  • Statistical Modeling — inference, Bayesian methods, uncertainty quantification.
  • LLMs / NLP — fine-tuning (LoRA/QLoRA), retrieval-augmented generation (RAG).
  • Applied Data Science — data pipelines, dashboards, and clear communication of results.
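As an illustration of the walk-forward validation mentioned under Time Series & Forecasting, here is a minimal sketch of an expanding-window splitter. It is not taken from any of the repositories; the function name `walk_forward_splits` and its parameters are hypothetical, and a real project might use a rolling window or add a gap between train and test instead.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, initial_train: int, test_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs with an expanding training window.

    Every test index is strictly later than every training index, so a model
    is never evaluated on observations that precede its training data.
    """
    if initial_train <= 0 or test_size <= 0:
        raise ValueError("initial_train and test_size must be positive")
    train_end = initial_train
    # Stop as soon as a full test block no longer fits in the series.
    while train_end + test_size <= n_samples:
        train_idx = list(range(train_end))
        test_idx = list(range(train_end, train_end + test_size))
        yield train_idx, test_idx
        # Absorb the block just tested into the next training window.
        train_end += test_size


if __name__ == "__main__":
    for train_idx, test_idx in walk_forward_splits(10, 4, 2):
        print(f"train 0..{train_idx[-1]} -> test {test_idx[0]}..{test_idx[-1]}")
```

With 10 observations, an initial window of 4, and test blocks of 2, this yields three folds whose test blocks cover indices 4-5, 6-7, and 8-9.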

🛠️ Tech Stack

Languages

Python R SQL

ML / Deep Learning

PyTorch scikit-learn XGBoost LightGBM CatBoost PyMC statsmodels

LLMs / NLP

Hugging Face Transformers LoRA / QLoRA RAG

Data & Tooling

pandas NumPy PostgreSQL Streamlit Docker Git Linux


📌 Featured Projects

climate-commodity-alpha-lab

Quantitative research: do weather and climate-risk features improve commodity return forecasts? Walk-forward validation, XGBoost/LightGBM, Bayesian methods, cost-aware backtesting (Sharpe, IC, drawdown).

Python · XGBoost · LightGBM · PyMC · time-series · backtesting
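To make the cost-aware backtesting metrics concrete, the sketch below computes net strategy returns after a proportional trading cost, an annualized Sharpe ratio, and maximum drawdown. It is an assumption-laden illustration rather than the project's code: the function names, the cost model (a flat cost per unit of position change), the zero risk-free rate, and the 252-period annualization are all hypothetical choices.

```python
import math
import statistics
from typing import List, Sequence


def net_returns(
    asset_returns: Sequence[float],
    positions: Sequence[float],
    cost_per_turnover: float,
) -> List[float]:
    """Strategy returns after charging a cost proportional to position changes."""
    out = []
    prev_position = 0.0  # assume the strategy starts flat
    for r, p in zip(asset_returns, positions):
        turnover = abs(p - prev_position)
        out.append(p * r - cost_per_turnover * turnover)
        prev_position = p
    return out


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio with a zero risk-free rate (sample std dev)."""
    sd = statistics.stdev(returns)
    if sd == 0:
        return float("nan")
    return statistics.mean(returns) / sd * math.sqrt(periods_per_year)


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough loss of the compounded equity curve (<= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst
```

Charging costs on turnover rather than on every period means a strategy that holds its position pays nothing extra, which is usually what separates a signal that survives trading frictions from one that does not.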

CDMXMP

Reproducible pipeline and interactive dashboard for Mexico City mobility (GTFS, ECOBICI GBFS, C5). Ingestion, data-quality reports, KPIs, and 7-day demand forecasting.

Python · Streamlit · data-pipeline · XGBoost / LightGBM / CatBoost

agrollm-es

Domain-specific Spanish LLM pipeline: dataset cleaning and deduplication, QLoRA fine-tuning (Hugging Face + TRL + PEFT), and RAG on PostgreSQL + pgvector with an evaluation suite.

PyTorch · Hugging Face · QLoRA · RAG · pgvector
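The retrieval step of a RAG pipeline can be sketched in a few lines. The example below ranks documents by cosine similarity to a query embedding in plain Python; in the project itself this ranking is presumably delegated to pgvector inside PostgreSQL, so the function names, the in-memory document list, and the toy two-dimensional vectors here are illustrative assumptions only.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    docs: Sequence[Tuple[str, Sequence[float]]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings; real ones would come from an embedding model.
    docs = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
    for doc_id, score in top_k([1.0, 0.0], docs, k=2):
        print(doc_id, round(score, 3))
```

The retrieved passages would then be placed into the prompt of the fine-tuned model; a brute-force scan like this is only practical for small corpora, which is one reason to push the search into a vector index.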




“Transforming data into actionable insights is not just my profession, it's my passion.”

Pinned

  1. agrollm-es Public

    Spanish domain LLM pipeline: custom dataset → QLoRA fine-tuning (Qwen2.5) → RAG on PostgreSQL/pgvector → evaluation. Runs locally on a single GPU.

    Python 1

  2. CDMXMP Public

    CDMX Mobility Pulse is an urban-mobility analytics platform that integrates open data from CDMX (C5, GTFS, and ECOBICI) and produces indicators and an interactive dashboard to support decisions or…

    Python 1

  3. climate-commodity-alpha-lab Public

    Quantitative research: do weather & climate-risk features improve commodity return forecasts? Walk-forward validation, XGBoost/LightGBM, cost-aware backtesting.

    Jupyter Notebook 1

  4. customer-churn-intelligence Public

    Production-aware customer churn prediction (Telco): tested pipeline with preprocessing, Logistic Regression + XGBoost, evaluation and SHAP explainability.

    Python