Lab 02 mendoza#6

Open
RodrigoM10 wants to merge 19 commits into maxflorentin:main from RodrigoM10:lab-02-Mendoza

Conversation

@RodrigoM10

No description provided.

maxflorentin and others added 19 commits June 3, 2026 10:07
- data/raw/events.jsonl: 8 events (signup, login, purchase, logout)
- data/processed/signups.json: 3 signups filtered by process_events.py
- data/processed/sales_by_country.csv: sales by country exported from PostgreSQL
- docs/decisions.md: decisions 003 (JSONL) and 004 (processing pipeline)
Datasets:
- data/raw/olist/: 3000 Olist orders (Jan-Feb 2018), 8 relational tables
- data/raw/events/github_events.jsonl: 2000 GitHub Archive events (2024-01-15)

Schema:
- sql/001_schema.sql: 8 tables with foreign keys (customers, sellers, products, orders,
  order_items, order_payments, order_reviews, category_translations)

Scripts:
- scripts/load_postgres.py: loads the 8 tables from CSVs with psycopg2
- scripts/download_data.py: downloads the full datasets (Kaggle + gharchive.org)

Attribution: CREDITS.md
- Reads data/raw/events/github_events.jsonl (real data, 2000 events)
- Filters PushEvent → data/processed/push_events.json (1658 events)
- Bootstrap: removes creation of the synthetic events.jsonl, creates the correct directories
- Decisions.md: updates entry 004 with the new source and output
- Removes the obsolete signups.json and sales_by_country.csv

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
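
For readers who have not opened the diff, the PushEvent filtering step described in the commit above could look roughly like the sketch below. This is an illustration, not the actual `process_events.py`: the function name `filter_push_events` and its parameters are hypothetical, and it assumes only that each line of the JSONL file is a JSON object with a top-level `type` field, as in GitHub Archive events.

```python
import json
from pathlib import Path


def filter_push_events(src: Path, dst: Path, event_type: str = "PushEvent") -> int:
    """Copy events of one type from a JSONL file into a JSON array file.

    Hypothetical helper: returns the number of events written.
    """
    selected = []
    with src.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:  # tolerate blank lines between records
                continue
            event = json.loads(line)
            if event.get("type") == event_type:
                selected.append(event)

    # Create the output directory if it does not exist yet.
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_text(json.dumps(selected, indent=2), encoding="utf-8")
    return len(selected)
```

Called as `filter_push_events(Path("data/raw/events/github_events.jsonl"), Path("data/processed/push_events.json"))`, it would read the raw file line by line, so the full input never has to be parsed as a single JSON document.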
- tests/ covers the 9 scripts: process_events, query_analytics, load_postgres,
  download_data, upload_to_object_storage, produce/consume for SQS and Kafka
- Mocks for boto3, psycopg2, kafka-python; the tests run without Docker or live services
- Minimal refactor of the scripts for testability: main() wrapper, clients
  created inside main() and passed as parameters to subfunctions
- query_analytics: migrated to fetchall() to avoid a numpy dependency;
  updated to use push_events.json instead of sales_by_country.csv
- produce/consume_kafka: updated to the GitHub Archive format (type/actor/repo)
- Adds pytest>=8.0 to requirements.txt

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
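
The testability refactor described above (clients created in main() and passed down as parameters) is what lets the tests run without a database. A minimal sketch of the pattern, assuming a psycopg2-style connection whose cursor is used as a context manager; `count_orders` and the test name are hypothetical and are not taken from this PR's tests/ directory:

```python
from unittest.mock import MagicMock


def count_orders(conn) -> int:
    """Hypothetical subfunction: receives the connection instead of creating it."""
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM orders")
        # fetchall() returns plain tuples, so no numpy or pandas is needed.
        return cur.fetchall()[0][0]


def test_count_orders_without_database():
    # MagicMock supports the context-manager protocol, so it can stand in
    # for a connection whose cursor is used in a `with` block.
    conn = MagicMock()
    cur = conn.cursor.return_value.__enter__.return_value
    cur.fetchall.return_value = [(3000,)]

    assert count_orders(conn) == 3000
    cur.execute.assert_called_once_with("SELECT COUNT(*) FROM orders")
```

Because the function never imports or constructs a client itself, the same approach would apply to boto3 or kafka-python clients passed in from main().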
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The CSVs contain Portuguese text with special characters.
open() without an encoding argument uses the system locale, which in
some environments is not UTF-8.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
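
As an illustration of the encoding fix described above, the sketch below reads a CSV with an explicit `encoding="utf-8"` so the result does not depend on the machine's locale. The helper name `read_rows` is hypothetical, and the sketch assumes the Olist CSVs are UTF-8 encoded, as the commit message implies.

```python
import csv
from pathlib import Path


def read_rows(path: Path) -> list[dict[str, str]]:
    """Read a CSV into a list of dicts, decoding explicitly as UTF-8."""
    # newline="" is what the csv module recommends for file objects;
    # encoding="utf-8" avoids falling back to the locale's default encoding.
    with path.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))
```

Without the explicit encoding, a value such as "São Paulo" may be decoded incorrectly or raise UnicodeDecodeError on systems whose default encoding is not UTF-8.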
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
lab-01: environment up and running, CodeSpace decision documented
3 participants