Lab 02 mendoza#6
Open
RodrigoM10 wants to merge 19 commits into
- data/raw/events.jsonl: 8 events (signup, login, purchase, logout)
- data/processed/signups.json: 3 signups filtered by process_events.py
- data/processed/sales_by_country.csv: sales by country exported from PostgreSQL
- docs/decisions.md: decisions 003 (JSONL) and 004 (processing pipeline)
Datasets:
- data/raw/olist/: 3000 Olist orders (Jan-Feb 2018), 8 relational tables
- data/raw/events/github_events.jsonl: 2000 GitHub Archive events (2024-01-15)

Schema:
- sql/001_schema.sql: 8 tables with foreign keys (customers, sellers, products, orders, order_items, order_payments, order_reviews, category_translations)

Scripts:
- scripts/load_postgres.py: loads the 8 tables from CSVs with psycopg2
- scripts/download_data.py: downloads the full datasets (Kaggle + gharchive.org)

Attribution: CREDITS.md
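The CSV-to-table loading step described above can be sketched as follows. This is a minimal illustration, not the contents of scripts/load_postgres.py: the function name `load_csv` is hypothetical, the cursor is assumed to be a psycopg2 cursor (or any DB-API cursor that accepts `%s` placeholders), and the table and column names are assumed to come from trusted files, since they are interpolated into the SQL text.

```python
import csv
from pathlib import Path


def load_csv(cursor, table: str, csv_path: Path) -> int:
    """Insert every row of a CSV file into `table`; return the row count.

    Hypothetical helper. `table` and the header names are interpolated
    into the statement, so they must come from trusted input.
    """
    with csv_path.open(newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        columns = next(reader)  # the header row names the target columns
        rows = [tuple(row) for row in reader]

    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    # Values are passed separately so the driver handles quoting.
    cursor.executemany(sql, rows)
    return len(rows)
```

Because the schema declares foreign keys, a loader along these lines would need to fill parent tables (for example customers) before the tables that reference them (for example orders). Note that this sketch passes empty CSV fields through as empty strings rather than NULL.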
- Reads data/raw/events/github_events.jsonl (real data, 2000 events)
- Filters PushEvent → data/processed/push_events.json (1658 events)
- Bootstrap: removes creation of the synthetic events.jsonl, creates the correct directories
- decisions.md: updates entry 004 with the new source and output
- Removes the obsolete signups.json and sales_by_country.csv

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
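The JSONL filtering step in this commit can be sketched as below. It is an assumption-laden illustration rather than the actual process_events.py: the function name `filter_events` is hypothetical, and the only thing taken from the commit is that each line is a JSON object whose `type` field identifies the event.

```python
import json
from pathlib import Path


def filter_events(src: Path, dst: Path, event_type: str = "PushEvent") -> int:
    """Copy events of one type from a JSONL file into a JSON array file.

    Hypothetical helper; returns the number of events written.
    """
    selected = []
    with src.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            event = json.loads(line)
            if event.get("type") == event_type:
                selected.append(event)

    # Create the output directory if it does not exist yet.
    dst.parent.mkdir(parents=True, exist_ok=True)
    with dst.open("w", encoding="utf-8") as fh:
        json.dump(selected, fh, ensure_ascii=False, indent=2)
    return len(selected)
```

With the paths named in the commit, a call would look like `filter_events(Path("data/raw/events/github_events.jsonl"), Path("data/processed/push_events.json"))`.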
- tests/ covers the 9 scripts: process_events, query_analytics, load_postgres, download_data, upload_to_object_storage, produce/consume for SQS and Kafka
- Mocks for boto3, psycopg2, and kafka-python: the tests run without Docker or external services
- Minimal refactor of the scripts for testability: main() wrapper, clients created inside main() and passed as parameters to subfunctions
- query_analytics: migrated to fetchall() to avoid a numpy dependency; updated to use push_events.json instead of sales_by_country.csv
- produce/consume_kafka: updated to the GitHub Archive format (type/actor/repo)
- Adds pytest>=8.0 to requirements.txt

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
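The testability pattern described above (subfunctions that receive their client as a parameter, exercised with mocks instead of a live service) might look roughly like this. The names `count_rows` and `main`, and the choice to inject the connection factory into `main()`, are assumptions made for the sketch; the actual scripts may instead patch `psycopg2.connect`.

```python
from unittest.mock import MagicMock


def count_rows(cursor, table: str) -> int:
    """Hypothetical subfunction: receives the cursor instead of creating it."""
    # `table` is assumed to be a trusted identifier.
    cursor.execute(f"SELECT COUNT(*) FROM {table}")
    return cursor.fetchall()[0][0]


def main(connect) -> int:
    """Hypothetical main(): builds the client, then passes it down."""
    conn = connect()
    try:
        return count_rows(conn.cursor(), "orders")
    finally:
        conn.close()


def test_main_counts_rows_without_a_database():
    # The fake connection returns a canned result, so no PostgreSQL is needed.
    fake_conn = MagicMock()
    fake_conn.cursor.return_value.fetchall.return_value = [(3000,)]
    connect = MagicMock(return_value=fake_conn)

    assert main(connect) == 3000
    fake_conn.cursor.return_value.execute.assert_called_once_with(
        "SELECT COUNT(*) FROM orders"
    )
    fake_conn.close.assert_called_once()
```

pytest would collect the `test_` function automatically; the same shape applies to boto3 or kafka-python clients, with the mock standing in for the client object.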
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The CSVs contain Portuguese text with special characters. open() without an encoding argument uses the system locale, which is not UTF-8 in some environments. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
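A minimal sketch of the fix this message describes, assuming the CSVs are read and written with the standard `csv` module: passing `encoding="utf-8"` explicitly makes the result independent of the system locale. The helper names `write_rows` and `read_rows` are hypothetical.

```python
import csv
from pathlib import Path


def write_rows(path: Path, fieldnames: list[str], rows: list[dict]) -> None:
    """Write rows as UTF-8 regardless of the system locale."""
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_rows(path: Path) -> list[dict]:
    """Read rows as UTF-8 regardless of the system locale."""
    with path.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))
```

Without the `encoding` argument, a value such as `São Paulo` could be decoded incorrectly, or raise `UnicodeDecodeError`, on a machine whose locale encoding is not UTF-8.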
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix CSV UTF-8 BOM
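How this commit handles the BOM is not shown here. One common approach, sketched below under that caveat, is to open the file with the `utf-8-sig` codec, which drops a leading byte order mark if one is present and otherwise behaves like UTF-8. The function name `read_csv_without_bom` is hypothetical.

```python
import csv
from pathlib import Path


def read_csv_without_bom(path: Path) -> list[dict]:
    """Read a CSV whose first bytes may be a UTF-8 byte order mark."""
    # With plain "utf-8" the BOM would survive as "\ufeff" glued to the
    # first column name; "utf-8-sig" removes it before csv sees the header.
    with path.open(newline="", encoding="utf-8-sig") as fh:
        return list(csv.DictReader(fh))
```

The symptom this guards against is a first header that looks correct when printed but fails lookups such as `row["order_id"]`, because the key actually starts with an invisible `\ufeff`.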
lab-01: environment up and running, CodeSpace decision documented
No description provided.