Feat/pdf parser#323
Closed
michaeltomlinsontuks wants to merge 8 commits into
PR: Timetable Solver and PDF Parser Integration
Branch: feat/pdf-parser
Target: dev
Summary
This pull request integrates a Python-based FastAPI PDF parser service and NestJS controllers for the solver and the parser. It sets up job queues (BullMQ on Redis) so that timetable optimization and PDF parsing run asynchronously instead of holding HTTP requests open during intensive work.
Motivation
Design Decisions
1. Queue-Based Processing via Redis and BullMQ
Decision: Enqueue PDF parsing and solver jobs into distinct BullMQ queues (pdf-parse and solver-optimize) and process them asynchronously via worker callbacks.
Rationale: A standard request-response cycle is a poor fit for long-running, CPU-bound tasks. The queue structure decouples job submission from completion and provides retry logic, backoff, and resource isolation.
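The decoupling described above can be illustrated with a minimal in-memory sketch. It is not the BullMQ API: the `InMemoryQueue` class, the `Job` fields, the three-attempt limit, and the doubling backoff with a 1000 ms base are all assumptions chosen for illustration. Only the two queue names come from this PR.

```python
import itertools
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Deque, Dict, List

# Queue names taken from the PR description.
PDF_PARSE_QUEUE = "pdf-parse"
SOLVER_OPTIMIZE_QUEUE = "solver-optimize"


def backoff_delay_ms(attempt: int, base_ms: int = 1000) -> int:
    """Exponential backoff: base, 2*base, 4*base, ... for attempt 1, 2, 3, ..."""
    return base_ms * 2 ** (attempt - 1)


@dataclass
class Job:
    id: str
    payload: Any
    attempts_made: int = 0
    status: str = "waiting"  # waiting -> completed | failed
    result: Any = None
    scheduled_delays_ms: List[int] = field(default_factory=list)


class InMemoryQueue:
    """Hypothetical stand-in for one named queue (assumption, not BullMQ)."""

    def __init__(self, name: str, max_attempts: int = 3) -> None:
        self.name = name
        self.max_attempts = max_attempts
        self.jobs: Dict[str, Job] = {}
        self._pending: Deque[str] = deque()
        self._ids = itertools.count(1)

    def submit(self, payload: Any) -> str:
        """Enqueue and return a job id immediately, without doing the work."""
        job_id = f"{self.name}-{next(self._ids)}"
        self.jobs[job_id] = Job(id=job_id, payload=payload)
        self._pending.append(job_id)
        return job_id

    def drain(self, handler: Callable[[Any], Any]) -> None:
        """Worker loop: run each pending job, re-queueing failures."""
        while self._pending:
            job = self.jobs[self._pending.popleft()]
            job.attempts_made += 1
            try:
                job.result = handler(job.payload)
                job.status = "completed"
            except Exception:
                if job.attempts_made < self.max_attempts:
                    # A real queue would wait this long; here it is only recorded.
                    job.scheduled_delays_ms.append(
                        backoff_delay_ms(job.attempts_made)
                    )
                    self._pending.append(job.id)
                else:
                    job.status = "failed"
```

The point of the sketch is that `submit` returns as soon as the job is recorded, while retries and failures are handled entirely inside the worker loop, so the caller's HTTP request never waits on the parsing or solving itself.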
2. Standalone, Scalable Python PDF Parsing Service
Decision: Built apps/pdf_parser as a containerized Python service exposing FastAPI endpoints for parsing University of Pretoria (UP) schedule PDFs.
Rationale: Python's ecosystem offers mature tools for PDF layout analysis and text extraction. Running the parser as a separate containerized microservice behind Traefik keeps it modular, reusable, and easy to deploy.
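The file list for this PR separates a generic BaseParser interface (base_parser.py) from a UP-specific parser (up_parser.py). A minimal sketch of that kind of split might look like the following. The method names `can_parse` and `parse`, the `pick_parser` helper, and the semicolon-separated toy format are assumptions for illustration; they are not taken from the PR and do not reflect the real UP PDF layout.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Sequence


class BaseParser(ABC):
    """Common interface for schedule parsers. Method names are assumptions."""

    @abstractmethod
    def can_parse(self, text: str) -> bool:
        """Return True if this parser recognises the document text."""

    @abstractmethod
    def parse(self, text: str) -> List[Dict[str, str]]:
        """Turn document text into a list of schedule entries."""


class ToyScheduleParser(BaseParser):
    """Hypothetical parser for lines shaped like 'COS 301;Monday;08:30;09:20'.

    This is NOT the UP PDF layout; it only illustrates the interface.
    """

    HEADER = "MODULE;DAY;START;END"

    def can_parse(self, text: str) -> bool:
        return text.lstrip().startswith(self.HEADER)

    def parse(self, text: str) -> List[Dict[str, str]]:
        entries = []
        for line in text.strip().splitlines()[1:]:  # skip the header row
            parts = [p.strip() for p in line.split(";")]
            if len(parts) != 4:
                continue  # ignore malformed rows rather than failing the job
            module, day, start, end = parts
            entries.append(
                {"module": module, "day": day, "start": start, "end": end}
            )
        return entries


def pick_parser(parsers: Sequence[BaseParser], text: str) -> Optional[BaseParser]:
    """Return the first registered parser that accepts the text, if any."""
    for parser in parsers:
        if parser.can_parse(text):
            return parser
    return None
```

With an interface like this, supporting another institution's schedule format would mean adding one more subclass rather than changing the service endpoints.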
3. Public Worker Callbacks
Decision: Exposed callback endpoints with a NestJS @Public() guard bypass so that workers can submit processing results back to the core backend.
Rationale: Background workers do not hold active user session tokens. Public, validated callback endpoints let workers report results asynchronously without a session having to be generated for them manually.
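The PR does not state how the public callbacks are validated. One common option, shown here purely as an assumption, is for the worker to sign the request body with a shared secret and for the backend to verify the signature before accepting the result. The function names and the payload fields below are hypothetical.

```python
import hashlib
import hmac
import json


def sign_callback(secret: bytes, body: bytes) -> str:
    """Worker side: HMAC-SHA256 of the raw request body, hex encoded."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_callback(secret: bytes, body: bytes, signature: str) -> bool:
    """Backend side: recompute the signature and compare in constant time."""
    expected = sign_callback(secret, body)
    return hmac.compare_digest(expected.encode(), signature.encode())


# Hypothetical payload; the real callback schema is not given in the PR.
secret = b"shared-worker-secret"
body = json.dumps({"jobId": "pdf-parse-1", "status": "completed"}).encode()
signature = sign_callback(secret, body)

accepted = verify_callback(secret, body, signature)
rejected = verify_callback(secret, body + b" ", signature)
```

In this sketch `accepted` is true and `rejected` is false, because any change to the body changes the expected signature. An endpoint that skips session guards still needs some check of this kind, since otherwise anyone who can reach it could post results for a job.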
Files Changed
New Files
apps/backend/src/pdf-parser/dto/pdf-parser.dto.ts
apps/backend/src/pdf-parser/pdf-parser.controller.ts
apps/backend/src/pdf-parser/pdf-parser.module.ts
apps/backend/src/pdf-parser/pdf-parser.service.ts
apps/backend/src/redis/queue.constants.ts: … (pdf-parse and solver-optimize) to prevent string duplication.
apps/backend/src/redis/redis-queue.module.ts: … REDIS_URL connection parameter.
apps/backend/src/solver/dto/solver.dto.ts
apps/backend/src/solver/solver.controller.ts
apps/backend/src/solver/solver.module.ts
apps/backend/src/solver/solver.service.ts
apps/pdf_parser/.dockerignore
apps/pdf_parser/main.py
apps/pdf_parser/parser/__init__.py
apps/pdf_parser/parser/base_parser.py: … BaseParser interface for schedule document parsing.
apps/pdf_parser/parser/data_processor.py
apps/pdf_parser/parser/up_parser.py
apps/pdf_parser/pdf_parser.Dockerfile
apps/pdf_parser/requirements.txt
apps/pdf_parser/static/brand/...
apps/pdf_parser/swagger_ui.py
apps/pdf_parser/up_test_pdfs/...
apps/pdf_parser/verify_up_parser.py
Modified Files
apps/backend/src/app.module.ts: … RedisQueueModule, PdfParserModule, and SolverModule.
apps/solver/solver.Dockerfile
docker-compose.prod.yml: … pdf-parser container with Traefik routing configuration for production environment.
docker-compose.yml: … pdf-parser configuration, maps local Redis container port 6379, and defines local profiles.
package.json: … (pdf-parser:install, pdf-parser:verify).
API Endpoints (If Applicable)
POST /pdf-parser/submit
POST /pdf-parser/callback
POST /solver/submit
POST /solver/callback
POST /parse (Python App)
GET /health (Python App)
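The submit and callback endpoints come in pairs, which suggests a job record that is created on submit and closed on callback. The sketch below models that pairing in memory. The `JobStore` class, the `jobId` and `status` field names, and the rule that a job accepts only one callback are assumptions; the PR does not describe the actual request or response schemas.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class JobRecord:
    job_id: str
    kind: str                 # "pdf-parser" or "solver"
    status: str = "queued"    # queued -> completed | failed
    result: Optional[Any] = None


class JobStore:
    """Hypothetical in-memory record of submitted jobs (assumption)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, JobRecord] = {}

    def submit(self, kind: str) -> Dict[str, str]:
        """What a POST /<kind>/submit handler might return: an id, not a result."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = JobRecord(job_id=job_id, kind=kind)
        return {"jobId": job_id, "status": "queued"}

    def callback(self, kind: str, job_id: str, ok: bool, result: Any = None) -> bool:
        """What a POST /<kind>/callback handler might do with a worker report.

        Returns False for unknown ids, mismatched kinds, or repeated callbacks.
        """
        job = self._jobs.get(job_id)
        if job is None or job.kind != kind or job.status != "queued":
            return False
        job.status = "completed" if ok else "failed"
        job.result = result
        return True

    def get(self, job_id: str) -> Optional[JobRecord]:
        return self._jobs.get(job_id)
```

Rejecting a second callback for the same job keeps a retried worker from overwriting a result that has already been stored.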