feat: Scoring V2 and NewsAPI integration with budget enforcement #4
- Create comprehensive design doc (docs/design/scoring_v2_macro_first.md)
- Archive Task 22 (design-only task)
- Create Task 23 for implementation
- Document design decisions and acceptance criteria
…ement
Scoring V2 (Task 23):
- Add macro-first scoring algorithm (heuristics_v2.py)
- Source tier scoring (T1: 20pts, T2: 15pts, T3: 10pts)
- Recency decay with configurable half-life
- Title keyword boosting for high-impact topics
- Configurable via providers.scoring: heuristic_v2
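The three scoring signals listed above could be combined roughly as follows. This is a minimal sketch, not the code in heuristics_v2.py: only the tier point values come from this commit message, while the keyword table, the 12-hour default half-life, the function names, and the choice to multiply the summed points by the decay factor are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Tier points come from the commit message above.
TIER_POINTS = {"T1": 20.0, "T2": 15.0, "T3": 10.0}
# Hypothetical keyword boosts; the real keyword list is not shown in this PR.
KEYWORD_BOOSTS = {"inflation": 5.0, "rate hike": 5.0}


def recency_factor(published_at, now, half_life_hours=12.0):
    """Exponential decay: the factor halves every `half_life_hours`."""
    age_hours = max((now - published_at).total_seconds() / 3600.0, 0.0)
    return 0.5 ** (age_hours / half_life_hours)


def score(title, tier, published_at, now, half_life_hours=12.0):
    """Assumed combination: (tier points + keyword boosts) * recency factor."""
    base = TIER_POINTS.get(tier, 0.0)
    title_lower = title.lower()
    boost = sum(pts for kw, pts in KEYWORD_BOOSTS.items() if kw in title_lower)
    return (base + boost) * recency_factor(published_at, now, half_life_hours)


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = score("Fed signals rate hike", "T1", now, now)
stale = score("Fed signals rate hike", "T1", now - timedelta(hours=12), now)
```

With these assumed values, an article exactly one half-life old scores half as much as the same article published just now.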
NewsAPI Integration (Task 24):
- Add TheNewsAPI.com as alternative ingestion source
- Multi-key rotation (round-robin/failover strategies)
- Smart pagination with sliding window duplicate detection
- Budget enforcement: stops when usage_remaining <= threshold
- Fix: properly convert usage headers from string to int
- Configurable via apis/newsapi_{stream}.txt
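The budget check and the string-to-int fix mentioned above can be illustrated with a small sketch. The header name `X-Usage-Remaining` and both function names are hypothetical; `min_remaining_budget` is the threshold name used in the PR summary. Treating an unknown usage value as "keep going" is also an assumption.

```python
def parse_usage_remaining(headers, name="X-Usage-Remaining"):
    """Return the remaining quota as an int, or None if missing or malformed.

    HTTP header values arrive as strings, so they must be converted before
    any numeric comparison.
    """
    raw = headers.get(name)
    if raw is None:
        return None
    try:
        return int(str(raw).strip())
    except ValueError:
        return None


def should_stop(usage_remaining, min_remaining_budget):
    """Stop fetching once the remaining quota is at or below the threshold."""
    if usage_remaining is None:
        return False  # assumption: unknown usage does not block fetching
    return usage_remaining <= min_remaining_budget


remaining = parse_usage_remaining({"X-Usage-Remaining": "42"})
```

Without the conversion, comparing the raw string `"42"` against an integer threshold raises a `TypeError` in Python 3, which is the kind of failure the fix guards against.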
Analysis tooling:
- Add comparison eval framework for A/B testing scoring algorithms
- Generate reports comparing v1 vs v2 output rankings
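One simple way to compare two output rankings is top-k overlap plus per-item rank shifts. The sketch below is illustrative only; the metrics actually used by the comparison framework are not described in this PR, and both function names are hypothetical.

```python
def top_k_overlap(ranking_a, ranking_b, k):
    """Fraction of the top-k items that both rankings share."""
    if k <= 0:
        return 0.0
    return len(set(ranking_a[:k]) & set(ranking_b[:k])) / k


def rank_shifts(ranking_a, ranking_b):
    """Map each shared item to (position in A - position in B).

    A positive value means the item ranks higher (closer to the top) in B.
    """
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    return {item: i - pos_b[item] for i, item in enumerate(ranking_a) if item in pos_b}


v1 = ["a", "b", "c", "d"]
v2 = ["c", "a", "d", "b"]
overlap = top_k_overlap(v1, v2, k=2)
shifts = rank_shifts(v1, v2)
```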
Tests: 559 passed, 5 skipped
@copilot check and verify this PR
@satriapamudji I've opened a new pull request, #5, to work on those changes. Once the pull request is ready, I'll request review from you.
Pull request overview
This PR implements two major features for the Argus news aggregation system:
Scoring V2 (Macro-First): A new scoring algorithm that prioritizes macro-economic news over clickbait content using domain-based tier weighting, recency decay, keyword boosting for high-impact topics, and penalties for low-quality content.
NewsAPI Integration: Adds TheNewsAPI.com as an alternative ingestion source to RSS feeds, featuring multi-key rotation, smart pagination with sliding window duplicate detection, and budget enforcement to prevent API quota exhaustion.
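Round-robin key rotation and sliding-window duplicate detection could look roughly like this. It is a sketch under stated assumptions: the class names, the use of an article ID as the dedup key, and the window size are all hypothetical and not taken from news_api_client.py.

```python
from collections import deque
from itertools import cycle


class RoundRobinKeys:
    """Hand out API keys in a fixed rotation."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("at least one API key is required")
        self._cycle = cycle(list(keys))

    def next_key(self):
        return next(self._cycle)


class SlidingWindowDedup:
    """Remember the last `window` article IDs and flag repeats among them."""

    def __init__(self, window):
        self._window = window
        self._order = deque()
        self._seen = set()

    def is_duplicate(self, article_id):
        if article_id in self._seen:
            return True
        self._order.append(article_id)
        self._seen.add(article_id)
        # Evict the oldest ID once the window is full.
        if len(self._order) > self._window:
            self._seen.discard(self._order.popleft())
        return False


keys = RoundRobinKeys(["k1", "k2"])
picked = [keys.next_key() for _ in range(3)]

dedup = SlidingWindowDedup(window=2)
flags = [dedup.is_duplicate(x) for x in ["a", "a", "b", "c", "a"]]
```

A bounded window keeps memory constant across long pagination runs, at the cost of missing duplicates that reappear after they have been evicted (the final `"a"` above).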
Reviewed changes
Copilot reviewed 35 out of 37 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_scoring_v2.py | Comprehensive test suite (26 tests) for v2 scoring algorithm |
| tests/test_ingestion_api_newsapi.py | Test suite (23 tests) for NewsAPI ingestion provider |
| tests/test_pipeline_registry.py | Updated registry tests for new providers |
| tests/test_config_providers.py | Updated config tests for v2 default |
| src/argus/scoring/heuristics_v2.py | New v2 scoring implementation with macro-first prioritization |
| src/argus/scoring/types.py | Extended ScoringCandidate with feed_url and author fields |
| src/argus/scoring/worker.py | Updated to support v2 scoring |
| src/argus/pipeline/registry.py | Registered new providers |
| src/argus/pipeline/providers/* | New NewsAPI provider implementations |
| src/argus/config.py | Added NewsApiConfig and changed default scorer to v2 |
| config.yaml | Updated provider defaults |
| docs/integrations/newsapi.md | Comprehensive NewsAPI documentation |
| docs/design/scoring_v2_macro_first.md | Detailed design document for v2 |
| tasks/03_archive/* | Task documentation files |
    parse_iso_datetime,
)
from argus.pipeline.providers.ingestion_api_newsapi import NewsApiIngestionProvider
from argus.pipeline.providers.news_api_client import NewsApiResponse, NewsArticle
Import of 'NewsApiResponse' is not used.
# Unicode characters in titles/snippets. Ensure we never crash while printing.
try:
    sys.stdout.reconfigure(errors="backslashreplace")
except Exception:
'except' clause does nothing but pass and there is no explanatory comment.
- except Exception:
+ except Exception:
+     # Best-effort: if stdout cannot be reconfigured (e.g., on unsupported platforms),
+     # continue without crashing; this only affects how Unicode errors are displayed.
Summary
This PR implements two major features:
Scoring V2 (Task 23)
- Configurable via providers.scoring: heuristic_v2
NewsAPI Integration (Task 24)
- Budget enforcement: stops when usage_remaining <= min_remaining_budget
- Configurable via apis/newsapi_{stream}.txt
- providers.ingestion: api_newsapi
Analysis Tooling
Key Files
- src/argus/scoring/heuristics_v2.py, src/argus/pipeline/providers/scoring_heuristic_v2.py
- src/argus/pipeline/providers/news_api_client.py, ingestion_api_newsapi.py
- src/argus/config.py (NewsApiConfig), apis/newsapi_us_markets.txt
- docs/integrations/newsapi.md, tasks/03_archive/task_documentation/24_NewsAPIIntegration.md
- tests/test_scoring_v2.py (26 tests), tests/test_ingestion_api_newsapi.py (23 tests)
Testing
Configuration