Skip to content

Latest commit

 

History

History
124 lines (87 loc) · 4.11 KB

File metadata and controls

124 lines (87 loc) · 4.11 KB

Changelog

All notable changes to this project will be documented in this file.

[0.1.7] - 2026-05-17

Added

  • Align README import examples with current crawl4md init exports

[0.1.6] - 2026-05-17

Added

  • Prefix extracted image text with Figure: "..." in remove_images fixtures
  • Localize remove_images figure labels from HTML language with en fallback
  • Propagate detected page language through fetch/convert pipeline into all rule apply calls

Refactored

  • Move markdown-converter session group resolution into test suite
  • Move internal helper functions to crawl4md.utils.check_helpers
  • Move Markdown converter session pydantic models out of tests
  • Extract progress printing helper from tests support module
  • Extract Markdown converter session loading and HTML normalization helpers
  • Extract Markdown converter fetcher/session discovery helpers from tests
  • Extract pipeline check cases into helper class without behavior change
  • Auto-derive progress checks from helper method list
  • Introduce shared ProgressChecksTestCase base for pipeline/profile progress-driven checks
  • Inline HTML fetching/normalization into BaseMarkdownFetcher and remove fetch/html.py
  • Rename Markdown fetcher modules/classes to HTML fetcher naming and update imports
  • Move test progress runner from tests/support into crawl4md.utils and update imports
  • Move preprocessing RuleCase/data-provider helpers into crawl4md.utils and update test imports
  • Move language label mapping into crawl4md/i18n/labels module
  • Move frames module into crawl4md.utils and update check helper imports
  • Move check CLI module into crawl4md.commands and update script entry points
  • Move top-level runtime modules into crawl4md.core and update imports/entrypoint

Test

  • Migrate crawl4ai content-selector unit checks to session fixtures
  • Remove direct unittest main guard from Markdown converter test module
  • Replace private _asyncioRunner access with get_running_loop in asyncSetUp

Style

  • Reduce panel title padding in test headers

[0.1.5] - 2026-05-16

Added

  • Add data-driven Markdown converter tests
  • Suppress Crawl4AI output during Markdown conversion
  • Add Boeing 707 Wikipedia converter fixture
  • Add Boeing 707 markdown-fit converter fixture
  • Document Markdown converter fixture conventions
  • Add crawl step duration output
  • Add group filtering to Markdown converter checks
  • Handle missing Markdown converter test groups
  • Add list fixture for Markdown preprocessing
  • Normalize spacing between Markdown list items
  • Add kreuzberg-dev Markdown parser
  • Add preprocessing option to normalize Markdown tables
  • Split whitespace and linebreak normalization rules
  • Add preprocessing option to remove cite links
  • Move crawled output to crawled and store docs separately
  • Add base classes for markdown converters and fetchers

Refactored

  • Rename Markdown fetcher and converter for Crawl4AI

[0.1.4] - 2026-05-03

Fixed

  • Suppress crawl4ai SyntaxWarning under Python 3.14

[0.1.3] - 2026-05-02

Added

  • Add realistic HTTP headers to HtmlFetcher to prevent 403 responses

[0.1.2] - 2026-05-02

Added

  • Update README for PyPI package usage and clarify batch crawler setup

[0.1.1] - 2026-05-02

Added

  • Add uv check command for tests and Ruff linting
  • Export public Python API and expand README with usage and crawl4ai context

Refactored

  • Split fetch_markdown into fetch and convert layers
  • Move markdown preprocessing from CLI into convert pipeline
  • Refactor markdown fetch/convert into classes and add sync APIs

Removed

  • Crawl4AI Logging

[0.1.0] - 2026-05-02

Added

  • Initial release
  • CLI for crawling single pages and sitemaps
  • YAML-based project configuration
  • Deterministic Markdown file output
  • Support for multiple Markdown extraction modes
  • Configurable Markdown preprocessing pipeline
  • Automatic cleanup of common wiki and web artifacts
  • Automatic removal of reference and appendix sections
  • Whitespace and document structure normalization
  • Automatic insertion of missing top-level headings
  • Clear separation of crawling, preprocessing, and file writing
  • Basic test coverage for core Markdown processing