Backfill old supporting quotes to include spaces at element boundaries #147

@monneyboi

After adding separator=" " to BeautifulSoup.get_text(), new quotes extracted by the LLM will have proper spaces at HTML element boundaries (e.g. "In office 27 March 2014" instead of "In office27 March 2014").

Old quotes in the database still have the merged format, with no space at element boundaries. The frontend highlighter handles both formats via its boundary-space-skipping logic, so highlighting still works, but it would be cleaner to normalize the old quotes.

Approach: For each source with archived HTML, re-extract text with get_text(separator=" "), then for each old quote find the corresponding span in the new text and update the property_references.supporting_quotes column.
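A rough sketch of the "find the corresponding span" step, assuming the old and new text differ only in whitespace. The function name `realign_quote` is hypothetical and not part of the codebase. `new_text` stands in for the output of `get_text(separator=" ")` on the archived HTML. Loading sources and writing back to `property_references.supporting_quotes` are left out.

```python
from typing import Optional


def realign_quote(old_quote: str, new_text: str) -> Optional[str]:
    """Return the substring of new_text that equals old_quote once all
    whitespace is ignored, or None if no such substring exists.

    Hypothetical helper: assumes the only difference between the old and
    new extraction is whitespace at element boundaries.
    """
    # Whitespace-free form of the old quote; this is what we search for.
    key = "".join(old_quote.split())
    if not key:
        return None

    # Whitespace-free copy of new_text, plus a map from each kept
    # character back to its index in the original new_text.
    compact_chars = []
    positions = []
    for index, char in enumerate(new_text):
        if not char.isspace():
            compact_chars.append(char)
            positions.append(index)
    compact = "".join(compact_chars)

    # First occurrence only; a quote that appears more than once in the
    # source would need extra disambiguation.
    start = compact.find(key)
    if start == -1:
        return None
    end = start + len(key) - 1

    # Slice the original text so boundary spaces are preserved exactly.
    return new_text[positions[start]:positions[end] + 1]


new_text = "Born 1970. In office 27 March 2014 to present"
print(realign_quote("In office27 March 2014", new_text))
```

Because whitespace is ignored on both sides, a quote that already has the new format maps to itself, so the backfill could be re-run safely. Quotes that return `None` would be left unchanged and could be logged for manual review.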

Labels: loom (Poliloom core project issues)
