After adding separator=" " to BeautifulSoup.get_text(), new quotes extracted by the LLM will have proper spaces at HTML element boundaries (e.g. "In office 27 March 2014" instead of "In office27 March 2014").
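A minimal sketch of the difference, assuming an HTML fragment with no whitespace between the two elements (the markup here is illustrative, not taken from an archived page):

```python
from bs4 import BeautifulSoup

# Illustrative fragment: two adjacent elements with no whitespace between them.
html = "<dl><dt>In office</dt><dd>27 March 2014</dd></dl>"
soup = BeautifulSoup(html, "html.parser")

# Default: text nodes are concatenated directly, merging across the boundary.
merged = soup.get_text()

# With a separator: a space is placed between adjacent text nodes.
spaced = soup.get_text(separator=" ")

print(repr(merged))  # 'In office27 March 2014'
print(repr(spaced))  # 'In office 27 March 2014'
```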
Old quotes in the database still have the merged format. The frontend highlighter handles both formats via its boundary-space-skipping logic, so highlighting still works, but normalizing the old quotes would be cleaner.
Approach: For each source with archived HTML, re-extract text with get_text(separator=" "), then for each old quote find the corresponding span in the new text and update the property_references.supporting_quotes column.
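One possible way to find the corresponding span is to compare the old quote and the new text with all whitespace removed, then map the match back to positions in the new text. The sketch below assumes this whitespace-insensitive matching is acceptable; `realign_quote` and its helper are hypothetical names, not existing project code, and reading or writing `property_references.supporting_quotes` is left out.

```python
from typing import List, Optional, Tuple


def _compact_with_map(text: str) -> Tuple[str, List[int]]:
    """Drop whitespace; return compact text and each kept char's original index."""
    chars: List[str] = []
    index_map: List[int] = []
    for i, ch in enumerate(text):
        if not ch.isspace():
            chars.append(ch)
            index_map.append(i)
    return "".join(chars), index_map


def realign_quote(old_quote: str, new_text: str) -> Optional[str]:
    """Return the span of new_text matching old_quote, ignoring whitespace.

    Hypothetical helper: returns None when the quote cannot be located,
    so the caller can leave the stored quote unchanged.
    """
    compact_text, index_map = _compact_with_map(new_text)
    compact_quote, _ = _compact_with_map(old_quote)
    if not compact_quote:
        return None
    pos = compact_text.find(compact_quote)  # first match only
    if pos == -1:
        return None
    start = index_map[pos]
    end = index_map[pos + len(compact_quote) - 1] + 1
    return new_text[start:end]


new_text = "Born 1970 In office 27 March 2014 Preceded by"
print(realign_quote("In office27 March 2014", new_text))
# In office 27 March 2014
```

Only the first match is used, so a quote that occurs more than once in a source may need extra disambiguation before the stored value is overwritten.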