Boundary Bot doesn't scrape legislation links or titles for ECOs with draft legislation #2318

Boundary Bot doesn't scrape legislation links or titles for ECOs until their legislation has been made and the LGBCE site has been updated to show it.

The relevant code is:
https://github.com/DemocracyClub/EveryElection/blob/77adf175a4b80e0a5841764e35319049e6eda504/every_election/apps/organisations/boundaries/boundary_bot/spider.py#L90-L134

This method in the scrapy spider gets the ECO link and title from containers like this one on the LGBCE review detail pages:

Detail page of a completed review with made legislation:
![image](https://github.com/user-attachments/assets/f761573c-b61f-498e-bca8-1470ba6b479e)

However, reviews whose legislation has been laid but not yet made have a container like this:

Detail page of draft review with laid legislation:
![image](https://github.com/user-attachments/assets/a8e7f257-d8ba-4eac-b0aa-552d752dff3a)


This container doesn't have the legislation title in it, so `get_eco_title_and_link` won't see it as a relevant review and won't return it. 

I'm currently processing ECOs still in 'draft' and I have to manually enter the title and legislation links. If we're going to continue to process draft ECOs, then we might want to adapt this method to handle cases like the above.

	def get_eco_title_and_link(self, response, latest_event):
	    def get_link_title(selector):
	        return selector.xpath(
	            '*/div[@class="link-title"]//text()'
	        ).extract_first()

	    def get_link(selector):
	        return selector.xpath("*/a/@href").extract_first()

	    def is_relevant_review(title):
	        return (
	            "(electoral changes) order" in title.lower()
	            or "(structural changes) order" in title.lower()
	            or "greater london authority"  # edge case to handle https://www.lgbce.org.uk/all-reviews/greater-london-authority
	            in title.lower()
	        )

	    links = [
	        (get_link_title(selector), get_link(selector))
	        for selector in response.xpath(
	            '//div[@class="latest-information"]//div[@class="link-name-and-view-container"]'
	        )
	    ]

	    made_ecos = [
	        (title, link) for title, link in links if is_relevant_review(title)
	    ]

	    if latest_event == "Effective date" and len(made_ecos) == 1:
	        # This catches draft links and made links.
	        # Sometimes they put a draft link in where the made link should go.
	        # So, if the change is 'effective', and we only have a draft link,
	        # use it
	        return made_ecos[0]

	    made_ecos = [
	        (title, link)
	        for title, link in made_ecos
	        if is_relevant_review(title) and "ukdsi" not in link
	    ]

	    if len(made_ecos) == 1:
	        return made_ecos[0]

	    return None, None
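
One possible direction, as a rough sketch rather than a tested patch: keep the existing selection rules, but also accept a container whose link looks like draft legislation even when no usable title was scraped. The sketch works on plain `(title, link)` tuples so it can run without scrapy. The names `pick_eco` and `is_draft_link` are hypothetical, and it assumes that the laid-legislation container still exposes a link containing `ukdsi` (the same marker the current code uses to filter out drafts). I haven't checked that assumption against the live pages.

```python
def is_relevant_review(title):
    # Same title test as the spider, but tolerant of a missing title.
    title = (title or "").lower()
    return (
        "(electoral changes) order" in title
        or "(structural changes) order" in title
        or "greater london authority" in title
    )


def is_draft_link(link):
    # Assumption: draft legislation links contain "ukdsi", as implied by
    # the existing '"ukdsi" not in link' filter.
    return "ukdsi" in (link or "")


def pick_eco(links, latest_event):
    """Hypothetical replacement for the selection logic.

    `links` is a list of (title, link) tuples, as built in the spider.
    Returns one (title, link) pair, or (None, None) if nothing fits.
    """
    candidates = [
        (title, link)
        for title, link in links
        if is_relevant_review(title) or is_draft_link(link)
    ]

    # Existing behaviour: an 'effective' review with a single candidate
    # is accepted whether the link is draft or made.
    if latest_event == "Effective date" and len(candidates) == 1:
        return candidates[0]

    # Existing behaviour: prefer a single made (non-draft) ECO.
    made = [(t, l) for t, l in candidates if not is_draft_link(l)]
    if len(made) == 1:
        return made[0]

    # New fallback: no made ECO, exactly one draft link.
    # The title may be None here and would still need filling in.
    drafts = [(t, l) for t, l in candidates if is_draft_link(l)]
    if not made and len(drafts) == 1:
        return drafts[0]

    return None, None
```

With this shape, a page that only has a draft link would return `(None, <draft link>)` instead of `(None, None)`, so the link at least would no longer need entering by hand. The title would have to come from somewhere else on the page, or still be entered manually.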
