
Boundary Bot doesn't scrape legislation links or titles for ECOs with draft legislation #2318

Description

@awdem

Boundary Bot doesn't scrape legislation links or titles for ECOs before their legislation has been made (and the LGBCE site is updated).

The relevant code is:

def get_eco_title_and_link(self, response, latest_event):
    def get_link_title(selector):
        return selector.xpath(
            '*/div[@class="link-title"]//text()'
        ).extract_first()

    def get_link(selector):
        return selector.xpath("*/a/@href").extract_first()

    def is_relevant_review(title):
        return (
            "(electoral changes) order" in title.lower()
            or "(structural changes) order" in title.lower()
            or "greater london authority"  # edge case to handle https://www.lgbce.org.uk/all-reviews/greater-london-authority
            in title.lower()
        )

    links = [
        (get_link_title(selector), get_link(selector))
        for selector in response.xpath(
            '//div[@class="latest-information"]//div[@class="link-name-and-view-container"]'
        )
    ]

    made_ecos = [
        (title, link) for title, link in links if is_relevant_review(title)
    ]
    if latest_event == "Effective date" and len(made_ecos) == 1:
        # This catches draft links and made links.
        # Sometimes they put a draft link in where the made link should go.
        # So, if the change is 'effective', and we only have a draft link,
        # use it
        return made_ecos[0]

    made_ecos = [
        (title, link)
        for title, link in made_ecos
        if is_relevant_review(title) and "ukdsi" not in link
    ]
    if len(made_ecos) == 1:
        return made_ecos[0]

    return None, None

This method in the Scrapy spider gets the ECO link and title from containers like this one on the LGBCE review detail pages:

[Image: detail page of a completed review with made legislation]

However, reviews whose legislation has been laid but not yet made have a container like this:

[Image: detail page of a draft review with laid legislation]

This container doesn't have the legislation title in it, so is_relevant_review doesn't match it and get_eco_title_and_link won't return it.
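
To illustrate the failure mode, here is a minimal standalone sketch of the title filter applied to two (title, link) pairs. The titles and URLs are hypothetical stand-ins, since the actual text in the laid-legislation container isn't reproduced in this issue.

```python
def is_relevant_review(title):
    # Same title test as in get_eco_title_and_link.
    lowered = title.lower()
    return (
        "(electoral changes) order" in lowered
        or "(structural changes) order" in lowered
        or "greater london authority" in lowered
    )


# Hypothetical examples; real titles and URLs on the LGBCE site may differ.
links = [
    ("The Example (Electoral Changes) Order 2024", "https://example.org/uksi/made-order"),
    ("Draft order", "https://example.org/ukdsi/draft-order"),
]

# Only the pair whose title names the order survives the filter.
matched = [(title, link) for title, link in links if is_relevant_review(title)]
```

The second pair is dropped purely because of its title, even though its link points at draft legislation.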

I'm currently processing ECOs still in 'draft' and I have to manually enter the title and legislation links. If we're going to continue to process draft ECOs, then we might want to adapt this method to handle cases like the above.
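
One possible adaptation, sketched below as a standalone function rather than a patch to the spider, is to fall back on the link itself when no title matches. It assumes the laid-legislation container links to a URL containing "ukdsi", as the existing filter already implies for draft links. The name pick_eco and the example values are hypothetical.

```python
def pick_eco(links, latest_event):
    """links is a list of (title, link) pairs; either element may be None."""

    def is_relevant_review(title):
        lowered = title.lower()
        return (
            "(electoral changes) order" in lowered
            or "(structural changes) order" in lowered
            or "greater london authority" in lowered
        )

    def is_draft_link(link):
        # Assumption: draft legislation links contain "ukdsi".
        return link is not None and "ukdsi" in link

    # Existing behaviour: select by title first.
    titled = [(t, l) for t, l in links if t and is_relevant_review(t)]
    if latest_event == "Effective date" and len(titled) == 1:
        return titled[0]

    made = [(t, l) for t, l in titled if not is_draft_link(l)]
    if len(made) == 1:
        return made[0]

    # Hypothetical fallback: accept a single draft link even when its
    # container title does not name the order.
    drafts = [(t, l) for t, l in links if is_draft_link(l)]
    if len(drafts) == 1:
        return drafts[0]

    return None, None
```

Note that this changes behaviour for draft links outside the "Effective date" case, and the title returned by the fallback is whatever text the container holds, so the real legislation title would still need to come from somewhere else.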
