Boundary Bot doesn't scrape legislation links or titles for ECOs until their legislation has been made and the LGBCE site has been updated.
The relevant code is:
    def get_eco_title_and_link(self, response, latest_event):
        def get_link_title(selector):
            return selector.xpath(
                '*/div[@class="link-title"]//text()'
            ).extract_first()

        def get_link(selector):
            return selector.xpath("*/a/@href").extract_first()

        def is_relevant_review(title):
            return (
                "(electoral changes) order" in title.lower()
                or "(structural changes) order" in title.lower()
                or "greater london authority"  # edge case to handle https://www.lgbce.org.uk/all-reviews/greater-london-authority
                in title.lower()
            )

        links = [
            (get_link_title(selector), get_link(selector))
            for selector in response.xpath(
                '//div[@class="latest-information"]//div[@class="link-name-and-view-container"]'
            )
        ]

        made_ecos = [
            (title, link) for title, link in links if is_relevant_review(title)
        ]

        if latest_event == "Effective date" and len(made_ecos) == 1:
            # This catches draft links and made links.
            # Sometimes they put a draft link in where the made link should go.
            # So, if the change is 'effective', and we only have a draft link,
            # use it
            return made_ecos[0]

        made_ecos = [
            (title, link)
            for title, link in made_ecos
            if is_relevant_review(title) and "ukdsi" not in link
        ]

        if len(made_ecos) == 1:
            return made_ecos[0]

        return None, None

This method in the Scrapy spider gets the ECO link and title from containers like this one on the LGBCE review detail pages:
[Image: detail page of a completed review with made legislation]

However, reviews whose legislation has been laid but not yet made have a container like this:
[Image: detail page of a draft review with laid legislation]

This container doesn't have the legislation title in it, so is_relevant_review finds no match and get_eco_title_and_link returns None, None instead of the link.
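
To illustrate the rejection, here is a minimal standalone sketch of the title check. The substring tests mirror is_relevant_review from the spider; the guard against a missing title, the sample titles, and the example.org URLs are assumptions invented for the illustration, since the exact text shown in a laid container isn't reproduced here.

```python
def is_relevant_review(title):
    # Same substring tests as the spider. The `title or ""` guard is an
    # addition for this sketch so that a missing title does not raise.
    text = (title or "").lower()
    return (
        "(electoral changes) order" in text
        or "(structural changes) order" in text
        or "greater london authority" in text
    )


# Hypothetical (title, link) pairs; neither is taken from a real LGBCE page.
made_pair = (
    "The Exampleshire (Electoral Changes) Order 2024",
    "https://example.org/uksi/made-order",
)
laid_pair = (
    "Draft order",  # assumed link text that lacks the legislation title
    "https://example.org/ukdsi/draft-order",
)

print(is_relevant_review(made_pair[0]))  # the made title passes the filter
print(is_relevant_review(laid_pair[0]))  # the laid container's text does not
```

Because the filter only looks at the title, the draft link is discarded before the "ukdsi" check is ever reached.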
I'm currently processing ECOs still in 'draft' and I have to manually enter the title and legislation links. If we're going to continue to process draft ECOs, then we might want to adapt this method to handle cases like the above.
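
One possible adaptation, sketched below as a standalone function rather than a patch to the spider, is to fall back to a single draft link when no made ECO is found. This is only a sketch under assumptions: choose_eco is a hypothetical name, the fallback identifies draft links by "ukdsi" in the URL (the same marker the existing code uses to exclude drafts), and the title it returns is whatever text the container holds, which may not be the legislation title. The sample data uses made-up example.org URLs.

```python
def is_relevant_review(title):
    # Same substring tests as the spider, with a guard for a missing title.
    text = (title or "").lower()
    return (
        "(electoral changes) order" in text
        or "(structural changes) order" in text
        or "greater london authority" in text
    )


def choose_eco(links, latest_event):
    """Pick one (title, link) pair from the pairs scraped off a detail page.

    `links` is a list of (title, link) tuples like the one built in
    get_eco_title_and_link. Hypothetical helper, not part of the spider.
    """
    relevant = [(t, l) for t, l in links if is_relevant_review(t)]

    # Existing behaviour: for an effective review, accept the single
    # relevant link even if it turns out to be a draft link.
    if latest_event == "Effective date" and len(relevant) == 1:
        return relevant[0]

    # Existing behaviour: otherwise accept a single non-draft link.
    made = [(t, l) for t, l in relevant if l and "ukdsi" not in l]
    if len(made) == 1:
        return made[0]

    # Assumed fallback: no made ECO was found, so accept a single draft
    # link regardless of its title. The title would still need checking.
    drafts = [(t, l) for t, l in links if l and "ukdsi" in l]
    if len(drafts) == 1:
        return drafts[0]

    return None, None


# Hypothetical page with only a laid (draft) link.
laid_only = [("Draft order", "https://example.org/ukdsi/draft-order")]
print(choose_eco(laid_only, "Draft order laid"))
```

With this ordering a made ECO still takes precedence, and the draft link is used only when nothing else matches. Whether the returned title is good enough, or should be fetched from the legislation page instead, is left open.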
Source of get_eco_title_and_link: EveryElection/every_election/apps/organisations/boundaries/boundary_bot/spider.py, lines 90 to 134 in 77adf17