This repository was archived by the owner on Jul 9, 2022. It is now read-only.

Lockbit scraper fixed (now uses playwright) #74 #89

Open
biligonzales wants to merge 2 commits into captainGeech42:main from biligonzales:issues/74

Conversation

@biligonzales

Contributor

Describe the changes

Lockbit 2.0 now uses a DDoS protection mechanism, so a plain HTTP GET request no longer works.

As a workaround, I have switched to Microsoft's Playwright library, which drives a real browser to make the request.

Summary of the changes:

  1. lockbit.py: replaced requests with playwright
  2. requirements.txt: added playwright
  3. Dockerfile: added playwright chromium support as well as required libraries.
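For readers unfamiliar with Playwright, the retrieval change can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `build_launch_options`, `fetch_page_html`, the proxy URL, and the timeout are all hypothetical names and values, and it assumes the `playwright` package and its Chromium build are installed.

```python
from typing import Optional


def build_launch_options(proxy_url: Optional[str] = None) -> dict:
    """Build keyword arguments for chromium.launch() (hypothetical helper)."""
    options = {"headless": True}
    if proxy_url:
        # Playwright accepts a proxy dict with a "server" key.
        options["proxy"] = {"server": proxy_url}
    return options


def fetch_page_html(url: str,
                    proxy_url: Optional[str] = None,
                    timeout_ms: int = 60_000) -> str:
    """Load a page in headless Chromium and return the rendered HTML."""
    # Imported lazily so this module can be loaded without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**build_launch_options(proxy_url))
        try:
            page = browser.new_page()
            # Wait until network activity settles so any challenge page
            # has a chance to redirect to the real content.
            page.goto(url, timeout=timeout_ms, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()
```

The returned string can then be handed to the existing parsing code in place of the body that `requests` used to return. Whether a proxy is needed, and at what address, depends on the deployment and is only assumed here.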

I have also upgraded the base image at the top of the Dockerfile from python3.9-buster to python3.10-bullseye.

Related issue(s)

Fixes issue #74.

Note that the scraping engine for Lockbit has been left untouched, as it still works correctly. Only the web page retrieval method has been changed.
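To illustrate why the parsing side can stay as it is, here is a small sketch of the separation between retrieval and parsing. Everything in it is hypothetical: the `post-title` markup, `parse_victims`, `scrape`, and `fake_fetch` are stand-ins for illustration and do not reflect the real site HTML or the project's scraper.

```python
import re
from typing import Callable, List

# Hypothetical markup; the real leak-site HTML and the project's parser differ.
_TITLE_RE = re.compile(r'<div class="post-title">(.*?)</div>', re.S)


def parse_victims(html: str) -> List[str]:
    """Parsing step: depends only on the HTML string, not on how it was fetched."""
    return [title.strip() for title in _TITLE_RE.findall(html)]


def scrape(url: str, fetch: Callable[[str], str]) -> List[str]:
    """Retrieval is injected, so swapping requests for a browser touches one argument."""
    return parse_victims(fetch(url))


def fake_fetch(url: str) -> str:
    """Stand-in fetcher returning canned HTML, used here instead of a real request."""
    return (
        '<div class="post-title"> example.com </div>'
        '<div class="post-title">example.org</div>'
    )


if __name__ == "__main__":
    print(scrape("http://example.invalid", fake_fetch))
```

Any callable that takes a URL and returns HTML can be passed as `fetch`, which is the property that lets the retrieval method change without altering the parser.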

How was it tested?

  • docker-compose build app
  • docker-compose up --abort-on-container-exit
  • Checked that Lockbit entries have been inserted into the database

@captainGeech42

Owner

Hey @biligonzales, thanks for working on this.

I've got a large refactor/fix collection from an anonymous contributor who maintains a private fork, and I'll be merging it in this weekend (just waiting to hear back from them on something). I believe it covers this and your other PR.

Once I get that merged in and assess what is still an issue, I will let you know here.

Regardless, I appreciate the PRs, thank you! Will be in touch.

@ocbrollingpaper

ocbrollingpaper commented Apr 14, 2022


@captainGeech42 is this coming or nah? Asking because I have fully refactored the scrapers and the rest of the code to run in real time, without the need for cron jobs.

EDIT:
I also have a Go version.

@captainGeech42

captainGeech42 commented Apr 21, 2022

Owner

@ocbrollingpaper I guess not (I was waiting to receive it from the third party, but it never came). Feel free to PR those. Thank you!
