🐶 OhMyScrapper - v0.10.2

OhMyScrapper scrapes texts and URLs, looking for links and job data, and builds a final report with general information about job positions.

Scope

Read texts;
Extract and load URLs;
Scrape the URLs looking for Open Graph (og:) tags and titles;
Export a list of links with relevant information.
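To make the "og: tags and titles" step concrete, here is a minimal sketch of how such values can be pulled out of a page using only the Python standard library. It is an illustration, not OhMyScrapper's actual implementation; the names OgTagParser and extract_page_info are hypothetical.

```python
from html.parser import HTMLParser


class OgTagParser(HTMLParser):
    """Collects <meta property="og:..."> values and the <title> text."""

    def __init__(self):
        super().__init__()
        self.og_tags = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            prop = attrs.get("property") or ""
            # Keep only Open Graph properties that carry a content value.
            if prop.startswith("og:") and attrs.get("content") is not None:
                self.og_tags[prop] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_page_info(html):
    """Return a dict with the page title plus every og: tag found."""
    parser = OgTagParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), **parser.og_tags}
```

Calling extract_page_info on a page's HTML returns a dictionary with a "title" key and one key per og: property, which is roughly the kind of "relevant information" an exported list of links could carry.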

Installation

You can install it directly with pip:

pip install ohmyscrapper

I recommend using uv, so you can just run the commands below and everything is installed:

uv add ohmyscrapper
uv run ohmyscrapper --version

You can also run it as a tool, without adding it to a project, for example:

uvx ohmyscrapper --version

How to use and test (development only)

OhMyScrapper works in 3 stages:

It collects URLs from a text and loads them into a database;
It scrapes (accesses) the collected URLs and reads what is relevant. If it finds new URLs, they are collected as well;
It exports a list of URLs to CSV files.
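The first stage can be pictured with a short sketch that finds URLs in free text. This is only an illustration under assumptions: the regular expression and the function name extract_urls are hypothetical and are not taken from OhMyScrapper's source.

```python
import re

# Matches http(s) URLs up to the next whitespace, quote, or closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text):
    """Return the unique URLs found in text, in order of first appearance."""
    found = []
    for match in URL_PATTERN.findall(text):
        # Drop punctuation that usually belongs to the sentence, not the URL.
        url = match.rstrip(".,;:!?")
        if url and url not in found:
            found.append(url)
    return found
```

For example, extract_urls applied to the contents of a .txt file yields the list of links that would then be stored and scraped in the next stage.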

You can run all 3 stages with a single command:

ohmyscrapper start

Remember to put your text file in the folder /input, with a name that ends in .txt!

You will find the exported files in the folder /output like this:

/output/report.csv
/output/report.csv-preview.html
/output/urls-simplified.csv
/output/urls-simplified.csv-preview.html
/output/urls.csv
/output/urls.csv-preview.html
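To take a quick look at one of the exported CSV files from Python, something like the sketch below may help. It assumes comma-separated files with a header row and UTF-8 encoding, which this README does not confirm; the function names preview_rows and preview_csv are hypothetical.

```python
import csv
from pathlib import Path


def preview_rows(lines, limit=5):
    """Return (column names, first rows) from an iterable of CSV lines."""
    reader = csv.DictReader(lines)
    rows = [row for _, row in zip(range(limit), reader)]
    return list(reader.fieldnames or []), rows


def preview_csv(path, limit=5):
    """Open a CSV file and return its column names and first rows."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return preview_rows(handle, limit)
```

A call such as preview_csv("output/urls.csv") would return the header names and the first few records, without assuming any particular column layout.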

BUT: if you want to go step by step, here it is:

First, load a text file that you would like to search for URLs. It works with any .txt file.

The default folder is /input. Put one or more text files (ending in .txt) in this folder and use the command load:

ohmyscrapper load

Or, if your file is in a different folder, use the argument -input like this:

ohmyscrapper load -input=my-text-file.txt

With the same argument, you can also add a URL directly to the database, like this:

ohmyscrapper load -input=https://cesarcardoso.cc/

That appends the URL to the database as the last one to be scraped.

The load command creates the database if it doesn't exist and stores every URL that OhMyScrapper finds. After that, scrape the URLs with the command scrap-urls:

ohmyscrapper scrap-urls --recursive --ignore-type

By default (without --ignore-type), scrap-urls scrapes only the LinkedIn URLs we are interested in. For now they are:

linkedin_post: https://%.linkedin.com/posts/%
linkedin_redirect: https://lnkd.in/%
linkedin_job: https://%.linkedin.com/jobs/view/%
linkedin_feed: https://%.linkedin.com/feed/%
linkedin_company: https://%.linkedin.com/company/%
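The patterns above can be read as simple wildcard matches. The sketch below assumes that % behaves like the SQL LIKE wildcard (any run of characters, including none); that is an assumption, and the helper names like_to_regex and classify_url are hypothetical rather than part of OhMyScrapper.

```python
import re

# URL types and patterns copied from the list above.
URL_TYPES = {
    "linkedin_post": "https://%.linkedin.com/posts/%",
    "linkedin_redirect": "https://lnkd.in/%",
    "linkedin_job": "https://%.linkedin.com/jobs/view/%",
    "linkedin_feed": "https://%.linkedin.com/feed/%",
    "linkedin_company": "https://%.linkedin.com/company/%",
}


def like_to_regex(pattern):
    """Translate a %-wildcard pattern into an anchored regular expression."""
    parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile("^" + ".*".join(parts) + "$")


def classify_url(url):
    """Return the first matching URL type name, or None if nothing matches."""
    for name, pattern in URL_TYPES.items():
        if like_to_regex(pattern).match(url):
            return name
    return None
```

Under this reading, a URL such as https://www.linkedin.com/jobs/view/123 is classified as linkedin_job, while a URL that matches none of the patterns returns None and would only be scraped generically.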

To scrape every other kind of URL generically, add the argument --ignore-type:

ohmyscrapper scrap-urls --ignore-type

To scrape recursively, so that newly found URLs are scraped too, add the argument --recursive:

ohmyscrapper scrap-urls --recursive
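Recursive scraping can be thought of as a breadth-first walk: each scraped page may yield new URLs, which are queued in turn. The sketch below only illustrates that idea and is not OhMyScrapper's code; crawl, fetch_links, and max_urls are hypothetical names, and fetch_links stands in for whatever function actually downloads a page and returns its links.

```python
from collections import deque


def crawl(start_urls, fetch_links, max_urls=100):
    """Visit URLs breadth-first and return them in visiting order.

    fetch_links(url) must return an iterable of URLs found on that page.
    max_urls caps the number of visits so the walk always terminates.
    """
    queue = deque(dict.fromkeys(start_urls))  # de-duplicated, order kept
    seen = set(queue)
    visited = []
    while queue and len(visited) < max_urls:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Capping the number of visits is one simple way to keep a recursive run bounded, which matters when the number of requests a site tolerates is unknown.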

!!! important: we do not yet know what blocks sites may impose for excessive requests

Finally, export the results with the commands:

ohmyscrapper export
ohmyscrapper export --file=output/urls-simplified.csv --simplify
ohmyscrapper report

To monitor recent scraping jobs locally, start the dashboard:

ohmyscrapper dashboard

Then open http://127.0.0.1:8765. Use --host and --port to bind a different local address.

That's the basic usage! You can learn more from the help:

ohmyscrapper --help

License

This package is distributed under the MIT license.
