- Clone data.vlaanderen.be2-generated in the parent directory and use the branch production via
  git clone -b production https://github.com/Informatievlaanderen/data.vlaanderen.be2-generated ../data.vlaanderen.be-generated
- Install the dependencies via
  npm i
- Run the scraper with its default config via
  node bin/cli.js
- Remove duplicate lines and sort them via
  sort -u output.nt > output.unique.nt
- Create a config file called config.json to override the default config of the scraper.
  - shacl-files: This object configures how the scraper should handle SHACL files.
    - enabled: If true, the scraper includes SHACL files. The default is false.
  - log-level: The level used by the scraper's logger. The default is warn.
  - generated-files-repo: The path to the clone of the data.vlaanderen.be2-generated repo. The default is ../data.vlaanderen.be2-generated.
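As an illustration of the options above, here is a minimal sketch that writes a config.json enabling SHACL files while leaving the other two options at their documented defaults. The helper name writeConfig is hypothetical and not part of the scraper, and the location where the scraper expects config.json is an assumption.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Keys and defaults as documented above; only shacl-files.enabled
// differs from the defaults.
const config = {
  'shacl-files': { enabled: true },
  'log-level': 'warn',
  'generated-files-repo': '../data.vlaanderen.be2-generated'
};

// Hypothetical helper: serialise the settings as config.json in the
// given directory and return the path of the written file.
function writeConfig(directory, settings) {
  const file = path.join(directory, 'config.json');
  fs.writeFileSync(file, JSON.stringify(settings, null, 2) + '\n');
  return file;
}
```

Calling writeConfig(process.cwd(), config) from the scraper's root directory and then running node bin/cli.js should apply the settings, assuming the scraper looks for config.json in the working directory.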
The generated Turtle file is large. You can generate subsets of this file via
  npm run subsets
This generates two Turtle files:
- classes-ap.ttl contains the triples that connect classes to the application profiles that use them.
- predicates-ap.ttl contains the triples that connect predicates to the application profiles that use them.
We run the scraper every day via GitLab CI.
You can find the output in this repo.
We use a personal GitHub authentication token to push to this repo.
We use Pieter Heyvaert's name and email when doing the commit.
You can change this at deploy.before_script in .gitlab-ci.yml.
You can find example SPARQL queries in the directory queries.
- all-aps-that-use-dcterms-title.rq returns all application profiles that use dcterms:title.
- properties-used-by-ap-vrachtwagenparkeren.rq returns all properties used by the application profile "Vrachtwagenparkeren".
We do a basic validation of the scraped output in validation/test.js.
It checks whether specific APs and NodeShapes are present.