Data.Vlaanderen Turtle scraper

Install

Clone data.vlaanderen.be2-generated in the parent directory and use the branch production via

git clone -b production https://github.com/Informatievlaanderen/data.vlaanderen.be2-generated ../data.vlaanderen.be2-generated

Install the dependencies via
```
npm i
```

Usage

Run the scraper with its default config via
```
node bin/cli.js
```
Remove duplicate lines and sort them via
```
sort -u output.nt > output.unique.nt
```
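If `sort` is not available (for example on Windows without a Unix shell), the same step can be approximated in Node.js. This is a minimal sketch and not part of the scraper: the helper name `uniqueSortedLines` and the sample triples are hypothetical, empty lines are dropped, and the ordering is by UTF-16 code unit, which can differ from the locale-dependent order of `sort`.

```javascript
// Hypothetical helper, not part of the scraper: returns the unique,
// sorted, non-empty lines of an N-Triples text.
function uniqueSortedLines(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  // A Set drops duplicates; sort() orders by UTF-16 code unit.
  return [...new Set(lines)].sort();
}

// Made-up sample triples for illustration only.
const sample = [
  '<http://example.org/b> <http://example.org/p> "2" .',
  '<http://example.org/a> <http://example.org/p> "1" .',
  '<http://example.org/b> <http://example.org/p> "2" .',
].join('\n') + '\n';

const uniqueLines = uniqueSortedLines(sample);
console.log(uniqueLines.join('\n'));
```

To apply this to the scraper output, read `output.nt` with `fs.readFileSync` and write the joined result with `fs.writeFileSync`.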
Create a config file called config.json to override the default config of the scraper.

Config file

shacl-files: This object configures how the scraper should handle SHACL files.
- enabled: If true, the scraper includes SHACL files. The default is false.
log-level: The level used by the scraper's logger. The default is warn.
generated-files-repo: The path to the clone of the data.vlaanderen.be2-generated repo. The default is ../data.vlaanderen.be2-generated.
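
As an illustration, the options above could be combined as follows. This is a sketch under assumptions: it assumes `enabled` is nested inside the `shacl-files` object as the list suggests, it only changes `shacl-files.enabled` and keeps the documented defaults for the other two options, and generating the JSON with Node.js is purely for illustration (writing the file by hand works equally well).

```javascript
// Example config using the option names documented above.
const config = {
  'shacl-files': {
    enabled: true, // include SHACL files (the default is false)
  },
  'log-level': 'warn', // the documented default
  'generated-files-repo': '../data.vlaanderen.be2-generated', // the documented default
};

// Print the JSON so it can be saved as config.json.
const configJson = JSON.stringify(config, null, 2);
console.log(configJson);
```

Save the printed JSON as `config.json`.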

Generate subsets

The generated Turtle file is large. You can generate subsets of this file via

npm run subsets

This generates two Turtle files:

classes-ap.ttl contains the triples that connect classes to the application profiles that use them.
predicates-ap.ttl contains the triples that connect predicates to the application profiles that use them.

Deployment

We run the scraper every day via GitLab CI. You can find the output in this repo. We use a personal GitHub authentication token to push to this repo. We use Pieter Heyvaert's name and email for the commits. You can change this at deploy.before_script in .gitlab-ci.yml.

Example queries

You can find example SPARQL queries in the directory queries.

all-aps-that-use-dcterms-title.rq returns all application profiles that use dcterms:title.
properties-used-by-ap-vrachtwagenparkeren.rq returns all properties used by the application profile "Vrachtwagenparkeren".

Validation

We do a basic validation of the scraped output in validation/test.js. It checks whether specific APs and NodeShapes are present.
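
The kind of presence check described above can be sketched as follows. This is not the code in validation/test.js: the helper `hasSubject`, the sample triples, and the example.org IRIs are all hypothetical, and the sketch assumes N-Triples output with one triple per line.

```javascript
// Hypothetical helper: true if any N-Triples line has the given IRI as its subject.
function hasSubject(ntriples, iri) {
  const prefix = `<${iri}> `;
  return ntriples.split(/\r?\n/).some((line) => line.startsWith(prefix));
}

// Made-up sample output; real IRIs come from the scraped data.
const scraped = [
  '<http://example.org/ap/demo> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/ApplicationProfile> .',
  '<http://example.org/shape/DemoShape> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/shacl#NodeShape> .',
].join('\n');

// Subjects we expect to find in the output.
const expected = [
  'http://example.org/ap/demo',
  'http://example.org/shape/DemoShape',
];

const missing = expected.filter((iri) => !hasSubject(scraped, iri));
console.log(
  missing.length === 0
    ? 'all expected subjects present'
    : `missing: ${missing.join(', ')}`
);
```

A plain prefix match keeps the sketch dependency-free; a full RDF parser would be more robust against formatting variations.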
