scripts/
├── pipeline.py
├── objects_extractor.py
├── object_version_extractor.py
├── gnn_feature_extractor.py
└── lgbm_feature_extractor.py
test-data/
├── changesets.csv
├── ovid_labels.tsv
└── training/labels.tsv
output/
├── objects.jsonl
├── fetch_prev_queue.csv
├── object_versions.jsonl
├── processed_changesets.txt
├── processed_versions.txt
├── nodes.csv
├── edges.csv
├── labels.csv
└── lgbm_features.csv
dataset (--dataset)
↓
extract changeset IDs
↓
objects_extractor
↓
objects.jsonl + fetch_prev_queue.csv
↓
object_version_extractor (ON by default)
↓
object_versions.jsonl
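- Changesets and versions that have already been processed are skipped automatically, so the pipeline can be run repeatedly and results accumulate.
- By default, the previous version (prev) of each object is collected as well.
- With the --no-prev option, previous-version collection is skipped.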
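
The automatic skip described in the notes above can be pictured with a small sketch. It assumes that a tracking file such as `processed_changesets.txt` holds one ID per line, which the source does not confirm. The names `load_processed`, `run_incremental`, and the `handle` callback are hypothetical and are not taken from `pipeline.py`.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Set


def load_processed(path: Path) -> Set[str]:
    """Return the IDs already recorded in the tracking file (empty if absent)."""
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def run_incremental(
    ids: Iterable[str],
    tracking_file: Path,
    handle: Callable[[str], None],
) -> List[str]:
    """Process only IDs missing from the tracking file; return the IDs handled."""
    done = load_processed(tracking_file)
    handled: List[str] = []
    tracking_file.parent.mkdir(parents=True, exist_ok=True)
    with tracking_file.open("a") as log:
        for item_id in ids:
            if item_id in done:
                continue  # recorded by an earlier run (or earlier in this one)
            handle(item_id)  # hypothetical per-ID work, e.g. fetching objects
            log.write(f"{item_id}\n")  # record the ID only after handle() returns
            log.flush()
            done.add(item_id)
            handled.append(item_id)
    return handled
```

Because an ID is appended only after its handler returns, an interrupted run would leave the unfinished ID unrecorded and it would be retried on the next run. Whether the real pipeline orders these steps the same way is an assumption.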
Basic run:
python scripts/pipeline.py --dataset changesets
Specify a range:
python scripts/pipeline.py --dataset ovid --start 0 --end 100
Turn off previous-version collection:
python scripts/pipeline.py --dataset training --no-prev
Reset the output and run again:
python scripts/pipeline.py --dataset changesets --overwrite
| Option | Description |
|---|---|
| --dataset | Dataset to use (changesets, ovid, training) |
| --start / --end | Range of IDs to process |
| --output-dir | Output directory (default: ./output) |
| --overwrite | Reset existing results |
| --no-prev | Disable previous-version collection (collection is ON by default) |
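
For orientation, the options in the table could be declared with `argparse` roughly as follows. This is a sketch, not the parser in `pipeline.py`. The flag names, the dataset choices, and the `./output` default come from the table. Assumptions made here: `--dataset` is required, `--start` and `--end` are integers that default to `None`, and there is a derived `fetch_prev` attribute.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags; types and 'required' are assumptions."""
    parser = argparse.ArgumentParser(prog="pipeline.py")
    parser.add_argument("--dataset", required=True,
                        choices=["changesets", "ovid", "training"],
                        help="dataset to use")
    parser.add_argument("--start", type=int, default=None,
                        help="start of the ID range to process")
    parser.add_argument("--end", type=int, default=None,
                        help="end of the ID range to process")
    parser.add_argument("--output-dir", default="./output",
                        help="output directory")
    parser.add_argument("--overwrite", action="store_true",
                        help="reset existing results before running")
    parser.add_argument("--no-prev", action="store_true",
                        help="disable previous-version collection")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv and add a positive 'fetch_prev' flag (hypothetical name)."""
    args = build_parser().parse_args(argv)
    args.fetch_prev = not args.no_prev  # collection stays ON unless --no-prev
    return args
```

Using `store_true` for `--no-prev` keeps previous-version collection enabled whenever the flag is absent, which matches the default stated in the table.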