This repository contains the code and outputs produced while re-engineering the Polifonia Lexicon and preparing it, in RDF format, for import into Framester.
The repository is structured as follows:
- `input/`: Contains the input data required for the lexicon re-engineering process. It includes the following subfolders:
  - `excel/`: Contains the lexicon Excel files on which the annotators worked. The 'initial' files are the versions from before the annotators began their work; the 'final' files are the versions from after they finished.
  - `csv_to_compare/`: Contains the CSV files extracted from the Excel files and used to compare the initial and final versions of the lexicon in order to separate the lexicon entries. They are created using `data_preparation.py`.
  - `csv_for_rdf/`: Contains the CSV files generated from the Polifonia Lexicon, which are then transformed into RDF following Framester's schema. They are created using `data_preparation.py`.
  - `csv_for_wn_alignment`: Contains the merged CSV files generated by finding the best WordNet matches for the Polifonia Lexicon.
- `output/`: Contains the output generated by the re-engineering process. It includes the following subfolder:
  - `rdf/`: Contains the RDF files generated from the CSVs using the `lexicon_to_framester.py` script.
- `schema/`: Contains the project (.drawio file) for editing the graphical diagrams of the re-engineered Polifonia Lexicon's schema, and the OWL ontology file(s) derived from the re-engineered Polifonia Lexicon's RDF files.
- `script/`: Contains the scripts required for the re-engineering process. It includes the following files:
  - `data_preparation.py`: A script for transforming the Polifonia Lexicon files into CSV format.
  - `lexicon_to_framester.py`: A script for converting the CSVs created from the Polifonia Lexicon into RDF data according to the Framester schema.
  - `lexicon_kg_to_onto.py`: A script for converting the Polifonia Lexicon's RDF files into an OWL ontology ready to be imported into Protégé for further elaboration.
  - `lexicon_to_wordnet.ipynb`: A notebook for finding the best-matching WordNet synsets for the Polifonia Lexicon.
  - `enrich_input_file_EN_with_wordnet_synset_synsets.py`: A script for enriching the CSVs created from the Polifonia Lexicon with the WordNet synsets found in `lexicon_to_wordnet.ipynb`.
  - `wordnet_utils.py`: Contains utility functions for working with WordNet.
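
To give a rough idea of the CSV-to-RDF step, the following is a minimal sketch that turns CSV rows into N-Triples using only the Python standard library. It is not the code in `lexicon_to_framester.py`: the `lemma` column name, the `https://example.org/polifonia-lexicon/` namespace, and the use of `rdfs:label` are all assumptions made for illustration, and the actual Framester schema will differ.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace: the real Polifonia/Framester URIs are not stated in this README.
BASE = "https://example.org/polifonia-lexicon/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text):
    """Escape characters that N-Triples does not allow unescaped inside a literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_file, lang="en"):
    """Yield one N-Triples line per row that has a non-empty 'lemma' column.

    The 'lemma' column name is an assumption, not the repository's actual header.
    """
    for row in csv.DictReader(csv_file):
        lemma = (row.get("lemma") or "").strip()
        if not lemma:
            continue  # skip rows without a usable entry
        # Build a URI-safe local name from the lemma.
        subject = BASE + quote(lemma.replace(" ", "_"), safe="")
        yield f'<{subject}> <{RDFS_LABEL}> "{escape_literal(lemma)}"@{lang} .'


# In-memory sample standing in for a file from input/csv_for_rdf/.
sample = io.StringIO("lemma,category\nstring quartet,genre\n,empty\n")
triples = list(rows_to_ntriples(sample))
for line in triples:
    print(line)
```

In practice an RDF library would normally handle serialization and namespace management; the hand-written N-Triples output here only keeps the sketch free of dependencies.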