This repository contains the scripts and documents used for the named entity recognition (NER) tasks on the document "Güemes Documentado".
Resultados: Results obtained for each of the evaluated models, in Excel format.
Stanford: The JAR file and the Spanish model needed to run the Stanford model.
configs: Configuration files needed to run the spaCy script. They define settings such as the pipeline, language, batch size, and architecture.
dataset: The GD dataset in three formats: plain text, JSONL (the format exported by Doccano), and CoNLL.
GD-1_Stanford.txt: File generated by the Stanford script, with the entities labeled.
NER_using_spaCy_CoNLL.ipynb: spaCy script.
Spacy: Deprecated spaCy model with incorrect tokenization and classification.
SpanBERTa - NER Spanish BERT.ipynb: Script for Transformer-based models.
StanfordNERTagger.ipynb: Stanford script.
run_ner.py: Script that fine-tunes the models for NER.
utils_ner.py: Utilities for the fine-tuning task.
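
The dataset entry above mentions both a Doccano JSONL export and a CoNLL version of the same annotations. As a rough illustration of how the two formats relate, here is a minimal sketch of converting one JSONL record into token/tag lines with BIO labels. It is not the conversion used in this repository. It assumes each record has a "text" field and a "label" field holding [start, end, tag] triples (Doccano's export layout varies between versions), that tokens are split on whitespace, and that span boundaries line up with token boundaries. The function name `doccano_to_conll` and the sample sentence are hypothetical and are not taken from the GD dataset.

```python
import json
import re


def doccano_to_conll(jsonl_line):
    """Convert one Doccano-style JSONL record into 'token tag' lines (BIO scheme)."""
    record = json.loads(jsonl_line)
    text = record["text"]
    # Assumed span layout: [start_offset, end_offset, label].
    spans = sorted(record.get("label", []))
    rows = []
    # Whitespace tokenization (an assumption; the real dataset may tokenize differently).
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in spans:
            # A token is assigned to a span if it starts inside that span.
            if start <= match.start() < end:
                prefix = "B" if match.start() == start else "I"
                tag = f"{prefix}-{label}"
                break
        rows.append(f"{match.group()} {tag}")
    return rows


# Made-up sample record for illustration only.
sample = json.dumps(
    {"text": "Martín Güemes nació en Salta",
     "label": [[0, 13, "PER"], [23, 28, "LOC"]]}
)

for row in doccano_to_conll(sample):
    print(row)
```

Punctuation attached to a word stays part of the token under whitespace splitting, so a real conversion would likely need a proper tokenizer before the output matches the CoNLL files in the dataset folder.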