propaganda-SI-TC

SEMEVAL 2020 TASK 11 "DETECTION OF PROPAGANDA TECHNIQUES IN NEWS ARTICLES"

task link: https://propaganda.qcri.org/semeval2020-task11/index.html

check our paper: https://arxiv.org/pdf/2008.10163.pdf

Background

We refer to propaganda whenever information is purposefully shaped to foster a predetermined agenda. Propaganda uses psychological and rhetorical techniques to reach its purpose. Such techniques include the use of logical fallacies and appealing to the emotions of the audience. Logical fallacies are usually hard to spot since the argumentation, at first sight, might seem correct and objective. However, a careful analysis shows that the conclusion cannot be drawn from the premise without the misuse of logical rules. Another set of techniques makes use of emotional language to induce the audience to agree with the speaker only on the basis of the emotional bond that is being created, provoking the suspension of any rational analysis of the argumentation. All of these techniques are intended to go unnoticed to achieve maximum effect.

Our Approach

This paper describes the BERT-based models proposed for two subtasks in SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles. We first build the model for Span Identification (SI) based on SpanBERT, and improve detection with a deeper model and a sentence-level representation. We then develop a hybrid model for Technique Classification (TC). The hybrid model is composed of three submodels: two BERT models trained with different methods, and a feature-based Logistic Regression model. We address the imbalanced dataset by adjusting the cost function. On the development set, we rank seventh in the SI subtask (F1-measure of 0.4711) and third in the TC subtask (F1-measure of 0.6783).
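To illustrate what adjusting the cost function for an imbalanced dataset can look like, here is a minimal sketch of a class-weighted cross-entropy. It is an assumption for illustration, not the weighting used in this repository: the inverse-frequency scheme, the function names, and the two example labels are all hypothetical choices.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by N / (K * count), so rare classes weigh more.

    Assumed scheme for illustration; the repository may weight differently.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean negative log-likelihood.

    probs:  list of dicts mapping class name -> predicted probability
    labels: list of gold class names, aligned with probs
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


if __name__ == "__main__":
    # Hypothetical toy data: three frequent labels and one rare label.
    gold = ["Loaded_Language"] * 3 + ["Slogans"]
    w = inverse_frequency_weights(gold)
    preds = [{"Loaded_Language": 0.5, "Slogans": 0.5} for _ in gold]
    print(w)
    print(weighted_cross_entropy(preds, gold, w))
```

With weights like these, a mistake on a rare technique contributes more to the loss than a mistake on a frequent one, which is the general idea behind cost-sensitive training.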

How to Run

add datasets folder from SEMEVAL 2020 TASK 11 "DETECTION OF PROPAGANDA TECHNIQUES IN NEWS ARTICLES"
add pytorch_pretrained_bert from https://github.com/facebookresearch/SpanBERT/tree/master/code
add spanbert_hf_base from https://github.com/facebookresearch/SpanBERT
run process_data.py to generate the train and dev datasets for SI and TC separately
create a folder 'pro_output' to store the fine-tuned model
run SI+SpanBERT, SI+BERT, TC+SpanBERT, or TC+BERT with arguments such as: --model spanbert_hf_base --output_dir pro_output --train_file datasets/sI/train.json --test_file datasets/sI/dev.json --do_test --version_2_with_negative --dev_file datasets/sI/train.json --do_train
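
Before launching training, it may help to confirm that the folders named in the steps above are in place. The following is a minimal sketch, assuming all four folders sit directly under the repository root; the helper name and the REQUIRED_DIRS list are hypothetical and not part of this repository.

```python
from pathlib import Path

# Folder names taken from the steps above; their location under the
# repository root is an assumption.
REQUIRED_DIRS = [
    "datasets",
    "pytorch_pretrained_bert",
    "spanbert_hf_base",
    "pro_output",
]


def missing_prerequisites(root="."):
    """Return the required folder names that do not exist under root."""
    root = Path(root)
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All required folders found.")
```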
