Machine Learning Operations Project

Project Description

The overall goal of this project is to use transformers to predict whether a given tweet is about a real disaster or not. We intend to make the code reproducible by following the principles presented in the MLOps course. The experiment tracker Weights & Biases (wandb) will be used to create dashboards showing the main results of the project. Additionally, profilers may be used to identify bottlenecks in the code.
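
As an illustration of the profiling step, here is a minimal sketch using Python's built-in `cProfile` and `pstats` modules. The `clean_tweet`/`preprocess` functions and their cleaning rules are hypothetical stand-ins, not code from this repository. In the project itself, the same pattern would wrap the real data or training functions, or the PyTorch profiler could be used instead.

```python
import cProfile
import io
import pstats
import re


def clean_tweet(text: str) -> str:
    """Hypothetical preprocessing step: drop URLs and @mentions, lowercase, squash whitespace."""
    text = re.sub(r"https?://\S+", "", text)
    text = re.sub(r"@\w+", "", text)
    return " ".join(text.lower().split())


def preprocess(tweets: list) -> list:
    return [clean_tweet(t) for t in tweets]


def profile_preprocessing(tweets: list, top_n: int = 5) -> tuple:
    """Run preprocess() under cProfile; return its output and a text report of the slowest calls."""
    profiler = cProfile.Profile()
    result = profiler.runcall(preprocess, tweets)

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top_n)  # top_n entries by cumulative time
    return result, stream.getvalue()


if __name__ == "__main__":
    sample = ["Forest fire near the town http://t.co/abc", "@user I love fruits"] * 1000
    cleaned, report = profile_preprocessing(sample)
    print(report)
```

Sorting by cumulative time puts the functions that dominate total runtime at the top, which is usually the first place to look for a bottleneck.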

The framework we will be using is Hugging Face Transformers from the PyTorch ecosystem. Pre-trained models provided by the library will be used for NLP tasks such as feature extraction and prediction. This includes ConvBERT, an autoencoding model that has achieved good performance on natural language processing (NLP) tasks while using fewer parameters and less training time than BERT.
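
The sketch below shows how a ConvBERT sequence classifier produces a disaster / not-disaster prediction with the Transformers library. It assumes `torch` and `transformers` are installed. To keep it runnable offline, it builds a tiny, randomly initialised model from `ConvBertConfig` and feeds it random token ids. A real run would instead load a pre-trained checkpoint and its tokenizer. For example, `AutoModelForSequenceClassification.from_pretrained("YituTech/conv-bert-base", num_labels=2)` together with `AutoTokenizer.from_pretrained(...)` would do this, assuming that Hub checkpoint name. The helper names are hypothetical.

```python
import torch
from transformers import ConvBertConfig, ConvBertForSequenceClassification


def build_tiny_convbert(num_labels: int = 2) -> ConvBertForSequenceClassification:
    # Tiny, randomly initialised config so the sketch runs without downloads;
    # actual training would start from pre-trained weights instead.
    config = ConvBertConfig(
        vocab_size=1000,
        hidden_size=64,
        embedding_size=64,
        num_hidden_layers=2,
        num_attention_heads=4,
        intermediate_size=128,
        num_labels=num_labels,
    )
    return ConvBertForSequenceClassification(config)


def predict_disaster(model: ConvBertForSequenceClassification, input_ids: torch.Tensor) -> torch.Tensor:
    """Return the predicted class per sequence (hypothetical convention: 1 = disaster, 0 = not)."""
    model.eval()
    with torch.no_grad():
        logits = model(input_ids=input_ids).logits  # shape: (batch, num_labels)
    return logits.argmax(dim=-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = build_tiny_convbert()
    fake_ids = torch.randint(0, 1000, (3, 16))  # 3 fake "tweets" of 16 token ids each
    print(predict_disaster(model, fake_ids))
```

With random weights the predictions are meaningless. The point is the shape of the pipeline: token ids go in, per-class logits come out, and `argmax` picks the label.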

The data come from the Kaggle competition "Natural Language Processing with Disaster Tweets". Each observation consists of the text of a tweet, a keyword from that tweet, and the location the tweet was sent from; the keyword and location may be blank. These features will be used to classify which tweets are about real disasters and which are not. The data set contains 7,613 training and 3,263 test observations.
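
A minimal sketch of loading the data and handling the optional fields, using only the standard library. It assumes the competition CSVs have the columns `id`, `keyword`, `location`, `text` and (training set only) `target`. It also assumes one possible way of giving the model the keyword and location: prepending them to the tweet text. The sample rows are made up for illustration, and `Tweet`, `load_tweets` and `to_model_input` are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tweet:
    id: int
    keyword: str
    location: str
    text: str
    target: Optional[int]  # None for the unlabeled test split


def load_tweets(fileobj) -> list:
    """Parse a Kaggle-style CSV; blank keyword/location fields become empty strings."""
    tweets = []
    for row in csv.DictReader(fileobj):
        target = row.get("target")
        tweets.append(Tweet(
            id=int(row["id"]),
            keyword=row["keyword"] or "",
            location=row["location"] or "",
            text=row["text"],
            target=int(target) if target not in (None, "") else None,
        ))
    return tweets


def to_model_input(tweet: Tweet) -> str:
    """Hypothetical input format: prepend keyword and location only when present."""
    parts = []
    if tweet.keyword:
        parts.append(f"keyword: {tweet.keyword}")
    if tweet.location:
        parts.append(f"location: {tweet.location}")
    parts.append(tweet.text)
    return " | ".join(parts)


# Made-up rows in the assumed column layout.
SAMPLE = """id,keyword,location,text,target
1,,,Smoke rising over the hills tonight,1
2,flood,Springfield,Flood warning issued for the river valley,1
"""

if __name__ == "__main__":
    for t in load_tweets(io.StringIO(SAMPLE)):
        print(to_model_input(t))
```

Keeping blank fields as empty strings rather than dropping rows preserves all observations, since missing keywords and locations are expected in this data set.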

If time permits, we will submit our contribution to the Kaggle competition.

Project Organization

├── LICENSE
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
└── src                <- Source code for use in this project.
    ├── __init__.py    <- Makes src a Python module
    │
    ├── data           <- Scripts to download or generate data
    │   └── make_dataset.py
    │
    ├── models         <- Scripts to train models and then use trained models to make
    │   │                 predictions
    │   ├── predict_model.py
    │   └── train_model.py
    │
    └── visualization  <- Scripts to create exploratory and results oriented visualizations
        └── visualize.py

Project based on the cookiecutter data science project template. #cookiecutterdatascience
