A machine learning (ML) model that translates a screenshot of a website into its corresponding code representation. Inspired by the pix2code problem and dataset. This project was created as a university project by Timo Angerer and Marvin Knoll (@marvinknoll).
For general information about the problem, architecture and implementation, see the project documentation.
To set up the project and use the model, refer to the next section of this document.
Follow these steps to train and evaluate the model locally on your machine. Note: This project uses Python 3.8.8 and PyTorch 1.8.
- Clone the repository
- Install the dependencies

  Run the following command to create a new conda environment named `pix2code` with the required dependencies:

      conda env create python=3.8.8 -f environment.yml

  Alternatively, install all the dependencies manually:

      conda install -y -c pytorch pytorch=1.8.1 torchvision=0.9.1 cudatoolkit=10.2
      conda install -y -c conda-forge tqdm=4.60.0 pillow=8.2.0 nltk=3.6.1
      conda install -y -c conda-forge nb_conda_kernels=2.3.1 jupyterlab=3.0.12
- Download the dataset

  Create a new folder named `data` inside the project root folder, download the dataset, and extract it into the `data` folder. The dataset used is the pix2code dataset. You can download it from one of the following links: Google Drive, GitHub.

  This is what the folder structure of the `data` folder should look like:

      data
      ├── ...
      └── web
          └── all_data
              ├── AF4840B2-2B9F-4ED0-A58D-E260B14858E1.gui
              └── ...

  The pix2code dataset contains three sub-datasets. The following steps are only concerned with the `web` dataset.
- Split the dataset

  The `train.py` and `evaluate.py` scripts assume the existence of three data split files, `train_dataset.txt`, `test_dataset.txt`, and `validation_dataset.txt`, each containing the IDs of the data examples for the respective data split. The data split files must be at the same folder level as the folder containing the data examples.

  Run `split_data.py` to generate the data split files for the `web` dataset:

      python split_data.py --data_path=./data/web/all_data

  This is what the `data` folder should look like up to this point:

      data
      ├── ...
      └── web
          ├── all_data
          │   ├── AF4840B2-2B9F-4ED0-A58D-E260B14858E1.gui
          │   └── ...
          ├── test_dataset.txt
          ├── train_dataset.txt
          └── validation_dataset.txt
- Create the vocabulary file

  You need to generate a `vocab.txt` file that contains all the tokens the model should be able to predict, separated by whitespace.

  Run `build_vocab.py` to generate a vocabulary file based on the tokens that appear in the specified dataset:

      python build_vocab.py --data_path=./data/web/all_data

  This is what the `data` folder should look like up to this point:

      data
      ├── ...
      └── web
          ├── all_data
          │   ├── AF4840B2-2B9F-4ED0-A58D-E260B14858E1.gui
          │   └── ...
          ├── test_dataset.txt
          ├── train_dataset.txt
          ├── validation_dataset.txt
          └── vocab.txt
- Train the model

  Run the following command to train the model:

      python train.py --data_path=./data/web/all_data --epochs=15 --save_after_epochs=5 --batch_size=4 --split=train
- Evaluate the model

  Run the following command to evaluate the model:

      python evaluate.py --data_path=./data/web/all_data --model_file_path=<path-to-model-file> --split=validation --viz

  To visualize the results of the model evaluation, run the evaluation script with the `--viz` flag and then follow the steps inside the `visualize_inference.ipynb` notebook.
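To illustrate what the "Split the dataset" step produces, here is a minimal sketch of a split that writes the three ID files next to the data folder. It is not the repository's `split_data.py`: the function name `split_ids`, the 80/10/10 ratios, the fixed seed, the one-ID-per-line file format, and the use of the `.gui` file name (without extension) as the example ID are all assumptions made for illustration.

```python
import random
from pathlib import Path


def split_ids(data_path, ratios=(0.8, 0.1, 0.1), seed=0):
    """Write train/validation/test ID files next to the data folder.

    Hypothetical helper: ratios, seed, and file format are assumptions,
    not the behavior of the project's split_data.py.
    """
    data_dir = Path(data_path)
    # Assumption: an example ID is the .gui file name without its extension.
    ids = sorted(p.stem for p in data_dir.glob("*.gui"))
    random.Random(seed).shuffle(ids)

    n_train = int(len(ids) * ratios[0])
    n_val = int(len(ids) * ratios[1])
    splits = {
        "train": ids[:n_train],
        "validation": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }
    # The split files live one level above the folder with the examples.
    for name, split in splits.items():
        out_file = data_dir.parent / f"{name}_dataset.txt"
        out_file.write_text("\n".join(split) + "\n")
    return splits
```

Calling `split_ids("./data/web/all_data")` would create `train_dataset.txt`, `validation_dataset.txt`, and `test_dataset.txt` inside `./data/web`, matching the folder layout shown in that step.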
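The "Create the vocabulary file" step can be pictured with a similar sketch. Again, this is not the repository's `build_vocab.py`: the function name `build_vocab` and the tokenization rule (split on whitespace, with commas treated as separate tokens) are assumptions, and any special tokens the real script may add are not covered here.

```python
from pathlib import Path


def build_vocab(data_path):
    """Collect the unique tokens of all .gui files and write vocab.txt.

    Hypothetical helper: the tokenization below is an assumption and may
    differ from the project's build_vocab.py.
    """
    data_dir = Path(data_path)
    tokens = set()
    for gui_file in sorted(data_dir.glob("*.gui")):
        text = gui_file.read_text()
        # Assumption: tokens are whitespace-separated; commas are padded
        # so that they become tokens of their own.
        tokens.update(text.replace(",", " , ").split())

    vocab = sorted(tokens)
    # vocab.txt sits next to the split files, one level above the examples.
    (data_dir.parent / "vocab.txt").write_text(" ".join(vocab))
    return vocab
```

Sorting the tokens keeps the file stable across runs, which matters if token positions are later used as class indices.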
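Because `train.py` and `evaluate.py` expect the split files and `vocab.txt` at the same level as the data folder, it can help to verify the layout before starting a long training run. The following sketch is a hypothetical helper (`missing_data_files` is not part of the project) that only checks for the files named in the steps above.

```python
from pathlib import Path

# File names taken from the setup steps; all are expected one level
# above the folder that contains the .gui examples.
EXPECTED_FILES = (
    "train_dataset.txt",
    "validation_dataset.txt",
    "test_dataset.txt",
    "vocab.txt",
)


def missing_data_files(data_path):
    """Return a list of expected data files that are not present.

    Hypothetical helper for illustration; an empty list means the
    layout described in the setup steps is complete.
    """
    data_dir = Path(data_path)
    missing = []
    # The data folder itself must exist and contain at least one example.
    if not data_dir.is_dir() or not any(data_dir.glob("*.gui")):
        missing.append(f"{data_dir.name}/*.gui")
    for name in EXPECTED_FILES:
        if not (data_dir.parent / name).is_file():
            missing.append(name)
    return missing
```

For example, `missing_data_files("./data/web/all_data")` returning `["vocab.txt"]` would indicate that the vocabulary step has not been run yet.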
- Tony Beltramelli for the original pix2code paper and the dataset.
- Image captioning tutorials: Basic idea of image captioning, Image captioning PyTorch, Image captioning TensorFlow.
- Show, Attend and Tell paper for image captioning.