mkleinegger/spark-svm-amazon-reviews

Large Scale Machine Learning

This repository contains the code for a university project on "Data Intensive Computing" and large-scale machine learning models. The report is available at https://mkleinegger.github.io/spark-svm-amazon-reviews/report.pdf. The repository consists of the following files:

  • notebook.ipynb: The Jupyter notebook containing the solution to the tasks of the exercise.
  • output_rdd.txt: The output of the RDD-based solution.
  • output_ds.txt: The output of the Dataset/DataFrame-based solution.
  • report.pdf: The report of the exercise.
  • evaluation.ipynb: The Jupyter notebook containing the evaluation of the solutions and the code to generate the plots.

Additional files, such as output.txt and grid_search_evaluation.csv, are outputs of the notebook and/or inputs needed for the evaluation. The stopwords.txt file contains the stopwords used for filtering in the exercise.
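
As a rough illustration of what stopword filtering with a file like stopwords.txt can look like, here is a minimal plain-Python sketch. It is not taken from the notebook: the one-word-per-line file layout, the tokenization rule, and the helper names load_stopwords, tokenize, and remove_stopwords are all assumptions made for this example.

```python
import re


def load_stopwords(lines):
    """Build a stopword set from an iterable of lines.

    Assumes one stopword per line (hypothetical layout of stopwords.txt).
    """
    return {line.strip().lower() for line in lines if line.strip()}


def tokenize(text):
    """Lowercase the text and split on non-letter characters (assumed rule)."""
    return [tok for tok in re.split(r"[^a-z]+", text.lower()) if tok]


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stopword set."""
    return [tok for tok in tokens if tok not in stopwords]


# In practice the lines would come from open("stopwords.txt");
# a small inline list keeps the sketch self-contained.
stopwords = load_stopwords(["the", "a", "is", ""])
tokens = remove_stopwords(tokenize("The battery is a great value"), stopwords)
print(tokens)
```

Storing the stopwords in a set keeps each membership check constant-time, which matters when the same filter is applied to every token of a large review corpus.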

About

Large Scale Text Classification on Amazon Reviews Corpus
