Large Scale Machine Learning

This repository contains the code for a university project on "Data Intensive Computing" and large-scale machine learning models. The report is available at https://mkleinegger.github.io/spark-svm-amazon-reviews/report.pdf. The repository consists of the following files.

notebook.ipynb: The Jupyter notebook containing the solution to the tasks of the exercise.
output_rdd.txt: The output of the RDD based solution.
output_ds.txt: The output of the Dataset/DataFrame based solution.
report.pdf: The report of the exercise.
evaluation.ipynb: The Jupyter notebook containing the evaluation of the solutions and the code to generate the plots.

Additionally, there are other files, such as output.txt and grid_search_evaluation.csv, which are outputs of the notebook and/or inputs to the evaluation, as well as stopwords.txt, which contains the stopwords used for filtering in the exercise.
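As a rough illustration of how a stopword file is typically used for filtering, here is a minimal sketch in plain Python. It is not the notebook's actual Spark code. It assumes that stopwords.txt holds one word per line. The function names load_stopwords and remove_stopwords and the sample words are hypothetical.

```python
from typing import Iterable, List, Set


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Build a stopword set from lines, assuming one word per line."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Keep only the tokens that are not in the stopword set (case-insensitive)."""
    return [token for token in tokens if token.lower() not in stopwords]


# Inline sample standing in for the contents of stopwords.txt (hypothetical values).
sample_lines = ["the\n", "a\n", "is\n", "\n"]
stopwords = load_stopwords(sample_lines)

filtered = remove_stopwords("The battery is a great value".split(), stopwords)
print(filtered)
```

To use the real file, an open file handle for stopwords.txt can be passed to load_stopwords in place of sample_lines, since a file object iterates over its lines.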

