The goal of this project is to develop Python scripts that carry out a complete data analysis of the show My Little Pony.
Data used in this project to analyse the scripts of the show:
- For this project I used the "clean_dialog.csv" file
Data used in this project to identify non-dictionary words:
- For this project I used the "words_alpha.txt" file
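As a rough illustration of how a word list such as "words_alpha.txt" can be used to find non-dictionary words, here is a minimal sketch. It is not the project's actual code: the function names `load_dictionary` and `top_non_dictionary_words` are hypothetical, the word list is assumed to hold one word per line, and the simple letters-only tokenisation is an assumption (it splits contractions such as "don't" into two tokens).

```python
from collections import Counter
import re


def load_dictionary(path):
    """Read one word per line into a lowercase set for fast membership tests."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def top_non_dictionary_words(lines, dictionary, n=5):
    """Return the n most frequent words in `lines` that are not in `dictionary`."""
    counts = Counter()
    for line in lines:
        # Assumption: a "word" is any run of ASCII letters, compared in lowercase.
        for word in re.findall(r"[a-z]+", line.lower()):
            if word not in dictionary:
                counts[word] += 1
    return [word for word, _ in counts.most_common(n)]
```

Calling `top_non_dictionary_words` once per pony, with that pony's dialogue lines, would give a per-pony list of the kind this project reports.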
- The average number of dialogue lines spoken by a specific pony
- The number of times a specific pony was mentioned by other ponies
- The fraction of times each pony has a line that directly follows another pony's line
- A list of the 5 non-dictionary words used most often by each pony
for example: "twilight": ["huh", "ugh", "awwww", "wheee", "wha"]
- Unit tests covering verbosity, mentions, follow-on comments, and non-dictionary words, to confirm that each produces the expected output
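The follow-on fraction in the list above can be sketched as follows. This is only one possible reading, not the project's implementation: the function name `follow_on_fractions` is hypothetical, the input is assumed to be the ordered list of speakers (one entry per dialogue line), a pony is assumed not to follow itself, and episode boundaries are ignored.

```python
from collections import Counter, defaultdict


def follow_on_fractions(speakers):
    """For each pony, return the fraction of its follow-on lines per preceding pony.

    `speakers` is the ordered list of who spoke each line.
    """
    counts = defaultdict(Counter)
    for previous, current in zip(speakers, speakers[1:]):
        if previous != current:  # assumption: consecutive lines by one pony do not count
            counts[current][previous] += 1

    fractions = {}
    for pony, followed in counts.items():
        total = sum(followed.values())
        fractions[pony] = {other: n / total for other, n in followed.items()}
    return fractions


speakers = ["twilight", "applejack", "twilight", "rarity", "twilight"]
print(follow_on_fractions(speakers))
```

Here twilight follows applejack once and rarity once, so each accounts for half of twilight's follow-on lines.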
To install pandas, run the following on the command line: $ pip install pandas
- Copy all the data files into the data/ directory
- Add the analysis.py file to the script directory
- Add the files test.py, follow.py, jsonfile.py, mentions.py, non_dictionary_words.py and verbosity.py to the src/hw3 directory
- Add the analysis_tester.py file to the src/hw3/tests directory
$ python3 analysis.py data/clean_dialog.csv -o [optional_json_file_for_output_in_json]
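A command line of this shape (one input path plus an optional -o output file) could be parsed with the standard argparse module. The sketch below is an assumption about how analysis.py might handle its arguments, not its actual code; the names `build_parser`, `dialog_csv`, and `output` are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for: analysis.py <dialog_csv> [-o <json_file>]."""
    parser = argparse.ArgumentParser(description="Analyse My Little Pony dialogue.")
    parser.add_argument(
        "dialog_csv",
        help="path to the dialogue CSV, e.g. data/clean_dialog.csv",
    )
    parser.add_argument(
        "-o",
        dest="output",
        default=None,
        help="optional JSON file to write the results to",
    )
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["data/clean_dialog.csv", "-o", "results.json"])
print(args.dialog_csv, args.output)
```

When -o is omitted, `args.output` is None, which a script could treat as "print the results instead of writing a file".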