GitHub - artem-spector/getdata013: getting and cleaning data assignment

# Getting and Cleaning Data Course Project

This folder contains the following files:

* CodeBook.md - describes the input and output data and the transformations.
* run_analysis.R - the script that performs the transformation.

To reproduce the transformation, put the UCI HAR Dataset directory from the original data set in the working folder, then run the run_analysis() function defined in the script. The result is written to the file subject_activity_summary.txt in the working folder.
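A minimal sketch of such a session, assuming the script sits in the current working folder. The file.exists guard is an addition for illustration only, so the snippet does nothing when the script is not present.

```r
# Name of the script as listed in this README
script <- "run_analysis.R"

# Guard added for illustration: only run when the script is actually present
if (file.exists(script)) {
  source(script)    # defines run_analysis()
  run_analysis()    # writes subject_activity_summary.txt to the working folder
}
```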

To perform the required transformation the script does the following:

1. Read the variable names from the file features.txt.
2. Create an index vector of the names containing the substrings "mean()" or "std()".
3. Remove the bracket characters from the variable names read in step 1, since they are illegal in column names.
4. Read the activity labels from the file activity_labels.txt.

The data obtained in steps 1 to 4 is common to the test and training sets and is used to process each of them.
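Steps 1 to 4 can be sketched as follows. This is an illustration, not the code from run_analysis.R: the inline text is a hypothetical sample standing in for features.txt and activity_labels.txt, and the object names (features, keep, clean_names, activity_labels) are made up for the example.

```r
# Hypothetical sample standing in for a few lines of features.txt
features_text <- "1 tBodyAcc-mean()-X
2 tBodyAcc-std()-X
3 tBodyAcc-mad()-X
4 fBodyAcc-meanFreq()-X"

# Step 1: read the variable names (second column of the file)
features <- read.table(text = features_text, stringsAsFactors = FALSE,
                       col.names = c("index", "name"))

# Step 2: index of the names containing the literal substrings "mean()" or "std()"
# fixed = TRUE makes the brackets match literally instead of as regex syntax
keep <- which(grepl("mean()", features$name, fixed = TRUE) |
              grepl("std()", features$name, fixed = TRUE))

# Step 3: remove the bracket characters from all variable names
clean_names <- gsub("[()]", "", features$name)

# Step 4: read the activity labels (hypothetical two-line sample)
activity_labels <- read.table(text = "1 WALKING\n2 WALKING_UPSTAIRS",
                              stringsAsFactors = FALSE,
                              col.names = c("id", "label"))
```

Matching the literal "mean()" rather than just "mean" is what excludes variables such as fBodyAcc-meanFreq()-X from the selection.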

The test and training data sets have an identical structure; the only difference is in the file names. The test files have test in their names, whereas the training files have train. The script defines a nested function readDataSet that accepts the directory name as a parameter and builds the file names dynamically. The processing logic is exactly the same for both directories:

5. Read the activity numbers from the y_...txt file and replace the numbers with the labels from step 4.
6. Read the subjects from the file subject_...txt.
7. Read the measurement data from the X_...txt file, using the vector obtained in step 3 as the column names, and apply the index vector obtained in step 2 to select only the mean and standard deviation columns.
8. Construct the result data frame from the subject column from step 6, the activity column from step 5, and the data columns from step 7.

The results for the two directories are then combined:

9. The data frames obtained from the test and training sets are merged vertically.
10. The aggregate function is applied to the data set from step 9 to calculate the mean of every data column, grouped by subject and activity.
11. The result data frame is written to the file subject_activity_summary.txt.
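Steps 5 to 11 can be sketched on a tiny in-memory example. This is an illustration under stated assumptions, not the script itself: build_set is a hypothetical, simplified stand-in for readDataSet that takes values directly instead of a directory name, the numbers are invented, the output goes to a temporary folder rather than the working folder, and row.names = FALSE is an assumption about the output format.

```r
# Hypothetical stand-ins for the objects built in steps 1 to 4
clean_names <- c("tBodyAcc-mean-X", "tBodyAcc-std-X", "tBodyAcc-mad-X")
keep <- c(1, 2)
activity_labels <- c("WALKING", "SITTING")

# Simplified stand-in for readDataSet (takes in-memory values, not a directory)
build_set <- function(subjects, activity_ids, measurements) {
  colnames(measurements) <- clean_names                 # step 7: name the columns
  data.frame(subject = subjects,                         # step 6: subjects
             activity = activity_labels[activity_ids],   # step 5: numbers -> labels
             measurements[, keep, drop = FALSE],         # step 7: mean/std columns only
             check.names = FALSE)                        # step 8: assemble the frame
}

# Invented measurements: two rows per set, three columns each
test  <- build_set(c(1, 1), c(1, 1), matrix(c(1, 3, 10, 30, 0, 0), nrow = 2))
train <- build_set(c(2, 2), c(2, 2), matrix(c(2, 4, 20, 40, 0, 0), nrow = 2))

# Step 9: merge the two data frames vertically
merged <- rbind(test, train)

# Step 10: mean of every data column, grouped by subject and activity
result <- aggregate(merged[, -(1:2)],
                    by = list(subject = merged$subject,
                              activity = merged$activity),
                    FUN = mean)

# Step 11: write the summary (temporary folder used here for the example)
out_file <- file.path(tempdir(), "subject_activity_summary.txt")
write.table(result, out_file, row.names = FALSE)
```

The result has one row per subject and activity pair, so the four input rows collapse to two.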
