
Overview

This document outlines the standardized process for managing GitHub repositories within our epidemiology team, including repository creation, cloning procedures, branching workflows, and code review processes.

Repository Creation Process - "GitHub First" Approach

We follow the "GitHub first, then RStudio" workflow because it leaves the local Git repository ready to pull and push immediately. Under the hood, this approach uses git clone, which configures the remote GitHub repository as origin and sets the local default branch to track its remote counterpart.

1. Create Repository on GitHub

  1. Go to https://github.com and ensure you are logged in
  2. Click the green "New" button near "Repositories"
  3. Fill in the repository details:
    • Repository name: Use descriptive but brief names (letters, digits, -, ., or _ allowed)
    • Description: Brief description
    • Visibility: Choose Public or Private
    • Initialize with: Add a README file
    • Add .gitignore: Choose "R" template
  4. Click "Create repository"
  5. Click the green "<> Code" button
  6. Copy the HTTPS clone URL to your clipboard

2. Repository Structure

After cloning, organize your repository with this recommended structure:

project-name/
├── data/
├── scripts/
├── output/
├── documentation/
├── README.md
└── .gitignore
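
The layout above can be created from the RStudio Terminal in one step. This is a minimal sketch, assuming a POSIX-style shell such as Git Bash. The `project_dir` variable is hypothetical and defaults to a throwaway folder so the sketch can be tried safely, and the `.gitkeep` placeholder files are only a naming convention (not a Git feature), used because Git does not track empty folders.

```shell
set -eu

# Hypothetical location: a throwaway folder by default; point this at your clone instead.
project_dir="${project_dir:-$(mktemp -d)/project-name}"
mkdir -p "$project_dir"
cd "$project_dir"

# Create the recommended folders.
mkdir -p data scripts output documentation

# Git does not track empty folders, so add a placeholder file to each folder
# that should appear on GitHub. data/ is left without one because its
# contents are normally excluded through .gitignore.
for dir in scripts output documentation; do
  touch "$dir/.gitkeep"
done

ls -A
```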

Cloning to Shared R Drive

Responsible Party

The database administrator or designated workflow manager is responsible for:

  • Cloning repositories to the shared 02_GitHub folder on the R drive
  • Maintaining the master copies of all team repositories
  • Ensuring regular synchronization with remote repositories

Cloning Process Using RStudio Git Interface

The database administrator should use the "GitHub first, then RStudio" approach through RStudio's built-in Git interface:

Step-by-Step RStudio Git Setup

  1. In RStudio: File > New Project > Version Control > Git
  2. Repository URL: Paste the GitHub clone URL you copied earlier
  3. Project directory name: Will auto-populate from the repository name
  4. Create project as subdirectory of: Browse to the shared 02_GitHub folder on the R drive (R:/)
  5. Open in new session (recommended)
  6. Click "Create Project"

What RStudio Does Automatically

When you use this method, RStudio:

  • Downloads all repository files including README.md
  • Sets up the remote GitHub repo as origin
  • Configures local main branch to track origin/main
  • Enables the Git tab in the upper-right pane (next to Environment) for easy version control
  • Creates an RStudio project (.Rproj file) for easy future access

Verifying Git Setup in RStudio

After cloning, you should see:

  • Git tab in the upper-right pane of RStudio
  • Project files in the Files tab
  • The repository name in RStudio's title bar
  • A .git folder (may be hidden) indicating Git is active
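
The same checks can be made from the RStudio Terminal. The sketch below is self-contained: it builds a throwaway repository to stand in for GitHub (an assumption for illustration; in practice the source is the HTTPS clone URL you copied), clones it, and confirms that origin and the tracking branch were configured.

```shell
set -eu

work="$(mktemp -d)"

# Stand-in for the GitHub repository (hypothetical; normally this lives on github.com).
git init -q "$work/github-standin"
cd "$work/github-standin"
git symbolic-ref HEAD refs/heads/main          # name the first branch "main"
git config user.name "Demo User"
git config user.email "demo@example.org"
echo "# project-name" > README.md
git add README.md
git commit -q -m "Initial commit"

# Clone it, as RStudio does behind the scenes.
git clone -q "$work/github-standin" "$work/project-name"
cd "$work/project-name"

git remote -v                                  # origin should point at the source repository
git branch -vv                                 # main should show [origin/main]
git rev-parse --abbrev-ref 'main@{upstream}'   # prints the branch that main tracks
```

In a real clone, only the last three commands are needed; run them from the project folder.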

Maintenance Using RStudio Git Interface

Daily Maintenance Tasks

Pulling Latest Changes:

  1. Open the repository project in RStudio
  2. Click the Git tab in the upper-right pane
  3. Click the blue "Pull" button (down arrow)
  4. RStudio will display any changes that were downloaded

Checking Repository Status:

  • The Git tab shows all modified files
  • Blue "M" = Modified files
  • Green "A" = Added files
  • Red "D" = Deleted files
  • Yellow "?" = Untracked files

Committing Changes:

  1. In the Git tab, check the boxes next to files you want to commit
  2. Click "Commit" button
  3. In the commit window:
    • Review changes in the diff panel
    • Write a descriptive commit message
    • Click "Commit" button
  4. Close the commit window

Pushing Changes:

  1. After committing, click the green "Push" button (up arrow) in the Git tab
  2. RStudio will show the push progress
  3. Changes are now synchronized with GitHub
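
For reference, the buttons above correspond to a handful of Git commands. This is a minimal sketch that uses a throwaway bare repository as a stand-in for GitHub; the paths, file name, and commit messages are hypothetical.

```shell
set -eu

work="$(mktemp -d)"

# Setup: a bare repository standing in for GitHub, seeded with one commit.
git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/main
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.org"
echo "# project-name" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q "$work/remote.git" main

# The daily cycle, run inside a clone.
git clone -q "$work/remote.git" "$work/project-name"
cd "$work/project-name"
git config user.name "Demo User"
git config user.email "demo@example.org"

git pull -q                                # "Pull" button
echo 'message("hello")' > analysis.R
git status --short                         # status list in the Git tab
git add analysis.R                         # ticking the checkbox (staging)
git commit -q -m "Add analysis script"     # "Commit" button
git push -q                                # "Push" button
```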

Understanding Branching: Forking vs. Cloning vs. Branching

It's important to understand these three different workflows, as they are often confused:

Forking - Creating a complete copy of a repository under your own GitHub account. This is typically used when contributing to projects you don't have direct write access to (e.g., open-source projects). After forking, you clone your fork locally and open a pull request (PR) back to the original repository. A fork is a separate repository that does not update automatically when the original changes, so it must be synced by hand, which makes it more complex to manage.

Cloning - Creating a local copy of a repository on your computer. This is the act of downloading the repository code so you can work on it locally. Cloning doesn't create a copy on GitHub; it just downloads the existing repository to your machine.

Branching - Creating a separate line of development within the same repository. All team members with access to the repository can create branches directly without forking. Changes made on a branch don't affect the main codebase until they are merged through a pull request.

For KPHD workflows, we use branching for team collaboration. This approach is simpler to manage, requires less setup, and keeps all work organized within a single repository. Team members create feature branches for their work, submit pull requests (PRs) for review, and merge changes back to the main branch once approved. This avoids the complexity of managing forks and makes it easier for team members to stay synchronized.

Creating and Working with Feature Branches

When to Create a Branch

Create a new branch when you:

  • Are working on a new feature or improvement
  • Are fixing a bug
  • Are making updates to code for review
  • Don't want your changes to affect the main branch until they're ready

Creating a Branch in RStudio

Method 1: Using RStudio Terminal

  1. Open the Terminal in RStudio (the tab next to Console)

  2. Create and switch to a new branch using:

    git checkout -b branch-name
    

    Replace branch-name with a descriptive name (examples: fix-data-validation, add-new-analysis, update-documentation)

  3. Verify the branch was created: In the Git tab, you should see your new branch name displayed
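
On Git 2.23 or later, `git switch -c` is an equivalent command dedicated to creating and switching branches. A small sketch in a throwaway repository, using one of the example branch names above:

```shell
set -eu

work="$(mktemp -d)"
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.org"
echo "# demo" > README.md
git add README.md
git commit -q -m "Initial commit"

# Create the branch and switch to it
# (same effect as: git checkout -b fix-data-validation).
git switch -q -c fix-data-validation

# Confirm which branch is checked out.
git rev-parse --abbrev-ref HEAD
```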

Method 2: Using Git Interface (if available)

Some versions of RStudio may show a branch dropdown in the Git tab:

  1. Click the branch dropdown in the Git tab
  2. Click "New Branch"
  3. Enter the branch name and click Create

Branch Naming Conventions (from GitHub Guidance)

Use clear, descriptive names that indicate the purpose of your work:

  • fix-data-validation - for bug fixes
  • add-new-analysis - for new features
  • update-documentation - for documentation updates
  • improve-performance - for performance improvements
  • refactor-data-processing - for code refactoring

Avoid vague names like update, fix, test, or work-in-progress.
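
A naming rule like this can be checked before a branch is created. The helper below is a hypothetical sketch, not part of Git or RStudio: it rejects the vague names listed above, expects at least two words joined by hyphens, and then asks Git itself whether the name is a legal branch name.

```shell
# Hypothetical helper: returns 0 when the name follows the convention, 1 otherwise.
is_descriptive_branch_name() {
  name="$1"
  case "$name" in
    update|fix|test|work-in-progress) return 1 ;;   # too vague
  esac
  # Require at least two words joined by hyphens, e.g. fix-data-validation.
  printf '%s\n' "$name" | grep -Eq '^[A-Za-z0-9]+(-[A-Za-z0-9]+)+$' || return 1
  # Let Git confirm the name is a legal branch name.
  git check-ref-format --branch "$name" >/dev/null 2>&1
}

for candidate in fix-data-validation update Add_New_Analysis; do
  if is_descriptive_branch_name "$candidate"; then
    echo "$candidate: ok"
  else
    echo "$candidate: rename"
  fi
done
```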

Making Changes on Your Branch

  1. Make all your code changes in your local files
  2. Test your changes thoroughly to ensure they work correctly
  3. Save your work frequently
  4. Commit your changes regularly with descriptive messages (see Committing Changes section above)

Pushing Your Branch to GitHub

Once you've made commits on your branch:

  1. Click the green "Push" button in the Git tab
  2. RStudio will push your branch to GitHub
  3. You'll see a success message when complete
  4. Your branch is now visible on GitHub.com
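
A branch created in the Terminal has no upstream yet, so the first push normally needs `git push -u origin branch-name`; until an upstream exists, RStudio's Push button may be unavailable for that branch. This is a self-contained sketch, with a throwaway local repository standing in for GitHub and hypothetical paths and file names.

```shell
set -eu

work="$(mktemp -d)"

# Stand-in for the GitHub repository.
git init -q "$work/github-standin"
cd "$work/github-standin"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.org"
echo "# project-name" > README.md
git add README.md
git commit -q -m "Initial commit"

# Local clone with a new feature branch and one commit.
git clone -q "$work/github-standin" "$work/project-name"
cd "$work/project-name"
git config user.name "Demo User"
git config user.email "demo@example.org"
git checkout -q -b add-new-analysis
echo 'summary(cars)' > new-analysis.R
git add new-analysis.R
git commit -q -m "Add new analysis script"

# First push: -u records origin/add-new-analysis as the upstream,
# so later pushes and pulls on this branch need no arguments.
git push -q -u origin add-new-analysis

git rev-parse --abbrev-ref 'add-new-analysis@{upstream}'
```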

Pull Request Workflow

What is a Pull Request?

A pull request (PR) is a formal request to merge your branch into the main codebase. It allows team members to review your changes, provide feedback, suggest improvements, and ensure code quality before merging. Pull requests create a complete audit trail of all changes and decisions.

Creating a Pull Request

Step 1: Push Your Branch to GitHub

  1. Make sure all your changes are committed (see Committing Changes section above)
  2. Click the green "Push" button in RStudio's Git tab
  3. Your branch is now on GitHub

Step 2: Create the PR on GitHub.com

  1. Go to the repository on GitHub.com
  2. GitHub will show a yellow banner at the top saying something like:
    • "Your branch branch-name has recent pushes"
    • "Compare & pull request"
  3. Click "Compare & pull request"
  4. Alternatively, if the banner doesn't appear:
    • Go to the "Pull requests" tab at the top of the repository
    • Click the green "New pull request" button
    • Select your branch from the "compare" dropdown

Step 3: Fill in Pull Request Details (Guidance from GitHub)

When creating your PR, provide clear information:

Title: A concise summary of your changes

  • Examples: "Fix data validation errors", "Add new ci function", "Update README with new methods"

Description: Explain what you changed and why

  • What changes did you make?
  • Why were these changes necessary?
  • Are there any important details reviewers should know?
  • Include any relevant issue numbers (e.g., "Closes #45" or "Relates to #32")

Reviewers: Add team members who should review your code

  • Click "Reviewers" on the right side
  • Search for and select team members

Labels: Add tags to categorize your PR

  • Examples: "bug fix", "enhancement", "documentation", "review-needed"

Example PR Description:

## Summary
Fixed data validation errors in the data processing script that were causing the analysis to fail for datasets with missing values.

## Changes Made
- Added null value checks before data processing
- Improved error messages to indicate which fields have missing data
- Added unit tests to verify the fix works correctly

## Testing
- Tested with sample data containing various missing value patterns
- All existing tests continue to pass
- New tests added to prevent regression

## Related Issues
Closes #42

Step 4: Click "Create Pull Request"

Once you've filled in all the details, click the green "Create pull request" button.

Reviewing the PR on GitHub

For Authors: Monitor Your PR

Respond to Feedback:

  1. Check your PR regularly for comments from reviewers
  2. Read all comments and suggestions carefully
  3. Reply to comments to acknowledge feedback or ask clarifying questions

Making Requested Changes:

  1. Go back to your local RStudio project (the branch should still be checked out)
  2. Make the requested changes to your code files
  3. Save your changes
  4. Commit your changes with a descriptive message like:
    • "Address review feedback: improved error handling"
    • "Update documentation based on reviewer suggestions"
  5. Push your changes using the green Push button
  6. The changes will automatically appear in your PR on GitHub
  7. Leave a comment on the PR saying something like "Updates complete, ready for another review"

Request Additional Review: If you've made significant changes based on feedback:

  1. Go to your PR on GitHub
  2. Click "Re-request review" next to the reviewer's name (if they've already reviewed)
  3. Or add a new reviewer if needed

For Reviewers: Conducting a Code Review

Reviewing on GitHub.com:

  1. Go to the Pull requests tab and click on the PR to review
  2. Use the "Files changed" tab to review code:
    • Green lines = additions
    • Red lines = deletions
    • Hover over lines to add inline comments
  3. Click the blue "+" icon next to a line to comment on that specific line
  4. Type your comment and click "Add single comment" for individual comments or "Start a review" to batch multiple comments together
  5. Use the "Conversation" tab for general discussion and questions

Submitting Your Review:

  1. When you're done reviewing, click the "Review changes" button in the top-right
  2. Choose one of three options:
    • Comment: General feedback without approval
    • Approve: Changes look good and can be merged
    • Request changes: Issues need to be addressed before merging
  3. Add a summary comment if needed
  4. Click "Submit review"

Suggesting Specific Code Changes:

GitHub allows you to suggest exact code fixes:

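  1. Click the blue "+" icon next to a line to open a comment box, then click the "±" (suggestion) button in the comment toolbar
  2. Edit the code inside the suggestion block so it shows the exact change you're proposing
  3. The PR author can apply your suggestion with one click
  4. Several suggestions can be batched and applied together in a single commit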

For Repository Managers: Merging PRs

Before Merging:

  1. Ensure all required reviews are complete
  2. Verify all comments have been addressed
  3. Check that automated tests pass (if configured)
  4. Review the complete list of changes in the "Files changed" tab

Merging the PR:

  1. On the PR page, scroll to the bottom
  2. Click the green "Merge pull request" button
  3. Choose a merge strategy:
    • "Create a merge commit": Recommended for KPHD - preserves branch history
    • "Squash and merge": Combines all commits into one (for cleaner history)
    • "Rebase and merge": Linear history (rarely used for team projects)
  4. Add a commit message summarizing the merge
  5. Click "Confirm merge"

After Merging:

  1. Click "Delete branch" to clean up the feature branch
  2. This prevents branch clutter and keeps the repository organized
  3. Delete the branch locally in RStudio as well:
    • In Terminal: git branch -d branch-name
    • Or use the Git tab interface
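
The local cleanup can be sketched as follows in a throwaway repository. The merge is simulated locally with `git merge --no-ff` as a stand-in for the "Create a merge commit" option (an assumption for illustration; on GitHub the merge happens when the button is clicked).

```shell
set -eu

work="$(mktemp -d)"
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.org"
echo "# demo" > README.md
git add README.md
git commit -q -m "Initial commit"

# A finished feature branch with one commit.
git checkout -q -b update-documentation
echo "More detail." >> README.md
git commit -q -am "Update documentation"

# Simulate the merged PR.
git checkout -q main
git merge -q --no-ff -m "Merge update-documentation" update-documentation

# Clean up: -d refuses to delete a branch that has not been merged,
# which is the safe default.
git branch -d update-documentation

git branch --list
```

In a real clone, switch to main and run `git pull` before deleting the branch so the merge is present locally, and run `git fetch --prune` afterwards to drop the stale remote-tracking reference.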

Pull Request Best Practices

For Contributors:

  • Keep PRs focused on a single feature or fix (avoid mixing unrelated changes)
  • Write clear, descriptive commit messages
  • Test your code thoroughly before submitting
  • Respond promptly to reviewer feedback
  • Don't merge your own PR - wait for another reviewer to approve

For Reviewers:

  • Review PRs promptly to maintain workflow momentum
  • Provide constructive, specific feedback
  • Ask questions if something isn't clear
  • Suggest improvements but acknowledge good work
  • Test the code changes locally if they're complex
  • Use inline comments for specific technical feedback
  • Use PR conversation for broader architectural discussions

For All Team Members:

  • Use GitHub Issues to track bugs and feature requests
  • Link PRs to related issues using keywords (e.g., "Closes #45")
  • Include references to previous related PRs or commits
  • Keep PR descriptions up-to-date as work progresses

Code Review and Collaboration Process

For Epidemiologists Reviewing Code

Step 1: Clone to Personal H Drive Using RStudio

Why Review on Personal H Drive?

Reviewing code on your personal H drive (instead of the shared R drive) is crucial because:

  • Prevents conflicts: Multiple reviewers can work simultaneously without interfering with each other
  • Maintains shared repo integrity: The R drive version stays clean and stable during review processes
  • Enables safe experimentation: You can test changes, run code, and make modifications without affecting the main working version
  • Facilitates independent branching: Each reviewer can create their own branches and commits without coordination issues
  • Protects against accidental changes: Reduces risk of unintended modifications to the authoritative shared version

Clone Process Using RStudio:

  1. In RStudio: File > New Project > Version Control > Git
  2. Repository URL: Paste the GitHub URL of the repository you want to review
  3. Project directory name: Keep the default (repository name)
  4. Create project as subdirectory of: Browse to H:/GitHub_Reviews/
    • Create this folder if it doesn't exist
  5. Open in new session
  6. Click "Create Project"

What You'll See After Cloning:

  • New RStudio session opens with the repository
  • Git tab is active and ready to use
  • All project files are available in the Files tab
  • You're automatically on the main branch

Step 2: Review Workflow Using RStudio Git Interface

Always Start by Pulling Latest Changes:

  1. In your cloned repository, click the Git tab
  2. Click the blue "Pull" button to get the latest changes
  3. This establishes a good habit and prevents conflicts later

Create a Review Branch:

  1. In RStudio, open the Terminal tab (next to Console)
  2. Create and switch to a new branch:
    git checkout -b review-[your-initials]-[date]
    
    Example: git checkout -b review-JS-2025-09-26
  3. You'll see the branch name change in RStudio's Git tab
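
The date part of the branch name can be generated rather than typed. This is a sketch assuming a POSIX-style shell (such as Git Bash) and a hypothetical `initials` variable, run in a throwaway repository.

```shell
set -eu

work="$(mktemp -d)"
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.org"
echo "# demo" > README.md
git add README.md
git commit -q -m "Initial commit"

initials="JS"                                      # hypothetical reviewer initials
branch="review-${initials}-$(date +%Y-%m-%d)"      # e.g. review-JS-2025-09-26

git checkout -q -b "$branch"
git rev-parse --abbrev-ref HEAD                    # prints the new branch name
```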

Conduct Your Review:

  1. Read through the code files using RStudio's editor
  2. Run the scripts to test functionality
  3. Check outputs in the plots/console
  4. Make notes directly in the code as comments
  5. Test edge cases with different data if needed

Make Your Suggested Changes:

  1. Edit files directly in RStudio editor
  2. Add comments explaining your suggestions
  3. Save all changes
  4. In the Git tab, you'll see modified files marked with "M"

Step 3: GitHub Browser Interface for Code Review

GitHub provides powerful online review tools for real-time collaboration:

Browsing Repository Files Online

  1. Navigate to the repository on GitHub.com
  2. Browse folders and files by clicking through the file tree
  3. Click on any file to view its contents with syntax highlighting
  4. Press 't' to open the file finder, or 's' to jump to the search box

Adding Comments to Specific Code Lines

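  1. Open a pull request and go to its "Files changed" tab (inline comments attach to changes, not to the plain file view)
  2. Hover over the line you want to comment on
  3. A blue "+" appears - click it to add a comment
  4. Type your comment and click "Add single comment" or "Start a review"
  5. Single comments are visible immediately to all team members with access; comments added to a review become visible once the review is submitted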

Using the Issues Tab for Broader Discussions

  1. Click "Issues" tab at the top of the repository
  2. Click "New issue" to create a discussion topic
  3. Use templates if available, or write a custom issue
  4. Add labels like "bug", "enhancement", or "question"
  5. Assign team members in the "Assignees" sidebar, and use @mentions to notify others
  6. Reference specific files using file paths in your comments

Pull Request Review Interface

  1. Go to "Pull requests" tab when someone submits changes
  2. Click on a pull request to review it
  3. Use "Files changed" tab for line-by-line review:
    • Green lines = additions
    • Red lines = deletions
    • Hover over a line and click the blue "+" to add inline comments
  4. Use "Conversation" tab for general discussion
  5. Submit your review with one of three options:
    • Comment: General feedback without explicit approval
    • Approve: Changes look good to merge
    • Request changes: Issues need to be addressed first

Advanced GitHub Review Features

  • Suggest specific changes: Click the "±" (suggestion) button in a comment box to propose exact code modifications
  • Resolve conversations: Mark comment threads as resolved once addressed
  • Review requested changes: See what the author changed in response to feedback
  • Compare versions: Use the commit history to see how code evolved

Step 4: Submitting Your Review Using RStudio Git Interface

Commit Your Changes in RStudio:

  1. In the Git tab, review all your modified files
  2. Check the boxes next to files you want to commit (usually all of them)
  3. Click the "Commit" button
  4. In the commit window:
    • Review your changes in the diff panel (green = additions, red = deletions)
    • Write a descriptive commit message like:
      • "Review suggestions: improved data validation methods"
      • "Added error handling for missing data scenarios"
      • "Suggested more efficient data processing approach"
  5. Click the "Commit" button
  6. Close the commit window

Push Your Review Branch:

  1. Back in the Git tab, click the green "Push" button
  2. RStudio will push your review branch to GitHub
  3. You'll see a success message when complete

Create Pull Request Online:

  1. Go to the repository on GitHub.com
  2. GitHub will show a yellow banner saying "Your recently pushed branch"
  3. Click "Compare & pull request"
  4. Fill in the pull request details:
    • Title: Descriptive summary of your review
    • Description: Explain your suggested changes and rationale
    • Reviewers: Add team members using the right sidebar
    • Labels: Add tags like "review", "enhancement", etc.
  5. Click "Create pull request"

Engage in Discussion:

Once your pull request is created, team members can:

  • Comment on your overall suggestions
  • Reply to specific line comments you made
  • Suggest modifications to your suggestions
  • Approve or request changes to your review
  • Have threaded discussions on complex topics

This creates a complete audit trail of the review process and decisions made.

Best Practices

For All Team Members

  • Always work on feature branches, never directly on main/master
  • Use descriptive commit messages
  • Include documentation for all code changes
  • Test code thoroughly before submitting reviews
  • Engage actively in online GitHub discussions
  • Use @mentions to notify specific team members
  • Link to relevant issues using #issue-number

Communication Protocol

  • Use GitHub Issues for tracking bugs and feature requests
  • Use inline code comments for specific technical feedback
  • Use pull request discussions for broader code architecture conversations
  • Reference issue numbers in commit messages when applicable
  • Respond to comments promptly to maintain workflow momentum
  • Schedule team meetings for complex discussions that require real-time collaboration

Data Security and Privacy (Guidance from GitHub)

  • Never commit sensitive data or personally identifiable information
  • Use .gitignore files to exclude data files and sensitive configurations
  • Follow institutional data governance policies
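  • Regularly audit repository contents for compliance

Example .gitignore rules for R projects, added from the R console:

# Append ignore rules to the project's .gitignore.
# append = TRUE keeps the rules from GitHub's "R" template instead of overwriting them.
ignore_rules <- c(
  "# R files",
  "*.RData",
  "*.Rhistory",
  ".Rproj.user/",
  "",
  "# Data files",
  "data/*.csv",
  "data/*.xlsx",
  "data/sensitive/",
  "",
  "# Output",
  "output/*.png",
  "output/*.pdf"
)
write(ignore_rules, file = ".gitignore", append = TRUE)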
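
Whether an ignore rule actually catches a file can be checked with `git check-ignore`. The sketch below uses a throwaway repository with hypothetical file names. Note that .gitignore only affects untracked files, so a data file that was already committed must also be removed from tracking with `git rm --cached`.

```shell
set -eu

work="$(mktemp -d)"
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.org"
mkdir -p data scripts

# A data file that was committed by mistake (hypothetical example).
echo "id,value" > data/cases.csv
echo 'message("ok")' > scripts/clean.R
git add data/cases.csv scripts/clean.R
git commit -q -m "Initial commit"

# Add an ignore rule and commit it.
echo "data/*.csv" > .gitignore
git add .gitignore
git commit -q -m "Ignore CSV files in data/"

# -v shows which rule matches; --no-index tests the pattern even though
# the file is still tracked.
git check-ignore -v --no-index data/cases.csv

# Stop tracking the file but keep it on disk.
git rm -q --cached data/cases.csv
git commit -q -m "Stop tracking data/cases.csv"

git status --short      # prints nothing: the file is now ignored
git ls-files            # the CSV no longer appears in the tracked files
```

Removing a file from tracking does not erase it from earlier commits. If sensitive data was ever committed, contact the repository manager rather than relying on .gitignore alone.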

Troubleshooting Common Issues

Using RStudio Git Interface for Troubleshooting

Checking Repository Status

  1. Git tab shows current status at a glance:
    • Modified files appear with "M"
    • Untracked files show with "?"
    • Branch name displayed at top
  2. Click "Diff" to see exactly what changed in each file
  3. Use "History" button to see previous commits

Handling Merge Conflicts in RStudio

  1. Pull conflicts will show in the Git tab with "U" (unmerged)
  2. Click on conflicted file to open it in RStudio editor
  3. Look for conflict markers:
    <<<<<<< HEAD
    Your changes
    =======
    Their changes
    >>>>>>> branch-name
    
  4. Edit the file to resolve conflicts (remove markers, choose correct code)
  5. Save the file
  6. In Git tab, check the box next to the resolved file
  7. Click "Commit" to complete the merge
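
To see the markers first-hand, a conflict can be reproduced safely in a throwaway repository. This is a sketch with hypothetical file contents and branch names; it resolves the conflict by keeping one side.

```shell
set -eu

work="$(mktemp -d)"
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.org"
echo "threshold <- 5" > settings.R
git add settings.R
git commit -q -m "Initial commit"

# Two branches change the same line differently.
git checkout -q -b fix-threshold
echo "threshold <- 10" > settings.R
git commit -q -am "Raise threshold to 10"
git checkout -q main
echo "threshold <- 7" > settings.R
git commit -q -am "Raise threshold to 7"

# The merge stops with a conflict; "|| true" keeps the script running.
git merge fix-threshold >/dev/null 2>&1 || true

git diff --name-only --diff-filter=U    # lists the conflicted files
cat settings.R                          # shows the conflict markers

# Resolve: write the agreed content, stage it, and commit to finish the merge.
echo "threshold <- 10" > settings.R
git add settings.R
git commit -q -m "Merge fix-threshold: keep threshold of 10"
```

To back out instead of resolving, run `git merge --abort` before committing; this returns the working copy to its state before the merge started.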

When to Contact Repository Manager

  • Multiple unresolved conflicts across many files
  • Uncertainty about which version of code to keep
  • Push/pull errors related to permissions
  • Branch confusion or accidentally working on wrong branch

RStudio Git Interface Advantages

  • Visual diff viewer highlights added and removed lines
  • Point-and-click staging instead of command-line git add
  • Integrated conflict resolution within the familiar RStudio environment
  • One-click push/pull operations
  • Branch switching through dropdown menu (when available)

Merge Conflicts

  • Contact the repository manager before attempting to resolve any conflict you are unsure about (see "When to Contact Repository Manager" above)
  • Use RStudio's Git pane for visual merge conflict resolution
  • Document resolution decisions for future reference

Access Issues

  • Verify GitHub permissions with repository administrators
  • Ensure network access to both GitHub and shared drives
  • Contact IT support for drive access problems

Using Pull Requests Directly in R

  • Please refer to the README document for the pull request helper functions in the 'usethis' package (for example, pr_init(), pr_push(), and pr_finish())

Working with GitHub Repository Browser Interface

Making Changes Directly on GitHub

Sometimes you need to make quick changes directly in the GitHub web interface:

Editing Files Online

  1. Navigate to the file you want to edit in the repository
  2. Click the pencil icon (✏️) in the top-right corner of the file view
  3. Make your changes using GitHub's built-in editor
  4. Preview changes using the "Preview" tab if available
  5. Scroll down to "Commit changes" section:
    • Commit message: Write a descriptive message
    • Extended description: Add details if needed
    • Choose: "Create a new branch" (preferred, in line with the feature-branch rule in Best Practices) or "Commit directly to main"
  6. Click "Commit changes"

Creating New Files Online

  1. In the repository, click "Add file" > "Create new file"
  2. Name your file (include .R, .md, .txt extension as appropriate)
  3. Add content using the web editor
  4. Commit the file using the same process as editing

Uploading Files via Browser

  1. Click "Add file" > "Upload files"
  2. Drag and drop files or click "choose your files"
  3. Add commit message and commit changes

Reviewing Code in GitHub Browser

Exploring Repository Structure

  1. Main repository page shows all folders and files
  2. Click folders to navigate deeper into the project structure
  3. Click files to view their contents with syntax highlighting
  4. Use the "Go to file" button (keyboard shortcut: 't') to quickly find specific files

Understanding File History

  1. Click on any file to view it
  2. Click "History" button to see all changes made to that file
  3. Click on commits to see exactly what changed
  4. Compare versions by selecting different commits
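
The same history is available locally from the Terminal. A sketch in a throwaway repository, with a hypothetical file name and commit messages:

```shell
set -eu

work="$(mktemp -d)"
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.org"

# Two versions of the same file.
echo "x <- 1" > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"
echo "x <- 2" > analysis.R
git commit -q -am "Update starting value"

git log --oneline -- analysis.R        # one line per commit that touched the file
git diff HEAD~1 HEAD -- analysis.R     # what changed between the last two commits
git show HEAD~1:analysis.R             # the file as it was one commit ago
```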

Using Search and Navigation

  1. Search within repository: Use the search box at the top
  2. Search for specific code: Use GitHub's code search with qualifiers such as path:, language:, or symbol:
  3. Navigate by file type: Filter results with a path: qualifier (for example, path:*.R)