Overview

This repository is a template for a reproducible data analysis project or paper. The default example uses R, Quarto, Git, and GitHub, but the structure is workflow-first so projects can add Python, Julia, shell scripts, or other tools without major reorganization.

This template also includes lightweight guidance for AI-supported work. The goal is not to make AI do the project for you. The goal is to make AI tools useful for coding, documentation, review, and troubleshooting while keeping the project transparent, reproducible, and human-reviewed.

Pre-Requisites

The default example uses R, Quarto, Git, GitHub, and a reference manager that can handle BibTeX. Zotero with the Better BibTeX plugin is a good choice.

It is also useful to have a word processor installed, such as MS Word or LibreOffice. To produce PDF output, you need a TeX distribution. TinyTeX is one option; see the Quarto PDF instructions.

The example files use these R packages: broom, dplyr, ggplot2, here, knitr, readxl, skimr, and tidyr. Install them before running the example workflow:

install.packages(c("broom", "dplyr", "ggplot2", "here", "knitr",
                   "readxl", "skimr", "tidyr"))

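As an optional setup check, you can compare the packages listed above against what is already installed before downloading anything. The sketch below uses a hypothetical helper, check_packages(), which is not part of this template or of any package. It relies on base R's system.file(), which returns an empty string when a package cannot be found, so it detects missing packages without loading them. It only reports, which keeps installation a deliberate step.

```r
# Packages used by the example workflow, as listed in this readme
example_pkgs <- c("broom", "dplyr", "ggplot2", "here", "knitr",
                  "readxl", "skimr", "tidyr")

# Hypothetical helper (illustrative name): report packages that are not installed.
# system.file(package = p) returns "" when package p cannot be found,
# so this check does not load or attach any package.
check_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  missing_pkgs <- pkgs[!installed]
  if (length(missing_pkgs) > 0) {
    message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  } else {
    message("All required packages are installed.")
  }
  invisible(missing_pkgs)
}

missing_pkgs <- check_packages(example_pkgs)
# To install only the missing packages, run:
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

Because the check does not install anything, it is safe to run at the start of a session or in a setup script.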
Template Structure

The template comes with a folder structure and example files to show the kinds of content you would place in each folder. See the folder-specific readme files for more detail.

ai/: AI workflow notes; prompt templates for analysis planning, modeling review, code review, reproducibility audits, and final product review; a student AI policy; and a short AI-use log. See ai/readme-ai.md.
assets/: static non-code materials such as references, CSL files, PDFs, and manually created figures. See assets/readme-assets.md.
code/: analysis code organized by workflow stage. See code/readme-code.md.
data/: raw, processed, private, and large data folders. See data/readme-data.md.
products/: final or near-final deliverables such as reports, manuscripts, presentations, posters, and apps. See products/readme-products.md.
results/: outputs generated by code, such as figures, tables, and model summaries. See results/readme-results.md.

Important project-level files:

readme.md: this project overview.
usage.md: practical instructions for running and reproducing the project.
agents.md: extra instructions for AI coding assistants and collaborators using AI tools.

Naming Conventions

Use descriptive file and folder names. In general:

use lower-case names;
separate words with -;
avoid spaces, underscores, and CamelCase unless a standard file name or file extension requires otherwise.

For example, this template has code/analysis/statistical-analysis.r.

Readme files are named by folder context, such as readme-code.md, readme-data.md, and readme-exploration.md.
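The naming rules above are simple enough to check with a regular expression. The following is a rough sketch using a hypothetical helper, follows_naming_convention(), which is an illustrative name and not part of the template. It accepts lower-case words joined by single hyphens, optionally followed by lower-case extensions. It will also flag standard names such as .gitignore, so treat a FALSE result as a prompt to take a look, not as an error.

```r
# Hypothetical helper (illustrative name): TRUE if a file name follows the
# readme's conventions (lower-case, words separated by "-", no spaces,
# underscores, or CamelCase). Standard names like .gitignore will return FALSE.
follows_naming_convention <- function(path) {
  name <- basename(path)
  # One or more lower-case/digit words joined by single hyphens,
  # optionally followed by lower-case extensions such as ".r" or ".md".
  grepl("^[a-z0-9]+(-[a-z0-9]+)*(\\.[a-z0-9]+)*$", name)
}

files <- c(
  "code/analysis/statistical-analysis.r",
  "data/readme-data.md",
  "code/analysis/Statistical_Analysis.R",
  "results/my figure.png"
)
data.frame(file = files, ok = follows_naming_convention(files))
```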

GitHub, Sharing, And Sensitive Data

Git and GitHub are useful for tracking changes, backing up work, and sharing a project with collaborators. In Git/GitHub language, to commit a file means to add a recorded version of that file to the project history. Once a file is committed and pushed to GitHub, it can be difficult to fully remove from the history, especially if the repository has been shared.

For many class and research projects, it is wise to start with a private GitHub repository. Later, after checking the data, outputs, license, authorship, and project goals, the project team can decide whether the repository should remain private or become public.

Do not commit private, sensitive, regulated, identifiable, restricted, or license-protected data unless the project owner has explicitly approved that workflow. A private GitHub repository is not the same as a secure data system or an IRB/DUA-approved storage plan. If the repository stays private because it contains or depends on restricted material, document what can be committed and what must stay local or in an approved storage location.

To keep files out of Git, list them in .gitignore before committing them. For example, this template already ignores the contents of data/private-data/, data/large-files/, and results/large-files/, while keeping small readme or placeholder files in those folders. If a file has already been committed, adding it to .gitignore is not enough. You must also remove it from Git tracking while leaving the local file in place, for example with git rm --cached followed by the file path. This stops Git from tracking the file in future commits, but it does not erase the file from earlier commits. See GitHub's documentation on ignoring files for more detail.

Generated Outputs

Generated outputs are generally committed when they are reasonably small and do not contain sensitive information. That includes example processed data, figures, tables, rendered HTML files, and other outputs that help users see expected results and render products.

The main exceptions are:

large files that are too big for ordinary Git/GitHub use;
outputs that contain sensitive, private, regulated, identifiable, restricted, or license-protected information;
outputs that the project owner, instructor, collaborator, data provider, IRB, DUA, journal, or funder says should not be shared.

Use documented ignored locations such as data/private-data/, data/large-files/, or results/large-files/ for files that should not be committed. Commit a readme or placeholder explaining what belongs there, how the file is generated or obtained, and who is allowed to access it.

Software And Package Management

Document the software and package setup your project needs. The default example uses manually installed R packages because that is approachable for students and short projects.

For R projects, renv can help manage R packages and improve long-term reproducibility. This template does not enable renv by default because it adds complexity for new users and classroom settings.

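If you decide to use renv, commit the lockfile (renv.lock) and the files needed to activate the environment (.Rprofile, renv/activate.R, and renv/settings.json), but do not commit the local package library (renv/library/). renv normally writes its own renv/.gitignore that excludes the library.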

For Python, Julia, or other languages, document the chosen environment manager in this readme or another human-facing documentation file. Examples include virtual environments, Conda, Poetry, Julia project files, or containers. These are optional; use them when they solve a real project need.

Getting Started

This is a GitHub template repository. The best way to start a new project is to create a repository from this template.

For the example project, run the code in documented pieces. The workflow is reproducible because data processing, figure creation, table creation, and analysis are done by code rather than by undocumented manual edits to data files or figures. See usage.md for the run order, setup checks, optional example workflow runner, and product-rendering instructions.

AI-Supported Workflow

AI tools can help explain code, draft first-pass code, improve documentation, suggest checks, and review for reproducibility problems. They should not be treated as final authority for scientific claims, model choice, data privacy, citation accuracy, or interpretation of results.

When using AI tools:

Point the tool to readme.md, usage.md, agents.md, data/readme-data.md, and relevant files in ai/.
AI tools may use ai/project-summary.yml as a concise orientation aid. It is a secondary summary of information documented elsewhere, not the source of truth.
Do not paste sensitive, private, regulated, or identifiable data into external AI tools unless the project owner has explicitly approved that workflow.
Ask for small, reviewable changes.
Rerun affected scripts or rerender affected products after meaningful changes.
Follow ai/ai-policy-for-students.md for student-facing AI-use expectations.
Add a short entry to ai/ai-use-log.md for meaningful AI-assisted work when maintaining or updating the project.

AI-related files and expected readers/writers

Different files in ai/ have different audiences:

ai/prompts/: prompt templates are mainly read and copied by humans or AI tools, and may be edited by humans or AI maintainers when the workflow changes.
ai/project-summary.yml: a concise AI-readable summary of information already documented elsewhere. Humans usually do not need to edit it. AI assistants may read or update it when maintaining the project, but if it disagrees with the human-facing documentation, the human-facing documentation wins.
ai/readme-ai.md, ai/ai-policy-for-students.md, and ai/review-checklist.md: these should be read by humans and AI tools. Humans may edit them when project or course policy changes; AI may suggest or make updates when asked.
ai/ai-use-log.md: this is mainly a human-readable transparency record. Humans usually read it rather than editing it. AI assistants or project maintainers may add concise entries when meaningful AI-assisted work occurs.
Local AI/tool state folders such as .ai-local/, .ai-cache/, .codex/, and local Claude settings are not committed. AI tools may read and write those files for their own operation, and humans usually do not need to inspect them.

Project files outside ai/, such as readme.md, usage.md, data/readme-data.md, code/, results/, and products/, are human-facing project materials. Humans should be able to read and understand them. AI tools may help edit or review them, but important scientific, statistical, privacy, and interpretation choices need human review.

GitHub Actions and other automated workflows can be useful for advanced users. They are intentionally not enabled by default in this template because many users will be new to Git and GitHub.
