This repository is a template for a reproducible data analysis project or paper. The default example uses R, Quarto, Git, and GitHub, but the structure is workflow-first so projects can add Python, Julia, shell scripts, or other tools without major reorganization.
This template also includes lightweight guidance for AI-supported work. The goal is not to make AI do the project for you. The goal is to make AI tools useful for coding, documentation, review, and troubleshooting while keeping the project transparent, reproducible, and human-reviewed.
The default example uses R, Quarto, GitHub, and a reference manager that can handle BibTeX. Zotero with the Better BibTeX plugin is a good choice.
It is also useful to have a word processor installed, such as MS Word or LibreOffice. To produce PDF output, you need a TeX distribution. TinyTeX is one option; see the Quarto PDF instructions.
The example files use these R packages: broom, dplyr, ggplot2, here,
knitr, readxl, skimr, and tidyr. Install them before running the example
workflow:

    install.packages(c("broom", "dplyr", "ggplot2", "here", "knitr",
                       "readxl", "skimr", "tidyr"))

The template comes with a folder structure and example files to show the kinds of content you would place in each folder. See the folder-specific readme files for more detail.
- ai/: AI workflow notes, prompt templates for analysis planning, modeling review, code review, reproducibility audits, final product review, a student AI policy, and a short AI-use log. See ai/readme-ai.md.
- assets/: static non-code materials such as references, CSL files, PDFs, and manually created figures. See assets/readme-assets.md.
- code/: analysis code organized by workflow stage. See code/readme-code.md.
- data/: raw, processed, private, and large data folders. See data/readme-data.md.
- products/: final or near-final deliverables such as reports, manuscripts, presentations, posters, and apps. See products/readme-products.md.
- results/: outputs generated by code, such as figures, tables, and model summaries. See results/readme-results.md.
Important project-level files:
- readme.md: this project overview.
- usage.md: practical instructions for running and reproducing the project.
- agents.md: extra instructions for AI coding assistants and collaborators using AI tools.
Use descriptive file and folder names. In general:
- use lower-case names;
- separate words with hyphens (-);
- avoid spaces, underscores, and CamelCase unless a standard file name or file extension requires otherwise.
For example, this template has code/analysis/statistical-analysis.r.
Readme files are named by folder context, such as readme-code.md,
readme-data.md, and readme-exploration.md.
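
As a rough illustration of these naming rules, the sketch below flags names that are not lower-case and hyphen-separated. The function name check_names and the list of allowed exceptions are hypothetical and not part of the template; adjust the pattern to match your own conventions.

```r
# Hypothetical helper: return the paths whose file or folder name breaks the
# naming rules. The exceptions list is an assumption; extend it as needed.
check_names <- function(paths, exceptions = c(".gitignore", "LICENSE")) {
  names_only <- basename(paths)
  # Lower-case letters and digits, words separated by single hyphens,
  # followed by zero or more lower-case extensions such as .r or .tar.gz.
  pattern <- "^[a-z0-9]+(-[a-z0-9]+)*(\\.[a-z0-9]+)*$"
  ok <- grepl(pattern, names_only) | names_only %in% exceptions
  paths[!ok]
}

# The first path follows the rules; the other two are made-up counterexamples.
check_names(c(
  "code/analysis/statistical-analysis.r",
  "code/analysis/Statistical_Analysis.R",
  "data/raw data.xlsx"
))
```

The call returns the second and third paths, which use underscores, capital letters, or spaces.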
Git and GitHub are useful for tracking changes, backing up work, and sharing a project with collaborators. In Git/GitHub language, to commit a file means to add a recorded version of that file to the project history. Once a file is committed and pushed to GitHub, it can be difficult to fully remove from the history, especially if the repository has been shared.
For many class and research projects, it is wise to start with a private GitHub repository. Later, after checking the data, outputs, license, authorship, and project goals, the project team can decide whether the repository should remain private or become public.
Do not commit private, sensitive, regulated, identifiable, restricted, or license-protected data unless the project owner has explicitly approved that workflow. A private GitHub repository is not the same as a secure data system or an IRB/DUA-approved storage plan. If the repository stays private because it contains or depends on restricted material, document what can be committed and what must stay local or in an approved storage location.
To keep files out of Git, list them in .gitignore before committing them. For
example, this template already ignores the contents of data/private-data/,
data/large-files/, and results/large-files/, while keeping small readme or
placeholder files in those folders. If a file has already been committed, adding
it to .gitignore is not enough; it must also be removed from Git tracking while
leaving the local file in place. See GitHub's documentation on
ignoring files
for more detail.
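
A minimal sketch of the first step, listing a path in .gitignore before committing it, is shown below. The helper name add_to_gitignore is hypothetical and is not shipped with the template; it appends a pattern only if that pattern is not already listed.

```r
# Hypothetical helper: append a pattern to .gitignore if it is not there yet.
# Returns TRUE (invisibly) when a line was added and FALSE otherwise.
add_to_gitignore <- function(pattern, gitignore = ".gitignore") {
  existing <- if (file.exists(gitignore)) {
    readLines(gitignore, warn = FALSE)
  } else {
    character(0)
  }
  if (pattern %in% trimws(existing)) {
    return(invisible(FALSE))  # already listed, nothing to do
  }
  writeLines(c(existing, pattern), gitignore)
  invisible(TRUE)
}

# Example call with a made-up file name; run it from the project root.
# add_to_gitignore("data/processed-data/draft-export.csv")
```

For a file that is already tracked, the usual follow-up in a terminal is git rm --cached followed by the file path, which removes the file from Git tracking but leaves the local copy in place.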
Generated outputs are generally committed when they are reasonably small and do not contain sensitive information. That includes example processed data, figures, tables, rendered HTML files, and other outputs that help users see expected results and render products.
The main exceptions are:
- large files that are too big for ordinary Git/GitHub use;
- outputs that contain sensitive, private, regulated, identifiable, restricted, or license-protected information;
- outputs that the project owner, instructor, collaborator, data provider, IRB, DUA, journal, or funder says should not be shared.
Use documented ignored locations such as data/private-data/,
data/large-files/, or results/large-files/ for files that should not be
committed. Commit a readme or placeholder explaining what belongs there, how the
file is generated or obtained, and who is allowed to access it.
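
To spot outputs that may be too large for ordinary Git/GitHub use, a small check like the sketch below can help. The function name find_large_files and the 50 MB default are assumptions for illustration. GitHub warns about files larger than 50 MiB and blocks files larger than 100 MiB, but a project may prefer a stricter limit.

```r
# Hypothetical helper: list files under dir that are larger than max_mb.
find_large_files <- function(dir = ".", max_mb = 50) {
  # Hidden files and folders (such as .git) are skipped by default.
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  size_mb <- file.info(files)$size / 1024^2
  found <- data.frame(
    file = files,
    size_mb = round(size_mb, 1),
    stringsAsFactors = FALSE
  )
  found[size_mb > max_mb, ]
}

# Example: check generated outputs before committing them.
# find_large_files("results")
```

Anything the check reports is a candidate for one of the ignored locations described above.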
Document the software and package setup your project needs. The default example uses manually installed R packages because that is approachable for students and short projects.
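
One lightweight way to document and check a manual setup is a short script that lists the required packages and reports which ones are missing. The sketch below uses the package list from this readme; the function name missing_packages is hypothetical.

```r
# Packages used by the example files, as listed in this readme.
required <- c("broom", "dplyr", "ggplot2", "here", "knitr",
              "readxl", "skimr", "tidyr")

# Hypothetical helper: return the packages that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install them:
  # install.packages(to_install)
}
```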
For R projects, renv can help
manage R packages and improve long-term reproducibility. This template does not
enable renv by default because it adds complexity for new users and classroom
settings.
If you decide to use renv, commit the lockfile (renv.lock) and the files
needed to activate the environment, but do not commit the local package library.
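
If you do adopt renv, the basic cycle is to initialize the project once, snapshot after changing packages, and restore on a new machine. The wrapper below is a minimal sketch: the function name use_renv is hypothetical, while renv::init(), renv::snapshot(), and renv::restore() are the standard renv entry points.

```r
# Hypothetical wrapper around the three core renv steps.
use_renv <- function(action = c("init", "snapshot", "restore")) {
  action <- match.arg(action)
  if (!requireNamespace("renv", quietly = TRUE)) {
    stop("renv is not installed; run install.packages('renv') first.")
  }
  switch(action,
    init = renv::init(),         # set up renv and create renv.lock
    snapshot = renv::snapshot(), # record current package versions in renv.lock
    restore = renv::restore()    # install the versions recorded in renv.lock
  )
}

# Typical use, one step at a time from the project root:
# use_renv("init")
# use_renv("snapshot")
# use_renv("restore")
```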
For Python, Julia, or other languages, document the chosen environment manager in this readme or another human-facing documentation file. Examples include virtual environments, Conda, Poetry, Julia project files, or containers. These are optional; use them when they solve a real project need.
This is a GitHub template repository. The best way to start a new project is to create a repository from this template.
For the example project, run the code in documented pieces. The workflow is
reproducible because data processing, figure creation, table creation, and
analysis are done by code rather than by undocumented manual edits to data files
or figures. See usage.md for the run order, setup checks, optional example
workflow runner, and product-rendering instructions.
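
The idea of running the code in documented pieces can be sketched as a small runner that sources scripts in a fixed order and stops at the first problem. This is an illustration only: the function name run_in_order is hypothetical, and the actual run order and the template's own optional runner are described in usage.md.

```r
# Hypothetical runner: source each script in the order given.
run_in_order <- function(scripts) {
  for (script in scripts) {
    if (!file.exists(script)) {
      stop("Script not found: ", script)
    }
    message("Running ", script)
    # Each script gets a fresh environment so it does not pick up objects
    # left behind by an earlier script.
    source(script, local = new.env(parent = globalenv()))
  }
  invisible(scripts)
}

# Example call. Only this script path is named in this readme; add the other
# steps from usage.md in their documented order.
# run_in_order(c("code/analysis/statistical-analysis.r"))
```

Sourcing each script in its own environment is a design choice that pushes scripts to exchange results through saved files rather than through leftover objects in memory.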
AI tools can help explain code, draft first-pass code, improve documentation, suggest checks, and review for reproducibility problems. They should not be treated as final authority for scientific claims, model choice, data privacy, citation accuracy, or interpretation of results.
When using AI tools:
- Point the tool to readme.md, usage.md, agents.md, data/readme-data.md, and relevant files in ai/.
- AI tools may use ai/project-summary.yml as a concise orientation aid. It is a secondary summary of information documented elsewhere, not the source of truth.
- Do not paste sensitive, private, regulated, or identifiable data into external AI tools unless the project owner has explicitly approved that workflow.
- Ask for small, reviewable changes.
- Rerun affected scripts or rerender affected products after meaningful changes.
- Follow ai/ai-policy-for-students.md for student-facing AI-use expectations.
- Add a short entry to ai/ai-use-log.md for meaningful AI-assisted work when maintaining or updating the project.
Different files in ai/ have different audiences:
- ai/prompts/: prompt templates are mainly read and copied by humans or AI tools, and may be edited by humans or AI maintainers when the workflow changes.
- ai/project-summary.yml: a concise AI-readable summary of information already documented elsewhere. Humans usually do not need to edit it. AI assistants may read or update it when maintaining the project, but if it disagrees with the human-facing documentation, the human-facing documentation wins.
- ai/readme-ai.md, ai/ai-policy-for-students.md, and ai/review-checklist.md: these should be read by humans and AI tools. Humans may edit them when project or course policy changes; AI may suggest or make updates when asked.
- ai/ai-use-log.md: this is mainly a human-readable transparency record. Humans usually read it rather than editing it. AI assistants or project maintainers may add concise entries when meaningful AI-assisted work occurs.
- Local AI/tool state folders such as .ai-local/, .ai-cache/, .codex/, and local Claude settings are not committed. AI tools may read and write those files for their own operation, and humans usually do not need to inspect them.
Project files outside ai/, such as readme.md, usage.md, data/readme-data.md,
code/, results/, and products/, are human-facing project materials. Humans
should be able to read and understand them. AI tools may help edit or review
them, but important scientific, statistical, privacy, and interpretation choices
need human review.
GitHub Actions and other automated workflows can be useful for advanced users. They are intentionally not enabled by default in this template because many users will be new to Git and GitHub.