feat: initialize PKU-DFIIC data cleaning project #124

Open
QianbinDou wants to merge 1 commit into pedrohcgs:main from QianbinDou:feat/init-pku-dfiic-cleaning

@QianbinDou

Repurpose the template repo as a cleaning project for the PKU Digital Financial Inclusion Index of China (PKU-DFIIC):

  • Add Python exploration script (01_explore.py): reads 4 Excel sheets, generates province/city/county/regionalism .dta files with full diagnostic output
  • Add Stata cleaning scripts (02-04_clean_*.do): validation, labeling, xtset, and save for province (403 obs), city (4,369 obs), and county (26,090 obs)
  • Add cleaned output files: PKU_DFIIC_province/city/county.dta
  • Fix 3 Stata issues found during the first run:
    • assert does not accept message string arguments
    • r(unique_value) is overwritten by duplicates list; save it to a local first
    • Stata treats missing values as larger than any number (+inf); add !missing() to range checks
  • Update CLAUDE.md and README.md for new project context
  • Remove Beamer/LaTeX/R template scaffolding (Slides, Preambles, scripts, templates, etc.)
  • Update .gitignore: exclude macOS resource forks (._*), data_temp/, data_logs/
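
For reviewers who do not read Stata, the validation logic described above (a unique panel key before xtset, and range checks that skip missing values) can be sketched in plain Python. This only illustrates the idea and is not the code in this PR: the function names, the field names (`unit_id`, `year`, `index_value`), the bounds 0 to 500, and the sample rows are all hypothetical.

```python
import math
from collections import Counter


def is_missing(value):
    """Treat None and NaN as missing (a convention assumed for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def duplicate_keys(rows, key_fields):
    """Return panel keys (e.g. unit id + year) that occur more than once."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def out_of_range(rows, field, lo, hi):
    """Return rows whose non-missing value falls outside [lo, hi].

    Missing values are excluded explicitly, mirroring the !missing()
    guard added to the Stata range checks.
    """
    return [
        row for row in rows
        if not is_missing(row[field]) and not (lo <= row[field] <= hi)
    ]


# Made-up rows for illustration only; not values from the PKU-DFIIC data.
rows = [
    {"unit_id": 1, "year": 2011, "index_value": 50.0},
    {"unit_id": 1, "year": 2012, "index_value": None},
    {"unit_id": 2, "year": 2011, "index_value": -3.0},
    {"unit_id": 2, "year": 2011, "index_value": 70.0},
]

dups = duplicate_keys(rows, ["unit_id", "year"])      # repeated panel keys
bad = out_of_range(rows, "index_value", 0.0, 500.0)   # hypothetical bounds
n_missing = sum(is_missing(r["index_value"]) for r in rows)

print("duplicate keys:", dups)
print("out-of-range rows:", bad)
print("missing values:", n_missing)
```

Counting missing values separately from range violations keeps the two diagnostics distinct, so a missing observation is neither silently accepted nor reported as out of range.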

Summary

Type

  • Bug fix (fix/...)
  • New feature: skill / agent / rule / hook (feat/...)
  • Docs / guide / README (docs/...)
  • Chore / cleanup (chore/...)

Test plan

  • ./scripts/validate-setup.sh exits 0
  • python3 scripts/quality_score.py <changed-files> ≥ 80
  • /deep-audit finds no new inconsistencies
  • Manually exercised the changed skill/agent/hook on a real file
  • Updated both README.md and guide/workflow-guide.qmd if user-facing
  • Skill counts agree across CLAUDE.md, README.md, docs/index.html, and the guide

Generality (for new skills/agents/rules)

  • Works for ≥2 academic domains (not specific to one field)
  • No hardcoded paths, machine-specific config, or institutional branding
  • No project-specific examples in shared infrastructure

Notes for reviewer

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>