
COP: Epic: Develop Entry-level Data Science Projects for Fall New Joiners #269

@MissBrandyLea

Overview

We need to identify and develop 2–3 data science projects focused largely on spreadsheet analysis, so that newcomers from the Calbright program have meaningful work to start with as they grow their skills.

Updated Epic Action Items Draft

Planning / Alignment

  • Review existing HfLA Data Science project template, One Sheet expectations, and Epic examples
  • Clarify goals for the entry-level project pathway
    • Support new Calbright / beginner-level contributors
    • Provide meaningful spreadsheet-based data analysis work
    • Create a repeatable starter project structure that can be reused for future datasets
    • Help identify and mentor future peer reviewers, stage leads, and project leads
  • Draft initial proposal for starter project structure

    • Recommend starting with 2–3 spreadsheet-based projects
    • Recommend using the same full workflow across each dataset
    • Recommend adding SQL projects later after spreadsheet teams are stable
  • Review starter project proposal with Andrew

    • Confirm whether this approach fits the HfLA Data Science Community of Practice goals
    • Confirm preferred number of starter projects for initial launch
    • Confirm whether projects should be framed as labs, starter projects, or full HfLA projects
    • Confirm expectations for project ownership, lead review, and contributor onboarding
    • Confirm whether any datasets, civic topics, or partner priorities should be preferred or avoided

Dataset Research and Selection

  • Identify potential civic/public policy/DEI datasets suitable for beginner spreadsheet analysis

    • Prefer LA City, Los Angeles County, Greater LA, or California datasets
    • Prefer CSV, XLSX, Google Sheets, or easily exportable formats
    • Prefer datasets with approximately 300–10,000 records after filtering
    • Prefer datasets with enough categorical and numerical fields to support pivot tables, charts, and summary analysis
  • Add candidate data sources to Resources section

  • Evaluate candidate datasets for beginner suitability

    • Topic relevance
    • Spreadsheet performance
    • Data cleanliness / complexity
    • Ethical risk or sensitivity
    • Storytelling potential
    • Suitability for repeatable workflow
  • Select 2–3 datasets for initial spreadsheet starter projects

  • Document selected dataset sources, download dates, filters, and known limitations
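The size and field-type preferences in this section can be spot-checked before a dataset is shortlisted. The following is a minimal sketch, not an existing HfLA tool: the names `screen_dataset` and `is_number`, the `max_categories` threshold of 25, and the sample rows are all assumptions made for illustration. Only the 300–10,000 record range comes from this Epic.

```python
import csv
import io

MIN_ROWS, MAX_ROWS = 300, 10_000  # preferred record range stated in this Epic


def is_number(value):
    """Return True if a cell parses as a number (thousands separators allowed)."""
    try:
        float(value.replace(",", ""))
        return True
    except ValueError:
        return False


def screen_dataset(csv_text, max_categories=25):
    """Summarize row count and rough column types for a candidate CSV.

    max_categories is an assumed cutoff: a text column with at most this
    many distinct values is treated as usable for pivot tables.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    numeric, categorical = [], []
    for column in reader.fieldnames or []:
        # Collect the non-blank cells for this column.
        values = [(r.get(column) or "").strip() for r in rows]
        values = [v for v in values if v]
        if not values:
            continue
        if all(is_number(v) for v in values):
            numeric.append(column)
        elif len(set(values)) <= max_categories:
            categorical.append(column)
    return {
        "rows": len(rows),
        "size_ok": MIN_ROWS <= len(rows) <= MAX_ROWS,
        "numeric": numeric,
        "categorical": categorical,
    }


# Made-up sample rows, for illustration only.
sample = "neighborhood,request_type,days_open\nVenice,Graffiti,3\nBoyle Heights,Bulky Items,12\n"
print(screen_dataset(sample))
```

A check like this only covers size and field mix. Topic relevance, ethical risk, and storytelling potential still need a human review.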

Starter Project Template Development

  • Define standard workflow to be used across all starter spreadsheet projects

    • Project overview and stakeholder question
    • Dataset orientation
    • Data dictionary / column documentation
    • Data cleaning
    • Exploratory data analysis
    • Insight identification
    • Visualization / presentation planning
    • Final report or slide deck
    • Final documentation and project closeout
  • Create reusable project README / starter guide

  • Create reusable contributor onboarding instructions

  • Create reusable issue templates for beginner-friendly spreadsheet tasks

  • Create peer review checklists for common task types

    • Data dictionary review
    • Data cleaning review
    • Pivot table / summary table review
    • Chart review
    • Insight review
    • Final deliverable review
  • Create lead review checklist for final review before issue closure
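For the data dictionary step in the workflow above, a reusable template could start from a generated skeleton that contributors then fill in by hand. This is a sketch under stated assumptions: the function names `data_dictionary_skeleton` and `to_csv` and the four output columns are hypothetical and are not part of any existing HfLA template.

```python
import csv
import io

FIELDS = ["column", "examples", "blank_cells", "description"]  # assumed layout


def data_dictionary_skeleton(csv_text, sample_size=3):
    """Build one starter entry per column of the source CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for column in reader.fieldnames or []:
        values = [(r.get(column) or "").strip() for r in rows]
        filled = [v for v in values if v]
        # First distinct non-blank values, kept in file order.
        examples = list(dict.fromkeys(filled))[:sample_size]
        entries.append({
            "column": column,
            "examples": "; ".join(examples),
            "blank_cells": len(values) - len(filled),
            "description": "",  # left empty for the contributor to write
        })
    return entries


def to_csv(entries):
    """Render the entries as CSV text that can be imported into a spreadsheet."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return out.getvalue()
```

The generated file gives beginners a concrete starting point: the description column is the part that requires reading the source documentation, which is also what the data dictionary peer review would check.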

Project One Sheet Development

For each selected starter project:

  • Draft official project name
  • Write brief project summary
  • Define stakeholder or intended audience
  • Summarize project value add
  • Document project history, if applicable
  • Identify project partners, if applicable
  • Define tools to be used for analysis and visualization
  • Draft 6-month roadmap
  • Add project resources and dataset links
  • Review One Sheet draft with Andrew or Data Science leadership
  • Revise and finalize One Sheet

Issue Creation / Project Board Setup

For each selected starter project:

  • Create project Epic or parent issue, if needed

  • Create “Start Here” issue for new contributors

  • Create dataset orientation issue

  • Create data dictionary issue(s)

  • Create data cleaning issue(s)

  • Create exploratory data analysis issue(s)

  • Create visualization issue(s)

  • Create insight identification issue(s)

  • Create presentation/report issue(s)

  • Create final documentation issue(s)

  • Add labels, milestones, and project board status fields as appropriate

  • Confirm each issue has clear acceptance criteria

  • Confirm each issue can move through HfLA review flow:

    • Self-assigned
    • In progress
    • Peer review
    • Lead review
    • Closed
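The review flow above can be written down as an ordered list so that issue templates and board automation agree on the same sequence. This is an illustrative sketch only: `REVIEW_FLOW`, `next_status`, and `can_move` are hypothetical names, and the sketch assumes a strictly forward, one-step-at-a-time flow. Sending an issue back for rework after a failed review is not modeled.

```python
# Status order taken from the review flow listed in this Epic.
REVIEW_FLOW = ["Self-assigned", "In progress", "Peer review", "Lead review", "Closed"]


def next_status(current):
    """Return the status that follows `current`, or None once the issue is closed."""
    position = REVIEW_FLOW.index(current)  # raises ValueError for an unknown status
    if position + 1 < len(REVIEW_FLOW):
        return REVIEW_FLOW[position + 1]
    return None


def can_move(current, target):
    """Allow only a single step forward in the flow."""
    return next_status(current) == target
```

For example, `can_move("Peer review", "Lead review")` is allowed, while `can_move("In progress", "Closed")` is not, because it would skip both review stages.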

Contributor Onboarding and Leadership Pipeline

  • Define expectations for starter project contributors
  • Define expectations for peer reviewers
  • Define expectations for stage leads or emerging team leads
  • Create guidance for how contributors can move from beginner tasks into peer review or stage lead roles
  • Identify first possible peer reviewers / team leads during onboarding
  • Document mentoring process for new reviewers and future team leads
  • Create process for escalating blocked issues or unclear analysis questions

Launch / Pilot

  • Prepare starter project overview for first Data Science Community onboarding session
  • Present available starter projects to new contributors
  • Help contributors select or self-assign starter issues
  • Monitor initial issue flow and project board activity
  • Support peer review process
  • Complete lead review for issues that pass peer review
  • Track project bottlenecks, unclear instructions, or repeated beginner questions
  • Revise templates/issues based on pilot feedback

Evaluation and Next Steps

  • Review how well the initial starter projects supported new contributors
  • Identify which tasks worked well for beginners
  • Identify which tasks need clearer instructions or smaller issue breakdowns
  • Identify contributors ready to support peer review or stage leadership
  • Decide whether to launch additional spreadsheet projects
  • Decide when there is enough capacity to create a SQL starter project
  • Document lessons learned for future starter project creation

Resources/Instructions

Example Epics:
Issues #106, #107

One Sheet Resources:
One Sheet Template
How to create a One Sheet guide
See issue #3 for sample One Sheets

Data Dictionary Resources:
USGS Explanation
