|Prev|Index|Next|

Generating the Open-FF data set

The Open-FF data set is regenerated roughly every month to add new fracking disclosures and to incorporate any changes made to existing disclosures. The process is performed by the developers of Open-FF and is sponsored by the FracTracker Alliance.

The process has many steps, some automated and some manual. It is guided by a Jupyter notebook that includes instructions, code, and tests to validate the process at each step. The primary steps are:

Set up

  1. Download the materials needed:
    • the previous data repo
    • the external data sets used
    • a fresh FracFocus download

Curation

  1. Determine the disclosures that are new
  2. Search the fresh data for new CASNumbers; fetch authoritative data about them (SciFinder, CompTox)
  3. Search the fresh data for new IngredientNames; try to resolve each to an authoritative CASRN
  4. Assign a final bgCAS value to each new bgCAS:IngredientName pair
  5. Search for new company names and link them to other existing company names as appropriate
  6. Check geographic and location data - flag errors and curate any new counties
  7. Determine the carrier record(s) of every disclosure to facilitate mass calculations
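
Several of the curation steps above come down to finding values in the fresh download that were not in the previous data set. The following is a minimal sketch of that comparison, not the Open-FF notebook's actual code; the function name `find_new` and the sample identifiers are hypothetical and chosen only for illustration.

```python
def find_new(previous_values, fresh_values):
    """Return values found in the fresh download but not in the previous data set.

    Hypothetical helper: the real process may compare on other keys
    or normalize values (case, whitespace) before comparing.
    """
    return sorted(set(fresh_values) - set(previous_values))


# Made-up disclosure identifiers, for illustration only.
previous_ids = ["disc-001", "disc-002"]
fresh_ids = ["disc-001", "disc-002", "disc-003"]
new_ids = find_new(previous_ids, fresh_ids)

# The same comparison applied to CASNumber values.
previous_cas = ["7732-18-5", "14808-60-7"]
fresh_cas = ["7732-18-5", "67-56-1", "14808-60-7", "67-56-1"]
new_cas = find_new(previous_cas, fresh_cas)

print(new_ids)
print(new_cas)
```

Only the values returned by such a comparison would need manual curation, which keeps the monthly workload proportional to what has changed.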

Generation

  1. Search for duplicate disclosures and duplicate records; flag them
  2. Flag disclosures without chemicals
  3. Assemble chemical, disclosure, and company tables
  4. Apply external lists to chemical list
  5. Calculate mass where enough data is available
  6. Produce full data set
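
To illustrate why the carrier record matters for the mass calculation, here is one plausible sketch; it is not taken from the Open-FF code. It assumes the carrier is water, that the disclosure reports a water volume in gallons, and that each record carries a percentage of the total job mass. The function name `calc_masses`, its parameters, and the sample numbers are all hypothetical.

```python
WATER_LB_PER_GAL = 8.34  # approximate density of fresh water, an assumption


def calc_masses(water_volume_gal, carrier_percent, ingredient_percents):
    """Estimate ingredient masses (lb) from the carrier's volume and percentage.

    Returns None when there is not enough data to calculate mass.
    """
    if water_volume_gal is None or water_volume_gal <= 0:
        return None
    if carrier_percent is None or not (0 < carrier_percent <= 100):
        return None
    # Mass of the carrier, derived from its reported volume.
    carrier_mass = water_volume_gal * WATER_LB_PER_GAL
    # The carrier's share of the job gives the total mass of the job.
    total_mass = carrier_mass / (carrier_percent / 100)
    # Each ingredient's mass is its percentage of that total.
    return {name: total_mass * pct / 100
            for name, pct in ingredient_percents.items()}


# Made-up disclosure: 1,000,000 gal of water making up 83.4% of the job.
masses = calc_masses(1_000_000, 83.4, {"methanol": 0.01})
print(masses)
```

Returning `None` rather than a guess mirrors the condition in the step above: mass is calculated only where enough data is available.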

Post processing

  1. Perform dataset-wide integrity tests
  2. Detect and flag the set of documented "FF_issues"
  3. Construct a full data repository
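
A dataset-wide integrity test might look like the sketch below. The specific checks, the function name `check_integrity`, and the `disclosure_id` field are hypothetical; the actual tests in the notebook may differ.

```python
def check_integrity(disclosures, records):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    ids = [d["disclosure_id"] for d in disclosures]
    # Every disclosure should appear exactly once.
    if len(ids) != len(set(ids)):
        problems.append("duplicate disclosure ids")
    # Every chemical record should belong to a known disclosure.
    known = set(ids)
    for r in records:
        if r["disclosure_id"] not in known:
            problems.append(f"orphan record for {r['disclosure_id']}")
    return problems


# Made-up tables, for illustration only.
disclosures = [{"disclosure_id": "disc-001"}, {"disclosure_id": "disc-002"}]
records = [{"disclosure_id": "disc-001"}, {"disclosure_id": "disc-009"}]
print(check_integrity(disclosures, records))
```

Collecting all problems in a list, rather than stopping at the first failure, lets a single run report everything that needs attention.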

|Prev|Index|Next|