|Prev|Index|Next|

Generating the Open-FF data set

The Open-FF data set is regenerated roughly every month to add new fracking disclosures and to incorporate any changes made to existing disclosures. The process is performed by the developers of Open-FF and is sponsored by the FracTracker Alliance.

The process has many steps, some automated and some manual. It is guided by a Jupyter notebook that includes instructions, code, and tests that validate each step. The primary steps are:

Set up

Download the materials needed:
- the previous data repo
- the external data sets used
- a fresh FracFocus download

Curation

- Determine which disclosures are new
- Search the fresh data for new CASNumbers; fetch authoritative data about them (SciFinder, CompTox)
- Search the fresh data for new IngredientNames; try to resolve each to an authoritative CASRN
- Assign a final bgCAS value to each new CASNumber:IngredientName pair
- Search for new company names and link them to existing company names as appropriate
- Check geographic and location data: flag errors and curate any new counties
- Determine the carrier record(s) of every disclosure to facilitate mass calculations
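
The first curation steps amount to comparing the fresh download against the previous repository. The following is a minimal sketch of that comparison, not the notebook's actual code. `CASNumber` and `IngredientName` are the field names used above; `disclosure_id`, `find_new_items`, and the sample records are hypothetical and only illustrate the idea.

```python
# Illustrative sketch only: "disclosure_id" is an assumed key name,
# and the records below are made-up examples.
def find_new_items(previous, fresh):
    """Return the disclosure ids and CASNumber:IngredientName pairs
    that appear in `fresh` but not in `previous`."""
    prev_ids = {r["disclosure_id"] for r in previous}
    prev_pairs = {(r["CASNumber"], r["IngredientName"]) for r in previous}

    new_ids = {r["disclosure_id"] for r in fresh} - prev_ids
    new_pairs = {(r["CASNumber"], r["IngredientName"]) for r in fresh} - prev_pairs
    return new_ids, new_pairs


previous = [
    {"disclosure_id": "d1", "CASNumber": "7732-18-5", "IngredientName": "Water"},
]
fresh = previous + [
    {"disclosure_id": "d2", "CASNumber": "7732-18-5", "IngredientName": "Water"},
    {"disclosure_id": "d2", "CASNumber": "14808-60-7", "IngredientName": "Crystalline silica"},
]

new_ids, new_pairs = find_new_items(previous, fresh)
print(sorted(new_ids))    # disclosures that need curation
print(sorted(new_pairs))  # pairs that still need a bgCAS
```

Only the pairs returned here would need manual attention; pairs already curated in an earlier run keep their existing bgCAS.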

Generation

- Search for duplicate disclosures and duplicate records; flag them
- Flag disclosures without chemicals
- Assemble the chemical, disclosure, and company tables
- Apply external lists to the chemical list
- Calculate mass where enough data is available
- Produce the full data set
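
Two of the generation steps can be sketched in a few lines. This is an illustration under stated assumptions, not the Open-FF implementation: it assumes duplicates are identified by a shared well identifier and job end date, that the carrier is fresh water at about 8.34 lb/gal, and that percentages are mass percentages of the whole job. All function and field names here (`flag_duplicates`, `chemical_mass_lb`, `api_number`, `job_end_date`) are hypothetical.

```python
from collections import Counter

# Assumption: the carrier is fresh water, roughly 8.34 lb per US gallon.
WATER_LB_PER_GAL = 8.34


def flag_duplicates(disclosures):
    """Mark each disclosure that shares its (api_number, job_end_date)
    key with at least one other disclosure. Nothing is dropped."""
    counts = Counter((d["api_number"], d["job_end_date"]) for d in disclosures)
    return [
        {**d, "is_duplicate": counts[(d["api_number"], d["job_end_date"])] > 1}
        for d in disclosures
    ]


def chemical_mass_lb(carrier_gal, carrier_pct, chemical_pct):
    """Estimate a chemical's mass from the carrier volume and mass percentages.

    Returns None when there is not enough data to calculate a mass.
    """
    if not carrier_gal or not carrier_pct or chemical_pct is None:
        return None
    carrier_mass = carrier_gal * WATER_LB_PER_GAL
    total_mass = carrier_mass / (carrier_pct / 100)  # mass of the whole job
    return total_mass * chemical_pct / 100


flagged = flag_duplicates([
    {"api_number": "A1", "job_end_date": "2020-01-01"},
    {"api_number": "A1", "job_end_date": "2020-01-01"},
    {"api_number": "B2", "job_end_date": "2020-02-01"},
])
print([d["is_duplicate"] for d in flagged])
print(chemical_mass_lb(1_000_000, 80.0, 0.5))
print(chemical_mass_lb(None, 80.0, 0.5))
```

Returning `None` rather than zero keeps "mass unknown" distinct from "no mass", which matters when totals are summed later.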

Post processing

- Perform dataset-wide integrity tests
- Detect and flag the set of documented "FF_issues"
- Construct the full data repository
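
A dataset-wide integrity test can be as simple as a function that collects every problem it finds instead of stopping at the first one. The sketch below is hypothetical: the specific checks and the `record_id` field are assumptions chosen for illustration, and only `bgCAS` is a name taken from the steps above.

```python
def integrity_problems(records):
    """Return a list of human-readable problems found in `records`.

    An empty list means every check passed.
    """
    problems = []
    seen = set()
    for r in records:
        key = r["record_id"]  # assumed unique identifier for a record
        if key in seen:
            problems.append(f"duplicate record_id: {key}")
        seen.add(key)
        if not r.get("bgCAS"):
            problems.append(f"missing bgCAS: {key}")
    return problems


records = [
    {"record_id": "r1", "bgCAS": "7732-18-5"},
    {"record_id": "r2", "bgCAS": ""},
    {"record_id": "r1", "bgCAS": "7732-18-5"},
]
for problem in integrity_problems(records):
    print(problem)
```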
