Extracts key information from 1,025 historical NOAA Form 17-4 (Initial Report On Weather Modification Activities) PDF files using an LLM, and saves the extracted structured information to a CSV file.
All NOAA forms are publicly available for download at https://library.noaa.gov/weather-climate/weather-modification-project-reports.
- Download all files you wish to process and save them in `noaa-files/`
- Navigate to `code/`
- Install the required Python dependencies: `pip install -r requirements.txt`
- Obtain your own OpenAI and LLM Whisperer credentials and save your API keys in `.env`
- Use `python ./file-helpers/move-interim-final-files.py` and manual review to ensure the final Form 17-4 is the first page of each PDF.
- Run `python llm-extractor.py` to generate the dataset. This will take about 2.5 hours to process all NOAA files (~10-15 seconds per file).
- Run `python clean-dataset.py` to clean and standardize the dataset.
- View the generated dataset in `dataset/final/`
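
Because the extraction run is long, it can help to check the setup before starting it. The following is a minimal preflight sketch, not part of this repository: it counts the PDFs in the input folder, checks that `.env` contains non-empty API keys, and estimates the run time from a per-file duration. The key names `OPENAI_API_KEY` and `LLMWHISPERER_API_KEY`, the relative paths, and all function names are assumptions for illustration; adjust them to match what the scripts actually read.

```python
from pathlib import Path


def parse_env_file(path):
    """Parse KEY=VALUE lines from a .env-style file, skipping blanks and comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env, required):
    """Return the required keys that are absent or empty in env."""
    return [key for key in required if not env.get(key)]


def count_pdfs(folder):
    """Count PDF files directly inside folder (0 if the folder does not exist)."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(1 for p in folder.iterdir() if p.suffix.lower() == ".pdf")


def estimate_runtime_hours(n_files, seconds_per_file):
    """Estimate total processing time in hours."""
    return n_files * seconds_per_file / 3600


if __name__ == "__main__":
    # Hypothetical key names and locations; change these to match your setup.
    required = ["OPENAI_API_KEY", "LLMWHISPERER_API_KEY"]
    env_path = Path(".env")
    pdf_dir = Path("noaa-files")

    env = parse_env_file(env_path) if env_path.exists() else {}
    absent = missing_keys(env, required)
    if absent:
        print("Missing or empty keys in .env:", ", ".join(absent))

    n = count_pdfs(pdf_dir)
    low = estimate_runtime_hours(n, 10)
    high = estimate_runtime_hours(n, 15)
    print(f"{n} PDFs found; estimated run time {low:.1f} to {high:.1f} hours")
```

The estimate simply multiplies the file count by the per-file time, so it is also a quick way to gauge how long a smaller subset of forms would take.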