Problem
Currently in weather-sp, the file splitter skips processing an input file only when all its split children already exist. If even one of the split files is missing (for example, due to a partial failure or interrupted run), the splitter re-splits the entire file and overwrites all the previously created children.
This leads to redundant processing and unnecessary I/O, especially when re-running pipelines to fill in missing data.
Proposed Solution
Add functionality to skip the creation of individual split files if they already exist in the output location.
Instead of an all-or-nothing approach at the input file level, the splitter should:
- Identify which specific split files are expected to be produced.
- Check whether each expected file already exists in the output location.
- Only generate and upload the missing files, leaving the existing ones intact.
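
The three steps above could be sketched roughly as follows. This is a minimal illustration, not the actual weather-sp code: `find_missing_outputs`, `split_missing_only`, the `write_split` callback, and the injectable `exists` check are all hypothetical names, and the sketch assumes the list of expected output paths has already been computed for the input file.

```python
import os
from typing import Callable, Iterable, List


def find_missing_outputs(
    expected_paths: Iterable[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Return the expected split outputs that are not yet present, in order."""
    return [path for path in expected_paths if not exists(path)]


def split_missing_only(
    expected_paths: Iterable[str],
    write_split: Callable[[str], None],
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Write only the missing split files and return the paths written.

    `write_split` is a hypothetical callback that generates and uploads
    one split file; files that already exist are never passed to it.
    """
    missing = find_missing_outputs(expected_paths, exists)
    for path in missing:
        write_split(path)
    return missing
```

Passing `exists` as a parameter keeps the check independent of the storage backend: a local run can use the `os.path.exists` default, while a cloud run could supply whatever existence check the pipeline already uses. If `missing` is empty, the input file is skipped entirely, which matches the current behaviour.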