
WIP: Rebuild everything into a completely new pipeline, handling both SNVs and CNVs #19

Draft
ifokkema wants to merge 116 commits into master from rebuild-everything

Conversation

@ifokkema

Member

This is a work in progress; this PR has been created for code reviews.

ifokkema and others added 30 commits February 20, 2026 16:49
I ran into problems when I wanted to use this class for multiple
 files at the same time. So we'll have to use it like a normal
 object and a constructor, which receives the file name.
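
A minimal sketch of the change described above, written in Python purely for illustration (the pipeline's own language is not shown on this page). `DataFile` and `count_lines` are hypothetical names, not taken from the PR; the point is only that each object receives its file name in the constructor, so several files can be handled at the same time.

```python
class DataFile:
    """Hypothetical per-file object; not the project's actual class."""

    def __init__(self, file_name):
        # Each instance keeps its own file name, so instances share no state.
        self.file_name = file_name
        self.lines_read = 0

    def count_lines(self):
        # Count the lines of this instance's file only.
        with open(self.file_name, encoding="utf-8") as handle:
            self.lines_read = sum(1 for _ in handle)
        return self.lines_read
```

Two instances created for two different files keep independent counts, which is what a class with shared (static) state cannot offer.
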
The exit codes are re-used everywhere; better create them here and
 add them to the settings.
The settings file is not committed, so this needs to be done
 manually in the repo, configured to your needs.
It can be configured to also write to the screen at the same time.
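
Assuming this commit concerns a log writer, the behaviour could look roughly like the Python sketch below. `create_logger` and its `also_to_screen` flag are hypothetical names chosen for this illustration; the PR does not show the actual interface.

```python
import logging
import sys


def create_logger(log_file, also_to_screen=False):
    """Return a logger that writes to log_file and, optionally, to the screen."""
    logger = logging.getLogger("pipeline")
    logger.setLevel(logging.INFO)
    # Clearing avoids duplicate handlers when this function is called again.
    logger.handlers.clear()
    logger.propagate = False
    logger.addHandler(logging.FileHandler(log_file, encoding="utf-8"))
    if also_to_screen:
        # Every message is then written to the file and to stdout.
        logger.addHandler(logging.StreamHandler(sys.stdout))
    return logger
```
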
We'll fully automate interaction with the servers, so we'll need
 to have the SSH key passphrases. To make sure we can check them
 immediately when given, we should connect to the server, but
 that's overkill. So, instead, store the hash and compare. That
 doesn't guarantee that they will work, but once a working hash has
 been cached, we'll have a quick check.
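
The store-the-hash-and-compare idea can be sketched as follows, in Python for illustration. The choice of PBKDF2 with a salt is an assumption; the commit does not say which hash function is used, and `hash_passphrase` and `matches_cached_hash` are hypothetical names.

```python
import hashlib
import hmac


def hash_passphrase(passphrase, salt):
    # Assumption: PBKDF2-HMAC-SHA256; the commit does not name the hash used.
    digest = hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, 100_000)
    return digest.hex()


def matches_cached_hash(passphrase, salt, cached_hash):
    # Constant-time comparison of the fresh hash against the cached one.
    return hmac.compare_digest(hash_passphrase(passphrase, salt), cached_hash)
```

As the commit message notes, a match only shows that the passphrase equals the one cached earlier; it does not prove that the key still works on the server.
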
We'll need to store here the files that we need to create a
 release. That's better done per center, which means we need to
 rebuild how we store information on the centers in the settings.
This required me to add a feature to delete settings.
Also check for .gz files that we can decompress.
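
A hedged Python sketch of the `.gz` check, for illustration only. `decompress_gz_files` is a hypothetical helper, and skipping files that were already decompressed is an assumption rather than something the commit states.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz_files(directory):
    """Decompress every *.gz file in directory, next to the original file."""
    decompressed = []
    for gz_path in sorted(Path(directory).glob("*.gz")):
        target = gz_path.with_suffix("")  # "data.tsv.gz" becomes "data.tsv"
        if target.exists():
            continue  # Assumption: an existing decompressed file is left alone.
        with gzip.open(gz_path, "rb") as source, open(target, "wb") as out:
            shutil.copyfileobj(source, out)
        decompressed.append(target)
    return decompressed
```
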
I don't like those terms, but that's what they're called within the
 VKGL project.
We will no longer generate VCF, but use the given HGVS. The VCF was
 causing issues for inversions, which were mistranslated into WT
 variants. Using the HGVS directly also solves that.
We need this for the Radboud format.
Used the HGVS library to parse whatever value is in the
 transcripts_or_dna field, which is a mixed bag of transcripts,
 cDNA descriptions (with or without transcripts), genomic DNA
 descriptions, or protein descriptions.
ifokkema and others added 30 commits May 29, 2026 16:42
This requires the validator to store the statistics internally,
 after which the pipeline can retrieve and store them. This also
 removes code from the validator that loads the Settings class.
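
The division of labour described here, sketched in Python for illustration. The class layout, the `hgvs` field, and the acceptance rule are hypothetical placeholders; only the idea that the validator keeps its statistics internally and the pipeline retrieves them afterwards comes from the commit message.

```python
class Validator:
    """Hypothetical validator that knows nothing about the settings."""

    def __init__(self):
        self._statistics = {"accepted": 0, "rejected": 0}

    def validate(self, rows):
        for row in rows:
            # Placeholder rule: a row is accepted when it has an "hgvs" value.
            if row.get("hgvs"):
                self._statistics["accepted"] += 1
            else:
                self._statistics["rejected"] += 1

    def get_statistics(self):
        # Return a copy so the caller cannot change the internal counters.
        return dict(self._statistics)


def run_pipeline(rows, settings):
    validator = Validator()
    validator.validate(rows)
    # The pipeline, not the validator, decides where the statistics are stored.
    settings["statistics"] = validator.get_statistics()
    return settings
```
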
Also, don't do tricks with the directory names that only work for
 as long as we don't update the directory structure. Use the proper
 variables so it always works.
- Renamed to validateAggregatedData(), which follows naming
  guidelines and allows us later to use the Validator class for
  other validations, too.
- Fix or improve some comments, explaining better what we're
  actually doing right now.
- Instead of creating an array just to use count() on it a few
  times, better get the count right away.
- Use single-use variables only if it significantly shortens a
  line and improves readability.
- We don't need an else when all if()s and elseif()s die. Not using
  that else will reduce the indentation and improve readability.
- Instead of using arrays that then need to be counted, just
  increase a simple counter by one each time.
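
Two of the points above, the plain counter and the omitted else, can be sketched like this. The example is in Python for illustration, with `sys.exit()` standing in for a branch that dies; the function names and checks are hypothetical.

```python
import sys


def count_rejected(lines):
    # A simple counter, instead of collecting rejected lines in a list
    # only to count them afterwards.
    rejected = 0
    for line in lines:
        if not line.strip():
            rejected += 1
    return rejected


def check_input(lines):
    if not lines:
        sys.exit("No input lines given.")
    if count_rejected(lines) == len(lines):
        sys.exit("All input lines are empty.")
    # No else needed: both branches above exit, so this is the only way through.
    return len(lines) - count_rejected(lines)
```
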
We can simplify the formatting, as the newest HGVS library has new
 functionality that will help us later (e.g., the pter/qter
 recognition).
The logic has, with modifications, been migrated there.
Don't always use the result; only when it looks valid. Also fix
 some issues with the pter/qter to '?' replacement.
I couldn't figure out what some of the code did, so I had to remove it.
Overall, we ended up with more variants. That's in part because of
 the improvements in the HGVS library, but it also really seems that
 the old code sometimes rejected entries that were fine.

This new code also makes sure that rejected lines are actually
 logged, so they end up in the error log.
That way, if we have to re-run parts of the pipeline, they will
 simply get overwritten without accidentally mixing old and new data.

Labels

enhancement New feature or request

1 participant