WIP: Rebuild everything into a completely new pipeline, handling both SNVs and CNVs #19
Draft
ifokkema wants to merge 116 commits into
I ran into problems when I wanted to use this class for multiple files at the same time. So we'll have to use it as a normal object, with a constructor that receives the file name.
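As a rough illustration of that change, here is a minimal sketch in Python (not the pipeline's own code). The class name `LogFile` and its `write()` method are hypothetical; the only point is that each instance gets its file name through the constructor, so two files can be used side by side.

```python
import os
import tempfile


class LogFile:
    """Hypothetical per-file writer: each instance owns exactly one file."""

    def __init__(self, file_name):
        # The file name is passed to the constructor, so no state is shared
        # between instances (unlike class-level or static state).
        self.file_name = file_name

    def write(self, message):
        # Append one line to this instance's own file.
        with open(self.file_name, "a", encoding="utf-8") as handle:
            handle.write(message + "\n")


def demo_two_files(directory):
    """Write to two files at the same time and return their lines."""
    first = LogFile(os.path.join(directory, "first.log"))
    second = LogFile(os.path.join(directory, "second.log"))
    first.write("line for first")
    second.write("line for second")
    first.write("another line for first")
    contents = []
    for log in (first, second):
        with open(log.file_name, encoding="utf-8") as handle:
            contents.append(handle.read().splitlines())
    return contents
```

With static state, the second file would have replaced the first; with instances, the two do not interfere.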
The exit codes are re-used everywhere; better create them here and add them to the settings.
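A minimal sketch of the idea in Python, for illustration only. The code names and values below are made up; the real ones are not shown in this conversation. The codes are defined once and then exposed through the settings, so every script reads the same values.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Hypothetical exit codes; names and values are assumptions."""
    OK = 0
    ERROR_ARGS_INSUFFICIENT = 1
    ERROR_INPUT_NOT_READABLE = 2


def build_settings():
    # Add the codes to the settings under one key, so callers never
    # hard-code the numbers themselves.
    return {"exit_codes": {code.name: int(code) for code in ExitCode}}
```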
The settings file is not committed, so this needs to be done manually in the repo, configured to your needs.
It can be configured to also write to the screen at the same time.
We'll fully automate interaction with the servers, so we'll need to have the SSH key passphrases. To check a passphrase immediately when it is given, we could connect to the server, but that's overkill. So, instead, store a hash of the passphrase and compare against it. That doesn't guarantee that the passphrase will work, but once the hash of a working passphrase has been cached, we'll have a quick check.
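The store-and-compare check could look roughly like the Python sketch below. Everything in it is an assumption: the source does not say which hash function is used or how the hash is cached. This version uses a salted PBKDF2 hash from the standard library and a constant-time comparison; the function names are hypothetical.

```python
import hashlib
import hmac
import os


def hash_passphrase(passphrase, salt=None, iterations=200_000):
    """Return a cacheable record for a passphrase (assumed format)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for a new record
    digest = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations
    )
    return {"salt": salt.hex(), "iterations": iterations, "hash": digest.hex()}


def matches_cached_hash(passphrase, cached):
    """Quick local check: does this passphrase match the cached record?"""
    digest = hashlib.pbkdf2_hmac(
        "sha256",
        passphrase.encode("utf-8"),
        bytes.fromhex(cached["salt"]),
        cached["iterations"],
    )
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(digest.hex(), cached["hash"])
```

A match only shows that the same passphrase was entered before, not that the key still works on the server.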
We'll need to store here the files that we need to create a release. That's better done per center, which means we need to rebuild how we store information on the centers in the settings.
This required me to add a feature to delete settings.
Also check for .gz files that we can decompress.
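A minimal Python sketch of such a check, for illustration only. The function name and the choice to skip files that already have a decompressed copy are assumptions, not the pipeline's actual behavior.

```python
import gzip
import os
import shutil
import tempfile


def decompress_gz_files(directory):
    """Decompress each *.gz file in `directory` that has no plain copy yet."""
    created = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".gz"):
            continue
        source = os.path.join(directory, name)
        target = source[:-3]  # strip the ".gz" suffix
        if os.path.exists(target):
            continue  # assumed policy: never overwrite an existing file
        with gzip.open(source, "rb") as compressed, open(target, "wb") as plain:
            shutil.copyfileobj(compressed, plain)
        created.append(target)
    return created
```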
I don't like those terms, but that's what they're called within the VKGL project.
We will no longer generate VCF, but use the given HGVS. The VCF was causing issues for inversions, which were mistranslated into WT variants. This will also solve that.
We need this for the Radboud format.
Used the HGVS library to parse whatever value is in the transcripts_or_dna field, which is a mixed bag of transcripts, cDNA descriptions (with or without transcripts), genomic DNA descriptions, or protein descriptions.
This requires the validator to store the statistics internally, after which the pipeline can retrieve and store them. This also removes code from the validator that loads the Settings class. Also, don't do tricks with the directory names that only work for as long as we don't update the directory structure. Use the proper variables so it always works.
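The division of work described here can be sketched as follows, in Python for illustration. The statistic names, the `get_statistics()` accessor, and the placeholder validation rule (reject blank lines) are all assumptions; the point is only that the validator keeps its counts to itself and the pipeline decides where to store them.

```python
class Validator:
    """Hypothetical validator that knows nothing about the settings."""

    def __init__(self):
        self._statistics = {"lines_total": 0, "lines_rejected": 0}

    def validate_aggregated_data(self, lines):
        accepted = []
        for line in lines:
            self._statistics["lines_total"] += 1
            if not line.strip():  # placeholder rule, not the real check
                self._statistics["lines_rejected"] += 1
                continue
            accepted.append(line)
        return accepted

    def get_statistics(self):
        # Return a copy so callers cannot modify the internal counts.
        return dict(self._statistics)


def run_validation_step(lines, settings):
    """Pipeline side: run the validator, then store its statistics."""
    validator = Validator()
    accepted = validator.validate_aggregated_data(lines)
    settings["statistics"] = validator.get_statistics()
    return accepted
```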
- Renamed to validateAggregatedData(), which follows naming guidelines and allows us later to use the Validator class for other validations, too.
- Fix or improve some comments, explaining better what we're actually doing right now.
- Instead of creating an array just to use count() on it a few times, better get the count right away.
- Use single-use variables only if it significantly shortens a line and improves readability.
- We don't need an else when all if()s and elseif()s die. Not using that else will reduce the indentation and improve readability.
- Instead of using arrays that then need to be counted, just increase a simple counter by one each time.
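Both points can be shown in a small Python sketch. It is only an illustration with a hypothetical function and made-up checks; raising an exception stands in for die().

```python
def count_incomplete_rows(rows, expected_columns):
    """Count rows that have fewer columns than expected."""
    if not rows:
        raise ValueError("no rows given")
    elif expected_columns < 1:
        raise ValueError("expected_columns must be at least 1")

    # No else needed: every branch above stops execution, so the main
    # logic stays at the outer indentation level.
    incomplete = 0
    for row in rows:
        if len(row) < expected_columns:
            # Increase a simple counter instead of collecting the rows
            # in a list that is only ever counted.
            incomplete += 1
    return incomplete
```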
We can simplify the formatting, as the newest HGVS library has new functionality that will help us later (e.g., the pter/qter recognition).
The logic has, with modifications, been migrated there.
Don't always use the result; only when it looks valid. Also fix some issues with the pter/qter to '?' replacement.
Some of the code I couldn't figure out, so I had to remove it. Overall, we ended up with more variants. That's in part because of the improvements in the HGVS library, but it also really seems that the old code sometimes rejected entries that were fine. This new code also makes sure that rejected lines are actually logged, so they end up in the error log.
That way, if we have to re-run parts of the pipeline, they will simply get overwritten, without old and new output accidentally getting mixed.
This is a work in progress; this PR has been created for code reviews.