Skip to content

Rule map_sort sometimes fails for ema #73

@pontushojer

Description

@pontushojer

I have been having intermittent issues with runs failing at the map_sort step. This seems to be due to a formatting error in the ema align output that results in samtools sort failing. The formatting error is usually related to the cigar, see my issue arshajii/ema#39.

To solve this I have so far mapped manually with ema and the parsed the SAM output with this script found below. This simply skips any entries that are wrongly formatted. The I manually sort the output using samtools and name is initialmapping.bam to integrate it to the workflow again.

"""
Parse filter BAM/SAM
"""
import argparse
import pysam
from tqdm import tqdm


def main(args):
    in_mode = "rb" if args.input.endswith(".bam") else "r"
    out_mode = "wb" if args.output.endswith(".bam") else "w"
    with pysam.AlignmentFile(args.input, mode=in_mode) as infile:
        with pysam.AlignmentFile(args.output, mode=out_mode, template=infile) as outfile:
            progress = tqdm()
            parser = infile.fetch(until_eof=True)
            while True:
                try:
                    read = next(parser)
                except StopIteration:
                    break
                except OSError:
                    continue
                progress.update()
                outfile.write(read)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("input", help="Input SAM/BAM")
    parser.add_argument("output", help="Output SAM/BAM")
    main(parser.parse_args())

Issue copied from FrickTobias/BLR#235

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions