Support for CSI-Indexed VCF Files and Error in motifbreakR

Dear Author,

I am currently using the motifbreakR package to predict the impact of SNPs on motifs in wheat. While attempting to load SNP and Indel data from a VCF file into the snps.mb object using the variants.from.file function, I encountered an error. Although I verified that the index for my VCF file exists, it is in CSI format. This is because most wheat chromosomes exceed the length limitation of the TBI format. May I ask if future versions of the package could support CSI-indexed VCF files?

To address the import issue, I successfully loaded the VCF file into a GRanges object using the VariantAnnotation package. However, when running motifbreakR, I encountered a "subscript out of bounds" error. I have confirmed that the motifs object is in the correct MotifList format. Using your example code for human data, the motifDb successfully predicted disruptive SNPs. Additionally, the gr object in GRanges format has chromosome names consistent with the genome in my environment. I am using a custom-built genome, BSgenome.IWGSC.Custom, created with the forgeBSgenomeDataPkg function.

I would greatly appreciate your assistance and guidance on resolving this issue.

Thank you very much for your time and support!

Sincerely,
Chens

1. error

`r$> snps.mb <- variants.from.file("/home/chens/my_data/wheat_SNP/01jiaoyuling/MotifBreakR/Use-Break_prune.vcf.gz", search.genome = BSgenome.IWGSC.Custom, format = "vcf")
Error in open.TabixFile(VcfFile(file)) : 'indexname' must be character(1)`

2. My solution
`tabix -p vcf Use-Break_prune.vcf.gz
[E::hts_idx_push] Region 537353588..537353589 cannot be stored in a tbi index. Try using a csi index with min_shift = 14, n_lvls >= 6
tbx_index_build failed: Use-Break_prune.vcf.gz`

`library(VariantAnnotation)
vcf <- readVcf("/home/chens/my_data/wheat_SNP/01jiaoyuling/MotifBreakR/Use-Break_prune.Trans.vcf.gz", genome = "BSgenome.IWGSC.Custom")
gr <- rowRanges(vcf)`


3. r$> gr
`
GRanges object with 92505 ranges and 5 metadata columns:
               seqnames          ranges strand | paramRangeID            REF                ALT      QUAL      FILTER
                  <Rle>       <IRanges>  <Rle> |     <factor> <DNAStringSet> <DNAStringSetList> <numeric> <character>
     1_3383025    chr1A         3383025      * |           NA              G                  A        NA           .
     1_3383082    chr1A         3383082      * |           NA              G                  C        NA           . 
   seqinfo: 21 sequences from BSgenome.IWGSC.Custom genome`
 4. Final mistake
` 
r$> # run motifbreakR
    results <- motifbreakR(gr,
        motifs,
        threshold = 0.85,
        method = "ic",
        BPPARAM = BiocParallel::SerialParam()
    )
Error in as.character(package)[[1L]] : subscript out of bounds`
5. verify
`r$> class(motifs)
[1] "MotifList"
attr(,"package")
[1] "MotifDb"`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for CSI-Indexed VCF Files and Error in motifbreakR #56

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Support for CSI-Indexed VCF Files and Error in motifbreakR #56

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions