Dear Author,
I am currently using the motifbreakR package to predict the impact of SNPs on motifs in wheat. While attempting to load SNP and Indel data from a VCF file into the snps.mb object using the variants.from.file function, I encountered an error. Although I verified that the index for my VCF file exists, it is in CSI format. This is because most wheat chromosomes exceed the length limitation of the TBI format. May I ask if future versions of the package could support CSI-indexed VCF files?
To address the import issue, I successfully loaded the VCF file into a GRanges object using the VariantAnnotation package. However, when running motifbreakR, I encountered a "subscript out of bounds" error. I have confirmed that the motifs object is in the correct MotifList format. Using your example code for human data, the motifDb successfully predicted disruptive SNPs. Additionally, the gr object in GRanges format has chromosome names consistent with the genome in my environment. I am using a custom-built genome, BSgenome.IWGSC.Custom, created with the forgeBSgenomeDataPkg function.
I would greatly appreciate your assistance and guidance on resolving this issue.
Thank you very much for your time and support!
Sincerely,
Chens
- error
r$> snps.mb <- variants.from.file("/home/chens/my_data/wheat_SNP/01jiaoyuling/MotifBreakR/Use-Break_prune.vcf.gz", search.genome = BSgenome.IWGSC.Custom, format = "vcf") Error in open.TabixFile(VcfFile(file)) : 'indexname' must be character(1)
- My solution
tabix -p vcf Use-Break_prune.vcf.gz [E::hts_idx_push] Region 537353588..537353589 cannot be stored in a tbi index. Try using a csi index with min_shift = 14, n_lvls >= 6 tbx_index_build failed: Use-Break_prune.vcf.gz
library(VariantAnnotation) vcf <- readVcf("/home/chens/my_data/wheat_SNP/01jiaoyuling/MotifBreakR/Use-Break_prune.Trans.vcf.gz", genome = "BSgenome.IWGSC.Custom") gr <- rowRanges(vcf)
- r$> gr
GRanges object with 92505 ranges and 5 metadata columns: seqnames ranges strand | paramRangeID REF ALT QUAL FILTER <Rle> <IRanges> <Rle> | <factor> <DNAStringSet> <DNAStringSetList> <numeric> <character> 1_3383025 chr1A 3383025 * | NA G A NA . 1_3383082 chr1A 3383082 * | NA G C NA . seqinfo: 21 sequences from BSgenome.IWGSC.Custom genome
- Final mistake
r$> # run motifbreakR results <- motifbreakR(gr, motifs, threshold = 0.85, method = "ic", BPPARAM = BiocParallel::SerialParam() ) Error in as.character(package)[[1L]] : subscript out of bounds
- verify
r$> class(motifs) [1] "MotifList" attr(,"package") [1] "MotifDb"
Dear Author,
I am currently using the motifbreakR package to predict the impact of SNPs on motifs in wheat. While attempting to load SNP and Indel data from a VCF file into the snps.mb object using the variants.from.file function, I encountered an error. Although I verified that the index for my VCF file exists, it is in CSI format. This is because most wheat chromosomes exceed the length limitation of the TBI format. May I ask if future versions of the package could support CSI-indexed VCF files?
To address the import issue, I successfully loaded the VCF file into a GRanges object using the VariantAnnotation package. However, when running motifbreakR, I encountered a "subscript out of bounds" error. I have confirmed that the motifs object is in the correct MotifList format. Using your example code for human data, the motifDb successfully predicted disruptive SNPs. Additionally, the gr object in GRanges format has chromosome names consistent with the genome in my environment. I am using a custom-built genome, BSgenome.IWGSC.Custom, created with the forgeBSgenomeDataPkg function.
I would greatly appreciate your assistance and guidance on resolving this issue.
Thank you very much for your time and support!
Sincerely,
Chens
r$> snps.mb <- variants.from.file("/home/chens/my_data/wheat_SNP/01jiaoyuling/MotifBreakR/Use-Break_prune.vcf.gz", search.genome = BSgenome.IWGSC.Custom, format = "vcf") Error in open.TabixFile(VcfFile(file)) : 'indexname' must be character(1)tabix -p vcf Use-Break_prune.vcf.gz [E::hts_idx_push] Region 537353588..537353589 cannot be stored in a tbi index. Try using a csi index with min_shift = 14, n_lvls >= 6 tbx_index_build failed: Use-Break_prune.vcf.gzlibrary(VariantAnnotation) vcf <- readVcf("/home/chens/my_data/wheat_SNP/01jiaoyuling/MotifBreakR/Use-Break_prune.Trans.vcf.gz", genome = "BSgenome.IWGSC.Custom") gr <- rowRanges(vcf)GRanges object with 92505 ranges and 5 metadata columns: seqnames ranges strand | paramRangeID REF ALT QUAL FILTER <Rle> <IRanges> <Rle> | <factor> <DNAStringSet> <DNAStringSetList> <numeric> <character> 1_3383025 chr1A 3383025 * | NA G A NA . 1_3383082 chr1A 3383082 * | NA G C NA . seqinfo: 21 sequences from BSgenome.IWGSC.Custom genomer$> # run motifbreakR results <- motifbreakR(gr, motifs, threshold = 0.85, method = "ic", BPPARAM = BiocParallel::SerialParam() ) Error in as.character(package)[[1L]] : subscript out of boundsr$> class(motifs) [1] "MotifList" attr(,"package") [1] "MotifDb"