ghyao/GAA
Folders and files
| Name | Name | Last commit date | ||
|---|---|---|---|---|
Repository files navigation
NAME
====
GAA - Graph Accordance of Assemly
VERSION
=======
Version 1.0
LICENCE
=======
Copyright 2011 - Guohui Yao - Washington University in St. Louis.
gyao at wustl dot edu
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
MA 02110-1301, USA.
INTRODUCTION
============
No individual assembly algorithm addresses all the known limitations
of assembling short length sequences. Overall reduced sequence contig
length is the major problem that challenges the usage of these assemblies.
GAA is designed to take advantages of different assembly algorithms or
sequencing platforms to improve the quality of next-generation sequence
(NGS) assemblies.
PREREQUISITES
=============
Perl => 5.8.9
Sequence aligners: BLAT GSMapper
INSTALLATION
====
The program is tested under LINUX systems. Below command could install
the GAA package.
export PERL5LIB=$PERL5LIB:path_to_the_gaa_package
COMMAND LINE
============
Run gaa.pl without any parameter, it will print below usage on the screen.
-.pl --target|-t <target.fa> : target consensus file, the contig names should be in the format of >Contig1.1
--query|-q <query.fa> : query consensus file, the contig names should be in the format of >Contig1.1
The following options are optional.
--target-qual|-T <target.qual> : target quality file
--query-qual|-Q <query.qual> : query quality file
--target_gap|-g <target gap file> : e.g. gap.txt for Newbler
--query_gap|-G <query gap file> : e.g. gap.txt for Newbler
--match|-m <alignment.psl> : blat target.fa query.fa -fastMap query_vs_target.psl
--pair_status_target|r <454PairStatus.txt> : generated from GSMapper
--pair_status_query|R <454PairStatus.txt> : generated from GSMapper
--ce <ce.psl> : ce status psl file
--output-dir|-o <output_dir=pwd> : default is merged_assembly under current directory
--tip-size|-p <tip_size=90> : tip tolerance
--score-cutoff-lower|-l <score_cutoff_lower=30> : confident unique contig, if score < 30
--score-cutoff-upper|-u <score-cutoff_upper=100> : confident match, if score > 100
--contig-size-cutoff|-s <contig_size_cutoff=100> : keep unique contigs of length > 100
Using -m option would shortcut the alignment step if alignment.psl alread exists.
EXAMPLES
========
$ ./gaa.pl -t newbler.fa -q cabog.fa
Merge newbler.fa and cabog.fa, and output newbler.fa_n_cabog.fa as the merged assembly to:
$ ./gaa.pl -t newbler.fa -q cabog.fa -T newbler.qual -Q cabog.qual
Also generate a quality file (newbler.qual_n_cabog.qual) for the merged assembly.
$ ./gaa.pl -t newbler.fa -q cabog.fa -t newbler.gap
Generate the gap file (g.gap) for merged assembly.
$ head target.fa
>Contig0.1
AAGgAAGCAAATtAtAAAaCtAaaGCATTAAAGATATACACATAAAAATGAAGTACTAAA
TTTATCAAGACAAAAaGCCCTGTCCAAATTGTACTATTGTCCAACTTGTACCACTTTACC
>Contig0.2
CCCTAAATTGAGAAAAaTAACTTTAGGGAATATATCAATGTACAGTGCCCGCTTCGTTAT
TCGGGAGAGAAATTCGAAAAAGTTAAAATAACATTCCGGAAATGTTAATTTTACCCTGCA
>Contig0.3
GTATTGATCCAAAATCGGTGTAAATATTACCCTTTTTAGGTGTATTAGGTGTTAAATGTA
TCCTTTTTCATGTTAATTTTACCCTTAATAAGGTGTAAAATTAACATTAAAAAATGTTGA
TATATTTTTACACCTAAAAAGTGTTAAATTTCTGAGGAAAAATGTTAATCGCACCCTCTT
$ head target.qual
>Contig0.1
64 64 64 26 64 64 64 64 64 64 50 64 26 64 16 64 64 64 22 64 39 64
39 39 64 64 55 64 64 64 64 64 64 64 64 64 64 64 64 48 64 64 64 64
64 64 64 49 45 64 64 64 64 64 64 64 64 64 64 32 64 64 64 64 64 64
$ head gap.txt
Contig0.1 20
Contig0.2 811
Contig0.3 932
Contig0.4 876
Contig0.5 1927
Contig0.6 1730
Contig0.7 20
Contig1.1 171
Contig1.2 402
Contig1.3 1382
$ head 454PairStatus.txt
Template Status Distance Left Accno Left Pos Left Dir Right Accno Right Pos Right Dir Left Distance Right Distance
FC3FKPP01BUVA8 TruePair 3967 Contig18.13 24188 - Contig18.13 20221 +
FC3FKPP01BTJNG TruePair 3820 Contig23.11 44530 + Contig23.11 48350 -
FC3FKPP01AFDGW TruePair 2498 Contig41.19 5103 - Contig41.19 2605 +
FC3FKPP01CEJ64 TruePair 2498 Contig41.19 5103 - Contig41.19 2605 +
FC3FKPP01E3K7I TruePair 3735 Contig117.3 98721 - Contig117.3 94986 +
FC3FKPP01A7AS8 FalsePair - Contig129.5 2438 + Contig129.6 1986 - 276 1986
FC3FKPP01BJX6Q FalsePair - Contig133.3 28972 + Contig140.4 1428 - 5803 1428
FC3FKPP01AWH9A TruePair 2636 Contig40.10 122575 + Contig40.10 125211 -
FC3FKPP01DCED4 MultiplyMapped - Contig18.9 1244 - Repeat
CONTACT
=======
Guohui Yao <gyao at wustl dot edu>