Scaffolding |
ABySS |
- Paired-end scaffolding.
- Scaffolding is already integrated into the ABySS de novo assembly pipeline.
- Uses the distance estimates generated by DistanceEst (a program from the same package) as input.
- Supports scaffolding with long reads, such as those generated by the PacBio and Oxford Nanopore platforms.
|
boost libraries:
www.boost.org/
Open MPI:
http://www.open-mpi.org
sparse-hash library:
http://goog-sparsehash.sourceforge.net/
|
(Simpson et al., 2009Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM and Birol I (2009) ABySS: A parallel assembler for short read sequence data. Genome Res 19:1117-23.) |
http://www.bcgsc.ca/platform/bioinfo/software/abyss
|
Scaffolding |
Bambus 2 |
- Paired-end scaffolding.
- Can be easily integrated with assembly projects that are built on top of the AMOS package.
- Supports the scaffolding of metagenomes.
- Requires experience with the AMOS package and its data formats.
|
AMOS package (Treangen et al., 2011Treangen TJ, Sommer DD, Angly FE, Koren S and Pop M (2011) Next generation sequence assembly with AMOS. Curr Protoc Bioinformatics 33:11.8.1-11.8.18.):
http://amos.sourceforge.net/
|
(Koren et al., 2011Koren S, Treangen TJ and Pop M (2011) Bambus 2: Scaffolding metagenomes. Bioinformatics 27:2964-2971.) |
https://sourceforge.net/projects/amos/
|
Scaffolding |
MIP |
|
lpsolve library:
http://sourceforge.net/projects/lpsolve/
lemon library:
http://lemon.cs.elte.hu/
|
(Salmela et al., 2011Salmela L, Mäkinen V, Välimäki N, Ylinen J and Ukkonen E (2011) Fast scaffolding with small independent mixed integer programs. Bioinformatics 27:3259-3265.) |
https://www.cs.helsinki.fi/u/lmsalmel/mip-scaffolder/
|
Scaffolding |
OPERA |
- Paired-end scaffolding.
- Identifies potentially spurious connections caused by chimeric reads and repetitive genomic elements, which may affect the reliability of the scaffolding.
- Contigs identified as misassembled may be used in the construction of more than one scaffold, although this can sometimes introduce new assembly errors.
|
BWA (Li and Durbin, 2009Li H and Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754-1760.):
http://bio-bwa.sourceforge.net/
Bowtie (Langmead et al., 2009Langmead B, Trapnell C, Pop M and Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25.):
http://bowtie-bio.sourceforge.net/
Samtools (Li et al., 2009Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G and Durbin R (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078-2079.):
http://samtools.sourceforge.net/
|
(Gao et al., 2011Gao S, Sung W-K and Nagarajan N (2011) Opera: Reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. J Comput Biol 18:1681-1691.) |
https://sourceforge.net/projects/operasf
|
Scaffolding |
SCARPA |
|
None |
(Donmez and Brudno, 2013Donmez N and Brudno M (2013) SCARPA: Scaffolding reads with practical algorithms. Bioinformatics 29:428-434.) |
http://compbio.cs.toronto.edu/hapsembler/scarpa.html
|
Scaffolding |
SGA |
- Paired-end scaffolding.
- Scaffolding is already integrated into the SGA assembly pipeline, which is optimized for Illumina data and large genomes.
- Uses the distance estimates generated by DistanceEst (from the ABySS package) as input, along with the read mapping file in BAM format.
- Allows multiple libraries to be used in the same scaffolding project.
|
Bamtools (Barnett et al., 2011Barnett DW, Garrison EK, Quinlan AR, Stromberg MP and Marth GT (2011) BamTools: A C++ API and toolkit for analyzing and managing BAM files. Bioinformatics 27:1691-1692.):
https://github.com/pezmaster31/bamtools
BWA (Li and Durbin, 2009Li H and Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754-1760.):
http://bio-bwa.sourceforge.net/
Samtools (Li et al., 2009Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G and Durbin R (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078-2079.):
http://samtools.sourceforge.net/
Sparse-hash library:
http://goog-sparsehash.sourceforge.net/
|
(Simpson and Durbin, 2012Simpson JT and Durbin R (2012) Efficient de novo assembly of large genomes using compressed data structures. Genome Res 22:549-556.) |
https://github.com/jts/sga
|
Scaffolding |
SOPRA |
- Paired-end scaffolding.
- Developed to improve assemblies generated by Velvet and SSAKE; requires their .AFG files.
- Supports data from early Illumina and ABI SOLiD platforms, including paired-end and mate-pair reads.
- Not fully automated: a different script must be run for each step of the scaffolding.
|
None |
(Dayarian et al., 2010Dayarian A, Michael TP and Sengupta AM (2010) SOPRA: Scaffolding algorithm for paired reads via statistical optimization. BMC Bioinformatics 11:345.) |
http://www.physics.rutgers.edu/~anirvans/SOPRA/
|
Scaffolding |
SSPACE |
- Paired-end scaffolding.
- Trims the edges of the contigs, as they are more prone to assembly errors.
- Requires information about the paired-end library, including the mean insert size, its standard deviation and the relative orientation of the mates.
|
None |
(Boetzer et al., 2011Boetzer M, Henkel CV, Jansen HJ, Butler D and Pirovano W (2011) Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27:578-579.) |
http://www.baseclear.com/genomics/bioinformatics/basetools/
|
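The library parameters that SSPACE asks for (mean insert size and standard deviation) can be estimated from read pairs whose mates map to the same contig. The sketch below is illustrative only: it assumes the observed insert sizes have already been extracted from an alignment, and `summarize_library` and the toy values are hypothetical names and data, not part of SSPACE.

```python
from statistics import mean, stdev
from typing import Dict, List


def summarize_library(insert_sizes: List[int]) -> Dict[str, float]:
    """Return the mean and sample standard deviation of observed insert sizes.

    Needs at least two observations, because the sample standard
    deviation is undefined for a single value.
    """
    if len(insert_sizes) < 2:
        raise ValueError("at least two insert sizes are required")
    return {"mean": mean(insert_sizes), "sd": stdev(insert_sizes)}


# Hypothetical insert sizes (bp) taken from pairs mapped within one contig.
observed = [480, 500, 520]
print(summarize_library(observed))
```

In practice, outlier pairs (for example chimeric reads) would probably be filtered out before computing these values, since they inflate the standard deviation.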
Scaffolding |
SSPACE-LongRead |
|
None |
(Boetzer and Pirovano, 2014Boetzer M and Pirovano W (2014) SSPACE-LongRead: Scaffolding bacterial draft genomes using long read sequence information. BMC Bioinformatics 15:211.) |
http://www.baseclear.com/genomics/bioinformatics/basetools/
|
Scaffolding |
MUMmer |
|
|
(Kurtz et al., 2004Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C and Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12.) |
http://mummer.sourceforge.net/
|
Scaffolding |
ABACAS |
- Single reference-based scaffolding.
- Useful when the reference and target genomes are closely related and the genome to be scaffolded is not larger than the reference genome.
- Not optimized for bacteria with two or more replicons/chromosomes (e.g., the genus Leptospira).
- Allows the design of primers for gap closing.
|
MUMmer (Kurtz et al., 2004Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C and Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12.):
http://mummer.sourceforge.net/
Primer3 (Koressaar and Remm, 2007Koressaar T and Remm M (2007) Enhancements and modifications of primer design program Primer3. Bioinformatics 23:1289-1291.; Untergasser et al., 2012Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M and Rozen SG (2012) Primer3 - new capabilities and interfaces. Nucleic Acids Res 40:e115.):
http://primer3.ut.ee/
|
(Assefa et al., 2009Assefa S, Keane TM, Otto TD, Newbold C and Berriman M (2009) ABACAS: Algorithm-based automatic contiguation of assembled sequences. Bioinformatics 25:1968-1969.) |
http://abacas.sourceforge.net/
|
Scaffolding |
CONTIGuator |
- Single reference-based scaffolding.
- Useful when the target genome is composed of more than one chromosome/replicon.
- Allows a more sensitive identification of syntenic regions than ABACAS, as it applies a BLAST search after MUMmer.
|
ABACAS (Assefa et al., 2009Assefa S, Keane TM, Otto TD, Newbold C and Berriman M (2009) ABACAS: Algorithm-based automatic contiguation of assembled sequences. Bioinformatics 25:1968-1969.):
http://abacas.sourceforge.net/
BioPython (Python package):
http://biopython.org/
BLAST+ (Altschul et al., 1990Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403-410.; Camacho et al., 2009Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K and Madden TL (2009) BLAST+: Architecture and applications. BMC Bioinformatics 10:421):
ftp://ftp.ncbi.nlm.nih.gov/blast/
MUMmer (Kurtz et al., 2004Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C and Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12.):
http://mummer.sourceforge.net/
Primer3 (Koressaar and Remm, 2007Koressaar T and Remm M (2007) Enhancements and modifications of primer design program Primer3. Bioinformatics 23:1289-1291.; Untergasser et al., 2012Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M and Rozen SG (2012) Primer3 - new capabilities and interfaces. Nucleic Acids Res 40:e115.):
http://primer3.ut.ee/
|
(Galardini et al., 2011Galardini M, Biondi EG, Bazzicalupo M and Mengoni A (2011) CONTIGuator: A bacterial genomes finishing tool for structural insights on draft genomes. Source Code Biol Med 6:11.) |
http://contiguator.sourceforge.net/
|
Scaffolding |
Mauve |
- Single reference-based scaffolding.
- Can be used through either a command-line interface (CLI) or a graphical user interface (GUI).
- Allows the identification of genomic inversions and translocations.
- Not optimized for bacteria with two or more replicons/chromosomes.
|
Java:
https://www.java.com/
|
(Darling et al., 2004Darling ACE, Mau B, Blattner FR and Perna NT (2004) Mauve: Multiple alignment of conserved genomic sequence with rearrangements. Genome Res 14:1394-1403.; Rissman et al., 2009Rissman AI, Mau B, Biehl BS, Darling AE, Glasner JD and Perna NT (2009) Reordering contigs of draft genomes using the Mauve aligner. Bioinformatics 25:2071-2073.) |
http://darlinglab.org/mauve/mauve.html
|
Scaffolding |
FillScaffolds |
- Single reference-based scaffolding.
- Not optimized for bacteria with two or more replicons/chromosomes.
- Results may require post-processing to reconstruct the sequence of the scaffold.
|
Java:
https://www.java.com/
|
(Muñoz et al., 2010Muñoz A, Zheng C, Zhu Q, Albert VA, Rounsley S and Sankoff D (2010) Scaffold filling, contig fusion and comparative gene order inference. BMC Bioinformatics 11:304.) |
Supplementary data of Muñoz et al. (2010): http://dx.doi.org/10.1186/1471-2105-11-304
|
Scaffolding |
SIS |
- Single reference-based scaffolding.
- Allows the identification of genomic inversions.
- Not optimized for bacteria with two or more replicons/chromosomes.
|
MUMmer (Kurtz et al., 2004Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C and Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12.):
http://mummer.sourceforge.net/
|
(Dias et al., 2012Dias Z, Dias U and Setubal JC (2012) SIS: A program to generate draft genome sequence scaffolds for prokaryotes. BMC Bioinformatics 13:96.) |
http://marte.ic.unicamp.br:8747
|
Scaffolding |
CAR |
- Single reference-based scaffolding.
- Allows the identification of genomic inversions and translocations.
- Also available as a webserver.
- Not optimized for bacteria with two or more replicons/chromosomes.
|
MUMmer (Kurtz et al., 2004Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C and Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12.):
http://mummer.sourceforge.net/
PHP:
https://php.net/
|
(Lu et al., 2014Lu C, Chen K-T, Huang S-Y and Chiu H-T (2014) CAR: Contig assembly of prokaryotic draft genomes using rearrangements. BMC Bioinformatics 15:381.) |
http://genome.cs.nthu.edu.tw/CAR/
|
Scaffolding |
RACA |
- Multiple reference-based scaffolding.
- Optimized for large genomes with multiple chromosomes.
- Can also use paired-end data.
|
None |
(Kim et al., 2013Kim J, Larkin DM, Cai Q, Asan, Zhang Y, Ge R-L, Auvil L, Capitanu B, Zhang G, Lewin HA, et al. (2013) Reference-assisted chromosome assembly. Proc Natl Acad Sci U S A 110:1785-1790.) |
http://bioen-compbio.bioen.illinois.edu/RACA/
|
Scaffolding |
Ragout |
|
Networkx (Python package):
http://networkx.github.io/
Newick (Python package):
http://www.daimi.au.dk/~mailund/newick.html
Sibelia:
http://github.com/bioinf/Sibelia
|
(Kolmogorov et al., 2014Kolmogorov M, Raney B, Paten B and Pham S (2014) Ragout - a reference-assisted assembly tool for bacterial genomes. Bioinformatics 30:i302-i309.) |
https://github.com/fenderglass/Ragout
|
Scaffolding |
MeDuSa |
|
BioPython (Python package):
http://biopython.org/
Java:
https://www.java.com/
MUMmer (Kurtz et al., 2004Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C and Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12.):
http://mummer.sourceforge.net/
|
(Bosi et al., 2015Bosi E, Donati B, Galardini M, Brunetti S, Sagot M-F, Lió P, Crescenzi P, Fani R and Fondi M (2015) MeDuSa: A multi-draft based scaffolder. Bioinformatics 31:2443-2451.) |
https://github.com/combogenomics/medusa
|
Assembly integration |
Minimus |
|
AMOS package (Treangen et al., 2011Treangen TJ, Sommer DD, Angly FE, Koren S and Pop M (2011) Next generation sequence assembly with AMOS. Curr Protoc Bioinformatics 33:11.8.1-11.8.18.):
http://amos.sourceforge.net/
|
(Sommer et al., 2007Sommer DD, Delcher AL, Salzberg SL and Pop M (2007) Minimus: A fast, lightweight genome assembler. BMC Bioinformatics 8:64.) |
https://sourceforge.net/projects/amos/
|
Assembly integration |
Reconciliator |
|
MUMmer (Kurtz et al., 2004Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C and Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12.):
http://mummer.sourceforge.net/
|
(Zimin et al., 2008Zimin AV, Smith DR, Sutton G and Yorke JA (2008) Assembly reconciliation. Bioinformatics 24:42-45.) |
http://www.genome.umd.edu/
|
Assembly integration |
MAIA |
- Allows the integration of two or more assemblies.
- Accepts a reference genome to guide scaffolding, which is useful for contigs without a counterpart in the other assemblies.
|
Matlab:
https://www.mathworks.com/
MUMmer:
http://mummer.sourceforge.net/
GAIMC (Matlab toolbox):
http://github.com/dgleich/gaimc
|
(Nijkamp et al., 2010Nijkamp J, Winterbach W, van den Broek M, Daran J-M, Reinders M and de Ridder D (2010) Integrating genome assemblies with MAIA. Bioinformatics 26:433-439.) |
http://bioinformatics.tudelft.nl
|
Assembly integration |
CISA |
|
BLAST+ (Altschul et al., 1990Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403-410.; Camacho et al., 2009Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K and Madden TL (2009) BLAST+: Architecture and applications. BMC Bioinformatics 10:421):
ftp://ftp.ncbi.nlm.nih.gov/blast/
MUMmer (Kurtz et al., 2004Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C and Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12.):
http://mummer.sourceforge.net/
|
(Lin and Liao, 2013Lin S-H and Liao Y-C (2013) CISA: Contig integrator for sequence assembly of bacterial genomes. PLoS One 8:e60843.) |
http://sb.nhri.org.tw/CISA/
|
Assembly integration |
GAA |
|
BLAT (Kent, 2002Kent WJ (2002) BLAT: The BLAST-like alignment tool. Genome Res 12:656-664.):
https://genome.ucsc.edu/
GSMapper:
http://454.com/
|
(Yao et al., 2012Yao G, Ye L, Gao H, Minx P, Warren WC and Weinstock GM (2012) Graph accordance of next-generation sequence assemblies. Bioinformatics 28:13-16.) |
http://sourceforge.net/projects/gaa-wugi/
|
Assembly integration |
Mix |
|
Networkx (Python package):
http://networkx.lanl.gov/
BioPython (Python package):
http://biopython.org/
MUMmer (Kurtz et al., 2004Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C and Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12.):
http://mummer.sourceforge.net/
|
(Soueidan et al., 2013Soueidan H, Maurier F, Groppi A, Sirand-Pugnet P, Tardy F, Citti C, Dupuy V and Nikolski M (2013) Finishing bacterial genome assemblies with Mix. BMC Bioinformatics 14 Suppl 1:S16.) |
https://github.com/cbib/MIX
|
Assembly integration |
GAM / GAM-NGS |
- Requires the read files to perform the assembly integration.
- One of the assemblies to be merged is defined as “master”, while the others are defined as “slaves”.
- Allows the identification of misassembled regions in the master, which are corrected before the generation of the final assembly.
|
cmake:
https://cmake.org/
zlib library:
http://www.zlib.net/
boost libraries:
www.boost.org/
sparse-hash library:
http://goog-sparsehash.sourceforge.net/
|
(Casagrande et al., 2009Casagrande A, Del Fabbro C, Scalabrin S and Policriti A (2009) GAM: Genomic Assemblies Merger: A graph based method to integrate different assemblies. IEEE Int Conf Bioinform Biomed 2009:321-326.; Vicedomini et al., 2013Vicedomini R, Vezzi F, Scalabrin S, Arvestad L and Policriti A (2013) GAM-NGS: Genomic assemblies merger for next generation sequencing. BMC Bioinformatics 14 Suppl 7:S6.) |
https://github.com/vice87/gam-ngs
|
Assembly integration |
Zorro |
- Requires the read files to perform the assembly integration.
- Remaps the reads back to the contigs and identifies misassembled and repetitive regions based on the coverage.
- Splits the misassembled contigs and performs the assembly integration using Minimus.
|
AMOS (Treangen et al., 2011Treangen TJ, Sommer DD, Angly FE, Koren S and Pop M (2011) Next generation sequence assembly with AMOS. Curr Protoc Bioinformatics 33:11.8.1-11.8.18.):
http://amos.sourceforge.net/
BioPerl (Perl module):
http://bioperl.org
Bowtie (Langmead et al., 2009Langmead B, Trapnell C, Pop M and Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25.):
http://bowtie-bio.sourceforge.net/
MUMmer (Kurtz et al., 2004Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C and Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12.):
http://mummer.sourceforge.net/
|
(Argueso et al., 2009Argueso JL, Carazzolle MF, Mieczkowski PA, Duarte FM, Netto OVC, Missawa SK, Galzerani F, Costa GGL, Vidal RO, Noronha MF, et al. (2009) Genome structure of a Saccharomyces cerevisiae strain widely used in bioethanol production. Genome Res 19:2258-2270.) |
http://lge.ibi.unicamp.br/zorro/
|
Gap closing |
GapCloser |
|
None |
(Li et al., 2010Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, Li Y, Li S, Shan G, Kristiansen K, et al. (2010) De novo assembly of human genomes with massively parallel short read sequencing. Genome Res 20:265-272.) |
http://soap.genomics.org.cn/
|
Gap closing |
IMAGE |
|
None |
(Tsai et al., 2010Tsai IJ, Otto TD and Berriman M (2010) Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps. Genome Biol 11:R41.) |
https://sourceforge.net/projects/image2
|
Gap closing |
GapFiller |
- Iteratively remaps the reads to the contigs, selects those that overlap the gap region and performs a local reassembly.
- Requires information about the paired-end library, including the mean insert size, its standard deviation and the relative orientation of the mates.
|
None |
(Boetzer and Pirovano, 2012Boetzer M and Pirovano W (2012) Toward almost closed genomes with GapFiller. Genome Biol 13:R56.) |
http://www.baseclear.com/genomics/bioinformatics/basetools
|
Gap closing |
Enly |
- Iteratively remaps the reads to the contigs, selects those that overlap the gap region and performs a local reassembly.
- If a reference genome is provided, a new scaffolding step can be performed to improve the assembly.
|
BioPython (Python package):
http://biopython.org/
BLAST and BLAST+ (Altschul et al., 1990Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403-410.; Camacho et al., 2009Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K and Madden TL (2009) BLAST+: Architecture and applications. BMC Bioinformatics 10:421):
ftp://ftp.ncbi.nlm.nih.gov/blast/
Cdbfasta/cdbyank:
http://compbio.dfci.harvard.edu/tgi/software/
EMBOSS:
http://emboss.sourceforge.net/
Minimo assembler (Treangen et al., 2011Treangen TJ, Sommer DD, Angly FE, Koren S and Pop M (2011) Next generation sequence assembly with AMOS. Curr Protoc Bioinformatics 33:11.8.1-11.8.18.):
http://amos.sourceforge.net/
MUMmer (Kurtz et al., 2004Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C and Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12.):
http://mummer.sourceforge.net/
Phrap:
http://www.phrap.org/phredphrapconsed.html
|
(Fondi et al., 2014Fondi M, Orlandini V, Corti G, Severgnini M, Galardini M, Pietrelli A, Fuligni F, Iacono M, Rizzi E, De Bellis G, et al. (2014) Enly: Improving draft genomes through reads recycling. J Genomics 2:89-93.) |
http://enly.sourceforge.net/
|
Gap closing |
FGAP |
|
Matlab:
https://www.mathworks.com/
|
(Piro et al., 2014Piro VC, Faoro H, Weiss VA, Steffens MBR, Pedrosa FO, Souza EM and Raittz RT (2014) FGAP: An automated gap closing tool. BMC Res Notes 7:371.) |
http://www.bioinfo.ufpr.br/fgap/
|
Gap closing |
Sealer |
|
boost libraries:
www.boost.org/
sparse-hash library:
http://goog-sparsehash.sourceforge.net/
Open MPI:
http://www.open-mpi.org
|
(Paulino et al., 2015Paulino D, Warren RL, Vandervalk BP, Raymond A, Jackman SD and Birol I (2015) Sealer: A scalable gap-closing application for finishing draft genomes. BMC Bioinformatics 16:230.) |
https://github.com/bcgsc/abyss/tree/sealer-release
|
Gap closing |
GMcloser |
|
MUMmer (Kurtz et al., 2004Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C and Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12.):
http://mummer.sourceforge.net/
BLAST+ (Altschul et al., 1990Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403-410.; Camacho et al., 2009Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K and Madden TL (2009) BLAST+: Architecture and applications. BMC Bioinformatics 10:421):
ftp://ftp.ncbi.nlm.nih.gov/blast/
Bowtie (Langmead et al., 2009Langmead B, Trapnell C, Pop M and Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25.):
http://bowtie-bio.sourceforge.net/
YASS (Noé and Kucherov, 2005Noé L and Kucherov G (2005) YASS: Enhancing the sensitivity of DNA similarity search. Nucleic Acids Res 33:W540-W543.):
http://bioinfo.lifl.fr/yass
|
(Kosugi et al., 2015Kosugi S, Hirakawa H and Tabata S (2015) GMcloser: Closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments. Bioinformatics 31:3733-3741.) |
https://sourceforge.net/projects/gmcloser/
|
Gap closing |
MapRepeat |
|
BLAST+ (Altschul et al., 1990Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403-410.; Camacho et al., 2009Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K and Madden TL (2009) BLAST+: Architecture and applications. BMC Bioinformatics 10:421):
ftp://ftp.ncbi.nlm.nih.gov/blast/
BioPython (Python package):
http://biopython.org/
MIRA:
http://mira-assembler.sourceforge.net
MUMmer (Kurtz et al., 2004Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C and Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12.):
http://mummer.sourceforge.net/
|
(Mariano et al., 2015Mariano DC, Pereira FL, Ghosh P, Barh D, Figueiredo HC, Silva A, Ramos RT and Azevedo VA (2015) MapRepeat: An approach for effective assembly of repetitive regions in prokaryotic genomes. Bioinformation 11:276-9.) |
http://github.com/dcbmariano/maprepeat
|
Gap closing |
GapBlaster |
|
BLAST and BLAST+ (Altschul et al., 1990Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403-410.; Camacho et al., 2009Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K and Madden TL (2009) BLAST+: Architecture and applications. BMC Bioinformatics 10:421):
ftp://ftp.ncbi.nlm.nih.gov/blast/
MUMmer (Kurtz et al., 2004Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C and Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12.):
http://mummer.sourceforge.net/
|
(de Sá et al., 2016de Sá PHCG, Miranda F, Veras A, de Melo DM, Soares S, Pinheiro K, Guimarães L, Azevedo V, Silva A and Ramos RTJ (2016) GapBlaster-A graphical gap filler for prokaryote genomes. PLoS One 11:e0155327.) |
https://sourceforge.net/projects/gapblaster2015/
|
Assembly evaluation |
REAPR |
- Calculates the accuracy of the assembly based on the coverage after remapping the reads back to the scaffolds.
- Misassembled regions can be identified because they usually show discrepant coverage.
- A new set of scaffolds is generated by splitting the regions identified as misassembled.
|
File::Basename, File::Copy, File::Spec, File::Spec::Link, Getopt::Long and List::Util (Perl modules):
http://www.cpan.org/
R:
https://www.r-project.org/
|
(Hunt et al., 2013Hunt M, Kikuchi T, Sanders M, Newbold C, Berriman M and Otto TD (2013) REAPR: A universal tool for genome assembly evaluation. Genome Biol 14:R47.) |
http://www.sanger.ac.uk/science/tools/reapr
|
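The idea of spotting misassemblies through discrepant coverage can be illustrated with a deliberately simplified sketch. It is not REAPR's actual method: it assumes a per-base depth list is already available from the remapped reads, and the function name, window size and thresholds are hypothetical choices made for the example.

```python
from statistics import median
from typing import List, Tuple


def flag_discrepant_windows(
    coverage: List[int], window: int = 5, low: float = 0.5, high: float = 2.0
) -> List[Tuple[int, int]]:
    """Return (start, end) windows whose mean depth is far from the median.

    A window is flagged when its mean depth is below low * median or
    above high * median of the whole sequence. Coordinates are 0-based
    and end-exclusive.
    """
    if not coverage:
        return []
    typical = median(coverage)
    flagged = []
    for start in range(0, len(coverage), window):
        chunk = coverage[start:start + window]
        mean_depth = sum(chunk) / len(chunk)
        if mean_depth < low * typical or mean_depth > high * typical:
            flagged.append((start, start + len(chunk)))
    return flagged


# Hypothetical per-base depth with a coverage drop in the middle.
depth = [30] * 10 + [4] * 5 + [30] * 10
print(flag_discrepant_windows(depth))  # prints [(10, 15)]
```

Flagged windows would then be candidate break points; a real tool also has to distinguish collapsed repeats (excess coverage) from joins with no supporting reads (coverage drops).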
Assembly evaluation |
QUAST |
- Calculates several assembly metrics, such as G+C content (%), N50 and L50.
- Can be used to compare different assemblies of the same genome and/or compare them to a reference genome.
|
boost libraries:
www.boost.org/
Java:
https://www.java.com/
Matplotlib (Python package):
http://matplotlib.org
Time::HiRes (Perl module):
http://www.cpan.org/
|
(Gurevich et al., 2013Gurevich A, Saveliev V, Vyahhi N and Tesler G (2013) QUAST: Quality assessment tool for genome assemblies. Bioinformatics 29:1072-1075.) |
http://bioinf.spbau.ru/quast
|
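For readers unfamiliar with the metrics listed for QUAST, the following minimal sketch shows how N50, L50 and G+C content are commonly defined. It is an independent illustration, not QUAST code; the function names and toy contigs are hypothetical.

```python
from typing import List, Tuple


def n50_l50(lengths: List[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the contig at which the running total, taken
    from the longest contig downwards, first reaches half of the
    assembly size. L50 is the number of contigs needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no contigs given")


def gc_percent(sequences: List[str]) -> float:
    """Return the G+C percentage over unambiguous bases (A, C, G, T)."""
    joined = "".join(sequences).upper()
    gc = joined.count("G") + joined.count("C")
    acgt = gc + joined.count("A") + joined.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


# Hypothetical toy contigs of 100, 40 and 20 bp.
contigs = ["ATGC" * 25, "GGCC" * 10, "AT" * 10]
print(n50_l50([len(c) for c in contigs]))  # prints (100, 1)
print(gc_percent(contigs))                 # prints 56.25
```

Because N50 depends only on contig lengths, it says nothing about correctness, which is why it is usually read alongside reference-based or read-based evaluations.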
Assembly evaluation |
ALE |
|
Matplotlib (Python package):
http://matplotlib.org
Mpmath (Python package):
http://mpmath.org
Numpy (Python package):
http://www.numpy.org
Pymix (Python package):
http://www.pymix.org/pymix
Setuptools (Python package):
https://github.com/pypa/setuptools
|
(Clark et al., 2013Clark SC, Egan R, Frazier PI and Wang Z (2013) ALE: A generic assembly likelihood evaluation framework for assessing the accuracy of genome and metagenome assemblies. Bioinformatics 29:435-43.) |
http://www.alescore.org
|
Assembly evaluation |
CGAL |
|
None |
(Rahman and Pachter, 2013Rahman A and Pachter L (2013) CGAL: Computing genome assembly likelihoods. Genome Biol 14:R8.) |
http://bio.math.berkeley.edu/cgal/
|
Assembly evaluation |
GMvalue |
|
MUMmer (Kurtz et al., 2004Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C and Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12.):
http://mummer.sourceforge.net/
BLAST+ (Altschul et al., 1990Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403-410.; Camacho et al., 2009Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K and Madden TL (2009) BLAST+: Architecture and applications. BMC Bioinformatics 10:421):
ftp://ftp.ncbi.nlm.nih.gov/blast/
Bowtie (Langmead et al., 2009Langmead B, Trapnell C, Pop M and Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25.):
http://bowtie-bio.sourceforge.net/
YASS (Noé and Kucherov, 2005Noé L and Kucherov G (2005) YASS: Enhancing the sensitivity of DNA similarity search. Nucleic Acids Res 33:W540-W543.):
http://bioinfo.lifl.fr/yass
|
(Kosugi et al., 2015Kosugi S, Hirakawa H and Tabata S (2015) GMcloser: Closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments. Bioinformatics 31:3733-3741.) |
https://sourceforge.net/projects/gmcloser/
|
Assembly correction |
iCORN |
- Requires paired-end reads.
- Iteratively identifies and corrects short misassemblies, such as base substitutions and short INDELs.
|
SNP-o-matic (Manske and Kwiatkowski, 2009Manske HM and Kwiatkowski DP (2009) SNP-o-matic. Bioinformatics 25:2434-2435.):
https://snpomatic.svn.sourceforge.net/svnroot/snpomatic
SSAHA Pileup (Ning et al., 2001Ning Z, Cox AJ and Mullikin JC (2001) SSAHA: A fast search method for large DNA databases. Genome Res 11:1725-1729.):
ftp://ftp.sanger.ac.uk/pub/zn1/ssaha_pileup/
|
(Otto et al., 2010Otto TD, Sanders M, Berriman M and Newbold C (2010) Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology. Bioinformatics 26:1704-1707.) |
http://icorn.sourceforge.net/
|
Assembly correction |
SEQuel |
- Requires paired-end reads.
- Interactively identifies and corrects short misassemblies, such as base substitutions and short INDELs.
- Performs a local reassembly of the misassembled regions using information from k-mers and paired-end reads.
|
Java:
https://www.java.com/
JGraphT (Java library):
http://jgrapht.org/
|
(Ronen et al., 2012Ronen R, Boucher C, Chitsaz H and Pevzner P (2012) SEQuel: Improving the accuracy of genome assemblies. Bioinformatics 28:188-196.) |
http://bix.ucsd.edu/SEQuel/
|
Assembly correction |
GFinisher |
- Does not require paired-end reads.
- Integrates a reference-guided scaffolding step and gap-closing procedures, along with the assembly correction process.
- Identifies misassembled regions based on the GC skew distribution.
|
Java:
https://www.java.com/
|
(Guizelini et al., 2016Guizelini D, Raittz RT, Cruz LM, Souza EM, Steffens MBR and Pedrosa FO (2016) Gfinisher: A new strategy to refine and finish bacterial genome assemblies. Sci Rep 6:34963.) |
http://gfinisher.sourceforge.net/
|
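GC skew, the signal mentioned for GFinisher, is conventionally computed per window as (G - C) / (G + C). The sketch below only shows that calculation under this standard definition; it does not reproduce how GFinisher uses the distribution, and the function name, window size and toy sequence are hypothetical.

```python
from typing import List


def gc_skew(sequence: str, window: int = 1000) -> List[float]:
    """Return (G - C) / (G + C) for consecutive non-overlapping windows.

    Windows without any G or C are reported as 0.0 to avoid a division
    by zero.
    """
    seq = sequence.upper()
    values = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        values.append((g - c) / (g + c) if g + c else 0.0)
    return values


# Hypothetical contig whose skew changes sign halfway through.
toy = "GGGC" * 5 + "GCCC" * 5
print(gc_skew(toy, window=20))  # prints [0.5, -0.5]
```

In bacterial chromosomes the sign of the skew typically changes near the origin and terminus of replication, so an unexpected sign change inside a contig is one possible hint of a misjoin.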