This version of the pipeline (180419) updates the merge section that aims to address errors of undercollapsed heterozygosity.

For a detailed description of the pipeline and how it integrates with other tools designed by the Aiden Lab see Genome Assembly Cookbook on.

For the original version of the pipeline and to reproduce the Hs2-HiC and the AaegL4 genomes reported in (Dudchenko et al., Science, 2017) see the original commit.

For the detailed description of the merge section see.

Feel free to post your questions and comments at:

If you use this code or the resulting assemblies, please cite the following paper:

Dudchenko, O., Batra, S.S., Omer, A.D., Nyquist, S.K., Hoeger, M., Durand, N.C., Shamim, M.S., Machol, I., Lander, E.S., Aiden, A.P., et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds.

This software is distributed under The MIT License (MIT).

Overview of the pipeline

An overview of the detailed workflow of the 3D-DNA pipeline is schematically given in Fig. We begin with a series of iterative steps whose goal is to eliminate misjoins in the input scaffolds. Each step begins with a scaffold pool (initially, this pool is the set of input scaffolds themselves). The scaffolding algorithm is used to order and orient these scaffolds. Next, the misjoin correction algorithm is applied to detect errors in the scaffold pool, thus creating an edited scaffold pool. Finally, the edited scaffold pool is used as an input for the next iteration of the misjoin correction algorithm. The ultimate effect of these iterations is to reliably detect misjoins in the input scaffolds without removing correctly assembled sequence.

After this process is complete, the scaffolding algorithm is applied to the revised input scaffolds, and the output (a single "megascaffold" which concatenates all the chromosomes) is retained for post-processing. This post-processing includes four steps: (i) a polishing algorithm, which attempts to fix errors when cumulative 3D signal 'wins' over the diagonal; (ii) a chromosome splitting algorithm, which is used to extract the chromosome-length scaffolds from the megascaffold; (iii) a sealing algorithm, which detects false positives in the misjoin correction process and restores the erroneously removed sequence from the original scaffold; and (iv) a merge algorithm, which corrects misassembly errors due to undercollapsed heterozygosity in the input scaffolds. Step (iv) is omitted for Hs2-HiC, which is not in the Rabl configuration and lacks substantial undercollapsed heterozygosity.

LastZ (version 1.03.73 released 20150708): for diploid mode only.
Python >=2.7: for chromosome number-aware splitter module only.
scipy numpy matplotlib: for chromosome number-aware splitter module only.
GNU Parallel >=20150322: highly recommended to increase performance.

The pipeline consists of one bash wrapper script run-asm-pipeline.sh that calls individual modules to assemble a genome. Run the following to list expected arguments:

run-asm-pipeline-post-review.sh -h|--help

Individual modules

The wrapper script calls individual modules of the pipeline. Code related to any particular module is organized into individual folders. The modules can also be run as separate scripts.
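The software named in this README (GNU Parallel, LastZ, Python with scipy, numpy and matplotlib) can be checked before a run. The following is a minimal sketch of such a check, not something shipped with 3D-DNA: the helper names `version_ge`, `have_tool` and `report` are hypothetical, and the sketch assumes that the tools are invoked as `parallel`, `lastz` and `python` and that `sort -V` from GNU coreutils is available.

```shell
#!/usr/bin/env bash
# Hedged sketch: report which tools named in this README are on PATH.
# version_ge, have_tool and report are hypothetical helpers, not part of 3D-DNA.

# Return 0 if version $1 is greater than or equal to version $2.
# Uses `sort -V` (version sort): the smaller version is printed first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Return 0 if the command $1 can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Print one status line per tool; never abort, because some tools are optional.
report() {
    local tool="$1" note="$2"
    if have_tool "$tool"; then
        echo "found:   $tool ($note)"
    else
        echo "missing: $tool ($note)"
    fi
}

report parallel "GNU Parallel >=20150322, recommended for performance"
report lastz    "LastZ 1.03.73, diploid mode only"
report python   "Python >=2.7, chromosome number-aware splitter only"

# The splitter's Python requirements are checked by asking Python itself.
if have_tool python; then
    pyver="$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
    if version_ge "$pyver" 2.7; then
        echo "ok:      python $pyver"
    else
        echo "too old: python $pyver (need >=2.7)"
    fi
    for mod in scipy numpy matplotlib; do
        if python -c "import $mod" >/dev/null 2>&1; then
            echo "found:   python module $mod"
        else
            echo "missing: python module $mod"
        fi
    done
fi
```

The script only reports what it finds and does not stop on a missing tool, since LastZ and the Python packages are needed for specific modes only and GNU Parallel is a recommendation rather than a hard requirement.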