Genome assembly

From Freepedia

Genome assembly is the process of taking a large number of short DNA sequences generated by a shotgun sequencing project and putting them back together to create a representation of the original chromosomes from which the DNA originated. In a shotgun sequencing project, all the DNA from a source (usually a single organism, anything from a bacterium to a mammal) is first broken into millions of small pieces. These pieces are then "read" by automated sequencing machines, which can read up to 900 nucleotides, or bases, at a time. (The four bases are adenine, guanine, cytosine, and thymine, represented as AGCT.) A genome assembly algorithm aligns all the pieces to one another and detects every place where two of the short sequences, or reads, overlap. Overlapping reads can then be merged, and the process repeats on the merged sequences.
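The overlap-and-merge idea described above can be illustrated with a minimal sketch. It assumes error-free reads that all come from the same strand, and it uses a simple greedy strategy (always merge the pair with the longest overlap). The function names and the min_overlap parameter are hypothetical choices made for this illustration; this is not the method used by any particular assembler.

```python
def overlap_length(a, b, min_overlap=3):
    """Return the length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_overlap bases exists.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_overlap - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of reads with the longest overlap.

    Stops when one sequence remains or no pair overlaps by min_overlap bases.
    Returns the list of remaining sequences.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        # Compare every ordered pair of distinct reads.
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap_length(a, b, min_overlap)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing left to merge
        # Join the two reads, writing the shared bases only once.
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads
```

For example, greedy_assemble(["AGCTTAG", "TTAGGCA", "GGCATCG"]) merges the three reads through their four-base overlaps into a single sequence. Real data contain sequencing errors, reads from both strands, and repeats, all of which this sketch ignores.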

Genome assembly is a very difficult computational problem, made more difficult because genomes contain large numbers of identical sequences, known as repeats. These repeats can be thousands of nucleotides long, and some occur in thousands of different locations, especially in the large genomes of plants and animals.
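To see why repeats cause trouble, the following minimal sketch lists every substring of length k that occurs at more than one position in a sequence. A read that lies entirely inside such a substring matches several locations equally well, so overlaps alone cannot say where it belongs. The function name repeated_kmers and the toy sequence are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def repeated_kmers(sequence, k):
    """Map each length-k substring that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)
    # Keep only substrings seen at two or more positions.
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


# Toy sequence in which "ACGTT" appears at positions 0 and 7.
print(repeated_kmers("ACGTTGCACGTTAA", 5))
```

In real genomes the repeated stretches can be far longer than a single read, which is why the ambiguity cannot be resolved simply by requiring longer overlaps.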

Assembly software

AMOS (A Modular, Open-Source assembler) is a well-known open-source effort to bring together the work of leading developers of genome assembly code. The home of AMOS is currently http://www.tigr.org/software/AMOS. AMOS was initiated at The Institute for Genomic Research by Steven Salzberg, Mihai Pop, and Art Delcher.

The Celera Assembler was developed by Gene Myers, Granger Sutton, Art Delcher, and others at Celera Genomics from 1998 until approximately 2002. It was moved to SourceForge, where it continues to be developed by the original scientists and others at http://sourceforge.net/projects/wgs-assembler.


