Online Lectures on Bioinformatics
Physical Mapping and Sequence Assembly

Sequence Assembly

Current technology permits experimentalists to directly determine the sequence of a DNA strand of approximately 500 nucleotides in length. To sequence a long piece of DNA, many such reads are taken and subsequently re-assembled to produce the original sequence. Computationally, this gives rise to the problem of assembling the fragments using the overlap information among them. Overlaps are deduced from sequence similarity. Since 1-10% of the nucleotides in the fragment data are missing or incorrect, and since a fragment's sequence can be reversed with respect to the others, these overlaps cannot be perfectly determined.

The standard approach to solving sequence assembly has three steps.
(1) All pairwise overlaps of fragments are determined.
(2) The fragments are laid out in approximate positions, with an orientation chosen for each fragment, so that the overlaps can be used to determine the sequence.
(3) The fragments are multiply aligned, using the fragment layout, to infer the sequence.

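The three steps can be illustrated with a minimal sketch in Python. It is not the method of any particular assembler and rests on strong simplifying assumptions: reads are error-free, so overlaps are exact suffix-prefix matches rather than approximate alignments; the layout is chosen greedily by always merging the pair with the longest overlap; and, because there are no errors, the multiple alignment of step (3) collapses into simple concatenation. The function names and the `min_len` threshold are hypothetical and chosen only for illustration.

```python
from itertools import permutations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand, which is how a reversed fragment reads."""
    return seq.translate(_COMPLEMENT)[::-1]

def overlap(a, b, min_len):
    """Step 1: length of the longest suffix of a that equals a prefix of b.

    Assumption: exact matching only. Real data with 1-10% errors would
    need an approximate alignment here instead.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=4):
    """Steps 2 and 3, simplified: repeatedly merge the best-overlapping pair."""
    contigs = list(reads)
    while len(contigs) > 1:
        best = None  # (overlap length, i, j, oriented a, oriented b)
        for i, j in permutations(range(len(contigs)), 2):
            # Try both orientations of each fragment.
            for a in (contigs[i], reverse_complement(contigs[i])):
                for b in (contigs[j], reverse_complement(contigs[j])):
                    k = overlap(a, b, min_len)
                    if k and (best is None or k > best[0]):
                        best = (k, i, j, a, b)
        if best is None:
            break  # no remaining overlaps: leave the contigs separate
        k, i, j, a, b = best
        merged = a + b[k:]  # error-free reads, so the consensus is a concatenation
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]
    return contigs

# Toy data: three reads of a made-up 20-nucleotide sequence,
# with the middle read given on the opposite strand.
genome = "TTGACCAGTCGATAGGCTAA"
reads = ["TTGACCAGTC", "CTATCGACTG", "GATAGGCTAA"]
contigs = greedy_assemble(reads)
print(contigs)
```

On this toy input the reads collapse into a single contig that equals either `genome` or its reverse complement; without further information the assembler cannot tell which strand it has reconstructed. The greedy choice is only one possible layout heuristic and can be misled by repeated sequences.
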
On the biological side, the phenomenon that is most difficult
to account for is internal repeats in sequences. Repeated sequences occur
naturally and are a particular concern in human genome sequencing. These can be simple
repeats (several recently identified genetic diseases are caused simply
by variations in the lengths of such repeats) or longer, highly similar
repeats of about 300 characters (such repeats make up nearly 15% of the human genome).
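
Why repeats are troublesome can be seen in a small sketch. The reads below are invented for illustration: one read ends inside a repeated segment, and two other reads begin with the same segment but continue differently. The helper names `overlap` and `successors` are hypothetical, matching is exact, and only one strand is considered.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def successors(read, others, min_len):
    """Map each read that could follow `read` to its overlap length."""
    found = {}
    for other in others:
        k = overlap(read, other, min_len)
        if k:
            found[other] = k
    return found

repeat = "GATTACAG"             # made-up repeated segment
left = "CCTA" + repeat          # read ending inside the repeat
right_1 = repeat + "TTCC"       # read leaving the first copy
right_2 = repeat + "AGGT"       # read leaving the second copy

candidates = successors(left, [right_1, right_2], min_len=8)
print(candidates)
```

Both candidate reads overlap `left` by the full length of the repeat, so the overlap information alone cannot decide which one follows it. Reads shorter than the repeat never resolve this ambiguity.
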
Comments are very welcome. luz@molgen.mpg.de