Online Lectures on Bioinformatics
|
Physical Mapping and Sequence Assembly
Mapping using hybridization data
In a formal setting, this procedure can be described as follows.
We start with a clone library CL of clones which correspond to subintervals of a larger contiguous
piece of DNA G, all having the same size.
From CL we select a subset
of probes P.
Each probe
is labeled and tested against the clone library. If a clone contains some
sequence part complementary to the probe sequence, the probe will hybridize to this clone and a
positive hybridization signal can be detected. The result of these experiments is a binary
clone/probe hybridization matrix
A = (ai,j)
where
The physical mapping problem is to find the order of the probes P that corresponds to their real
position in G.
A subsequent problem would then be to extend this order to the whole clone library.
Here, we do not deal with the latter question, though.
The physical mapping problem can be translated into the following optimization problem
[4]:
Given a hybridization matrix, find a permutation of the columns (probes) such that the
reordered matrix has the consecutive ones property, i.e. every row
has at most one block of consecutive ones.
Unfortunately, physical mapping by hybridization experiments is highly influenced by errors and
ambiguities: there are high rates of false positive and negative hybridization signals and
inconsistent hybridization signals caused by repetitive sequences, chimeric clones, or clones
containing deletions.
Additionally there is variation in library coverage and in clone size.
Note, that even in the error free case ambiguities may occur due to multiple solutions to the
consecutive ones problem.
In the absence of errors, all admissible probe orders can be found and characterized efficiently
using the PQ-tree data structure defined by Booth and Lueker [5].
However, in the presence of noise, there is no generalization of the PQ-tree approach, and
the problem becomes ill defined.
There are several approaches which are used to solve the physical mapping problem in the presence
of errors, see e.g. the references in [6]. Generally, a most likely probe order
is searched which globally optimizes a certain objective function.
Comments are very welcome.
luz@molgen.mpg.de
|