previous section previous page next page next section
CMB

Online Lectures on Bioinformatics

navigation


Physical Mapping and Sequence Assembly


Mapping using hybridization data

In a formal setting, this procedure can be described as follows. We start with a clone library CL of clones which correspond to subintervals of a larger contiguous piece of DNA G, all having the same size. From CL we select a subset $P \subset CL$ of probes P. Each probe $p_i \in P$ is labeled and tested against the clone library. If a clone contains some sequence part complementary to the probe sequence, the probe will hybridize to this clone and a positive hybridization signal can be detected. The result of these experiments is a binary clone/probe hybridization matrix A = (ai,j) where

\begin{displaymath}a_{ij}:= \left\{ \begin{array}{r@{\quad}l}
1 & \mbox{if prob...
...clone $c_i$ ;} \\
0 & \mbox{otherwise.}
\end{array} \right. \end{displaymath}




The physical mapping problem is to find the order of the probes P that corresponds to their real position in G. A subsequent problem would then be to extend this order to the whole clone library. Here, we do not deal with the latter question, though. The physical mapping problem can be translated into the following optimization problem [4]: Given a hybridization matrix, find a permutation of the columns (probes) such that the reordered matrix has the consecutive ones property, i.e. every row has at most one block of consecutive ones. Unfortunately, physical mapping by hybridization experiments is highly influenced by errors and ambiguities: there are high rates of false positive and negative hybridization signals and inconsistent hybridization signals caused by repetitive sequences, chimeric clones, or clones containing deletions. Additionally there is variation in library coverage and in clone size. Note, that even in the error free case ambiguities may occur due to multiple solutions to the consecutive ones problem. In the absence of errors, all admissible probe orders can be found and characterized efficiently using the PQ-tree data structure defined by Booth and Lueker [5]. However, in the presence of noise, there is no generalization of the PQ-tree approach, and the problem becomes ill defined. There are several approaches which are used to solve the physical mapping problem in the presence of errors, see e.g. the references in [6]. Generally, a most likely probe order is searched which globally optimizes a certain objective function.



Comments are very welcome.
luz@molgen.mpg.de