CMB

Online Lectures on Bioinformatics



Alignment statistics


Statistical Significance of Local Smith-Waterman-Alignments

According to a theorem of Arratia and Waterman [AW90], the score of a local SW-alignment can grow in only two ways as the sequence length increases (for a given gap-cost function): linearly or logarithmically. The two regions are separated by a sharp phase transition.
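The growth behaviour can be explored empirically. The following is a minimal sketch, not taken from the lecture: it assumes a simple match/mismatch scoring with a linear gap cost and uniformly random DNA sequences, and the names `sw_score` and `mean_local_score` as well as all default parameter values are illustrative choices.

```python
import random


def sw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap cost).

    The scoring values are illustrative assumptions, not taken from the lecture.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            if cell > best:
                best = cell
        prev = curr
    return best


def mean_local_score(n, trials=20, seed=0, **scoring):
    """Average local score of random DNA sequence pairs of length n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = "".join(rng.choice("ACGT") for _ in range(n))
        b = "".join(rng.choice("ACGT") for _ in range(n))
        total += sw_score(a, b, **scoring)
    return total / trials
```

Calling `mean_local_score` for increasing `n`, once with weak and once with strong gap penalties, should give a rough impression of the two growth regimes; short sequences and few trials only give a noisy picture.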

The goal is to obtain a value for the statistical significance of a local SW-alignment by modelling the scores with a Poisson distribution, analogously to the HSPs. By introducing non-overlapping, suboptimal local alignments, the logic of HSP statistics is applied to local alignments. Doing so presumes that the score of the local SW-alignment grows logarithmically with the length of the sequences (the regime that corresponds to strong gap penalties). Recalling Arratia and Waterman once more, the regions of linear and logarithmic growth of the local SW-alignment score are connected to the expected global alignment score, as summarized in the following table:

| expected global alignment score | growth of the local SW-alignment score with sequence length |
|---|---|
| positive | linear |
| negative | logarithmic |


The relationship above is used to determine the logarithmic region for a given gap-cost function by computing global alignments of simulated data.
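The idea of classifying a scoring scheme through simulated global alignments can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the cited authors: it uses match/mismatch scores with a linear gap cost, uniformly random DNA, and the sign of the mean global score at a single finite length as a stand-in for the expected global score. The names `global_score` and `growth_regime` are hypothetical.

```python
import random


def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap cost."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i * gap]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def growth_regime(n=60, trials=20, seed=0, **scoring):
    """Guess the growth regime of the local score from simulated global scores.

    Returns ("linear" or "logarithmic", mean global score). The finite-length
    mean is only an estimate of the sign of the expected global score.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = "".join(rng.choice("ACGT") for _ in range(n))
        b = "".join(rng.choice("ACGT") for _ in range(n))
        total += global_score(a, b, **scoring)
    mean = total / trials
    return ("linear" if mean > 0 else "logarithmic"), mean
```

Scoring schemes whose mean lies close to zero sit near the phase transition, where a single finite-length estimate like this one is not reliable.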
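The parameters $\gamma$ and p are then estimated. The statistical significance of a local SW-alignment with a score of at least t is given by the formula:

$$P(S \ge t) \approx 1 - e^{-\gamma m n p^{t}}$$

where m and n are the lengths of the two sequences.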
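Once $\gamma$ and p are available, evaluating the significance is a one-line computation. The sketch below assumes the Poisson approximation in the form $P(S \ge t) \approx 1 - e^{-\gamma m n p^{t}}$ with sequence lengths m and n; the function name `sw_p_value` is hypothetical, and any parameter values passed to it have to come from an estimation for the gap-cost function in use.

```python
import math


def sw_p_value(t, m, n, gamma, p):
    """Approximate P(S >= t) for a local alignment score S.

    Assumes the Poisson approximation 1 - exp(-gamma * m * n * p**t),
    where m and n are the sequence lengths and 0 < p < 1.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if gamma <= 0 or m <= 0 or n <= 0:
        raise ValueError("gamma, m and n must be positive")
    expected = gamma * m * n * p ** t  # Poisson mean of high-scoring alignments
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-expected)
```

For large scores the expected count `gamma * m * n * p ** t` becomes small and the P-value is nearly equal to it, so the significance falls off geometrically in t.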



A more detailed presentation of the subject, together with estimates of $\gamma$ and p for given gap-cost functions, is provided by Vingron and Waterman.

exercises


Comments are very welcome.
luz@molgen.mpg.de