Given two protein structures, we call a match of two fragments (each 8 aa long), one from each protein, an Aligned Fragment Pair (AFP). Each AFP defines a rigid-body transformation (rotation plus translation) that superimposes the two structures. The figure below shows two AFPs that define two very different transformations of the input structures.
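As a rough illustration of AFP detection, the sketch below enumerates every pair of 8-residue fragments, one from each Cα trace, and keeps the pairs whose fragments are similar in shape. It is an assumption-laden sketch, not the TOPS++FATCAT code: instead of computing the optimal superposition RMSD, it uses the distance-matrix RMSD (dRMSD), which needs no rotation, and the `cutoff` value and the function names `drmsd` and `find_afps` are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Tuple

Coord = Tuple[float, float, float]


def drmsd(frag_a: List[Coord], frag_b: List[Coord]) -> float:
    """Distance-matrix RMSD between two equal-length fragments.

    Compares every intra-fragment C-alpha/C-alpha distance, so it is
    invariant to rotation and translation and needs no superposition.
    """
    diffs = [
        (math.dist(frag_a[x], frag_a[y]) - math.dist(frag_b[x], frag_b[y])) ** 2
        for x, y in combinations(range(len(frag_a)), 2)
    ]
    return math.sqrt(sum(diffs) / len(diffs))


def find_afps(ca1: List[Coord], ca2: List[Coord],
              frag_len: int = 8, cutoff: float = 3.0) -> List[Tuple[int, int, float]]:
    """Return (i, j, drmsd) for every fragment pair at or below `cutoff`.

    i and j are the fragment start positions in protein 1 and protein 2;
    `cutoff` (in Angstroms) is a hypothetical threshold.
    """
    afps = []
    for i in range(len(ca1) - frag_len + 1):
        for j in range(len(ca2) - frag_len + 1):
            d = drmsd(ca1[i:i + frag_len], ca2[j:j + frag_len])
            if d <= cutoff:
                afps.append((i, j, d))
    return afps
```

Because dRMSD ignores rotation and translation, a structure compared with a translated copy of itself yields an AFP with a dRMSD of essentially zero at every diagonal position (i, i). Each such AFP would still imply its own superposition transformation, which is what the chaining step later compares.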
As shown in the schematic example below, the green structure has to be rearranged (a twist is introduced at the hinge, indicated by an arrow) so that the green and red structures can be better aligned (i.e., the alignment includes all four AFPs, 1-4, instead of only two, either 1-2 or 3-4).
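To make the idea of a twist concrete, the toy sketch below rearranges a Cα trace at a hinge: residues up to the hinge stay fixed, and everything after it is rotated as one rigid block. For brevity it assumes the twist is a rotation about the z-axis through the hinge residue; a real twist can be an arbitrary rigid-body motion, and `twist_at_hinge` is a hypothetical helper, not part of TOPS++FATCAT.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def twist_at_hinge(coords: List[Coord], hinge: int, angle_deg: float) -> List[Coord]:
    """Rigidly rotate all residues after `hinge` about the z-axis
    passing through residue `hinge`; residues 0..hinge are unchanged."""
    hx, hy, _ = coords[hinge]
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    out = list(coords[: hinge + 1])
    for x, y, z in coords[hinge + 1:]:
        # Rotate the offset from the hinge, then translate back.
        dx, dy = x - hx, y - hy
        out.append((hx + c * dx - s * dy, hy + s * dx + c * dy, z))
    return out
```

For a straight 5-residue trace along x with the hinge at residue 2, a 90° twist moves the last residue from (4, 0, 0) to about (2, 2, 0) while the first three residues stay put, so each block keeps its internal geometry.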
(a) TOPS+ graph model, (b) TOPS+ strings model, and (c) TOPS+ strings matches between dihydropteridine reductase from rat (1dhr) and human (1hdr). All conserved TOPS+ strings elements are marked with pink arrows. Dotted arrows indicate matched helices and strands, plain arrows indicate matched loops, and double-line arrows indicate matched ligand-interacting loops.
We introduce a parameter r to control how strictly the TOPS+ strings alignment constrains the structural alignment: r = 0 means the alignment region is strictly restricted to the region given by the TOPS+ strings alignment, while r = 1 (the default in our program) allows some flexibility around the constrained alignment region (figure (c) below). We can then speed up the FATCAT alignment by considering only the AFPs within the constrained alignment region (figure (d) below). Rigid structural alignment can be treated as a special case of TOPS++FATCAT in which no twists are allowed when chaining AFPs; the TOPS++FATCAT program provides alignments in both rigid mode and flexible mode (the default).
Schematic illustration of FATCAT structural alignment by chaining AFPs in a constrained alignment region defined by the TOPS alignment output. (a) In FATCAT, two fragments form an AFP (shown as a line in the graph) according to the criteria described above. (b) The alignment of secondary structure elements (SSEs) from the TOPS+ comparison is used to define the constrained area for AFP detection, in which each pair of aligned SSEs defines an eligible block (filled squares). These blocks may be disconnected, so they are joined by connecting blocks (open squares). (c) A buffer area is added around the constrained area defined in (b) (the area enclosed by dashed lines) to give the constrained alignment region for FATCAT alignment (the area enclosed by dark lines). (d) Only the AFPs within the constrained alignment region are used in the dynamic programming algorithm for chaining.
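The region construction in panels (b) to (d) can be sketched as a set of allowed (i, j) cells of the alignment graph. This is a minimal sketch under several assumptions: aligned SSEs are given as inclusive residue ranges sorted along both sequences, a connecting block is taken to be the rectangle spanning the gap between consecutive eligible blocks, and the buffer is a fixed number of residues (`buffer`, a hypothetical stand-in; the page does not say how r maps to a buffer width). The function names are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]          # inclusive (start, end) residue range
Cell = Tuple[int, int]          # (position in protein 1, position in protein 2)


def constrained_region(sse_pairs: List[Tuple[Span, Span]],
                       len1: int, len2: int, buffer: int = 1) -> Set[Cell]:
    """Allowed cells: eligible blocks + connecting blocks, widened by `buffer`."""
    cells: Set[Cell] = set()

    def add_box(i0: int, i1: int, j0: int, j1: int) -> None:
        # Add a rectangle, widened by `buffer` residues and clipped to the matrix.
        for i in range(max(0, i0 - buffer), min(len1, i1 + buffer + 1)):
            for j in range(max(0, j0 - buffer), min(len2, j1 + buffer + 1)):
                cells.add((i, j))

    for (s1, e1), (s2, e2) in sse_pairs:
        add_box(s1, e1, s2, e2)                      # eligible block
    for ((_, e1), (_, e2)), ((s1, _), (s2, _)) in zip(sse_pairs, sse_pairs[1:]):
        add_box(e1, s1, e2, s2)                      # connecting block
    return cells


def afps_in_region(afps: List[Cell], region: Set[Cell], frag_len: int = 8) -> List[Cell]:
    """Keep AFPs (given by start cells) whose whole diagonal lies in the region."""
    return [(i, j) for i, j in afps
            if all((i + d, j + d) in region for d in range(frag_len))]
```

Only the AFPs returned by `afps_in_region` would be passed on to chaining, which is where the speed-up comes from: cells far from any aligned SSE pair are never considered.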
In TOPS++FATCAT, flexible structure alignment is formulated as an
AFP chaining process that allows at most t twists (t = 5). For example,
the path connected by blue dotted lines in the alignment graph below
represents a possible alignment.
Dynamic programming is used in the chaining process (as shown in the figure below).
Let $S(k)$ denote the best score of a chain ending at AFP $k$. It is calculated from
the best scores ending at previous AFPs $m$ that can be connected to AFP $k$, i.e., AFPs that precede AFP $k$ in both proteins (written $m \ll k$):

$$S(k) = \max\left\{ a(k),\ \max_{m \ll k,\ T(k) \le t} \left[ S(m) + a(k) + c(m \rightarrow k) \right] \right\}$$

where $a(k)$ is the score of AFP $k$ itself, determined by its RMSD ($d_k$) and length ($L$), with
long AFPs rewarded and large RMSDs penalized;
$c(m \rightarrow k)$ is the score of introducing a connection between AFP $m$ and AFP $k$,
defined as a function of the compatibility of the two AFPs and of the
mismatched regions ($p$) and/or gaps ($q$) created by connecting them;
and $T(k)$ is the number of twists required to connect the chain of AFPs leading up to $S(k)$, which may not exceed $t$.
The TOPS++FATCAT (chaining) score is the best of all $S(k)$ in the alignment graph.
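The recurrence for S(k) can be sketched as a small dynamic program. The page does not give the exact forms of the AFP score a(k), the connection score, or the twist test, so everything below is hypothetical: a(k) is length minus twice the RMSD, the connection score penalizes mismatched positions p, gap positions q, and a twist, and each AFP carries a made-up one-dimensional `angle` standing in for its transformation (a twist is needed when two AFPs' angles differ by more than `twist_cut`). T(k) is tracked along the best path, and `max_twists` plays the role of t.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AFP:
    i: int          # start position in protein 1
    j: int          # start position in protein 2
    length: int     # fragment length L
    rmsd: float     # d_k
    angle: float    # hypothetical 1-D stand-in for the AFP's transformation


def afp_score(a: AFP) -> float:
    # a(k): reward long AFPs, penalize large RMSD (hypothetical weights).
    return a.length - 2.0 * a.rmsd


def connect(m: AFP, k: AFP, twist_cut: float = 15.0, twist_pen: float = 5.0,
            gap_pen: float = 0.5, mis_pen: float = 0.2) -> Optional[Tuple[float, bool]]:
    """Return (connection score, needs_twist), or None if m cannot precede k."""
    g1 = k.i - (m.i + m.length)
    g2 = k.j - (m.j + m.length)
    if g1 < 0 or g2 < 0:
        return None                      # m must end before k starts in both proteins
    p = min(g1, g2)                      # mismatched positions
    q = abs(g1 - g2)                     # gap positions
    twist = abs(k.angle - m.angle) > twist_cut
    return -(mis_pen * p + gap_pen * q + (twist_pen if twist else 0.0)), twist


def chain_afps(afps: List[AFP], max_twists: int = 5) -> Tuple[float, int, List[AFP]]:
    """Return (best S(k), T(k) of that chain, the chain of AFPs)."""
    if not afps:
        return 0.0, 0, []
    afps = sorted(afps, key=lambda a: (a.i, a.j))   # predecessors come first
    S: List[float] = []
    T: List[int] = []
    prev: List[Optional[int]] = []
    for k, ak in enumerate(afps):
        best, best_t, best_m = afp_score(ak), 0, None   # start a new chain at k
        for m in range(k):
            link = connect(afps[m], ak)
            if link is None:
                continue
            c, twist = link
            t = T[m] + int(twist)
            cand = S[m] + afp_score(ak) + c
            if t <= max_twists and cand > best:
                best, best_t, best_m = cand, t, m
        S.append(best)
        T.append(best_t)
        prev.append(best_m)
    end = max(range(len(afps)), key=S.__getitem__)
    path: List[AFP] = []
    node: Optional[int] = end
    while node is not None:                          # backtrack the best chain
        path.append(afps[node])
        node = prev[node]
    return S[end], T[end], path[::-1]
```

Calling `chain_afps(afps, max_twists=0)` forbids every twist, which mirrors the rigid mode described above as a special case of flexible chaining. The sketch tracks only one T(k) per AFP (the twist count of its best chain), a common simplification that can miss a lower-scoring chain with fewer twists.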
TOPS++FATCAT uses a P-value to evaluate the significance of the structural similarity it detects; the P-value is the probability of observing a greater score between unrelated structures. It is based on the observation that the TOPS++FATCAT similarity score between two unrelated structures follows an extreme value distribution (EVD). Briefly, the TOPS++FATCAT similarity score combines the TOPS++FATCAT chaining score, the RMSD of the resulting superposition, the number of equivalent positions in the alignment, and the number of twists.
The TOPS++FATCAT similarity score is computed as
where cs is the TOPS++FATCAT chaining score; L is the number of equivalent positions in the alignment; RMSD is the overall RMSD between two structures when one structure is rearranged at the positions where twists are detected by TOPS++FATCAT; N is the number of blocks in the alignment (number of twists + 1).
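The P-value of s is then computed as

$$P(S > s) = 1 - \exp\left[-\exp\left(-\frac{s - \mu}{\sigma}\right)\right]$$

where $\mu$ and $\sigma$, the location and scale parameters of the EVD of TOPS++FATCAT similarity scores between random structures, were determined by empirical simulation.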
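A minimal sketch of the P-value computation, assuming the standard type I extreme value (Gumbel) form P(S > s) = 1 - exp(-exp(-(s - mu)/sigma)). The fitted location and scale values are not given on this page, so any numbers passed in are placeholders, and `evd_pvalue` is a hypothetical name. Using `math.expm1` keeps very small P-values from rounding to zero.

```python
import math


def evd_pvalue(s: float, mu: float, sigma: float) -> float:
    """Upper-tail probability P(S > s) of a Gumbel (type I extreme value)
    distribution with location `mu` and scale `sigma`."""
    z = (s - mu) / sigma
    # 1 - exp(-x) computed as -expm1(-x) to stay accurate when x is tiny.
    return -math.expm1(-math.exp(-z))
```

At s = mu the P-value is 1 - 1/e (about 0.63), and it falls off roughly as exp(-(s - mu)/sigma) for scores well above the location, so a score many scale units above mu is highly significant.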
align_len: The length of the alignment (including gaps).
opt_len: The number of equivalent positions in the alignment:
opt_len = align_len - gap
opt-rmsd: The root mean square deviation (RMSD) of the aligned Cα atoms of the input structures, with one input structure rearranged if flexibility is detected (i.e., if twists are introduced in the alignment).
chain-rmsd: The RMSD of the aligned Cα atoms of the input structures without structural rearrangement, even if structural flexibility is detected in the alignment. In cases with flexibility (i.e., where twists are introduced to obtain the alignment), chain-rmsd can therefore be artificially high, because the flexible alignment is longer than a rigid alignment would be. Still, comparing chain-rmsd with opt-rmsd shows how significantly the conformational flexibility introduced in comparing the structures improves the alignment.
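The length fields and the RMSD can be illustrated with a short sketch. It assumes the alignment is given as two equal-length strings with `-` marking gaps and that the coordinates passed to `rmsd` are already superposed; both helper names are hypothetical. In these terms, opt-rmsd and chain-rmsd differ only in which coordinates are fed in: the block-wise rearranged structure versus a single rigid superposition of the whole chain.

```python
import math
from typing import List, Sequence, Tuple


def alignment_lengths(aln1: str, aln2: str) -> Tuple[int, int, int]:
    """Return (align_len, gap, opt_len) for two gapped, aligned strings."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned strings must have equal length")
    align_len = len(aln1)
    gap = sum(1 for a, b in zip(aln1, aln2) if a == "-" or b == "-")
    return align_len, gap, align_len - gap          # opt_len = align_len - gap


def rmsd(xs: List[Sequence[float]], ys: List[Sequence[float]]) -> float:
    """RMSD between paired, already-superposed C-alpha coordinates."""
    return math.sqrt(sum(math.dist(x, y) ** 2 for x, y in zip(xs, ys)) / len(xs))
```

For example, aligning `AC-DE` with `ACGD-` gives align_len 5, two gapped columns, and opt_len 3.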