___________________________________________________________________________

 MATCH-BOX_server 1.1                                   15-Oct-97 09:52:08
 Internet address: matchbox@urbm.fundp.ac.be
 WEB: http://www.fundp.ac.be/sciences/biologie/bms/matchbox_submit.html
___________________________________________________________________________

      A      L       II   GGG   N    N
     A A     L           G   G  N N  N
    A   A    L       II  G      N  N N
   AAAAAAA   L       II  G GGG  N  N N
   A     A   L       II  G   G  N   NN
   A     A   LLLLLL  II   GGGGG N    N
___________________________________________________________________________

Table 1: submitted set of 4 sequences
----------------------------------------

>1cbr_A  sequence number 1  136 aa
PNFAGTWKMR SSENFDELLK ALGVNAMLRK VAVAAASKPH VEIRQDGDQF YIKTSTTVRT
TEINFKVGEG FEEETVDGRK CRSLPTWENE NKIHCTQTLL EGDGPKTYWT RELANDELIL
TFGADDVVCT RIYVRE

>1opa_A  sequence number 2  134 aa
MTKDQNGTWE MESNENFEGY MKALDIDFAT RKIAVRLTQT KIIVQDGDNF KTKTNSTFRN
YDLDFTVGVE FDEHTKGLDG RNVKTLVTWE GNTLVCVQKG EKENRGWKQW VEGDKLYLEL
TCGDQVCRQV FKKK--

>2hmb  sequence number 3  132 aa
VDAFLGTWKL VDSKNFDDYM KSLGVGFATR QVASMTKPTT IIEKNGDILT LKTHSTFKNT
EISFKLGVEF DETTADDRKV KSIVTLDGGK LVHLQKWDGQ ETTLVRELID GKLILTLTHG
TAVCTRTYEK EA----

>1ifb  sequence number 4  131 aa
AFDGTWKVDR NENYEKFMEK MGINVVKRKL GAHDNLKLTI TQEGNKFTVK ESSNFRNIDV
VFELGVDFAY SLADGTELTG TWTMEGNKLV GKFKRVDNGK ELIAVREISG NELIQTYTYE
GVEAKRIFKK E-----

The basic principle of Match-Box is to delineate boxes of similar segments
in ALL the sequences. In one box, any segment is significantly similar to
any other one. Similarity between segments is computed from the scoring
matrix, and the matching criterion is defined by a statistical cutoff.
The current score matrix used for the sequence analysis is blosum62.sco

In the final alignment, the selected boxes are only a subset of all the
boxes found. Boxes incompatible with the proposed alignment, if any, are
rejected. Table 2 shows how many boxes have been selected and rejected in
the final alignment, and their length. Table 3 displays the selected boxes.
In a successful alignment, rejected boxes are normally short boxes. A large
rejected box would be an indication of a possible misalignment.

Table 2: Boxes length distribution
------------------------------------

 Length        Frequency
           Selected  Rejected
    9          1         9
   10          1         3
   11          0         1
   14          1         0
   35          1         0
   36          1         0

Table 3
--------
Boxes selected for the optimal alignment

 (1) box number
 (2) pattern of gaps
 (3) first residue number
 (4) sequences
 (5) last residue number

 1   1     2  nfagtwkmrssenfdellkalgvnamlrkvavaaas  37
 1   3     4  dqngtwemesnenfegymkaldidfatrkiavrltq  39
 1   2     3  aflgtwklvdsknfddymkslgvgfatrqvasmtkp  38
 1   0     1  afdgtwkvdrnenyekfmekmginvvkrklgahdnl  36

 2   3    43  irqdgdqfyiktsttvrtteinfkvgegfeeetvd  77
 2   3    43  ivqdgdnfktktnstfrnydldftvgvefdehtkg  77
 2   2    42  iekngdiltlkthstfknteisfklgvefdettad  76
 2   0    40  itqegnkftvkessnfrnidvvfelgvdfayslad  74

 3   3    86  twenenkih  94
 3   5    88  twegntlvc  96
 3   2    85  tldggklvh  93
 3   0    83  tmegnklvg  91

 4   5   111  relandeliltfga  124
 4   3   109  qwvegdklyleltc  122
 4   0   106  relidgkliltlth  119
 4   0   106  reisgneliqtyty  119

 5   5   127  vvctriyvre  136
 5   3   125  qvcrqvfkkk  134
 5   0   122  avctrtyeke  131
 5   0   122  veakrifkke  131

Table 4 : optimal multiple alignment with indices of reliability
----------------------------------------------------------------

Sequences number, length and name
_________________________________
  1  136  1cbr_A
  2  134  1opa_A
  3  132  2hmb
  4  131  1ifb

            10        20        30        40        50        60        70
             +         +         +         +         +         +         +
  1 --PnfagtwkmrssenfdellkalgvnamlrkvavaaasKPHVEirqdgdqfyiktsttvrtteinfkvg
  2 MTKdqngtwemesnenfegymkaldidfatrkiavrltqTKI--ivqdgdnfktktnstfrnydldftvg
  3 -VDaflgtwklvdsknfddymkslgvgfatrqvasmtkpTTI--iekngdiltlkthstfknteisfklg
  4 ---afdgtwkvdrnenyekfmekmginvvkrklgahdnlKLT--itqegnkftvkessnfrnidvvfelg
       444222222222233333344444444444555555     44444444444444444444444444

            80        90       100       110       120       130       140
             +         +         +         +         +         +         +
  1 egfeeetvdGRKCRSLP--twenenkihCTQTLLEGDGPKTYWTrelandeliltfgaDDvvctriyvre
  2 vefdehtkgLDGRNVKTLVtwegntlvcVQKGEKENRGWK----qwvegdklyleltcGDqvcrqvfkkk
  3 vefdettadDRKVKSIV--tldggklvhLQKWDGQETTLV----relidgkliltlthGTavctrtyeke
  4 vdfaysladGTELTGTW--tmegnklvgKFKRVDNGKELIAV--reisgneliqtytyEGveakrifkke
    444445555          555555555                55555555555555  4444444444

           150       160       170       180       190       200       210
             +         +         +         +         +         +         +
  1 -
  2 -
  3 A
  4 -

Table 4 :
Aligned residues (included in boxes) are printed in lowercase. Other
residues (uppercase) are NOT aligned. Only the multiple alignment of the
WHOLE set of sequences is performed.

A score from 1 to 9 is written below each position in the boxes. It is
related to the statistical significance of the alignment at this position.
The lower the score, the higher the reliability of the alignment.

When lowercase amino acids are aligned to gaps, it means that the position
of the gaps is not completely defined. If two successive selected boxes
overlap by a maximum of k amino acids in one of the sequences, the final
alignment will show a gap aligned with lowercase amino acids. Part of this
gap, or the whole gap, can then be moved to the right by r positions
(r being less than or equal to k). It means that Match-Box is not able to
fix the exact position of this gap, but that the gap can be placed
somewhere to the right within a range of k amino acids.

Please refer to Table 3 for the precise limits of the boxes.

You may resubmit a subset of your sequences in order to refine the
within-group alignment. Results of EXPLORE may help you in defining groups
of sequences.

A PostScript file with the boxes outlined can be obtained.
___________________________________________________________________________

 MATCH-BOX_server 1.1                                   15-Oct-97 09:52:08
 Execution successful
___________________________________________________________________________
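
The lowercase/uppercase convention of Table 4 can be checked against the box limits listed in Table 3. The following is only a minimal sketch in Python and not part of Match-Box: the function name boxed_segments is hypothetical, and it assumes that an aligned row contains only letters and '-' gaps and that two boxes never touch each other without an uppercase residue or a gap between them. It is applied to row 1 (1cbr_A), joined across the three alignment blocks.

```python
def boxed_segments(aligned_row):
    """Return (first_residue, segment, last_residue) for each lowercase run.

    Residue numbers are 1-based and count letters only; '-' is a gap.
    Hypothetical helper, not part of Match-Box.
    """
    segments = []
    residue = 0   # number of residues (letters) read so far
    run = ""      # lowercase run currently being collected
    for ch in aligned_row + "-":   # the extra gap flushes a run at the end
        if ch.isalpha():
            residue += 1
        if ch.islower():
            run += ch
            continue
        if run:
            # An uppercase letter has already been counted, so the run
            # ended one residue earlier; a gap does not advance the count.
            last = residue - 1 if ch.isalpha() else residue
            segments.append((last - len(run) + 1, run, last))
            run = ""
    return segments


# Row 1 of Table 4 (1cbr_A), concatenated over the three blocks.
row_1 = (
    "--PnfagtwkmrssenfdellkalgvnamlrkvavaaasKPHVEirqdgdqfyiktsttvrtteinfkvg"
    "egfeeetvdGRKCRSLP--twenenkihCTQTLLEGDGPKTYWTrelandeliltfgaDDvvctriyvre"
    "-"
)

segments = boxed_segments(row_1)
for first, segment, last in segments:
    print(first, segment, last)
```

Each printed triple can be compared with the rows of Table 3 that belong to sequence 1 (columns 3, 4 and 5).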