Query         021321
Match_columns 314
No_of_seqs    295 out of 2128
Neff          8.6 
Searched_HMMs 46136
Date          Fri Mar 29 09:19:22 2013
Command       hhsearch -i /work/01045/syshi/csienesis_hhblits_a3m/021321.a3m -d /work/01045/syshi/HHdatabase/Cdd.hhm -o /work/01045/syshi/hhsearch_cdd/021321hhsearch_cdd -cpu 12 -v 0 

 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 PRK10139 serine endoprotease;  100.0   2E-38 4.3E-43  302.7  29.1  223   76-314    40-269 (455)
  2 PRK10898 serine endoprotease;  100.0 1.5E-37 3.2E-42  288.5  28.0  213   76-314    45-258 (353)
  3 TIGR02038 protease_degS peripl 100.0 1.9E-37 4.1E-42  287.9  28.7  216   73-314    42-257 (351)
  4 PRK10942 serine endoprotease;  100.0 6.4E-36 1.4E-40  286.8  28.1  223   76-314    38-290 (473)
  5 TIGR02037 degP_htrA_DO peripla 100.0 7.4E-35 1.6E-39  278.1  26.4  221   78-314     3-236 (428)
  6 COG0265 DegQ Trypsin-like seri 100.0 9.7E-27 2.1E-31  216.7  22.6  219   76-314    33-251 (347)
  7 KOG1320 Serine protease [Postt  99.8 7.3E-19 1.6E-23  164.8  14.6  224   74-309   126-356 (473)
  8 PF13365 Trypsin_2:  Trypsin-li  99.7 7.3E-17 1.6E-21  126.6   9.1  117  122-271     1-120 (120)
  9 KOG1421 Predicted signaling-as  99.6 3.8E-15 8.2E-20  142.0  13.0  209   76-313    52-268 (955)
 10 PF00089 Trypsin:  Trypsin;  In  99.6 7.7E-14 1.7E-18  120.5  19.0  170  119-300    24-220 (220)
 11 cd00190 Tryp_SPc Trypsin-like   99.5 1.1E-12 2.5E-17  114.0  19.4  176  118-302    23-231 (232)
 12 smart00020 Tryp_SPc Trypsin-li  99.4 1.6E-11 3.5E-16  106.9  16.5  172  118-298    24-227 (229)
 13 COG3591 V8-like Glu-specific e  99.2 1.9E-09 4.2E-14   94.2  15.9  170  117-304    61-250 (251)
 14 KOG3627 Trypsin [Amino acid tr  98.8 5.6E-07 1.2E-11   80.0  18.3  175  120-304    38-254 (256)
 15 PF00863 Peptidase_C4:  Peptida  98.8 1.8E-07 3.8E-12   81.3  13.9  167   83-294    14-185 (235)
 16 COG5640 Secreted trypsin-like   98.6 4.7E-06   1E-10   75.5  17.1   54  250-305   223-279 (413)
 17 KOG1320 Serine protease [Postt  98.6 1.1E-07 2.5E-12   89.9   6.7  195   81-301    55-251 (473)
 18 KOG1421 Predicted signaling-as  98.4 3.5E-06 7.6E-11   81.7  12.7  203   82-309   524-732 (955)
 19 PF05579 Peptidase_S32:  Equine  98.1 1.7E-05 3.6E-10   69.2   9.2  117  120-277   114-230 (297)
 20 PF03761 DUF316:  Domain of unk  97.7  0.0029 6.2E-08   57.2  17.4  111  176-299   159-274 (282)
 21 PF10459 Peptidase_S46:  Peptid  97.6 0.00033 7.2E-09   70.5  10.3   23  120-142    47-69  (698)
 22 PF00548 Peptidase_C3:  3C cyst  97.5   0.002 4.2E-08   54.0  12.1  140  117-275    22-170 (172)
 23 PF10459 Peptidase_S46:  Peptid  97.4 0.00033 7.1E-09   70.6   6.4   61  245-305   623-688 (698)
 24 PF05580 Peptidase_S55:  SpoIVB  96.7   0.036 7.7E-07   47.5  11.9   42  249-296   174-215 (218)
 25 PF08192 Peptidase_S64:  Peptid  96.6   0.014 3.1E-07   57.4  10.0  118  176-303   541-688 (695)
 26 PF00949 Peptidase_S7:  Peptida  96.1  0.0054 1.2E-07   48.7   3.1   33  246-278    88-120 (132)
 27 PF02122 Peptidase_S39:  Peptid  95.6    0.07 1.5E-06   45.7   8.1  154  116-296    26-184 (203)
 28 TIGR02860 spore_IV_B stage IV   95.2    0.24 5.1E-06   46.9  11.1   42  249-296   354-395 (402)
 29 PF00944 Peptidase_S3:  Alphavi  95.0   0.026 5.6E-07   44.4   3.2   29  250-278   101-129 (158)
 30 PF03510 Peptidase_C24:  2C end  94.0    0.19 4.1E-06   38.2   5.9  103  122-263     1-105 (105)
 31 PF09342 DUF1986:  Domain of un  94.0     2.1 4.5E-05   37.6  13.0   94  117-216    25-131 (267)
 32 PF05416 Peptidase_C37:  Southa  92.9    0.12 2.6E-06   48.4   3.8  137  119-277   378-528 (535)
 33 PF00947 Pico_P2A:  Picornaviru  90.9    0.36 7.8E-06   37.8   4.0   32  244-276    79-110 (127)
 34 PF02907 Peptidase_S29:  Hepati  90.9    0.35 7.7E-06   38.2   3.9   42  252-296   105-146 (148)
 35 PF02395 Peptidase_S6:  Immunog  87.0     3.6 7.7E-05   42.5   9.1   49  252-303   213-266 (769)
 36 PF01732 DUF31:  Putative pepti  81.5     1.1 2.3E-05   42.3   2.5   23  251-273   351-373 (374)
 37 COG5510 Predicted small secret  78.8     1.9 4.2E-05   26.9   2.1   24   29-52      1-24  (44)
 38 PF12381 Peptidase_C3G:  Tungro  75.0     3.2 6.9E-05   35.7   3.1   56  243-304   168-229 (231)
 39 PRK10081 entericidin B membran  68.6     4.5 9.8E-05   26.0   2.0   24   29-52      1-24  (48)
 40 COG3056 Uncharacterized lipopr  60.8      12 0.00027   31.3   3.7   16   41-56     22-37  (204)
 41 PF00571 CBS:  CBS domain CBS d  58.7       9  0.0002   24.8   2.3   22  254-275    28-49  (57)
 42 PRK14864 putative biofilm stre  58.0      66  0.0014   24.4   7.0   10  118-127    93-102 (104)
 43 COG0298 HypC Hydrogenase matur  51.9      38 0.00083   24.3   4.5   47  167-215     5-52  (82)
 44 PRK15396 murein lipoprotein; P  46.9      20 0.00044   25.7   2.5   21   32-52      3-23  (78)
 45 PF02743 Cache_1:  Cache domain  43.6      27 0.00059   24.5   2.9   30  259-303    19-48  (81)
 46 PF05578 Peptidase_S31:  Pestiv  42.8      92   0.002   25.4   6.0  128  119-278    50-185 (211)
 47 PF14827 Cache_3:  Sensory doma  39.4      21 0.00045   27.4   1.8   19  258-276    93-111 (116)
 48 PF01732 DUF31:  Putative pepti  34.9      28  0.0006   32.8   2.3   24  119-142    35-68  (374)
 49 COG3065 Slp Starvation-inducib  33.8 2.1E+02  0.0046   24.0   6.9   11   44-54     17-27  (191)
 50 cd04627 CBS_pair_14 The CBS do  33.3      28 0.00062   26.2   1.7   22  254-275    97-118 (123)
 51 cd04618 CBS_pair_5 The CBS dom  30.4      82  0.0018   22.8   3.8   50  254-307    22-72  (98)
 52 PRK10672 rare lipoprotein A; P  29.6 2.6E+02  0.0057   26.2   7.6   29  111-141    85-113 (361)
 53 COG3290 CitA Signal transducti  29.5      62  0.0014   31.9   3.6   18  259-276   143-160 (537)
 54 cd04603 CBS_pair_KefB_assoc Th  29.2      40 0.00087   24.9   1.9   22  254-275    85-106 (111)
 55 PF10049 DUF2283:  Protein of u  28.8      38 0.00082   21.8   1.5   12  263-274    36-47  (50)
 56 cd04620 CBS_pair_7 The CBS dom  28.5      39 0.00083   24.9   1.7   21  255-275    90-110 (115)
 57 PF07172 GRP:  Glycine rich pro  27.0      47   0.001   24.8   1.9    9   38-46      9-17  (95)
 58 cd04597 CBS_pair_DRTGG_assoc2   25.7      58  0.0013   24.4   2.3   22  254-275    87-108 (113)
 59 cd04643 CBS_pair_30 The CBS do  25.5      49  0.0011   24.3   1.8   17  259-275    95-111 (116)
 60 PF08669 GCV_T_C:  Glycine clea  25.5      78  0.0017   23.0   2.9   23  256-278    34-56  (95)
 61 cd01739 LSm11_C The eukaryotic  25.3 1.2E+02  0.0026   20.9   3.4   39  149-187     7-45  (66)
 62 cd04592 CBS_pair_EriC_assoc_eu  23.8      65  0.0014   25.2   2.3   22  254-275    22-43  (133)
 63 cd04641 CBS_pair_28 The CBS do  23.5      66  0.0014   24.0   2.3   22  253-274    21-42  (120)
 64 cd04619 CBS_pair_6 The CBS dom  23.2      57  0.0012   24.1   1.8   22  254-275    88-109 (114)
 65 PRK14864 putative biofilm stre  22.0      65  0.0014   24.5   1.8    9  149-157    77-85  (104)
 66 cd04602 CBS_pair_IMPDH_2 This   21.8      68  0.0015   23.6   2.0   22  254-275    88-109 (114)
 67 cd04614 CBS_pair_1 The CBS dom  21.7      74  0.0016   22.9   2.1   50  254-307    22-71  (96)
 68 cd04607 CBS_pair_NTP_transfera  21.6      64  0.0014   23.6   1.8   22  254-275    87-108 (113)
 69 COG3448 CBS-domain-containing   21.3      61  0.0013   29.6   1.8   22  254-275   344-365 (382)
 70 cd04582 CBS_pair_ABC_OpuCA_ass  21.0      67  0.0014   23.1   1.8   22  254-275    80-101 (106)
 71 COG5428 Uncharacterized conser  20.9      72  0.0016   22.2   1.7   16  263-278    37-52  (69)
 72 cd04583 CBS_pair_ABC_OpuCA_ass  20.7      74  0.0016   22.9   2.0   21  255-275    84-104 (109)
 73 cd04617 CBS_pair_4 The CBS dom  20.6      69  0.0015   23.8   1.8   22  254-275    89-113 (118)
 74 PF01455 HupF_HypC:  HupF/HypC   20.2 3.1E+02  0.0066   19.0   5.4   43  167-212     5-47  (68)
 75 PRK10781 rcsF outer membrane l  20.2 1.1E+02  0.0024   24.4   2.8   15   44-58     10-24  (133)
 76 PRK13835 conjugal transfer pro  20.1 2.2E+02  0.0048   23.0   4.6   23   74-96     43-65  (145)

No 1  
>PRK10139 serine endoprotease; Provisional
Probab=100.00  E-value=2e-38  Score=302.74  Aligned_cols=223  Identities=32%  Similarity=0.512  Sum_probs=180.7

Q ss_pred             hHHHHHHHHhCCceEEEEeeeeecCCCCCccchhhcc------ccCCcccceEEEEEEcC-CCEEEeccccccCCCcCCC
Q 021321           76 DRVVQLFQETSPSVVSIQDLELSKNPKSTSSELMLVD------GEYAKVEGTGSGFVWDK-FGHIVTNYHVVAKLATDTS  148 (314)
Q Consensus        76 ~~~~~~~~~~~~svV~I~~~~~~~~~~~~~~~~~~~~------~~~~~~~~~GsGfiI~~-~g~VLT~aHvv~~~~~~~~  148 (314)
                      .++.++++++.||||.|.+......+......|...+      .......+.||||+|++ +||||||+|||+       
T Consensus        40 ~~~~~~~~~~~pavV~i~~~~~~~~~~~~~~~~~~~f~~~~~~~~~~~~~~~GSG~ii~~~~g~IlTn~HVv~-------  112 (455)
T PRK10139         40 PSLAPMLEKVLPAVVSVRVEGTASQGQKIPEEFKKFFGDDLPDQPAQPFEGLGSGVIIDAAKGYVLTNNHVIN-------  112 (455)
T ss_pred             ccHHHHHHHhCCcEEEEEEEEeecccccCchhHHHhccccCCccccccccceEEEEEEECCCCEEEeChHHhC-------
Confidence            3699999999999999998765432211111111111      11123457999999985 699999999999       


Q ss_pred             CcceEEEEEecCCCCeEEEEEEEEEeCCCCcEEEEEEeeCCCccceeecCCCCCCCCCCEEEEEEcCCCCCCCeEeeEEe
Q 021321          149 GLHRCKVSLFDAKGNGFYREGKMVGCDPAYDLAVLKVDVEGFELKPVVLGTSHDLRVGQSCFAIGNPYGFEDTLTTGVVS  228 (314)
Q Consensus       149 ~~~~~~v~~~~~~g~~~~~~a~v~~~d~~~DlAlL~v~~~~~~~~~~~l~~~~~~~~G~~v~~iG~p~~~~~~~~~G~vs  228 (314)
                      +++.+.|++.+++    .++|++++.|+.+||||||++.+ ..+++++|+++..+++|++|+++|||++...+++.|+|+
T Consensus       113 ~a~~i~V~~~dg~----~~~a~vvg~D~~~DlAvlkv~~~-~~l~~~~lg~s~~~~~G~~V~aiG~P~g~~~tvt~GivS  187 (455)
T PRK10139        113 QAQKISIQLNDGR----EFDAKLIGSDDQSDIALLQIQNP-SKLTQIAIADSDKLRVGDFAVAVGNPFGLGQTATSGIIS  187 (455)
T ss_pred             CCCEEEEEECCCC----EEEEEEEEEcCCCCEEEEEecCC-CCCceeEecCccccCCCCEEEEEecCCCCCCceEEEEEc
Confidence            5678899987644    78999999999999999999843 368899999999999999999999999999999999999


Q ss_pred             cccccccCCCCccccceEEEeeccCCCCcccceecCCCeEEEEEcccccCCCCCCccceEEEEehHHHHHHHHHHHHcCc
Q 021321          229 GLGREIPSPNGRAIRGAIQTDAAINSGNSGGPLMNSFGHVIGVNTATFTRKGTGLSSGVNFAIPIDTVVRTVPYLIVYGT  308 (314)
Q Consensus       229 ~~~~~~~~~~~~~~~~~i~~~~~~~~G~SGGPl~n~~G~vvGI~s~~~~~~~~~~~~~~~~aipi~~i~~~l~~l~~~~~  308 (314)
                      ...+......  .+..++++|+.+++|+|||||||.+|+||||+++.....  +...|++||||++.+++++++|+++|+
T Consensus       188 ~~~r~~~~~~--~~~~~iqtda~in~GnSGGpl~n~~G~vIGi~~~~~~~~--~~~~gigfaIP~~~~~~v~~~l~~~g~  263 (455)
T PRK10139        188 ALGRSGLNLE--GLENFIQTDASINRGNSGGALLNLNGELIGINTAILAPG--GGSVGIGFAIPSNMARTLAQQLIDFGE  263 (455)
T ss_pred             cccccccCCC--CcceEEEECCccCCCCCcceEECCCCeEEEEEEEEEcCC--CCccceEEEEEhHHHHHHHHHHhhcCc
Confidence            8876422211  235689999999999999999999999999999876542  235789999999999999999999999


Q ss_pred             cCCCCC
Q 021321          309 PYSNRF  314 (314)
Q Consensus       309 ~~~~~~  314 (314)
                      +.|+|+
T Consensus       264 v~r~~L  269 (455)
T PRK10139        264 IKRGLL  269 (455)
T ss_pred             ccccce
Confidence            999986


No 2  
>PRK10898 serine endoprotease; Provisional
Probab=100.00  E-value=1.5e-37  Score=288.48  Aligned_cols=213  Identities=34%  Similarity=0.551  Sum_probs=175.9

Q ss_pred             hHHHHHHHHhCCceEEEEeeeeecCCCCCccchhhccccCCcccceEEEEEEcCCCEEEeccccccCCCcCCCCcceEEE
Q 021321           76 DRVVQLFQETSPSVVSIQDLELSKNPKSTSSELMLVDGEYAKVEGTGSGFVWDKFGHIVTNYHVVAKLATDTSGLHRCKV  155 (314)
Q Consensus        76 ~~~~~~~~~~~~svV~I~~~~~~~~~~~~~~~~~~~~~~~~~~~~~GsGfiI~~~g~VLT~aHvv~~~~~~~~~~~~~~v  155 (314)
                      .++.++++++.||||.|.+.....           .........+.||||+|+++||||||+||++       +++.+.|
T Consensus        45 ~~~~~~~~~~~psvV~v~~~~~~~-----------~~~~~~~~~~~GSGfvi~~~G~IlTn~HVv~-------~a~~i~V  106 (353)
T PRK10898         45 ASYNQAVRRAAPAVVNVYNRSLNS-----------TSHNQLEIRTLGSGVIMDQRGYILTNKHVIN-------DADQIIV  106 (353)
T ss_pred             chHHHHHHHhCCcEEEEEeEeccc-----------cCcccccccceeeEEEEeCCeEEEecccEeC-------CCCEEEE
Confidence            478899999999999999855321           0011223457999999998899999999999       5677888


Q ss_pred             EEecCCCCeEEEEEEEEEeCCCCcEEEEEEeeCCCccceeecCCCCCCCCCCEEEEEEcCCCCCCCeEeeEEeccccccc
Q 021321          156 SLFDAKGNGFYREGKMVGCDPAYDLAVLKVDVEGFELKPVVLGTSHDLRVGQSCFAIGNPYGFEDTLTTGVVSGLGREIP  235 (314)
Q Consensus       156 ~~~~~~g~~~~~~a~v~~~d~~~DlAlL~v~~~~~~~~~~~l~~~~~~~~G~~v~~iG~p~~~~~~~~~G~vs~~~~~~~  235 (314)
                      .+.++  +  .+++++++.|+.+||||||++..  .+++++|+++..+++|++|+++|||.+...+++.|+|++..+...
T Consensus       107 ~~~dg--~--~~~a~vv~~d~~~DlAvl~v~~~--~l~~~~l~~~~~~~~G~~V~aiG~P~g~~~~~t~Giis~~~r~~~  180 (353)
T PRK10898        107 ALQDG--R--VFEALLVGSDSLTDLAVLKINAT--NLPVIPINPKRVPHIGDVVLAIGNPYNLGQTITQGIISATGRIGL  180 (353)
T ss_pred             EeCCC--C--EEEEEEEEEcCCCCEEEEEEcCC--CCCeeeccCcCcCCCCCEEEEEeCCCCcCCCcceeEEEecccccc
Confidence            88764  3  78899999999999999999854  578899988888999999999999999888999999998776432


Q ss_pred             CCCCccccceEEEeeccCCCCcccceecCCCeEEEEEcccccCCCCC-CccceEEEEehHHHHHHHHHHHHcCccCCCCC
Q 021321          236 SPNGRAIRGAIQTDAAINSGNSGGPLMNSFGHVIGVNTATFTRKGTG-LSSGVNFAIPIDTVVRTVPYLIVYGTPYSNRF  314 (314)
Q Consensus       236 ~~~~~~~~~~i~~~~~~~~G~SGGPl~n~~G~vvGI~s~~~~~~~~~-~~~~~~~aipi~~i~~~l~~l~~~~~~~~~~~  314 (314)
                      ...+  ...++++|+.+.+|+|||||+|.+|+||||+++.....+.+ ...+++||||++.+++++++|+++|++.|+|+
T Consensus       181 ~~~~--~~~~iqtda~i~~GnSGGPl~n~~G~vvGI~~~~~~~~~~~~~~~g~~faIP~~~~~~~~~~l~~~G~~~~~~l  258 (353)
T PRK10898        181 SPTG--RQNFLQTDASINHGNSGGALVNSLGELMGINTLSFDKSNDGETPEGIGFAIPTQLATKIMDKLIRDGRVIRGYI  258 (353)
T ss_pred             CCcc--ccceEEeccccCCCCCcceEECCCCeEEEEEEEEecccCCCCcccceEEEEchHHHHHHHHHHhhcCccccccc
Confidence            2222  24689999999999999999999999999999876543221 23689999999999999999999999999986


No 3  
>TIGR02038 protease_degS periplasmic serine peptidase DegS. This family consists of the periplasmic serine protease DegS (HhoB), a shorter paralog of protease DO (HtrA, DegP) and DegQ (HhoA). It is found in E. coli and several other Proteobacteria of the gamma subdivision. It contains a trypsin domain and a single copy of the PDZ domain (in contrast to DegP with two copies). A critical role of DegS is to sense stress in the periplasm and partially degrade an inhibitor of sigma(E).
Probab=100.00  E-value=1.9e-37  Score=287.90  Aligned_cols=216  Identities=37%  Similarity=0.565  Sum_probs=178.1

Q ss_pred             ccchHHHHHHHHhCCceEEEEeeeeecCCCCCccchhhccccCCcccceEEEEEEcCCCEEEeccccccCCCcCCCCcce
Q 021321           73 LEEDRVVQLFQETSPSVVSIQDLELSKNPKSTSSELMLVDGEYAKVEGTGSGFVWDKFGHIVTNYHVVAKLATDTSGLHR  152 (314)
Q Consensus        73 ~~~~~~~~~~~~~~~svV~I~~~~~~~~~~~~~~~~~~~~~~~~~~~~~GsGfiI~~~g~VLT~aHvv~~~~~~~~~~~~  152 (314)
                      ....++.++++++.||||+|++.....+.           .......+.||||+|+++||||||+||++       +++.
T Consensus        42 ~~~~~~~~~~~~~~psVV~I~~~~~~~~~-----------~~~~~~~~~GSG~vi~~~G~IlTn~HVV~-------~~~~  103 (351)
T TIGR02038        42 TVEISFNKAVRRAAPAVVNIYNRSISQNS-----------LNQLSIQGLGSGVIMSKEGYILTNYHVIK-------KADQ  103 (351)
T ss_pred             ccchhHHHHHHhcCCcEEEEEeEeccccc-----------cccccccceEEEEEEeCCeEEEecccEeC-------CCCE
Confidence            44457999999999999999975532210           01123457899999998899999999998       5677


Q ss_pred             EEEEEecCCCCeEEEEEEEEEeCCCCcEEEEEEeeCCCccceeecCCCCCCCCCCEEEEEEcCCCCCCCeEeeEEecccc
Q 021321          153 CKVSLFDAKGNGFYREGKMVGCDPAYDLAVLKVDVEGFELKPVVLGTSHDLRVGQSCFAIGNPYGFEDTLTTGVVSGLGR  232 (314)
Q Consensus       153 ~~v~~~~~~g~~~~~~a~v~~~d~~~DlAlL~v~~~~~~~~~~~l~~~~~~~~G~~v~~iG~p~~~~~~~~~G~vs~~~~  232 (314)
                      +.|.+.++  +  .+++++++.|+.+||||||++..  .+++++++++..+++|++|+++|||.+...+.+.|+|+...+
T Consensus       104 i~V~~~dg--~--~~~a~vv~~d~~~DlAvlkv~~~--~~~~~~l~~s~~~~~G~~V~aiG~P~~~~~s~t~GiIs~~~r  177 (351)
T TIGR02038       104 IVVALQDG--R--KFEAELVGSDPLTDLAVLKIEGD--NLPTIPVNLDRPPHVGDVVLAIGNPYNLGQTITQGIISATGR  177 (351)
T ss_pred             EEEEECCC--C--EEEEEEEEecCCCCEEEEEecCC--CCceEeccCcCccCCCCEEEEEeCCCCCCCcEEEEEEEeccC
Confidence            88888764  3  78999999999999999999854  478889988888999999999999999889999999998876


Q ss_pred             cccCCCCccccceEEEeeccCCCCcccceecCCCeEEEEEcccccCCCCCCccceEEEEehHHHHHHHHHHHHcCccCCC
Q 021321          233 EIPSPNGRAIRGAIQTDAAINSGNSGGPLMNSFGHVIGVNTATFTRKGTGLSSGVNFAIPIDTVVRTVPYLIVYGTPYSN  312 (314)
Q Consensus       233 ~~~~~~~~~~~~~i~~~~~~~~G~SGGPl~n~~G~vvGI~s~~~~~~~~~~~~~~~~aipi~~i~~~l~~l~~~~~~~~~  312 (314)
                      ......  ....++++|+.+.+|+|||||||.+|+||||+++.....+.....+++|+||++.+++++++|+++|++.|+
T Consensus       178 ~~~~~~--~~~~~iqtda~i~~GnSGGpl~n~~G~vIGI~~~~~~~~~~~~~~g~~faIP~~~~~~vl~~l~~~g~~~r~  255 (351)
T TIGR02038       178 NGLSSV--GRQNFIQTDAAINAGNSGGALINTNGELVGINTASFQKGGDEGGEGINFAIPIKLAHKIMGKIIRDGRVIRG  255 (351)
T ss_pred             cccCCC--CcceEEEECCccCCCCCcceEECCCCeEEEEEeeeecccCCCCccceEEEecHHHHHHHHHHHhhcCcccce
Confidence            433222  224689999999999999999999999999999765433223346899999999999999999999999998


Q ss_pred             CC
Q 021321          313 RF  314 (314)
Q Consensus       313 ~~  314 (314)
                      |+
T Consensus       256 ~l  257 (351)
T TIGR02038       256 YI  257 (351)
T ss_pred             Ee
Confidence            85


No 4  
>PRK10942 serine endoprotease; Provisional
Probab=100.00  E-value=6.4e-36  Score=286.81  Aligned_cols=223  Identities=36%  Similarity=0.525  Sum_probs=179.2

Q ss_pred             hHHHHHHHHhCCceEEEEeeeeecCC---CC-Cccchhhc---c---c-------------------cCCcccceEEEEE
Q 021321           76 DRVVQLFQETSPSVVSIQDLELSKNP---KS-TSSELMLV---D---G-------------------EYAKVEGTGSGFV  126 (314)
Q Consensus        76 ~~~~~~~~~~~~svV~I~~~~~~~~~---~~-~~~~~~~~---~---~-------------------~~~~~~~~GsGfi  126 (314)
                      .++.++++++.||||.|++......+   .+ .+..||..   +   .                   ......+.||||+
T Consensus        38 ~~~~~~~~~~~pavv~i~~~~~~~~~~~~~~~~~~~ff~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~GSG~i  117 (473)
T PRK10942         38 PSLAPMLEKVMPSVVSINVEGSTTVNTPRMPRQFQQFFGDNSPFCQEGSPFQSSPFCQGGQGGNGGGQQQKFMALGSGVI  117 (473)
T ss_pred             ccHHHHHHHhCCceEEEEEEEeccccCCCCChhHHHhhcccccccccccccccccccccccccccccccccccceEEEEE
Confidence            36999999999999999987654321   00 01122210   0   0                   0112356899999


Q ss_pred             EcC-CCEEEeccccccCCCcCCCCcceEEEEEecCCCCeEEEEEEEEEeCCCCcEEEEEEeeCCCccceeecCCCCCCCC
Q 021321          127 WDK-FGHIVTNYHVVAKLATDTSGLHRCKVSLFDAKGNGFYREGKMVGCDPAYDLAVLKVDVEGFELKPVVLGTSHDLRV  205 (314)
Q Consensus       127 I~~-~g~VLT~aHvv~~~~~~~~~~~~~~v~~~~~~g~~~~~~a~v~~~d~~~DlAlL~v~~~~~~~~~~~l~~~~~~~~  205 (314)
                      |++ +||||||+||+.       +++.++|++.+++    .+++++++.|+.+||||||++.+ ..+++++|+++..+++
T Consensus       118 i~~~~G~IlTn~HVv~-------~a~~i~V~~~dg~----~~~a~vv~~D~~~DlAvlki~~~-~~l~~~~lg~s~~l~~  185 (473)
T PRK10942        118 IDADKGYVVTNNHVVD-------NATKIKVQLSDGR----KFDAKVVGKDPRSDIALIQLQNP-KNLTAIKMADSDALRV  185 (473)
T ss_pred             EECCCCEEEeChhhcC-------CCCEEEEEECCCC----EEEEEEEEecCCCCEEEEEecCC-CCCceeEecCccccCC
Confidence            986 599999999999       5678899887643    78999999999999999999743 3688999999999999


Q ss_pred             CCEEEEEEcCCCCCCCeEeeEEecccccccCCCCccccceEEEeeccCCCCcccceecCCCeEEEEEcccccCCCCCCcc
Q 021321          206 GQSCFAIGNPYGFEDTLTTGVVSGLGREIPSPNGRAIRGAIQTDAAINSGNSGGPLMNSFGHVIGVNTATFTRKGTGLSS  285 (314)
Q Consensus       206 G~~v~~iG~p~~~~~~~~~G~vs~~~~~~~~~~~~~~~~~i~~~~~~~~G~SGGPl~n~~G~vvGI~s~~~~~~~~~~~~  285 (314)
                      |++|+++|+|++...+++.|+|+...+....  ...+.+++++|+.+++|+|||||+|.+|+||||+++.....  +...
T Consensus       186 G~~V~aiG~P~g~~~tvt~GiVs~~~r~~~~--~~~~~~~iqtda~i~~GnSGGpL~n~~GeviGI~t~~~~~~--g~~~  261 (473)
T PRK10942        186 GDYTVAIGNPYGLGETVTSGIVSALGRSGLN--VENYENFIQTDAAINRGNSGGALVNLNGELIGINTAILAPD--GGNI  261 (473)
T ss_pred             CCEEEEEcCCCCCCcceeEEEEEEeecccCC--cccccceEEeccccCCCCCcCccCCCCCeEEEEEEEEEcCC--CCcc
Confidence            9999999999999899999999988764211  12345789999999999999999999999999999876543  2346


Q ss_pred             ceEEEEehHHHHHHHHHHHHcCccCCCCC
Q 021321          286 GVNFAIPIDTVVRTVPYLIVYGTPYSNRF  314 (314)
Q Consensus       286 ~~~~aipi~~i~~~l~~l~~~~~~~~~~~  314 (314)
                      +++|+||++.+++++++|+++|++.|||+
T Consensus       262 g~gfaIP~~~~~~v~~~l~~~g~v~rg~l  290 (473)
T PRK10942        262 GIGFAIPSNMVKNLTSQMVEYGQVKRGEL  290 (473)
T ss_pred             cEEEEEEHHHHHHHHHHHHhcccccccee
Confidence            89999999999999999999999999985


No 5  
>TIGR02037 degP_htrA_DO periplasmic serine protease, Do/DegQ family. This family consists of a set of proteins variously designated DegP, heat shock protein HtrA, and protease DO. The ortholog in Pseudomonas aeruginosa is designated MucD and is found in an operon that controls mucoid phenotype. This family also includes the DegQ (HhoA) paralog in E. coli, which can rescue a DegP mutant, but not the smaller DegS paralog, which cannot. Members of this family are located in the periplasm and have separable functions as both protease and chaperone. Members have a trypsin domain and two copies of a PDZ domain. This protein protects bacteria from thermal and other stresses and may be important for the survival of bacterial pathogens. The chaperone function is dominant at low temperatures, whereas the proteolytic activity is turned on at elevated temperatures.
Probab=100.00  E-value=7.4e-35  Score=278.12  Aligned_cols=221  Identities=40%  Similarity=0.554  Sum_probs=178.8

Q ss_pred             HHHHHHHhCCceEEEEeeeeecCCCC---C---ccchhhc-c------ccCCcccceEEEEEEcCCCEEEeccccccCCC
Q 021321           78 VVQLFQETSPSVVSIQDLELSKNPKS---T---SSELMLV-D------GEYAKVEGTGSGFVWDKFGHIVTNYHVVAKLA  144 (314)
Q Consensus        78 ~~~~~~~~~~svV~I~~~~~~~~~~~---~---~~~~~~~-~------~~~~~~~~~GsGfiI~~~g~VLT~aHvv~~~~  144 (314)
                      +.++++++.||||.|.+.........   .   ...++.. .      .......+.||||+|+++||||||+||++   
T Consensus         3 ~~~~~~~~~p~vv~i~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~GSGfii~~~G~IlTn~Hvv~---   79 (428)
T TIGR02037         3 FAPLVEKVAPAVVNISVEGTVKRRNRPPALPPFFRQFFGDDMPNFPRQQRERKVRGLGSGVIISADGYILTNNHVVD---   79 (428)
T ss_pred             HHHHHHHhCCceEEEEEEEEecccCCCcccchhHHHhhcccccCcccccccccccceeeEEEECCCCEEEEcHHHcC---
Confidence            67899999999999998764432111   0   1112211 0      01223567999999998899999999999   


Q ss_pred             cCCCCcceEEEEEecCCCCeEEEEEEEEEeCCCCcEEEEEEeeCCCccceeecCCCCCCCCCCEEEEEEcCCCCCCCeEe
Q 021321          145 TDTSGLHRCKVSLFDAKGNGFYREGKMVGCDPAYDLAVLKVDVEGFELKPVVLGTSHDLRVGQSCFAIGNPYGFEDTLTT  224 (314)
Q Consensus       145 ~~~~~~~~~~v~~~~~~g~~~~~~a~v~~~d~~~DlAlL~v~~~~~~~~~~~l~~~~~~~~G~~v~~iG~p~~~~~~~~~  224 (314)
                          +++.+.|.+.++.    .+++++++.|+.+||||||++.+ ..++++.|+++..+++|++|+++|||++...+++.
T Consensus        80 ----~~~~i~V~~~~~~----~~~a~vv~~d~~~DlAllkv~~~-~~~~~~~l~~~~~~~~G~~v~aiG~p~g~~~~~t~  150 (428)
T TIGR02037        80 ----GADEITVTLSDGR----EFKAKLVGKDPRTDIAVLKIDAK-KNLPVIKLGDSDKLRVGDWVLAIGNPFGLGQTVTS  150 (428)
T ss_pred             ----CCCeEEEEeCCCC----EEEEEEEEecCCCCEEEEEecCC-CCceEEEccCCCCCCCCCEEEEEECCCcCCCcEEE
Confidence                5677888887643    78899999999999999999854 46899999888899999999999999999999999


Q ss_pred             eEEecccccccCCCCccccceEEEeeccCCCCcccceecCCCeEEEEEcccccCCCCCCccceEEEEehHHHHHHHHHHH
Q 021321          225 GVVSGLGREIPSPNGRAIRGAIQTDAAINSGNSGGPLMNSFGHVIGVNTATFTRKGTGLSSGVNFAIPIDTVVRTVPYLI  304 (314)
Q Consensus       225 G~vs~~~~~~~~~~~~~~~~~i~~~~~~~~G~SGGPl~n~~G~vvGI~s~~~~~~~~~~~~~~~~aipi~~i~~~l~~l~  304 (314)
                      |+|+...+...  ....+..++++|+.+.+|+|||||||.+|+||||+++.....  +...+++||||++.+++++++|+
T Consensus       151 G~vs~~~~~~~--~~~~~~~~i~tda~i~~GnSGGpl~n~~G~viGI~~~~~~~~--g~~~g~~faiP~~~~~~~~~~l~  226 (428)
T TIGR02037       151 GIVSALGRSGL--GIGDYENFIQTDAAINPGNSGGPLVNLRGEVIGINTAIYSPS--GGNVGIGFAIPSNMAKNVVDQLI  226 (428)
T ss_pred             EEEEecccCcc--CCCCccceEEECCCCCCCCCCCceECCCCeEEEEEeEEEcCC--CCccceEEEEEhHHHHHHHHHHH
Confidence            99998876521  122345689999999999999999999999999999876542  23568999999999999999999


Q ss_pred             HcCccCCCCC
Q 021321          305 VYGTPYSNRF  314 (314)
Q Consensus       305 ~~~~~~~~~~  314 (314)
                      ++|++.|+|+
T Consensus       227 ~~g~~~~~~l  236 (428)
T TIGR02037       227 EGGKVQRGWL  236 (428)
T ss_pred             hcCcCcCCcC
Confidence            9999999986


No 6  
>COG0265 DegQ Trypsin-like serine proteases, typically periplasmic, contain C-terminal PDZ domain [Posttranslational modification, protein turnover, chaperones]
Probab=99.95  E-value=9.7e-27  Score=216.65  Aligned_cols=219  Identities=43%  Similarity=0.613  Sum_probs=178.1

Q ss_pred             hHHHHHHHHhCCceEEEEeeeeecCCCCCccchhhccccCCcccceEEEEEEcCCCEEEeccccccCCCcCCCCcceEEE
Q 021321           76 DRVVQLFQETSPSVVSIQDLELSKNPKSTSSELMLVDGEYAKVEGTGSGFVWDKFGHIVTNYHVVAKLATDTSGLHRCKV  155 (314)
Q Consensus        76 ~~~~~~~~~~~~svV~I~~~~~~~~~~~~~~~~~~~~~~~~~~~~~GsGfiI~~~g~VLT~aHvv~~~~~~~~~~~~~~v  155 (314)
                      ..+.++++++.|+||+|........     ..++..........+.||||+++++|||+|+.|++.       +++.+.+
T Consensus        33 ~~~~~~~~~~~~~vV~~~~~~~~~~-----~~~~~~~~~~~~~~~~gSg~i~~~~g~ivTn~hVi~-------~a~~i~v  100 (347)
T COG0265          33 LSFATAVEKVAPAVVSIATGLTAKL-----RSFFPSDPPLRSAEGLGSGFIISSDGYIVTNNHVIA-------GAEEITV  100 (347)
T ss_pred             cCHHHHHHhcCCcEEEEEeeeeecc-----hhcccCCcccccccccccEEEEcCCeEEEecceecC-------CcceEEE
Confidence            5788999999999999998665432     011100000001158999999998899999999999       5577777


Q ss_pred             EEecCCCCeEEEEEEEEEeCCCCcEEEEEEeeCCCccceeecCCCCCCCCCCEEEEEEcCCCCCCCeEeeEEeccccccc
Q 021321          156 SLFDAKGNGFYREGKMVGCDPAYDLAVLKVDVEGFELKPVVLGTSHDLRVGQSCFAIGNPYGFEDTLTTGVVSGLGREIP  235 (314)
Q Consensus       156 ~~~~~~g~~~~~~a~v~~~d~~~DlAlL~v~~~~~~~~~~~l~~~~~~~~G~~v~~iG~p~~~~~~~~~G~vs~~~~~~~  235 (314)
                      .+.+  |+  .+++++++.|+..|+|++|++.... ++.+.++++..++.|+++.++|+|++...+++.|+++...+. .
T Consensus       101 ~l~d--g~--~~~a~~vg~d~~~dlavlki~~~~~-~~~~~~~~s~~l~vg~~v~aiGnp~g~~~tvt~Givs~~~r~-~  174 (347)
T COG0265         101 TLAD--GR--EVPAKLVGKDPISDLAVLKIDGAGG-LPVIALGDSDKLRVGDVVVAIGNPFGLGQTVTSGIVSALGRT-G  174 (347)
T ss_pred             EeCC--CC--EEEEEEEecCCccCEEEEEeccCCC-CceeeccCCCCcccCCEEEEecCCCCcccceeccEEeccccc-c
Confidence            7744  44  7899999999999999999986543 788899999999999999999999999999999999998885 2


Q ss_pred             CCCCccccceEEEeeccCCCCcccceecCCCeEEEEEcccccCCCCCCccceEEEEehHHHHHHHHHHHHcCccCCCCC
Q 021321          236 SPNGRAIRGAIQTDAAINSGNSGGPLMNSFGHVIGVNTATFTRKGTGLSSGVNFAIPIDTVVRTVPYLIVYGTPYSNRF  314 (314)
Q Consensus       236 ~~~~~~~~~~i~~~~~~~~G~SGGPl~n~~G~vvGI~s~~~~~~~~~~~~~~~~aipi~~i~~~l~~l~~~~~~~~~~~  314 (314)
                      ......+.+++|+|+.+++|+||||++|.+|++|||++......+.  ..+++|+||++.+++++++++.+|++.|+++
T Consensus       175 v~~~~~~~~~IqtdAain~gnsGgpl~n~~g~~iGint~~~~~~~~--~~gigfaiP~~~~~~v~~~l~~~G~v~~~~l  251 (347)
T COG0265         175 VGSAGGYVNFIQTDAAINPGNSGGPLVNIDGEVVGINTAIIAPSGG--SSGIGFAIPVNLVAPVLDELISKGKVVRGYL  251 (347)
T ss_pred             ccCcccccchhhcccccCCCCCCCceEcCCCcEEEEEEEEecCCCC--cceeEEEecHHHHHHHHHHHHHcCCcccccc
Confidence            2111224678999999999999999999999999999998776432  4568999999999999999999899999874


No 7  
>KOG1320 consensus Serine protease [Posttranslational modification, protein turnover, chaperones]
Probab=99.80  E-value=7.3e-19  Score=164.82  Aligned_cols=224  Identities=39%  Similarity=0.424  Sum_probs=170.8

Q ss_pred             cchHHHHHHHHhCCceEEEEeeeeecCCCCCccchhhccccCCcccceEEEEEEcCCCEEEeccccccCCCcCC--CCcc
Q 021321           74 EEDRVVQLFQETSPSVVSIQDLELSKNPKSTSSELMLVDGEYAKVEGTGSGFVWDKFGHIVTNYHVVAKLATDT--SGLH  151 (314)
Q Consensus        74 ~~~~~~~~~~~~~~svV~I~~~~~~~~~~~~~~~~~~~~~~~~~~~~~GsGfiI~~~g~VLT~aHvv~~~~~~~--~~~~  151 (314)
                      .....+.+.++...|+|.|....--....        ...........||||+++.||+++||+||+.......  .+..
T Consensus       126 ~~~~v~~~~~~cd~Avv~Ie~~~f~~~~~--------~~e~~~ip~l~~S~~Vv~gd~i~VTnghV~~~~~~~y~~~~~~  197 (473)
T KOG1320|consen  126 YKAFVAAVFEECDLAVVYIESEEFWKGMN--------PFELGDIPSLNGSGFVVGGDGIIVTNGHVVRVEPRIYAHSSTV  197 (473)
T ss_pred             hhhhHHHhhhcccceEEEEeeccccCCCc--------ccccCCCcccCccEEEEcCCcEEEEeeEEEEEEeccccCCCcc
Confidence            35678889999999999999743211110        1223345667999999999999999999997532211  1112


Q ss_pred             --eEEEEEecCCCCeEEEEEEEEEeCCCCcEEEEEEeeCCCccceeecCCCCCCCCCCEEEEEEcCCCCCCCeEeeEEec
Q 021321          152 --RCKVSLFDAKGNGFYREGKMVGCDPAYDLAVLKVDVEGFELKPVVLGTSHDLRVGQSCFAIGNPYGFEDTLTTGVVSG  229 (314)
Q Consensus       152 --~~~v~~~~~~g~~~~~~a~v~~~d~~~DlAlL~v~~~~~~~~~~~l~~~~~~~~G~~v~~iG~p~~~~~~~~~G~vs~  229 (314)
                        .+.|...++.|+  ..++.+.+.|+..|+|+++++.+....++++++-+..+..|+++..+|.|++...+.+.|.++.
T Consensus       198 l~~vqi~aa~~~~~--s~ep~i~g~d~~~gvA~l~ik~~~~i~~~i~~~~~~~~~~G~~~~a~~~~f~~~nt~t~g~vs~  275 (473)
T KOG1320|consen  198 LLRVQIDAAIGPGN--SGEPVIVGVDKVAGVAFLKIKTPENILYVIPLGVSSHFRTGVEVSAIGNGFGLLNTLTQGMVSG  275 (473)
T ss_pred             eeeEEEEEeecCCc--cCCCeEEccccccceEEEEEecCCcccceeecceeeeecccceeeccccCceeeeeeeeccccc
Confidence              244444444334  6788999999999999999976543377888888899999999999999999999999999988


Q ss_pred             ccccccCCCC---ccccceEEEeeccCCCCcccceecCCCeEEEEEcccccCCCCCCccceEEEEehHHHHHHHHHHHHc
Q 021321          230 LGREIPSPNG---RAIRGAIQTDAAINSGNSGGPLMNSFGHVIGVNTATFTRKGTGLSSGVNFAIPIDTVVRTVPYLIVY  306 (314)
Q Consensus       230 ~~~~~~~~~~---~~~~~~i~~~~~~~~G~SGGPl~n~~G~vvGI~s~~~~~~~~~~~~~~~~aipi~~i~~~l~~l~~~  306 (314)
                      ..+.......   ....+++++|+.+..|.||+|++|.+|++||+++.....  .+...+++|++|++.++.++.+..++
T Consensus       276 ~~R~~~~lg~~~g~~i~~~~qtd~ai~~~nsg~~ll~~DG~~IgVn~~~~~r--i~~~~~iSf~~p~d~vl~~v~r~~e~  353 (473)
T KOG1320|consen  276 QLRKSFKLGLETGVLISKINQTDAAINPGNSGGPLLNLDGEVIGVNTRKVTR--IGFSHGISFKIPIDTVLVIVLRLGEF  353 (473)
T ss_pred             ccccccccCcccceeeeeecccchhhhcccCCCcEEEecCcEeeeeeeeeEE--eeccccceeccCchHhhhhhhhhhhh
Confidence            8775443222   345678999999999999999999999999999887654  23357899999999999999888654


Q ss_pred             Ccc
Q 021321          307 GTP  309 (314)
Q Consensus       307 ~~~  309 (314)
                      ...
T Consensus       354 ~~~  356 (473)
T KOG1320|consen  354 QIS  356 (473)
T ss_pred             cee
Confidence            443


No 8  
>PF13365 Trypsin_2:  Trypsin-like peptidase domain; PDB: 1Y8T_A 2Z9I_A 3QO6_A 1L1J_A 1QY6_A 2O8L_A 3OTP_E 2ZLE_I 1KY9_A 3CS0_A ....
Probab=99.70  E-value=7.3e-17  Score=126.58  Aligned_cols=117  Identities=33%  Similarity=0.492  Sum_probs=69.1

Q ss_pred             EEEEEEcCCCEEEeccccccCCCcCCCCcceEEEEEecCCCCeEEEE--EEEEEeCCC-CcEEEEEEeeCCCccceeecC
Q 021321          122 GSGFVWDKFGHIVTNYHVVAKLATDTSGLHRCKVSLFDAKGNGFYRE--GKMVGCDPA-YDLAVLKVDVEGFELKPVVLG  198 (314)
Q Consensus       122 GsGfiI~~~g~VLT~aHvv~~~~~~~~~~~~~~v~~~~~~g~~~~~~--a~v~~~d~~-~DlAlL~v~~~~~~~~~~~l~  198 (314)
                      ||||+|+++|+||||+||+.+...... .....+.+...++.  ...  ++++..++. +|+|||+++            
T Consensus         1 GTGf~i~~~g~ilT~~Hvv~~~~~~~~-~~~~~~~~~~~~~~--~~~~~~~~~~~~~~~~D~All~v~------------   65 (120)
T PF13365_consen    1 GTGFLIGPDGYILTAAHVVEDWNDGKQ-PDNSSVEVVFPDGR--RVPPVAEVVYFDPDDYDLALLKVD------------   65 (120)
T ss_dssp             EEEEEEETTTEEEEEHHHHTCCTT--G--TCSEEEEEETTSC--EEETEEEEEEEETT-TTEEEEEES------------
T ss_pred             CEEEEEcCCceEEEchhheeccccccc-CCCCEEEEEecCCC--EEeeeEEEEEECCccccEEEEEEe------------
Confidence            899999997799999999996432110 12334444433344  345  899999998 999999997            


Q ss_pred             CCCCCCCCCEEEEEEcCCCCCCCeEeeEEecccccccCCCCccccceEEEeeccCCCCcccceecCCCeEEEE
Q 021321          199 TSHDLRVGQSCFAIGNPYGFEDTLTTGVVSGLGREIPSPNGRAIRGAIQTDAAINSGNSGGPLMNSFGHVIGV  271 (314)
Q Consensus       199 ~~~~~~~G~~v~~iG~p~~~~~~~~~G~vs~~~~~~~~~~~~~~~~~i~~~~~~~~G~SGGPl~n~~G~vvGI  271 (314)
                               .....+...     ...+...........  ... ...+ +++.+.+|+|||||||.+|+||||
T Consensus        66 ---------~~~~~~~~~-----~~~~~~~~~~~~~~~--~~~-~~~~-~~~~~~~G~SGgpv~~~~G~vvGi  120 (120)
T PF13365_consen   66 ---------PWTGVGGGV-----RVPGSTSGVSPTSTN--DNR-MLYI-TDADTRPGSSGGPVFDSDGRVVGI  120 (120)
T ss_dssp             ---------CEEEEEEEE-----EEEEEEEEEEEEEEE--ETE-EEEE-ESSS-STTTTTSEEEETTSEEEEE
T ss_pred             ---------cccceeeee-----EeeeeccccccccCc--ccc-eeEe-eecccCCCcEeHhEECCCCEEEeC
Confidence                     000000000     000000000000000  000 0113 799999999999999999999997


No 9  
>KOG1421 consensus Predicted signaling-associated protein (contains a PDZ domain) [General function prediction only]
Probab=99.62  E-value=3.8e-15  Score=142.04  Aligned_cols=209  Identities=25%  Similarity=0.301  Sum_probs=161.4

Q ss_pred             hHHHHHHHHhCCceEEEEeeeeecCCCCCccchhhccccCCcccceEEEEEEcC-CCEEEeccccccCCCcCCCCcceEE
Q 021321           76 DRVVQLFQETSPSVVSIQDLELSKNPKSTSSELMLVDGEYAKVEGTGSGFVWDK-FGHIVTNYHVVAKLATDTSGLHRCK  154 (314)
Q Consensus        76 ~~~~~~~~~~~~svV~I~~~~~~~~~~~~~~~~~~~~~~~~~~~~~GsGfiI~~-~g~VLT~aHvv~~~~~~~~~~~~~~  154 (314)
                      .++...+.++-+|||.|+......            ++......+.+|||++++ .||+|||+|++..      +.-...
T Consensus        52 e~w~~~ia~VvksvVsI~~S~v~~------------fdtesag~~~atgfvvd~~~gyiLtnrhvv~p------gP~va~  113 (955)
T KOG1421|consen   52 EDWRNTIANVVKSVVSIRFSAVRA------------FDTESAGESEATGFVVDKKLGYILTNRHVVAP------GPFVAS  113 (955)
T ss_pred             hhhhhhhhhhcccEEEEEehheee------------cccccccccceeEEEEecccceEEEeccccCC------CCceeE
Confidence            378889999999999999765321            222345567899999996 4899999999985      445556


Q ss_pred             EEEecCCCCeEEEEEEEEEeCCCCcEEEEEEeeCCC---ccceeecCCCCCCCCCCEEEEEEcCCCCCCCeEeeEEeccc
Q 021321          155 VSLFDAKGNGFYREGKMVGCDPAYDLAVLKVDVEGF---ELKPVVLGTSHDLRVGQSCFAIGNPYGFEDTLTTGVVSGLG  231 (314)
Q Consensus       155 v~~~~~~g~~~~~~a~v~~~d~~~DlAlL~v~~~~~---~~~~~~l~~~~~~~~G~~v~~iG~p~~~~~~~~~G~vs~~~  231 (314)
                      +.+.+..    ..+...++.|+.+|+.+++.++...   .+..+.+ ..+..++|.+++++|+..+...++..|.++.+.
T Consensus       114 avf~n~e----e~ei~pvyrDpVhdfGf~r~dps~ir~s~vt~i~l-ap~~akvgseirvvgNDagEklsIlagflSrld  188 (955)
T KOG1421|consen  114 AVFDNHE----EIEIYPVYRDPVHDFGFFRYDPSTIRFSIVTEICL-APELAKVGSEIRVVGNDAGEKLSILAGFLSRLD  188 (955)
T ss_pred             EEecccc----cCCcccccCCchhhcceeecChhhcceeeeecccc-CccccccCCceEEecCCccceEEeehhhhhhcc
Confidence            6665543    4556677889999999999986532   2445556 335668999999999988888888889999888


Q ss_pred             ccccCCCCcccc----ceEEEeeccCCCCcccceecCCCeEEEEEcccccCCCCCCccceEEEEehHHHHHHHHHHHHcC
Q 021321          232 REIPSPNGRAIR----GAIQTDAAINSGNSGGPLMNSFGHVIGVNTATFTRKGTGLSSGVNFAIPIDTVVRTVPYLIVYG  307 (314)
Q Consensus       232 ~~~~~~~~~~~~----~~i~~~~~~~~G~SGGPl~n~~G~vvGI~s~~~~~~~~~~~~~~~~aipi~~i~~~l~~l~~~~  307 (314)
                      +....+.+..+.    .++|..+....|.||+|++|.+|..|.++..+...      .+..|++|++.+++.|.=+..+.
T Consensus       189 r~apdyg~~~yndfnTfy~QaasstsggssgspVv~i~gyAVAl~agg~~s------sas~ffLpLdrV~RaL~clq~n~  262 (955)
T KOG1421|consen  189 RNAPDYGEDTYNDFNTFYIQAASSTSGGSSGSPVVDIPGYAVALNAGGSIS------SASDFFLPLDRVVRALRCLQNNT  262 (955)
T ss_pred             CCCccccccccccccceeeeehhcCCCCCCCCceecccceEEeeecCCccc------ccccceeeccchhhhhhhhhcCC
Confidence            866554433222    35677777889999999999999999999887654      34569999999999999998888


Q ss_pred             ccCCCC
Q 021321          308 TPYSNR  313 (314)
Q Consensus       308 ~~~~~~  313 (314)
                      .++||-
T Consensus       263 PItRGt  268 (955)
T KOG1421|consen  263 PITRGT  268 (955)
T ss_pred             Ccccce
Confidence            888874


No 10 
>PF00089 Trypsin:  Trypsin;  InterPro: IPR001254 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:  Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases.   In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.  Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. 
Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases. Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC). This group of serine proteases belongs to the MEROPS peptidase family S1 (chymotrypsin family, clan PA(S)) and to peptidase family S6 (Hap serine peptidases). The chymotrypsin family is almost totally confined to animals, although trypsin-like enzymes are found in actinomycetes of the genera Streptomyces and Saccharopolyspora, and in the fungus Fusarium oxysporum. The enzymes are inherently secreted, being synthesised with a signal peptide that targets them to the secretory pathway. Animal enzymes are either secreted directly, packaged into vesicles for regulated secretion, or are retained in leukocyte granules. The Hap family, 'Haemophilus adhesion and penetration', are proteins that play a role in the interaction with human epithelial cells. The serine protease activity is localized at the N-terminal domain, whereas the binding domain is in the C-terminal region. 
; GO: 0004252 serine-type endopeptidase activity, 0006508 proteolysis; PDB: 1SPJ_A 1A5I_A 2ZGH_A 2ZKS_A 2ZGJ_A 2ZGC_A 2ODP_A 2I6Q_A 2I6S_A 2ODQ_A ....
Probab=99.61  E-value=7.7e-14  Score=120.49  Aligned_cols=170  Identities=21%  Similarity=0.254  Sum_probs=110.1

Q ss_pred             cceEEEEEEcCCCEEEeccccccCCCcCCCCcceEEEEEec-----CCCCeEEEEEEEEEeC-------CCCcEEEEEEe
Q 021321          119 EGTGSGFVWDKFGHIVTNYHVVAKLATDTSGLHRCKVSLFD-----AKGNGFYREGKMVGCD-------PAYDLAVLKVD  186 (314)
Q Consensus       119 ~~~GsGfiI~~~g~VLT~aHvv~~~~~~~~~~~~~~v~~~~-----~~g~~~~~~a~v~~~d-------~~~DlAlL~v~  186 (314)
                      ...|+|++|++ .+|||++||+..       ...+.+.+..     ..+....+...-+..+       ..+|+|||+++
T Consensus        24 ~~~C~G~li~~-~~vLTaahC~~~-------~~~~~v~~g~~~~~~~~~~~~~~~v~~~~~h~~~~~~~~~~DiAll~L~   95 (220)
T PF00089_consen   24 RFFCTGTLISP-RWVLTAAHCVDG-------ASDIKVRLGTYSIRNSDGSEQTIKVSKIIIHPKYDPSTYDNDIALLKLD   95 (220)
T ss_dssp             EEEEEEEEEET-TEEEEEGGGHTS-------GGSEEEEESESBTTSTTTTSEEEEEEEEEEETTSBTTTTTTSEEEEEES
T ss_pred             CeeEeEEeccc-cccccccccccc-------ccccccccccccccccccccccccccccccccccccccccccccccccc
Confidence            67899999997 799999999994       3456665543     1221123333333222       25799999998


Q ss_pred             eC---CCccceeecCCC-CCCCCCCEEEEEEcCCCCCC----CeEeeEEecccc---cccCCCCccccceEEEee----c
Q 021321          187 VE---GFELKPVVLGTS-HDLRVGQSCFAIGNPYGFED----TLTTGVVSGLGR---EIPSPNGRAIRGAIQTDA----A  251 (314)
Q Consensus       187 ~~---~~~~~~~~l~~~-~~~~~G~~v~~iG~p~~~~~----~~~~G~vs~~~~---~~~~~~~~~~~~~i~~~~----~  251 (314)
                      .+   ...+.++.+... ..+..|+.+.++||+.....    ......+..+..   ... .........++...    .
T Consensus        96 ~~~~~~~~~~~~~l~~~~~~~~~~~~~~~~G~~~~~~~~~~~~~~~~~~~~~~~~~c~~~-~~~~~~~~~~c~~~~~~~~  174 (220)
T PF00089_consen   96 RPITFGDNIQPICLPSAGSDPNVGTSCIVVGWGRTSDNGYSSNLQSVTVPVVSRKTCRSS-YNDNLTPNMICAGSSGSGD  174 (220)
T ss_dssp             SSSEHBSSBEESBBTSTTHTTTTTSEEEEEESSBSSTTSBTSBEEEEEEEEEEHHHHHHH-TTTTSTTTEEEEETTSSSB
T ss_pred             cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc-ccccccccccccccccccc
Confidence            76   334677887552 34588999999999975332    233333332211   111 11112245677765    7


Q ss_pred             cCCCCcccceecCCCeEEEEEcccccCCCCCCccceEEEEehHHHHHHH
Q 021321          252 INSGNSGGPLMNSFGHVIGVNTATFTRKGTGLSSGVNFAIPIDTVVRTV  300 (314)
Q Consensus       252 ~~~G~SGGPl~n~~G~vvGI~s~~~~~~~~~~~~~~~~aipi~~i~~~l  300 (314)
                      .+.|+|||||++.++.|+||++.+.. +..  .....+.+++..+.+||
T Consensus       175 ~~~g~sG~pl~~~~~~lvGI~s~~~~-c~~--~~~~~v~~~v~~~~~WI  220 (220)
T PF00089_consen  175 ACQGDSGGPLICNNNYLVGIVSFGEN-CGS--PNYPGVYTRVSSYLDWI  220 (220)
T ss_dssp             GGTTTTTSEEEETTEEEEEEEEEESS-SSB--TTSEEEEEEGGGGHHHH
T ss_pred             ccccccccccccceeeecceeeecCC-CCC--CCcCEEEEEHHHhhccC
Confidence            89999999999876679999998832 221  22357889999998886


No 11 
>cd00190 Tryp_SPc Trypsin-like serine protease; Many of these are synthesized as inactive precursor zymogens that are cleaved during limited proteolysis to generate their active forms. Alignment contains also inactive enzymes that have substitutions of the catalytic triad residues.
Probab=99.53  E-value=1.1e-12  Score=113.96  Aligned_cols=176  Identities=18%  Similarity=0.150  Sum_probs=104.7

Q ss_pred             ccceEEEEEEcCCCEEEeccccccCCCcCCCCcceEEEEEecCCCC-----eEEEEEEEEEeC-------CCCcEEEEEE
Q 021321          118 VEGTGSGFVWDKFGHIVTNYHVVAKLATDTSGLHRCKVSLFDAKGN-----GFYREGKMVGCD-------PAYDLAVLKV  185 (314)
Q Consensus       118 ~~~~GsGfiI~~~g~VLT~aHvv~~~~~~~~~~~~~~v~~~~~~g~-----~~~~~a~v~~~d-------~~~DlAlL~v  185 (314)
                      ....|+|++|++ .+|||+|||+...     ....+.|.+......     ...+...-+..+       ..+|||||++
T Consensus        23 ~~~~C~GtlIs~-~~VLTaAhC~~~~-----~~~~~~v~~g~~~~~~~~~~~~~~~v~~~~~hp~y~~~~~~~DiAll~L   96 (232)
T cd00190          23 GRHFCGGSLISP-RWVLTAAHCVYSS-----APSNYTVRLGSHDLSSNEGGGQVIKVKKVIVHPNYNPSTYDNDIALLKL   96 (232)
T ss_pred             CcEEEEEEEeeC-CEEEECHHhcCCC-----CCccEEEEeCcccccCCCCceEEEEEEEEEECCCCCCCCCcCCEEEEEE
Confidence            356899999997 8999999999853     134566665432211     112223333333       3589999999


Q ss_pred             eeCC---CccceeecCCCC-CCCCCCEEEEEEcCCCCCC-----CeEeeEEeccc---ccccCCC-CccccceEEE----
Q 021321          186 DVEG---FELKPVVLGTSH-DLRVGQSCFAIGNPYGFED-----TLTTGVVSGLG---REIPSPN-GRAIRGAIQT----  248 (314)
Q Consensus       186 ~~~~---~~~~~~~l~~~~-~~~~G~~v~~iG~p~~~~~-----~~~~G~vs~~~---~~~~~~~-~~~~~~~i~~----  248 (314)
                      +.+-   ..+.|+.|.... .+..|+.++++||+.....     ......+..+.   +...... .......++.    
T Consensus        97 ~~~~~~~~~v~picl~~~~~~~~~~~~~~~~G~g~~~~~~~~~~~~~~~~~~~~~~~~C~~~~~~~~~~~~~~~C~~~~~  176 (232)
T cd00190          97 KRPVTLSDNVRPICLPSSGYNLPAGTTCTVSGWGRTSEGGPLPDVLQEVNVPIVSNAECKRAYSYGGTITDNMLCAGGLE  176 (232)
T ss_pred             CCcccCCCcccceECCCccccCCCCCEEEEEeCCcCCCCCCCCceeeEEEeeeECHHHhhhhccCcccCCCceEeeCCCC
Confidence            8652   236788885543 6778899999999765322     12222222111   1100000 0011223433    


Q ss_pred             -eeccCCCCcccceecCC---CeEEEEEcccccCCCCCCccceEEEEehHHHHHHHHH
Q 021321          249 -DAAINSGNSGGPLMNSF---GHVIGVNTATFTRKGTGLSSGVNFAIPIDTVVRTVPY  302 (314)
Q Consensus       249 -~~~~~~G~SGGPl~n~~---G~vvGI~s~~~~~~~~~~~~~~~~aipi~~i~~~l~~  302 (314)
                       +...|.|+|||||+...   +.++||.+++.. ++.  .........+....+||++
T Consensus       177 ~~~~~c~gdsGgpl~~~~~~~~~lvGI~s~g~~-c~~--~~~~~~~t~v~~~~~WI~~  231 (232)
T cd00190         177 GGKDACQGDSGGPLVCNDNGRGVLVGIVSWGSG-CAR--PNYPGVYTRVSSYLDWIQK  231 (232)
T ss_pred             CCCccccCCCCCcEEEEeCCEEEEEEEEehhhc-cCC--CCCCCEEEEcHHhhHHhhc
Confidence             23478999999999764   789999999864 321  1223355667888888764


No 12 
>smart00020 Tryp_SPc Trypsin-like serine protease. Many of these are synthesised as inactive precursor zymogens that are cleaved during limited proteolysis to generate their active forms. A few, however, are active as single chain molecules, and others are inactive due to substitutions of the catalytic triad residues.
Probab=99.40  E-value=1.6e-11  Score=106.90  Aligned_cols=172  Identities=18%  Similarity=0.167  Sum_probs=99.0

Q ss_pred             ccceEEEEEEcCCCEEEeccccccCCCcCCCCcceEEEEEecCCCCe----EEEEEEEEEeC-------CCCcEEEEEEe
Q 021321          118 VEGTGSGFVWDKFGHIVTNYHVVAKLATDTSGLHRCKVSLFDAKGNG----FYREGKMVGCD-------PAYDLAVLKVD  186 (314)
Q Consensus       118 ~~~~GsGfiI~~~g~VLT~aHvv~~~~~~~~~~~~~~v~~~~~~g~~----~~~~a~v~~~d-------~~~DlAlL~v~  186 (314)
                      ....|+|++|++ .+|||+|||+....     ...+.|.+.......    ......-+..+       ..+|||||+++
T Consensus        24 ~~~~C~GtlIs~-~~VLTaahC~~~~~-----~~~~~v~~g~~~~~~~~~~~~~~v~~~~~~p~~~~~~~~~DiAll~L~   97 (229)
T smart00020       24 GRHFCGGSLISP-RWVLTAAHCVYGSD-----PSNIRVRLGSHDLSSGEEGQVIKVSKVIIHPNYNPSTYDNDIALLKLK   97 (229)
T ss_pred             CCcEEEEEEecC-CEEEECHHHcCCCC-----CcceEEEeCcccCCCCCCceEEeeEEEEECCCCCCCCCcCCEEEEEEC
Confidence            356899999997 89999999998531     245677765433211    12233333322       46899999998


Q ss_pred             eC---CCccceeecCCC-CCCCCCCEEEEEEcCCCCC------CCeEeeEEecccc---cccCCCC-ccccceEEE----
Q 021321          187 VE---GFELKPVVLGTS-HDLRVGQSCFAIGNPYGFE------DTLTTGVVSGLGR---EIPSPNG-RAIRGAIQT----  248 (314)
Q Consensus       187 ~~---~~~~~~~~l~~~-~~~~~G~~v~~iG~p~~~~------~~~~~G~vs~~~~---~~~~~~~-~~~~~~i~~----  248 (314)
                      .+   ...+.++.|... ..+..++.+.+.||+....      .......+..+..   ....... ......++.    
T Consensus        98 ~~i~~~~~~~pi~l~~~~~~~~~~~~~~~~g~g~~~~~~~~~~~~~~~~~~~~~~~~~C~~~~~~~~~~~~~~~C~~~~~  177 (229)
T smart00020       98 SPVTLSDNVRPICLPSSNYNVPAGTTCTVSGWGRTSEGAGSLPDTLQEVNVPIVSNATCRRAYSGGGAITDNMLCAGGLE  177 (229)
T ss_pred             cccCCCCceeeccCCCcccccCCCCEEEEEeCCCCCCCCCcCCCEeeEEEEEEeCHHHhhhhhccccccCCCcEeecCCC
Confidence            65   223677777543 3567789999999986542      1111222221111   1000000 001123333    


Q ss_pred             -eeccCCCCcccceecCCC--eEEEEEcccccCCCCCCccceEEEEehHHHHH
Q 021321          249 -DAAINSGNSGGPLMNSFG--HVIGVNTATFTRKGTGLSSGVNFAIPIDTVVR  298 (314)
Q Consensus       249 -~~~~~~G~SGGPl~n~~G--~vvGI~s~~~~~~~~~~~~~~~~aipi~~i~~  298 (314)
                       ....|.|+|||||+...+  .++||++++. .++.  .........+....+
T Consensus       178 ~~~~~c~gdsG~pl~~~~~~~~l~Gi~s~g~-~C~~--~~~~~~~~~i~~~~~  227 (229)
T smart00020      178 GGKDACQGDSGGPLVCNDGRWVLVGIVSWGS-GCAR--PGKPGVYTRVSSYLD  227 (229)
T ss_pred             CCCcccCCCCCCeeEEECCCEEEEEEEEECC-CCCC--CCCCCEEEEeccccc
Confidence             345789999999997543  8999999986 3321  122334455554443


No 13 
>COG3591 V8-like Glu-specific endopeptidase [Amino acid transport and metabolism]
Probab=99.15  E-value=1.9e-09  Score=94.18  Aligned_cols=170  Identities=18%  Similarity=0.127  Sum_probs=97.1

Q ss_pred             cccceEEEEEEcCCCEEEeccccccCCCcCCCCcceEEEEE--ecCCCC-eEEEEEEEEE-eCC---CCcEEEEEEeeCC
Q 021321          117 KVEGTGSGFVWDKFGHIVTNYHVVAKLATDTSGLHRCKVSL--FDAKGN-GFYREGKMVG-CDP---AYDLAVLKVDVEG  189 (314)
Q Consensus       117 ~~~~~GsGfiI~~~g~VLT~aHvv~~~~~~~~~~~~~~v~~--~~~~g~-~~~~~a~v~~-~d~---~~DlAlL~v~~~~  189 (314)
                      .++..+++|+|++ ..+||++||+.....   +...+.+..  ...++. .+.+...... ...   ..|.+...+....
T Consensus        61 tG~~~~~~~lI~p-ntvLTa~Hc~~s~~~---G~~~~~~~p~g~~~~~~~~~~~~~~~~~~~~g~~~~~d~~~~~v~~~~  136 (251)
T COG3591          61 TGRLCTAATLIGP-NTVLTAGHCIYSPDY---GEDDIAAAPPGVNSDGGPFYGITKIEIRVYPGELYKEDGASYDVGEAA  136 (251)
T ss_pred             CCcceeeEEEEcC-ceEEEeeeEEecCCC---ChhhhhhcCCcccCCCCCCCceeeEEEEecCCceeccCCceeeccHHH
Confidence            3444667799998 899999999985432   112222211  111111 1111111111 112   3455555553211


Q ss_pred             C--------ccceeecCCCCCCCCCCEEEEEEcCCCCCCCeE----eeEEecccccccCCCCccccceEEEeeccCCCCc
Q 021321          190 F--------ELKPVVLGTSHDLRVGQSCFAIGNPYGFEDTLT----TGVVSGLGREIPSPNGRAIRGAIQTDAAINSGNS  257 (314)
Q Consensus       190 ~--------~~~~~~l~~~~~~~~G~~v~~iG~p~~~~~~~~----~G~vs~~~~~~~~~~~~~~~~~i~~~~~~~~G~S  257 (314)
                      .        ......+......+.++.+.++|||.......+    .+.+..+.           ...+.+++.+++|+|
T Consensus       137 ~~~g~~~~~~~~~~~~~~~~~~~~~d~i~v~GYP~dk~~~~~~~e~t~~v~~~~-----------~~~l~y~~dT~pG~S  205 (251)
T COG3591         137 LESGINIGDVVNYLKRNTASEAKANDRITVIGYPGDKPNIGTMWESTGKVNSIK-----------GNKLFYDADTLPGSS  205 (251)
T ss_pred             hccCCCccccccccccccccccccCceeEEEeccCCCCcceeEeeecceeEEEe-----------cceEEEEecccCCCC
Confidence            1        122223333456678899999999987653322    22222211           135888999999999


Q ss_pred             ccceecCCCeEEEEEcccccCCCCCCccceE-EEEehHHHHHHHHHHH
Q 021321          258 GGPLMNSFGHVIGVNTATFTRKGTGLSSGVN-FAIPIDTVVRTVPYLI  304 (314)
Q Consensus       258 GGPl~n~~G~vvGI~s~~~~~~~~~~~~~~~-~aipi~~i~~~l~~l~  304 (314)
                      |+|+++.+.++||+++.+....+.   ...+ .+.-...++++|+++.
T Consensus       206 GSpv~~~~~~vigv~~~g~~~~~~---~~~n~~vr~t~~~~~~I~~~~  250 (251)
T COG3591         206 GSPVLISKDEVIGVHYNGPGANGG---SLANNAVRLTPEILNFIQQNI  250 (251)
T ss_pred             CCceEecCceEEEEEecCCCcccc---cccCcceEecHHHHHHHHHhh
Confidence            999999988999999988664432   2223 3344567788887764


No 14 
>KOG3627 consensus Trypsin [Amino acid transport and metabolism]
Probab=98.80  E-value=5.6e-07  Score=80.00  Aligned_cols=175  Identities=19%  Similarity=0.167  Sum_probs=98.3

Q ss_pred             ceEEEEEEcCCCEEEeccccccCCCcCCCCcceEEEEEecC-------CC---CeEEEEEEEEEeC-------CC-CcEE
Q 021321          120 GTGSGFVWDKFGHIVTNYHVVAKLATDTSGLHRCKVSLFDA-------KG---NGFYREGKMVGCD-------PA-YDLA  181 (314)
Q Consensus       120 ~~GsGfiI~~~g~VLT~aHvv~~~~~~~~~~~~~~v~~~~~-------~g---~~~~~~a~v~~~d-------~~-~DlA  181 (314)
                      ..+.|.+|++ .+|||++||+.+..    .. .+.|.+...       .+   ....+. +++ .+       .. +|||
T Consensus        38 ~~Cggsli~~-~~vltaaHC~~~~~----~~-~~~V~~G~~~~~~~~~~~~~~~~~~v~-~~i-~H~~y~~~~~~~nDia  109 (256)
T KOG3627|consen   38 HLCGGSLISP-RWVLTAAHCVKGAS----AS-LYTVRLGEHDINLSVSEGEEQLVGDVE-KII-VHPNYNPRTLENNDIA  109 (256)
T ss_pred             eeeeeEEeeC-CEEEEChhhCCCCC----Cc-ceEEEECccccccccccCchhhhceee-EEE-ECCCCCCCCCCCCCEE
Confidence            3788888865 79999999998531    00 455555311       01   111122 232 22       13 8999


Q ss_pred             EEEEeeC---CCccceeecCCCCC---CCCCCEEEEEEcCCCCC------CCeEeeEEeccc---ccccCCCC-ccccce
Q 021321          182 VLKVDVE---GFELKPVVLGTSHD---LRVGQSCFAIGNPYGFE------DTLTTGVVSGLG---REIPSPNG-RAIRGA  245 (314)
Q Consensus       182 lL~v~~~---~~~~~~~~l~~~~~---~~~G~~v~~iG~p~~~~------~~~~~G~vs~~~---~~~~~~~~-~~~~~~  245 (314)
                      ||+++.+   ...+.++.|.....   ...+..+++.||+....      .......+.-+.   +....... ......
T Consensus       110 ll~l~~~v~~~~~i~piclp~~~~~~~~~~~~~~~v~GWG~~~~~~~~~~~~L~~~~v~i~~~~~C~~~~~~~~~~~~~~  189 (256)
T KOG3627|consen  110 LLRLSEPVTFSSHIQPICLPSSADPYFPPGGTTCLVSGWGRTESGGGPLPDTLQEVDVPIISNSECRRAYGGLGTITDTM  189 (256)
T ss_pred             EEEECCCcccCCcccccCCCCCcccCCCCCCCEEEEEeCCCcCCCCCCCCceeEEEEEeEcChhHhcccccCccccCCCE
Confidence            9999865   23466777743332   34458899999975321      112222222221   11111100 001123


Q ss_pred             EEEe-----eccCCCCcccceecCC---CeEEEEEcccccCCCCCCccceEEEEehHHHHHHHHHHH
Q 021321          246 IQTD-----AAINSGNSGGPLMNSF---GHVIGVNTATFTRKGTGLSSGVNFAIPIDTVVRTVPYLI  304 (314)
Q Consensus       246 i~~~-----~~~~~G~SGGPl~n~~---G~vvGI~s~~~~~~~~~~~~~~~~aipi~~i~~~l~~l~  304 (314)
                      ++..     ...|.|+|||||+..+   ..++||++++...++.....+.  ...+....+|+++.+
T Consensus       190 ~Ca~~~~~~~~~C~GDSGGPLv~~~~~~~~~~GivS~G~~~C~~~~~P~v--yt~V~~y~~WI~~~~  254 (256)
T KOG3627|consen  190 LCAGGPEGGKDACQGDSGGPLVCEDNGRWVLVGIVSWGSGGCGQPNYPGV--YTRVSSYLDWIKENI  254 (256)
T ss_pred             EeeCccCCCCccccCCCCCeEEEeeCCcEEEEEEEEecCCCCCCCCCCeE--EeEhHHhHHHHHHHh
Confidence            5554     2368999999999664   5999999999765433222333  566777888887754


No 15 
>PF00863 Peptidase_C4:  Peptidase family C4 This family belongs to family C4 of the peptidase classification.;  InterPro: IPR001730 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:  Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases.   In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.  Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. 
These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad.  Nuclear inclusion A (NIA) proteases from potyviruses are cysteine peptidases belonging to the MEROPS peptidase family C4 (NIa protease family, clan PA(C)).  Potyviruses include plant viruses in which the single-stranded RNA encodes a polyprotein with NIA protease activity, where proteolytic cleavage is specific for Gln+Gly sites. The NIA protease acts on the polyprotein, releasing itself by Gln+Gly cleavage at both the N- and C-termini. It further processes the polyprotein by cleavage at five similar sites in the C-terminal half of the sequence. In addition to its C-terminal protease activity, the NIA protease contains an N-terminal domain that has been implicated in the transcription process. This peptidase is present in the nuclear inclusion protein of potyviruses.; GO: 0008234 cysteine-type peptidase activity, 0006508 proteolysis; PDB: 3MMG_B 1Q31_B 1LVB_A 1LVM_A.
Probab=98.79  E-value=1.8e-07  Score=81.27  Aligned_cols=167  Identities=15%  Similarity=0.204  Sum_probs=85.1

Q ss_pred             HHhCCceEEEEeeeeecCCCCCccchhhccccCCcccceEEEEEEcCCCEEEeccccccCCCcCCCCcceEEEEEecCCC
Q 021321           83 QETSPSVVSIQDLELSKNPKSTSSELMLVDGEYAKVEGTGSGFVWDKFGHIVTNYHVVAKLATDTSGLHRCKVSLFDAKG  162 (314)
Q Consensus        83 ~~~~~svV~I~~~~~~~~~~~~~~~~~~~~~~~~~~~~~GsGfiI~~~g~VLT~aHvv~~~~~~~~~~~~~~v~~~~~~g  162 (314)
                      .-+...|++|....                   ......=-|+..+  .+|+|++|.++..      ...++|..  ..|
T Consensus        14 n~Ia~~ic~l~n~s-------------------~~~~~~l~gigyG--~~iItn~HLf~~n------ng~L~i~s--~hG   64 (235)
T PF00863_consen   14 NPIASNICRLTNES-------------------DGGTRSLYGIGYG--SYIITNAHLFKRN------NGELTIKS--QHG   64 (235)
T ss_dssp             HHHHTTEEEEEEEE-------------------TTEEEEEEEEEET--TEEEEEGGGGSST------TCEEEEEE--TTE
T ss_pred             chhhheEEEEEEEe-------------------CCCeEEEEEEeEC--CEEEEChhhhccC------CCeEEEEe--Cce
Confidence            34566788887432                   2333455667776  5999999999753      23455554  333


Q ss_pred             CeEEEE---EEEEEeCCCCcEEEEEEeeCCCccceeec-CCCCCCCCCCEEEEEEcCCCCCCCeEeeEEecccccccCCC
Q 021321          163 NGFYRE---GKMVGCDPAYDLAVLKVDVEGFELKPVVL-GTSHDLRVGQSCFAIGNPYGFEDTLTTGVVSGLGREIPSPN  238 (314)
Q Consensus       163 ~~~~~~---a~v~~~d~~~DlAlL~v~~~~~~~~~~~l-~~~~~~~~G~~v~~iG~p~~~~~~~~~G~vs~~~~~~~~~~  238 (314)
                      .- .+.   .--+..-+..||.++|+..   +++|.+- .....+..+++|+++|.-+.....  .-.|+.........+
T Consensus        65 ~f-~v~nt~~lkv~~i~~~DiviirmPk---DfpPf~~kl~FR~P~~~e~v~mVg~~fq~k~~--~s~vSesS~i~p~~~  138 (235)
T PF00863_consen   65 EF-TVPNTTQLKVHPIEGRDIVIIRMPK---DFPPFPQKLKFRAPKEGERVCMVGSNFQEKSI--SSTVSESSWIYPEEN  138 (235)
T ss_dssp             EE-EECEGGGSEEEE-TCSSEEEEE--T---TS----S---B----TT-EEEEEEEECSSCCC--EEEEEEEEEEEEETT
T ss_pred             EE-EcCCccccceEEeCCccEEEEeCCc---ccCCcchhhhccCCCCCCEEEEEEEEEEcCCe--eEEECCceEEeecCC
Confidence            21 111   1122344789999999963   4555431 133678899999999975443222  123332222111112


Q ss_pred             CccccceEEEeeccCCCCcccceecC-CCeEEEEEcccccCCCCCCccceEEEEehH
Q 021321          239 GRAIRGAIQTDAAINSGNSGGPLMNS-FGHVIGVNTATFTRKGTGLSSGVNFAIPID  294 (314)
Q Consensus       239 ~~~~~~~i~~~~~~~~G~SGGPl~n~-~G~vvGI~s~~~~~~~~~~~~~~~~aipi~  294 (314)
                          ..+-.+-..+..|+-|+||++. ||.+|||++.....      ...+|+.|+.
T Consensus       139 ----~~fWkHwIsTk~G~CG~PlVs~~Dg~IVGiHsl~~~~------~~~N~F~~f~  185 (235)
T PF00863_consen  139 ----SHFWKHWISTKDGDCGLPLVSTKDGKIVGIHSLTSNT------SSRNYFTPFP  185 (235)
T ss_dssp             ----TTEEEE-C---TT-TT-EEEETTT--EEEEEEEEETT------TSSEEEEE--
T ss_pred             ----CCeeEEEecCCCCccCCcEEEcCCCcEEEEEcCccCC------CCeEEEEcCC
Confidence                2345566667899999999986 99999999987653      3456777654


No 16 
>COG5640 Secreted trypsin-like serine protease [Posttranslational modification, protein turnover, chaperones]
Probab=98.58  E-value=4.7e-06  Score=75.50  Aligned_cols=54  Identities=22%  Similarity=0.266  Sum_probs=39.6

Q ss_pred             eccCCCCcccceecC--CCeE-EEEEcccccCCCCCCccceEEEEehHHHHHHHHHHHH
Q 021321          250 AAINSGNSGGPLMNS--FGHV-IGVNTATFTRKGTGLSSGVNFAIPIDTVVRTVPYLIV  305 (314)
Q Consensus       250 ~~~~~G~SGGPl~n~--~G~v-vGI~s~~~~~~~~~~~~~~~~aipi~~i~~~l~~l~~  305 (314)
                      ...|.|+||||+|-.  +|++ +||++|+.+.++.....+  ...-++....||++.++
T Consensus       223 ~daCqGDSGGPi~~~g~~G~vQ~GVvSwG~~~Cg~t~~~g--VyT~vsny~~WI~a~~~  279 (413)
T COG5640         223 KDACQGDSGGPIFHKGEEGRVQRGVVSWGDGGCGGTLIPG--VYTNVSNYQDWIAAMTN  279 (413)
T ss_pred             cccccCCCCCceEEeCCCccEEEeEEEecCCCCCCCCcce--eEEehhHHHHHHHHHhc
Confidence            357899999999954  5665 999999988765433334  44558889999888544


No 17 
>KOG1320 consensus Serine protease [Posttranslational modification, protein turnover, chaperones]
Probab=98.56  E-value=1.1e-07  Score=89.93  Aligned_cols=195  Identities=25%  Similarity=0.300  Sum_probs=128.6

Q ss_pred             HHHHhCCceEEEEeeeeecCCCCCccchhhccccCCcccceEEEEEEcCCCEEEeccccccCCCcCCCCcceEEEEEecC
Q 021321           81 LFQETSPSVVSIQDLELSKNPKSTSSELMLVDGEYAKVEGTGSGFVWDKFGHIVTNYHVVAKLATDTSGLHRCKVSLFDA  160 (314)
Q Consensus        81 ~~~~~~~svV~I~~~~~~~~~~~~~~~~~~~~~~~~~~~~~GsGfiI~~~g~VLT~aHvv~~~~~~~~~~~~~~v~~~~~  160 (314)
                      ..+....|++.+.+.....      .....|+... +....|+||.+.. ..++|++|++....      +...+.+. .
T Consensus        55 ~~~~~~~s~~~v~~~~~~~------~~~~pw~~~~-q~~~~~s~f~i~~-~~lltn~~~v~~~~------~~~~v~v~-~  119 (473)
T KOG1320|consen   55 VVDLALQSVVKVFSVSTEP------SSVLPWQRTR-QFSSGGSGFAIYG-KKLLTNAHVVAPNN------DHKFVTVK-K  119 (473)
T ss_pred             CccccccceeEEEeecccc------cccCcceeee-hhcccccchhhcc-cceeecCccccccc------cccccccc-c
Confidence            3445566777777654322      1111233222 6677899999986 78999999998432      22233332 3


Q ss_pred             CCCeEEEEEEEEEeCCCCcEEEEEEeeCCCc--cceeecCCCCCCCCCCEEEEEEcCCCCCCCeEeeEEecccccccCCC
Q 021321          161 KGNGFYREGKMVGCDPAYDLAVLKVDVEGFE--LKPVVLGTSHDLRVGQSCFAIGNPYGFEDTLTTGVVSGLGREIPSPN  238 (314)
Q Consensus       161 ~g~~~~~~a~v~~~d~~~DlAlL~v~~~~~~--~~~~~l~~~~~~~~G~~v~~iG~p~~~~~~~~~G~vs~~~~~~~~~~  238 (314)
                      .|....+.+++...-.+.|+|++.++.....  ..++.++  +-+...+.++++|   +....++.|.|.......+...
T Consensus       120 ~gs~~k~~~~v~~~~~~cd~Avv~Ie~~~f~~~~~~~e~~--~ip~l~~S~~Vv~---gd~i~VTnghV~~~~~~~y~~~  194 (473)
T KOG1320|consen  120 HGSPRKYKAFVAAVFEECDLAVVYIESEEFWKGMNPFELG--DIPSLNGSGFVVG---GDGIIVTNGHVVRVEPRIYAHS  194 (473)
T ss_pred             CCCchhhhhhHHHhhhcccceEEEEeeccccCCCcccccC--CCcccCccEEEEc---CCcEEEEeeEEEEEEeccccCC
Confidence            3444466788888888999999999865332  2334443  3345557899998   6667899999998766543322


Q ss_pred             CccccceEEEeeccCCCCcccceecCCCeEEEEEcccccCCCCCCccceEEEEehHHHHHHHH
Q 021321          239 GRAIRGAIQTDAAINSGNSGGPLMNSFGHVIGVNTATFTRKGTGLSSGVNFAIPIDTVVRTVP  301 (314)
Q Consensus       239 ~~~~~~~i~~~~~~~~G~SGGPl~n~~G~vvGI~s~~~~~~~~~~~~~~~~aipi~~i~~~l~  301 (314)
                      +. ....+++++.+.+|+||+|.+...+++.|+........     ..+.+.+|.-.+..++.
T Consensus       195 ~~-~l~~vqi~aa~~~~~s~ep~i~g~d~~~gvA~l~ik~~-----~~i~~~i~~~~~~~~~~  251 (473)
T KOG1320|consen  195 ST-VLLRVQIDAAIGPGNSGEPVIVGVDKVAGVAFLKIKTP-----ENILYVIPLGVSSHFRT  251 (473)
T ss_pred             Cc-ceeeEEEEEeecCCccCCCeEEccccccceEEEEEecC-----Ccccceeecceeeeecc
Confidence            22 23468999999999999999987689999999886432     13457777665555443


No 18 
>KOG1421 consensus Predicted signaling-associated protein (contains a PDZ domain) [General function prediction only]
Probab=98.42  E-value=3.5e-06  Score=81.72  Aligned_cols=203  Identities=15%  Similarity=0.128  Sum_probs=133.6

Q ss_pred             HHHhCCceEEEEeeeeecCCCCCccchhhccccCCcccceEEEEEEcC-CCEEEeccccccCCCcCCCCcceEEEEEecC
Q 021321           82 FQETSPSVVSIQDLELSKNPKSTSSELMLVDGEYAKVEGTGSGFVWDK-FGHIVTNYHVVAKLATDTSGLHRCKVSLFDA  160 (314)
Q Consensus        82 ~~~~~~svV~I~~~~~~~~~~~~~~~~~~~~~~~~~~~~~GsGfiI~~-~g~VLT~aHvv~~~~~~~~~~~~~~v~~~~~  160 (314)
                      .+++..+.|.+.......            -++.......|||.|++. .|++++.+.++.-      ...+.+|++.+.
T Consensus       524 ~~~i~~~~~~v~~~~~~~------------l~g~s~~i~kgt~~i~d~~~g~~vvsr~~vp~------d~~d~~vt~~dS  585 (955)
T KOG1421|consen  524 SADISNCLVDVEPMMPVN------------LDGVSSDIYKGTALIMDTSKGLGVVSRSVVPS------DAKDQRVTEADS  585 (955)
T ss_pred             hhHHhhhhhhheeceeec------------cccchhhhhcCceEEEEccCCceeEecccCCc------hhhceEEeeccc
Confidence            456666777776544322            111222456899999984 4999999999974      456778888776


Q ss_pred             CCCeEEEEEEEEEeCCCCcEEEEEEeeCCCccceeecCCCCCCCCCCEEEEEEcCCCCCCCeEeeEEecccc-cccCCCC
Q 021321          161 KGNGFYREGKMVGCDPAYDLAVLKVDVEGFELKPVVLGTSHDLRVGQSCFAIGNPYGFEDTLTTGVVSGLGR-EIPSPNG  239 (314)
Q Consensus       161 ~g~~~~~~a~v~~~d~~~DlAlL~v~~~~~~~~~~~l~~~~~~~~G~~v~~iG~p~~~~~~~~~G~vs~~~~-~~~~~~~  239 (314)
                      .    ...|.+...++...+|.+|.++.  ....++| ....+..||++...|+-.........-.+..+.. .+.....
T Consensus       586 ~----~i~a~~~fL~~t~n~a~~kydp~--~~~~~kl-~~~~v~~gD~~~f~g~~~~~r~ltaktsv~dvs~~~~ps~~~  658 (955)
T KOG1421|consen  586 D----GIPANVSFLHPTENVASFKYDPA--LEVQLKL-TDTTVLRGDECTFEGFTEDLRALTAKTSVTDVSVVIIPSSVM  658 (955)
T ss_pred             c----cccceeeEecCccceeEeccChh--Hhhhhcc-ceeeEecCCceeEecccccchhhcccceeeeeEEEEecCCCC
Confidence            5    56888999999999999999854  2345666 4467889999999999755432211111211110 0000000


Q ss_pred             ccc----cceEEEeeccCCCCcccceecCCCeEEEEEcccccCCCCCCccceEEEEehHHHHHHHHHHHHcCcc
Q 021321          240 RAI----RGAIQTDAAINSGNSGGPLMNSFGHVIGVNTATFTRKGTGLSSGVNFAIPIDTVVRTVPYLIVYGTP  309 (314)
Q Consensus       240 ~~~----~~~i~~~~~~~~G~SGGPl~n~~G~vvGI~s~~~~~~~~~~~~~~~~aipi~~i~~~l~~l~~~~~~  309 (314)
                      .++    .+.|.+++.+.-++--|-+.|.+|+|+|++-....+.-.+...-+-|.+.+.++++.|++|+..+.+
T Consensus       659 pr~r~~n~e~Is~~~nlsT~c~sg~ltdddg~vvalwl~~~ge~~~~kd~~y~~gl~~~~~l~vl~rlk~g~~~  732 (955)
T KOG1421|consen  659 PRFRATNLEVISFMDNLSTSCLSGRLTDDDGEVVALWLSVVGEDVGGKDYTYKYGLSMSYILPVLERLKLGPSA  732 (955)
T ss_pred             cceeecceEEEEEeccccccccceEEECCCCeEEEEEeeeeccccCCceeEEEeccchHHHHHHHHHHhcCCCC
Confidence            001    2345565554444445577888999999998887765545555677889999999999999766544


No 19 
>PF05579 Peptidase_S32:  Equine arteritis virus serine endopeptidase S32;  InterPro: IPR008760 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:  Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases.   In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.  Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes []. 
They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence []. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases []. Not withstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base []. The geometric orientations of the catalytic residues are similar between families, despite different protein folds []. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) [, ]. This group of serine peptidases belong to MEROPS peptidase family S32 (clan PA(S)). The type example is equine arteritis virus serine endopeptidase (equine arteritis virus), which is involved in processing of nidovirus polyproteins [].; GO: 0004252 serine-type endopeptidase activity, 0016032 viral reproduction, 0019082 viral protein processing; PDB: 3FAN_A 3FAO_A 1MBM_A.
Probab=98.12  E-value=1.7e-05  Score=69.22  Aligned_cols=117  Identities=22%  Similarity=0.301  Sum_probs=62.6

Q ss_pred             ceEEEEEEcCCCEEEeccccccCCCcCCCCcceEEEEEecCCCCeEEEEEEEEEeCCCCcEEEEEEeeCCCccceeecCC
Q 021321          120 GTGSGFVWDKFGHIVTNYHVVAKLATDTSGLHRCKVSLFDAKGNGFYREGKMVGCDPAYDLAVLKVDVEGFELKPVVLGT  199 (314)
Q Consensus       120 ~~GsGfiI~~~g~VLT~aHvv~~~~~~~~~~~~~~v~~~~~~g~~~~~~a~v~~~d~~~DlAlL~v~~~~~~~~~~~l~~  199 (314)
                      ++|+.|-++.+-.|+|+.||+.+        +..++....   .  .   +...++.+-|+|.-.++.-...+|.++++.
T Consensus       114 Gsggvft~~~~~vvvTAtHVlg~--------~~a~v~~~g---~--~---~~~tF~~~GDfA~~~~~~~~G~~P~~k~a~  177 (297)
T PF05579_consen  114 GSGGVFTIGGNTVVVTATHVLGG--------NTARVSGVG---T--R---RMLTFKKNGDFAEADITNWPGAAPKYKFAQ  177 (297)
T ss_dssp             EEEEEEECTTEEEEEEEHHHCBT--------TEEEEEETT---E--E---EEEEEEEETTEEEEEETTS-S---B--B-T
T ss_pred             cccceEEECCeEEEEEEEEEcCC--------CeEEEEecc---e--E---EEEEEeccCcEEEEECCCCCCCCCceeecC
Confidence            34444445544579999999973        333444321   1  1   334556778999999943333577777742


Q ss_pred             CCCCCCCCEEEEEEcCCCCCCCeEeeEEecccccccCCCCccccceEEEeeccCCCCcccceecCCCeEEEEEccccc
Q 021321          200 SHDLRVGQSCFAIGNPYGFEDTLTTGVVSGLGREIPSPNGRAIRGAIQTDAAINSGNSGGPLMNSFGHVIGVNTATFT  277 (314)
Q Consensus       200 ~~~~~~G~~v~~iG~p~~~~~~~~~G~vs~~~~~~~~~~~~~~~~~i~~~~~~~~G~SGGPl~n~~G~vvGI~s~~~~  277 (314)
                         -..|.--|.-      ...+..|.|..-             ..+++   ..+||||+|++..+|.+||+|+..-.
T Consensus       178 ---~~~GrAyW~t------~tGvE~G~ig~~-------------~~~~f---T~~GDSGSPVVt~dg~liGVHTGSn~  230 (297)
T PF05579_consen  178 ---NYTGRAYWLT------STGVEPGFIGGG-------------GAVCF---TGPGDSGSPVVTEDGDLIGVHTGSNK  230 (297)
T ss_dssp             ---T-SEEEEEEE------TTEEEEEEEETT-------------EEEES---S-GGCTT-EEEETTC-EEEEEEEEET
T ss_pred             ---CcccceEEEc------ccCcccceecCc-------------eEEEE---cCCCCCCCccCcCCCCEEEEEecCCC
Confidence               1233333322      122444554421             12333   35799999999999999999997643


No 20 
>PF03761 DUF316:  Domain of unknown function (DUF316) ;  InterPro: IPR005514 This is a family of uncharacterised proteins from Caenorhabditis elegans.
Probab=97.74  E-value=0.0029  Score=57.21  Aligned_cols=111  Identities=17%  Similarity=0.210  Sum_probs=67.2

Q ss_pred             CCCcEEEEEEeeC-CCccceeecCCCC-CCCCCCEEEEEEcCCCCCCCeEeeEEecccccccCCCCccccceEEEeeccC
Q 021321          176 PAYDLAVLKVDVE-GFELKPVVLGTSH-DLRVGQSCFAIGNPYGFEDTLTTGVVSGLGREIPSPNGRAIRGAIQTDAAIN  253 (314)
Q Consensus       176 ~~~DlAlL~v~~~-~~~~~~~~l~~~~-~~~~G~~v~~iG~p~~~~~~~~~G~vs~~~~~~~~~~~~~~~~~i~~~~~~~  253 (314)
                      ..++++||+++.+ .....++.|+++. ....|+.+.+.|+...  .......+.-.....       ....+......+
T Consensus       159 ~~~~~mIlEl~~~~~~~~~~~Cl~~~~~~~~~~~~~~~yg~~~~--~~~~~~~~~i~~~~~-------~~~~~~~~~~~~  229 (282)
T PF03761_consen  159 RPYSPMILELEEDFSKNVSPPCLADSSTNWEKGDEVDVYGFNST--GKLKHRKLKITNCTK-------CAYSICTKQYSC  229 (282)
T ss_pred             cccceEEEEEcccccccCCCEEeCCCccccccCceEEEeecCCC--CeEEEEEEEEEEeec-------cceeEecccccC
Confidence            4579999999865 2467788886643 4667899999888211  112222222111100       122355556678


Q ss_pred             CCCcccceecC-CC--eEEEEEcccccCCCCCCccceEEEEehHHHHHH
Q 021321          254 SGNSGGPLMNS-FG--HVIGVNTATFTRKGTGLSSGVNFAIPIDTVVRT  299 (314)
Q Consensus       254 ~G~SGGPl~n~-~G--~vvGI~s~~~~~~~~~~~~~~~~aipi~~i~~~  299 (314)
                      .|++|||++.. +|  .||||.+.......    ....+++.+..+++-
T Consensus       230 ~~d~Gg~lv~~~~gr~tlIGv~~~~~~~~~----~~~~~f~~v~~~~~~  274 (282)
T PF03761_consen  230 KGDRGGPLVKNINGRWTLIGVGASGNYECN----KNNSYFFNVSWYQDE  274 (282)
T ss_pred             CCCccCeEEEEECCCEEEEEEEccCCCccc----ccccEEEEHHHhhhh
Confidence            99999999832 44  58999987653321    124577777776654


No 21 
>PF10459 Peptidase_S46:  Peptidase S46;  InterPro: IPR019500 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:  Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases.   In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.  Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes []. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. 
Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence []. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases []. Not withstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base []. The geometric orientations of the catalytic residues are similar between families, despite different protein folds []. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) [, ]. This entry represents S46 peptidases, where dipeptidyl-peptidase 7 (DPP-7) is the best-characterised member of this family. It is a serine peptidase that is located on the cell surface and is predicted to have two N-terminal transmembrane domains. 
Probab=97.63  E-value=0.00033  Score=70.52  Aligned_cols=23  Identities=30%  Similarity=0.226  Sum_probs=20.8

Q ss_pred             ceEEEEEEcCCCEEEeccccccC
Q 021321          120 GTGSGFVWDKFGHIVTNYHVVAK  142 (314)
Q Consensus       120 ~~GsGfiI~~~g~VLT~aHvv~~  142 (314)
                      +-|||.||+++|+||||.||.-+
T Consensus        47 gGCSgsfVS~~GLvlTNHHC~~~   69 (698)
T PF10459_consen   47 GGCSGSFVSPDGLVLTNHHCGYG   69 (698)
T ss_pred             CceeEEEEcCCceEEecchhhhh
Confidence            46999999999999999999863


No 22 
>PF00548 Peptidase_C3:  3C cysteine protease (picornain 3C);  InterPro: IPR000199 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:  Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases.   In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.  Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. 
Cysteine proteases are divided into clans (proteins which are evolutionary related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad [].  This signature defines cysteine peptidases belong to MEROPS peptidase family C3 (picornain, clan PA(C)), subfamilies C3A and C3B. The protein fold of this peptidase domain for members of this family resembles that of the serine peptidase, chymotrypsin [], the type example for clan PA. Picornaviral proteins are expressed as a single polyprotein which is cleaved by the viral C3 cysteine protease. The poliovirus polyprotein is selectively cleaved between the Gln-|-Gly bond. In other picornavirus reactions Glu may be substituted for Gln, and Ser or Thr for Gly. ; GO: 0004197 cysteine-type endopeptidase activity, 0006508 proteolysis; PDB: 3SJO_E 2H6M_A 1QA7_C 1HAV_B 2HAL_A 2H9H_A 3QZQ_B 3QZR_A 3R0F_B 3SJ9_A ....
Probab=97.53  E-value=0.002  Score=54.00  Aligned_cols=140  Identities=18%  Similarity=0.231  Sum_probs=78.5

Q ss_pred             cccceEEEEEEcCCCEEEeccccccCCCcCCCCcceEEEEEecCCCCeEEEEEEEEEeCC---CCcEEEEEEeeCCCccc
Q 021321          117 KVEGTGSGFVWDKFGHIVTNYHVVAKLATDTSGLHRCKVSLFDAKGNGFYREGKMVGCDP---AYDLAVLKVDVEGFELK  193 (314)
Q Consensus       117 ~~~~~GsGfiI~~~g~VLT~aHvv~~~~~~~~~~~~~~v~~~~~~g~~~~~~a~v~~~d~---~~DlAlL~v~~~~~~~~  193 (314)
                      .....++++.|.. .++|...|.-.        ..  ++.+.   |..+.....+...+.   ..|+++++++.. .+++
T Consensus        22 ~g~~t~l~~gi~~-~~~lvp~H~~~--------~~--~i~i~---g~~~~~~d~~~lv~~~~~~~Dl~~v~l~~~-~kfr   86 (172)
T PF00548_consen   22 KGEFTMLALGIYD-RYFLVPTHEEP--------ED--TIYID---GVEYKVDDSVVLVDRDGVDTDLTLVKLPRN-PKFR   86 (172)
T ss_dssp             TEEEEEEEEEEEB-TEEEEEGGGGG--------CS--EEEET---TEEEEEEEEEEEEETTSSEEEEEEEEEESS-S-B-
T ss_pred             CceEEEecceEee-eEEEEECcCCC--------cE--EEEEC---CEEEEeeeeEEEecCCCcceeEEEEEccCC-cccC
Confidence            4567889989986 89999999221        22  33332   333333333333443   469999999753 2332


Q ss_pred             eee--cCCCCCCCCCCEEEEEEcCCCCCCC-eEeeEEecccccccCCCCccccceEEEeeccCCCCcccceecC---CCe
Q 021321          194 PVV--LGTSHDLRVGQSCFAIGNPYGFEDT-LTTGVVSGLGREIPSPNGRAIRGAIQTDAAINSGNSGGPLMNS---FGH  267 (314)
Q Consensus       194 ~~~--l~~~~~~~~G~~v~~iG~p~~~~~~-~~~G~vs~~~~~~~~~~~~~~~~~i~~~~~~~~G~SGGPl~n~---~G~  267 (314)
                      -+.  |.+ ......+...++ +....... ...+.+...+..  ..++......+.++++...|+.||||+..   .++
T Consensus        87 DIrk~~~~-~~~~~~~~~l~v-~~~~~~~~~~~v~~v~~~~~i--~~~g~~~~~~~~Y~~~t~~G~CG~~l~~~~~~~~~  162 (172)
T PF00548_consen   87 DIRKFFPE-SIPEYPECVLLV-NSTKFPRMIVEVGFVTNFGFI--NLSGTTTPRSLKYKAPTKPGMCGSPLVSRIGGQGK  162 (172)
T ss_dssp             -GGGGSBS-SGGTEEEEEEEE-ESSSSTCEEEEEEEEEEEEEE--EETTEEEEEEEEEESEEETTGTTEEEEESCGGTTE
T ss_pred             chhhhhcc-ccccCCCcEEEE-ECCCCccEEEEEEEEeecCcc--ccCCCEeeEEEEEccCCCCCccCCeEEEeeccCcc
Confidence            221  111 111233334444 33333322 233444433221  12334456678888888999999999953   678


Q ss_pred             EEEEEccc
Q 021321          268 VIGVNTAT  275 (314)
Q Consensus       268 vvGI~s~~  275 (314)
                      ++|||.++
T Consensus       163 i~GiHvaG  170 (172)
T PF00548_consen  163 IIGIHVAG  170 (172)
T ss_dssp             EEEEEEEE
T ss_pred             EEEEEecc
Confidence            99999985


No 23 
>PF10459 Peptidase_S46:  Peptidase S46;  InterPro: IPR019500 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:  Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases.   In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.  Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes []. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. 
Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence []. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases []. Not withstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base []. The geometric orientations of the catalytic residues are similar between families, despite different protein folds []. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) [, ]. This entry represents S46 peptidases, where dipeptidyl-peptidase 7 (DPP-7) is the best-characterised member of this family. It is a serine peptidase that is located on the cell surface and is predicted to have two N-terminal transmembrane domains. 
Probab=97.37  E-value=0.00033  Score=70.56  Aligned_cols=61  Identities=20%  Similarity=0.288  Sum_probs=47.7

Q ss_pred             eEEEeeccCCCCcccceecCCCeEEEEEcccccCCCCC-----CccceEEEEehHHHHHHHHHHHH
Q 021321          245 AIQTDAAINSGNSGGPLMNSFGHVIGVNTATFTRKGTG-----LSSGVNFAIPIDTVVRTVPYLIV  305 (314)
Q Consensus       245 ~i~~~~~~~~G~SGGPl~n~~G~vvGI~s~~~~~~~~~-----~~~~~~~aipi~~i~~~l~~l~~  305 (314)
                      .+.++..+..|+||+|++|.+|||||+++-+..+.-.+     .....+..+-+.++..+|+++-.
T Consensus       623 ~FlstnDitGGNSGSPvlN~~GeLVGl~FDgn~Esl~~D~~fdp~~~R~I~VDiRyvL~~ldkv~g  688 (698)
T PF10459_consen  623 NFLSTNDITGGNSGSPVLNAKGELVGLAFDGNWESLSGDIAFDPELNRTIHVDIRYVLWALDKVYG  688 (698)
T ss_pred             EEEeccCcCCCCCCCccCCCCceEEEEeecCchhhcccccccccccceeEEEEHHHHHHHHHHHhC
Confidence            46677889999999999999999999999775543221     12345788899999999988743


No 24 
>PF05580 Peptidase_S55:  SpoIVB peptidase S55;  InterPro: IPR008763 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:  Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases.   In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.  Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes []. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. 
Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence []. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases []. Not withstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base []. The geometric orientations of the catalytic residues are similar between families, despite different protein folds []. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) [, ]. This group of serine peptidases belong to the MEROPS peptidase family S55 (SpoIVB peptidase family, clan PA(S)). The protein SpoIVB plays a key role in signalling in the final sigma-K checkpoint of Bacillus subtilis [, ].
Probab=96.73  E-value=0.036  Score=47.50  Aligned_cols=42  Identities=29%  Similarity=0.483  Sum_probs=32.6

Q ss_pred             eeccCCCCcccceecCCCeEEEEEcccccCCCCCCccceEEEEehHHH
Q 021321          249 DAAINSGNSGGPLMNSFGHVIGVNTATFTRKGTGLSSGVNFAIPIDTV  296 (314)
Q Consensus       249 ~~~~~~G~SGGPl~n~~G~vvGI~s~~~~~~~~~~~~~~~~aipi~~i  296 (314)
                      +..+..||||+|++ .+|++||=++..+..     ....+|.++++..
T Consensus       174 TGGIvqGMSGSPI~-qdGKLiGAVthvf~~-----dp~~Gygi~ie~M  215 (218)
T PF05580_consen  174 TGGIVQGMSGSPII-QDGKLIGAVTHVFVN-----DPTKGYGIFIEWM  215 (218)
T ss_pred             hCCEEecccCCCEE-ECCEEEEEEEEEEec-----CCCceeeecHHHH
Confidence            34577899999999 599999999988654     2456788986653


No 25 
>PF08192 Peptidase_S64:  Peptidase family S64;  InterPro: IPR012985 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:  Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases.   In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.  Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes []. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. 
Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence []. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases []. Not withstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base []. The geometric orientations of the catalytic residues are similar between families, despite different protein folds []. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) [, ]. This family of fungal proteins is involved in the processing of membrane bound transcription factor Stp1 [] and belongs to MEROPS peptidase family S64 (clan PA). The processing causes the signalling domain of Stp1 to be passed to the nucleus where several permease genes are induced. The permeases are important for uptake of amino acids, and processing of Stp1 only occurs in an amino acid-rich environment. This family is predicted to be distantly related to the trypsin family (MEROPS peptidase family S1) and to have a typical trypsin-like catalytic triad [].
Probab=96.63  E-value=0.014  Score=57.44  Aligned_cols=118  Identities=19%  Similarity=0.345  Sum_probs=70.7

Q ss_pred             CCCcEEEEEEeeCC-------Ccc------ceeecCCC------CCCCCCCEEEEEEcCCCCCCCeEeeEEecccccccC
Q 021321          176 PAYDLAVLKVDVEG-------FEL------KPVVLGTS------HDLRVGQSCFAIGNPYGFEDTLTTGVVSGLGREIPS  236 (314)
Q Consensus       176 ~~~DlAlL~v~~~~-------~~~------~~~~l~~~------~~~~~G~~v~~iG~p~~~~~~~~~G~vs~~~~~~~~  236 (314)
                      .-.|+||++++..-       .++      |.+.+.+.      ..+.+|..|+-+|.-.+    .+.|++.++.- ...
T Consensus       541 ~LsD~AIIkV~~~~~~~N~LGddi~f~~~dP~l~f~NlyV~~~~~~~~~G~~VfK~GrTTg----yT~G~lNg~kl-vyw  615 (695)
T PF08192_consen  541 RLSDWAIIKVNKERKCQNYLGDDIQFNEPDPTLMFQNLYVREVVSNLVPGMEVFKVGRTTG----YTTGILNGIKL-VYW  615 (695)
T ss_pred             cccceEEEEeCCCceecCCCCccccccCCCccccccccchhhhhhccCCCCeEEEecccCC----ccceEecceEE-EEe
Confidence            34699999997431       011      22223211      34577899999987654    45666665532 112


Q ss_pred             CCCccc-cceEEEe----eccCCCCcccceecCCC------eEEEEEcccccCCCCCCccceEEEEehHHHHHHHHHH
Q 021321          237 PNGRAI-RGAIQTD----AAINSGNSGGPLMNSFG------HVIGVNTATFTRKGTGLSSGVNFAIPIDTVVRTVPYL  303 (314)
Q Consensus       237 ~~~~~~-~~~i~~~----~~~~~G~SGGPl~n~~G------~vvGI~s~~~~~~~~~~~~~~~~aipi~~i~~~l~~l  303 (314)
                      .++... .+++...    .-...||||+-|++.-+      .|+||..+..++     ...++...|+..|.+-|++.
T Consensus       616 ~dG~i~s~efvV~s~~~~~Fa~~GDSGS~VLtk~~d~~~gLgvvGMlhsydge-----~kqfglftPi~~il~rl~~v  688 (695)
T PF08192_consen  616 ADGKIQSSEFVVSSDNNPAFASGGDSGSWVLTKLEDNNKGLGVVGMLHSYDGE-----QKQFGLFTPINEILDRLEEV  688 (695)
T ss_pred             cCCCeEEEEEEEecCCCccccCCCCcccEEEecccccccCceeeEEeeecCCc-----cceeeccCcHHHHHHHHHHh
Confidence            222221 2333333    12457999999998633      399999886443     35688889988877666654


No 26 
>PF00949 Peptidase_S7:  Peptidase S7, Flavivirus NS3 serine protease ;  InterPro: IPR001850 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:  Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases.   In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.  Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes []. 
They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence []. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases []. Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base []. The geometric orientations of the catalytic residues are similar between families, despite different protein folds []. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) [, ]. This signature identifies serine peptidases belonging to MEROPS peptidase family S7 (flavivirin family, clan PA(S)). The protein fold of the peptidase domain for members of this family resembles that of chymotrypsin, the type example for clan PA.  Flaviviruses produce a polyprotein from the ssRNA genome. The N terminus of the NS3 protein (approx. 180 aa) is required for the processing of the polyprotein. NS3 also has conserved homology with NTP-binding proteins and the DEAD family of RNA helicases [, , ].; GO: 0003723 RNA binding, 0003724 RNA helicase activity, 0005524 ATP binding; PDB: 2IJO_B 3E90_D 2GGV_B 2FP7_B 2WV9_A 3U1I_B 3U1J_B 2WZQ_A 2WHX_A 3L6P_A ....
Probab=96.08  E-value=0.0054  Score=48.74  Aligned_cols=33  Identities=24%  Similarity=0.440  Sum_probs=23.5

Q ss_pred             EEEeeccCCCCcccceecCCCeEEEEEcccccC
Q 021321          246 IQTDAAINSGNSGGPLMNSFGHVIGVNTATFTR  278 (314)
Q Consensus       246 i~~~~~~~~G~SGGPl~n~~G~vvGI~s~~~~~  278 (314)
                      ...+....+|.||+|+||.+|++|||...+...
T Consensus        88 ~~~~~d~~~GsSGSpi~n~~g~ivGlYg~g~~~  120 (132)
T PF00949_consen   88 GAIDLDFPKGSSGSPIFNQNGEIVGLYGNGVEV  120 (132)
T ss_dssp             EEE---S-TTGTT-EEEETTSCEEEEEEEEEE-
T ss_pred             EeeecccCCCCCCCceEcCCCcEEEEEccceee
Confidence            444556789999999999999999999887654


No 27 
>PF02122 Peptidase_S39:  Peptidase S39;  InterPro: IPR000382 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:  Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases.   In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.  Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes []. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. 
Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence []. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases []. Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base []. The geometric orientations of the catalytic residues are similar between families, despite different protein folds []. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) [, ]. ORF2 of Potato leafroll virus (PLrV) encodes a polyprotein which is translated following a -1 frameshift. The polyprotein has a putative linear arrangement of membrane anchor-VPg-peptidase-polymerase domains. The serine peptidase domain which is found in this group of sequences belongs to MEROPS peptidase family S39 (clan PA(S)). It is likely that the peptidase domain is involved in the cleavage of the polyprotein []. The nucleotide sequence for the RNA of PLrV has been determined [, ]. The sequence contains six large open reading frames (ORFs). The 5' coding region encodes two polypeptides of 28K and 70K, which overlap in different reading frames; it is suggested that the third ORF in the 5' block is translated by frameshift readthrough near the end of the 70K protein, yielding a 118K polypeptide []. 
Segments of the predicted amino acid sequences of these ORFs resemble those of known viral RNA polymerases, ATP-binding proteins and viral genome-linked proteins. The nucleotide sequence of the genomic RNA of Beet western yellows virus (BWYV) has been determined []. The sequence contains six long ORFs. A cluster of three of these ORFs, including the coat protein cistron, display extensive amino acid sequence similarity to corresponding ORFs of a second luteovirus: Barley yellow dwarf virus [].; GO: 0004252 serine-type endopeptidase activity, 0022415 viral reproductive process, 0016021 integral to membrane; PDB: 1ZYO_A.
Probab=95.59  E-value=0.07  Score=45.72  Aligned_cols=154  Identities=19%  Similarity=0.163  Sum_probs=47.3

Q ss_pred             CcccceEEEEEE-cCCCEEEeccccccCCCcCCCCcceEEEEEecCCCCeEEE-EEEEEEeCCCCcEEEEEEeeC---CC
Q 021321          116 AKVEGTGSGFVW-DKFGHIVTNYHVVAKLATDTSGLHRCKVSLFDAKGNGFYR-EGKMVGCDPAYDLAVLKVDVE---GF  190 (314)
Q Consensus       116 ~~~~~~GsGfiI-~~~g~VLT~aHvv~~~~~~~~~~~~~~v~~~~~~g~~~~~-~a~v~~~d~~~DlAlL~v~~~---~~  190 (314)
                      +...+.++.+-. +-+..++|++||....       ... ....+  |+.... +-+.+..+...|++||+....   ..
T Consensus        26 ~~hvGya~cv~l~~g~~~L~ta~Hv~~~~-------~~~-~~~k~--g~kipl~~f~~~~~~~~~D~~il~~P~n~~s~L   95 (203)
T PF02122_consen   26 GSHVGYATCVRLFDGEDALLTARHVWSRP-------SKV-TSLKT--GEKIPLAEFTDLLESRIADFVILRGPPNWESKL   95 (203)
T ss_dssp             --------EEEE----EEEEE-HHHHTSS-------S----EEET--TEEEE--S-EEEEE-TTT-EEEEE--HHHHHHH
T ss_pred             ccccccceEEECcCCccceecccccCCCc-------cce-eEcCC--CCcccchhChhhhCCCccCEEEEecCcCHHHHh
Confidence            444455555442 2234799999999852       111 12222  221111 123444678899999999722   00


Q ss_pred             ccceeecCCCCCCCCCCEEEEEEcCCCCCCCeEeeEEecccccccCCCCccccceEEEeeccCCCCcccceecCCCeEEE
Q 021321          191 ELKPVVLGTSHDLRVGQSCFAIGNPYGFEDTLTTGVVSGLGREIPSPNGRAIRGAIQTDAAINSGNSGGPLMNSFGHVIG  270 (314)
Q Consensus       191 ~~~~~~l~~~~~~~~G~~v~~iG~p~~~~~~~~~G~vs~~~~~~~~~~~~~~~~~i~~~~~~~~G~SGGPl~n~~G~vvG  270 (314)
                      ..+.+.+.....+       .-|  .-..+....+........+....    ..+..+-+...+|.||.|+|+.+ +++|
T Consensus        96 g~k~~~~~~~~~~-------~~g--~~~~y~~~~~~~~~~sa~i~g~~----~~~~~vls~T~~G~SGtp~y~g~-~vvG  161 (203)
T PF02122_consen   96 GVKAAQLSQNSQL-------AKG--PVSFYGFSSGEWPCSSAKIPGTE----GKFASVLSNTSPGWSGTPYYSGK-NVVG  161 (203)
T ss_dssp             T-----B----SE-------EEE--ESSTTSEEEEEEEEEE-S----S----TTEEEE-----TT-TT-EEE-SS--EEE
T ss_pred             Ccccccccchhhh-------CCC--CeeeeeecCCCceeccCcccccc----CcCCceEcCCCCCCCCCCeEECC-CceE
Confidence            1233333111111       001  01112222222211111111111    23556667788999999999877 8999


Q ss_pred             EEcccccCCCCCCccceEEEEehHHH
Q 021321          271 VNTATFTRKGTGLSSGVNFAIPIDTV  296 (314)
Q Consensus       271 I~s~~~~~~~~~~~~~~~~aipi~~i  296 (314)
                      ++.......   ..+++++-.|+--+
T Consensus       162 vH~G~~~~~---~~~n~n~~spip~~  184 (203)
T PF02122_consen  162 VHTGSPSGS---NRENNNRMSPIPPI  184 (203)
T ss_dssp             EEEEE---------------------
T ss_pred             eecCccccc---cccccccccccccc
Confidence            999852211   13455555555444


No 28 
>TIGR02860 spore_IV_B stage IV sporulation protein B. SpoIVB, the stage IV sporulation protein B of endospore-forming bacteria such as Bacillus subtilis, is a serine proteinase, expressed in the spore (rather than mother cell) compartment, that participates in a proteolytic activation cascade for Sigma-K. It appears to be universal among endospore-forming bacteria and occurs nowhere else.
Probab=95.23  E-value=0.24  Score=46.88  Aligned_cols=42  Identities=29%  Similarity=0.511  Sum_probs=31.6

Q ss_pred             eeccCCCCcccceecCCCeEEEEEcccccCCCCCCccceEEEEehHHH
Q 021321          249 DAAINSGNSGGPLMNSFGHVIGVNTATFTRKGTGLSSGVNFAIPIDTV  296 (314)
Q Consensus       249 ~~~~~~G~SGGPl~n~~G~vvGI~s~~~~~~~~~~~~~~~~aipi~~i  296 (314)
                      +..+..||||+|++ .+|++||=++-.+.+.     +..+|+|-++..
T Consensus       354 tgGivqGMSGSPi~-q~gkliGAvtHVfvnd-----pt~GYGi~ie~M  395 (402)
T TIGR02860       354 TGGIVQGMSGSPII-QNGKVIGAVTHVFVND-----PTSGYGVYIEWM  395 (402)
T ss_pred             hCCEEecccCCCEE-ECCEEEEEEEEEEecC-----CCcceeehHHHH
Confidence            34567899999999 6999999888876652     445688855443


No 29 
>PF00944 Peptidase_S3:  Alphavirus core protein ;  InterPro: IPR000930 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:  Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases.   In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.  Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes []. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. 
Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence []. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases []. Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base []. The geometric orientations of the catalytic residues are similar between families, despite different protein folds []. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) [, ]. Togavirin, also known as Sindbis virus core endopeptidase, is a serine protease resident at the N terminus of the p130 polyprotein of togaviruses []. The endopeptidase signature identifies the peptidase as belonging to the MEROPS peptidase family S3 (togavirin family, clan PA(S)). The polyprotein also includes structural proteins for the nucleocapsid core and for the glycoprotein spikes []. Togavirin is only active while part of the polyprotein, cleavage at a Trp-Ser bond resulting in total lack of activity []. Mutagenesis studies have identified the location of the His-Asp-Ser catalytic triad, and X-ray studies have revealed the protein fold to be similar to that of chymotrypsin [, ].; GO: 0004252 serine-type endopeptidase activity, 0006508 proteolysis, 0016020 membrane; PDB: 2YEW_D 1EP5_A 3J0C_F 1EP6_C 1WYK_D 1DYL_A 1VCQ_B 1VCP_B 1LD4_D 1KXA_A ....
Probab=94.96  E-value=0.026  Score=44.39  Aligned_cols=29  Identities=21%  Similarity=0.438  Sum_probs=24.8

Q ss_pred             eccCCCCcccceecCCCeEEEEEcccccC
Q 021321          250 AAINSGNSGGPLMNSFGHVIGVNTATFTR  278 (314)
Q Consensus       250 ~~~~~G~SGGPl~n~~G~vvGI~s~~~~~  278 (314)
                      ..-.+|+||-|++|..|+||||+-.+..+
T Consensus       101 g~g~~GDSGRpi~DNsGrVVaIVLGG~ne  129 (158)
T PF00944_consen  101 GVGKPGDSGRPIFDNSGRVVAIVLGGANE  129 (158)
T ss_dssp             TS-STTSTTEEEESTTSBEEEEEEEEEEE
T ss_pred             CCCCCCCCCCccCcCCCCEEEEEecCCCC
Confidence            34579999999999999999999988764


No 30 
>PF03510 Peptidase_C24:  2C endopeptidase (C24) cysteine protease family;  InterPro: IPR000317 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:  Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases.   In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.  Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. 
These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad [].  The two signatures that define this group of calicivirus polyproteins identify a cysteine peptidase signature that belongs to MEROPS peptidase family C24 (clan PA(C)). Caliciviruses are positive-stranded ssRNA viruses that cause gastroenteritis. The calicivirus genome contains two open reading frames, ORF1 and ORF2. ORF2 encodes a structural protein []; while ORF1 encodes a non-structural polypeptide, which has RNA helicase, cysteine protease and RNA polymerase activity. The regions of the polyprotein in which these activities lie are similar to proteins produced by the picornaviruses. Two different families of caliciviruses can be distinguished on the basis of sequence similarity, namely those classified as small round structured viruses (SRSVs) and those classed as non-SRSVs. Calicivirus proteases from the non-SRSV group, which are members of the PA protease clan, constitute family C24 of the cysteine proteases (proteases from SRSVs belong to the C37 family). As mentioned above, the protease activity resides within a polyprotein. The enzyme cleaves the polyprotein at sites N-terminal to itself, liberating the polyprotein helicase.; GO: 0004197 cysteine-type endopeptidase activity, 0006508 proteolysis
Probab=94.04  E-value=0.19  Score=38.17  Aligned_cols=103  Identities=20%  Similarity=0.365  Sum_probs=54.0

Q ss_pred             EEEEEEcCCCEEEeccccccCCCcCCCCcceEEEEEecCCCCeEEEEEEEEEeCCCCcEEEEEEeeCCCccceeecCCCC
Q 021321          122 GSGFVWDKFGHIVTNYHVVAKLATDTSGLHRCKVSLFDAKGNGFYREGKMVGCDPAYDLAVLKVDVEGFELKPVVLGTSH  201 (314)
Q Consensus       122 GsGfiI~~~g~VLT~aHvv~~~~~~~~~~~~~~v~~~~~~g~~~~~~a~v~~~d~~~DlAlL~v~~~~~~~~~~~l~~~~  201 (314)
                      |-++-|.+ |.++|+.||.+..       +.+.       |..  +  ++  ...+.|+|+++.+..  ..+..++++  
T Consensus         1 G~avHIGn-G~~vt~tHva~~~-------~~v~-------g~~--f--~~--~~~~ge~~~v~~~~~--~~p~~~ig~--   55 (105)
T PF03510_consen    1 GWAVHIGN-GRYVTVTHVAKSS-------DSVD-------GQP--F--KI--VKTDGELCWVQSPLV--HLPAAQIGT--   55 (105)
T ss_pred             CceEEeCC-CEEEEEEEEeccC-------ceEc-------CcC--c--EE--EEeccCEEEEECCCC--CCCeeEecc--
Confidence            34677886 9999999999843       2221       221  1  22  224569999998753  355666643  


Q ss_pred             CCCCCCEEEEEEcCCCCCCCeE--eeEEecccccccCCCCccccceEEEeeccCCCCcccceec
Q 021321          202 DLRVGQSCFAIGNPYGFEDTLT--TGVVSGLGREIPSPNGRAIRGAIQTDAAINSGNSGGPLMN  263 (314)
Q Consensus       202 ~~~~G~~v~~iG~p~~~~~~~~--~G~vs~~~~~~~~~~~~~~~~~i~~~~~~~~G~SGGPl~n  263 (314)
                          |.+++   |+.+......  .+...       ...+....-...+...+.+||-|-|.||
T Consensus        56 ----g~Pv~---~~~~~~~~t~~~~~~~~-------t~~~~v~G~~~~~~~~T~~GDCGlPY~d  105 (105)
T PF03510_consen   56 ----GKPVY---DTWGLHPVTTWSEGTYN-------TPTGTVNGWHVKITNPTKKGDCGLPYFD  105 (105)
T ss_pred             ----CCCEE---ecCCCccEEEeccceEE-------cCCcEEEEEEEeCCCCccCCccCCcccC
Confidence                44455   3333222111  11111       0111100112233336789999999986


No 31 
>PF09342 DUF1986:  Domain of unknown function (DUF1986);  InterPro: IPR015420 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:  Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases.   In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.  Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes []. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. 
Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence []. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases []. Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base []. The geometric orientations of the catalytic residues are similar between families, despite different protein folds []. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) [, ]. This domain is found in serine endopeptidases belonging to MEROPS peptidase family S1A (clan PA). It is found in unusual mosaic proteins, which are encoded by the Drosophila nudel gene (see P98159 from SWISSPROT). Nudel is involved in defining embryonic dorsoventral polarity. Three proteases, ndl, gd and snk, process easter to create active easter. Active easter defines cell identities along the dorsal-ventral continuum by activating the spz ligand for the Tl receptor in the ventral region of the embryo. Nudel, pipe and windbeutel together trigger the protease cascade within the extraembryonic perivitelline compartment which induces dorsoventral polarity of the Drosophila embryo [].
Probab=94.04  E-value=2.1  Score=37.59  Aligned_cols=94  Identities=16%  Similarity=0.260  Sum_probs=57.1

Q ss_pred             cccceEEEEEEcCCCEEEeccccccCCCcCCCCcceEEEEEecCCCCeEE-E---EEEEEEeC-----CCCcEEEEEEee
Q 021321          117 KVEGTGSGFVWDKFGHIVTNYHVVAKLATDTSGLHRCKVSLFDAKGNGFY-R---EGKMVGCD-----PAYDLAVLKVDV  187 (314)
Q Consensus       117 ~~~~~GsGfiI~~~g~VLT~aHvv~~~~~~~~~~~~~~v~~~~~~g~~~~-~---~a~v~~~d-----~~~DlAlL~v~~  187 (314)
                      .+...++|++|++ .|+|++-.|+.+..-   ....+.+.+  +.|+.+. +   .-++...|     +..++.||.++.
T Consensus        25 dG~~~CsgvLlD~-~WlLvsssCl~~I~L---~~~Yvsall--G~~Kt~~~v~Gp~EQI~rVD~~~~V~~S~v~LLHL~~   98 (267)
T PF09342_consen   25 DGRYWCSGVLLDP-HWLLVSSSCLRGISL---SHHYVSALL--GGGKTYLSVDGPHEQISRVDCFKDVPESNVLLLHLEQ   98 (267)
T ss_pred             cCeEEEEEEEecc-ceEEEeccccCCccc---ccceEEEEe--cCcceecccCCChheEEEeeeeeeccccceeeeeecC
Confidence            4567999999998 899999999986421   113344444  3233211 0   01233333     678999999987


Q ss_pred             CCC---ccceeecCC-CCCCCCCCEEEEEEcCC
Q 021321          188 EGF---ELKPVVLGT-SHDLRVGQSCFAIGNPY  216 (314)
Q Consensus       188 ~~~---~~~~~~l~~-~~~~~~G~~v~~iG~p~  216 (314)
                      +..   .+.|.-+.+ .......+.++++|.-.
T Consensus        99 ~~~fTr~VlP~flp~~~~~~~~~~~CVAVg~d~  131 (267)
T PF09342_consen   99 PANFTRYVLPTFLPETSNENESDDECVAVGHDD  131 (267)
T ss_pred             cccceeeecccccccccCCCCCCCceEEEEccc
Confidence            632   244555533 23444456899999765


No 32 
>PF05416 Peptidase_C37:  Southampton virus-type processing peptidase;  InterPro: IPR001665 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:  Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases.   In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.  Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. 
These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad [].  This group of cysteine peptidases belongs to the MEROPS peptidase family C37 (clan PA(C)). The type example is calicivirin from Southampton virus, an endopeptidase that cleaves the polyprotein at sites N-terminal to itself, liberating the polyprotein helicase. Southampton virus is a positive-stranded ssRNA virus belonging to the Caliciviruses, which are viruses that cause gastroenteritis. The calicivirus genome contains two open reading frames, ORF1 and ORF2. ORF1 encodes a non-structural polypeptide, which has RNA helicase, cysteine protease and RNA polymerase activity []. The regions of the polyprotein in which these activities lie are similar to proteins produced by the picornaviruses []. ORF2 encodes a structural, capsid protein. Two different families of caliciviruses can be distinguished on the basis of sequence similarity, namely the Norwalk-like viruses or small round structured viruses (SRSVs), and those classed as non-SRSVs.; GO: 0004197 cysteine-type endopeptidase activity, 0006508 proteolysis; PDB: 2FYQ_A 2FYR_A 1WQS_D 4ASH_A 2IPH_B.
Probab=92.85  E-value=0.12  Score=48.42  Aligned_cols=137  Identities=21%  Similarity=0.316  Sum_probs=67.9

Q ss_pred             cceEEEEEEcCCCEEEeccccccCCCcCCCCcceEEEEEecCCCCeEEEEEEEEEeCCCCcEEEEEEeeC-CCccceeec
Q 021321          119 EGTGSGFVWDKFGHIVTNYHVVAKLATDTSGLHRCKVSLFDAKGNGFYREGKMVGCDPAYDLAVLKVDVE-GFELKPVVL  197 (314)
Q Consensus       119 ~~~GsGfiI~~~g~VLT~aHvv~~~~~~~~~~~~~~v~~~~~~g~~~~~~a~v~~~d~~~DlAlL~v~~~-~~~~~~~~l  197 (314)
                      -++|-||-|++ ...+|+-||+..      +...+   |    |.    +..-+..+..-+++-+++..+ ..+++-+-|
T Consensus       378 fGsGWGfWVS~-~lfITttHViP~------g~~E~---F----Gv----~i~~i~vh~sGeF~~~rFpk~iRPDvtgmiL  439 (535)
T PF05416_consen  378 FGSGWGFWVSP-TLFITTTHVIPP------GAKEA---F----GV----PISQIQVHKSGEFCRFRFPKPIRPDVTGMIL  439 (535)
T ss_dssp             ETTEEEEESSS-SEEEEEGGGS-S------TTSEE---T----TE----ECGGEEEEEETTEEEEEESS-SSTTS---EE
T ss_pred             cCCceeeeecc-eEEEEeeeecCC------cchhh---h----CC----ChhHeEEeeccceEEEecCCCCCCCccceee
Confidence            36899999998 899999999974      22211   1    11    111123445577888888654 224555556


Q ss_pred             CCCCCCCCCCEEEE-EEcCCCC--CCCeEeeEEecccccccCCCCccccceEEE-------eeccCCCCcccceecCCC-
Q 021321          198 GTSHDLRVGQSCFA-IGNPYGF--EDTLTTGVVSGLGREIPSPNGRAIRGAIQT-------DAAINSGNSGGPLMNSFG-  266 (314)
Q Consensus       198 ~~~~~~~~G~~v~~-iG~p~~~--~~~~~~G~vs~~~~~~~~~~~~~~~~~i~~-------~~~~~~G~SGGPl~n~~G-  266 (314)
                       . +-...|.-+.+ |=.+.|.  ...+..|...+..-.-....++  ..++.+       |..+.+|+.|+|-+-..| 
T Consensus       440 -E-eGapEGtV~siLiKR~sGEllpLAvRMgt~AsmkIqgr~v~GQ--~GMLLTGaNAK~mDLGT~PGDCGcPYvyKrgN  515 (535)
T PF05416_consen  440 -E-EGAPEGTVCSILIKRPSGELLPLAVRMGTHASMKIQGRTVHGQ--MGMLLTGANAKGMDLGTIPGDCGCPYVYKRGN  515 (535)
T ss_dssp             ---SS--TT-EEEEEEE-TTSBEEEEEEEEEEEEEEEETTEEEEEE--EEEETTSTT-SSTTTS--TTGTT-EEEEEETT
T ss_pred             -c-cCCCCceEEEEEEEcCCccchhhhhhhccceeEEEcceeecce--eeeeeecCCccccccCCCCCCCCCceeeecCC
Confidence             2 33455665544 4455443  2234444443321100000011  112222       334679999999997655 


Q ss_pred             --eEEEEEccccc
Q 021321          267 --HVIGVNTATFT  277 (314)
Q Consensus       267 --~vvGI~s~~~~  277 (314)
                        -|+|+|.+...
T Consensus       516 d~VV~GVH~AAtr  528 (535)
T PF05416_consen  516 DWVVIGVHAAATR  528 (535)
T ss_dssp             EEEEEEEEEEE-S
T ss_pred             cEEEEEEEehhcc
Confidence              48999998754


No 33 
>PF00947 Pico_P2A:  Picornavirus core protein 2A;  InterPro: IPR000081 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:  Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases.   In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.  Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. 
Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad [].  This domain defines cysteine peptidases that belong to MEROPS peptidase family C3 (picornain, clan PA(C)), subfamilies 3CA and 3CB. The protein fold of this peptidase domain for members of this family resembles that of the serine peptidase, chymotrypsin [], the type example for clan PA. Picornaviral proteins are expressed as a single polyprotein which is cleaved by the viral 3C cysteine protease []. The poliovirus polyprotein is selectively cleaved at the Gln-|-Gly bond. In other picornavirus reactions Glu may be substituted for Gln, and Ser or Thr for Gly.; GO: 0008233 peptidase activity, 0006508 proteolysis, 0016032 viral reproduction; PDB: 2HRV_B 1Z8R_A.
Probab=90.94  E-value=0.36  Score=37.82  Aligned_cols=32  Identities=28%  Similarity=0.353  Sum_probs=23.8

Q ss_pred             ceEEEeeccCCCCcccceecCCCeEEEEEcccc
Q 021321          244 GAIQTDAAINSGNSGGPLMNSFGHVIGVNTATF  276 (314)
Q Consensus       244 ~~i~~~~~~~~G~SGGPl~n~~G~vvGI~s~~~  276 (314)
                      +++.......||+.||+|+.. --||||++++.
T Consensus        79 ~~l~g~Gp~~PGdCGg~L~C~-HGViGi~Tagg  110 (127)
T PF00947_consen   79 NLLIGEGPAEPGDCGGILRCK-HGVIGIVTAGG  110 (127)
T ss_dssp             CEEEEE-SSSTT-TCSEEEET-TCEEEEEEEEE
T ss_pred             CceeecccCCCCCCCceeEeC-CCeEEEEEeCC
Confidence            455556678999999999964 45999999874


No 34 
>PF02907 Peptidase_S29:  Hepatitis C virus NS3 protease;  InterPro: IPR004109 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:  Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases.   In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.  Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes []. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. 
Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence []. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases []. Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base []. The geometric orientations of the catalytic residues are similar between families, despite different protein folds []. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) [, ]. This signature identifies the Hepatitis C virus NS3 protein as a serine protease which belongs to MEROPS peptidase family S29 (hepacivirin family, clan PA(S)), which has a trypsin-like fold. The non-structural (NS) protein NS3 is one of the NS proteins involved in replication of the HCV genome. The NS2 proteinase (IPR002518 from INTERPRO), a zinc-dependent enzyme, performs a single proteolytic cut to release the N terminus of NS3. The action of NS3 proteinase (NS3P), which resides in the N-terminal one-third of the NS3 protein, then yields all remaining non-structural proteins. The C-terminal two-thirds of the NS3 protein contain a helicase. The functional relationship between the proteinase and helicase domains is unknown. NS3 has a structural zinc-binding site and requires cofactor NS4. 
It has been suggested that the NS3 serine protease of hepatitis C is involved in cell transformation and that the ability to transform requires an active enzyme [].; GO: 0008236 serine-type peptidase activity, 0006508 proteolysis, 0019087 transformation of host cell by virus; PDB: 2QV1_B 3LOX_C 2OBQ_C 2OC1_C 2OC0_A 3LON_A 3KNX_A 2O8M_A 2OBO_A 2OC8_A ....
Probab=90.89  E-value=0.35  Score=38.20  Aligned_cols=42  Identities=31%  Similarity=0.717  Sum_probs=28.2

Q ss_pred             cCCCCcccceecCCCeEEEEEcccccCCCCCCccceEEEEehHHH
Q 021321          252 INSGNSGGPLMNSFGHVIGVNTATFTRKGTGLSSGVNFAIPIDTV  296 (314)
Q Consensus       252 ~~~G~SGGPl~n~~G~vvGI~s~~~~~~~~~~~~~~~~aipi~~i  296 (314)
                      ...|.||||++..+|.+|||..+..-..  +....+-|. |++.+
T Consensus       105 ~lkGSSGgPiLC~~GH~vG~f~aa~~tr--gvak~i~f~-P~e~l  146 (148)
T PF02907_consen  105 DLKGSSGGPILCPSGHAVGMFRAAVCTR--GVAKAIDFI-PVETL  146 (148)
T ss_dssp             HHTT-TT-EEEETTSEEEEEEEEEEEET--TEEEEEEEE-EHHHH
T ss_pred             EEecCCCCcccCCCCCEEEEEEEEEEcC--CceeeEEEE-eeeec
Confidence            4579999999999999999988775432  223344554 77654


No 35 
>PF02395 Peptidase_S6:  Immunoglobulin A1 protease Serine protease Prosite pattern;  InterPro: IPR000710 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:  Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases.   In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.  Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes []. 
They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence []. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases []. Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base []. The geometric orientations of the catalytic residues are similar between families, despite different protein folds []. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) [, ]. This group of serine peptidases belongs to the MEROPS peptidase family S6 (clan PA(S)). The type example is the IgA1-specific serine endopeptidase from Neisseria gonorrhoeae []. These cleave prolyl bonds in the hinge regions of immunoglobulin A heavy chains. Similar specificity is shown by the unrelated family of M26 metalloendopeptidases.; GO: 0004252 serine-type endopeptidase activity, 0006508 proteolysis; PDB: 3SZE_A 3H09_B 3SYJ_A 1WXR_A 3AK5_B.
Probab=87.01  E-value=3.6  Score=42.51  Aligned_cols=49  Identities=22%  Similarity=0.226  Sum_probs=30.4

Q ss_pred             cCCCCccccee--cC-C--CeEEEEEcccccCCCCCCccceEEEEehHHHHHHHHHH
Q 021321          252 INSGNSGGPLM--NS-F--GHVIGVNTATFTRKGTGLSSGVNFAIPIDTVVRTVPYL  303 (314)
Q Consensus       252 ~~~G~SGGPl~--n~-~--G~vvGI~s~~~~~~~~~~~~~~~~aipi~~i~~~l~~l  303 (314)
                      ..+||||+|||  |. +  ..++|+.+......+   .......+|.+++.++.++.
T Consensus       213 ~~~GDSGSPlF~YD~~~kKWvl~Gv~~~~~~~~g---~~~~~~~~~~~f~~~~~~~d  266 (769)
T PF02395_consen  213 GSPGDSGSPLFAYDKEKKKWVLVGVLSGGNGYNG---KGNWWNVIPPDFINQIKQND  266 (769)
T ss_dssp             --TT-TT-EEEEEETTTTEEEEEEEEEEECCCCH---SEEEEEEECHHHHHHHHHHC
T ss_pred             cccCcCCCceEEEEccCCeEEEEEEEccccccCC---ccceeEEecHHHHHHHHhhh
Confidence            46899999998  43 3  347899887654322   12445678888887777664


No 36 
>PF01732 DUF31:  Putative peptidase (DUF31);  InterPro: IPR022382  This domain has no known function. It is found in various hypothetical proteins and putative lipoproteins from mycoplasmas. 
Probab=81.46  E-value=1.1  Score=42.35  Aligned_cols=23  Identities=22%  Similarity=0.505  Sum_probs=20.8

Q ss_pred             ccCCCCcccceecCCCeEEEEEc
Q 021321          251 AINSGNSGGPLMNSFGHVIGVNT  273 (314)
Q Consensus       251 ~~~~G~SGGPl~n~~G~vvGI~s  273 (314)
                      .+..|.||+.|+|.+|++|||..
T Consensus       351 ~l~gGaSGS~V~n~~~~lvGIy~  373 (374)
T PF01732_consen  351 SLGGGASGSMVINQNNELVGIYF  373 (374)
T ss_pred             CCCCCCCcCeEECCCCCEEEEeC
Confidence            56689999999999999999975


No 37 
>COG5510 Predicted small secreted protein [Function unknown]
Probab=78.84  E-value=1.9  Score=26.93  Aligned_cols=24  Identities=25%  Similarity=0.350  Sum_probs=17.2

Q ss_pred             ccchhhHHHHHHHHHHHHhhhcCC
Q 021321           29 TRRSSIGFGSSVILSSFLVNFCSP   52 (314)
Q Consensus        29 ~~~~~~~~~~~~~~~~~~~~~~~~   52 (314)
                      ||++.+.+.++++++++++++|++
T Consensus         1 mmk~t~l~i~~vll~s~llaaCNT   24 (44)
T COG5510           1 MMKKTILLIALVLLASTLLAACNT   24 (44)
T ss_pred             CchHHHHHHHHHHHHHHHHHHhhh
Confidence            355666667777778888899974


No 38 
>PF12381 Peptidase_C3G:  Tungro spherical virus-type peptidase;  InterPro: IPR024387 This entry represents a rice tungro spherical waikavirus-type peptidase that belongs to MEROPS peptidase family C3G. It is a picornain 3C-type protease, and is responsible for the self-cleavage of the positive single-stranded polyproteins of a number of plant viral genomes. The location of the protease activity of the polyprotein is at the C-terminal end, adjacent and N-terminal to the putative RNA polymerase [, ].
Probab=75.01  E-value=3.2  Score=35.71  Aligned_cols=56  Identities=18%  Similarity=0.431  Sum_probs=38.8

Q ss_pred             cceEEEeeccCCCCcccceecC----CCeEEEEEcccccCCCCCCccceEEEEehH--HHHHHHHHHH
Q 021321          243 RGAIQTDAAINSGNSGGPLMNS----FGHVIGVNTATFTRKGTGLSSGVNFAIPID--TVVRTVPYLI  304 (314)
Q Consensus       243 ~~~i~~~~~~~~G~SGGPl~n~----~G~vvGI~s~~~~~~~~~~~~~~~~aipi~--~i~~~l~~l~  304 (314)
                      ...+++......|+-|||++-.    .-+++||+.++...      .+.+||-++.  .+.+.+.+|.
T Consensus       168 r~gleY~~~t~~GdCGs~i~~~~t~~~RKIvGiHVAG~~~------~~~gYAe~itQEDL~~A~~~l~  229 (231)
T PF12381_consen  168 RQGLEYQMPTMNGDCGSPIVRNNTQMVRKIVGIHVAGSAN------HAMGYAESITQEDLMRAINKLE  229 (231)
T ss_pred             eeeeeEECCCcCCCccceeeEcchhhhhhhheeeeccccc------ccceehhhhhHHHHHHHHHhhc
Confidence            3456777888999999999832    34799999998643      4566776653  4555555543


No 39 
>PRK10081 entericidin B membrane lipoprotein; Provisional
Probab=68.56  E-value=4.5  Score=26.01  Aligned_cols=24  Identities=21%  Similarity=0.265  Sum_probs=14.8

Q ss_pred             ccchhhHHHHHHHHHHHHhhhcCC
Q 021321           29 TRRSSIGFGSSVILSSFLVNFCSP   52 (314)
Q Consensus        29 ~~~~~~~~~~~~~~~~~~~~~~~~   52 (314)
                      ||++++.++++++++++.+.+|.+
T Consensus         1 MmKk~i~~i~~~l~~~~~l~~CnT   24 (48)
T PRK10081          1 MVKKTIAAIFSVLVLSTVLTACNT   24 (48)
T ss_pred             ChHHHHHHHHHHHHHHHHHhhhhh
Confidence            355655555555556666688873


No 40 
>COG3056 Uncharacterized lipoprotein [Cell envelope biogenesis, outer membrane]
Probab=60.85  E-value=12  Score=31.32  Aligned_cols=16  Identities=25%  Similarity=0.482  Sum_probs=10.5

Q ss_pred             HHHHHHhhhcCCCCCC
Q 021321           41 ILSSFLVNFCSPSSTL   56 (314)
Q Consensus        41 ~~~~~~~~~~~~~~~~   56 (314)
                      +++.+++++|...+..
T Consensus        22 laa~~lLagC~a~~~t   37 (204)
T COG3056          22 LAAIFLLAGCAAPPTT   37 (204)
T ss_pred             HHHHHHHHhcCCCCce
Confidence            3445666899876664


No 41 
>PF00571 CBS:  CBS domain CBS domain web page. Mutations in the CBS domain of Swiss:P35520 lead to homocystinuria.;  InterPro: IPR000644 CBS (cystathionine-beta-synthase) domains are small intracellular modules, mostly found in two or four copies within a protein, that occur in a variety of proteins in bacteria, archaea, and eukaryotes [, ]. Tandem pairs of CBS domains can act as binding domains for adenosine derivatives and may regulate the activity of attached enzymatic or other domains []. In some cases, CBS domains may act as sensors of cellular energy status by being activated by AMP and inhibited by ATP []. In chloride ion channels, the CBS domains have been implicated in intracellular targeting and trafficking, as well as in protein-protein interactions, but results vary with different channels: in the CLC-5 channel, the CBS domain was shown to be required for trafficking [], while in the CLC-1 channel, the CBS domain was shown to be critical for channel function, but not necessary for trafficking []. Recent experiments revealing that CBS domains can bind adenosine-containing ligands such ATP, AMP, or S-adenosylmethionine have led to the hypothesis that CBS domains function as sensors of intracellular metabolites [, ]. Crystallographic studies of CBS domains have shown that pairs of CBS sequences form a globular domain where each CBS unit adopts a beta-alpha-beta-beta-alpha pattern []. Crystal structure of the CBS domains of the AMP-activated protein kinase in complexes with AMP and ATP shows that the phosphate groups of AMP/ATP lie in a surface pocket at the interface of two CBS domains, which is lined with basic residues, many of which are associated with disease-causing mutations [].  
In humans, mutations in conserved residues within CBS domains cause a variety of human hereditary diseases, including (with the gene mutated in parentheses): homocystinuria (cystathionine beta-synthase); Wolff-Parkinson-White syndrome (gamma 2 subunit of AMP-activated protein kinase); retinitis pigmentosa (IMP dehydrogenase-1); congenital myotonia, idiopathic generalized epilepsy, hypercalciuric nephrolithiasis, and classic Bartter syndrome (CLC chloride channel family members).; GO: 0005515 protein binding; PDB: 3JTF_A 3TE5_C 3TDH_C 3T4N_C 2QLV_C 3OI8_A 3LV9_A 2QH1_B 1PVM_B 3LQN_A ....
Probab=58.68  E-value=9  Score=24.75  Aligned_cols=22  Identities=23%  Similarity=0.483  Sum_probs=18.7

Q ss_pred             CCCcccceecCCCeEEEEEccc
Q 021321          254 SGNSGGPLMNSFGHVIGVNTAT  275 (314)
Q Consensus       254 ~G~SGGPl~n~~G~vvGI~s~~  275 (314)
                      .+.+.-|++|.+|+++|+++..
T Consensus        28 ~~~~~~~V~d~~~~~~G~is~~   49 (57)
T PF00571_consen   28 NGISRLPVVDEDGKLVGIISRS   49 (57)
T ss_dssp             HTSSEEEEESTTSBEEEEEEHH
T ss_pred             cCCcEEEEEecCCEEEEEEEHH
Confidence            4678899999999999999853


No 42 
>PRK14864 putative biofilm stress and motility protein A; Provisional
Probab=58.02  E-value=66  Score=24.45  Aligned_cols=10  Identities=0%  Similarity=0.019  Sum_probs=5.8

Q ss_pred             ccceEEEEEE
Q 021321          118 VEGTGSGFVW  127 (314)
Q Consensus       118 ~~~~GsGfiI  127 (314)
                      ....||..|.
T Consensus        93 ~~~~atA~iY  102 (104)
T PRK14864         93 GQWYSQAILY  102 (104)
T ss_pred             CeEEEEEEEe
Confidence            3456666665


No 43 
>COG0298 HypC Hydrogenase maturation factor [Posttranslational modification, protein turnover, chaperones]
Probab=51.85  E-value=38  Score=24.33  Aligned_cols=47  Identities=23%  Similarity=0.330  Sum_probs=30.9

Q ss_pred             EEEEEEEeCCCCcEEEEEEeeCCCccceeecCCCCCCCCCCEEEE-EEcC
Q 021321          167 REGKMVGCDPAYDLAVLKVDVEGFELKPVVLGTSHDLRVGQSCFA-IGNP  215 (314)
Q Consensus       167 ~~a~v~~~d~~~DlAlL~v~~~~~~~~~~~l~~~~~~~~G~~v~~-iG~p  215 (314)
                      .+++++..+.++++|++.+-.-...+ .+.|-. ..++.|++|.+ +||-
T Consensus         5 iPgqI~~I~~~~~~A~Vd~gGvkreV-~l~Lv~-~~v~~GdyVLVHvGfA   52 (82)
T COG0298           5 IPGQIVEIDDNNHLAIVDVGGVKREV-NLDLVG-EEVKVGDYVLVHVGFA   52 (82)
T ss_pred             cccEEEEEeCCCceEEEEeccEeEEE-Eeeeec-CccccCCEEEEEeeEE
Confidence            46788888888889999885422122 222312 37899999876 6774


No 44 
>PRK15396 murein lipoprotein; Provisional
Probab=46.88  E-value=20  Score=25.70  Aligned_cols=21  Identities=33%  Similarity=0.290  Sum_probs=13.0

Q ss_pred             hhhHHHHHHHHHHHHhhhcCC
Q 021321           32 SSIGFGSSVILSSFLVNFCSP   52 (314)
Q Consensus        32 ~~~~~~~~~~~~~~~~~~~~~   52 (314)
                      +..+++.+++++++++++|+.
T Consensus         3 ~~kl~l~av~ls~~LLaGCAs   23 (78)
T PRK15396          3 RTKLVLGAVILGSTLLAGCSS   23 (78)
T ss_pred             hhHHHHHHHHHHHHHHHHcCC
Confidence            334555555665667799973


No 45 
>PF02743 Cache_1:  Cache domain;  InterPro: IPR004010 Cache is an extracellular domain that is predicted to have a role in small-molecule recognition in a wide range of proteins, including the animal dihydropyridine-sensitive voltage-gated Ca2+ channel; alpha-2delta subunit, and various bacterial chemotaxis receptors. The name Cache comes from CAlcium channels and CHEmotaxis receptors. This domain consists of an N-terminal part with three predicted strands and an alpha-helix, and a C-terminal part with a strand dyad followed by a relatively unstructured region. The N-terminal portion of the (unpermuted) Cache domain contains three predicted strands that could form a sheet analogous to that present in the core of the PAS domain structure. Cache domains are particularly widespread in bacteria, with Vibrio cholerae. The animal calcium channel alpha-2delta subunits might have acquired a part of their extracellular domains from a bacterial source []. The Cache domain appears to have arisen from the GAF-PAS fold despite their divergent functions [].; GO: 0016020 membrane; PDB: 3C8C_A 3LIB_D 3LIA_A 3LI8_A 3LI9_A.
Probab=43.56  E-value=27  Score=24.53  Aligned_cols=30  Identities=23%  Similarity=0.472  Sum_probs=21.9

Q ss_pred             cceecCCCeEEEEEcccccCCCCCCccceEEEEehHHHHHHHHHH
Q 021321          259 GPLMNSFGHVIGVNTATFTRKGTGLSSGVNFAIPIDTVVRTVPYL  303 (314)
Q Consensus       259 GPl~n~~G~vvGI~s~~~~~~~~~~~~~~~~aipi~~i~~~l~~l  303 (314)
                      -|+++.+|+++|++..               .+.++.+.++++++
T Consensus        19 ~pi~~~~g~~~Gvv~~---------------di~l~~l~~~i~~~   48 (81)
T PF02743_consen   19 VPIYDDDGKIIGVVGI---------------DISLDQLSEIISNI   48 (81)
T ss_dssp             EEEEETTTEEEEEEEE---------------EEEHHHHHHHHTTS
T ss_pred             EEEECCCCCEEEEEEE---------------EeccceeeeEEEee
Confidence            5788889999998864               35667777766664


No 46 
>PF05578 Peptidase_S31:  Pestivirus NS3 polyprotein peptidase S31;  InterPro: IPR000280 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:  Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases.   In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.  Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes []. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. 
Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence []. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases []. Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base []. The geometric orientations of the catalytic residues are similar between families, despite different protein folds []. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) [, ]. This group of serine peptidases belongs to MEROPS peptidase family S31 (clan PA(S)). The type example is pestivirus NS3 polyprotein peptidase from bovine viral diarrhea virus, which is a Type 1 pestivirus. The pestiviruses are single-stranded RNA viruses whose genomes encode one large polyprotein []. The p80 endopeptidase resides towards the middle of the polyprotein and is responsible for processing all non-structural pestivirus proteins [, ]. The p80 enzyme is similar to other proteases in the PA(S) clan and is predicted to have a fold similar to that of chymotrypsin [, ]. An HDS catalytic triad has been identified [].; GO: 0004252 serine-type endopeptidase activity, 0006508 proteolysis
Probab=42.84  E-value=92  Score=25.38  Aligned_cols=128  Identities=23%  Similarity=0.285  Sum_probs=65.8

Q ss_pred             cceEEEEEEcCCCEEEeccccccCCCcCCCCcceEEEEEecCCCCeEEEEEEEEEeCCC--CcEEEEEEeeCCCccceee
Q 021321          119 EGTGSGFVWDKFGHIVTNYHVVAKLATDTSGLHRCKVSLFDAKGNGFYREGKMVGCDPA--YDLAVLKVDVEGFELKPVV  196 (314)
Q Consensus       119 ~~~GsGfiI~~~g~VLT~aHvv~~~~~~~~~~~~~~v~~~~~~g~~~~~~a~v~~~d~~--~DlAlL~v~~~~~~~~~~~  196 (314)
                      ++.-+|+-+...|-|-.--||..+.          .+.+-|.-|+.     +++..+.+  .|=.-         + -++
T Consensus        50 rgletgwaythqggissvdhvt~gk----------d~lvcdsmgrt-----rvvcqsnnk~tde~e---------y-gvk  104 (211)
T PF05578_consen   50 RGLETGWAYTHQGGISSVDHVTAGK----------DLLVCDSMGRT-----RVVCQSNNKMTDETE---------Y-GVK  104 (211)
T ss_pred             hcccccceeeccCCcccceeeecCC----------ceEEecCCCce-----EEEEccCCcccchhh---------c-ccc
Confidence            3466788887778787778887642          12233333332     23222211  11100         0 111


Q ss_pred             cCCCCCCCCCCEEEEEEcCCCCCCCeEeeEEecccccccC-----CCCccccceEEEeeccCCCCcccceecC-CCeEEE
Q 021321          197 LGTSHDLRVGQSCFAIGNPYGFEDTLTTGVVSGLGREIPS-----PNGRAIRGAIQTDAAINSGNSGGPLMNS-FGHVIG  270 (314)
Q Consensus       197 l~~~~~~~~G~~v~~iG~p~~~~~~~~~G~vs~~~~~~~~-----~~~~~~~~~i~~~~~~~~G~SGGPl~n~-~G~vvG  270 (314)
                        .......|..+|++ +|.....+.+.|.+-.+.+.-..     ..+.    --.+|..-..|.||=|+|.. .|++||
T Consensus       105 --tdsgcp~garcyv~-npea~nisgtkga~vhlqk~ggef~cvta~gt----paf~~~knlkg~s~~pifeassgr~vg  177 (211)
T PF05578_consen  105 --TDSGCPDGARCYVL-NPEATNISGTKGAMVHLQKTGGEFTCVTASGT----PAFFDLKNLKGWSGLPIFEASSGRVVG  177 (211)
T ss_pred             --cCCCCCCCcEEEEe-CCcccccccCcceEEEEeccCCceEEEeccCC----cceeeccccCCCCCCceeeccCCcEEE
Confidence              12335667888888 66554444444444333221000     0000    01223334579999999965 899999


Q ss_pred             EEcccccC
Q 021321          271 VNTATFTR  278 (314)
Q Consensus       271 I~s~~~~~  278 (314)
                      =+-.+.++
T Consensus       178 r~k~gkn~  185 (211)
T PF05578_consen  178 RVKVGKNE  185 (211)
T ss_pred             EEEecCCC
Confidence            88766554


No 47 
>PF14827 Cache_3:  Sensory domain of two-component sensor kinase; PDB: 1OJG_A 3BY8_A 1P0Z_I 2V9A_A 2J80_B.
Probab=39.37  E-value=21  Score=27.36  Aligned_cols=19  Identities=37%  Similarity=0.482  Sum_probs=13.8

Q ss_pred             ccceecCCCeEEEEEcccc
Q 021321          258 GGPLMNSFGHVIGVNTATF  276 (314)
Q Consensus       258 GGPl~n~~G~vvGI~s~~~  276 (314)
                      -.|++|.+|++||+++.+.
T Consensus        93 ~~PV~d~~g~viG~V~VG~  111 (116)
T PF14827_consen   93 FAPVYDSDGKVIGVVSVGV  111 (116)
T ss_dssp             EEEEE-TTS-EEEEEEEEE
T ss_pred             EEeeECCCCcEEEEEEEEE
Confidence            3688888999999998654


No 48 
>PF01732 DUF31:  Putative peptidase (DUF31);  InterPro: IPR022382  This domain has no known function. It is found in various hypothetical proteins and putative lipoproteins from mycoplasmas. 
Probab=34.89  E-value=28  Score=32.80  Aligned_cols=24  Identities=29%  Similarity=0.476  Sum_probs=19.2

Q ss_pred             cceEEEEEEcC----CC------EEEeccccccC
Q 021321          119 EGTGSGFVWDK----FG------HIVTNYHVVAK  142 (314)
Q Consensus       119 ~~~GsGfiI~~----~g------~VLT~aHvv~~  142 (314)
                      ...|||+|++-    ++      ++.||.||+..
T Consensus        35 ~~~GT~WIlDy~~~~~~~~p~k~y~ATNlHVa~~   68 (374)
T PF01732_consen   35 SVSGTGWILDYKKPEDNKYPTKWYFATNLHVASN   68 (374)
T ss_pred             cCcceEEEEEEeccCCCCCCeEEEEEechhhhcc
Confidence            46899999971    22      89999999983


No 49 
>COG3065 Slp Starvation-inducible outer membrane lipoprotein [Cell envelope biogenesis, outer membrane]
Probab=33.83  E-value=2.1e+02  Score=24.03  Aligned_cols=11  Identities=27%  Similarity=0.570  Sum_probs=6.7

Q ss_pred             HHHhhhcCCCC
Q 021321           44 SFLVNFCSPSS   54 (314)
Q Consensus        44 ~~~~~~~~~~~   54 (314)
                      +|++++|...+
T Consensus        17 aflLsgC~tiP   27 (191)
T COG3065          17 AFLLSGCVTIP   27 (191)
T ss_pred             HHHHhhcccCC
Confidence            45568887433


No 50 
>cd04627 CBS_pair_14 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria.  The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair.  The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here.  It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic gener
Probab=33.28  E-value=28  Score=26.20  Aligned_cols=22  Identities=27%  Similarity=0.407  Sum_probs=18.0

Q ss_pred             CCCcccceecCCCeEEEEEccc
Q 021321          254 SGNSGGPLMNSFGHVIGVNTAT  275 (314)
Q Consensus       254 ~G~SGGPl~n~~G~vvGI~s~~  275 (314)
                      .+.+.=|++|.+|+++|+++..
T Consensus        97 ~~~~~lpVvd~~~~~vGiit~~  118 (123)
T cd04627          97 EGISSVAVVDNQGNLIGNISVT  118 (123)
T ss_pred             cCCceEEEECCCCcEEEEEeHH
Confidence            4556679999889999999875


No 51 
>cd04618 CBS_pair_5 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria.  The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair.  The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here.  It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic genera
Probab=30.42  E-value=82  Score=22.84  Aligned_cols=50  Identities=18%  Similarity=0.094  Sum_probs=31.4

Q ss_pred             CCCcccceecCC-CeEEEEEcccccCCCCCCccceEEEEehHHHHHHHHHHHHcC
Q 021321          254 SGNSGGPLMNSF-GHVIGVNTATFTRKGTGLSSGVNFAIPIDTVVRTVPYLIVYG  307 (314)
Q Consensus       254 ~G~SGGPl~n~~-G~vvGI~s~~~~~~~~~~~~~~~~aipi~~i~~~l~~l~~~~  307 (314)
                      .+.++-|++|.+ |+++|+++...-...    ......-|-..+.+.++.+.+++
T Consensus        22 ~~~~~~~Vvd~~~~~~~Givt~~Dl~~~----~~~~~v~~~~~l~~a~~~m~~~~   72 (98)
T cd04618          22 NGIRSAPLWDSRKQQFVGMLTITDFILI----LRLVSIHPERSLFDAALLLLKNK   72 (98)
T ss_pred             cCCceEEEEeCCCCEEEEEEEHHHHhhh----eeeEEeCCCCcHHHHHHHHHHCC
Confidence            456788999874 899999996422110    00233445556777777776654


No 52 
>PRK10672 rare lipoprotein A; Provisional
Probab=29.63  E-value=2.6e+02  Score=26.25  Aligned_cols=29  Identities=24%  Similarity=0.155  Sum_probs=18.0

Q ss_pred             ccccCCcccceEEEEEEcCCCEEEecccccc
Q 021321          111 VDGEYAKVEGTGSGFVWDKFGHIVTNYHVVA  141 (314)
Q Consensus       111 ~~~~~~~~~~~GsGfiI~~~g~VLT~aHvv~  141 (314)
                      |.+.....+...+|-.++.  +-+|+||-.-
T Consensus        85 wYg~~f~G~~TA~Ge~~~~--~~~tAAH~tL  113 (361)
T PRK10672         85 IYDAEAGSNLTASGERFDP--NALTAAHPTL  113 (361)
T ss_pred             EeCCccCCCcCcCceeecC--CcCeeeccCC
Confidence            3333334455667777764  5799999654


No 53 
>COG3290 CitA Signal transduction histidine kinase regulating citrate/malate metabolism [Signal transduction mechanisms]
Probab=29.49  E-value=62  Score=31.92  Aligned_cols=18  Identities=28%  Similarity=0.551  Sum_probs=15.7

Q ss_pred             cceecCCCeEEEEEcccc
Q 021321          259 GPLMNSFGHVIGVNTATF  276 (314)
Q Consensus       259 GPl~n~~G~vvGI~s~~~  276 (314)
                      .|+||.+|++||+++-++
T Consensus       143 ~PI~d~~g~~IGvVsVG~  160 (537)
T COG3290         143 VPIFDEDGKQIGVVSVGY  160 (537)
T ss_pred             cceECCCCCEEEEEEEee
Confidence            599999999999998764


No 54 
>cd04603 CBS_pair_KefB_assoc This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domains associated with the KefB (Kef-type K+ transport systems) domain which is involved in inorganic ion transport and metabolism. CBS is a small domain originally identified in cystathionine beta-synthase and subsequently found in a wide range of different proteins. CBS domains usually come in tandem repeats, which associate to form a so-called Bateman domain or a CBS pair which is reflected in this model. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains.  It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown.
Probab=29.16  E-value=40  Score=24.88  Aligned_cols=22  Identities=9%  Similarity=0.146  Sum_probs=17.2

Q ss_pred             CCCcccceecCCCeEEEEEccc
Q 021321          254 SGNSGGPLMNSFGHVIGVNTAT  275 (314)
Q Consensus       254 ~G~SGGPl~n~~G~vvGI~s~~  275 (314)
                      .+.+--|++|.+|+++|+++..
T Consensus        85 ~~~~~lpVvd~~~~~~Giit~~  106 (111)
T cd04603          85 TEPPVVAVVDKEGKLVGTIYER  106 (111)
T ss_pred             cCCCeEEEEcCCCeEEEEEEhH
Confidence            3455569999889999999864


No 55 
>PF10049 DUF2283:  Protein of unknown function (DUF2283);  InterPro: IPR019270  Members of this family of hypothetical proteins have no known function. 
Probab=28.76  E-value=38  Score=21.84  Aligned_cols=12  Identities=17%  Similarity=0.559  Sum_probs=7.7

Q ss_pred             cCCCeEEEEEcc
Q 021321          263 NSFGHVIGVNTA  274 (314)
Q Consensus       263 n~~G~vvGI~s~  274 (314)
                      |.+|++|||-..
T Consensus        36 d~~G~ivGIEIl   47 (50)
T PF10049_consen   36 DEDGRIVGIEIL   47 (50)
T ss_pred             CCCCCEEEEEEE
Confidence            456777777543


No 56 
>cd04620 CBS_pair_7 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria.  The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair.  The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here.  It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic genera
Probab=28.51  E-value=39  Score=24.93  Aligned_cols=21  Identities=29%  Similarity=0.431  Sum_probs=17.1

Q ss_pred             CCcccceecCCCeEEEEEccc
Q 021321          255 GNSGGPLMNSFGHVIGVNTAT  275 (314)
Q Consensus       255 G~SGGPl~n~~G~vvGI~s~~  275 (314)
                      +...-|++|.+|+++|+++..
T Consensus        90 ~~~~~pVvd~~~~~~Gvit~~  110 (115)
T cd04620          90 QIRHLPVLDDQGQLIGLVTAE  110 (115)
T ss_pred             CCceEEEEcCCCCEEEEEEhH
Confidence            445679999889999999864


No 57 
>PF07172 GRP:  Glycine rich protein family;  InterPro: IPR010800 This family consists of glycine rich proteins. Some of them may be involved in resistance to environmental stress.
Probab=26.98  E-value=47  Score=24.80  Aligned_cols=9  Identities=22%  Similarity=0.560  Sum_probs=3.3

Q ss_pred             HHHHHHHHH
Q 021321           38 SSVILSSFL   46 (314)
Q Consensus        38 ~~~~~~~~~   46 (314)
                      ++++|+++|
T Consensus         9 L~l~LA~lL   17 (95)
T PF07172_consen    9 LGLLLAALL   17 (95)
T ss_pred             HHHHHHHHH
Confidence            333333333


No 58 
>cd04597 CBS_pair_DRTGG_assoc2 This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domains associated with a DRTGG domain upstream. The function of the DRTGG domain, named after its conserved residues, is unknown. CBS is a small domain originally identified in cystathionine beta-synthase and subsequently found in a wide range of different proteins. CBS domains usually come in tandem repeats, which associate to form a so-called Bateman domain or a CBS pair which is reflected in this model. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown.
Probab=25.69  E-value=58  Score=24.40  Aligned_cols=22  Identities=18%  Similarity=0.176  Sum_probs=18.3

Q ss_pred             CCCcccceecCCCeEEEEEccc
Q 021321          254 SGNSGGPLMNSFGHVIGVNTAT  275 (314)
Q Consensus       254 ~G~SGGPl~n~~G~vvGI~s~~  275 (314)
                      .+...-|++|.+|+++|+++..
T Consensus        87 ~~~~~lpVvd~~~~l~Givt~~  108 (113)
T cd04597          87 HNIRTLPVVDDDGTPAGIITLL  108 (113)
T ss_pred             cCCCEEEEECCCCeEEEEEEHH
Confidence            4667789999899999999864


No 59 
>cd04643 CBS_pair_30 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria.  The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair.  The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here.  It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic gener
Probab=25.52  E-value=49  Score=24.29  Aligned_cols=17  Identities=29%  Similarity=0.430  Sum_probs=14.9

Q ss_pred             cceecCCCeEEEEEccc
Q 021321          259 GPLMNSFGHVIGVNTAT  275 (314)
Q Consensus       259 GPl~n~~G~vvGI~s~~  275 (314)
                      -|++|.+|+++|+++..
T Consensus        95 ~~Vv~~~~~~~Gvit~~  111 (116)
T cd04643          95 LPVVDDDGIFIGIITRR  111 (116)
T ss_pred             eeEEeCCCeEEEEEEHH
Confidence            68999889999999874


No 60 
>PF08669 GCV_T_C:  Glycine cleavage T-protein C-terminal barrel domain;  InterPro: IPR013977  This entry shows glycine cleavage T-proteins, part of the glycine cleavage multienzyme complex (GCV) found in bacteria and the mitochondria of eukaryotes. GCV catalyses the catabolism of glycine in eukaryotes. The T-protein is an aminomethyl transferase. ; PDB: 3ADA_A 1VRQ_A 1X31_A 3AD9_A 3AD8_A 3AD7_A 3GIR_A 1WOO_A 1WOS_A 1WOR_A ....
Probab=25.50  E-value=78  Score=23.01  Aligned_cols=23  Identities=22%  Similarity=0.329  Sum_probs=18.9

Q ss_pred             CcccceecCCCeEEEEEcccccC
Q 021321          256 NSGGPLMNSFGHVIGVNTATFTR  278 (314)
Q Consensus       256 ~SGGPl~n~~G~vvGI~s~~~~~  278 (314)
                      ..|.|+++.+|+.||.+++....
T Consensus        34 ~~g~~v~~~~g~~vG~vTS~~~s   56 (95)
T PF08669_consen   34 RGGEPVYDEDGKPVGRVTSGAYS   56 (95)
T ss_dssp             STTCEEEETTTEEEEEEEEEEEE
T ss_pred             CCCCEEEECCCcEEeEEEEEeEC
Confidence            45789998799999999988553


No 61 
>cd01739 LSm11_C The eukaryotic Sm and Sm-like (LSm) proteins associate with RNA to form the core domain of the ribonucleoprotein particles involved in a variety of RNA processing events including pre-mRNA splicing, telomere replication, and mRNA degradation.  Members of this family share a highly conserved Sm fold containing an N-terminal helix followed by a strongly bent five-stranded antiparallel beta-sheet. LSm11 is an SmD2 - like subunit which binds U7 snRNA along with LSm10 and five other Sm subunits to form a 7-member ring structure. LSm11 and the U7 snRNP of which it is a part are thought to play an important role in histone mRNA 3' processing.
Probab=25.31  E-value=1.2e+02  Score=20.93  Aligned_cols=39  Identities=26%  Similarity=0.284  Sum_probs=29.9

Q ss_pred             CcceEEEEEecCCCCeEEEEEEEEEeCCCCcEEEEEEee
Q 021321          149 GLHRCKVSLFDAKGNGFYREGKMVGCDPAYDLAVLKVDV  187 (314)
Q Consensus       149 ~~~~~~v~~~~~~g~~~~~~a~v~~~d~~~DlAlL~v~~  187 (314)
                      ....++|.+...+|-.-...+.++++|...+++|.-++.
T Consensus         7 er~RVrV~iR~~~gvrG~~~G~lvAFDK~wNm~L~DV~E   45 (66)
T cd01739           7 ERIRVRVHIRTFKGLRGVCSGFLVAFDKFWNMALVDVDE   45 (66)
T ss_pred             CCcEEEEEEecccCcccEEEEEEEeeeeehhheehhhhh
Confidence            345677777665555557889999999999999988864


No 62 
>cd04592 CBS_pair_EriC_assoc_euk This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domains in the EriC CIC-type chloride channels in eukaryotes. These ion channels are proteins with a seemingly simple task of allowing the passive flow of chloride ions across biological membranes. CIC-type chloride channels come from all kingdoms of life, have several gene families, and can be gated by voltage. The members of the CIC-type chloride channel are double-barreled: two proteins forming homodimers at a broad interface formed by four helices from each protein. The two pores are not found at this interface, but are completely contained within each subunit, as deduced from the mutational analyses, unlike many other channels, in which four or five identical or structurally related subunits jointly form one pore. CBS is a small domain originally identified in cystathionine beta-synthase and subsequently found in a wide range of different proteins. CBS domains usually 
Probab=23.82  E-value=65  Score=25.15  Aligned_cols=22  Identities=18%  Similarity=0.058  Sum_probs=18.2

Q ss_pred             CCCcccceecCCCeEEEEEccc
Q 021321          254 SGNSGGPLMNSFGHVIGVNTAT  275 (314)
Q Consensus       254 ~G~SGGPl~n~~G~vvGI~s~~  275 (314)
                      .+.++-|++|.+|+++|+++..
T Consensus        22 ~~~~~~~VvD~~g~l~Givt~~   43 (133)
T cd04592          22 EKQSCVLVVDSDDFLEGILTLG   43 (133)
T ss_pred             cCCCEEEEECCCCeEEEEEEHH
Confidence            3557889999999999999954


No 63 
>cd04641 CBS_pair_28 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria.  The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair.  The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here.  It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic gener
Probab=23.55  E-value=66  Score=23.96  Aligned_cols=22  Identities=23%  Similarity=0.362  Sum_probs=18.2

Q ss_pred             CCCCcccceecCCCeEEEEEcc
Q 021321          253 NSGNSGGPLMNSFGHVIGVNTA  274 (314)
Q Consensus       253 ~~G~SGGPl~n~~G~vvGI~s~  274 (314)
                      ..+.+.-|++|.+|+++|+++.
T Consensus        21 ~~~~~~~pVv~~~~~~~Giv~~   42 (120)
T cd04641          21 ERRVSALPIVDENGKVVDVYSR   42 (120)
T ss_pred             HcCCCeeeEECCCCeEEEEEeH
Confidence            3466788999989999999874


No 64 
>cd04619 CBS_pair_6 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria.  The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair.  The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here.  It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic genera
Probab=23.23  E-value=57  Score=24.09  Aligned_cols=22  Identities=23%  Similarity=0.327  Sum_probs=17.4

Q ss_pred             CCCcccceecCCCeEEEEEccc
Q 021321          254 SGNSGGPLMNSFGHVIGVNTAT  275 (314)
Q Consensus       254 ~G~SGGPl~n~~G~vvGI~s~~  275 (314)
                      .+...=|++|.+|+++|+++..
T Consensus        88 ~~~~~lpVvd~~~~~~Gvi~~~  109 (114)
T cd04619          88 RGLKNIPVVDENARPLGVLNAR  109 (114)
T ss_pred             cCCCeEEEECCCCcEEEEEEhH
Confidence            3555678998889999999864


No 65 
>PRK14864 putative biofilm stress and motility protein A; Provisional
Probab=22.04  E-value=65  Score=24.51  Aligned_cols=9  Identities=11%  Similarity=0.250  Sum_probs=3.9

Q ss_pred             CcceEEEEE
Q 021321          149 GLHRCKVSL  157 (314)
Q Consensus       149 ~~~~~~v~~  157 (314)
                      |+..++|.-
T Consensus        77 GA~yYrIi~   85 (104)
T PRK14864         77 GADYYVIVM   85 (104)
T ss_pred             CCCEEEEEE
Confidence            444444443


No 66 
>cd04602 CBS_pair_IMPDH_2 This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domains in the inosine 5' monophosphate dehydrogenase (IMPDH) protein.  IMPDH is an essential enzyme that catalyzes the first step unique to GTP synthesis, playing a key role in the regulation of cell proliferation and differentiation. CBS is a small domain originally identified in cystathionine beta-synthase and subsequently found in a wide range of different proteins. CBS domains usually come in tandem repeats, which associate to form a so-called Bateman domain or a CBS pair which is reflected in this model. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Mutations of conserved residues within this domain in IMPDH have been associated with retinitis pigmentos
Probab=21.76  E-value=68  Score=23.63  Aligned_cols=22  Identities=23%  Similarity=0.365  Sum_probs=17.4

Q ss_pred             CCCcccceecCCCeEEEEEccc
Q 021321          254 SGNSGGPLMNSFGHVIGVNTAT  275 (314)
Q Consensus       254 ~G~SGGPl~n~~G~vvGI~s~~  275 (314)
                      .+...-|++|.+|+++|+++..
T Consensus        88 ~~~~~~pVv~~~~~~~Gvit~~  109 (114)
T cd04602          88 SKKGKLPIVNDDGELVALVTRS  109 (114)
T ss_pred             cCCCceeEECCCCeEEEEEEHH
Confidence            3445679998889999999864


No 67 
>cd04614 CBS_pair_1 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria.  The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair.  The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here.  It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic genera
Probab=21.74  E-value=74  Score=22.87  Aligned_cols=50  Identities=18%  Similarity=0.074  Sum_probs=31.2

Q ss_pred             CCCcccceecCCCeEEEEEcccccCCCCCCccceEEEEehHHHHHHHHHHHHcC
Q 021321          254 SGNSGGPLMNSFGHVIGVNTATFTRKGTGLSSGVNFAIPIDTVVRTVPYLIVYG  307 (314)
Q Consensus       254 ~G~SGGPl~n~~G~vvGI~s~~~~~~~~~~~~~~~~aipi~~i~~~l~~l~~~~  307 (314)
                      .+.++-|++|.+|+++|+++...-...    ....+.-+-+.+.+.++.+.+++
T Consensus        22 ~~~~~~~V~d~~~~~~Giv~~~dl~~~----~~~~~v~~~~~l~~a~~~m~~~~   71 (96)
T cd04614          22 ANVKALPVLDDDGKLSGIITERDLIAK----SEVVTATKRTTVSECAQKMKRNR   71 (96)
T ss_pred             cCCCeEEEECCCCCEEEEEEHHHHhcC----CCcEEecCCCCHHHHHHHHHHhC
Confidence            466788999989999999986532110    11333344455666776666554


No 68 
>cd04607 CBS_pair_NTP_transferase_assoc This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domain associated with the NTP (Nucleotidyl transferase) domain downstream.  CBS is a small domain originally identified in cystathionine beta-synthase and subsequently found in a wide range of different proteins. CBS domains usually come in tandem repeats, which associate to form a so-called Bateman domain or a CBS pair which is reflected in this model. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains.  It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown.
Probab=21.60  E-value=64  Score=23.62  Aligned_cols=22  Identities=23%  Similarity=0.398  Sum_probs=17.5

Q ss_pred             CCCcccceecCCCeEEEEEccc
Q 021321          254 SGNSGGPLMNSFGHVIGVNTAT  275 (314)
Q Consensus       254 ~G~SGGPl~n~~G~vvGI~s~~  275 (314)
                      .+...-|++|.+|+++|+++..
T Consensus        87 ~~~~~~~Vv~~~~~~~Gvit~~  108 (113)
T cd04607          87 RSIRHLPILDEEGRVVGLATLD  108 (113)
T ss_pred             CCCCEEEEECCCCCEEEEEEhH
Confidence            3455678998889999999864


No 69 
>COG3448 CBS-domain-containing membrane protein [Signal transduction mechanisms]
Probab=21.29  E-value=61  Score=29.60  Aligned_cols=22  Identities=23%  Similarity=0.551  Sum_probs=17.6

Q ss_pred             CCCcccceecCCCeEEEEEccc
Q 021321          254 SGNSGGPLMNSFGHVIGVNTAT  275 (314)
Q Consensus       254 ~G~SGGPl~n~~G~vvGI~s~~  275 (314)
                      .|.--=|++|.+|+++||++..
T Consensus       344 ~g~H~lpvld~~g~lvGIvsQt  365 (382)
T COG3448         344 EGLHALPVLDAAGKLVGIVSQT  365 (382)
T ss_pred             CCcceeeEEcCCCcEEEEeeHH
Confidence            3444569999999999999864


No 70 
>cd04582 CBS_pair_ABC_OpuCA_assoc This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domains in association with the ABC transporter OpuCA. OpuCA is the ATP binding component of a bacterial solute transporter that serves a protective role to cells growing in a hyperosmolar environment but the function of the CBS domains in OpuCA remains unknown.  In the related ABC transporter, OpuA, the tandem CBS domains have been shown to function as sensors for ionic strength, whereby they control the transport activity through an electronic switching mechanism. ABC transporters are a large family of proteins involved in the transport of a wide variety of different compounds, like sugars, ions, peptides, and more complex organic molecules. They are a subset of nucleotide hydrolases that contain a signature motif, Q-loop, and H-loop/switch region, in addition to the Walker A motif/P-loop and Walker B motif commonly found in a number of ATP- and GTP-binding and hydrolyzi
Probab=20.97  E-value=67  Score=23.10  Aligned_cols=22  Identities=23%  Similarity=0.167  Sum_probs=17.2

Q ss_pred             CCCcccceecCCCeEEEEEccc
Q 021321          254 SGNSGGPLMNSFGHVIGVNTAT  275 (314)
Q Consensus       254 ~G~SGGPl~n~~G~vvGI~s~~  275 (314)
                      .+.+--|++|.+|+++|+++..
T Consensus        80 ~~~~~~~Vv~~~~~~~Gvi~~~  101 (106)
T cd04582          80 HDMSWLPCVDEDGRYVGEVTQR  101 (106)
T ss_pred             CCCCeeeEECCCCcEEEEEEHH
Confidence            3445578999889999999864


No 71 
>COG5428 Uncharacterized conserved small protein [Function unknown]
Probab=20.85  E-value=72  Score=22.22  Aligned_cols=16  Identities=25%  Similarity=0.405  Sum_probs=13.2

Q ss_pred             cCCCeEEEEEcccccC
Q 021321          263 NSFGHVIGVNTATFTR  278 (314)
Q Consensus       263 n~~G~vvGI~s~~~~~  278 (314)
                      |.+|+|+||-.|....
T Consensus        37 de~GkV~GiEi~~As~   52 (69)
T COG5428          37 DENGKVIGIEIWNASA   52 (69)
T ss_pred             cCCCcEEEEEEEchhh
Confidence            5789999999997653


No 72 
>cd04583 CBS_pair_ABC_OpuCA_assoc2 This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domains in association with the ABC transporter OpuCA. OpuCA is the ATP binding component of a bacterial solute transporter that serves a protective role to cells growing in a hyperosmolar environment but the function of the CBS domains in OpuCA remains unknown.  In the related ABC transporter, OpuA, the tandem CBS domains have been shown to function as sensors for ionic strength, whereby they control the transport activity through an electronic switching mechanism. ABC transporters are a large family of proteins involved in the transport of a wide variety of different compounds, like sugars, ions, peptides, and more complex organic molecules. They are a subset of nucleotide hydrolases that contain a signature motif, Q-loop, and H-loop/switch region, in addition to the Walker A motif/P-loop and Walker B motif commonly found in a number of ATP- and GTP-binding and hydrolyz
Probab=20.67  E-value=74  Score=22.91  Aligned_cols=21  Identities=24%  Similarity=0.453  Sum_probs=16.9

Q ss_pred             CCcccceecCCCeEEEEEccc
Q 021321          255 GNSGGPLMNSFGHVIGVNTAT  275 (314)
Q Consensus       255 G~SGGPl~n~~G~vvGI~s~~  275 (314)
                      +...-|++|.+|+++|+++..
T Consensus        84 ~~~~~~vv~~~g~~~Gvit~~  104 (109)
T cd04583          84 GPKYVPVVDEDGKLVGLITRS  104 (109)
T ss_pred             CCceeeEECCCCeEEEEEehH
Confidence            445568999889999999864


No 73 
>cd04617 CBS_pair_4 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria.  The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair.  The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here.  It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic genera
Probab=20.62  E-value=69  Score=23.79  Aligned_cols=22  Identities=27%  Similarity=0.229  Sum_probs=16.5

Q ss_pred             CCCcccceecCC---CeEEEEEccc
Q 021321          254 SGNSGGPLMNSF---GHVIGVNTAT  275 (314)
Q Consensus       254 ~G~SGGPl~n~~---G~vvGI~s~~  275 (314)
                      .+..-=|++|.+   |+++|+++..
T Consensus        89 ~~~~~lpVvd~~~~~~~l~Gvit~~  113 (118)
T cd04617          89 HQVDSLPVVEKVDEGLEVIGRITKT  113 (118)
T ss_pred             cCCCEeeEEeCCCccceEEEEEEhh
Confidence            344557888876   7999999875


No 74 
>PF01455 HupF_HypC:  HupF/HypC family;  InterPro: IPR001109 The large subunit of [NiFe]-hydrogenase, as well as other nickel metalloenzymes, is synthesised as a precursor devoid of the metalloenzyme active site. This precursor then undergoes a complex post-translational maturation process that requires a number of accessory proteins. The hydrogenase expression/formation proteins (HupF/HypC) form a family of small proteins that are hydrogenase precursor-specific chaperones required for this maturation process. They are believed to keep the hydrogenase precursor in a conformation accessible for metal incorporation.; PDB: 3D3R_A 2Z1C_C 2OT2_A.
Probab=20.25  E-value=3.1e+02  Score=18.96  Aligned_cols=43  Identities=23%  Similarity=0.297  Sum_probs=28.0

Q ss_pred             EEEEEEEeCCCCcEEEEEEeeCCCccceeecCCCCCCCCCCEEEEE
Q 021321          167 REGKMVGCDPAYDLAVLKVDVEGFELKPVVLGTSHDLRVGQSCFAI  212 (314)
Q Consensus       167 ~~a~v~~~d~~~DlAlL~v~~~~~~~~~~~l~~~~~~~~G~~v~~i  212 (314)
                      ++++++..+.....|++....   ....+.+.--.++++||+|.+-
T Consensus         5 iP~~Vv~v~~~~~~A~v~~~G---~~~~V~~~lv~~v~~Gd~VLVH   47 (68)
T PF01455_consen    5 IPGRVVEVDEDGGMAVVDFGG---VRREVSLALVPDVKVGDYVLVH   47 (68)
T ss_dssp             EEEEEEEEETTTTEEEEEETT---EEEEEEGTTCTSB-TT-EEEEE
T ss_pred             ccEEEEEEeCCCCEEEEEcCC---cEEEEEEEEeCCCCCCCEEEEe
Confidence            567888887788999988753   2344444333458999998863


No 75 
>PRK10781 rcsF outer membrane lipoprotein; Reviewed
Probab=20.17  E-value=1.1e+02  Score=24.39  Aligned_cols=15  Identities=27%  Similarity=0.416  Sum_probs=8.2

Q ss_pred             HHHhhhcCCCCCCCC
Q 021321           44 SFLVNFCSPSSTLPS   58 (314)
Q Consensus        44 ~~~~~~~~~~~~~~~   58 (314)
                      .+++.+|......+.
T Consensus        10 ~L~LsGCS~l~~tp~   24 (133)
T PRK10781         10 ALMLTGCSMLSRSPV   24 (133)
T ss_pred             HHHHhhccccCcCCC
Confidence            344577875555433


No 76 
>PRK13835 conjugal transfer protein TrbH; Provisional
Probab=20.11  E-value=2.2e+02  Score=22.99  Aligned_cols=23  Identities=22%  Similarity=0.307  Sum_probs=16.7

Q ss_pred             cchHHHHHHHHhCCceEEEEeee
Q 021321           74 EEDRVVQLFQETSPSVVSIQDLE   96 (314)
Q Consensus        74 ~~~~~~~~~~~~~~svV~I~~~~   96 (314)
                      ..+-+.++.+.+.|+--+|...+
T Consensus        43 A~D~vsqLae~~pPa~tt~~l~q   65 (145)
T PRK13835         43 AGDMVSRLAEQIGPGTTTIKLKK   65 (145)
T ss_pred             HHHHHHHHHHhcCCCceEEEEee
Confidence            44566778999999987776543


Done!