Query         044785
Match_columns 346
No_of_seqs    253 out of 2728
Neff          8.0 
Searched_HMMs 46136
Date          Fri Mar 29 06:44:06 2013
Command       hhsearch -i /work/01045/syshi/csienesis_hhblits_a3m/044785.a3m -d /work/01045/syshi/HHdatabase/Cdd.hhm -o /work/01045/syshi/hhsearch_cdd/044785hhsearch_cdd -cpu 12 -v 0 

 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 PF13947 GUB_WAK_bind:  Wall-as 100.0 1.4E-28 2.9E-33  196.9  10.5  104   29-134     1-106 (106)
  2 PF08488 WAK:  Wall-associated   99.6 2.4E-15 5.2E-20  118.8   8.4   73  163-238     1-75  (103)
  3 PF07645 EGF_CA:  Calcium-bindi  98.6 5.7E-08 1.2E-12   64.0   3.4   37  288-325     1-37  (42)
  4 KOG1214 Nidogen and related ba  98.3 6.7E-07 1.5E-11   91.0   5.9   79  242-335   785-866 (1289)
  5 KOG1214 Nidogen and related ba  97.8 2.2E-05 4.8E-10   80.2   5.6   80  247-337   700-779 (1289)
  6 KOG1219 Uncharacterized conser  97.8 2.9E-05 6.3E-10   86.2   6.1   67  247-325  3909-3975(4289)
  7 KOG4289 Cadherin EGF LAG seven  97.5  0.0001 2.2E-09   79.2   4.7   53  266-325  1220-1272(2531)
  8 KOG1219 Uncharacterized conser  97.5 0.00023 4.9E-09   79.6   6.8   72  241-325  3865-3936(4289)
  9 smart00179 EGF_CA Calcium-bind  97.4 0.00022 4.7E-09   45.4   4.0   33  288-322     1-33  (39)
 10 KOG4260 Uncharacterized conser  97.4 9.2E-05   2E-09   67.0   2.1   52  269-324   218-270 (350)
 11 PF14670 FXa_inhibition:  Coagu  97.3 0.00015 3.2E-09   45.9   2.2   32  297-332     5-36  (36)
 12 PF12947 EGF_3:  EGF domain;  I  97.3 0.00015 3.2E-09   45.9   2.0   31  296-326     4-34  (36)
 13 PF12662 cEGF:  Complement Clr-  97.2 0.00029 6.2E-09   40.2   2.3   24  267-291     1-24  (24)
 14 cd00054 EGF_CA Calcium-binding  96.8  0.0017 3.7E-08   40.6   3.7   35  288-324     1-35  (38)
 15 PF00008 EGF:  EGF-like domain   96.8 0.00092   2E-08   41.1   2.2   28  297-324     3-31  (32)
 16 KOG4260 Uncharacterized conser  96.6  0.0014   3E-08   59.5   2.8   80  229-325   220-307 (350)
 17 cd01475 vWA_Matrilin VWA_Matri  96.2  0.0041 8.9E-08   56.1   3.7   38  285-325   183-220 (224)
 18 cd00053 EGF Epidermal growth f  96.0   0.011 2.5E-07   36.1   3.8   28  297-324     5-32  (36)
 19 PF12947 EGF_3:  EGF domain;  I  95.9   0.006 1.3E-07   38.5   2.2   30  247-282     6-35  (36)
 20 KOG4289 Cadherin EGF LAG seven  95.8   0.018 3.8E-07   62.9   6.4   60  247-320  1245-1308(2531)
 21 PF12662 cEGF:  Complement Clr-  95.8  0.0099 2.1E-07   33.9   2.5   22  312-335     1-22  (24)
 22 smart00181 EGF Epidermal growt  95.6   0.018   4E-07   35.5   3.6   26  298-324     6-31  (35)
 23 PF07645 EGF_CA:  Calcium-bindi  94.1   0.045 9.8E-07   35.6   2.4   32  242-279     4-36  (42)
 24 KOG1217 Fibrillins and related  93.9   0.095 2.1E-06   51.7   5.4   63  253-325   243-305 (487)
 25 PF00008 EGF:  EGF-like domain   93.2   0.044 9.5E-07   33.5   1.2   28  247-279     4-31  (32)
 26 cd00053 EGF Epidermal growth f  91.0    0.39 8.5E-06   28.9   3.7   27  247-279     6-32  (36)
 27 smart00179 EGF_CA Calcium-bind  89.7    0.58 1.2E-05   29.1   3.6   30  242-277     4-33  (39)
 28 KOG1217 Fibrillins and related  89.3    0.51 1.1E-05   46.5   4.7   54  267-325   151-204 (487)
 29 PF06247 Plasmod_Pvs28:  Plasmo  88.6     0.2 4.3E-06   43.5   1.0   62  267-334    19-87  (197)
 30 smart00181 EGF Epidermal growt  87.5    0.78 1.7E-05   27.9   3.1   25  248-279     7-31  (35)
 31 PF12946 EGF_MSP1_1:  MSP1 EGF   86.3    0.42   9E-06   30.2   1.3   28  298-325     5-33  (37)
 32 cd00054 EGF_CA Calcium-binding  86.0    0.98 2.1E-05   27.6   3.0   32  242-279     4-35  (38)
 33 PF07974 EGF_2:  EGF-like domai  80.8     2.9 6.3E-05   25.5   3.4   25  298-324     6-30  (32)
 34 PF12661 hEGF:  Human growth fa  76.1     1.3 2.9E-05   21.3   0.7   12  314-325     1-12  (13)
 35 PF14670 FXa_inhibition:  Coagu  75.4     1.9   4E-05   27.1   1.4   22  253-280    10-31  (36)
 36 KOG1225 Teneurin-1 and related  74.3     5.9 0.00013   40.2   5.4   24  298-325   316-339 (525)
 37 PF00954 S_locus_glycop:  S-loc  73.2     5.9 0.00013   31.3   4.2   41  230-278    68-108 (110)
 38 PF07172 GRP:  Glycine rich pro  68.0     5.1 0.00011   31.1   2.7   12    1-12      1-12  (95)
 39 KOG0994 Extracellular matrix g  65.1     8.8 0.00019   42.1   4.5   64  267-334   840-908 (1758)
 40 PHA03099 epidermal growth fact  63.6     7.3 0.00016   31.7   2.8   37  288-325    41-79  (139)
 41 PF00954 S_locus_glycop:  S-loc  61.1     9.1  0.0002   30.2   3.0   32  290-324    78-109 (110)
 42 KOG1225 Teneurin-1 and related  57.7      14  0.0003   37.6   4.3   22  299-325   344-365 (525)
 43 cd01475 vWA_Matrilin VWA_Matri  48.1      14  0.0003   33.0   2.3   22  253-280   199-220 (224)
 44 PF05887 Trypan_PARP:  Procycli  44.0     7.6 0.00016   31.9   0.0   18    1-18      1-18  (143)
 45 PF09064 Tme5_EGF_like:  Thromb  43.8      19  0.0004   22.3   1.7   13  313-325    18-30  (34)
 46 PF08261 Carcinustatin:  Carcin  42.1      13 0.00027   15.3   0.5    6   42-47      3-8   (8)
 47 PF14380 WAK_assoc:  Wall-assoc  41.8      40 0.00088   25.7   3.8   39  229-275    54-93  (94)
 48 PHA02887 EGF-like protein; Pro  40.9      23 0.00051   28.4   2.3   35  290-325    84-120 (126)
 49 KOG0994 Extracellular matrix g  40.3      29 0.00064   38.4   3.6   17  267-283   884-901 (1758)
 50 PF08685 GON:  GON domain;  Int  34.7 1.1E+02  0.0024   27.1   5.8   59   96-155   126-185 (201)
 51 COG2991 Uncharacterized protei  32.6      73  0.0016   23.3   3.5   16   34-52     31-46  (77)
 52 PHA02887 EGF-like protein; Pro  31.1      42 0.00091   27.0   2.3   35  241-280    84-120 (126)
 53 PF10916 DUF2712:  Protein of u  24.3      77  0.0017   26.4   2.8   16   38-53     32-47  (146)
 54 PRK02710 plastocyanin; Provisi  23.5      87  0.0019   25.0   3.0   17    1-17      1-17  (119)
 55 smart00051 DSL delta serrate l  22.1 1.3E+02  0.0027   21.4   3.2   46  267-324    16-61  (63)
 56 KOG1836 Extracellular matrix g  20.6      90  0.0019   36.6   3.3   52  267-325   755-810 (1705)

No 1  
>PF13947 GUB_WAK_bind:  Wall-associated receptor kinase galacturonan-binding
Probab=99.95  E-value=1.4e-28  Score=196.90  Aligned_cols=104  Identities=39%  Similarity=0.860  Sum_probs=91.0

Q ss_pred             CCCCCCCCCCccccCCCCCCCCCCCCCCCeEecCCCCCCCCeecccCCccEEEEEee-cCeEeEeeeeecccccC-Cccc
Q 044785           29 KPGCPSRCGDVEIPYPFGTRPGCFLNKYFVITCNKTHYNPPKPFLRKSNIEVVNITI-DGRMNVMQFVAKECYRK-GNSV  106 (346)
Q Consensus        29 ~~~C~~~CG~v~IpYPFgig~~C~~~~~F~l~C~~~~~~~p~l~l~~~~~~V~~is~-~~~~~v~~~~~~~c~~~-~~~~  106 (346)
                      +++||++||||+||||||||++|+++|+|+|+|+++ +++|+|++.+.+|+|++|+| +++++|..++.+.|+.. ....
T Consensus         1 ~~~C~~~CGnv~IpYPFgi~~~C~~~~~F~L~C~~~-~~~~~l~l~~~~~~V~~I~~~~~~i~v~~~~~~~~~~~~~~~~   79 (106)
T PF13947_consen    1 KPGCPSSCGNVSIPYPFGIGPGCGRDPGFELTCNNN-TSPPKLLLSSGNYEVLSISYENGTIRVSDPISSNCYSSSSSNS   79 (106)
T ss_pred             CCCCCCccCCEeecCCCccCCCCCCCCCcEEECCCC-CCCceeEecCCcEEEEEEecCCCEEEEEeccccceecCCCCcc
Confidence            478999999999999999999999999999999988 67899999889999999999 99999999999998876 2222


Q ss_pred             ccccceeCCCceEeccCCCEEEEEcccc
Q 044785          107 DSYSPTFSLSKFTVSNTENRFVVIGCDS  134 (346)
Q Consensus       107 ~~~~~~l~~~pf~~s~~~n~~~~~gC~~  134 (346)
                      ....|++.+ ||.+|+.+|+|+++||++
T Consensus        80 ~~~~~~~~~-~~~~s~~~N~~~~~GC~t  106 (106)
T PF13947_consen   80 SNSNLSLNG-PFFFSSSSNKFTVVGCNT  106 (106)
T ss_pred             cccEEeecC-CceEccCCcEEEEECCCC
Confidence            234566666 899998899999999985


No 2  
>PF08488 WAK:  Wall-associated kinase;  InterPro: IPR013695 This domain is found together with the eukaryotic protein kinase domain IPR000719 from INTERPRO in plant wall-associated kinases (WAKs) and related proteins. WAKs are serine-threonine kinases which might be involved in signalling to the cytoplasm and are required for cell expansion []. ; GO: 0004674 protein serine/threonine kinase activity, 0016021 integral to membrane
Probab=99.61  E-value=2.4e-15  Score=118.84  Aligned_cols=73  Identities=36%  Similarity=0.590  Sum_probs=57.8

Q ss_pred             CCCCCCccceeeecCCC-CcceEEEEeeeccCCCcCCCCCcceEEEecCCceeecccccccCCCCC-ccceEEeeeec
Q 044785          163 NGSCVGTGCCQIEIPRG-LKELEVEAFSFNNHTNVSPFNPCTYASVVDKSQFHFSSNYLAWEGTPE-KFPLVLDWEIT  238 (346)
Q Consensus       163 ~~~CsG~gCCq~~ip~~-l~~~~~~~~~~~~~~~~~~~~~c~~a~~~~~~~~~f~~~~~~~~~~~~-~~p~~l~W~i~  238 (346)
                      +++|+|++|||++||.+ ++.|.+++...+....  ..++|++|||+|+.||.+...+.. .+.+. .+||+|+|.|+
T Consensus         1 n~~CsG~~CCQt~iP~~~~qv~~~~i~~~~~~~~--~~~~Ck~AFL~d~~~~~~n~t~p~-~~~~~~y~pv~L~W~i~   75 (103)
T PF08488_consen    1 NSSCSGIGCCQTSIPSGLLQVFNVTIESTDGNNQ--TSNGCKVAFLVDEDWFSSNITDPE-DFHAMGYVPVVLDWFID   75 (103)
T ss_pred             CCccCCcceeccCCCCCCceEEEEEeEecCCCcc--cCCCceEEEEeccccccccCCChH-HhccCCeeEEEEEEEEe
Confidence            36899999999999996 5789999987765332  458999999999999987655432 34443 69999999997


No 3  
>PF07645 EGF_CA:  Calcium-binding EGF domain;  InterPro: IPR001881 A sequence of about forty amino-acid residues found in epidermal growth factor (EGF) has been shown [, , , , , ] to be present in a large number of membrane-bound and extracellular, mostly animal, proteins. Many of these proteins require calcium for their biological function and a calcium-binding site has been found at the N terminus of some EGF-like domains []. Calcium-binding may be crucial for numerous protein-protein interactions. For human coagulation factor IX it has been shown [] that the calcium-ligands form a pentagonal bipyramid. The first, third and fourth conserved negatively charged or polar residues are side chain ligands. The latter is possibly hydroxylated (see aspartic acid and asparagine hydroxylation site) []. A conserved aromatic residue, as well as the second conserved negative residue, are thought to be involved in stabilising the calcium-binding site. As in non-calcium binding EGF-like domains, there are six conserved cysteines and the structure of both types is very similar as calcium-binding induces only strictly local structural changes [].  +------------------+ +---------+ | | | | nxnnC-x(3,14)-C-x(3,7)-CxxbxxxxaxC-x(1,6)-C-x(8,13)-Cx | | +------------------+ 'n': negatively charged or polar residue [DEQN] 'b': possibly beta-hydroxylated residue [DN] 'a': aromatic amino acid 'C': cysteine, involved in disulphide bond 'x': any amino acid. ; GO: 0005509 calcium ion binding; PDB: 2VJ3_A 1TOZ_A 1LMJ_A 1UZQ_A 1UZK_A 1UZJ_B 1UZP_A 1EMO_A 1EMN_A 2RR0_A ....
Probab=98.56  E-value=5.7e-08  Score=64.00  Aligned_cols=37  Identities=35%  Similarity=0.919  Sum_probs=32.6

Q ss_pred             eCccCCCCCCCCCCCCCeeEecCCCeEEecCCCCccCC
Q 044785          288 DVNECEDPSLNNCTRTHICDNIPGSYTCRCRKGFHGDG  325 (346)
Q Consensus       288 dideC~~~~~~~C~~~~~C~nt~G~y~C~C~~G~~~~~  325 (346)
                      |||||.. ..+.|...+.|+|+.|+|.|.|++||+.+.
T Consensus         1 DidEC~~-~~~~C~~~~~C~N~~Gsy~C~C~~Gy~~~~   37 (42)
T PF07645_consen    1 DIDECAE-GPHNCPENGTCVNTEGSYSCSCPPGYELND   37 (42)
T ss_dssp             ESSTTTT-TSSSSSTTSEEEEETTEEEEEESTTEEECT
T ss_pred             CccccCC-CCCcCCCCCEEEcCCCCEEeeCCCCcEECC
Confidence            7999976 558998889999999999999999999443


No 4  
>KOG1214 consensus Nidogen and related basement membrane protein proteins [Cell wall/membrane/envelope biogenesis; Extracellular structures]
Probab=98.33  E-value=6.7e-07  Score=90.97  Aligned_cols=79  Identities=41%  Similarity=1.053  Sum_probs=61.4

Q ss_pred             cCccC-CCCCCC--CeeeCCCCCCCCCCCeeeecCCCcccCCCCCCCceeCccCCCCCCCCCCCCCeeEecCCCeEEecC
Q 044785          242 TCEEA-KICGLN--ASCHKPKDNTTTSSGYHCKCNEGYEGNPYLSDGCQDVNECEDPSLNNCTRTHICDNIPGSYTCRCR  318 (346)
Q Consensus       242 ~C~~~-~~C~~~--s~C~~~~~~~~~~~gy~C~C~~Gy~gnp~~~~gC~dideC~~~~~~~C~~~~~C~nt~G~y~C~C~  318 (346)
                      .|++. +.|..+  +.|... +    +..|.|.|.+||.|++..   |.|+|||   +...|++...|.|++|++.|+|.
T Consensus       785 ~Ce~g~h~C~i~g~a~c~~h-G----gs~y~C~CLPGfsGDG~~---c~dvDeC---~psrChp~A~CyntpgsfsC~C~  853 (1289)
T KOG1214|consen  785 PCEDGSHTCAIAGQARCVHH-G----GSTYSCACLPGFSGDGHQ---CTDVDEC---SPSRCHPAATCYNTPGSFSCRCQ  853 (1289)
T ss_pred             ccccCccccCcCCceEEEec-C----CceEEEeecCCccCCccc---ccccccc---CccccCCCceEecCCCcceeecc
Confidence            34443 456633  344333 2    347999999999999766   9999999   45789999999999999999999


Q ss_pred             CCCccCCCCCCCCceeC
Q 044785          319 KGFHGDGTKDGRGCIPN  335 (346)
Q Consensus       319 ~G~~~~~~~~~~~C~~~  335 (346)
                      +||.+++..    |+|.
T Consensus       854 pGy~GDGf~----CVP~  866 (1289)
T KOG1214|consen  854 PGYYGDGFQ----CVPD  866 (1289)
T ss_pred             cCccCCCce----ecCC
Confidence            999999855    7664


No 5  
>KOG1214 consensus Nidogen and related basement membrane protein proteins [Cell wall/membrane/envelope biogenesis; Extracellular structures]
Probab=97.84  E-value=2.2e-05  Score=80.25  Aligned_cols=80  Identities=35%  Similarity=0.897  Sum_probs=66.4

Q ss_pred             CCCCCCCeeeCCCCCCCCCCCeeeecCCCcccCCCCCCCceeCccCCCCCCCCCCCCCeeEecCCCeEEecCCCCccCCC
Q 044785          247 KICGLNASCHKPKDNTTTSSGYHCKCNEGYEGNPYLSDGCQDVNECEDPSLNNCTRTHICDNIPGSYTCRCRKGFHGDGT  326 (346)
Q Consensus       247 ~~C~~~s~C~~~~~~~~~~~gy~C~C~~Gy~gnp~~~~gC~dideC~~~~~~~C~~~~~C~nt~G~y~C~C~~G~~~~~~  326 (346)
                      ..|..++.|....+     ..|+|.|..||.|.+..   |.|++||+. ..+.|.++..|+|.+|+|+|+|..||...+ 
T Consensus       700 h~cdt~a~C~pg~~-----~~~tcecs~g~~gdgr~---c~d~~eca~-~~~~CGp~s~Cin~pg~~rceC~~gy~F~d-  769 (1289)
T KOG1214|consen  700 HMCDTTARCHPGTG-----VDYTCECSSGYQGDGRN---CVDENECAT-GFHRCGPNSVCINLPGSYRCECRSGYEFAD-  769 (1289)
T ss_pred             cccCCCccccCCCC-----cceEEEEeeccCCCCCC---CCChhhhcc-CCCCCCCCceeecCCCceeEEEeecceecc-
Confidence            34666677876554     37999999999999866   999999976 678899999999999999999999999877 


Q ss_pred             CCCCCceeCCC
Q 044785          327 KDGRGCIPNQN  337 (346)
Q Consensus       327 ~~~~~C~~~~~  337 (346)
                       ++..|++..+
T Consensus       770 -d~~tCV~i~~  779 (1289)
T KOG1214|consen  770 -DRHTCVLITP  779 (1289)
T ss_pred             -CCcceEEecC
Confidence             4567887655


No 6  
>KOG1219 consensus Uncharacterized conserved protein, contains laminin, cadherin and EGF domains [Signal transduction mechanisms]
Probab=97.80  E-value=2.9e-05  Score=86.21  Aligned_cols=67  Identities=34%  Similarity=0.796  Sum_probs=55.2

Q ss_pred             CCCCCCCeeeCCCCCCCCCCCeeeecCCCcccCCCCCCCceeCccCCCCCCCCCCCCCeeEecCCCeEEecCCCCccCC
Q 044785          247 KICGLNASCHKPKDNTTTSSGYHCKCNEGYEGNPYLSDGCQDVNECEDPSLNNCTRTHICDNIPGSYTCRCRKGFHGDG  325 (346)
Q Consensus       247 ~~C~~~s~C~~~~~~~~~~~gy~C~C~~Gy~gnp~~~~gC~dideC~~~~~~~C~~~~~C~nt~G~y~C~C~~G~~~~~  325 (346)
                      .+|.....|....+      +|.|.|+.||.|..+...   +|+||   +.++|..++.|.|++|+|+|.|-+|+.+..
T Consensus      3909 nPC~~GgtCip~~n------~f~CnC~~gyTG~~Ce~~---Gi~eC---s~n~C~~gg~C~n~~gsf~CncT~g~~gr~ 3975 (4289)
T KOG1219|consen 3909 NPCLTGGTCIPFYN------GFLCNCPNGYTGKRCEAR---GISEC---SKNVCGTGGQCINIPGSFHCNCTPGILGRT 3975 (4289)
T ss_pred             CCCCCCCEEEecCC------CeeEeCCCCccCceeecc---ccccc---ccccccCCceeeccCCceEeccChhHhccc
Confidence            56777778876655      899999999998765432   38999   578999999999999999999999998654


No 7  
>KOG4289 consensus Cadherin EGF LAG seven-pass G-type receptor [Signal transduction mechanisms]
Probab=97.50  E-value=0.0001  Score=79.20  Aligned_cols=53  Identities=32%  Similarity=0.887  Sum_probs=46.6

Q ss_pred             CCeeeecCCCcccCCCCCCCceeCccCCCCCCCCCCCCCeeEecCCCeEEecCCCCccCC
Q 044785          266 SGYHCKCNEGYEGNPYLSDGCQDVNECEDPSLNNCTRTHICDNIPGSYTCRCRKGFHGDG  325 (346)
Q Consensus       266 ~gy~C~C~~Gy~gnp~~~~gC~dideC~~~~~~~C~~~~~C~nt~G~y~C~C~~G~~~~~  325 (346)
                      .+.+|+|++||.|+-..    +.||+|   -..||.+++.|....|+|.|.|++||+|..
T Consensus      1220 nglrCrCPpGFTgd~Ce----TeiDlC---Ys~pC~nng~C~srEggYtCeCrpg~tGeh 1272 (2531)
T KOG4289|consen 1220 NGLRCRCPPGFTGDYCE----TEIDLC---YSGPCGNNGRCRSREGGYTCECRPGFTGEH 1272 (2531)
T ss_pred             CceeEeCCCCCCccccc----chhHhh---hcCCCCCCCceEEecCceeEEecCCccccc
Confidence            58999999999997332    679999   468999999999999999999999999865


No 8  
>KOG1219 consensus Uncharacterized conserved protein, contains laminin, cadherin and EGF domains [Signal transduction mechanisms]
Probab=97.46  E-value=0.00023  Score=79.56  Aligned_cols=72  Identities=28%  Similarity=0.766  Sum_probs=59.7

Q ss_pred             ccCccCCCCCCCCeeeCCCCCCCCCCCeeeecCCCcccCCCCCCCceeCccCCCCCCCCCCCCCeeEecCCCeEEecCCC
Q 044785          241 ETCEEAKICGLNASCHKPKDNTTTSSGYHCKCNEGYEGNPYLSDGCQDVNECEDPSLNNCTRTHICDNIPGSYTCRCRKG  320 (346)
Q Consensus       241 ~~C~~~~~C~~~s~C~~~~~~~~~~~gy~C~C~~Gy~gnp~~~~gC~dideC~~~~~~~C~~~~~C~nt~G~y~C~C~~G  320 (346)
                      +.|.+ .+|.....|.....     +||.|.|+..|.|+-+.    .+++.|   ..+||..++.|.-..++|.|.||.|
T Consensus      3865 d~C~~-npCqhgG~C~~~~~-----ggy~CkCpsqysG~~CE----i~~epC---~snPC~~GgtCip~~n~f~CnC~~g 3931 (4289)
T KOG1219|consen 3865 DPCND-NPCQHGGTCISQPK-----GGYKCKCPSQYSGNHCE----IDLEPC---ASNPCLTGGTCIPFYNGFLCNCPNG 3931 (4289)
T ss_pred             ccccc-CcccCCCEecCCCC-----CceEEeCcccccCcccc----cccccc---cCCCCCCCCEEEecCCCeeEeCCCC
Confidence            34554 57887788876544     49999999999998544    688999   4689999999999999999999999


Q ss_pred             CccCC
Q 044785          321 FHGDG  325 (346)
Q Consensus       321 ~~~~~  325 (346)
                      |+|..
T Consensus      3932 yTG~~ 3936 (4289)
T KOG1219|consen 3932 YTGKR 3936 (4289)
T ss_pred             ccCce
Confidence            99865


No 9  
>smart00179 EGF_CA Calcium-binding EGF-like domain.
Probab=97.43  E-value=0.00022  Score=45.40  Aligned_cols=33  Identities=39%  Similarity=0.982  Sum_probs=28.1

Q ss_pred             eCccCCCCCCCCCCCCCeeEecCCCeEEecCCCCc
Q 044785          288 DVNECEDPSLNNCTRTHICDNIPGSYTCRCRKGFH  322 (346)
Q Consensus       288 dideC~~~~~~~C~~~~~C~nt~G~y~C~C~~G~~  322 (346)
                      |++||.. . .+|..++.|.|+.|+|.|.|++||.
T Consensus         1 d~~~C~~-~-~~C~~~~~C~~~~g~~~C~C~~g~~   33 (39)
T smart00179        1 DIDECAS-G-NPCQNGGTCVNTVGSYRCECPPGYT   33 (39)
T ss_pred             CcccCcC-C-CCcCCCCEeECCCCCeEeECCCCCc
Confidence            4788943 2 6898778999999999999999998


No 10 
>KOG4260 consensus Uncharacterized conserved protein [Function unknown]
Probab=97.36  E-value=9.2e-05  Score=66.96  Aligned_cols=52  Identities=35%  Similarity=0.933  Sum_probs=45.2

Q ss_pred             ee-ecCCCcccCCCCCCCceeCccCCCCCCCCCCCCCeeEecCCCeEEecCCCCccC
Q 044785          269 HC-KCNEGYEGNPYLSDGCQDVNECEDPSLNNCTRTHICDNIPGSYTCRCRKGFHGD  324 (346)
Q Consensus       269 ~C-~C~~Gy~gnp~~~~gC~dideC~~~~~~~C~~~~~C~nt~G~y~C~C~~G~~~~  324 (346)
                      .| +|+.||...   ..+|.|||||.. ...+|..+..|+|+.|+|.|.+++||...
T Consensus       218 ~C~kCkkGW~ld---e~gCvDvnEC~~-ep~~c~~~qfCvNteGSf~C~dk~Gy~~g  270 (350)
T KOG4260|consen  218 GCSKCKKGWKLD---EEGCVDVNECQN-EPAPCKAHQFCVNTEGSFKCEDKEGYKKG  270 (350)
T ss_pred             Chhhhcccceec---ccccccHHHHhc-CCCCCChhheeecCCCceEecccccccCC
Confidence            35 689999876   458999999976 67899999999999999999999999873


No 11 
>PF14670 FXa_inhibition:  Coagulation Factor Xa inhibitory site; PDB: 3Q3K_B 1NFY_B 1LQD_A 1G2L_B 1IQF_L 2UWP_B 2VH6_B 3KQC_L 2P93_L 2BQW_A ....
Probab=97.33  E-value=0.00015  Score=45.87  Aligned_cols=32  Identities=50%  Similarity=1.156  Sum_probs=24.7

Q ss_pred             CCCCCCCCeeEecCCCeEEecCCCCccCCCCCCCCc
Q 044785          297 LNNCTRTHICDNIPGSYTCRCRKGFHGDGTKDGRGC  332 (346)
Q Consensus       297 ~~~C~~~~~C~nt~G~y~C~C~~G~~~~~~~~~~~C  332 (346)
                      ...|  .+.|+|++|+|+|.|++||..+.  |++.|
T Consensus         5 NGgC--~h~C~~~~g~~~C~C~~Gy~L~~--D~~tC   36 (36)
T PF14670_consen    5 NGGC--SHICVNTPGSYRCSCPPGYKLAE--DGRTC   36 (36)
T ss_dssp             GGGS--SSEEEEETTSEEEE-STTEEE-T--TSSSE
T ss_pred             CCCc--CCCCccCCCceEeECCCCCEECc--CCCCC
Confidence            3567  79999999999999999999887  44544


No 12 
>PF12947 EGF_3:  EGF domain;  InterPro: IPR024731 This entry represents an EGF domain found in the C terminus of malarial parasite merozoite surface protein 1 [], as well as other proteins.; PDB: 2NPR_A 1N1I_C 1B9W_A 1YO8_A 2RHP_A.
Probab=97.31  E-value=0.00015  Score=45.89  Aligned_cols=31  Identities=42%  Similarity=0.999  Sum_probs=25.0

Q ss_pred             CCCCCCCCCeeEecCCCeEEecCCCCccCCC
Q 044785          296 SLNNCTRTHICDNIPGSYTCRCRKGFHGDGT  326 (346)
Q Consensus       296 ~~~~C~~~~~C~nt~G~y~C~C~~G~~~~~~  326 (346)
                      ..+.|+.+..|.|+.++|.|.|++||.+++.
T Consensus         4 ~~~~C~~nA~C~~~~~~~~C~C~~Gy~GdG~   34 (36)
T PF12947_consen    4 NNGGCHPNATCTNTGGSYTCTCKPGYEGDGF   34 (36)
T ss_dssp             GGGGS-TTCEEEE-TTSEEEEE-CEEECCST
T ss_pred             CCCCCCCCcEeecCCCCEEeECCCCCccCCc
Confidence            4567888999999999999999999999884


No 13 
>PF12662 cEGF:  Complement Clr-like EGF-like
Probab=97.21  E-value=0.00029  Score=40.20  Aligned_cols=24  Identities=38%  Similarity=0.964  Sum_probs=20.1

Q ss_pred             CeeeecCCCcccCCCCCCCceeCcc
Q 044785          267 GYHCKCNEGYEGNPYLSDGCQDVNE  291 (346)
Q Consensus       267 gy~C~C~~Gy~gnp~~~~gC~dide  291 (346)
                      +|+|.|++||..++.. ..|+||||
T Consensus         1 sy~C~C~~Gy~l~~d~-~~C~DIdE   24 (24)
T PF12662_consen    1 SYTCSCPPGYQLSPDG-RSCEDIDE   24 (24)
T ss_pred             CEEeeCCCCCcCCCCC-CccccCCC
Confidence            5899999999977643 57999997


No 14 
>cd00054 EGF_CA Calcium-binding EGF-like domain, present in a large number of membrane-bound and extracellular (mostly animal) proteins. Many of these proteins require calcium for their biological function and calcium-binding sites have been found to be located at the N-terminus of particular EGF-like domains; calcium-binding may be crucial for numerous protein-protein interactions. Six conserved core cysteines form three disulfide bridges as in non calcium-binding EGF domains, whose structures are very similar. EGF_CA can be found in tandem repeat arrangements.
Probab=96.83  E-value=0.0017  Score=40.61  Aligned_cols=35  Identities=40%  Similarity=0.980  Sum_probs=28.4

Q ss_pred             eCccCCCCCCCCCCCCCeeEecCCCeEEecCCCCccC
Q 044785          288 DVNECEDPSLNNCTRTHICDNIPGSYTCRCRKGFHGD  324 (346)
Q Consensus       288 dideC~~~~~~~C~~~~~C~nt~G~y~C~C~~G~~~~  324 (346)
                      ++++|.. . .+|..++.|.++.++|.|.|++||.+.
T Consensus         1 ~~~~C~~-~-~~C~~~~~C~~~~~~~~C~C~~g~~g~   35 (38)
T cd00054           1 DIDECAS-G-NPCQNGGTCVNTVGSYRCSCPPGYTGR   35 (38)
T ss_pred             CcccCCC-C-CCcCCCCEeECCCCCeEeECCCCCcCC
Confidence            3678842 1 688778899999999999999999863


No 15 
>PF00008 EGF:  EGF-like domain This is a sub-family of the Pfam entry;  InterPro: IPR006209 A sequence of about thirty to forty amino-acid residues long found in the sequence of epidermal growth factor (EGF) has been shown [, , , , ] to be present, in a more or less conserved form, in a large number of other, mostly animal proteins. The list of proteins currently known to contain one or more copies of an EGF-like pattern is large and varied. The functional significance of EGF domains in what appear to be unrelated proteins is not yet clear. However, a common feature is that these repeats are found in the extracellular domain of membrane-bound proteins or in proteins known to be secreted (exception: prostaglandin G/H synthase). The EGF domain includes six cysteine residues which have been shown (in EGF) to be involved in disulphide bonds. The main structure is a two-stranded beta-sheet followed by a loop to a C-terminal short two-stranded sheet. Subdomains between the conserved cysteines vary in length.; GO: 0005515 protein binding; PDB: 1WHE_A 1CCF_A 1APO_A 1WHF_A 2VJ3_A 1TOZ_A 4D90_B 3CFW_A 1EDM_B 1IXA_A ....
Probab=96.81  E-value=0.00092  Score=41.07  Aligned_cols=28  Identities=39%  Similarity=1.041  Sum_probs=25.2

Q ss_pred             CCCCCCCCeeEecC-CCeEEecCCCCccC
Q 044785          297 LNNCTRTHICDNIP-GSYTCRCRKGFHGD  324 (346)
Q Consensus       297 ~~~C~~~~~C~nt~-G~y~C~C~~G~~~~  324 (346)
                      .++|.++++|++.. ++|.|.|++||.+.
T Consensus         3 ~~~C~n~g~C~~~~~~~y~C~C~~G~~G~   31 (32)
T PF00008_consen    3 SNPCQNGGTCIDLPGGGYTCECPPGYTGK   31 (32)
T ss_dssp             TTSSTTTEEEEEESTSEEEEEEBTTEEST
T ss_pred             CCcCCCCeEEEeCCCCCEEeECCCCCccC
Confidence            45898899999999 99999999999875


No 16 
>KOG4260 consensus Uncharacterized conserved protein [Function unknown]
Probab=96.62  E-value=0.0014  Score=59.48  Aligned_cols=80  Identities=26%  Similarity=0.637  Sum_probs=57.5

Q ss_pred             cceEEeeeecccccCccC-------CCCCCCCeeeCCCCCCCCCCCeeeecCCCcccCCCCCCCceeCccCCCCCCCCC-
Q 044785          229 FPLVLDWEITTKETCEEA-------KICGLNASCHKPKDNTTTSSGYHCKCNEGYEGNPYLSDGCQDVNECEDPSLNNC-  300 (346)
Q Consensus       229 ~p~~l~W~i~~~~~C~~~-------~~C~~~s~C~~~~~~~~~~~gy~C~C~~Gy~gnp~~~~gC~dideC~~~~~~~C-  300 (346)
                      ....-+|.++ ...|.+.       ..|..+..|+|..+      +|.|.+++||++.         +|||.. -...| 
T Consensus       220 ~kCkkGW~ld-e~gCvDvnEC~~ep~~c~~~qfCvNteG------Sf~C~dk~Gy~~g---------~d~C~~-~~d~~~  282 (350)
T KOG4260|consen  220 SKCKKGWKLD-EEGCVDVNECQNEPAPCKAHQFCVNTEG------SFKCEDKEGYKKG---------VDECQF-CADVCA  282 (350)
T ss_pred             hhhcccceec-ccccccHHHHhcCCCCCChhheeecCCC------ceEecccccccCC---------hHHhhh-hhhhcc
Confidence            3456789988 6777654       45777788998877      8999999999863         455532 11223 


Q ss_pred             CCCCeeEecCCCeEEecCCCCccCC
Q 044785          301 TRTHICDNIPGSYTCRCRKGFHGDG  325 (346)
Q Consensus       301 ~~~~~C~nt~G~y~C~C~~G~~~~~  325 (346)
                      ..+..|.|+.|.|+|.|..|+..-.
T Consensus       283 ~kn~~c~ni~~~~r~v~f~~~~~~~  307 (350)
T KOG4260|consen  283 SKNRPCMNIDGQYRCVCFSGLIIIE  307 (350)
T ss_pred             cCCCCcccCCccEEEEecccceeee
Confidence            2357889999999999999876543


No 17 
>cd01475 vWA_Matrilin VWA_Matrilin: In cartilaginous plate, extracellular matrix molecules mediate cell-matrix and matrix-matrix interactions thereby providing tissue integrity. Some members of the matrilin family are expressed specifically in developing cartilage rudiments. The matrilin family consists of at least four members. All the members of the matrilin family contain VWA domains, EGF-like domains and a heptad repeat coiled-coiled domain at the carboxy terminus which is responsible for the oligomerization of the matrilins. The VWA domains have been shown to be essential for matrilin network formation by interacting with matrix ligands.
Probab=96.25  E-value=0.0041  Score=56.06  Aligned_cols=38  Identities=29%  Similarity=0.687  Sum_probs=33.1

Q ss_pred             CceeCccCCCCCCCCCCCCCeeEecCCCeEEecCCCCccCC
Q 044785          285 GCQDVNECEDPSLNNCTRTHICDNIPGSYTCRCRKGFHGDG  325 (346)
Q Consensus       285 gC~dideC~~~~~~~C~~~~~C~nt~G~y~C~C~~G~~~~~  325 (346)
                      -|.+++||.. ..+.|  .+.|.|+.|+|.|.|+.||..++
T Consensus       183 ~C~~~~~C~~-~~~~c--~~~C~~~~g~~~c~c~~g~~~~~  220 (224)
T cd01475         183 ICVVPDLCAT-LSHVC--QQVCISTPGSYLCACTEGYALLE  220 (224)
T ss_pred             cCcCchhhcC-CCCCc--cceEEcCCCCEEeECCCCccCCC
Confidence            5889999965 56788  67999999999999999998765


No 18 
>cd00053 EGF Epidermal growth factor domain, found in epidermal growth factor (EGF) and present in a large number of proteins, mostly animal; the list of proteins currently known to contain one or more copies of an EGF-like pattern is large and varied; the functional significance of EGF-like domains in what appear to be unrelated proteins is not yet clear; a common feature is that these repeats are found in the extracellular domain of membrane-bound proteins or in proteins known to be secreted (exception: prostaglandin G/H synthase); the domain includes six cysteine residues which have been shown to be involved in disulfide bonds; the main structure is a two-stranded beta-sheet followed by a loop to a C-terminal short two-stranded sheet; Subdomains between the conserved cysteines vary in length; the region between the 5th and 6th cysteine contains two conserved glycines of which at least one is present in most EGF-like domains; a subset of these bind calcium.
Probab=96.00  E-value=0.011  Score=36.09  Aligned_cols=28  Identities=46%  Similarity=1.058  Sum_probs=24.6

Q ss_pred             CCCCCCCCeeEecCCCeEEecCCCCccC
Q 044785          297 LNNCTRTHICDNIPGSYTCRCRKGFHGD  324 (346)
Q Consensus       297 ~~~C~~~~~C~nt~G~y~C~C~~G~~~~  324 (346)
                      ..+|.+++.|+++.++|.|.|+.||.++
T Consensus         5 ~~~C~~~~~C~~~~~~~~C~C~~g~~g~   32 (36)
T cd00053           5 SNPCSNGGTCVNTPGSYRCVCPPGYTGD   32 (36)
T ss_pred             CCCCCCCCEEecCCCCeEeECCCCCccc
Confidence            4578767999999999999999999876


No 19 
>PF12947 EGF_3:  EGF domain;  InterPro: IPR024731 This entry represents an EGF domain found in the C terminus of malarial parasite merozoite surface protein 1 [], as well as other proteins.; PDB: 2NPR_A 1N1I_C 1B9W_A 1YO8_A 2RHP_A.
Probab=95.92  E-value=0.006  Score=38.53  Aligned_cols=30  Identities=37%  Similarity=0.939  Sum_probs=23.1

Q ss_pred             CCCCCCCeeeCCCCCCCCCCCeeeecCCCcccCCCC
Q 044785          247 KICGLNASCHKPKDNTTTSSGYHCKCNEGYEGNPYL  282 (346)
Q Consensus       247 ~~C~~~s~C~~~~~~~~~~~gy~C~C~~Gy~gnp~~  282 (346)
                      ..|..++.|.+..+      .|.|.|++||.|+++.
T Consensus         6 ~~C~~nA~C~~~~~------~~~C~C~~Gy~GdG~~   35 (36)
T PF12947_consen    6 GGCHPNATCTNTGG------SYTCTCKPGYEGDGFF   35 (36)
T ss_dssp             GGS-TTCEEEE-TT------SEEEEE-CEEECCSTC
T ss_pred             CCCCCCcEeecCCC------CEEeECCCCCccCCcC
Confidence            35778999998876      8999999999998764


No 20 
>KOG4289 consensus Cadherin EGF LAG seven-pass G-type receptor [Signal transduction mechanisms]
Probab=95.83  E-value=0.018  Score=62.88  Aligned_cols=60  Identities=32%  Similarity=0.903  Sum_probs=48.4

Q ss_pred             CCCCCCCeeeCCCCCCCCCCCeeeecCCCcccCCCCCCCceeC---ccCCCCCCCCCCCCCeeEec-CCCeEEecCCC
Q 044785          247 KICGLNASCHKPKDNTTTSSGYHCKCNEGYEGNPYLSDGCQDV---NECEDPSLNNCTRTHICDNI-PGSYTCRCRKG  320 (346)
Q Consensus       247 ~~C~~~s~C~~~~~~~~~~~gy~C~C~~Gy~gnp~~~~gC~di---deC~~~~~~~C~~~~~C~nt-~G~y~C~C~~G  320 (346)
                      .+|.+|+.|....+      +|+|.|++||.|.-     |+--   ..|   .+..|.+++.|+|. .|++.|.||.|
T Consensus      1245 ~pC~nng~C~srEg------gYtCeCrpg~tGeh-----CEvs~~agrC---vpGvC~nggtC~~~~nggf~c~Cp~g 1308 (2531)
T KOG4289|consen 1245 GPCGNNGRCRSREG------GYTCECRPGFTGEH-----CEVSARAGRC---VPGVCKNGGTCVNLLNGGFCCHCPYG 1308 (2531)
T ss_pred             CCCCCCCceEEecC------ceeEEecCCccccc-----eeeecccCcc---ccceecCCCEEeecCCCceeccCCCc
Confidence            57899999988777      99999999999863     4321   235   45688889999996 67899999999


No 21 
>PF12662 cEGF:  Complement Clr-like EGF-like
Probab=95.78  E-value=0.0099  Score=33.91  Aligned_cols=22  Identities=45%  Similarity=0.952  Sum_probs=17.9

Q ss_pred             CeEEecCCCCccCCCCCCCCceeC
Q 044785          312 SYTCRCRKGFHGDGTKDGRGCIPN  335 (346)
Q Consensus       312 ~y~C~C~~G~~~~~~~~~~~C~~~  335 (346)
                      ||+|.|++||+.++  +++.|+.+
T Consensus         1 sy~C~C~~Gy~l~~--d~~~C~DI   22 (24)
T PF12662_consen    1 SYTCSCPPGYQLSP--DGRSCEDI   22 (24)
T ss_pred             CEEeeCCCCCcCCC--CCCccccC
Confidence            69999999999877  45778754


No 22 
>smart00181 EGF Epidermal growth factor-like domain.
Probab=95.61  E-value=0.018  Score=35.51  Aligned_cols=26  Identities=50%  Similarity=1.187  Sum_probs=22.8

Q ss_pred             CCCCCCCeeEecCCCeEEecCCCCccC
Q 044785          298 NNCTRTHICDNIPGSYTCRCRKGFHGD  324 (346)
Q Consensus       298 ~~C~~~~~C~nt~G~y~C~C~~G~~~~  324 (346)
                      .+|..+ .|.++.++|.|.|++||.++
T Consensus         6 ~~C~~~-~C~~~~~~~~C~C~~g~~g~   31 (35)
T smart00181        6 GPCSNG-TCINTPGSYTCSCPPGYTGD   31 (35)
T ss_pred             CCCCCC-EEECCCCCeEeECCCCCccC
Confidence            678655 99999999999999999874


No 23 
>PF07645 EGF_CA:  Calcium-binding EGF domain;  InterPro: IPR001881 A sequence of about forty amino-acid residues found in epidermal growth factor (EGF) has been shown to be present in a large number of membrane-bound and extracellular, mostly animal, proteins. Many of these proteins require calcium for their biological function and a calcium-binding site has been found at the N terminus of some EGF-like domains. Calcium-binding may be crucial for numerous protein-protein interactions. For human coagulation factor IX it has been shown that the calcium-ligands form a pentagonal bipyramid. The first, third and fourth conserved negatively charged or polar residues are side chain ligands. The latter is possibly hydroxylated (see aspartic acid and asparagine hydroxylation site). A conserved aromatic residue, as well as the second conserved negative residue, are thought to be involved in stabilising the calcium-binding site. As in non-calcium binding EGF-like domains, there are six conserved cysteines and the structure of both types is very similar as calcium-binding induces only strictly local structural changes. Consensus pattern: nxnnC-x(3,14)-C-x(3,7)-CxxbxxxxaxC-x(1,6)-C-x(8,13)-Cx, with disulphide bonds between cysteines 1-3, 2-4 and 5-6; 'n': negatively charged or polar residue [DEQN]; 'b': possibly beta-hydroxylated residue [DN]; 'a': aromatic amino acid; 'C': cysteine, involved in disulphide bond; 'x': any amino acid.; GO: 0005509 calcium ion binding; PDB: 2VJ3_A 1TOZ_A 1LMJ_A 1UZQ_A 1UZK_A 1UZJ_B 1UZP_A 1EMO_A 1EMN_A 2RR0_A ....
Probab=94.06  E-value=0.045  Score=35.65  Aligned_cols=32  Identities=38%  Similarity=0.885  Sum_probs=26.1

Q ss_pred             cCccC-CCCCCCCeeeCCCCCCCCCCCeeeecCCCcccC
Q 044785          242 TCEEA-KICGLNASCHKPKDNTTTSSGYHCKCNEGYEGN  279 (346)
Q Consensus       242 ~C~~~-~~C~~~s~C~~~~~~~~~~~gy~C~C~~Gy~gn  279 (346)
                      .|... ..|..++.|+|..+      +|+|.|++||..+
T Consensus         4 EC~~~~~~C~~~~~C~N~~G------sy~C~C~~Gy~~~   36 (42)
T PF07645_consen    4 ECAEGPHNCPENGTCVNTEG------SYSCSCPPGYELN   36 (42)
T ss_dssp             TTTTTSSSSSTTSEEEEETT------EEEEEESTTEEEC
T ss_pred             ccCCCCCcCCCCCEEEcCCC------CEEeeCCCCcEEC
Confidence            46554 57888899999987      9999999999843


No 24 
>KOG1217 consensus Fibrillins and related proteins containing Ca2+-binding EGF-like domains [Signal transduction mechanisms]
Probab=93.86  E-value=0.095  Score=51.71  Aligned_cols=63  Identities=38%  Similarity=0.934  Sum_probs=51.5

Q ss_pred             CeeeCCCCCCCCCCCeeeecCCCcccCCCCCCCceeCccCCCCCCCCCCCCCeeEecCCCeEEecCCCCccCC
Q 044785          253 ASCHKPKDNTTTSSGYHCKCNEGYEGNPYLSDGCQDVNECEDPSLNNCTRTHICDNIPGSYTCRCRKGFHGDG  325 (346)
Q Consensus       253 s~C~~~~~~~~~~~gy~C~C~~Gy~gnp~~~~gC~dideC~~~~~~~C~~~~~C~nt~G~y~C~C~~G~~~~~  325 (346)
                      ..|.+..+      +|+|.|++||.+...  ..|.++++|.. ... |.+++.|.+..+.|.|.|++||.+..
T Consensus       243 ~~c~~~~~------~~~C~~~~g~~~~~~--~~~~~~~~C~~-~~~-c~~~~~C~~~~~~~~C~C~~g~~g~~  305 (487)
T KOG1217|consen  243 GTCVNTVG------SYTCRCPEGYTGDAC--VTCVDVDSCAL-IAS-CPNGGTCVNVPGSYRCTCPPGFTGRL  305 (487)
T ss_pred             CcccccCC------ceeeeCCCCcccccc--ceeeeccccCC-CCc-cCCCCeeecCCCcceeeCCCCCCCCC
Confidence            66766655      799999999998752  24789999964 323 88889999999999999999999876


No 25 
>PF00008 EGF:  EGF-like domain This is a sub-family of the Pfam entry;  InterPro: IPR006209 A sequence of about thirty to forty amino-acid residues found in epidermal growth factor (EGF) has been shown to be present, in a more or less conserved form, in a large number of other, mostly animal proteins. The list of proteins currently known to contain one or more copies of an EGF-like pattern is large and varied. The functional significance of EGF domains in what appear to be unrelated proteins is not yet clear. However, a common feature is that these repeats are found in the extracellular domain of membrane-bound proteins or in proteins known to be secreted (exception: prostaglandin G/H synthase). The EGF domain includes six cysteine residues which have been shown (in EGF) to be involved in disulphide bonds. The main structure is a two-stranded beta-sheet followed by a loop to a C-terminal short two-stranded sheet. Subdomains between the conserved cysteines vary in length.; GO: 0005515 protein binding; PDB: 1WHE_A 1CCF_A 1APO_A 1WHF_A 2VJ3_A 1TOZ_A 4D90_B 3CFW_A 1EDM_B 1IXA_A ....
Probab=93.25  E-value=0.044  Score=33.52  Aligned_cols=28  Identities=32%  Similarity=0.882  Sum_probs=22.9

Q ss_pred             CCCCCCCeeeCCCCCCCCCCCeeeecCCCcccC
Q 044785          247 KICGLNASCHKPKDNTTTSSGYHCKCNEGYEGN  279 (346)
Q Consensus       247 ~~C~~~s~C~~~~~~~~~~~gy~C~C~~Gy~gn  279 (346)
                      .+|.+++.|++...     .+|.|.|++||.|.
T Consensus         4 ~~C~n~g~C~~~~~-----~~y~C~C~~G~~G~   31 (32)
T PF00008_consen    4 NPCQNGGTCIDLPG-----GGYTCECPPGYTGK   31 (32)
T ss_dssp             TSSTTTEEEEEEST-----SEEEEEEBTTEEST
T ss_pred             CcCCCCeEEEeCCC-----CCEEeECCCCCccC
Confidence            37888899998762     28999999999874


No 26 
>cd00053 EGF Epidermal growth factor domain, found in epidermal growth factor (EGF) and present in a large number of proteins, mostly animal; the list of proteins currently known to contain one or more copies of an EGF-like pattern is large and varied; the functional significance of EGF-like domains in what appear to be unrelated proteins is not yet clear; a common feature is that these repeats are found in the extracellular domain of membrane-bound proteins or in proteins known to be secreted (exception: prostaglandin G/H synthase); the domain includes six cysteine residues which have been shown to be involved in disulfide bonds; the main structure is a two-stranded beta-sheet followed by a loop to a C-terminal short two-stranded sheet; subdomains between the conserved cysteines vary in length; the region between the 5th and 6th cysteine contains two conserved glycines of which at least one is present in most EGF-like domains; a subset of these bind calcium.
Probab=91.04  E-value=0.39  Score=28.91  Aligned_cols=27  Identities=30%  Similarity=0.879  Sum_probs=22.5

Q ss_pred             CCCCCCCeeeCCCCCCCCCCCeeeecCCCcccC
Q 044785          247 KICGLNASCHKPKDNTTTSSGYHCKCNEGYEGN  279 (346)
Q Consensus       247 ~~C~~~s~C~~~~~~~~~~~gy~C~C~~Gy~gn  279 (346)
                      ..|..++.|.+..+      +|.|.|+.||.++
T Consensus         6 ~~C~~~~~C~~~~~------~~~C~C~~g~~g~   32 (36)
T cd00053           6 NPCSNGGTCVNTPG------SYRCVCPPGYTGD   32 (36)
T ss_pred             CCCCCCCEEecCCC------CeEeECCCCCccc
Confidence            46777789987766      8999999999886


No 27 
>smart00179 EGF_CA Calcium-binding EGF-like domain.
Probab=89.71  E-value=0.58  Score=29.09  Aligned_cols=30  Identities=27%  Similarity=0.832  Sum_probs=22.9

Q ss_pred             cCccCCCCCCCCeeeCCCCCCCCCCCeeeecCCCcc
Q 044785          242 TCEEAKICGLNASCHKPKDNTTTSSGYHCKCNEGYE  277 (346)
Q Consensus       242 ~C~~~~~C~~~s~C~~~~~~~~~~~gy~C~C~~Gy~  277 (346)
                      .|.....|..++.|.+..+      +|.|.|+.||.
T Consensus         4 ~C~~~~~C~~~~~C~~~~g------~~~C~C~~g~~   33 (39)
T smart00179        4 ECASGNPCQNGGTCVNTVG------SYRCECPPGYT   33 (39)
T ss_pred             cCcCCCCcCCCCEeECCCC------CeEeECCCCCc
Confidence            4543236777778988766      89999999998


No 28 
>KOG1217 consensus Fibrillins and related proteins containing Ca2+-binding EGF-like domains [Signal transduction mechanisms]
Probab=89.26  E-value=0.51  Score=46.48  Aligned_cols=54  Identities=39%  Similarity=0.938  Sum_probs=44.2

Q ss_pred             CeeeecCCCcccCCCCCCCceeCccCCCCCCCCCCCCCeeEecCCCeEEecCCCCccCC
Q 044785          267 GYHCKCNEGYEGNPYLSDGCQDVNECEDPSLNNCTRTHICDNIPGSYTCRCRKGFHGDG  325 (346)
Q Consensus       267 gy~C~C~~Gy~gnp~~~~gC~dideC~~~~~~~C~~~~~C~nt~G~y~C~C~~G~~~~~  325 (346)
                      .++|.|..||.+.+..    .+.++|.. ..++|.+++.|.|..++|.|.|+++|.+..
T Consensus       151 ~~~c~C~~g~~~~~~~----~~~~~C~~-~~~~c~~~~~C~~~~~~~~C~c~~~~~~~~  204 (487)
T KOG1217|consen  151 PFRCSCTEGYEGEPCE----TDLDECIQ-YSSPCQNGGTCVNTGGSYLCSCPPGYTGST  204 (487)
T ss_pred             ceeeeeCCCccccccc----cccccccc-CCCCcCCCcccccCCCCeeEeCCCCccCCc
Confidence            6899999999987644    23378853 456788889999999999999999998865


No 29 
>PF06247 Plasmod_Pvs28:  Plasmodium ookinete surface protein Pvs28;  InterPro: IPR010423 This family consists of several ookinete surface proteins (Pvs28) from several species of Plasmodium. Pvs25 and Pvs28 are expressed on the surface of ookinetes. These proteins are potential candidates for vaccine and induce antibodies that block the infectivity of Plasmodium vivax in immunised animals.; GO: 0009986 cell surface, 0016020 membrane; PDB: 1Z3G_B 1Z1Y_B 1Z27_A.
Probab=88.59  E-value=0.2  Score=43.48  Aligned_cols=62  Identities=29%  Similarity=0.723  Sum_probs=43.0

Q ss_pred             CeeeecCCCcccCCCCCCCceeCccCCCC--CCCCCCCCCeeEecC-----CCeEEecCCCCccCCCCCCCCcee
Q 044785          267 GYHCKCNEGYEGNPYLSDGCQDVNECEDP--SLNNCTRTHICDNIP-----GSYTCRCRKGFHGDGTKDGRGCIP  334 (346)
Q Consensus       267 gy~C~C~~Gy~gnp~~~~gC~dideC~~~--~~~~C~~~~~C~nt~-----G~y~C~C~~G~~~~~~~~~~~C~~  334 (346)
                      .|.|+|.+||..-  ..+.|+...+|..+  ...+|...+.|.+..     ..|.|.|.+||......    |.|
T Consensus        19 HfEC~Cnegfvl~--~EntCE~kv~C~~~e~~~K~Cgdya~C~~~~~~~~~~~~~C~C~~gY~~~~~v----Cvp   87 (197)
T PF06247_consen   19 HFECKCNEGFVLK--NENTCEEKVECDKLENVNKPCGDYAKCINQANKGEERAYKCDCINGYILKQGV----CVP   87 (197)
T ss_dssp             EEEEEESTTEEEE--ETTEEEE----SG-GGTTSEEETTEEEEE-SSTTSSTSEEEEE-TTEEESSSS----EEE
T ss_pred             ceEEEcCCCcEEc--cccccccceecCcccccCccccchhhhhcCCCcccceeEEEecccCceeeCCe----Ech
Confidence            7999999999743  24579998899642  346788789999876     68999999999987644    765


No 30 
>smart00181 EGF Epidermal growth factor-like domain.
Probab=87.54  E-value=0.78  Score=27.92  Aligned_cols=25  Identities=32%  Similarity=0.941  Sum_probs=20.4

Q ss_pred             CCCCCCeeeCCCCCCCCCCCeeeecCCCcccC
Q 044785          248 ICGLNASCHKPKDNTTTSSGYHCKCNEGYEGN  279 (346)
Q Consensus       248 ~C~~~s~C~~~~~~~~~~~gy~C~C~~Gy~gn  279 (346)
                      .|..+ .|.+..+      +|+|.|+.||.++
T Consensus         7 ~C~~~-~C~~~~~------~~~C~C~~g~~g~   31 (35)
T smart00181        7 PCSNG-TCINTPG------SYTCSCPPGYTGD   31 (35)
T ss_pred             CCCCC-EEECCCC------CeEeECCCCCccC
Confidence            56666 8887755      8999999999985


No 31 
>PF12946 EGF_MSP1_1:  MSP1 EGF domain 1;  InterPro: IPR024730 This EGF-like domain is found at the C terminus of the malaria parasite MSP1 protein. MSP1 is the merozoite surface protein 1. This domain is part of the C-terminal fragment that is proteolytically processed from the rest of the protein and is left attached to the surface of the invading parasite.; PDB: 1N1I_C 2FLG_A 1CEJ_A 2NPR_A 1B9W_A 1OB1_F.
Probab=86.26  E-value=0.42  Score=30.20  Aligned_cols=28  Identities=32%  Similarity=0.687  Sum_probs=21.1

Q ss_pred             CCCCCCCeeEecC-CCeEEecCCCCccCC
Q 044785          298 NNCTRTHICDNIP-GSYTCRCRKGFHGDG  325 (346)
Q Consensus       298 ~~C~~~~~C~nt~-G~y~C~C~~G~~~~~  325 (346)
                      ..|..+..|.+.. |++.|+|..||..++
T Consensus         5 ~~cP~NA~C~~~~dG~eecrCllgyk~~~   33 (37)
T PF12946_consen    5 TKCPANAGCFRYDDGSEECRCLLGYKKVG   33 (37)
T ss_dssp             S---TTEEEEEETTSEEEEEE-TTEEEET
T ss_pred             ccCCCCcccEEcCCCCEEEEeeCCccccC
Confidence            4566689999987 999999999998876


No 32 
>cd00054 EGF_CA Calcium-binding EGF-like domain, present in a large number of membrane-bound and extracellular (mostly animal) proteins. Many of these proteins require calcium for their biological function and calcium-binding sites have been found to be located at the N-terminus of particular EGF-like domains; calcium-binding may be crucial for numerous protein-protein interactions. Six conserved core cysteines form three disulfide bridges as in non calcium-binding EGF domains, whose structures are very similar. EGF_CA can be found in tandem repeat arrangements.
Probab=86.04  E-value=0.98  Score=27.56  Aligned_cols=32  Identities=28%  Similarity=0.840  Sum_probs=23.5

Q ss_pred             cCccCCCCCCCCeeeCCCCCCCCCCCeeeecCCCcccC
Q 044785          242 TCEEAKICGLNASCHKPKDNTTTSSGYHCKCNEGYEGN  279 (346)
Q Consensus       242 ~C~~~~~C~~~s~C~~~~~~~~~~~gy~C~C~~Gy~gn  279 (346)
                      .|.....|..++.|.+..+      +|.|.|..||.|.
T Consensus         4 ~C~~~~~C~~~~~C~~~~~------~~~C~C~~g~~g~   35 (38)
T cd00054           4 ECASGNPCQNGGTCVNTVG------SYRCSCPPGYTGR   35 (38)
T ss_pred             cCCCCCCcCCCCEeECCCC------CeEeECCCCCcCC
Confidence            3543235777778987766      8999999999873


No 33 
>PF07974 EGF_2:  EGF-like domain;  InterPro: IPR013111 A sequence of about thirty to forty amino-acid residues found in epidermal growth factor (EGF) has been shown to be present, in a more or less conserved form, in a large number of other, mostly animal proteins. The list of proteins currently known to contain one or more copies of an EGF-like pattern is large and varied. The functional significance of EGF domains in what appear to be unrelated proteins is not yet clear. However, a common feature is that these repeats are found in the extracellular domain of membrane-bound proteins or in proteins known to be secreted (exception: prostaglandin G/H synthase). The EGF domain includes six cysteine residues which have been shown (in EGF) to be involved in disulphide bonds. The main structure is a two-stranded beta-sheet followed by a loop to a C-terminal short two-stranded sheet. Subdomains between the conserved cysteines vary in length. This entry contains EGF domains found in a variety of extracellular and membrane proteins
Probab=80.80  E-value=2.9  Score=25.48  Aligned_cols=25  Identities=28%  Similarity=0.688  Sum_probs=20.3

Q ss_pred             CCCCCCCeeEecCCCeEEecCCCCccC
Q 044785          298 NNCTRTHICDNIPGSYTCRCRKGFHGD  324 (346)
Q Consensus       298 ~~C~~~~~C~nt~G~y~C~C~~G~~~~  324 (346)
                      ..|.++++|++.  ..+|.|.+||.+.
T Consensus         6 ~~C~~~G~C~~~--~g~C~C~~g~~G~   30 (32)
T PF07974_consen    6 NICSGHGTCVSP--CGRCVCDSGYTGP   30 (32)
T ss_pred             CccCCCCEEeCC--CCEEECCCCCcCC
Confidence            357788999876  4689999999875


No 34 
>PF12661 hEGF:  Human growth factor-like EGF; PDB: 2YGQ_A 2E26_A 3A7Q_A 2YGP_A 2YGO_A 1HRE_A 1HAE_A 1HAF_A 1HRF_A.
Probab=76.08  E-value=1.3  Score=21.30  Aligned_cols=12  Identities=42%  Similarity=1.157  Sum_probs=8.8

Q ss_pred             EEecCCCCccCC
Q 044785          314 TCRCRKGFHGDG  325 (346)
Q Consensus       314 ~C~C~~G~~~~~  325 (346)
                      .|.|++||.|..
T Consensus         1 ~C~C~~G~~G~~   12 (13)
T PF12661_consen    1 TCQCPPGWTGPN   12 (13)
T ss_dssp             EEEE-TTEETTT
T ss_pred             CccCcCCCcCCC
Confidence            489999998753


No 35 
>PF14670 FXa_inhibition:  Coagulation Factor Xa inhibitory site; PDB: 3Q3K_B 1NFY_B 1LQD_A 1G2L_B 1IQF_L 2UWP_B 2VH6_B 3KQC_L 2P93_L 2BQW_A ....
Probab=75.37  E-value=1.9  Score=27.15  Aligned_cols=22  Identities=27%  Similarity=0.730  Sum_probs=16.0

Q ss_pred             CeeeCCCCCCCCCCCeeeecCCCcccCC
Q 044785          253 ASCHKPKDNTTTSSGYHCKCNEGYEGNP  280 (346)
Q Consensus       253 s~C~~~~~~~~~~~gy~C~C~~Gy~gnp  280 (346)
                      ..|.+..+      +|+|.|++||...+
T Consensus        10 h~C~~~~g------~~~C~C~~Gy~L~~   31 (36)
T PF14670_consen   10 HICVNTPG------SYRCSCPPGYKLAE   31 (36)
T ss_dssp             SEEEEETT------SEEEE-STTEEE-T
T ss_pred             CCCccCCC------ceEeECCCCCEECc
Confidence            46777765      89999999998764


No 36 
>KOG1225 consensus Teneurin-1 and related extracellular matrix proteins, contain EGF-like repeats [Signal transduction mechanisms; Extracellular structures]
Probab=74.33  E-value=5.9  Score=40.23  Aligned_cols=24  Identities=29%  Similarity=0.855  Sum_probs=13.1

Q ss_pred             CCCCCCCeeEecCCCeEEecCCCCccCC
Q 044785          298 NNCTRTHICDNIPGSYTCRCRKGFHGDG  325 (346)
Q Consensus       298 ~~C~~~~~C~nt~G~y~C~C~~G~~~~~  325 (346)
                      ..|+++++|+  .|  +|.|.+||.++.
T Consensus       316 adC~g~G~Ci--~G--~C~C~~Gy~G~~  339 (525)
T KOG1225|consen  316 ADCSGHGKCI--DG--ECLCDEGYTGEL  339 (525)
T ss_pred             ccCCCCCccc--CC--ceEeCCCCcCCc
Confidence            3555566665  22  566666665543


No 37 
>PF00954 S_locus_glycop:  S-locus glycoprotein family;  InterPro: IPR000858 In Brassicaceae, self-incompatible plants have a self/non-self recognition system, which involves the inability of flowering plants to achieve self-fertilisation. This is sporophytically controlled by multiple alleles at a single locus (S). There are a total of 50 different S alleles in Brassica oleracea. S-locus glycoproteins, as well as S-receptor kinases, are in linkage with the S-alleles. Most of the proteins within this family contain an apple-like domain (IPR003609 from INTERPRO), which is predicted to possess protein- and/or carbohydrate-binding functions.; GO: 0048544 recognition of pollen
Probab=73.24  E-value=5.9  Score=31.27  Aligned_cols=41  Identities=27%  Similarity=0.725  Sum_probs=30.6

Q ss_pred             ceEEeeeecccccCccCCCCCCCCeeeCCCCCCCCCCCeeeecCCCccc
Q 044785          230 PLVLDWEITTKETCEEAKICGLNASCHKPKDNTTTSSGYHCKCNEGYEG  278 (346)
Q Consensus       230 p~~l~W~i~~~~~C~~~~~C~~~s~C~~~~~~~~~~~gy~C~C~~Gy~g  278 (346)
                      ...+-|... .+.|+.-..|..++.|... .      ...|.|.+||+.
T Consensus        68 ~W~~~~~~p-~d~Cd~y~~CG~~g~C~~~-~------~~~C~Cl~GF~P  108 (110)
T PF00954_consen   68 SWSVFWSAP-KDQCDVYGFCGPNGICNSN-N------SPKCSCLPGFEP  108 (110)
T ss_pred             cEEEEEEec-ccCCCCccccCCccEeCCC-C------CCceECCCCcCC
Confidence            344567777 6789887789999999432 2      456999999974


No 38 
>PF07172 GRP:  Glycine rich protein family;  InterPro: IPR010800 This family consists of glycine rich proteins. Some of them may be involved in resistance to environmental stress.
Probab=68.02  E-value=5.1  Score=31.08  Aligned_cols=12  Identities=25%  Similarity=0.119  Sum_probs=5.5

Q ss_pred             CcchhHHHHHHH
Q 044785            1 MPFRLTTKLVVL   12 (346)
Q Consensus         1 M~~~~~~~l~~~   12 (346)
                      |....+++|.++
T Consensus         1 MaSK~~llL~l~   12 (95)
T PF07172_consen    1 MASKAFLLLGLL   12 (95)
T ss_pred             CchhHHHHHHHH
Confidence            654444444333


No 39 
>KOG0994 consensus Extracellular matrix glycoprotein Laminin subunit beta [Extracellular structures]
Probab=65.08  E-value=8.8  Score=42.14  Aligned_cols=64  Identities=36%  Similarity=0.880  Sum_probs=41.6

Q ss_pred             Ceee-ecCCCcccCCCCCC-Ccee-CccCCCCCCCCCCCCCeeEecCCCeEE-ecCCCCccCCCC-CCCCcee
Q 044785          267 GYHC-KCNEGYEGNPYLSD-GCQD-VNECEDPSLNNCTRTHICDNIPGSYTC-RCRKGFHGDGTK-DGRGCIP  334 (346)
Q Consensus       267 gy~C-~C~~Gy~gnp~~~~-gC~d-ideC~~~~~~~C~~~~~C~nt~G~y~C-~C~~G~~~~~~~-~~~~C~~  334 (346)
                      +.+| +|.+||.|.|-..- .|.+ -|+| ++....|   -.|.+..+++.| +|-.||.|++.. .+.+|.|
T Consensus       840 grqCnqCqpG~WgFPeCr~CqCNgHA~~C-d~~tGaC---i~CqD~T~G~~CdrCl~GyyGdP~lg~g~~CrP  908 (1758)
T KOG0994|consen  840 GRQCNQCQPGYWGFPECRPCQCNGHADTC-DPITGAC---IDCQDSTTGHSCDRCLDGYYGDPRLGSGIGCRP  908 (1758)
T ss_pred             hhhccccCCCccCCCcCccccccCccccc-Ccccccc---ccccccccccchhhhhccccCCcccCCCCCCCC
Confidence            5566 68888888774421 2222 3566 3344555   357777888999 899999999976 2344543


No 40 
>PHA03099 epidermal growth factor-like protein (EGF-like protein); Provisional
Probab=63.62  E-value=7.3  Score=31.75  Aligned_cols=37  Identities=22%  Similarity=0.521  Sum_probs=26.0

Q ss_pred             eCccCCCCCCCCCCCCCeeEecC--CCeEEecCCCCccCC
Q 044785          288 DVNECEDPSLNNCTRTHICDNIP--GSYTCRCRKGFHGDG  325 (346)
Q Consensus       288 dideC~~~~~~~C~~~~~C~nt~--G~y~C~C~~G~~~~~  325 (346)
                      +|.+|.....+.|.+ +.|.-..  ..+.|+|+.||.|..
T Consensus        41 ~i~~Cp~ey~~YClH-G~C~yI~dl~~~~CrC~~GYtGeR   79 (139)
T PHA03099         41 AIRLCGPEGDGYCLH-GDCIHARDIDGMYCRCSHGYTGIR   79 (139)
T ss_pred             ccccCChhhCCEeEC-CEEEeeccCCCceeECCCCccccc
Confidence            455665434456763 4887654  788999999999865


No 41 
>PF00954 S_locus_glycop:  S-locus glycoprotein family;  InterPro: IPR000858 In Brassicaceae, self-incompatible plants have a self/non-self recognition system, which involves the inability of flowering plants to achieve self-fertilisation. This is sporophytically controlled by multiple alleles at a single locus (S). There are a total of 50 different S alleles in Brassica oleracea. S-locus glycoproteins, as well as S-receptor kinases, are in linkage with the S-alleles. Most of the proteins within this family contain an apple-like domain (IPR003609 from INTERPRO), which is predicted to possess protein- and/or carbohydrate-binding functions.; GO: 0048544 recognition of pollen
Probab=61.08  E-value=9.1  Score=30.16  Aligned_cols=32  Identities=31%  Similarity=0.751  Sum_probs=23.2

Q ss_pred             ccCCCCCCCCCCCCCeeEecCCCeEEecCCCCccC
Q 044785          290 NECEDPSLNNCTRTHICDNIPGSYTCRCRKGFHGD  324 (346)
Q Consensus       290 deC~~~~~~~C~~~~~C~nt~G~y~C~C~~G~~~~  324 (346)
                      |+|.  .-..|++.+.| +......|.|.+||+..
T Consensus        78 d~Cd--~y~~CG~~g~C-~~~~~~~C~Cl~GF~P~  109 (110)
T PF00954_consen   78 DQCD--VYGFCGPNGIC-NSNNSPKCSCLPGFEPK  109 (110)
T ss_pred             cCCC--CccccCCccEe-CCCCCCceECCCCcCCC
Confidence            4663  24578888999 44556779999999753


No 42 
>KOG1225 consensus Teneurin-1 and related extracellular matrix proteins, contain EGF-like repeats [Signal transduction mechanisms; Extracellular structures]
Probab=57.66  E-value=14  Score=37.60  Aligned_cols=22  Identities=41%  Similarity=1.131  Sum_probs=16.8

Q ss_pred             CCCCCCeeEecCCCeEEecCCCCccCC
Q 044785          299 NCTRTHICDNIPGSYTCRCRKGFHGDG  325 (346)
Q Consensus       299 ~C~~~~~C~nt~G~y~C~C~~G~~~~~  325 (346)
                      .|.+++.|+|  |   |.|..||++..
T Consensus       344 ~C~~~g~cv~--g---C~C~~Gw~G~d  365 (525)
T KOG1225|consen  344 ACSGGGQCVN--G---CKCKKGWRGPD  365 (525)
T ss_pred             ccCCCceecc--C---ceeccCccCCC
Confidence            3777788876  2   99999999755


No 43 
>cd01475 vWA_Matrilin VWA_Matrilin: In cartilaginous plate, extracellular matrix molecules mediate cell-matrix and matrix-matrix interactions thereby providing tissue integrity. Some members of the matrilin family are expressed specifically in developing cartilage rudiments. The matrilin family consists of at least four members. All the members of the matrilin family contain VWA domains, EGF-like domains and a heptad repeat coiled-coil domain at the carboxy terminus which is responsible for the oligomerization of the matrilins. The VWA domains have been shown to be essential for matrilin network formation by interacting with matrix ligands.
Probab=48.10  E-value=14  Score=33.03  Aligned_cols=22  Identities=32%  Similarity=0.717  Sum_probs=17.3

Q ss_pred             CeeeCCCCCCCCCCCeeeecCCCcccCC
Q 044785          253 ASCHKPKDNTTTSSGYHCKCNEGYEGNP  280 (346)
Q Consensus       253 s~C~~~~~~~~~~~gy~C~C~~Gy~gnp  280 (346)
                      ..|.+..+      .|.|.|+.||..++
T Consensus       199 ~~C~~~~g------~~~c~c~~g~~~~~  220 (224)
T cd01475         199 QVCISTPG------SYLCACTEGYALLE  220 (224)
T ss_pred             ceEEcCCC------CEEeECCCCccCCC
Confidence            35776655      89999999998765


No 44 
>PF05887 Trypan_PARP:  Procyclic acidic repetitive protein (PARP);  InterPro: IPR008882 This family consists of several Trypanosoma brucei procyclic acidic repetitive protein (PARP) like sequences. The procyclic acidic repetitive protein (parp) genes of T. brucei encode a small family of abundant surface proteins whose expression is restricted to the procyclic form of the parasite. They are found at two unlinked loci, parpA and parpB; transcription of both loci is developmentally regulated.; GO: 0016020 membrane; PDB: 2X34_B 2X32_B.
Probab=43.98  E-value=7.6  Score=31.86  Aligned_cols=18  Identities=33%  Similarity=0.170  Sum_probs=0.0

Q ss_pred             CcchhHHHHHHHHHHHHH
Q 044785            1 MPFRLTTKLVVLLLVLLR   18 (346)
Q Consensus         1 M~~~~~~~l~~~ll~l~~   18 (346)
                      |.++.+.+|.|||+..++
T Consensus         1 m~pr~l~~LavLL~~A~L   18 (143)
T PF05887_consen    1 MTPRHLCLLAVLLFGAAL   18 (143)
T ss_dssp             ------------------
T ss_pred             Cccccccccccccccccc
Confidence            788888888777776443


No 45 
>PF09064 Tme5_EGF_like:  Thrombomodulin like fifth domain, EGF-like;  InterPro: IPR015149 This domain adopts a fold similar to other EGF domains, with a flat major and a twisted minor beta sheet. Disulphide pairing, however, is not of the usual 1-3, 2-4, 5-6 type; rather 1-2, 3-4, 5-6 pairing is found. Its extended major sheet (strands beta-2 and beta-3 and the connecting loop) projects into thrombin's active site groove. This domain is required for interaction of thrombomodulin with thrombin, and subsequent activation of protein-C.; GO: 0004888 transmembrane signaling receptor activity, 0016021 integral to membrane
Probab=43.76  E-value=19  Score=22.32  Aligned_cols=13  Identities=31%  Similarity=0.644  Sum_probs=11.4

Q ss_pred             eEEecCCCCccCC
Q 044785          313 YTCRCRKGFHGDG  325 (346)
Q Consensus       313 y~C~C~~G~~~~~  325 (346)
                      ..|.||+||..+.
T Consensus        18 ~~C~CPeGyIlde   30 (34)
T PF09064_consen   18 GQCFCPEGYILDE   30 (34)
T ss_pred             CceeCCCceEecC
Confidence            4899999999876


No 46 
>PF08261 Carcinustatin:  Carcinustatin peptide
Probab=42.08  E-value=13  Score=15.27  Aligned_cols=6  Identities=67%  Similarity=1.558  Sum_probs=4.1

Q ss_pred             cCCCCC
Q 044785           42 PYPFGT   47 (346)
Q Consensus        42 pYPFgi   47 (346)
                      ||-||+
T Consensus         3 py~fgl    8 (8)
T PF08261_consen    3 PYSFGL    8 (8)
T ss_pred             cccccC
Confidence            677764


No 47 
>PF14380 WAK_assoc:  Wall-associated receptor kinase C-terminal
Probab=41.82  E-value=40  Score=25.71  Aligned_cols=39  Identities=23%  Similarity=0.681  Sum_probs=27.5

Q ss_pred             cceEEeeeecccccCccCCCCC-CCCeeeCCCCCCCCCCCeeeecCCC
Q 044785          229 FPLVLDWEITTKETCEEAKICG-LNASCHKPKDNTTTSSGYHCKCNEG  275 (346)
Q Consensus       229 ~p~~l~W~i~~~~~C~~~~~C~-~~s~C~~~~~~~~~~~gy~C~C~~G  275 (346)
                      -.++|+|.+. ...|.   .|. ....|.....    ...+.|.|+.|
T Consensus        54 ~GF~L~w~~~-~~~C~---~C~~SgG~Cgy~~~----~~~f~C~C~dg   93 (94)
T PF14380_consen   54 KGFELEWNAD-SGDCR---ECEASGGRCGYDSN----SEQFTCFCSDG   93 (94)
T ss_pred             cCcEEEEeCC-CCcCc---ChhcCCCEeCCCCC----CceEEEECCCC
Confidence            6889999976 67885   577 5677864433    23678988875


No 48 
>PHA02887 EGF-like protein; Provisional
Probab=40.95  E-value=23  Score=28.40  Aligned_cols=35  Identities=29%  Similarity=0.612  Sum_probs=23.3

Q ss_pred             ccCCCCCCCCCCCCCeeEec--CCCeEEecCCCCccCC
Q 044785          290 NECEDPSLNNCTRTHICDNI--PGSYTCRCRKGFHGDG  325 (346)
Q Consensus       290 deC~~~~~~~C~~~~~C~nt--~G~y~C~C~~G~~~~~  325 (346)
                      ++|...-.+.|. ++.|.-.  .....|.|+.||.|.-
T Consensus        84 ~pC~~eyk~YCi-HG~C~yI~dL~epsCrC~~GYtG~R  120 (126)
T PHA02887         84 EKCKNDFNDFCI-NGECMNIIDLDEKFCICNKGYTGIR  120 (126)
T ss_pred             cccChHhhCEee-CCEEEccccCCCceeECCCCcccCC
Confidence            345432345665 5788665  4458899999998864


No 49 
>KOG0994 consensus Extracellular matrix glycoprotein Laminin subunit beta [Extracellular structures]
Probab=40.34  E-value=29  Score=38.39  Aligned_cols=17  Identities=47%  Similarity=1.181  Sum_probs=15.0

Q ss_pred             Ceee-ecCCCcccCCCCC
Q 044785          267 GYHC-KCNEGYEGNPYLS  283 (346)
Q Consensus       267 gy~C-~C~~Gy~gnp~~~  283 (346)
                      |+.| +|..||.|+|.+.
T Consensus       884 G~~CdrCl~GyyGdP~lg  901 (1758)
T KOG0994|consen  884 GHSCDRCLDGYYGDPRLG  901 (1758)
T ss_pred             ccchhhhhccccCCcccC
Confidence            7888 8999999999874


No 50 
>PF08685 GON:  GON domain;  InterPro: IPR012314 In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold. Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, N-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; N, asparagine; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. In the case of the asparagine endopeptidases, the nucleophile is asparagine and all are self-processing endopeptidases. In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding. Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases. The ADAMTSs (a disintegrin and metalloproteinase domain with thrombospondin type-1 modules) are a family of zinc dependent metalloproteinases that play important roles in a variety of normal and pathological conditions. These enzymes show a complex domain organisation including signal sequence, propeptide, metalloproteinase domain (see PDOC50215 from PROSITEDOC), disintegrin-like domain (see PDOC00351 from PROSITEDOC), central TS-1 motif (see PDOC50092 from PROSITEDOC), cysteine-rich region, and a variable number of TS-like repeats at the C-terminal region. The GON domain is an approximately 200-residue module, whose presence is the hallmark of a subfamily of structurally and evolutionarily related ADAMTSs, called GON-ADAMTSs. The GON domain is characterised by the presence of several conserved cysteine residues and is likely to be globular. Some proteins known to contain a GON domain are listed below: mammalian ADAMTS-9; mammalian ADAMTS-20; Caenorhabditis elegans gon-1, a protease required for gonadal morphogenesis. Proteins containing the GON domain belong to MEROPS peptidase subfamily M12B (adamalysin, clan MA).; GO: 0004222 metalloendopeptidase activity, 0008270 zinc ion binding
Probab=34.68  E-value=1.1e+02  Score=27.14  Aligned_cols=59  Identities=19%  Similarity=0.365  Sum_probs=36.0

Q ss_pred             ecccccCC-cccccccceeCCCceEeccCCCEEEEEcccceeeeeeccCCcceeeceEeec
Q 044785           96 AKECYRKG-NSVDSYSPTFSLSKFTVSNTENRFVVIGCDSYAYVRGYLGENRYRAGCMSMC  155 (346)
Q Consensus        96 ~~~c~~~~-~~~~~~~~~l~~~pf~~s~~~n~~~~~gC~~~a~l~~~~~~~~~~~gC~s~C  155 (346)
                      .-+|+... -.....+++|.+++|.|++ .-+++..|......+.-...+.....-|.=+|
T Consensus       126 AGDCyS~~~CpqG~FsIdL~GTgf~vs~-~~~W~~~G~~a~~~i~~s~~~q~v~g~CGGyC  185 (201)
T PF08685_consen  126 AGDCYSAARCPQGRFSIDLRGTGFRVSP-DTKWVTQGNYAVGKINRSPDGQKVSGRCGGYC  185 (201)
T ss_pred             cccccccCCCCCceEEEeeCCCceEecC-CCEEEeCCcEeEEEEEEcCCCcEEEEEeCccC
Confidence            34566541 1222346799999999998 56788889876666542211244444465554


No 51 
>COG2991 Uncharacterized protein conserved in bacteria [Function unknown]
Probab=32.64  E-value=73  Score=23.29  Aligned_cols=16  Identities=25%  Similarity=0.671  Sum_probs=9.3

Q ss_pred             CCCCCccccCCCCCCCCCC
Q 044785           34 SRCGDVEIPYPFGTRPGCF   52 (346)
Q Consensus        34 ~~CG~v~IpYPFgig~~C~   52 (346)
                      -+||.+.-   .||..-|-
T Consensus        31 GSCGGi~a---lGi~K~Cd   46 (77)
T COG2991          31 GSCGGIAA---LGIEKVCD   46 (77)
T ss_pred             cccccHHh---hccchhcC
Confidence            38997742   36655443


No 52 
>PHA02887 EGF-like protein; Provisional
Probab=31.14  E-value=42  Score=26.98  Aligned_cols=35  Identities=34%  Similarity=0.757  Sum_probs=24.9

Q ss_pred             ccCccC--CCCCCCCeeeCCCCCCCCCCCeeeecCCCcccCC
Q 044785          241 ETCEEA--KICGLNASCHKPKDNTTTSSGYHCKCNEGYEGNP  280 (346)
Q Consensus       241 ~~C~~~--~~C~~~s~C~~~~~~~~~~~gy~C~C~~Gy~gnp  280 (346)
                      ..|++.  ..|. |..|.....    .....|.|..||.|..
T Consensus        84 ~pC~~eyk~YCi-HG~C~yI~d----L~epsCrC~~GYtG~R  120 (126)
T PHA02887         84 EKCKNDFNDFCI-NGECMNIID----LDEKFCICNKGYTGIR  120 (126)
T ss_pred             cccChHhhCEee-CCEEEcccc----CCCceeECCCCcccCC
Confidence            357654  4565 678876655    4468999999999863


No 53 
>PF10916 DUF2712:  Protein of unknown function (DUF2712);  InterPro: IPR020208 This entry represents a group of uncharacterised proteins.
Probab=24.31  E-value=77  Score=26.43  Aligned_cols=16  Identities=31%  Similarity=0.600  Sum_probs=13.7

Q ss_pred             CccccCCCCCCCCCCC
Q 044785           38 DVEIPYPFGTRPGCFL   53 (346)
Q Consensus        38 ~v~IpYPFgig~~C~~   53 (346)
                      +-+|+|=|-|++.|.-
T Consensus        32 dn~i~F~F~i~~~~an   47 (146)
T PF10916_consen   32 DNNIPFSFTIKPNQAN   47 (146)
T ss_pred             ccCCceEEEeCCcccc
Confidence            6789999999998875


No 54 
>PRK02710 plastocyanin; Provisional
Probab=23.49  E-value=87  Score=25.00  Aligned_cols=17  Identities=29%  Similarity=0.313  Sum_probs=10.0

Q ss_pred             CcchhHHHHHHHHHHHH
Q 044785            1 MPFRLTTKLVVLLLVLL   17 (346)
Q Consensus         1 M~~~~~~~l~~~ll~l~   17 (346)
                      |+.++.+++..+|++++
T Consensus         1 ~~~~~~~~~~~~~~~~~   17 (119)
T PRK02710          1 MAKRLRSIAAALVAVVS   17 (119)
T ss_pred             CchhHHHHHHHHHHHHH
Confidence            66766666555555443


No 55 
>smart00051 DSL delta serrate ligand.
Probab=22.15  E-value=1.3e+02  Score=21.36  Aligned_cols=46  Identities=24%  Similarity=0.524  Sum_probs=25.3

Q ss_pred             CeeeecCCCcccCCCCCCCceeCccCCCCCCCCCCCCCeeEecCCCeEEecCCCCccC
Q 044785          267 GYHCKCNEGYEGNPYLSDGCQDVNECEDPSLNNCTRTHICDNIPGSYTCRCRKGFHGD  324 (346)
Q Consensus       267 gy~C~C~~Gy~gnp~~~~gC~dideC~~~~~~~C~~~~~C~nt~G~y~C~C~~G~~~~  324 (346)
                      .++-.|.++|.|....       ..|..  .+.+..+..|.. .|  .|.|.+||.+.
T Consensus        16 ~~rv~C~~~~yG~~C~-------~~C~~--~~d~~~~~~Cd~-~G--~~~C~~Gw~G~   61 (63)
T smart00051       16 QIRVTCDENYYGEGCN-------KFCRP--RDDFFGHYTCDE-NG--NKGCLEGWMGP   61 (63)
T ss_pred             EEEeeCCCCCcCCccC-------CEeCc--CccccCCccCCc-CC--CEecCCCCcCC
Confidence            3556788888886432       12311  112333556632 33  46789999864


No 56 
>KOG1836 consensus Extracellular matrix glycoprotein Laminin subunits alpha and gamma [Extracellular structures]
Probab=20.60  E-value=90  Score=36.63  Aligned_cols=52  Identities=29%  Similarity=0.701  Sum_probs=30.5

Q ss_pred             Ceee-ecCCCcccCCCCCCCceeCccCCCCCCCCCCCCCeeEec--CCCeEEe-cCCCCccCC
Q 044785          267 GYHC-KCNEGYEGNPYLSDGCQDVNECEDPSLNNCTRTHICDNI--PGSYTCR-CRKGFHGDG  325 (346)
Q Consensus       267 gy~C-~C~~Gy~gnp~~~~gC~dideC~~~~~~~C~~~~~C~nt--~G~y~C~-C~~G~~~~~  325 (346)
                      |-+| +|.+||.|++.. +.-.|   |   ..=+|...+.|..+  .....|. ||+||+|..
T Consensus       755 G~~C~~C~~GfYg~~~~-~~~~d---C---~~C~Cp~~~~~~~~~~~~~~iCk~Cp~gytG~r  810 (1705)
T KOG1836|consen  755 GGQCAQCVDGFYGLPDL-GTSGD---C---QPCPCPNGGACGQTPEILEVVCKNCPPGYTGLR  810 (1705)
T ss_pred             CCchhhhcCCCCCcccc-CCCCC---C---ccCCCCCChhhcCcCcccceecCCCCCCCcccc
Confidence            4456 789999998865 11111   4   22345545555544  3456677 888877643


Done!