Query         016022
Match_columns 396
No_of_seqs    135 out of 194
Neff          6.1 
Searched_HMMs 46136
Date          Fri Mar 29 03:04:44 2013
Command       hhsearch -i /work/01045/syshi/csienesis_hhblits_a3m/016022.a3m -d /work/01045/syshi/HHdatabase/Cdd.hhm -o /work/01045/syshi/hhsearch_cdd/016022hhsearch_cdd -cpu 12 -v 0 

 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 PF09402 MSC:  Man1-Src1p-C-ter 100.0 1.1E-61 2.4E-66  482.5   4.8  275   71-349    18-334 (334)
  2 PF12946 EGF_MSP1_1:  MSP1 EGF   95.4   0.006 1.3E-07   42.0   0.9   28   94-121     5-37  (37)
  3 PF01683 EB:  EB module;  Inter  72.4       4 8.6E-05   29.5   2.8   26   92-120    27-52  (52)
  4 PF13314 DUF4083:  Domain of un  64.8      29 0.00063   26.3   6.0   18  261-278    40-57  (58)
  5 PF06667 PspB:  Phage shock pro  62.8      30 0.00066   27.6   6.2   31  242-272    13-51  (75)
  6 PF07127 Nodulin_late:  Late no  58.8      15 0.00033   27.0   3.7   26   73-113    26-52  (54)
  7 PTZ00382 Variant-specific surf  57.4      16 0.00035   30.3   4.0   34   90-123    19-56  (96)
  8 PF06387 Calcyon:  D1 dopamine   55.1      15 0.00032   34.0   3.6   16  109-124   113-128 (186)
  9 PF07645 EGF_CA:  Calcium-bindi  54.7     6.8 0.00015   27.2   1.1   21   95-115    11-35  (42)
 10 COG2976 Uncharacterized protei  51.5      52  0.0011   31.2   6.7   49  227-277    14-65  (207)
 11 KOG0196 Tyrosine kinase, EPH (  45.5      17 0.00036   41.1   2.9   41   74-116   276-319 (996)
 12 PF02009 Rifin_STEVOR:  Rifin/s  45.1      32  0.0007   34.4   4.6   16  248-263   274-289 (299)
 13 PF01102 Glycophorin_A:  Glycop  44.2      33 0.00071   30.0   3.9   20  236-255    68-87  (122)
 14 KOG1214 Nidogen and related ba  41.6      16 0.00035   41.3   2.0   34   91-124   828-867 (1289)
 15 PF06864 PAP_PilO:  Pilin acces  41.5      50  0.0011   34.3   5.6   14  290-303   220-233 (414)
 16 PF12947 EGF_3:  EGF domain;  I  41.3      15 0.00031   25.1   1.1   26   95-120     7-36  (36)
 17 PF04891 NifQ:  NifQ;  InterPro  41.1      30 0.00065   31.8   3.4   16   89-104   152-167 (167)
 18 PF08563 P53_TAD:  P53 transact  40.0      13 0.00027   23.6   0.5   14  176-189     8-21  (25)
 19 TIGR02976 phageshock_pspB phag  39.4 1.3E+02  0.0028   24.0   6.4   28  245-272    16-51  (75)
 20 PF01826 TIL:  Trypsin Inhibito  38.2      17 0.00036   26.5   1.1   26   96-124    27-53  (55)
 21 smart00179 EGF_CA Calcium-bind  37.7      31 0.00067   22.4   2.3   26   94-120     9-38  (39)
 22 PRK09458 pspB phage shock prot  37.2      86  0.0019   25.1   5.0   30  243-272    14-51  (75)
 23 PF07974 EGF_2:  EGF-like domai  36.2      33 0.00072   22.7   2.2   20   95-114     7-28  (32)
 24 PRK11677 hypothetical protein;  34.6 1.5E+02  0.0032   26.3   6.6    7  265-271    52-58  (134)
 25 cd00053 EGF Epidermal growth f  34.0      40 0.00086   20.9   2.3   25   95-120     7-35  (36)
 26 PF06143 Baculo_11_kDa:  Baculo  33.8 2.7E+02  0.0058   22.8   7.9   22  230-251    32-53  (84)
 27 PF10576 EndIII_4Fe-2S:  Iron-s  32.8      19 0.00042   20.6   0.5   14   89-102     4-17  (17)
 28 PF05568 ASFV_J13L:  African sw  32.7 1.1E+02  0.0024   27.7   5.5   10  230-239    26-35  (189)
 29 PRK07597 secE preprotein trans  32.5      82  0.0018   23.9   4.1   28   38-65     25-52  (64)
 30 TIGR00964 secE_bact preprotein  32.0      86  0.0019   23.1   4.1   27   39-65     17-43  (55)
 31 PF00558 Vpu:  Vpu protein;  In  32.0      50  0.0011   26.8   2.9   19  256-274    30-48  (81)
 32 PF07271 Cytadhesin_P30:  Cytad  31.6 1.5E+02  0.0032   29.5   6.6   17  260-276   104-120 (279)
 33 PF06679 DUF1180:  Protein of u  30.9 1.9E+02  0.0042   26.4   6.9   31   35-65     84-114 (163)
 34 PF07543 PGA2:  Protein traffic  30.1 1.5E+02  0.0032   26.5   5.9   12  293-304    62-73  (140)
 35 PF11044 TMEMspv1-c74-12:  Plec  29.7 2.1E+02  0.0046   20.7   5.4   15  237-251     7-21  (49)
 36 COG0690 SecE Preprotein transl  28.3 1.2E+02  0.0026   23.9   4.5   28   38-65     35-62  (73)
 37 KOG4403 Cell surface glycoprot  28.0 2.9E+02  0.0063   29.3   8.3   11  157-167   117-127 (575)
 38 PHA03399 pif3 per os infectivi  27.4      73  0.0016   30.2   3.6   21   94-114    58-86  (200)
 39 KOG0474 Cl- channel CLC-7 and   27.3      90  0.0019   34.6   4.7   24   90-113   396-420 (762)
 40 PF07466 DUF1517:  Protein of u  26.5 1.8E+02  0.0039   29.0   6.5   23   43-65     62-84  (289)
 41 PF14316 DUF4381:  Domain of un  26.0 1.7E+02  0.0036   25.8   5.6   15  269-283    70-84  (146)
 42 PF00584 SecE:  SecE/Sec61-gamm  25.6 1.6E+02  0.0035   21.5   4.6   21   39-59     18-38  (57)
 43 PF09064 Tme5_EGF_like:  Thromb  25.2      55  0.0012   22.3   1.7   21   95-117     7-30  (34)
 44 PF03672 UPF0154:  Uncharacteri  25.1   3E+02  0.0065   21.4   6.0   18  243-260    10-27  (64)
 45 PF06247 Plasmod_Pvs28:  Plasmo  24.9      27 0.00059   32.7   0.3   31   95-125    51-90  (197)
 46 PF08114 PMP1_2:  ATPase proteo  24.0 1.9E+02  0.0042   20.5   4.3   18  246-263    20-37  (43)
 47 PF14991 MLANA:  Protein melan-  23.9      19 0.00041   31.0  -0.8   22  241-262    31-54  (118)
 48 PF01102 Glycophorin_A:  Glycop  23.9      88  0.0019   27.3   3.3   22  237-258    73-94  (122)
 49 PF09402 MSC:  Man1-Src1p-C-ter  23.4      27 0.00059   34.8   0.0   70  262-331    98-174 (334)
 50 PHA02673 ORF109 EEV glycoprote  22.5 1.3E+02  0.0028   27.5   4.1   22   45-66     35-56  (161)
 51 PF12729 4HB_MCP_1:  Four helix  22.1 3.7E+02  0.0081   22.6   7.1   10  297-306    63-72  (181)
 52 PRK15428 putative propanediol   21.7      81  0.0017   28.9   2.7   31  263-301     4-34  (163)
 53 PF12662 cEGF:  Complement Clr-  21.4      52  0.0011   20.5   1.0   16  107-122     4-21  (24)
 54 PF15050 SCIMP:  SCIMP protein   21.3 2.3E+02  0.0049   24.9   5.2   13  230-242     3-15  (133)
 55 PF11392 DUF2877:  Protein of u  21.2      52  0.0011   27.9   1.3   11   35-45      5-15  (110)
 56 PF10500 SR-25:  Nuclear RNA-sp  21.1      66  0.0014   30.9   2.1    9  177-185   159-167 (225)
 57 PF10588 NADH-G_4Fe-4S_3:  NADH  20.9      44 0.00095   23.3   0.7   16   88-103    11-26  (41)
 58 cd00033 CCP Complement control  20.1      61  0.0013   22.5   1.3   20  106-125    26-48  (57)
 59 PRK09400 secE preprotein trans  20.1   2E+02  0.0044   22.0   4.2   19   40-58     27-45  (61)

No 1  
>PF09402 MSC:  Man1-Src1p-C-terminal domain;  InterPro: IPR018996 This entry represents the Inner nuclear membrane proteins MAN1 (also known as LEM domain-containing protein 3) and LEM domain-containing protein 2 (or LEM protein 2). Emerin and MAN1 are LEM domain-containing integral membrane proteins of the vertebrate nuclear envelope []. MAN1 is an integral protein of the inner nuclear membrane which binds to chromatin associated proteins and plays a role in nuclear organisation. The C-terminal nucleoplasmic region forms a DNA binding winged helix and binds to Smad []. LEM protein 2 is an essential protein involved in chromosome segregation and cell division, probably via its interaction with lmn-1, the main component of nuclear lamina. Has some overlapping function with emr-1.; GO: 0005639 integral to nuclear inner membrane; PDB: 2CH0_A.
Probab=100.00  E-value=1.1e-61  Score=482.48  Aligned_cols=275  Identities=28%  Similarity=0.423  Sum_probs=71.3

Q ss_pred             CCCCCCCCCCCCCCC------------CCCCCCCCccCCCCceecCC-eeeeCCCceec-----------CCCcccChhh
Q 016022           71 STSKPFCDSNLLLDS------------PQSPTDSCEPCPSNGECHQG-KLECFHGYRKH-----------GKLCVEDGDI  126 (396)
Q Consensus        71 ~~~~~fCds~~~~~~------------~~~~~p~C~PCPehAiC~~g-~l~C~~gYvl~-----------~~~CV~D~~k  126 (396)
                      +...||||++.+..+            ...++|+|+|||+||+|++| ++.|++||++.           +++||+|+++
T Consensus        18 ~~~vgyC~~~~~~~~~~~~~~~~~~~~~~~~~P~C~pCP~~a~C~~~~~~~C~~~y~~~~~~l~~~g~~p~~~Ci~D~~k   97 (334)
T PF09402_consen   18 KIAVGYCGTESPSPSFADDDISVPDWLLENFKPSCEPCPEHAICYPGLKLECEPGYVLKPSPLSLFGLIPPPKCIPDTEK   97 (334)
T ss_dssp             --------------------------------------------------------------------------------
T ss_pred             cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccHHH
Confidence            468999999972211            14578999999999999999 99999999998           9999999999


Q ss_pred             hHHHHHHHHHHHHHHHHHhhcccccC---CCCcccchhhHHHHhhhhhhhhccCCChHHHHHHHHHHHHHHHhhhhhccc
Q 016022          127 NETAGRLSRWVENRLCRAYAQFLCDG---TGSIWVEENDIWNDLEGHELMKIFELDNPVYLYTKKRTMETVGRYLESRTN  203 (396)
Q Consensus       127 ~~~i~~l~~~i~~~Lr~~~a~~~CG~---~~s~~i~e~dL~~~~~e~~~~k~~~l~~~~fe~l~~~al~~l~~~l~~~~~  203 (396)
                      ++.+.+|++++.++||++||+++||.   ..+.+|+++||++++.+   ++++++++++|+++|..++..+.+.-+..+.
T Consensus        98 ~~~i~~l~~~~~~~Lr~~~a~~~Cg~~~~~~~~~ls~~el~~~~~~---~~~~~~~~~efe~l~~~a~~~L~~~~ei~~~  174 (334)
T PF09402_consen   98 EEKIEELAKKILDELRERNAQYECGDSEDDESPGLSEEELKDILSS---KKSPWISDEEFEELWSAALQELKKNPEIIIR  174 (334)
T ss_dssp             --------------------------------------------------------------------------------
T ss_pred             HHHHHHHHHHHHHHHHHHHhhcccCCCCCCCCCCCcHHHHHHHHHh---ccCccccHHHHHHHHHHHHHHHHhCCcEEEe
Confidence            99999999999999999999999993   34678999999999998   7788999999999999999888743222221


Q ss_pred             ---------CCCceeeecccccccCccCccchhHHH----HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Q 016022          204 ---------SYGMKELKCPELLAEHYKPLSCRIHQW----VSTHALIIVPVCSLLVGCLLLLWKVHRRRYFAIRVEELYH  270 (396)
Q Consensus       204 ---------sn~~~~~k~~~~~S~~~i~l~Crir~~----i~~~~~~i~~~l~llv~i~~l~~~~~r~~~e~~~v~~Lv~  270 (396)
                               .+.........+++++++||+|++++.    +.+++.+++++++++++++++++++++++.++++|++||+
T Consensus       175 ~~~~~~~~~~~~~~~~~~~~s~s~~~lpl~C~~~~~i~~~~~~~~~~i~~~~~~~~~~~~~~~~~~~~~~~~~~v~~lv~  254 (334)
T PF09402_consen  175 DDIINSHSSDDSNEKDKYFRSSSLPYLPLKCRLRRQIRQFISRYRLIILGVLILLLLIKYIRYRYRKRREEKARVEELVK  254 (334)
T ss_dssp             -----------------------------------------------------------------STHHHHHTTTTTTHH
T ss_pred             cccccccccccccCCcEEEEeeCCCccccEEEEehHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Confidence                     001111112224689999999976555    4556667777777777777778888888999999999999


Q ss_pred             HHHHHHHHHHhhhhccCCCCCCcccccccccccCCCCCc--cchhhHHHHHHHHhcCCCcceeeeEEcCceeeeeEEeec
Q 016022          271 QVCEILEENALMSKSVNGECEPWVVASRLRDHLLLPKER--KDPVIWKKVEELVQEDSRVDQYPKLLKGESKVVWEWQVE  348 (396)
Q Consensus       271 ~vl~~L~~q~~~~~~~~~~~~pyl~~~qLRD~LL~~~~r--~r~~LW~kV~k~Ve~nSnIrt~~~ei~GE~~~vWEWig~  348 (396)
                      +|+++|++|+..+ ..+..++|||+++||||+||.++++  ++++||++|+++||+|||||++++|+|||+|+||||||+
T Consensus       255 ~ii~~L~~~~~~~-~~~~~~~p~v~~~qLRD~ll~~~~~~~~~~~lW~~v~~~ve~ns~Vr~~~~e~~Ge~~~vWeWig~  333 (334)
T PF09402_consen  255 KIIDRLQDQARAS-DPNSSPEPYVSISQLRDDLLPPEHRLKRRNRLWKKVVKKVEENSNVRTEVREVHGEIMRVWEWIGP  333 (334)
T ss_dssp             HHHHHHHHHHHHH-TTSS-S-S-B-HHHHHHTT--STTGGG-GHHHHHHHHHHHTT---SEEEEEEETTEEEEEEE----
T ss_pred             HHHHHHHHHhhhh-ccCCCCCCCccHHHHHHHhCCcccCHHHHHHHHHHHHHHHHcCCCeeEEEEEECCeEEEEEEecCC
Confidence            9999999999843 3446789999999999999987653  379999999999999999999999999999999999997


Q ss_pred             C
Q 016022          349 G  349 (396)
Q Consensus       349 ~  349 (396)
                      +
T Consensus       334 ~  334 (334)
T PF09402_consen  334 N  334 (334)
T ss_dssp             -
T ss_pred             C
Confidence            5


No 2  
>PF12946 EGF_MSP1_1:  MSP1 EGF domain 1;  InterPro: IPR024730 This EGF-like domain is found at the C terminus of the malaria parasite MSP1 protein. MSP1 is the merozoite surface protein 1. This domain is part of the C-terminal fragment that is proteolytically processed from the rest of the protein and is left attached to the surface of the invading parasite [].; PDB: 1N1I_C 2FLG_A 1CEJ_A 2NPR_A 1B9W_A 1OB1_F.
Probab=95.43  E-value=0.006  Score=42.00  Aligned_cols=28  Identities=39%  Similarity=0.954  Sum_probs=20.2

Q ss_pred             ccCCCCceecCC----e-eeeCCCceecCCCcc
Q 016022           94 EPCPSNGECHQG----K-LECFHGYRKHGKLCV  121 (396)
Q Consensus        94 ~PCPehAiC~~g----~-l~C~~gYvl~~~~CV  121 (396)
                      ++||+||-|+.+    + -.|..||.+.+.+|+
T Consensus         5 ~~cP~NA~C~~~~dG~eecrCllgyk~~~~~C~   37 (37)
T PF12946_consen    5 TKCPANAGCFRYDDGSEECRCLLGYKKVGGKCV   37 (37)
T ss_dssp             S---TTEEEEEETTSEEEEEE-TTEEEETTEEE
T ss_pred             ccCCCCcccEEcCCCCEEEEeeCCccccCCCcC
Confidence            589999999864    3 399999999999886


No 3  
>PF01683 EB:  EB module;  InterPro: IPR006149  The EB domain has no known function. It is found in several Caenorhabditis sp. and Drosophila sp. proteins. The domain contains 8 conserved cysteines that probably form four disulphide bridges and is found associated with kunitz domains IPR002223 from INTERPRO 
Probab=72.44  E-value=4  Score=29.52  Aligned_cols=26  Identities=31%  Similarity=0.782  Sum_probs=23.0

Q ss_pred             CCccCCCCceecCCeeeeCCCceecCCCc
Q 016022           92 SCEPCPSNGECHQGKLECFHGYRKHGKLC  120 (396)
Q Consensus        92 ~C~PCPehAiC~~g~l~C~~gYvl~~~~C  120 (396)
                      +|+   .++.|.+|.=.|.+||+....+|
T Consensus        27 qC~---~~s~C~~g~C~C~~g~~~~~~~C   52 (52)
T PF01683_consen   27 QCI---GGSVCVNGRCQCPPGYVEVGGRC   52 (52)
T ss_pred             CCC---CcCEEcCCEeECCCCCEecCCCC
Confidence            555   99999998889999999988877


No 4  
>PF13314 DUF4083:  Domain of unknown function (DUF4083)
Probab=64.77  E-value=29  Score=26.34  Aligned_cols=18  Identities=22%  Similarity=0.289  Sum_probs=11.1

Q ss_pred             HHHHHHHHHHHHHHHHHH
Q 016022          261 FAIRVEELYHQVCEILEE  278 (396)
Q Consensus       261 e~~~v~~Lv~~vl~~L~~  278 (396)
                      ....+++=.+.+++.|++
T Consensus        40 ~~~~~eqKLDrIIeLLEK   57 (58)
T PF13314_consen   40 DVDSMEQKLDRIIELLEK   57 (58)
T ss_pred             chhHHHHHHHHHHHHHcc
Confidence            333566666677777754


No 5  
>PF06667 PspB:  Phage shock protein B;  InterPro: IPR009554 This family consists of several bacterial phage shock protein B (PspB) sequences. The phage shock protein (psp) operon is induced in response to heat, ethanol, osmotic shock and infection by filamentous bacteriophages []. Expression of the operon requires the alternative sigma factor sigma54 and the transcriptional activator PspF. In addition, PspA plays a negative regulatory role, and the integral-membrane proteins PspB and PspC play a positive one [].; GO: 0006355 regulation of transcription, DNA-dependent, 0009271 phage shock
Probab=62.82  E-value=30  Score=27.60  Aligned_cols=31  Identities=26%  Similarity=0.323  Sum_probs=18.2

Q ss_pred             HHHHHHHHHH-HHHHHHHH-------HHHHHHHHHHHHH
Q 016022          242 CSLLVGCLLL-LWKVHRRR-------YFAIRVEELYHQV  272 (396)
Q Consensus       242 l~llv~i~~l-~~~~~r~~-------~e~~~v~~Lv~~v  272 (396)
                      .+++|+..|+ .+|..+++       .+.++.++|++.+
T Consensus        13 f~ifVap~WL~lHY~sk~~~~~gLs~~d~~~L~~L~~~a   51 (75)
T PF06667_consen   13 FMIFVAPIWLILHYRSKWKSSQGLSEEDEQRLQELYEQA   51 (75)
T ss_pred             HHHHHHHHHHHHHHHHhcccCCCCCHHHHHHHHHHHHHH
Confidence            3445555555 56665554       4666677777765


No 6  
>PF07127 Nodulin_late:  Late nodulin protein;  InterPro: IPR009810 This family consists of several plant specific late nodulin sequences which are homologous to the Pisum sativum (Garden pea) ENOD3 protein. ENOD3 is expressed in the late stages of root nodule formation and contains two pairs of cysteine residues toward the protein's C terminus which may be involved in metal-binding [].; GO: 0046872 metal ion binding, 0009878 nodule morphogenesis
Probab=58.76  E-value=15  Score=27.03  Aligned_cols=26  Identities=19%  Similarity=0.586  Sum_probs=18.8

Q ss_pred             CCCCCCCCCCCCCCCCCCCCCccCCCCceecCC-eeeeCCCc
Q 016022           73 SKPFCDSNLLLDSPQSPTDSCEPCPSNGECHQG-KLECFHGY  113 (396)
Q Consensus        73 ~~~fCds~~~~~~~~~~~p~C~PCPehAiC~~g-~l~C~~gY  113 (396)
                      ....|.++.             .||.+  |..+ ..+|..|+
T Consensus        26 ~~~~C~~d~-------------DCp~~--c~~~~~~kCi~~~   52 (54)
T PF07127_consen   26 AIIPCKTDS-------------DCPKD--CPPPFIPKCINNI   52 (54)
T ss_pred             CCcccCccc-------------cCCCC--CCCCcCcEeCcCC
Confidence            457888874             78888  8777 45887663


No 7  
>PTZ00382 Variant-specific surface protein (VSP); Provisional
Probab=57.42  E-value=16  Score=30.32  Aligned_cols=34  Identities=24%  Similarity=0.574  Sum_probs=23.5

Q ss_pred             CCCCccCCC--CceecCCee--eeCCCceecCCCcccC
Q 016022           90 TDSCEPCPS--NGECHQGKL--ECFHGYRKHGKLCVED  123 (396)
Q Consensus        90 ~p~C~PCPe--hAiC~~g~l--~C~~gYvl~~~~CV~D  123 (396)
                      ...|.+||.  =+.|.....  .|..||.+..+.|+.+
T Consensus        19 ~~~C~~C~~~~C~~C~~~~~C~~C~~GY~~~~~~Cv~~   56 (96)
T PTZ00382         19 GSGCVLCSVGNCKSCVVDGVCGECNSGFSLDNGKCVSS   56 (96)
T ss_pred             CCcCCcCCCCCCcCCCCCCccccCcCCcccCCCccccc
Confidence            346999985  234433322  7999999999989863


No 8  
>PF06387 Calcyon:  D1 dopamine receptor-interacting protein (calcyon);  InterPro: IPR009431 This family consists of several D1 dopamine receptor-interacting (calcyon) proteins. D1/D5 dopamine receptors in the basal ganglia, hippocampus, and cerebral cortex modulate motor, reward, and cognitive behaviour. D1-like dopamine receptors likely modulate neocortical and hippocampal neuronal excitability and synaptic function via Ca2+ as well as cAMP-dependent signalling []. Defective calcyon proteins have been implicated in both attention-deficit/hyperactivity disorder (ADHD) [] and schizophrenia.; GO: 0050780 dopamine receptor binding, 0007212 dopamine receptor signaling pathway, 0016021 integral to membrane
Probab=55.05  E-value=15  Score=33.98  Aligned_cols=16  Identities=25%  Similarity=0.455  Sum_probs=11.5

Q ss_pred             eCCCceecCCCcccCh
Q 016022          109 CFHGYRKHGKLCVEDG  124 (396)
Q Consensus       109 C~~gYvl~~~~CV~D~  124 (396)
                      |-+||++..+.|+|-+
T Consensus       113 CPdGFv~khk~C~P~~  128 (186)
T PF06387_consen  113 CPDGFVLKHKRCTPLT  128 (186)
T ss_pred             CCCcceeecccccchh
Confidence            4458888888888744


No 9  
>PF07645 EGF_CA:  Calcium-binding EGF domain;  InterPro: IPR001881 A sequence of about forty amino-acid residues found in epidermal growth factor (EGF) has been shown [, , , , , ] to be present in a large number of membrane-bound and extracellular, mostly animal, proteins. Many of these proteins require calcium for their biological function and a calcium-binding site has been found at the N terminus of some EGF-like domains []. Calcium-binding may be crucial for numerous protein-protein interactions. For human coagulation factor IX it has been shown [] that the calcium-ligands form a pentagonal bipyramid. The first, third and fourth conserved negatively charged or polar residues are side chain ligands. The latter is possibly hydroxylated (see aspartic acid and asparagine hydroxylation site) []. A conserved aromatic residue, as well as the second conserved negative residue, are thought to be involved in stabilising the calcium-binding site. As in non-calcium binding EGF-like domains, there are six conserved cysteines and the structure of both types is very similar as calcium-binding induces only strictly local structural changes [].  +------------------+ +---------+ | | | | nxnnC-x(3,14)-C-x(3,7)-CxxbxxxxaxC-x(1,6)-C-x(8,13)-Cx | | +------------------+ 'n': negatively charged or polar residue [DEQN] 'b': possibly beta-hydroxylated residue [DN] 'a': aromatic amino acid 'C': cysteine, involved in disulphide bond 'x': any amino acid. ; GO: 0005509 calcium ion binding; PDB: 2VJ3_A 1TOZ_A 1LMJ_A 1UZQ_A 1UZK_A 1UZJ_B 1UZP_A 1EMO_A 1EMN_A 2RR0_A ....
Probab=54.72  E-value=6.8  Score=27.15  Aligned_cols=21  Identities=38%  Similarity=0.911  Sum_probs=17.6

Q ss_pred             cCCCCceecCC--ee--eeCCCcee
Q 016022           95 PCPSNGECHQG--KL--ECFHGYRK  115 (396)
Q Consensus        95 PCPehAiC~~g--~l--~C~~gYvl  115 (396)
                      +|+.++.|.+-  ..  .|.+||..
T Consensus        11 ~C~~~~~C~N~~Gsy~C~C~~Gy~~   35 (42)
T PF07645_consen   11 NCPENGTCVNTEGSYSCSCPPGYEL   35 (42)
T ss_dssp             SSSTTSEEEEETTEEEEEESTTEEE
T ss_pred             cCCCCCEEEcCCCCEEeeCCCCcEE
Confidence            68999999876  33  99999994


No 10 
>COG2976 Uncharacterized protein conserved in bacteria [Function unknown]
Probab=51.54  E-value=52  Score=31.23  Aligned_cols=49  Identities=12%  Similarity=0.346  Sum_probs=24.1

Q ss_pred             hHHHHHHHHHHHHHHHHHHHHHH-HHHHHHH-HHHHHH-HHHHHHHHHHHHHHH
Q 016022          227 IHQWVSTHALIIVPVCSLLVGCL-LLLWKVH-RRRYFA-IRVEELYHQVCEILE  277 (396)
Q Consensus       227 ir~~i~~~~~~i~~~l~llv~i~-~l~~~~~-r~~~e~-~~v~~Lv~~vl~~L~  277 (396)
                      ++.|++.+-..++.  ++++|+. ++-|.++ .++.++ +.....|+.+++.++
T Consensus        14 ik~wwkeNGk~li~--gviLg~~~lfGW~ywq~~q~~q~~~AS~~Y~~~i~~~~   65 (207)
T COG2976          14 IKDWWKENGKALIV--GVILGLGGLFGWRYWQSHQVEQAQEASAQYQNAIKAVQ   65 (207)
T ss_pred             HHHHHHHCCchhHH--HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHh
Confidence            56676666543322  2222222 3345554 333332 245667777777763


No 11 
>KOG0196 consensus Tyrosine kinase, EPH (ephrin) receptor family [Signal transduction mechanisms]
Probab=45.50  E-value=17  Score=41.11  Aligned_cols=41  Identities=24%  Similarity=0.566  Sum_probs=28.5

Q ss_pred             CCCCCCCCCCCCCCCCCCCCccCCCCcee-cCC-ee-eeCCCceec
Q 016022           74 KPFCDSNLLLDSPQSPTDSCEPCPSNGEC-HQG-KL-ECFHGYRKH  116 (396)
Q Consensus        74 ~~fCds~~~~~~~~~~~p~C~PCPehAiC-~~g-~l-~C~~gYvl~  116 (396)
                      ..-|..+.  -+.......|.|||+|.+= ..| .. .|..||-..
T Consensus       276 C~aCp~G~--yK~~~~~~~C~~CP~~S~s~~ega~~C~C~~gyyRA  319 (996)
T KOG0196|consen  276 CQACPPGT--YKASQGDSLCLPCPPNSHSSSEGATSCTCENGYYRA  319 (996)
T ss_pred             ceeCCCCc--ccCCCCCCCCCCCCCCCCCCCCCCCcccccCCcccC
Confidence            33455543  1233456789999999998 556 55 999999983


No 12 
>PF02009 Rifin_STEVOR:  Rifin/stevor family;  InterPro: IPR002858 Malaria is still a major cause of mortality in many areas of the world. Plasmodium falciparum causes the most severe human form of the disease and is responsible for most fatalities. Severe cases of malaria can occur when the parasite invades and then proliferates within red blood cell erythrocytes. The parasite produces many variant antigenic proteins, encoded by multigene families, which are present on the surface of the infected erythrocyte and play important roles in virulence. A crucial survival mechanism for the malaria parasite is its ability to evade the immune response by switching these variant surface antigens. The high virulence of P. falciparum relative to other malarial parasites is in large part due to the fact that in this organism many of these surface antigens mediate the binding of infected erythrocytes to the vascular endothelium (cytoadherence) and non-infected erythrocytes (rosetting). This can lead to the accumulation of infected cells in the vasculature of a variety of organs, blocking the blood flow and reducing the oxygen supply. Clinical symptoms of severe infection can include fever, progressive anaemia, multi-organ dysfunction and coma. For more information see []. Several multicopy gene families have been described in Plasmodium falciparum, including the stevor family of subtelomeric open reading frames and the rif interspersed repetitive elements. Both families contain three predicted transmembrane segments. It has been proposed that stevor and rif are members of a larger superfamily that code for variant surface antigens [].
Probab=45.05  E-value=32  Score=34.44  Aligned_cols=16  Identities=13%  Similarity=0.358  Sum_probs=8.5

Q ss_pred             HHHHHHHHHHHHHHHH
Q 016022          248 CLLLLWKVHRRRYFAI  263 (396)
Q Consensus       248 i~~l~~~~~r~~~e~~  263 (396)
                      |.||+++|||+++.+.
T Consensus       274 IIYLILRYRRKKKmkK  289 (299)
T PF02009_consen  274 IIYLILRYRRKKKMKK  289 (299)
T ss_pred             HHHHHHHHHHHhhhhH
Confidence            4455566666554443


No 13 
>PF01102 Glycophorin_A:  Glycophorin A;  InterPro: IPR001195 Proteins in this group are responsible for the molecular basis of the blood group antigens, surface markers on the outside of the red blood cell membrane. Most of these markers are proteins, but some are carbohydrates attached to lipids or proteins [Reid M.E., Lomas-Francis C. The Blood Group Antigen FactsBook Academic Press, London / San Diego, (1997)]. Glycophorin A (PAS-2) and glycophorin B (PAS-3) belong to the MNS blood group system and are associated with antigens that include M/N, S/s, U, He, Mi(a), M(c), Vw, Mur, M(g), Vr, M(e), Mt(a), St(a), Ri(a), Cl(a), Ny(a), Hut, Hil, M(v), Far, Mit, Dantu, Hop, Nob, En(a), ENKT, amongst others. Glycophorin A is the major sialoglycoprotein of the erythrocyte membrane []. Structurally, glycophorin A consists of an N-terminal extracellular domain, heavily glycosylated on serine and threonine residues, followed by a transmembrane region and a C-terminal cytoplasmic domain. Other glycophorins in this entry such as Glycophorin B and Glycophorin E represent minor sialoglycoproteins in the erythrocyte membrane.; GO: 0016021 integral to membrane; PDB: 2KPF_B 1AFO_B 2KPE_A.
Probab=44.19  E-value=33  Score=29.95  Aligned_cols=20  Identities=30%  Similarity=0.333  Sum_probs=9.4

Q ss_pred             HHHHHHHHHHHHHHHHHHHH
Q 016022          236 LIIVPVCSLLVGCLLLLWKV  255 (396)
Q Consensus       236 ~~i~~~l~llv~i~~l~~~~  255 (396)
                      .+|+++++.++|+.+++.|+
T Consensus        68 ~Ii~gv~aGvIg~Illi~y~   87 (122)
T PF01102_consen   68 GIIFGVMAGVIGIILLISYC   87 (122)
T ss_dssp             HHHHHHHHHHHHHHHHHHHH
T ss_pred             ehhHHHHHHHHHHHHHHHHH
Confidence            34455554445544444443


No 14 
>KOG1214 consensus Nidogen and related basement membrane protein proteins [Cell wall/membrane/envelope biogenesis; Extracellular structures]
Probab=41.62  E-value=16  Score=41.25  Aligned_cols=34  Identities=35%  Similarity=0.842  Sum_probs=29.4

Q ss_pred             CCCcc--CCCCceecCC--ee--eeCCCceecCCCcccCh
Q 016022           91 DSCEP--CPSNGECHQG--KL--ECFHGYRKHGKLCVEDG  124 (396)
Q Consensus        91 p~C~P--CPehAiC~~g--~l--~C~~gYvl~~~~CV~D~  124 (396)
                      ++|.|  |=++|.||+.  .+  +|.+||.--+-.||||+
T Consensus       828 DeC~psrChp~A~CyntpgsfsC~C~pGy~GDGf~CVP~~  867 (1289)
T KOG1214|consen  828 DECSPSRCHPAATCYNTPGSFSCRCQPGYYGDGFQCVPDT  867 (1289)
T ss_pred             cccCccccCCCceEecCCCcceeecccCccCCCceecCCC
Confidence            67776  9999999987  33  99999999999999993


No 15 
>PF06864 PAP_PilO:  Pilin accessory protein (PilO);  InterPro: IPR009663 This family consists of several enterobacterial PilO proteins. The function of PilO is unknown although it has been suggested that it is a cytoplasmic protein in the absence of other Pil proteins, but PilO protein is translocated to the outer membrane in the presence of other Pil proteins. Alternatively, PilO protein may form a complex with other Pil protein(s). PilO has been predicted to function as a component of the pilin transport apparatus and thin-pilus basal body []. This family does not seem to be related to IPR007445 from INTERPRO.
Probab=41.45  E-value=50  Score=34.26  Aligned_cols=14  Identities=21%  Similarity=0.581  Sum_probs=9.6

Q ss_pred             CCCccccccccccc
Q 016022          290 CEPWVVASRLRDHL  303 (396)
Q Consensus       290 ~~pyl~~~qLRD~L  303 (396)
                      ++||...+..-+.|
T Consensus       220 ~~PW~~~P~~~~fl  233 (414)
T PF06864_consen  220 PHPWAKQPSVQAFL  233 (414)
T ss_pred             CCCcccCCCHHHHH
Confidence            56888777666654


No 16 
>PF12947 EGF_3:  EGF domain;  InterPro: IPR024731 This entry represents an EGF domain found in the C terminus of malarial parasite merozoite surface protein 1 [], as well as other proteins.; PDB: 2NPR_A 1N1I_C 1B9W_A 1YO8_A 2RHP_A.
Probab=41.33  E-value=15  Score=25.05  Aligned_cols=26  Identities=31%  Similarity=0.797  Sum_probs=17.4

Q ss_pred             cCCCCceecCC--ee--eeCCCceecCCCc
Q 016022           95 PCPSNGECHQG--KL--ECFHGYRKHGKLC  120 (396)
Q Consensus        95 PCPehAiC~~g--~l--~C~~gYvl~~~~C  120 (396)
                      .|=+||.|.+-  .+  .|.+||.--+..|
T Consensus         7 ~C~~nA~C~~~~~~~~C~C~~Gy~GdG~~C   36 (36)
T PF12947_consen    7 GCHPNATCTNTGGSYTCTCKPGYEGDGFFC   36 (36)
T ss_dssp             GS-TTCEEEE-TTSEEEEE-CEEECCSTCE
T ss_pred             CCCCCcEeecCCCCEEeECCCCCccCCcCC
Confidence            67889999876  44  9999998655544


No 17 
>PF04891 NifQ:  NifQ;  InterPro: IPR006975 NifQ is involved in early stages of the biosynthesis of the iron-molybdenum cofactor (FeMo-co) [], which is an integral part of the active site of dinitrogenase []. The conserved C-terminal cysteine residues may be involved in metal binding [].; GO: 0030151 molybdenum ion binding, 0009399 nitrogen fixation
Probab=41.12  E-value=30  Score=31.79  Aligned_cols=16  Identities=31%  Similarity=0.746  Sum_probs=14.0

Q ss_pred             CCCCCccCCCCceecC
Q 016022           89 PTDSCEPCPSNGECHQ  104 (396)
Q Consensus        89 ~~p~C~PCPehAiC~~  104 (396)
                      ..|+|.-|.+++.||.
T Consensus       152 ~aPsC~~C~D~~~CFG  167 (167)
T PF04891_consen  152 RAPSCEECSDYAVCFG  167 (167)
T ss_pred             CCCCCCCcCCHhhcCC
Confidence            3589999999999984


No 18 
>PF08563 P53_TAD:  P53 transactivation motif;  InterPro: IPR013872  The binding of this protein by regulatory proteins regulates p53 transcription activation. This entry is comprised of a single amphipathic alpha helix and contains a highly conserved motif [, ]. ; GO: 0005515 protein binding; PDB: 1YCQ_B 2Z5T_R 3DAB_B 3DAC_B 2Z5S_Q 2K8F_B 2L14_B 1YCR_B.
Probab=40.02  E-value=13  Score=23.58  Aligned_cols=14  Identities=7%  Similarity=0.007  Sum_probs=9.9

Q ss_pred             cCCChHHHHHHHHH
Q 016022          176 FELDNPVYLYTKKR  189 (396)
Q Consensus       176 ~~l~~~~fe~l~~~  189 (396)
                      +-|+++.|++||+.
T Consensus         8 ~PLSQeTF~~LW~~   21 (25)
T PF08563_consen    8 LPLSQETFSDLWNL   21 (25)
T ss_dssp             ---STCCHHHHHHT
T ss_pred             CCccHHHHHHHHHh
Confidence            45889999999974


No 19 
>TIGR02976 phageshock_pspB phage shock protein B. This model describes the PspB protein of the psp (phage shock protein) operon, as found in Escherichia coli and many related species. Expression of a phage protein called secretin protein IV, and a number of other stresses including ethanol, heat shock, and defects in protein secretion trigger sigma-54-dependent expression of the phage shock regulon. PspB is both a regulator and an effector protein of the phage shock response.
Probab=39.42  E-value=1.3e+02  Score=24.01  Aligned_cols=28  Identities=29%  Similarity=0.338  Sum_probs=14.9

Q ss_pred             HHHHHHH-HHHHHHHH-------HHHHHHHHHHHHH
Q 016022          245 LVGCLLL-LWKVHRRR-------YFAIRVEELYHQV  272 (396)
Q Consensus       245 lv~i~~l-~~~~~r~~-------~e~~~v~~Lv~~v  272 (396)
                      +++..|+ .+|..+++       .+.++..+|++.+
T Consensus        16 fVap~wl~lHY~~k~~~~~~ls~~d~~~L~~L~~~a   51 (75)
T TIGR02976        16 FVAPLWLILHYRSKRKTAASLSTDDQALLQELYAKA   51 (75)
T ss_pred             HHHHHHHHHHHHhhhccCCCCCHHHHHHHHHHHHHH
Confidence            3444444 55554443       3555666776654


No 20 
>PF01826 TIL:  Trypsin Inhibitor like cysteine rich domain;  InterPro: IPR002919 This domain is found in proteinase inhibitors as well as in many extracellular proteins. The domain typically contains ten cysteine residues that form five disulphide bonds. The cysteine residues that form the disulphide bonds are 1-7, 2-6, 3-5, 4-10 and 8-9. This inhibitor domain belongs to MEROPS inhibitor family I8 (clan IA). Proteins containing this domain inhibit peptidases belonging to families S1 (IPR001254 from INTERPRO), S8 (IPR000209 from INTERPRO), and M4 (IPR001570 from INTERPRO) [] and are restricted to the chordata, nematoda, arthropoda and echinodermata. Examples of proteins containing this domain are:  chymotrypsin/elastase inhibitor from Ascaris suum (pig roundworm) Acp62F protein from Drosophila melanogaster  Bombina trypsin inhibitor from Bombina maxima (large-webbed bell toad) Bombyx subtilisin inhibitor from Bombyx mori (silk moth) von Willebrand factor ; PDB: 2P3F_N 1HX2_A 1CCV_A 1EAI_D 2H9E_C 1COU_A 1ATE_A 1ATB_A 1ATD_A 1ATA_A ....
Probab=38.18  E-value=17  Score=26.45  Aligned_cols=26  Identities=31%  Similarity=0.752  Sum_probs=20.6

Q ss_pred             CCCCceecCCeeeeCCCceecCC-CcccCh
Q 016022           96 CPSNGECHQGKLECFHGYRKHGK-LCVEDG  124 (396)
Q Consensus        96 CPehAiC~~g~l~C~~gYvl~~~-~CV~D~  124 (396)
                      |+  ..|.+| =.|.+||++... .||+-.
T Consensus        27 C~--~~C~~g-C~C~~G~v~~~~~~CV~~~   53 (55)
T PF01826_consen   27 CS--EPCVEG-CFCPPGYVRNDNGRCVPPS   53 (55)
T ss_dssp             CS--SS-ESE-EEETTTEEEETTSEEEEGG
T ss_pred             cC--CCCCcc-CCCCCCeeEcCCCCEEcHH
Confidence            55  778888 789999999876 999864


No 21 
>smart00179 EGF_CA Calcium-binding EGF-like domain.
Probab=37.69  E-value=31  Score=22.37  Aligned_cols=26  Identities=38%  Similarity=1.017  Sum_probs=18.6

Q ss_pred             ccCCCCceecCC--ee--eeCCCceecCCCc
Q 016022           94 EPCPSNGECHQG--KL--ECFHGYRKHGKLC  120 (396)
Q Consensus        94 ~PCPehAiC~~g--~l--~C~~gYvl~~~~C  120 (396)
                      .||..+|.|.+.  ..  .|.+||. .+..|
T Consensus         9 ~~C~~~~~C~~~~g~~~C~C~~g~~-~g~~C   38 (39)
T smart00179        9 NPCQNGGTCVNTVGSYRCECPPGYT-DGRNC   38 (39)
T ss_pred             CCcCCCCEeECCCCCeEeECCCCCc-cCCcC
Confidence            369899999854  22  7889987 45555


No 22 
>PRK09458 pspB phage shock protein B; Provisional
Probab=37.19  E-value=86  Score=25.10  Aligned_cols=30  Identities=23%  Similarity=0.261  Sum_probs=17.2

Q ss_pred             HHHHHHHHH-HHHHHHHH-------HHHHHHHHHHHHH
Q 016022          243 SLLVGCLLL-LWKVHRRR-------YFAIRVEELYHQV  272 (396)
Q Consensus       243 ~llv~i~~l-~~~~~r~~-------~e~~~v~~Lv~~v  272 (396)
                      +++|+-.|+ .+|..+++       .+.++.++|++.+
T Consensus        14 ~ifVaPiWL~LHY~sk~~~~~~Ls~~d~~~L~~L~~~A   51 (75)
T PRK09458         14 VLFVAPIWLWLHYRSKRQGSQGLSQEEQQRLAQLTEKA   51 (75)
T ss_pred             HHHHHHHHHHHhhcccccCCCCCCHHHHHHHHHHHHHH
Confidence            344454444 56655443       4666677777765


No 23 
>PF07974 EGF_2:  EGF-like domain;  InterPro: IPR013111 A sequence of about thirty to forty amino-acid residues long found in the sequence of epidermal growth factor (EGF) has been shown [, , , , ] to be present, in a more or less conserved form, in a large number of other, mostly animal proteins. The list of proteins currently known to contain one or more copies of an EGF-like pattern is large and varied. The functional significance of EGF domains in what appear to be unrelated proteins is not yet clear. However, a common feature is that these repeats are found in the extracellular domain of membrane-bound proteins or in proteins known to be secreted (exception: prostaglandin G/H synthase). The EGF domain includes six cysteine residues which have been shown (in EGF) to be involved in disulphide bonds. The main structure is a two-stranded beta-sheet followed by a loop to a C-terminal short two-stranded sheet. Subdomains between the conserved cysteines vary in length. This entry contains EGF domains found in a variety of extracellular and membrane proteins
Probab=36.20  E-value=33  Score=22.68  Aligned_cols=20  Identities=35%  Similarity=0.856  Sum_probs=17.0

Q ss_pred             cCCCCceec--CCeeeeCCCce
Q 016022           95 PCPSNGECH--QGKLECFHGYR  114 (396)
Q Consensus        95 PCPehAiC~--~g~l~C~~gYv  114 (396)
                      .|=.||+|.  .|.=.|++||.
T Consensus         7 ~C~~~G~C~~~~g~C~C~~g~~   28 (32)
T PF07974_consen    7 ICSGHGTCVSPCGRCVCDSGYT   28 (32)
T ss_pred             ccCCCCEEeCCCCEEECCCCCc
Confidence            588999999  56779999984


No 24 
>PRK11677 hypothetical protein; Provisional
Probab=34.55  E-value=1.5e+02  Score=26.29  Aligned_cols=7  Identities=0%  Similarity=0.192  Sum_probs=2.6

Q ss_pred             HHHHHHH
Q 016022          265 VEELYHQ  271 (396)
Q Consensus       265 v~~Lv~~  271 (396)
                      |.+.+.+
T Consensus        52 V~~HFa~   58 (134)
T PRK11677         52 LVSHFAR   58 (134)
T ss_pred             HHHHHHH
Confidence            3333333


No 25 
>cd00053 EGF Epidermal growth factor domain, found in epidermal growth factor (EGF) and present in a large number of proteins, mostly animal; the list of proteins currently known to contain one or more copies of an EGF-like pattern is large and varied; the functional significance of EGF-like domains in what appear to be unrelated proteins is not yet clear; a common feature is that these repeats are found in the extracellular domain of membrane-bound proteins or in proteins known to be secreted (exception: prostaglandin G/H synthase); the domain includes six cysteine residues which have been shown to be involved in disulfide bonds; the main structure is a two-stranded beta-sheet followed by a loop to a C-terminal short two-stranded sheet; subdomains between the conserved cysteines vary in length; the region between the 5th and 6th cysteine contains two conserved glycines of which at least one is present in most EGF-like domains; a subset of these bind calcium.
Probab=34.03  E-value=40  Score=20.93  Aligned_cols=25  Identities=32%  Similarity=0.890  Sum_probs=17.9

Q ss_pred             cCCCCceecCC--ee--eeCCCceecCCCc
Q 016022           95 PCPSNGECHQG--KL--ECFHGYRKHGKLC  120 (396)
Q Consensus        95 PCPehAiC~~g--~l--~C~~gYvl~~~~C  120 (396)
                      +|..||+|.+.  ..  .|..||... ..|
T Consensus         7 ~C~~~~~C~~~~~~~~C~C~~g~~g~-~~C   35 (36)
T cd00053           7 PCSNGGTCVNTPGSYRCVCPPGYTGD-RSC   35 (36)
T ss_pred             CCCCCCEEecCCCCeEeECCCCCccc-CCc
Confidence            67788999984  23  899998654 344


No 26 
>PF06143 Baculo_11_kDa:  Baculovirus 11 kDa family;  InterPro: IPR009313 This is a family of uncharacterised Baculovirus proteins that are all about 11 kDa in size.
Probab=33.77  E-value=2.7e+02  Score=22.83  Aligned_cols=22  Identities=14%  Similarity=0.354  Sum_probs=12.6

Q ss_pred             HHHHHHHHHHHHHHHHHHHHHH
Q 016022          230 WVSTHALIIVPVCSLLVGCLLL  251 (396)
Q Consensus       230 ~i~~~~~~i~~~l~llv~i~~l  251 (396)
                      .++.+++.|.+++++++.++++
T Consensus        32 firdFvLVic~~lVfVii~lFi   53 (84)
T PF06143_consen   32 FIRDFVLVICCFLVFVIIVLFI   53 (84)
T ss_pred             HHHHHHHHHHHHHHHHHHHHHH
Confidence            4566666666665555555444


No 27 
>PF10576 EndIII_4Fe-2S:  Iron-sulfur binding domain of endonuclease III;  InterPro: IPR003651 Endonuclease III (4.2.99.18 from EC) is a DNA repair enzyme which removes a number of damaged pyrimidines from DNA via its glycosylase activity and also cleaves the phosphodiester backbone at apurinic / apyrimidinic sites via a beta-elimination mechanism [, ]. The structurally related DNA glycosylase MutY recognises and excises the mutational intermediate 8-oxoguanine-adenine mispair []. The 3-D structures of Escherichia coli endonuclease III [] and catalytic domain of MutY [] have been determined. The structures contain two all-alpha domains: a sequence-continuous, six-helix domain (residues 22-132) and a Greek-key, four-helix domain formed by one N-terminal and three C-terminal helices (residues 1-21 and 133-211) together with the [Fe4S4] cluster. The cluster is bound entirely within the C-terminal loop by four cysteine residues with a ligation pattern Cys-(Xaa)6-Cys-(Xaa)2-Cys-(Xaa)5-Cys which is distinct from all other known Fe4S4 proteins. This structural motif is referred to as a [Fe4S4] cluster loop (FCL) []. Two DNA-binding motifs have been proposed, one at either end of the interdomain groove: the helix-hairpin-helix (HhH) and FCL motifs. The primary role of the iron-sulphur cluster appears to involve positioning conserved basic residues for interaction with the DNA phosphate backbone by forming the loop of the FCL motif [, ].  The iron-sulphur cluster loop (FCL) is also found in DNA-(apurinic or apyrimidinic site) lyase, a subfamily of endonuclease III. The enzyme has both apurinic and apyrimidinic endonuclease activity and a DNA N-glycosylase activity. It cuts damaged DNA at cytosines, thymines and guanines, and acts on the damaged strand 5' of the damaged site. 
The enzyme binds a 4Fe-4S cluster which is not important for the catalytic activity, but is probably involved in the alignment of the enzyme along the DNA strand.; GO: 0004519 endonuclease activity, 0051539 4 iron, 4 sulfur cluster binding; PDB: 1VRL_A 1RRQ_A 3G0Q_A 3FSQ_A 1RRS_A 3FSP_A 2ABK_A 1KG7_A 1KG2_A 1MUN_A ....
Probab=32.83  E-value=19  Score=20.62  Aligned_cols=14  Identities=36%  Similarity=0.923  Sum_probs=8.6

Q ss_pred             CCCCCccCCCCcee
Q 016022           89 PTDSCEPCPSNGEC  102 (396)
Q Consensus        89 ~~p~C~PCPehAiC  102 (396)
                      .+|.|.-||-+..|
T Consensus         4 r~P~C~~Cpl~~~C   17 (17)
T PF10576_consen    4 RKPKCEECPLADYC   17 (17)
T ss_dssp             SS--GGG-TTGGG-
T ss_pred             CCCccccCCCcccC
Confidence            47899999999887


No 28 
>PF05568 ASFV_J13L:  African swine fever virus J13L protein;  InterPro: IPR008385 This family consists of several African swine fever virus (ASFV) j13L proteins [, , ].
Probab=32.66  E-value=1.1e+02  Score=27.66  Aligned_cols=10  Identities=40%  Similarity=0.630  Sum_probs=4.8

Q ss_pred             HHHHHHHHHH
Q 016022          230 WVSTHALIIV  239 (396)
Q Consensus       230 ~i~~~~~~i~  239 (396)
                      ++..|+..|+
T Consensus        26 ffsthm~tIL   35 (189)
T PF05568_consen   26 FFSTHMYTIL   35 (189)
T ss_pred             HHHHHHHHHH
Confidence            3455554443


No 29 
>PRK07597 secE preprotein translocase subunit SecE; Reviewed
Probab=32.54  E-value=82  Score=23.87  Aligned_cols=28  Identities=25%  Similarity=0.420  Sum_probs=20.0

Q ss_pred             CCCChhhHHHHHHHHHHHHHHHHHHHHH
Q 016022           38 LFPSKQDLLRLITVVAIASSVALTCNYL   65 (396)
Q Consensus        38 ~~~~~~~~~~~~~v~~ia~~~a~~c~~l   65 (396)
                      -.|+++|..+...+.+++.++..+..++
T Consensus        25 ~WPs~~e~~~~t~~Vi~~~~~~~~~i~~   52 (64)
T PRK07597         25 TWPTRKELVRSTIVVLVFVAFFALFFYL   52 (64)
T ss_pred             cCcCHHHHHhHHHHHHHHHHHHHHHHHH
Confidence            3699999998888777777665444443


No 30 
>TIGR00964 secE_bact preprotein translocase, SecE subunit, bacterial. This model represents exclusively the bacterial (and some organellar) SecE protein. SecE is part of the core heterotrimer, SecYEG, of the Sec preprotein translocase system. Other components are the ATPase SecA, a cytosolic chaperone SecB, and an accessory complex of SecDF and YajC.
Probab=32.02  E-value=86  Score=23.09  Aligned_cols=27  Identities=19%  Similarity=0.261  Sum_probs=18.9

Q ss_pred             CCChhhHHHHHHHHHHHHHHHHHHHHH
Q 016022           39 FPSKQDLLRLITVVAIASSVALTCNYL   65 (396)
Q Consensus        39 ~~~~~~~~~~~~v~~ia~~~a~~c~~l   65 (396)
                      .|+|+|..+...+.++++++..+..++
T Consensus        17 WPt~~e~~~~t~~Vi~~~~~~~~~~~~   43 (55)
T TIGR00964        17 WPSRKELITYTIVVIVFVIFFSLFLFG   43 (55)
T ss_pred             CcCHHHHHhHHHHHHHHHHHHHHHHHH
Confidence            699999988877777766664444333


No 31 
>PF00558 Vpu:  Vpu protein;  InterPro: IPR008187 The Human immunodeficiency virus 1 (HIV-1) Vpu protein acts in the degradation of CD4 in the endoplasmic reticulum and in the enhancement of virion release from the plasma membrane of infected cells [].; GO: 0019076 release of virus from host; PDB: 2JPX_A 1PI8_A 2GOH_A 2GOF_A 1PI7_A 1PJE_A 1VPU_A 2K7Y_A.
Probab=31.95  E-value=50  Score=26.82  Aligned_cols=19  Identities=16%  Similarity=0.238  Sum_probs=5.5

Q ss_pred             HHHHHHHHHHHHHHHHHHH
Q 016022          256 HRRRYFAIRVEELYHQVCE  274 (396)
Q Consensus       256 ~r~~~e~~~v~~Lv~~vl~  274 (396)
                      |++.+.+++++++++.+.+
T Consensus        30 Yrk~~rqrkId~li~RIre   48 (81)
T PF00558_consen   30 YRKIKRQRKIDRLIERIRE   48 (81)
T ss_dssp             ---------CHHHHHHHHC
T ss_pred             HHHHHHHHhHHHHHHHHHc
Confidence            5555556667666654433


No 32 
>PF07271 Cytadhesin_P30:  Cytadhesin P30/P32;  InterPro: IPR009896 This family consists of several Mycoplasma species specific Cytadhesin P32 and P30 proteins. P30 has been found to be membrane associated and localised on the tip organelle. It is thought that it is important in cytadherence and virulence [].; GO: 0007157 heterophilic cell-cell adhesion, 0009405 pathogenesis, 0016021 integral to membrane
Probab=31.59  E-value=1.5e+02  Score=29.45  Aligned_cols=17  Identities=24%  Similarity=0.022  Sum_probs=10.9

Q ss_pred             HHHHHHHHHHHHHHHHH
Q 016022          260 YFAIRVEELYHQVCEIL  276 (396)
Q Consensus       260 ~e~~~v~~Lv~~vl~~L  276 (396)
                      +|+++.++++++.-.+-
T Consensus       104 ee~e~~~q~~e~~~~i~  120 (279)
T PF07271_consen  104 EEKEEHEQLAEQLGRIS  120 (279)
T ss_pred             HHHHHHHHHHHHHHHHH
Confidence            56667788887654443


No 33 
>PF06679 DUF1180:  Protein of unknown function (DUF1180);  InterPro: IPR009565 This entry consists of several hypothetical eukaryotic proteins thought to be membrane proteins. Their function is unknown.
Probab=30.95  E-value=1.9e+02  Score=26.44  Aligned_cols=31  Identities=23%  Similarity=0.231  Sum_probs=23.1

Q ss_pred             CCCCCCChhhHHHHHHHHHHHHHHHHHHHHH
Q 016022           35 PQSLFPSKQDLLRLITVVAIASSVALTCNYL   65 (396)
Q Consensus        35 ~~~~~~~~~~~~~~~~v~~ia~~~a~~c~~l   65 (396)
                      |..+-+.+.-+.|.++||..+++.+.+|+++
T Consensus        84 ~s~~~~d~~~l~R~~~Vl~g~s~l~i~yfvi  114 (163)
T PF06679_consen   84 PSPSSPDSPMLKRALYVLVGLSALAILYFVI  114 (163)
T ss_pred             cCCCcCCccchhhhHHHHHHHHHHHHHHHHH
Confidence            3345567777888999888888888777774


No 34 
>PF07543 PGA2:  Protein trafficking PGA2;  InterPro: IPR011431 A Saccharomyces cerevisiae (Baker's yeast) member of this family (PGA2, P53903 from SWISSPROT) is a single pass membrane protein which has been implicated in protein trafficking [, ].
Probab=30.07  E-value=1.5e+02  Score=26.50  Aligned_cols=12  Identities=17%  Similarity=0.094  Sum_probs=6.7

Q ss_pred             cccccccccccC
Q 016022          293 WVVASRLRDHLL  304 (396)
Q Consensus       293 yl~~~qLRD~LL  304 (396)
                      =|+.+.||+..-
T Consensus        62 k~s~n~lRg~~~   73 (140)
T PF07543_consen   62 KISPNALRGGKA   73 (140)
T ss_pred             cCCchhhccccc
Confidence            356666666433


No 35 
>PF11044 TMEMspv1-c74-12:  Plectrovirus spv1-c74 ORF 12 transmembrane protein;  InterPro: IPR022743  This is a group of proteins expressed by Plectroviruses. The Plectroviruses are single-stranded DNA viruses belonging to the Inoviridae. This entry represents putative transmembrane proteins of unknown function. 
Probab=29.75  E-value=2.1e+02  Score=20.70  Aligned_cols=15  Identities=20%  Similarity=0.104  Sum_probs=5.7

Q ss_pred             HHHHHHHHHHHHHHH
Q 016022          237 IIVPVCSLLVGCLLL  251 (396)
Q Consensus       237 ~i~~~l~llv~i~~l  251 (396)
                      .|+++++++..++|+
T Consensus         7 ~iFsvvIil~If~~i   21 (49)
T PF11044_consen    7 TIFSVVIILGIFAWI   21 (49)
T ss_pred             HHHHHHHHHHHHHHH
Confidence            344443333333343


No 36 
>COG0690 SecE Preprotein translocase subunit SecE [Intracellular trafficking and secretion]
Probab=28.35  E-value=1.2e+02  Score=23.92  Aligned_cols=28  Identities=18%  Similarity=0.376  Sum_probs=18.5

Q ss_pred             CCCChhhHHHHHHHHHHHHHHHHHHHHH
Q 016022           38 LFPSKQDLLRLITVVAIASSVALTCNYL   65 (396)
Q Consensus        38 ~~~~~~~~~~~~~v~~ia~~~a~~c~~l   65 (396)
                      -+|++.|..+...+.++..+++.+..++
T Consensus        35 ~WPsrke~~~~t~~Vl~~v~~~s~~~~~   62 (73)
T COG0690          35 VWPTRKELIRSTLIVLVVVAFFSLFLYG   62 (73)
T ss_pred             cCCCHHHHHHHHHHHHHHHHHHHHHHHH
Confidence            3699999888877666655554443333


No 37 
>KOG4403 consensus Cell surface glycoprotein STIM, contains SAM domain [General function prediction only]
Probab=27.95  E-value=2.9e+02  Score=29.35  Aligned_cols=11  Identities=18%  Similarity=0.691  Sum_probs=6.6

Q ss_pred             ccchhhHHHHh
Q 016022          157 WVEENDIWNDL  167 (396)
Q Consensus       157 ~i~e~dL~~~~  167 (396)
                      .|+.+|||+.-
T Consensus       117 ~ItVedLWeaW  127 (575)
T KOG4403|consen  117 HITVEDLWEAW  127 (575)
T ss_pred             ceeHHHHHHHH
Confidence            46666666653


No 38 
>PHA03399 pif3 per os infectivity factor 3; Provisional
Probab=27.43  E-value=73  Score=30.17  Aligned_cols=21  Identities=24%  Similarity=0.768  Sum_probs=15.7

Q ss_pred             ccCCCCceecCC--------eeeeCCCce
Q 016022           94 EPCPSNGECHQG--------KLECFHGYR  114 (396)
Q Consensus        94 ~PCPehAiC~~g--------~l~C~~gYv  114 (396)
                      +||=.+..|.++        .+.|+.||=
T Consensus        58 lPCVtD~QC~dnC~~~~~~~~~~C~~GFC   86 (200)
T PHA03399         58 LPCVTDQQCRDNCAIGSAAGVMTCDGGFC   86 (200)
T ss_pred             CCcccHHHHHHHHHhccccceEECCCCee
Confidence            488899888754        458988863


No 39 
>KOG0474 consensus Cl- channel CLC-7 and related proteins (CLC superfamily) [Inorganic ion transport and metabolism]
Probab=27.34  E-value=90  Score=34.59  Aligned_cols=24  Identities=29%  Similarity=0.584  Sum_probs=13.6

Q ss_pred             CCCCccCCCCceecCC-eeeeCCCc
Q 016022           90 TDSCEPCPSNGECHQG-KLECFHGY  113 (396)
Q Consensus        90 ~p~C~PCPehAiC~~g-~l~C~~gY  113 (396)
                      -..|+|||....=..- .+-|.+|+
T Consensus       396 l~~C~P~~~~~~~~~~p~f~Cp~~~  420 (762)
T KOG0474|consen  396 LADCQPCPPSITEGQCPTFFCPDGE  420 (762)
T ss_pred             HhcCCCCCCCcccccCccccCCCCc
Confidence            3578888876533211 25676664


No 40 
>PF07466 DUF1517:  Protein of unknown function (DUF1517);  InterPro: IPR010903 This family consists of several hypothetical glycine rich plant and bacterial proteins of around 300 residues in length. The function of this family is unknown.
Probab=26.53  E-value=1.8e+02  Score=28.98  Aligned_cols=23  Identities=4%  Similarity=0.163  Sum_probs=12.5

Q ss_pred             hhHHHHHHHHHHHHHHHHHHHHH
Q 016022           43 QDLLRLITVVAIASSVALTCNYL   65 (396)
Q Consensus        43 ~~~~~~~~v~~ia~~~a~~c~~l   65 (396)
                      ..|.-++.+|+++.+++++..++
T Consensus        62 gg~~gl~~iLIl~~Ia~~vv~~~   84 (289)
T PF07466_consen   62 GGFGGLFDILILFGIAFFVVRFF   84 (289)
T ss_pred             cccchHHHHHHHHHHHHHHHHHH
Confidence            33455666666555555554444


No 41 
>PF14316 DUF4381:  Domain of unknown function (DUF4381)
Probab=26.04  E-value=1.7e+02  Score=25.77  Aligned_cols=15  Identities=27%  Similarity=0.275  Sum_probs=6.9

Q ss_pred             HHHHHHHHHHHHhhh
Q 016022          269 YHQVCEILEENALMS  283 (396)
Q Consensus       269 v~~vl~~L~~q~~~~  283 (396)
                      ..++-.+|+..+..+
T Consensus        70 ~~~l~~LLKr~a~~~   84 (146)
T PF14316_consen   70 LAALNELLKRVALQY   84 (146)
T ss_pred             HHHHHHHHHHHHHHh
Confidence            334455555444433


No 42 
>PF00584 SecE:  SecE/Sec61-gamma subunits of protein translocation complex;  InterPro: IPR001901 Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase pathway. Proteins are produced in the cytoplasm as precursors, and require a chaperone subunit to direct them to the translocase component []. From there, the mature proteins are either targeted to the outer membrane, or remain as periplasmic proteins. The translocase protein subunits are encoded on the bacterial chromosome.   The translocase itself comprises 7 proteins, including a chaperone protein (SecB), an ATPase (SecA), an integral membrane complex (SecY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF) []. The chaperone protein SecB [] is a highly acidic homotetrameric protein that exists as a "dimer of dimers" in the bacterial cytoplasm. SecB maintains preproteins in an unfolded state after translation, and targets these to the peripheral membrane protein ATPase SecA for secretion []. SecE, part of the main SecYEG translocase complex, is ~106 residues in length, and spans the inner membrane of the Gram-negative bacterial envelope. Together with SecY and SecG, SecE forms a multimeric channel through which preproteins are translocated, using both proton motive forces and ATP-driven secretion. The latter is mediated by SecA.  In eukaryotes, the evolutionarily related protein sec61-gamma plays a role in protein translocation through the endoplasmic reticulum; it is part of a trimeric complex that also consists of sec61-alpha and beta []. 
Both secE and sec61-gamma are small proteins of about 60 to 90 amino acids that contain a single transmembrane region at their C-terminal extremity (Escherichia coli secE is an exception, in that it possesses an extra N-terminal segment of 60 residues that contains two additional transmembrane domains) [].; GO: 0006605 protein targeting, 0006886 intracellular protein transport, 0016020 membrane; PDB: 3J01_B 2WW9_B 2WWA_B 3DL8_C 2WWB_B 3DIN_G 2ZJS_E 2ZQP_E.
Probab=25.55  E-value=1.6e+02  Score=21.51  Aligned_cols=21  Identities=24%  Similarity=0.505  Sum_probs=15.2

Q ss_pred             CCChhhHHHHHHHHHHHHHHH
Q 016022           39 FPSKQDLLRLITVVAIASSVA   59 (396)
Q Consensus        39 ~~~~~~~~~~~~v~~ia~~~a   59 (396)
                      .|+++|..+.-.+.++..++.
T Consensus        18 WP~~~e~~~~t~~Vl~~~~i~   38 (57)
T PF00584_consen   18 WPSRKELLKSTIIVLVFVIIF   38 (57)
T ss_dssp             CCCTHHHHHHHHHHHHHHHHH
T ss_pred             CCCHHHHHHHHHHHHHHHHHH
Confidence            599999888776666655553


No 43 
>PF09064 Tme5_EGF_like:  Thrombomodulin like fifth domain, EGF-like;  InterPro: IPR015149 This domain adopts a fold similar to other EGF domains, with a flat major and a twisted minor beta sheet. Disulphide pairing, however, is not of the usual 1-3, 2-4, 5-6 type; rather 1-2, 3-4, 5-6 pairing is found. Its extended major sheet (strands beta-2 and beta-3 and the connecting loop) projects into thrombin's active site groove. This domain is required for interaction of thrombomodulin with thrombin, and subsequent activation of protein-C []. ; GO: 0004888 transmembrane signaling receptor activity, 0016021 integral to membrane
Probab=25.22  E-value=55  Score=22.26  Aligned_cols=21  Identities=29%  Similarity=0.765  Sum_probs=14.2

Q ss_pred             cCCCCceecCC---eeeeCCCceecC
Q 016022           95 PCPSNGECHQG---KLECFHGYRKHG  117 (396)
Q Consensus        95 PCPehAiC~~g---~l~C~~gYvl~~  117 (396)
                      .||.  .|-++   .-.|-+||++..
T Consensus         7 ~CpA--~CDpn~~~~C~CPeGyIlde   30 (34)
T PF09064_consen    7 ECPA--DCDPNSPGQCFCPEGYILDE   30 (34)
T ss_pred             cCCC--ccCCCCCCceeCCCceEecC
Confidence            3553  77776   338889999853


No 44 
>PF03672 UPF0154:  Uncharacterised protein family (UPF0154);  InterPro: IPR005359 The proteins in this entry are functionally uncharacterised.
Probab=25.12  E-value=3e+02  Score=21.38  Aligned_cols=18  Identities=6%  Similarity=0.143  Sum_probs=9.0

Q ss_pred             HHHHHHHHHHHHHHHHHH
Q 016022          243 SLLVGCLLLLWKVHRRRY  260 (396)
Q Consensus       243 ~llv~i~~l~~~~~r~~~  260 (396)
                      ++++|+++.++++.++-+
T Consensus        10 G~~~Gff~ar~~~~k~l~   27 (64)
T PF03672_consen   10 GAVIGFFIARKYMEKQLK   27 (64)
T ss_pred             HHHHHHHHHHHHHHHHHH
Confidence            444555554565554443


No 45 
>PF06247 Plasmod_Pvs28:  Plasmodium ookinete surface protein Pvs28;  InterPro: IPR010423 This family consists of ookinete surface proteins (Pvs28) from several species of Plasmodium. Pvs25 and Pvs28 are expressed on the surface of ookinetes. These proteins are potential vaccine candidates and induce antibodies that block the infectivity of Plasmodium vivax in immunised animals [].; GO: 0009986 cell surface, 0016020 membrane; PDB: 1Z3G_B 1Z1Y_B 1Z27_A.
Probab=24.94  E-value=27  Score=32.74  Aligned_cols=31  Identities=26%  Similarity=0.720  Sum_probs=24.7

Q ss_pred             cCCCCceecCC-e------e--eeCCCceecCCCcccChh
Q 016022           95 PCPSNGECHQG-K------L--ECFHGYRKHGKLCVEDGD  125 (396)
Q Consensus        95 PCPehAiC~~g-~------l--~C~~gYvl~~~~CV~D~~  125 (396)
                      ||=+.|.|... .      +  .|.+||++..+.|+|+.=
T Consensus        51 ~Cgdya~C~~~~~~~~~~~~~C~C~~gY~~~~~vCvp~~C   90 (197)
T PF06247_consen   51 PCGDYAKCINQANKGEERAYKCDCINGYILKQGVCVPNKC   90 (197)
T ss_dssp             EEETTEEEEE-SSTTSSTSEEEEE-TTEEESSSSEEEGGG
T ss_pred             cccchhhhhcCCCcccceeEEEecccCceeeCCeEchhhc
Confidence            78899999865 2      2  899999999999999864


No 46 
>PF08114 PMP1_2:  ATPase proteolipid family;  InterPro: IPR012589 This family consists of small proteolipids associated with the plasma membrane H+ ATPase. Two proteolipids (PMP1 and PMP2) are associated with the ATPase and both genes are similarly expressed in the wild-type strain of yeast. No modification of the level of transcription of one PMP gene is detected in a strain in which the other is deleted. Though both proteolipids show similarity with other small proteolipids associated with other cation-transporting ATPases, their functions remain unclear [].
Probab=24.03  E-value=1.9e+02  Score=20.52  Aligned_cols=18  Identities=17%  Similarity=0.074  Sum_probs=8.1

Q ss_pred             HHHHHHHHHHHHHHHHHH
Q 016022          246 VGCLLLLWKVHRRRYFAI  263 (396)
Q Consensus       246 v~i~~l~~~~~r~~~e~~  263 (396)
                      +++..+...+||+.+.++
T Consensus        20 v~i~iva~~iYRKw~aRk   37 (43)
T PF08114_consen   20 VGIGIVALFIYRKWQARK   37 (43)
T ss_pred             HHHHHHHHHHHHHHHHHH
Confidence            344444444455544443


No 47 
>PF14991 MLANA:  Protein melan-A; PDB: 2GTZ_F 2GT9_F 3MRO_P 2GUO_C 3MRQ_P 2GTW_C 3L6F_C 3MRP_P.
Probab=23.87  E-value=19  Score=31.04  Aligned_cols=22  Identities=32%  Similarity=0.679  Sum_probs=0.0

Q ss_pred             HHHHHHHHHHH--HHHHHHHHHHH
Q 016022          241 VCSLLVGCLLL--LWKVHRRRYFA  262 (396)
Q Consensus       241 ~l~llv~i~~l--~~~~~r~~~e~  262 (396)
                      ++++++|+++|  .||++||.-++
T Consensus        31 iL~VILgiLLliGCWYckRRSGYk   54 (118)
T PF14991_consen   31 ILIVILGILLLIGCWYCKRRSGYK   54 (118)
T ss_dssp             ------------------------
T ss_pred             eHHHHHHHHHHHhheeeeecchhh
Confidence            34445555444  68887765443


No 48 
>PF01102 Glycophorin_A:  Glycophorin A;  InterPro: IPR001195 Proteins in this group are responsible for the molecular basis of the blood group antigens, surface markers on the outside of the red blood cell membrane. Most of these markers are proteins, but some are carbohydrates attached to lipids or proteins [Reid M.E., Lomas-Francis C. The Blood Group Antigen FactsBook Academic Press, London / San Diego, (1997)]. Glycophorin A (PAS-2) and glycophorin B (PAS-3) belong to the MNS blood group system and are associated with antigens that include M/N, S/s, U, He, Mi(a), M(c), Vw, Mur, M(g), Vr, M(e), Mt(a), St(a), Ri(a), Cl(a), Ny(a), Hut, Hil, M(v), Far, Mit, Dantu, Hop, Nob, En(a), ENKT, amongst others. Glycophorin A is the major sialoglycoprotein of the erythrocyte membrane []. Structurally, glycophorin A consists of an N-terminal extracellular domain, heavily glycosylated on serine and threonine residues, followed by a transmembrane region and a C-terminal cytoplasmic domain. Other glycophorins in this entry such as Glycophorin B and Glycophorin E represent minor sialoglycoproteins in the erythrocyte membrane.; GO: 0016021 integral to membrane; PDB: 2KPF_B 1AFO_B 2KPE_A.
Probab=23.86  E-value=88  Score=27.30  Aligned_cols=22  Identities=5%  Similarity=0.256  Sum_probs=14.2

Q ss_pred             HHHHHHHHHHHHHHHHHHHHHH
Q 016022          237 IIVPVCSLLVGCLLLLWKVHRR  258 (396)
Q Consensus       237 ~i~~~l~llv~i~~l~~~~~r~  258 (396)
                      .+++++++++.++|++++.+++
T Consensus        73 v~aGvIg~Illi~y~irR~~Kk   94 (122)
T PF01102_consen   73 VMAGVIGIILLISYCIRRLRKK   94 (122)
T ss_dssp             HHHHHHHHHHHHHHHHHHHS--
T ss_pred             HHHHHHHHHHHHHHHHHHHhcc
Confidence            5667777777777777766554


No 49 
>PF09402 MSC:  Man1-Src1p-C-terminal domain;  InterPro: IPR018996 This entry represents the Inner nuclear membrane proteins MAN1 (also known as LEM domain-containing protein 3) and LEM domain-containing protein 2 (or LEM protein 2). Emerin and MAN1 are LEM domain-containing integral membrane proteins of the vertebrate nuclear envelope []. MAN1 is an integral protein of the inner nuclear membrane which binds to chromatin associated proteins and plays a role in nuclear organisation. The C-terminal nucleoplasmic region forms a DNA binding winged helix and binds to Smad []. LEM protein 2 is an essential protein involved in chromosome segregation and cell division, probably via its interaction with lmn-1, the main component of nuclear lamina. Has some overlapping function with emr-1.; GO: 0005639 integral to nuclear inner membrane; PDB: 2CH0_A.
Probab=23.37  E-value=27  Score=34.83  Aligned_cols=70  Identities=16%  Similarity=0.221  Sum_probs=0.0

Q ss_pred             HHHHHHHHHHHHHHHHHHHhhhhcc--CCCCCCcccccccccccCCCCC-----ccchhhHHHHHHHHhcCCCccee
Q 016022          262 AIRVEELYHQVCEILEENALMSKSV--NGECEPWVVASRLRDHLLLPKE-----RKDPVIWKKVEELVQEDSRVDQY  331 (396)
Q Consensus       262 ~~~v~~Lv~~vl~~L~~q~~~~~~~--~~~~~pyl~~~qLRD~LL~~~~-----r~r~~LW~kV~k~Ve~nSnIrt~  331 (396)
                      .+.+..|++.+.+.|++++..+.=+  .....++++...|+|.+.....     ..-+.+|+.+...+.++..|...
T Consensus        98 ~~~i~~l~~~~~~~Lr~~~a~~~Cg~~~~~~~~~ls~~el~~~~~~~~~~~~~~~efe~l~~~a~~~L~~~~ei~~~  174 (334)
T PF09402_consen   98 EEKIEELAKKILDELRERNAQYECGDSEDDESPGLSEEELKDILSSKKSPWISDEEFEELWSAALQELKKNPEIIIR  174 (334)
T ss_dssp             -----------------------------------------------------------------------------
T ss_pred             HHHHHHHHHHHHHHHHHHHhhcccCCCCCCCCCCCcHHHHHHHHHhccCccccHHHHHHHHHHHHHHHHhCCcEEEe
Confidence            4568888999999998776665433  2457899999999999995441     23388999999999887666544


No 50 
>PHA02673 ORF109 EEV glycoprotein; Provisional
Probab=22.51  E-value=1.3e+02  Score=27.50  Aligned_cols=22  Identities=23%  Similarity=0.298  Sum_probs=15.4

Q ss_pred             HHHHHHHHHHHHHHHHHHHHHH
Q 016022           45 LLRLITVVAIASSVALTCNYLA   66 (396)
Q Consensus        45 ~~~~~~v~~ia~~~a~~c~~l~   66 (396)
                      |+|+.++++|-++.+++..+.+
T Consensus        35 ~~Ri~~~iSIisL~~l~v~LaL   56 (161)
T PHA02673         35 FFRLMAAIAIIVLAILVVILAL   56 (161)
T ss_pred             HHHHHHHHHHHHHHHHHHHHHH
Confidence            6777777777777776665543


No 51 
>PF12729 4HB_MCP_1:  Four helix bundle sensory module for signal transduction;  InterPro: IPR024478 This entry represents a four-helix bundle that operates as a ubiquitous sensory module in prokaryotic signal-transduction, which is known as four-helix bundles methyl-accepting chemotaxis protein (4HB_MCP) domain. The 4HB_MCP is always found between two predicted transmembrane helices indicating that it detects only extracellular signals. In many cases the domain is associated with a cytoplasmic HAMP domain suggesting that most proteins carrying the bundle might share the mechanism of transmembrane signalling which is well-characterised in E coli chemoreceptors [].
Probab=22.10  E-value=3.7e+02  Score=22.62  Aligned_cols=10  Identities=40%  Similarity=0.411  Sum_probs=5.0

Q ss_pred             cccccccCCC
Q 016022          297 SRLRDHLLLP  306 (396)
Q Consensus       297 ~qLRD~LL~~  306 (396)
                      ..+++.++.+
T Consensus        63 ~~~~~~~~~~   72 (181)
T PF12729_consen   63 RALRRYLLAT   72 (181)
T ss_pred             HHHHHhhhcC
Confidence            3455555543


No 52 
>PRK15428 putative propanediol utilization protein PduM; Provisional
Probab=21.73  E-value=81  Score=28.93  Aligned_cols=31  Identities=19%  Similarity=0.266  Sum_probs=24.1

Q ss_pred             HHHHHHHHHHHHHHHHHHhhhhccCCCCCCccccccccc
Q 016022          263 IRVEELYHQVCEILEENALMSKSVNGECEPWVVASRLRD  301 (396)
Q Consensus       263 ~~v~~Lv~~vl~~L~~q~~~~~~~~~~~~pyl~~~qLRD  301 (396)
                      ..++.||++|+.+|++++....        -++..|||+
T Consensus         4 ~~~~~iV~~Vv~RLk~Ra~~~~--------~ls~~ql~~   34 (163)
T PRK15428          4 EMLQRIVEEVVARLQRRAQSTA--------TLSVAQLRD   34 (163)
T ss_pred             HHHHHHHHHHHHHHHHHhhceE--------EEEHHHccC
Confidence            4578899999999998776543        377778887


No 53 
>PF12662 cEGF:  Complement Clr-like EGF-like
Probab=21.38  E-value=52  Score=20.53  Aligned_cols=16  Identities=31%  Similarity=0.800  Sum_probs=11.9

Q ss_pred             eeeCCCceec--CCCccc
Q 016022          107 LECFHGYRKH--GKLCVE  122 (396)
Q Consensus       107 l~C~~gYvl~--~~~CV~  122 (396)
                      -.|.+||.+.  +..|+.
T Consensus         4 C~C~~Gy~l~~d~~~C~D   21 (24)
T PF12662_consen    4 CSCPPGYQLSPDGRSCED   21 (24)
T ss_pred             eeCCCCCcCCCCCCcccc
Confidence            3699999985  567764


No 54 
>PF15050 SCIMP:  SCIMP protein
Probab=21.33  E-value=2.3e+02  Score=24.86  Aligned_cols=13  Identities=31%  Similarity=0.606  Sum_probs=6.5

Q ss_pred             HHHHHHHHHHHHH
Q 016022          230 WVSTHALIIVPVC  242 (396)
Q Consensus       230 ~i~~~~~~i~~~l  242 (396)
                      |+..+..+|+.+.
T Consensus         3 WWr~nFWiiLAVa   15 (133)
T PF15050_consen    3 WWRDNFWIILAVA   15 (133)
T ss_pred             hHHhchHHHHHHH
Confidence            4455555554443


No 55 
>PF11392 DUF2877:  Protein of unknown function (DUF2877);  InterPro: IPR021530  The proteins in this bacterial family are putative carboxylases; however, this cannot be confirmed.
Probab=21.19  E-value=52  Score=27.85  Aligned_cols=11  Identities=36%  Similarity=0.507  Sum_probs=9.0

Q ss_pred             CCCCCCChhhH
Q 016022           35 PQSLFPSKQDL   45 (396)
Q Consensus        35 ~~~~~~~~~~~   45 (396)
                      =|||+||.+||
T Consensus         5 G~GLTPSGDD~   15 (110)
T PF11392_consen    5 GPGLTPSGDDF   15 (110)
T ss_pred             CCCCCCchHHH
Confidence            37899999995


No 56 
>PF10500 SR-25:  Nuclear RNA-splicing-associated protein;  InterPro: IPR019532  SR-25, otherwise known as ADP-ribosylation factor-like factor 6-interacting protein 4, is expressed in virtually all tissue types. At the N terminus there is a repeat of serine-arginine (SR repeat), and towards the middle of the protein there are clusters of both serines and of basic amino acids. The presence of many nuclear localisation signals strongly implies that this is a nuclear protein that may contribute to RNA splicing. SR-25 is also implicated, along with heat-shock-protein-27, as a mediator in the Rac1 (GTPase ras-related C3 botulinum toxin substrate 1; also see IPR019093 from INTERPRO) signalling pathway.
Probab=21.08  E-value=66  Score=30.91  Aligned_cols=9  Identities=11%  Similarity=0.132  Sum_probs=5.5

Q ss_pred             CCChHHHHH
Q 016022          177 ELDNPVYLY  185 (396)
Q Consensus       177 ~l~~~~fe~  185 (396)
                      -|+.|||+-
T Consensus       159 PmTkEEyea  167 (225)
T PF10500_consen  159 PMTKEEYEA  167 (225)
T ss_pred             CCCHHHHHH
Confidence            467776653


No 57 
>PF10588 NADH-G_4Fe-4S_3:  NADH-ubiquinone oxidoreductase-G iron-sulfur binding region;  InterPro: IPR019574  NADH:ubiquinone oxidoreductase (complex I) (1.6.5.3 from EC) is a respiratory-chain enzyme that catalyses the transfer of two electrons from NADH to ubiquinone in a reaction that is associated with proton translocation across the membrane (NADH + ubiquinone = NAD+ + ubiquinol). Complex I is a major source of reactive oxygen species (ROS) that are predominantly formed by electron transfer from FMNH(2). Complex I is found in bacteria, cyanobacteria (as a NADH-plastoquinone oxidoreductase), archaea, mitochondria, and in the hydrogenosome, a mitochondria-derived organelle. In general, the bacterial complex consists of 14 different subunits, while the mitochondrial complex contains homologues to these subunits in addition to approximately 31 additional proteins. Mitochondrial complex I, which is located in the inner mitochondrial membrane, is the largest multimeric respiratory enzyme in the mitochondria, consisting of more than 40 subunits, one FMN co-factor and eight FeS clusters. The assembly of mitochondrial complex I is an intricate process that requires the cooperation of the nuclear and mitochondrial genomes. Mitochondrial complex I can cycle between active and deactive forms that can be distinguished by the reactivity towards divalent cations and thiol-reactive agents. All redox prosthetic groups reside in the peripheral arm of the L-shaped structure. The NADH oxidation domain harbouring the FMN cofactor is connected via a chain of iron-sulphur clusters to the ubiquinone reduction site that is located in a large pocket formed by the PSST and 49kDa subunits of complex I.
This entry describes the G subunit (one of 14 subunits, A to N) of the NADH-quinone oxidoreductase complex I which generally couples NADH and ubiquinone oxidation/reduction in bacteria and mammalian mitochondria while translocating protons, but may act on NADPH and/or plastoquinone in cyanobacteria and plant chloroplasts. This family does not contain related subunits from formate dehydrogenase complexes.  This entry represents the iron-sulphur binding domain of the G subunit.; GO: 0016491 oxidoreductase activity, 0055114 oxidation-reduction process; PDB: 3M9S_C 2FUG_L 3IAS_L 2YBB_3 3IAM_3 3I9V_3.
Probab=20.88  E-value=44  Score=23.30  Aligned_cols=16  Identities=31%  Similarity=0.866  Sum_probs=8.4

Q ss_pred             CCCCCCccCCCCceec
Q 016022           88 SPTDSCEPCPSNGECH  103 (396)
Q Consensus        88 ~~~p~C~PCPehAiC~  103 (396)
                      .++-.|.-|+.+|.|.
T Consensus        11 ~H~~dC~~C~~~G~Ce   26 (41)
T PF10588_consen   11 NHPLDCPTCDKNGNCE   26 (41)
T ss_dssp             T----TTT-TTGGG-H
T ss_pred             CCCCcCcCCCCCCCCH
Confidence            3456899999999984


No 58 
>cd00033 CCP Complement control protein (CCP) modules (also known as short consensus repeats, SCRs, or SUSHI repeats) contain approximately 60 amino acid residues and have been identified in several proteins of the complement system. SUSHI repeats are abundant in complement control proteins. Typically, 2 to 4 modules contribute to a binding site, implying that the orientation of the modules to each other is critical for function.
Probab=20.11  E-value=61  Score=22.52  Aligned_cols=20  Identities=35%  Similarity=0.773  Sum_probs=14.1

Q ss_pred             eeeeCCCceecC---CCcccChh
Q 016022          106 KLECFHGYRKHG---KLCVEDGD  125 (396)
Q Consensus       106 ~l~C~~gYvl~~---~~CV~D~~  125 (396)
                      .+.|++||.+.+   -.|..|+.
T Consensus        26 ~~~C~~Gy~~~g~~~~~C~~~g~   48 (57)
T cd00033          26 TYSCNEGYTLVGSSTITCTENGG   48 (57)
T ss_pred             EEECCCCCeEeCCCeeEECCCCe
Confidence            459999999874   35666553


No 59 
>PRK09400 secE preprotein translocase subunit SecE; Reviewed
Probab=20.08  E-value=2e+02  Score=21.95  Aligned_cols=19  Identities=16%  Similarity=0.461  Sum_probs=15.3

Q ss_pred             CChhhHHHHHHHHHHHHHH
Q 016022           40 PSKQDLLRLITVVAIASSV   58 (396)
Q Consensus        40 ~~~~~~~~~~~v~~ia~~~   58 (396)
                      |+++||.+...+.++..++
T Consensus        27 Pd~~Ef~~ia~~~~iG~~i   45 (61)
T PRK09400         27 PTREEFLLVAKVTGLGILL   45 (61)
T ss_pred             CCHHHHHHHHHHHHHHHHH
Confidence            8999999988877666555


Done!