Query         040068
Match_columns 123
No_of_seqs    107 out of 131
Neff          5.2 
Searched_HMMs 46136
Date          Fri Mar 29 04:48:39 2013
Command       hhsearch -i /work/01045/syshi/csienesis_hhblits_a3m/040068.a3m -d /work/01045/syshi/HHdatabase/Cdd.hhm -o /work/01045/syshi/hhsearch_cdd/040068hhsearch_cdd -cpu 12 -v 0 

 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 PF09478 CBM49:  Carbohydrate b  97.0  0.0039 8.4E-08   41.8   6.9   73   29-106     1-78  (80)
  2 PLN02171 endoglucanase          92.1    0.51 1.1E-05   43.0   6.8   77   27-108   535-615 (629)
  3 PF07127 Nodulin_late:  Late no  70.9     4.4 9.6E-05   25.2   2.4    7    1-7       1-7   (54)
  4 PF02933 CDC48_2:  Cell divisio  66.7     4.7  0.0001   25.4   1.9   29   91-120    15-43  (64)
  5 PF03330 DPBB_1:  Rare lipoprot  49.8      20 0.00043   23.2   2.7   38   44-81     38-76  (78)
  6 PF07172 GRP:  Glycine rich pro  49.1      11 0.00024   26.4   1.4    9    1-9       1-9   (95)
  7 PLN02340 endoglucanase          48.5      17 0.00036   33.3   2.9   77   26-106   519-600 (614)
  8 PF06483 ChiC:  Chitinase C;  I  46.9      37 0.00081   26.6   4.2   50   57-113    34-86  (180)
  9 PF07705 CARDB:  CARDB;  InterP  45.5      79  0.0017   20.1   5.4   69   27-110     2-72  (101)
 10 COG3900 Predicted periplasmic   43.2      15 0.00032   30.2   1.5   26   29-54    190-224 (262)
 11 KOG3358 Uncharacterized secret  38.3      24 0.00052   28.1   2.0   40   66-108    60-101 (211)
 12 PF05753 TRAP_beta:  Translocon  36.6   2E+02  0.0042   22.1   8.8   62   37-106    32-94  (181)
 13 PF14016 DUF4232:  Protein of u  32.8      83  0.0018   22.2   4.0   73   24-109     1-82  (131)
 14 PF01345 DUF11:  Domain of unkn  30.0 1.4E+02   0.003   18.8   4.3   30   36-65     35-64  (76)
 15 PF09640 DUF2027:  Domain of un  29.7      30 0.00066   26.7   1.3   32   72-108     8-39  (162)
 16 PF10633 NPCBM_assoc:  NPCBM-as  27.8      87  0.0019   20.0   3.1   25   41-65      4-28  (78)
 17 PF14345 GDYXXLXY:  GDYXXLXY pr  27.5      46 0.00099   24.1   1.9   10   71-80     27-37  (144)
 18 PF06682 DUF1183:  Protein of u  24.2 1.2E+02  0.0025   25.7   3.9   23   58-82     93-115 (318)
 19 PF03293 Pox_RNA_pol:  Poxvirus  22.7 1.4E+02  0.0031   22.8   3.8   48   57-105    93-141 (160)
 20 KOG4063 Major epididymal secre  21.9 1.5E+02  0.0032   22.9   3.8   34    3-36      4-42  (158)
 21 PF00856 SET:  SET domain;  Int  21.8      89  0.0019   20.9   2.4   22   84-105   140-161 (162)
 22 TIGR01451 B_ant_repeat conserv  21.2   2E+02  0.0044   17.4   3.7   25   40-64     10-34  (53)
 23 PRK15301 hypothetical protein;  20.4 2.8E+02   0.006   21.9   5.1   12   44-55     56-67  (186)

No 1  
>PF09478 CBM49:  Carbohydrate binding domain CBM49;  InterPro: IPR019028 A carbohydrate-binding module (CBM) is defined as a contiguous amino acid sequence within a carbohydrate-active enzyme with a discrete fold having carbohydrate-binding activity. A few exceptions are CBMs in cellulosomal scaffolding proteins and rare instances of independent putative CBMs. The requirement of CBMs existing as modules within larger enzymes sets this class of carbohydrate-binding protein apart from other non-catalytic sugar binding proteins such as lectins and sugar transport proteins. CBMs were previously classified as cellulose-binding domains (CBDs) based on the initial discovery of several modules that bound cellulose [, ]. However, additional modules in carbohydrate-active enzymes are continually being found that bind carbohydrates other than cellulose yet otherwise meet the CBM criteria, hence the need to reclassify these polypeptides using more inclusive terminology. Previous classification of cellulose-binding domains was based on amino acid similarity. Groupings of CBDs were called "Types" and numbered with roman numerals (e.g. Type I or Type II CBDs). In keeping with the glycoside hydrolase classification, these groupings are now called families and numbered with Arabic numerals. Families 1 to 13 are the same as Types I to XIII. For a detailed review on the structure and binding modes of CBMs see [].  This domain is found at the C-terminus of cellulases and in vitro binding studies have shown it to bind to crystalline cellulose []. ; GO: 0030246 carbohydrate binding, 0005576 extracellular region
Probab=97.03  E-value=0.0039  Score=41.76  Aligned_cols=73  Identities=16%  Similarity=0.242  Sum_probs=55.1

Q ss_pred             eeEEEeecCCCCCC---eeeEEEEEeeccccccccEEEEcCCcc-cccccCCCceEEeCCeeEEc-cCCcccCCCCeEEE
Q 040068           29 IKVSQSQTGKTVPN---KQEWRLTLNNTCICTQLELKLSCKGFQ-TVEPIDPSIIAISGDECTVV-NNGNPFYGFTTLSF  103 (123)
Q Consensus        29 i~V~Q~~tg~~~~G---~p~~~VtI~N~C~C~~~~V~l~C~gF~-S~~~VdP~i~r~~~d~C~Lv-n~G~pi~~~~~v~F  103 (123)
                      |+|.|..++.+..|   ..+|.|+|+|++.=+++++++.-+.+. +.=    .+-+..++.. -+ +--.+|.+|++.+|
T Consensus         1 i~i~q~~~~sW~~~g~~y~qy~v~I~N~~~~~I~~~~i~~~~l~~~iW----~l~~~~~~~y-~lPs~~~~i~pg~s~~F   75 (80)
T PF09478_consen    1 ITITQTLVNSWTENGQTYTQYDVTITNNGSKPIKSLKISIDNLYGSIW----GLDKVSGNTY-TLPSYQPTIKPGQSFTF   75 (80)
T ss_pred             CEEEEEEEeEEEeCCEEEEEEEEEEEECCCCeEEEEEEEECccchhhe----eEEeccCCEE-ECCccccccCCCCEEEE
Confidence            68899998888775   457999999999999999999998765 111    2222335665 55 43459999999999


Q ss_pred             Eee
Q 040068          104 NYA  106 (123)
Q Consensus       104 ~Ya  106 (123)
                      -|-
T Consensus        76 GYI   78 (80)
T PF09478_consen   76 GYI   78 (80)
T ss_pred             EEE
Confidence            995


No 2  
>PLN02171 endoglucanase
Probab=92.13  E-value=0.51  Score=42.96  Aligned_cols=77  Identities=18%  Similarity=0.124  Sum_probs=54.3

Q ss_pred             CceeEEEeecCCCCC---CeeeEEEEEeeccccccccEEEEcCCcccccccCCCceEEeCCeeEEccCCc-ccCCCCeEE
Q 040068           27 DDIKVSQSQTGKTVP---NKQEWRLTLNNTCICTQLELKLSCKGFQTVEPIDPSIIAISGDECTVVNNGN-PFYGFTTLS  102 (123)
Q Consensus        27 sdi~V~Q~~tg~~~~---G~p~~~VtI~N~C~C~~~~V~l~C~gF~S~~~VdP~i~r~~~d~C~Lvn~G~-pi~~~~~v~  102 (123)
                      +.|+|.|..++.+..   +..+|+|+|+|++..+.+++++.=..+-.    |=.=+...++.. -+-+-. .|.+|++.+
T Consensus       535 ~ei~i~q~v~~sW~~~g~~y~qy~v~I~N~s~~~ik~i~i~~~~~~~----~iW~v~~~~ngy-tlPs~~~sL~aG~s~t  609 (629)
T PLN02171        535 SPIEIEQKATASWKAKGRTYYRYSTTVTNRSAKTLKELHLGISKLYG----PLWGLTKAGYGY-VLPSWMPSLPAGKSLE  609 (629)
T ss_pred             ceeEEEEEEEEEEEcCCceEEEEEEEEEECCCCceeeeeeeeccccc----cchheeecCCcc-cCchhhcccCCCCeeE
Confidence            358999999988875   47889999999999999999996544421    111111233442 444443 788899999


Q ss_pred             EEeecC
Q 040068          103 FNYAWD  108 (123)
Q Consensus       103 F~Yaw~  108 (123)
                      |-|=..
T Consensus       610 FgyI~~  615 (629)
T PLN02171        610 FVYVHS  615 (629)
T ss_pred             EEeecC
Confidence            999854


No 3  
>PF07127 Nodulin_late:  Late nodulin protein;  InterPro: IPR009810 This family consists of several plant specific late nodulin sequences which are homologous to the Pisum sativum (Garden pea) ENOD3 protein. ENOD3 is expressed in the late stages of root nodule formation and contains two pairs of cysteine residues toward the protein's C terminus which may be involved in metal-binding [].; GO: 0046872 metal ion binding, 0009878 nodule morphogenesis
Probab=70.94  E-value=4.4  Score=25.15  Aligned_cols=7  Identities=86%  Similarity=1.165  Sum_probs=6.5

Q ss_pred             ChhhHHH
Q 040068            1 MAAILKF    7 (123)
Q Consensus         1 Ma~~~k~    7 (123)
                      ||.++|+
T Consensus         1 Ma~ilKF    7 (54)
T PF07127_consen    1 MAKILKF    7 (54)
T ss_pred             Cccchhh
Confidence            9999998


No 4  
>PF02933 CDC48_2:  Cell division protein 48 (CDC48), domain 2;  InterPro: IPR004201 This domain has a double psi-beta barrel fold and includes VCP-like ATPase and N-ethylmaleimide sensitive fusion protein N-terminal domains. Both the VAT and NSF N-terminal functional domains consist of two structural domains of which this is at the C terminus. The VAT-N domain found in AAA ATPases (IPR003959 from INTERPRO) is a 185-residue substrate recognition domain [].; GO: 0005524 ATP binding; PDB: 1QDN_B 1QCS_A 1CR5_C 3QQ8_A 3HU2_A 3HU1_E 3HU3_A 3QWZ_A 3TIW_B 3QQ7_A ....
Probab=66.72  E-value=4.7  Score=25.45  Aligned_cols=29  Identities=28%  Similarity=0.525  Sum_probs=23.4

Q ss_pred             CCcccCCCCeEEEEeecCCccceeeeeeee
Q 040068           91 NGNPFYGFTTLSFNYAWDTSFPFKPISSQI  120 (123)
Q Consensus        91 ~G~pi~~~~~v~F~Yaw~~~f~~~p~ss~~  120 (123)
                      .|+|+..|+.|.|.+. ...++|.+.+.++
T Consensus        15 ~~~pv~~Gd~i~~~~~-~~~~~~~V~~~~P   43 (64)
T PF02933_consen   15 EGRPVTKGDTIVFPFF-GQALPFKVVSTEP   43 (64)
T ss_dssp             TTEEEETT-EEEEEET-TEEEEEEEEEECS
T ss_pred             cCCCccCCCEEEEEeC-CcEEEEEEEEEEc
Confidence            4699999999999997 6889999887653


No 5  
>PF03330 DPBB_1:  Rare lipoprotein A (RlpA)-like double-psi beta-barrel;  InterPro: IPR009009  Beta barrels are commonly observed in protein structures. They are classified in terms of two integral parameters: the number of strands in the sheet, n, and the shear number, S, a measure of the stagger of the strands in the beta-sheet. These two parameters have been shown to determine the major geometrical features of beta-barrels. Six-stranded beta-barrels with a pseudo-twofold axis are found in several proteins. One involving parallel strands forming two psi structures is known as the double-psi barrel. The first psi structure consists of the loop connecting strands beta1 and beta2 (a 'psi loop') and the strand beta5, whereas the second psi structure consists of the loop connecting strands beta4 and beta5 and the strand beta2. All the psi structures in double-psi barrels have a unique handedness, in that beta1 (beta4), beta2 (beta5) and the loop following beta5 (beta2) form a right-handed helix. The unique handedness may be related to the fact that the twisting angle between the parallel pair of strands is always larger than that between the antiparallel pair [].; PDB: 1N10_B 3D30_A 2BH0_A 2HCZ_X.
Probab=49.80  E-value=20  Score=23.18  Aligned_cols=38  Identities=24%  Similarity=0.472  Sum_probs=28.8

Q ss_pred             eeEEEEEeeccc-cccccEEEEcCCcccccccCCCceEE
Q 040068           44 QEWRLTLNNTCI-CTQLELKLSCKGFQTVEPIDPSIIAI   81 (123)
Q Consensus        44 p~~~VtI~N~C~-C~~~~V~l~C~gF~S~~~VdP~i~r~   81 (123)
                      ..-.|+|+++|+ |...++-|+=..|..--..|..++.+
T Consensus        38 ksV~v~V~D~Cp~~~~~~lDLS~~aF~~la~~~~G~i~V   76 (78)
T PF03330_consen   38 KSVTVTVVDRCPGCPPNHLDLSPAAFKALADPDAGVIPV   76 (78)
T ss_dssp             CEEEEEEEEE-TTSSSSEEEEEHHHHHHTBSTTCSSEEE
T ss_pred             CeEEEEEEccCCCCcCCEEEeCHHHHHHhCCCCceEEEE
Confidence            667899999996 99999999988887655555555543


No 6  
>PF07172 GRP:  Glycine rich protein family;  InterPro: IPR010800 This family consists of glycine rich proteins. Some of them may be involved in resistance to environmental stress [].
Probab=49.11  E-value=11  Score=26.36  Aligned_cols=9  Identities=33%  Similarity=0.224  Sum_probs=5.0

Q ss_pred             ChhhHHHHH
Q 040068            1 MAAILKFLA    9 (123)
Q Consensus         1 Ma~~~k~l~    9 (123)
                      ||++..+|+
T Consensus         1 MaSK~~llL    9 (95)
T PF07172_consen    1 MASKAFLLL    9 (95)
T ss_pred             CchhHHHHH
Confidence            885544443


No 7  
>PLN02340 endoglucanase
Probab=48.54  E-value=17  Score=33.30  Aligned_cols=77  Identities=14%  Similarity=0.133  Sum_probs=51.1

Q ss_pred             CCceeEEEeecCCCCCC---eeeEEEEEeeccccccccEEEEcCCcc-cccccCCCceEEeCCeeEEccCC-cccCCCCe
Q 040068           26 LDDIKVSQSQTGKTVPN---KQEWRLTLNNTCICTQLELKLSCKGFQ-TVEPIDPSIIAISGDECTVVNNG-NPFYGFTT  100 (123)
Q Consensus        26 ~sdi~V~Q~~tg~~~~G---~p~~~VtI~N~C~C~~~~V~l~C~gF~-S~~~VdP~i~r~~~d~C~Lvn~G-~pi~~~~~  100 (123)
                      ..++++.|.-+..+..+   .-+|+|+|+|+|.=+.+.+++.=..+- ..-.|.|++=   .+.. -+-+= ..|.+|+.
T Consensus       519 ~~~~e~~~~~~~sw~~~g~~y~~~~v~i~N~s~~pi~~l~~~~~~l~g~lwgl~~~~~---~~~y-~~p~~~~tl~~g~~  594 (614)
T PLN02340        519 GAPVEFVHSITNTWTAGGTTYYRHKVIIKNKSQKPITDLKLVIEDLSGPIWGLNPTKE---KNTY-ELPQWQKVLQPGSQ  594 (614)
T ss_pred             CCchhhhhhheeeeecCCceEEEEEEEEEeCCCCCchhhhhhhhhcccchhcceeccc---cCCc-cCchhhhccCCCCe
Confidence            44567777777666654   678999999999999999988764443 2222333211   2332 33332 47888999


Q ss_pred             EEEEee
Q 040068          101 LSFNYA  106 (123)
Q Consensus       101 v~F~Ya  106 (123)
                      ++|.|-
T Consensus       595 ~~f~yi  600 (614)
T PLN02340        595 LSFVYV  600 (614)
T ss_pred             eEEEec
Confidence            999998


No 8  
>PF06483 ChiC:  Chitinase C;  InterPro: IPR009470 This ~170 aa region is found C-terminal to the catalytic domain (IPR001223 from INTERPRO) found in members of glycoside hydrolase family 18.
Probab=46.90  E-value=37  Score=26.64  Aligned_cols=50  Identities=26%  Similarity=0.452  Sum_probs=37.3

Q ss_pred             ccccEEEEcCCcc---cccccCCCceEEeCCeeEEccCCcccCCCCeEEEEeecCCccce
Q 040068           57 TQLELKLSCKGFQ---TVEPIDPSIIAISGDECTVVNNGNPFYGFTTLSFNYAWDTSFPF  113 (123)
Q Consensus        57 ~~~~V~l~C~gF~---S~~~VdP~i~r~~~d~C~Lvn~G~pi~~~~~v~F~Yaw~~~f~~  113 (123)
                      ..-||.++=+||+   +-=||+|++-=       .=|.++.|+.|..++|.|+-+.+-.+
T Consensus        34 ~~ldv~v~~~gf~~GD~NYPI~Pkl~i-------TNns~~~iPGGt~~~FD~ptSa~~~~   86 (180)
T PF06483_consen   34 EALDVSVSFTGFKLGDSNYPINPKLTI-------TNNSGQTIPGGTEFEFDYPTSAPDNA   86 (180)
T ss_pred             ceEEEEEEeCCcccCCCCCCcCCcEEE-------EcCCCcccCCccEEEEccccCCcccc
Confidence            3457788888996   55788887641       33567899999999999997776543


No 9  
>PF07705 CARDB:  CARDB;  InterPro: IPR011635 The APHP (acidic peptide-dependent hydrolases/peptidase) domain is found in a variety of different proteins.; PDB: 2KUT_A 2L0D_A 3IDU_A 2KL6_A.
Probab=45.45  E-value=79  Score=20.13  Aligned_cols=69  Identities=9%  Similarity=0.164  Sum_probs=33.8

Q ss_pred             CceeE--EEeecCCCCCCeeeEEEEEeeccccccccEEEEcCCcccccccCCCceEEeCCeeEEccCCcccCCCCeEEEE
Q 040068           27 DDIKV--SQSQTGKTVPNKQEWRLTLNNTCICTQLELKLSCKGFQTVEPIDPSIIAISGDECTVVNNGNPFYGFTTLSFN  104 (123)
Q Consensus        27 sdi~V--~Q~~tg~~~~G~p~~~VtI~N~C~C~~~~V~l~C~gF~S~~~VdP~i~r~~~d~C~Lvn~G~pi~~~~~v~F~  104 (123)
                      -||.|  ...+.-...+..-+..|+|.|.=.-...++.+.  .+.+...+         +.- -|   ..|.+|++..++
T Consensus         2 pDL~v~~~~~~~~~~~g~~~~i~~~V~N~G~~~~~~~~v~--~~~~~~~~---------~~~-~i---~~L~~g~~~~v~   66 (101)
T PF07705_consen    2 PDLTVSITVSPSNVVPGEPVTITVTVKNNGTADAENVTVR--LYLDGNSV---------STV-TI---PSLAPGESETVT   66 (101)
T ss_dssp             --EEE-EEEC-SEEETTSEEEEEEEEEE-SSS-BEEEEEE--EEETTEEE---------EEE-EE---SEB-TTEEEEEE
T ss_pred             CCEEEEEeeCCCcccCCCEEEEEEEEEECCCCCCCCEEEE--EEECCcee---------ccE-EE---CCcCCCcEEEEE
Confidence            36666  222333334446679999999876556666665  11111111         111 22   477888886666


Q ss_pred             eecCCc
Q 040068          105 YAWDTS  110 (123)
Q Consensus       105 Yaw~~~  110 (123)
                      +.|..+
T Consensus        67 ~~~~~~   72 (101)
T PF07705_consen   67 FTWTPP   72 (101)
T ss_dssp             EEEE-S
T ss_pred             EEEEeC
Confidence            666543


No 10 
>COG3900 Predicted periplasmic protein [Function unknown]
Probab=43.20  E-value=15  Score=30.20  Aligned_cols=26  Identities=31%  Similarity=0.474  Sum_probs=21.6

Q ss_pred             eeEEEee---------cCCCCCCeeeEEEEEeecc
Q 040068           29 IKVSQSQ---------TGKTVPNKQEWRLTLNNTC   54 (123)
Q Consensus        29 i~V~Q~~---------tg~~~~G~p~~~VtI~N~C   54 (123)
                      |=|+|..         |.+.+.|-|||.|++.|.=
T Consensus       190 IWIsqGeqpvp~k~VITsk~v~g~PqYtv~fsnwk  224 (262)
T COG3900         190 IWISQGEQPVPLKYVITSKDVPGEPQYTVVFSNWK  224 (262)
T ss_pred             EEeecCCCCcceeEEEEecccCCCCcEEEEEcccc
Confidence            6667764         7888999999999999964


No 11 
>KOG3358 consensus Uncharacterized secreted protein SDF2 (Stromal cell-derived factor 2), contains MIR domains [General function prediction only]
Probab=38.32  E-value=24  Score=28.10  Aligned_cols=40  Identities=23%  Similarity=0.409  Sum_probs=29.5

Q ss_pred             CCcccccccCC--CceEEeCCeeEEccCCcccCCCCeEEEEeecC
Q 040068           66 KGFQTVEPIDP--SIIAISGDECTVVNNGNPFYGFTTLSFNYAWD  108 (123)
Q Consensus        66 ~gF~S~~~VdP--~i~r~~~d~C~Lvn~G~pi~~~~~v~F~Yaw~  108 (123)
                      .||..++.+|.  .|....+..|   +.|.||.-|++|+.+--..
T Consensus        60 Tgv~~~dD~NSyW~Ik~~~~~~c---~rG~pikcG~~iRL~H~~T  101 (211)
T KOG3358|consen   60 TGVEGVDDSNSYWRIKPVSGTTC---ERGDPIKCGQTIRLTHLKT  101 (211)
T ss_pred             ecccccccCcceEEEecCCCCcc---cCCCccccCCeEEEEEeec
Confidence            46777776666  3333447778   8999999999999987643


No 12 
>PF05753 TRAP_beta:  Translocon-associated protein beta (TRAPB);  InterPro: IPR008856 This family consists of several eukaryotic translocon-associated protein beta (TRAPB) or signal sequence receptor beta subunit (SSR-beta) proteins. The normal translocation of nascent polypeptides into the lumen of the endoplasmic reticulum (ER) is thought to be aided in part by a translocon-associated protein (TRAP) complex consisting of 4 protein subunits. The association of mature proteins with the ER and Golgi, or other intracellular locales, such as lysosomes, depends on the initial targeting of the nascent polypeptide to the ER membrane. A similar scenario must also exist for proteins destined for secretion [].; GO: 0005783 endoplasmic reticulum, 0016021 integral to membrane
Probab=36.56  E-value=2e+02  Score=22.13  Aligned_cols=62  Identities=15%  Similarity=0.142  Sum_probs=42.3

Q ss_pred             CCCCCC-eeeEEEEEeeccccccccEEEEcCCcccccccCCCceEEeCCeeEEccCCcccCCCCeEEEEee
Q 040068           37 GKTVPN-KQEWRLTLNNTCICTQLELKLSCKGFQTVEPIDPSIIAISGDECTVVNNGNPFYGFTTLSFNYA  106 (123)
Q Consensus        37 g~~~~G-~p~~~VtI~N~C~C~~~~V~l~C~gF~S~~~VdP~i~r~~~d~C~Lvn~G~pi~~~~~v~F~Ya  106 (123)
                      .-++.| .-+.+++|.|.=.=+-.||.|.=++|.      |.-|....+.  +=-.=..|++|+.++..|.
T Consensus        32 ~~~v~g~~v~V~~~iyN~G~~~A~dV~l~D~~fp------~~~F~lvsG~--~s~~~~~i~pg~~vsh~~v   94 (181)
T PF05753_consen   32 KYLVEGEDVTVTYTIYNVGSSAAYDVKLTDDSFP------PEDFELVSGS--LSASWERIPPGENVSHSYV   94 (181)
T ss_pred             ccccCCcEEEEEEEEEECCCCeEEEEEEECCCCC------ccccEeccCc--eEEEEEEECCCCeEEEEEE
Confidence            334444 667999999999999999999887774      4556644222  1111247888888887776


No 13 
>PF14016 DUF4232:  Protein of unknown function (DUF4232)
Probab=32.79  E-value=83  Score=22.15  Aligned_cols=73  Identities=19%  Similarity=0.277  Sum_probs=45.8

Q ss_pred             CCCCceeEEEeecCCCCCCeeeEEEEEeecc--ccccccEEEEcCCcccccccCC-------CceEEeCCeeEEccCCcc
Q 040068           24 CTLDDIKVSQSQTGKTVPNKQEWRLTLNNTC--ICTQLELKLSCKGFQTVEPIDP-------SIIAISGDECTVVNNGNP   94 (123)
Q Consensus        24 C~~sdi~V~Q~~tg~~~~G~p~~~VtI~N~C--~C~~~~V~l~C~gF~S~~~VdP-------~i~r~~~d~C~Lvn~G~p   94 (123)
                      |...|++++-..... ..|...+.|+++|+=  .|...       ||..+..+|.       ..-+...     -..--.
T Consensus         1 C~~~~L~~~~~~~~~-~~g~~~~~l~~tN~s~~~C~l~-------G~P~v~~~~~~g~~~~~~~~~~~~-----~~~~vt   67 (131)
T PF14016_consen    1 CTAADLSVTVGPVDA-GAGQRHATLTFTNTSDTPCTLY-------GYPGVALVDADGAPLGVPAVREGP-----PPRPVT   67 (131)
T ss_pred             CCcccEEEEEecccC-CCCccEEEEEEEECCCCcEEec-------cCCcEEEECCCCCcCCccccccCC-----CCCcEE
Confidence            888999998876543 568889999999966  37654       4444444433       2222211     011125


Q ss_pred             cCCCCeEEEEeecCC
Q 040068           95 FYGFTTLSFNYAWDT  109 (123)
Q Consensus        95 i~~~~~v~F~Yaw~~  109 (123)
                      |.+|++..|.=.|..
T Consensus        68 L~PG~sA~a~l~~~~   82 (131)
T PF14016_consen   68 LAPGGSAYAGLRWSN   82 (131)
T ss_pred             ECCCCEEEEEEEEec
Confidence            678888888877755


No 14 
>PF01345 DUF11:  Domain of unknown function DUF11;  InterPro: IPR001434 This group of sequences is represented by a conserved region of about 53 amino acids shared between regions, usually repeated, of proteins from a small number of phylogenetically distant prokaryotes. Examples include a 132-residue region found repeated in three of the five longest proteins of Bacillus anthracis, a 131-residue repeat in a cell wall-anchored protein of Enterococcus faecalis (Streptococcus faecalis), and a 120-residue repeat in Methanobacterium thermoautotrophicum. A similar region is found in some Chlamydia trachomatis outer membrane proteins.  In C. trachomatis, three cysteine-rich proteins (also believed to be lipoproteins), MOMP, OMP6 and OMP3, make up the extracellular matrix of the outer membrane []. They are involved in the essential structural integrity of both the elementary body (EB) and reticulate body (RB) phase. They are thought to be involved in porin formation and, as these bacteria lack the peptidoglycan layer common to most Gram-negative microbes, such proteins are highly important in the pathogenicity of the organism.; GO: 0005727 extrachromosomal circular DNA
Probab=30.05  E-value=1.4e+02  Score=18.80  Aligned_cols=30  Identities=10%  Similarity=0.127  Sum_probs=23.5

Q ss_pred             cCCCCCCeeeEEEEEeeccccccccEEEEc
Q 040068           36 TGKTVPNKQEWRLTLNNTCICTQLELKLSC   65 (123)
Q Consensus        36 tg~~~~G~p~~~VtI~N~C~C~~~~V~l~C   65 (123)
                      ....++..-+|.++|+|.=.-+-.+|.|.-
T Consensus        35 ~~~~~Gd~v~ytitvtN~G~~~a~nv~v~D   64 (76)
T PF01345_consen   35 STANPGDTVTYTITVTNTGPAPATNVVVTD   64 (76)
T ss_pred             CcccCCCEEEEEEEEEECCCCeeEeEEEEE
Confidence            344455677899999999988888888864


No 15 
>PF09640 DUF2027:  Domain of unknown function (DUF2027);  InterPro: IPR018598  This protein domain is of unknown function, though putatively involved in DNA mismatch repair. It is associated with IPR002625 from INTERPRO. ; PDB: 2HUH_A.
Probab=29.71  E-value=30  Score=26.70  Aligned_cols=32  Identities=25%  Similarity=0.333  Sum_probs=19.4

Q ss_pred             cccCCCceEEeCCeeEEccCCcccCCCCeEEEEeecC
Q 040068           72 EPIDPSIIAISGDECTVVNNGNPFYGFTTLSFNYAWD  108 (123)
Q Consensus        72 ~~VdP~i~r~~~d~C~Lvn~G~pi~~~~~v~F~Yaw~  108 (123)
                      +|+|.+-+.-..=.||||||     ++--+.|.|...
T Consensus         8 vP~d~k~l~~T~fE~YlVND-----SNYy~~y~y~~~   39 (162)
T PF09640_consen    8 VPEDIKNLSTTRFECYLVND-----SNYYLHYTYLTA   39 (162)
T ss_dssp             EES-TT-TTT--EEEEEEE------SSSEEEEEEEEE
T ss_pred             cccCccccCCCceEEEEEec-----CccEEEEEEEec
Confidence            46676666544335889999     566799999743


No 16 
>PF10633 NPCBM_assoc:  NPCBM-associated, NEW3 domain of alpha-galactosidase;  InterPro: IPR018905 This domain has been named NEW3, but its function is not known. It is found on proteins which are bacterial galactosidases [].; PDB: 1EUT_A 2BZD_A 1WCQ_C 2BER_A 1W8O_A 1EUU_A 1W8N_A.
Probab=27.77  E-value=87  Score=19.98  Aligned_cols=25  Identities=24%  Similarity=0.171  Sum_probs=16.8

Q ss_pred             CCeeeEEEEEeeccccccccEEEEc
Q 040068           41 PNKQEWRLTLNNTCICTQLELKLSC   65 (123)
Q Consensus        41 ~G~p~~~VtI~N~C~C~~~~V~l~C   65 (123)
                      +..-+++++|+|...-+..++.|+-
T Consensus         4 G~~~~~~~tv~N~g~~~~~~v~~~l   28 (78)
T PF10633_consen    4 GETVTVTLTVTNTGTAPLTNVSLSL   28 (78)
T ss_dssp             TEEEEEEEEEE--SSS-BSS-EEEE
T ss_pred             CCEEEEEEEEEECCCCceeeEEEEE
Confidence            3456799999999988888888875


No 17 
>PF14345 GDYXXLXY:  GDYXXLXY protein
Probab=27.50  E-value=46  Score=24.11  Aligned_cols=10  Identities=30%  Similarity=0.949  Sum_probs=7.0

Q ss_pred             ccccCC-CceE
Q 040068           71 VEPIDP-SIIA   80 (123)
Q Consensus        71 ~~~VdP-~i~r   80 (123)
                      .+|||| ++||
T Consensus        27 ~~PvDPRdllr   37 (144)
T PF14345_consen   27 TAPVDPRDLLR   37 (144)
T ss_pred             ecccCcccccc
Confidence            368999 5665


No 18 
>PF06682 DUF1183:  Protein of unknown function (DUF1183);  InterPro: IPR009567 This family consists of several eukaryotic proteins of around 360 residues in length. The function of this family is unknown.
Probab=24.23  E-value=1.2e+02  Score=25.74  Aligned_cols=23  Identities=26%  Similarity=0.512  Sum_probs=19.2

Q ss_pred             cccEEEEcCCcccccccCCCceEEe
Q 040068           58 QLELKLSCKGFQTVEPIDPSIIAIS   82 (123)
Q Consensus        58 ~~~V~l~C~gF~S~~~VdP~i~r~~   82 (123)
                      ...|.|.|.|+.+.+  ||=|||-+
T Consensus        93 lG~~~V~CEGY~~pd--DpyvLkGS  115 (318)
T PF06682_consen   93 LGSTDVSCEGYDYPD--DPYVLKGS  115 (318)
T ss_pred             ecceEEeeecccCCC--CceecCCc
Confidence            456889999999965  99999955


No 19 
>PF03293 Pox_RNA_pol:  Poxvirus DNA-directed RNA polymerase, 18 kD subunit;  InterPro: IPR004973 DNA-directed RNA polymerases 2.7.7.6 from EC (also known as DNA-dependent RNA polymerases) are responsible for the polymerisation of ribonucleotides into a sequence complementary to the template DNA. In eukaryotes, there are three different forms of DNA-directed RNA polymerases transcribing different sets of genes. Most RNA polymerases are multimeric enzymes and are composed of a variable number of subunits. The core RNA polymerase complex consists of five subunits (two alpha, one beta, one beta-prime and one omega) and is sufficient for transcription elongation and termination but is unable to initiate transcription. Transcription initiation from promoter elements requires a sixth, dissociable subunit called a sigma factor, which reversibly associates with the core RNA polymerase complex to form a holoenzyme []. The core RNA polymerase complex forms a "crab claw"-like structure with an internal channel running along the full length []. The key functional sites of the enzyme, as defined by mutational and cross-linking analysis, are located on the inner wall of this channel. RNA synthesis follows after the attachment of RNA polymerase to a specific site, the promoter, on the template DNA strand. The RNA synthesis process continues until a termination sequence is reached. The RNA product, which is synthesised in the 5' to 3'direction, is known as the primary transcript. Eukaryotic nuclei contain three distinct types of RNA polymerases that differ in the RNA they synthesise:  RNA polymerase I: located in the nucleoli, synthesises precursors of most ribosomal RNAs. RNA polymerase II: occurs in the nucleoplasm, synthesises mRNA precursors.  RNA polymerase III: also occurs in the nucleoplasm, synthesises the precursors of 5S ribosomal RNA, the tRNAs, and a variety of other small nuclear and cytosolic RNAs.   
Eukaryotic cells are also known to contain separate mitochondrial and chloroplast RNA polymerases. Eukaryotic RNA polymerases, whose molecular masses vary in size from 500 to 700 kDa, contain two non-identical large (>100 kDa) subunits and an array of up to 12 different small (less than 50 kDa) subunits. The Poxvirus DNA-directed RNA polymerase (2.7.7.6 from EC) catalyses DNA-template-directed extension of the 3'-end of an RNA strand by one nucleotide at a time. The enzyme consists of at least eight subunits, this is the 18 kDa subunit.; GO: 0003677 DNA binding, 0003899 DNA-directed RNA polymerase activity, 0019083 viral transcription
Probab=22.66  E-value=1.4e+02  Score=22.82  Aligned_cols=48  Identities=13%  Similarity=0.282  Sum_probs=31.1

Q ss_pred             ccccEEEEcCCcccccccCCCceEEe-CCeeEEccCCcccCCCCeEEEEe
Q 040068           57 TQLELKLSCKGFQTVEPIDPSIIAIS-GDECTVVNNGNPFYGFTTLSFNY  105 (123)
Q Consensus        57 ~~~~V~l~C~gF~S~~~VdP~i~r~~-~d~C~Lvn~G~pi~~~~~v~F~Y  105 (123)
                      ..+||.+.|+..-=-..=|..-.... ..-| ++.||..-..|+.|+-.-
T Consensus        93 dESni~V~CgDLiCkl~rdsGtVSf~dsKYC-firNg~vY~ngs~Vsv~L  141 (160)
T PF03293_consen   93 DESNITVQCGDLICKLSRDSGTVSFNDSKYC-FIRNGVVYDNGSEVSVVL  141 (160)
T ss_pred             ccCceEEEcCcEEEEeeccCCeEEecCceEE-EEECCEEecCCCEEEEEe
Confidence            36889999987543222233323322 2349 999999999999987654


No 20 
>KOG4063 consensus Major epididymal secretory protein HE1 [Function unknown]
Probab=21.93  E-value=1.5e+02  Score=22.87  Aligned_cols=34  Identities=21%  Similarity=0.312  Sum_probs=19.9

Q ss_pred             hhHHHHHHHHHHHhhc-c----CccCCCCCceeEEEeec
Q 040068            3 AILKFLAAIMLFTIIT-K----GNCQCTLDDIKVSQSQT   36 (123)
Q Consensus         3 ~~~k~l~~~l~l~l~~-~----g~~~C~~sdi~V~Q~~t   36 (123)
                      +.+|.++++++|.+.+ |    +..+|+.+|-.+.+.+.
T Consensus         4 s~~~~v~l~alls~a~aq~~~t~~k~C~ss~g~~~~V~i   42 (158)
T KOG4063|consen    4 SFLKTVILLALLSLAAAQAISTGVKQCGSSDGTPLEVKI   42 (158)
T ss_pred             HHHHHHHHHHHHHHhhhcccCcccccccCCCCcceEEEe
Confidence            4455544444444443 3    23579988887777664


No 21 
>PF00856 SET:  SET domain;  InterPro: IPR001214 The SET domain appears generally as one part of a larger multidomain protein, and recently there were described three structures of very different proteins with distinct domain compositions: Neurospora crassa DIM-5, a member of the Su(var) family of HKMTs which methylate histone H3 on lysine 9, human SET7 (also called SET9), which methylates H3 on lysine 4 and garden pea Rubisco LSMT, an enzyme that does not modify histones, but instead methylates lysine 14 in the flexible tail of the large subunit of the enzyme Rubisco. The SET domain itself turned out to be an uncommon structure. Although in all three studies, electron density maps revealed the location of the AdoMet or AdoHcy cofactor, the SET domain bears no similarity at all to the canonical/AdoMet-dependent methyltransferase fold. Strictly conserved in the C-terminal motif of the SET domain tyrosine could be involved in abstracting a proton from the protonated amino group of the substrate lysine, promoting its nucleophilic attack on the sulphonium methyl group of the AdoMet cofactor. In contrast to the AdoMet-dependent protein methyltransferases of the classical type, which tend to bind their polypeptide substrates on top of the cofactor, it is noted from the Rubisco LSMT structure that the AdoMet seems to bind in a separate cleft, suggesting how a polypeptide substrate could be subjected to multiple rounds of methylation without having to be released from the enzyme. In contrast, SET7/9 is able to add only a single methyl group to its substrate. It has been demonstrated that association of SET domain and myotubularin-related proteins modulates growth control []. The SET domain-containing Drosophila melanogaster (Fruit fly) protein, enhancer of zeste, has a function in segment determination and the mammalian homologue may be involved in the regulation of gene transcription and chromatin structure. 
Histone lysine methylation is part of the histone code that regulates chromatin function and epigenetic control of gene function. Histone lysine methyltransferases (HMTase) differ both in their substrate specificity for the various acceptor lysines as well as in their product specificity for the number of methyl groups (one, two, or three) they transfer. With just one exception [], the HMTases belong to the SET family that can be classified according to the sequences surrounding the SET domain [, ]. Structural studies on the human SET7/9, a mono-methylase, have revealed the molecular basis for the specificity of the enzyme for the histone-target and the roles of the invariant residues in the SET domain in determining the methylation specificities [].  The pre-SET domain, as found in the SUV39 SET family, contains nine invariant cysteine residues that are grouped into two segments separated by a region of variable length. These 9 cysteines coordinate 3 zinc ions to form a triangular cluster, where each of the zinc ions is coordinated by four cysteines to give a tetrahedral configuration. The function of this domain is structural, holding together 2 long segments of random coils. The C-terminal region including the post-SET domain is disordered when not interacting with a histone tail and in the absence of zinc. The three conserved cysteines in the post-SET domain form a zinc-binding site when coupled to a fourth conserved cysteine in the knot-like structure close to the SET domain active site []. The structured post-SET region brings in the C-terminal residues that participate in S-adenosylmethionine-binding and histone tail interactions. The three conserved cysteine residues are essential for HMTase activity, as replacement with serine abolishes HMTase activity [], []. ; GO: 0005515 protein binding; PDB: 3TG5_A 3S7F_A 3RIB_B 3TG4_A 3S7J_A 3S7D_A 3S7B_A 3H6L_A 3SMT_A 3K5K_A ....
Probab=21.78  E-value=89  Score=20.88  Aligned_cols=22  Identities=18%  Similarity=0.258  Sum_probs=17.1

Q ss_pred             CeeEEccCCcccCCCCeEEEEe
Q 040068           84 DECTVVNNGNPFYGFTTLSFNY  105 (123)
Q Consensus        84 d~C~Lvn~G~pi~~~~~v~F~Y  105 (123)
                      +.++.+.-.++|.+|+.|...|
T Consensus       140 ~~~~~~~a~r~I~~GeEi~isY  161 (162)
T PF00856_consen  140 GGCLVVRATRDIKKGEEIFISY  161 (162)
T ss_dssp             TTEEEEEESS-B-TTSBEEEES
T ss_pred             cceEEEEECCccCCCCEEEEEE
Confidence            4555888899999999999998


No 22 
>TIGR01451 B_ant_repeat conserved repeat domain. This model represents the conserved region of about 53 amino acids shared between regions, usually repeated, of proteins from a small number of phylogenetically distant prokaryotes. Examples include a 132-residue region found repeated in three of the five longest proteins of Bacillus anthracis, a 131-residue repeat in a cell wall-anchored protein of Enterococcus faecalis, and a 120-residue repeat in Methanobacterium thermoautotrophicum. A similar region is found in some Chlamydial outer membrane proteins.
Probab=21.20  E-value=2e+02  Score=17.38  Aligned_cols=25  Identities=8%  Similarity=0.211  Sum_probs=18.8

Q ss_pred             CCCeeeEEEEEeeccccccccEEEE
Q 040068           40 VPNKQEWRLTLNNTCICTQLELKLS   64 (123)
Q Consensus        40 ~~G~p~~~VtI~N~C~C~~~~V~l~   64 (123)
                      ++-.-+|+++|.|+-.=+..+|.|.
T Consensus        10 ~Gd~v~Yti~v~N~g~~~a~~v~v~   34 (53)
T TIGR01451        10 IGDTITYTITVTNNGNVPATNVVVT   34 (53)
T ss_pred             CCCEEEEEEEEEECCCCceEeEEEE
Confidence            4557789999999876666666654


No 23 
>PRK15301 hypothetical protein; Provisional
Probab=20.41  E-value=2.8e+02  Score=21.89  Aligned_cols=12  Identities=17%  Similarity=0.429  Sum_probs=6.7

Q ss_pred             eeEEEEEeeccc
Q 040068           44 QEWRLTLNNTCI   55 (123)
Q Consensus        44 p~~~VtI~N~C~   55 (123)
                      +|=+|+|.=.|+
T Consensus        56 ~~R~v~vsV~Cp   67 (186)
T PRK15301         56 PEREVNVSVSCP   67 (186)
T ss_pred             cceeEEEEEECC
Confidence            445566655554


Done!