Query 040068
Match_columns 123
No_of_seqs 107 out of 131
Neff 5.2
Searched_HMMs 46136
Date Fri Mar 29 04:48:39 2013
Command hhsearch -i /work/01045/syshi/csienesis_hhblits_a3m/040068.a3m -d /work/01045/syshi/HHdatabase/Cdd.hhm -o /work/01045/syshi/hhsearch_cdd/040068hhsearch_cdd -cpu 12 -v 0
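A minimal Python sketch of reading the header block above. It assumes each header line is a key, a single space, then its value, and that the block ends at the `No Hit` column line. Which fields are treated as numbers (`NUMERIC_FIELDS`) is an assumption made for illustration. Everything else is kept as a string, so identifiers such as `040068` keep their leading zero.

```python
# Header fields assumed to be numeric (illustrative choice). All other values
# stay strings so IDs like "040068" are not turned into integers.
NUMERIC_FIELDS = {"Match_columns": int, "Neff": float, "Searched_HMMs": int}

def parse_hhr_header(lines):
    """Collect 'Key value' header lines until the hit-list column header."""
    header = {}
    for line in lines:
        if line.startswith("No Hit"):   # start of the hit-list table
            break
        if not line.strip():            # tolerate blank lines
            continue
        key, _, value = line.strip().partition(" ")
        header[key] = NUMERIC_FIELDS.get(key, str)(value.strip())
    return header
```

`Searched_HMMs` is the value needed later when relating the P-value and E-value columns of the hit list.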
No Hit Prob E-value P-value Score SS Cols Query HMM Template HMM
1 PF09478 CBM49: Carbohydrate b 97.0 0.0039 8.4E-08 41.8 6.9 73 29-106 1-78 (80)
2 PLN02171 endoglucanase 92.1 0.51 1.1E-05 43.0 6.8 77 27-108 535-615 (629)
3 PF07127 Nodulin_late: Late no 70.9 4.4 9.6E-05 25.2 2.4 7 1-7 1-7 (54)
4 PF02933 CDC48_2: Cell divisio 66.7 4.7 0.0001 25.4 1.9 29 91-120 15-43 (64)
5 PF03330 DPBB_1: Rare lipoprot 49.8 20 0.00043 23.2 2.7 38 44-81 38-76 (78)
6 PF07172 GRP: Glycine rich pro 49.1 11 0.00024 26.4 1.4 9 1-9 1-9 (95)
7 PLN02340 endoglucanase 48.5 17 0.00036 33.3 2.9 77 26-106 519-600 (614)
8 PF06483 ChiC: Chitinase C; I 46.9 37 0.00081 26.6 4.2 50 57-113 34-86 (180)
9 PF07705 CARDB: CARDB; InterP 45.5 79 0.0017 20.1 5.4 69 27-110 2-72 (101)
10 COG3900 Predicted periplasmic 43.2 15 0.00032 30.2 1.5 26 29-54 190-224 (262)
11 KOG3358 Uncharacterized secret 38.3 24 0.00052 28.1 2.0 40 66-108 60-101 (211)
12 PF05753 TRAP_beta: Translocon 36.6 2E+02 0.0042 22.1 8.8 62 37-106 32-94 (181)
13 PF14016 DUF4232: Protein of u 32.8 83 0.0018 22.2 4.0 73 24-109 1-82 (131)
14 PF01345 DUF11: Domain of unkn 30.0 1.4E+02 0.003 18.8 4.3 30 36-65 35-64 (76)
15 PF09640 DUF2027: Domain of un 29.7 30 0.00066 26.7 1.3 32 72-108 8-39 (162)
16 PF10633 NPCBM_assoc: NPCBM-as 27.8 87 0.0019 20.0 3.1 25 41-65 4-28 (78)
17 PF14345 GDYXXLXY: GDYXXLXY pr 27.5 46 0.00099 24.1 1.9 10 71-80 27-37 (144)
18 PF06682 DUF1183: Protein of u 24.2 1.2E+02 0.0025 25.7 3.9 23 58-82 93-115 (318)
19 PF03293 Pox_RNA_pol: Poxvirus 22.7 1.4E+02 0.0031 22.8 3.8 48 57-105 93-141 (160)
20 KOG4063 Major epididymal secre 21.9 1.5E+02 0.0032 22.9 3.8 34 3-36 4-42 (158)
21 PF00856 SET: SET domain; Int 21.8 89 0.0019 20.9 2.4 22 84-105 140-161 (162)
22 TIGR01451 B_ant_repeat conserv 21.2 2E+02 0.0044 17.4 3.7 25 40-64 10-34 (53)
23 PRK15301 hypothetical protein; 20.4 2.8E+02 0.006 21.9 5.1 12 44-55 56-67 (186)
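The hit list can be read row by row. The sketch below uses hypothetical helpers (`Hit`, `parse_hit_line` and `evalue_consistent` are illustrative names, not part of HH-suite). Because the truncated description may contain spaces, it takes the nine numeric columns from the right-hand end of each row. The values in this table are consistent with E-value ≈ P-value × Searched_HMMs: for hit 1, 8.4E-08 × 46136 ≈ 0.0039. This relationship is inferred from these numbers. `evalue_consistent` checks it with a tolerance because both columns are printed rounded.

```python
import math
from dataclasses import dataclass

@dataclass
class Hit:
    no: int
    hit_id: str
    description: str      # truncated to a fixed width in the summary table
    prob: float
    evalue: float
    pvalue: float
    score: float
    ss: float
    cols: int
    query_range: tuple
    template_range: tuple
    template_len: int

def _span(text):
    """Turn '29-106' into (29, 106)."""
    start, end = text.split("-")
    return int(start), int(end)

def parse_hit_line(line):
    tokens = line.split()
    # The description may contain spaces, so read the fixed columns from the right.
    prob, evalue, pvalue, score, ss, cols, qrange, trange, tlen = tokens[-9:]
    return Hit(
        no=int(tokens[0]),
        hit_id=tokens[1],
        description=" ".join(tokens[2:-9]),
        prob=float(prob),
        evalue=float(evalue),
        pvalue=float(pvalue),
        score=float(score),
        ss=float(ss),
        cols=int(cols),
        query_range=_span(qrange),
        template_range=_span(trange),
        template_len=int(tlen.strip("()")),
    )

def evalue_consistent(hit, n_searched, rel_tol=0.1):
    """Check E-value ~ P-value x number of searched HMMs (columns are rounded)."""
    return math.isclose(hit.evalue, hit.pvalue * n_searched, rel_tol=rel_tol)
```

Applied to every row above with `n_searched=46136`, the check holds within the 10% tolerance. The tolerance is needed for rows such as hit 12, where the E-value is printed as `2E+02`.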
No 1
>PF09478 CBM49: Carbohydrate binding domain CBM49; InterPro: IPR019028 A carbohydrate-binding module (CBM) is defined as a contiguous amino acid sequence within a carbohydrate-active enzyme with a discrete fold having carbohydrate-binding activity. A few exceptions are CBMs in cellulosomal scaffolding proteins and rare instances of independent putative CBMs. The requirement of CBMs existing as modules within larger enzymes sets this class of carbohydrate-binding protein apart from other non-catalytic sugar binding proteins such as lectins and sugar transport proteins. CBMs were previously classified as cellulose-binding domains (CBDs) based on the initial discovery of several modules that bound cellulose [, ]. However, additional modules in carbohydrate-active enzymes are continually being found that bind carbohydrates other than cellulose yet otherwise meet the CBM criteria, hence the need to reclassify these polypeptides using more inclusive terminology. Previous classifications of cellulose-binding domains were based on amino acid similarity. Groupings of CBDs were called "Types" and numbered with Roman numerals (e.g. Type I or Type II CBDs). In keeping with the glycoside hydrolase classification, these groupings are now called families and numbered with Arabic numerals. Families 1 to 13 are the same as Types I to XIII. For a detailed review of the structure and binding modes of CBMs see []. This domain is found at the C terminus of cellulases, and in vitro binding studies have shown that it binds to crystalline cellulose []. ; GO: 0030246 carbohydrate binding, 0005576 extracellular region
Probab=97.03 E-value=0.0039 Score=41.76 Aligned_cols=73 Identities=16% Similarity=0.242 Sum_probs=55.1
Q ss_pred eeEEEeecCCCCCC---eeeEEEEEeeccccccccEEEEcCCcc-cccccCCCceEEeCCeeEEc-cCCcccCCCCeEEE
Q 040068 29 IKVSQSQTGKTVPN---KQEWRLTLNNTCICTQLELKLSCKGFQ-TVEPIDPSIIAISGDECTVV-NNGNPFYGFTTLSF 103 (123)
Q Consensus 29 i~V~Q~~tg~~~~G---~p~~~VtI~N~C~C~~~~V~l~C~gF~-S~~~VdP~i~r~~~d~C~Lv-n~G~pi~~~~~v~F 103 (123)
|+|.|..++.+..| ..+|.|+|+|++.=+++++++.-+.+. +.= .+-+..++.. -+ +--.+|.+|++.+|
T Consensus 1 i~i~q~~~~sW~~~g~~y~qy~v~I~N~~~~~I~~~~i~~~~l~~~iW----~l~~~~~~~y-~lPs~~~~i~pg~s~~F 75 (80)
T PF09478_consen 1 ITITQTLVNSWTENGQTYTQYDVTITNNGSKPIKSLKISIDNLYGSIW----GLDKVSGNTY-TLPSYQPTIKPGQSFTF 75 (80)
T ss_pred CEEEEEEEeEEEeCCEEEEEEEEEEEECCCCeEEEEEEEECccchhhe----eEEeccCCEE-ECCccccccCCCCEEEE
Confidence 68899998888775 457999999999999999999998765 111 2222335665 55 43459999999999
Q ss_pred Eee
Q 040068 104 NYA 106 (123)
Q Consensus 104 ~Ya 106 (123)
-|-
T Consensus 76 GYI 78 (80)
T PF09478_consen 76 GYI 78 (80)
T ss_pred EEE
Confidence 995
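Each hit's detail block opens with a `key=value` statistics line, like the `Probab=97.03 ...` line for hit 1. The following minimal sketch turns such a line into a dictionary. It assumes keys contain only word characters and hyphens (as in `E-value`) and values contain no spaces. Converting percentages such as `Identities=16%` to fractions is a choice made here for illustration.

```python
import re

# key=value pairs; keys may contain hyphens (E-value), values are space-free.
STAT_RE = re.compile(r"([\w-]+)=(\S+)")

def parse_stats_line(line):
    stats = {}
    for key, value in STAT_RE.findall(line):
        if value.endswith("%"):
            stats[key] = float(value[:-1]) / 100.0   # 16% -> 0.16
        else:
            stats[key] = float(value)                # also handles 2e+02
    return stats
```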
No 2
>PLN02171 endoglucanase
Probab=92.13 E-value=0.51 Score=42.96 Aligned_cols=77 Identities=18% Similarity=0.124 Sum_probs=54.3
Q ss_pred CceeEEEeecCCCCC---CeeeEEEEEeeccccccccEEEEcCCcccccccCCCceEEeCCeeEEccCCc-ccCCCCeEE
Q 040068 27 DDIKVSQSQTGKTVP---NKQEWRLTLNNTCICTQLELKLSCKGFQTVEPIDPSIIAISGDECTVVNNGN-PFYGFTTLS 102 (123)
Q Consensus 27 sdi~V~Q~~tg~~~~---G~p~~~VtI~N~C~C~~~~V~l~C~gF~S~~~VdP~i~r~~~d~C~Lvn~G~-pi~~~~~v~ 102 (123)
+.|+|.|..++.+.. +..+|+|+|+|++..+.+++++.=..+-. |=.=+...++.. -+-+-. .|.+|++.+
T Consensus 535 ~ei~i~q~v~~sW~~~g~~y~qy~v~I~N~s~~~ik~i~i~~~~~~~----~iW~v~~~~ngy-tlPs~~~sL~aG~s~t 609 (629)
T PLN02171 535 SPIEIEQKATASWKAKGRTYYRYSTTVTNRSAKTLKELHLGISKLYG----PLWGLTKAGYGY-VLPSWMPSLPAGKSLE 609 (629)
T ss_pred ceeEEEEEEEEEEEcCCceEEEEEEEEEECCCCceeeeeeeeccccc----cchheeecCCcc-cCchhhcccCCCCeeE
Confidence 358999999988875 47889999999999999999996544421 111111233442 444443 788899999
Q ss_pred EEeecC
Q 040068 103 FNYAWD 108 (123)
Q Consensus 103 F~Yaw~ 108 (123)
|-|=..
T Consensus 610 FgyI~~ 615 (629)
T PLN02171 610 FVYVHS 615 (629)
T ss_pred EEeecC
Confidence 999854
No 3
>PF07127 Nodulin_late: Late nodulin protein; InterPro: IPR009810 This family consists of several plant specific late nodulin sequences which are homologous to the Pisum sativum (Garden pea) ENOD3 protein. ENOD3 is expressed in the late stages of root nodule formation and contains two pairs of cysteine residues toward the protein's C terminus that may be involved in metal binding [].; GO: 0046872 metal ion binding, 0009878 nodule morphogenesis
Probab=70.94 E-value=4.4 Score=25.15 Aligned_cols=7 Identities=86% Similarity=1.165 Sum_probs=6.5
Q ss_pred ChhhHHH
Q 040068 1 MAAILKF 7 (123)
Q Consensus 1 Ma~~~k~ 7 (123)
||.++|+
T Consensus 1 Ma~ilKF 7 (54)
T PF07127_consen 1 MAKILKF 7 (54)
T ss_pred Cccchhh
Confidence 9999998
No 4
>PF02933 CDC48_2: Cell division protein 48 (CDC48), domain 2; InterPro: IPR004201 This domain has a double psi-beta barrel fold and includes VCP-like ATPase and N-ethylmaleimide sensitive fusion protein N-terminal domains. Both the VAT and NSF N-terminal functional domains consist of two structural domains, of which this is the C-terminal one. The VAT-N domain found in AAA ATPases (IPR003959 from INTERPRO) is a 185-residue substrate recognition domain [].; GO: 0005524 ATP binding; PDB: 1QDN_B 1QCS_A 1CR5_C 3QQ8_A 3HU2_A 3HU1_E 3HU3_A 3QWZ_A 3TIW_B 3QQ7_A ....
Probab=66.72 E-value=4.7 Score=25.45 Aligned_cols=29 Identities=28% Similarity=0.525 Sum_probs=23.4
Q ss_pred CCcccCCCCeEEEEeecCCccceeeeeeee
Q 040068 91 NGNPFYGFTTLSFNYAWDTSFPFKPISSQI 120 (123)
Q Consensus 91 ~G~pi~~~~~v~F~Yaw~~~f~~~p~ss~~ 120 (123)
.|+|+..|+.|.|.+. ...++|.+.+.++
T Consensus 15 ~~~pv~~Gd~i~~~~~-~~~~~~~V~~~~P 43 (64)
T PF02933_consen 15 EGRPVTKGDTIVFPFF-GQALPFKVVSTEP 43 (64)
T ss_dssp TTEEEETT-EEEEEET-TEEEEEEEEEECS
T ss_pred cCCCccCCCEEEEEeC-CcEEEEEEEEEEc
Confidence 4699999999999997 6889999887653
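In the alignment rows that carry coordinates (`Q 040068 ...`, `T PF02933_consen ...` and the `Consensus` rows), the start and end numbers should bracket exactly the non-gap residues shown. The small sketch below checks this for hit 4. It assumes such rows split into six whitespace-separated fields and that gaps are written as `-` or `.`. Rows without coordinates, such as `ss_pred` and `ss_dssp`, are not handled, and the helper names are hypothetical.

```python
def parse_alignment_row(line):
    """Split 'Q|T <name> <start> <aligned seq> <end> (<length>)' into fields."""
    tag, name, start, seq, end, length = line.split()
    return {"tag": tag, "name": name, "start": int(start), "seq": seq,
            "end": int(end), "length": int(length.strip("()"))}

def coordinates_consistent(row):
    """Non-gap residues in the aligned segment should equal end - start + 1."""
    residues = sum(1 for c in row["seq"] if c not in "-.")
    return residues == row["end"] - row["start"] + 1
```

For hit 4, the query row covers 30 residues (91 to 120). The template row has 29 residues plus one gap (15 to 43), so both aligned strings are 30 columns wide.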
No 5
>PF03330 DPBB_1: Rare lipoprotein A (RlpA)-like double-psi beta-barrel; InterPro: IPR009009 Beta barrels are commonly observed in protein structures. They are classified in terms of two integral parameters: the number of strands in the sheet, n, and the shear number, S, a measure of the stagger of the strands in the beta-sheet. These two parameters have been shown to determine the major geometrical features of beta-barrels. Six-stranded beta-barrels with a pseudo-twofold axis are found in several proteins. One involving parallel strands forming two psi structures is known as the double-psi barrel. The first psi structure consists of the loop connecting strands beta1 and beta2 (a 'psi loop') and the strand beta5, whereas the second psi structure consists of the loop connecting strands beta4 and beta5 and the strand beta2. All the psi structures in double-psi barrels have a unique handedness, in that beta1 (beta4), beta2 (beta5) and the loop following beta5 (beta2) form a right-handed helix. The unique handedness may be related to the fact that the twisting angle between the parallel pair of strands is always larger than that between the antiparallel pair [].; PDB: 1N10_B 3D30_A 2BH0_A 2HCZ_X.
Probab=49.80 E-value=20 Score=23.18 Aligned_cols=38 Identities=24% Similarity=0.472 Sum_probs=28.8
Q ss_pred eeEEEEEeeccc-cccccEEEEcCCcccccccCCCceEE
Q 040068 44 QEWRLTLNNTCI-CTQLELKLSCKGFQTVEPIDPSIIAI 81 (123)
Q Consensus 44 p~~~VtI~N~C~-C~~~~V~l~C~gF~S~~~VdP~i~r~ 81 (123)
..-.|+|+++|+ |...++-|+=..|..--..|..++.+
T Consensus 38 ksV~v~V~D~Cp~~~~~~lDLS~~aF~~la~~~~G~i~V 76 (78)
T PF03330_consen 38 KSVTVTVVDRCPGCPPNHLDLSPAAFKALADPDAGVIPV 76 (78)
T ss_dssp CEEEEEEEEE-TTSSSSEEEEEHHHHHHTBSTTCSSEEE
T ss_pred CeEEEEEEccCCCCcCCEEEeCHHHHHHhCCCCceEEEE
Confidence 667899999996 99999999988887655555555543
No 6
>PF07172 GRP: Glycine rich protein family; InterPro: IPR010800 This family consists of glycine rich proteins. Some of them may be involved in resistance to environmental stress [].
Probab=49.11 E-value=11 Score=26.36 Aligned_cols=9 Identities=33% Similarity=0.224 Sum_probs=5.0
Q ss_pred ChhhHHHHH
Q 040068 1 MAAILKFLA 9 (123)
Q Consensus 1 Ma~~~k~l~ 9 (123)
||++..+|+
T Consensus 1 MaSK~~llL 9 (95)
T PF07172_consen 1 MASKAFLLL 9 (95)
T ss_pred CchhHHHHH
Confidence 885544443
No 7
>PLN02340 endoglucanase
Probab=48.54 E-value=17 Score=33.30 Aligned_cols=77 Identities=14% Similarity=0.133 Sum_probs=51.1
Q ss_pred CCceeEEEeecCCCCCC---eeeEEEEEeeccccccccEEEEcCCcc-cccccCCCceEEeCCeeEEccCC-cccCCCCe
Q 040068 26 LDDIKVSQSQTGKTVPN---KQEWRLTLNNTCICTQLELKLSCKGFQ-TVEPIDPSIIAISGDECTVVNNG-NPFYGFTT 100 (123)
Q Consensus 26 ~sdi~V~Q~~tg~~~~G---~p~~~VtI~N~C~C~~~~V~l~C~gF~-S~~~VdP~i~r~~~d~C~Lvn~G-~pi~~~~~ 100 (123)
..++++.|.-+..+..+ .-+|+|+|+|+|.=+.+.+++.=..+- ..-.|.|++= .+.. -+-+= ..|.+|+.
T Consensus 519 ~~~~e~~~~~~~sw~~~g~~y~~~~v~i~N~s~~pi~~l~~~~~~l~g~lwgl~~~~~---~~~y-~~p~~~~tl~~g~~ 594 (614)
T PLN02340 519 GAPVEFVHSITNTWTAGGTTYYRHKVIIKNKSQKPITDLKLVIEDLSGPIWGLNPTKE---KNTY-ELPQWQKVLQPGSQ 594 (614)
T ss_pred CCchhhhhhheeeeecCCceEEEEEEEEEeCCCCCchhhhhhhhhcccchhcceeccc---cCCc-cCchhhhccCCCCe
Confidence 44567777777666654 678999999999999999988764443 2222333211 2332 33332 47888999
Q ss_pred EEEEee
Q 040068 101 LSFNYA 106 (123)
Q Consensus 101 v~F~Ya 106 (123)
++|.|-
T Consensus 595 ~~f~yi 600 (614)
T PLN02340 595 LSFVYV 600 (614)
T ss_pred eEEEec
Confidence 999998
No 8
>PF06483 ChiC: Chitinase C; InterPro: IPR009470 This ~170 aa region is found C-terminal to the catalytic domain (IPR001223 from INTERPRO) in members of glycoside hydrolase family 18.
Probab=46.90 E-value=37 Score=26.64 Aligned_cols=50 Identities=26% Similarity=0.452 Sum_probs=37.3
Q ss_pred ccccEEEEcCCcc---cccccCCCceEEeCCeeEEccCCcccCCCCeEEEEeecCCccce
Q 040068 57 TQLELKLSCKGFQ---TVEPIDPSIIAISGDECTVVNNGNPFYGFTTLSFNYAWDTSFPF 113 (123)
Q Consensus 57 ~~~~V~l~C~gF~---S~~~VdP~i~r~~~d~C~Lvn~G~pi~~~~~v~F~Yaw~~~f~~ 113 (123)
..-||.++=+||+ +-=||+|++-= .=|.++.|+.|..++|.|+-+.+-.+
T Consensus 34 ~~ldv~v~~~gf~~GD~NYPI~Pkl~i-------TNns~~~iPGGt~~~FD~ptSa~~~~ 86 (180)
T PF06483_consen 34 EALDVSVSFTGFKLGDSNYPINPKLTI-------TNNSGQTIPGGTEFEFDYPTSAPDNA 86 (180)
T ss_pred ceEEEEEEeCCcccCCCCCCcCCcEEE-------EcCCCcccCCccEEEEccccCCcccc
Confidence 3457788888996 55788887641 33567899999999999997776543
No 9
>PF07705 CARDB: CARDB; InterPro: IPR011635 The APHP (acidic peptide-dependent hydrolases/peptidase) domain is found in a variety of different proteins.; PDB: 2KUT_A 2L0D_A 3IDU_A 2KL6_A.
Probab=45.45 E-value=79 Score=20.13 Aligned_cols=69 Identities=9% Similarity=0.164 Sum_probs=33.8
Q ss_pred CceeE--EEeecCCCCCCeeeEEEEEeeccccccccEEEEcCCcccccccCCCceEEeCCeeEEccCCcccCCCCeEEEE
Q 040068 27 DDIKV--SQSQTGKTVPNKQEWRLTLNNTCICTQLELKLSCKGFQTVEPIDPSIIAISGDECTVVNNGNPFYGFTTLSFN 104 (123)
Q Consensus 27 sdi~V--~Q~~tg~~~~G~p~~~VtI~N~C~C~~~~V~l~C~gF~S~~~VdP~i~r~~~d~C~Lvn~G~pi~~~~~v~F~ 104 (123)
-||.| ...+.-...+..-+..|+|.|.=.-...++.+. .+.+...+ +.- -| ..|.+|++..++
T Consensus 2 pDL~v~~~~~~~~~~~g~~~~i~~~V~N~G~~~~~~~~v~--~~~~~~~~---------~~~-~i---~~L~~g~~~~v~ 66 (101)
T PF07705_consen 2 PDLTVSITVSPSNVVPGEPVTITVTVKNNGTADAENVTVR--LYLDGNSV---------STV-TI---PSLAPGESETVT 66 (101)
T ss_dssp --EEE-EEEC-SEEETTSEEEEEEEEEE-SSS-BEEEEEE--EEETTEEE---------EEE-EE---SEB-TTEEEEEE
T ss_pred CCEEEEEeeCCCcccCCCEEEEEEEEEECCCCCCCCEEEE--EEECCcee---------ccE-EE---CCcCCCcEEEEE
Confidence 36666 222333334446679999999876556666665 11111111 111 22 477888886666
Q ss_pred eecCCc
Q 040068 105 YAWDTS 110 (123)
Q Consensus 105 Yaw~~~ 110 (123)
+.|..+
T Consensus 67 ~~~~~~ 72 (101)
T PF07705_consen 67 FTWTPP 72 (101)
T ss_dssp EEEE-S
T ss_pred EEEEeC
Confidence 666543
No 10
>COG3900 Predicted periplasmic protein [Function unknown]
Probab=43.20 E-value=15 Score=30.20 Aligned_cols=26 Identities=31% Similarity=0.474 Sum_probs=21.6
Q ss_pred eeEEEee---------cCCCCCCeeeEEEEEeecc
Q 040068 29 IKVSQSQ---------TGKTVPNKQEWRLTLNNTC 54 (123)
Q Consensus 29 i~V~Q~~---------tg~~~~G~p~~~VtI~N~C 54 (123)
|=|+|.. |.+.+.|-|||.|++.|.=
T Consensus 190 IWIsqGeqpvp~k~VITsk~v~g~PqYtv~fsnwk 224 (262)
T COG3900 190 IWISQGEQPVPLKYVITSKDVPGEPQYTVVFSNWK 224 (262)
T ss_pred EEeecCCCCcceeEEEEecccCCCCcEEEEEcccc
Confidence 6667764 7888999999999999964
No 11
>KOG3358 consensus Uncharacterized secreted protein SDF2 (Stromal cell-derived factor 2), contains MIR domains [General function prediction only]
Probab=38.32 E-value=24 Score=28.10 Aligned_cols=40 Identities=23% Similarity=0.409 Sum_probs=29.5
Q ss_pred CCcccccccCC--CceEEeCCeeEEccCCcccCCCCeEEEEeecC
Q 040068 66 KGFQTVEPIDP--SIIAISGDECTVVNNGNPFYGFTTLSFNYAWD 108 (123)
Q Consensus 66 ~gF~S~~~VdP--~i~r~~~d~C~Lvn~G~pi~~~~~v~F~Yaw~ 108 (123)
.||..++.+|. .|....+..| +.|.||.-|++|+.+--..
T Consensus 60 Tgv~~~dD~NSyW~Ik~~~~~~c---~rG~pikcG~~iRL~H~~T 101 (211)
T KOG3358|consen 60 TGVEGVDDSNSYWRIKPVSGTTC---ERGDPIKCGQTIRLTHLKT 101 (211)
T ss_pred ecccccccCcceEEEecCCCCcc---cCCCccccCCeEEEEEeec
Confidence 46777776666 3333447778 8999999999999987643
No 12
>PF05753 TRAP_beta: Translocon-associated protein beta (TRAPB); InterPro: IPR008856 This family consists of several eukaryotic translocon-associated protein beta (TRAPB) or signal sequence receptor beta subunit (SSR-beta) proteins. The normal translocation of nascent polypeptides into the lumen of the endoplasmic reticulum (ER) is thought to be aided in part by a translocon-associated protein (TRAP) complex consisting of 4 protein subunits. The association of mature proteins with the ER and Golgi, or other intracellular locales, such as lysosomes, depends on the initial targeting of the nascent polypeptide to the ER membrane. A similar scenario must also exist for proteins destined for secretion [].; GO: 0005783 endoplasmic reticulum, 0016021 integral to membrane
Probab=36.56 E-value=2e+02 Score=22.13 Aligned_cols=62 Identities=15% Similarity=0.142 Sum_probs=42.3
Q ss_pred CCCCCC-eeeEEEEEeeccccccccEEEEcCCcccccccCCCceEEeCCeeEEccCCcccCCCCeEEEEee
Q 040068 37 GKTVPN-KQEWRLTLNNTCICTQLELKLSCKGFQTVEPIDPSIIAISGDECTVVNNGNPFYGFTTLSFNYA 106 (123)
Q Consensus 37 g~~~~G-~p~~~VtI~N~C~C~~~~V~l~C~gF~S~~~VdP~i~r~~~d~C~Lvn~G~pi~~~~~v~F~Ya 106 (123)
.-++.| .-+.+++|.|.=.=+-.||.|.=++|. |.-|....+. +=-.=..|++|+.++..|.
T Consensus 32 ~~~v~g~~v~V~~~iyN~G~~~A~dV~l~D~~fp------~~~F~lvsG~--~s~~~~~i~pg~~vsh~~v 94 (181)
T PF05753_consen 32 KYLVEGEDVTVTYTIYNVGSSAAYDVKLTDDSFP------PEDFELVSGS--LSASWERIPPGENVSHSYV 94 (181)
T ss_pred ccccCCcEEEEEEEEEECCCCeEEEEEEECCCCC------ccccEeccCc--eEEEEEEECCCCeEEEEEE
Confidence 334444 667999999999999999999887774 4556644222 1111247888888887776
No 13
>PF14016 DUF4232: Protein of unknown function (DUF4232)
Probab=32.79 E-value=83 Score=22.15 Aligned_cols=73 Identities=19% Similarity=0.277 Sum_probs=45.8
Q ss_pred CCCCceeEEEeecCCCCCCeeeEEEEEeecc--ccccccEEEEcCCcccccccCC-------CceEEeCCeeEEccCCcc
Q 040068 24 CTLDDIKVSQSQTGKTVPNKQEWRLTLNNTC--ICTQLELKLSCKGFQTVEPIDP-------SIIAISGDECTVVNNGNP 94 (123)
Q Consensus 24 C~~sdi~V~Q~~tg~~~~G~p~~~VtI~N~C--~C~~~~V~l~C~gF~S~~~VdP-------~i~r~~~d~C~Lvn~G~p 94 (123)
|...|++++-..... ..|...+.|+++|+= .|... ||..+..+|. ..-+... -..--.
T Consensus 1 C~~~~L~~~~~~~~~-~~g~~~~~l~~tN~s~~~C~l~-------G~P~v~~~~~~g~~~~~~~~~~~~-----~~~~vt 67 (131)
T PF14016_consen 1 CTAADLSVTVGPVDA-GAGQRHATLTFTNTSDTPCTLY-------GYPGVALVDADGAPLGVPAVREGP-----PPRPVT 67 (131)
T ss_pred CCcccEEEEEecccC-CCCccEEEEEEEECCCCcEEec-------cCCcEEEECCCCCcCCccccccCC-----CCCcEE
Confidence 888999998876543 568889999999966 37654 4444444433 2222211 011125
Q ss_pred cCCCCeEEEEeecCC
Q 040068 95 FYGFTTLSFNYAWDT 109 (123)
Q Consensus 95 i~~~~~v~F~Yaw~~ 109 (123)
|.+|++..|.=.|..
T Consensus 68 L~PG~sA~a~l~~~~ 82 (131)
T PF14016_consen 68 LAPGGSAYAGLRWSN 82 (131)
T ss_pred ECCCCEEEEEEEEec
Confidence 678888888877755
No 14
>PF01345 DUF11: Domain of unknown function DUF11; InterPro: IPR001434 This group of sequences is represented by a conserved region of about 53 amino acids shared between regions, usually repeated, of proteins from a small number of phylogenetically distant prokaryotes. Examples include a 132-residue region found repeated in three of the five longest proteins of Bacillus anthracis, a 131-residue repeat in a cell wall-anchored protein of Enterococcus faecalis (Streptococcus faecalis), and a 120-residue repeat in Methanobacterium thermoautotrophicum. A similar region is found in some Chlamydia trachomatis outer membrane proteins. In C. trachomatis, three cysteine-rich proteins (also believed to be lipoproteins), MOMP, OMP6 and OMP3, make up the extracellular matrix of the outer membrane []. They are essential to the structural integrity of both the elementary body (EB) and reticulate body (RB) phases. They are thought to be involved in porin formation and, as these bacteria lack the peptidoglycan layer common to most Gram-negative microbes, such proteins are highly important in the pathogenicity of the organism.; GO: 0005727 extrachromosomal circular DNA
Probab=30.05 E-value=1.4e+02 Score=18.80 Aligned_cols=30 Identities=10% Similarity=0.127 Sum_probs=23.5
Q ss_pred cCCCCCCeeeEEEEEeeccccccccEEEEc
Q 040068 36 TGKTVPNKQEWRLTLNNTCICTQLELKLSC 65 (123)
Q Consensus 36 tg~~~~G~p~~~VtI~N~C~C~~~~V~l~C 65 (123)
....++..-+|.++|+|.=.-+-.+|.|.-
T Consensus 35 ~~~~~Gd~v~ytitvtN~G~~~a~nv~v~D 64 (76)
T PF01345_consen 35 STANPGDTVTYTITVTNTGPAPATNVVVTD 64 (76)
T ss_pred CcccCCCEEEEEEEEEECCCCeeEeEEEEE
Confidence 344455677899999999988888888864
No 15
>PF09640 DUF2027: Domain of unknown function (DUF2027); InterPro: IPR018598 This protein domain is of unknown function, though it is putatively involved in DNA mismatch repair. It is associated with IPR002625 from INTERPRO. ; PDB: 2HUH_A.
Probab=29.71 E-value=30 Score=26.70 Aligned_cols=32 Identities=25% Similarity=0.333 Sum_probs=19.4
Q ss_pred cccCCCceEEeCCeeEEccCCcccCCCCeEEEEeecC
Q 040068 72 EPIDPSIIAISGDECTVVNNGNPFYGFTTLSFNYAWD 108 (123)
Q Consensus 72 ~~VdP~i~r~~~d~C~Lvn~G~pi~~~~~v~F~Yaw~ 108 (123)
+|+|.+-+.-..=.|||||| ++--+.|.|...
T Consensus 8 vP~d~k~l~~T~fE~YlVND-----SNYy~~y~y~~~ 39 (162)
T PF09640_consen 8 VPEDIKNLSTTRFECYLVND-----SNYYLHYTYLTA 39 (162)
T ss_dssp EES-TT-TTT--EEEEEEE------SSSEEEEEEEEE
T ss_pred cccCccccCCCceEEEEEec-----CccEEEEEEEec
Confidence 46676666544335889999 566799999743
No 16
>PF10633 NPCBM_assoc: NPCBM-associated, NEW3 domain of alpha-galactosidase; InterPro: IPR018905 This domain has been named NEW3, but its function is not known. It is found on proteins which are bacterial galactosidases [].; PDB: 1EUT_A 2BZD_A 1WCQ_C 2BER_A 1W8O_A 1EUU_A 1W8N_A.
Probab=27.77 E-value=87 Score=19.98 Aligned_cols=25 Identities=24% Similarity=0.171 Sum_probs=16.8
Q ss_pred CCeeeEEEEEeeccccccccEEEEc
Q 040068 41 PNKQEWRLTLNNTCICTQLELKLSC 65 (123)
Q Consensus 41 ~G~p~~~VtI~N~C~C~~~~V~l~C 65 (123)
+..-+++++|+|...-+..++.|+-
T Consensus 4 G~~~~~~~tv~N~g~~~~~~v~~~l 28 (78)
T PF10633_consen 4 GETVTVTLTVTNTGTAPLTNVSLSL 28 (78)
T ss_dssp TEEEEEEEEEE--SSS-BSS-EEEE
T ss_pred CCEEEEEEEEEECCCCceeeEEEEE
Confidence 3456799999999988888888875
No 17
>PF14345 GDYXXLXY: GDYXXLXY protein
Probab=27.50 E-value=46 Score=24.11 Aligned_cols=10 Identities=30% Similarity=0.949 Sum_probs=7.0
Q ss_pred ccccCC-CceE
Q 040068 71 VEPIDP-SIIA 80 (123)
Q Consensus 71 ~~~VdP-~i~r 80 (123)
.+|||| ++||
T Consensus 27 ~~PvDPRdllr 37 (144)
T PF14345_consen 27 TAPVDPRDLLR 37 (144)
T ss_pred ecccCcccccc
Confidence 368999 5665
No 18
>PF06682 DUF1183: Protein of unknown function (DUF1183); InterPro: IPR009567 This family consists of several eukaryotic proteins of around 360 residues in length. The function of this family is unknown.
Probab=24.23 E-value=1.2e+02 Score=25.74 Aligned_cols=23 Identities=26% Similarity=0.512 Sum_probs=19.2
Q ss_pred cccEEEEcCCcccccccCCCceEEe
Q 040068 58 QLELKLSCKGFQTVEPIDPSIIAIS 82 (123)
Q Consensus 58 ~~~V~l~C~gF~S~~~VdP~i~r~~ 82 (123)
...|.|.|.|+.+.+ ||=|||-+
T Consensus 93 lG~~~V~CEGY~~pd--DpyvLkGS 115 (318)
T PF06682_consen 93 LGSTDVSCEGYDYPD--DPYVLKGS 115 (318)
T ss_pred ecceEEeeecccCCC--CceecCCc
Confidence 456889999999965 99999955
No 19
>PF03293 Pox_RNA_pol: Poxvirus DNA-directed RNA polymerase, 18 kD subunit; InterPro: IPR004973 DNA-directed RNA polymerases 2.7.7.6 from EC (also known as DNA-dependent RNA polymerases) are responsible for the polymerisation of ribonucleotides into a sequence complementary to the template DNA. In eukaryotes, there are three different forms of DNA-directed RNA polymerases transcribing different sets of genes. Most RNA polymerases are multimeric enzymes and are composed of a variable number of subunits. The core RNA polymerase complex consists of five subunits (two alpha, one beta, one beta-prime and one omega) and is sufficient for transcription elongation and termination but is unable to initiate transcription. Transcription initiation from promoter elements requires a sixth, dissociable subunit called a sigma factor, which reversibly associates with the core RNA polymerase complex to form a holoenzyme []. The core RNA polymerase complex forms a "crab claw"-like structure with an internal channel running along the full length []. The key functional sites of the enzyme, as defined by mutational and cross-linking analysis, are located on the inner wall of this channel. RNA synthesis follows the attachment of RNA polymerase to a specific site, the promoter, on the template DNA strand. The RNA synthesis process continues until a termination sequence is reached. The RNA product, which is synthesised in the 5' to 3' direction, is known as the primary transcript. Eukaryotic nuclei contain three distinct types of RNA polymerases that differ in the RNA they synthesise: RNA polymerase I, located in the nucleoli, synthesises precursors of most ribosomal RNAs; RNA polymerase II, which occurs in the nucleoplasm, synthesises mRNA precursors; and RNA polymerase III, which also occurs in the nucleoplasm, synthesises the precursors of 5S ribosomal RNA, the tRNAs, and a variety of other small nuclear and cytosolic RNAs.
Eukaryotic cells are also known to contain separate mitochondrial and chloroplast RNA polymerases. Eukaryotic RNA polymerases, whose molecular masses range from 500 to 700 kDa, contain two non-identical large (>100 kDa) subunits and an array of up to 12 different small (less than 50 kDa) subunits. The Poxvirus DNA-directed RNA polymerase (2.7.7.6 from EC) catalyses DNA-template-directed extension of the 3'-end of an RNA strand by one nucleotide at a time. The enzyme consists of at least eight subunits; this is the 18 kDa subunit.; GO: 0003677 DNA binding, 0003899 DNA-directed RNA polymerase activity, 0019083 viral transcription
Probab=22.66 E-value=1.4e+02 Score=22.82 Aligned_cols=48 Identities=13% Similarity=0.282 Sum_probs=31.1
Q ss_pred ccccEEEEcCCcccccccCCCceEEe-CCeeEEccCCcccCCCCeEEEEe
Q 040068 57 TQLELKLSCKGFQTVEPIDPSIIAIS-GDECTVVNNGNPFYGFTTLSFNY 105 (123)
Q Consensus 57 ~~~~V~l~C~gF~S~~~VdP~i~r~~-~d~C~Lvn~G~pi~~~~~v~F~Y 105 (123)
..+||.+.|+..-=-..=|..-.... ..-| ++.||..-..|+.|+-.-
T Consensus 93 dESni~V~CgDLiCkl~rdsGtVSf~dsKYC-firNg~vY~ngs~Vsv~L 141 (160)
T PF03293_consen 93 DESNITVQCGDLICKLSRDSGTVSFNDSKYC-FIRNGVVYDNGSEVSVVL 141 (160)
T ss_pred ccCceEEEcCcEEEEeeccCCeEEecCceEE-EEECCEEecCCCEEEEEe
Confidence 36889999987543222233323322 2349 999999999999987654
No 20
>KOG4063 consensus Major epididymal secretory protein HE1 [Function unknown]
Probab=21.93 E-value=1.5e+02 Score=22.87 Aligned_cols=34 Identities=21% Similarity=0.312 Sum_probs=19.9
Q ss_pred hhHHHHHHHHHHHhhc-c----CccCCCCCceeEEEeec
Q 040068 3 AILKFLAAIMLFTIIT-K----GNCQCTLDDIKVSQSQT 36 (123)
Q Consensus 3 ~~~k~l~~~l~l~l~~-~----g~~~C~~sdi~V~Q~~t 36 (123)
+.+|.++++++|.+.+ | +..+|+.+|-.+.+.+.
T Consensus 4 s~~~~v~l~alls~a~aq~~~t~~k~C~ss~g~~~~V~i 42 (158)
T KOG4063|consen 4 SFLKTVILLALLSLAAAQAISTGVKQCGSSDGTPLEVKI 42 (158)
T ss_pred HHHHHHHHHHHHHHhhhcccCcccccccCCCCcceEEEe
Confidence 4455544444444443 3 23579988887777664
No 21
>PF00856 SET: SET domain; InterPro: IPR001214 The SET domain generally appears as one part of a larger multidomain protein. Three structures of very different proteins with distinct domain compositions have recently been described: Neurospora crassa DIM-5, a member of the Su(var) family of HKMTs that methylate histone H3 on lysine 9; human SET7 (also called SET9), which methylates H3 on lysine 4; and garden pea Rubisco LSMT, an enzyme that does not modify histones but instead methylates lysine 14 in the flexible tail of the large subunit of the enzyme Rubisco. The SET domain itself turned out to be an uncommon structure. Although in all three studies electron density maps revealed the location of the AdoMet or AdoHcy cofactor, the SET domain bears no similarity at all to the canonical, AdoMet-dependent methyltransferase fold. A tyrosine strictly conserved in the C-terminal motif of the SET domain could be involved in abstracting a proton from the protonated amino group of the substrate lysine, promoting its nucleophilic attack on the sulphonium methyl group of the AdoMet cofactor. In contrast to the classical AdoMet-dependent protein methyltransferases, which tend to bind their polypeptide substrates on top of the cofactor, the Rubisco LSMT structure shows that AdoMet seems to bind in a separate cleft, suggesting how a polypeptide substrate could be subjected to multiple rounds of methylation without having to be released from the enzyme. In contrast, SET7/9 is able to add only a single methyl group to its substrate. It has been demonstrated that association of SET domain and myotubularin-related proteins modulates growth control []. The SET domain-containing Drosophila melanogaster (Fruit fly) protein, enhancer of zeste, has a function in segment determination, and the mammalian homologue may be involved in the regulation of gene transcription and chromatin structure.
Histone lysine methylation is part of the histone code that regulates chromatin function and epigenetic control of gene function. Histone lysine methyltransferases (HMTases) differ both in their substrate specificity for the various acceptor lysines and in their product specificity for the number of methyl groups (one, two, or three) they transfer. With just one exception [], the HMTases belong to the SET family, which can be classified according to the sequences surrounding the SET domain [, ]. Structural studies on the human SET7/9, a mono-methylase, have revealed the molecular basis for the specificity of the enzyme for the histone target and the roles of the invariant residues in the SET domain in determining the methylation specificities []. The pre-SET domain, as found in the SUV39 SET family, contains nine invariant cysteine residues that are grouped into two segments separated by a region of variable length. These nine cysteines coordinate three zinc ions to form a triangular cluster, in which each zinc ion is coordinated by four cysteines in a tetrahedral configuration. The function of this domain is structural, holding together two long segments of random coil. The C-terminal region including the post-SET domain is disordered when not interacting with a histone tail and in the absence of zinc. The three conserved cysteines in the post-SET domain form a zinc-binding site when coupled to a fourth conserved cysteine in the knot-like structure close to the SET domain active site []. The structured post-SET region brings in the C-terminal residues that participate in S-adenosylmethionine binding and histone tail interactions. The three conserved cysteine residues are essential for HMTase activity, as replacement with serine abolishes HMTase activity [], []. ; GO: 0005515 protein binding; PDB: 3TG5_A 3S7F_A 3RIB_B 3TG4_A 3S7J_A 3S7D_A 3S7B_A 3H6L_A 3SMT_A 3K5K_A ....
Probab=21.78 E-value=89 Score=20.88 Aligned_cols=22 Identities=18% Similarity=0.258 Sum_probs=17.1
Q ss_pred CeeEEccCCcccCCCCeEEEEe
Q 040068 84 DECTVVNNGNPFYGFTTLSFNY 105 (123)
Q Consensus 84 d~C~Lvn~G~pi~~~~~v~F~Y 105 (123)
+.++.+.-.++|.+|+.|...|
T Consensus 140 ~~~~~~~a~r~I~~GeEi~isY 161 (162)
T PF00856_consen 140 GGCLVVRATRDIKKGEEIFISY 161 (162)
T ss_dssp TTEEEEEESS-B-TTSBEEEES
T ss_pred cceEEEEECCccCCCCEEEEEE
Confidence 4555888899999999999998
No 22
>TIGR01451 B_ant_repeat conserved repeat domain. This model represents the conserved region of about 53 amino acids shared between regions, usually repeated, of proteins from a small number of phylogenetically distant prokaryotes. Examples include a 132-residue region found repeated in three of the five longest proteins of Bacillus anthracis, a 131-residue repeat in a cell wall-anchored protein of Enterococcus faecalis, and a 120-residue repeat in Methanobacterium thermoautotrophicum. A similar region is found in some Chlamydial outer membrane proteins.
Probab=21.20 E-value=2e+02 Score=17.38 Aligned_cols=25 Identities=8% Similarity=0.211 Sum_probs=18.8
Q ss_pred CCCeeeEEEEEeeccccccccEEEE
Q 040068 40 VPNKQEWRLTLNNTCICTQLELKLS 64 (123)
Q Consensus 40 ~~G~p~~~VtI~N~C~C~~~~V~l~ 64 (123)
++-.-+|+++|.|+-.=+..+|.|.
T Consensus 10 ~Gd~v~Yti~v~N~g~~~a~~v~v~ 34 (53)
T TIGR01451 10 IGDTITYTITVTNNGNVPATNVVVT 34 (53)
T ss_pred CCCEEEEEEEEEECCCCceEeEEEE
Confidence 4557789999999876666666654
No 23
>PRK15301 hypothetical protein; Provisional
Probab=20.41 E-value=2.8e+02 Score=21.89 Aligned_cols=12 Identities=17% Similarity=0.429 Sum_probs=6.7
Q ss_pred eeEEEEEeeccc
Q 040068 44 QEWRLTLNNTCI 55 (123)
Q Consensus 44 p~~~VtI~N~C~ 55 (123)
+|=+|+|.=.|+
T Consensus 56 ~~R~v~vsV~Cp 67 (186)
T PRK15301 56 PEREVNVSVSCP 67 (186)
T ss_pred cceeEEEEEECC
Confidence 445566655554
Done!