Query         T0541 MvR254A, , 106 residues
Match_columns 106
No_of_seqs    115 out of 562
Neff          8.1
Searched_HMMs 11830
Date          Fri May 21 18:05:13 2010
Command       /home/syshi_2/2008/ferredoxin/manualcheck/update/HHsearch/bin/hhsearch -i /home/syshi_3/CASP9/HHsearch4Targetseq/pfamAsearch/hhm/T0541.hhm -d /home/syshi_2/2008/ferredoxin/manualcheck/update/HHsearch/database/pfamA_24_hhmdb -o /home/syshi_3/CASP9/HHsearch4Targetseq/pfamAsearch/hhm/T0541.hhr

 No Hit                            Prob E-value P-value  Score    SS Cols Query HMM Template HMM
  1 PF07705 CARDB: CARDB; InterP   99.9 4.3E-23 3.6E-27  137.3  12.3   99    3-102     2-101 (101)
  2 PF10633 NPCBM_assoc: NPCBM-as  98.0 2.9E-06 2.5E-10   49.3   5.5   71   15-85      1-76  (78)
  3 PF05753 TRAP_beta: Translocon  97.2   0.001 8.4E-08   36.0   8.2   77    5-84     23-109 (181)
  4 PF09624 DUF2393: Protein of u  95.9   0.019 1.6E-06   29.3   8.1   79    3-85     32-114 (119)
  5 PF01345 DUF11: Domain of unkn  95.4   0.012   1E-06   30.4   5.3   43    2-44     23-66  (76)
  6 PF00927 Transglut_C: Transglu  95.3   0.042 3.6E-06   27.5   7.8   89    3-95      1-98  (106)
  7 PF01835 A2M_N: MG2 domain; I   95.3   0.056 4.8E-06   26.9   8.6   75   13-87      9-87  (98)
  8 PF01186 Lysyl_oxidase: Lysyl   94.6   0.015 1.3E-06   29.8   4.1   50   55-104   133-188 (205)
  9 PF11797 DUF3324: Protein of u  92.8    0.19 1.6E-05   24.1   9.5   86    2-89     27-119 (140)
 10 PF07760 DUF1616: Protein of u  87.5    0.58 4.9E-05   21.6   8.2   75   11-87    188-265 (279)
 11 PF11906 DUF3426: Protein of u  85.8    0.73 6.2E-05   21.1  10.2   79    4-82     52-148 (149)
 12 PF03422 CBM_6: Carbohydrate b  83.9     0.9 7.6E-05   20.6   8.4   65   19-88     45-112 (125)
 13 PF02102 Peptidase_M35: Deuter  83.7   0.087 7.4E-06   25.9   0.0   67   17-83     36-131 (359)
 14 PF08441 Integrin_alpha2: Inte  81.7     1.1 9.4E-05   20.1   9.6  102    2-104   167-306 (458)
 15 PF06832 BiPBP_C: Penicillin-B  76.7     1.6 0.00014   19.2   7.6   42   39-91     44-85  (89)
 16 PF04744 Monooxygenase_B: Mone  76.5     1.3 0.00011   19.8   4.0   55   15-69    259-333 (381)
 17 PF06159 DUF974: Protein of un  75.3     1.8 0.00015   19.0   7.5   76   13-89      8-92  (242)
 18 PF05506 DUF756: Domain of unk  73.4       2 0.00017   18.8   7.4   61   22-87     21-81  (89)
 19 PF00553 CBM_2: Cellulose bind  70.1     2.4  0.0002   18.3   5.5   54   18-71     12-84  (101)
 20 PF07610 DUF1573: Protein of u  60.9     3.7 0.00031   17.4   4.2   44   25-69      2-45  (45)
 21 PF06586 TraK: TraK protein;    59.2     1.9 0.00016   18.9   2.0   24   65-89    102-125 (234)
 22 PF02221 E1_DerP2_DerF2: ML do  57.0     4.4 0.00037   17.0   9.3   83    6-88     19-121 (133)
 23 PF00207 A2M: Alpha-2-macroglo  53.9       5 0.00042   16.7   4.4   19   12-30     63-81  (92)
 24 PF10989 DUF2808: Protein of u  51.1     5.5 0.00047   16.5   8.6   51   39-89     81-132 (146)
 25 PF02014 Reeler: Reeler domain  46.6     6.5 0.00055   16.1   7.3   74   11-86     26-124 (132)
 26 PF11611 TRF2: Telomeric repea  46.3     6.6 0.00056   16.0   5.2   66   18-83     35-113 (123)
 27 PF07919 DUF1683: Protein of u  45.2     6.9 0.00058   15.9   8.2   70    5-79     39-114 (125)
 28 PF06280 DUF1034: Fn3-like dom  44.6     7.1  0.0006   15.9   6.9   55   18-73      7-82  (112)
 29 PF01917 Arch_flagellin: Archa  41.9     7.8 0.00066   15.7  11.2   43    4-46     50-95  (191)
 30 PF05688 DUF824: Salmonella re  33.5      11 0.00092   14.9   4.2   35   11-45      5-39  (47)
 31 PF03170 BcsB: Bacterial cellu  30.9      12   0.001   14.7   9.2   65   20-85     45-110 (607)
 32 PF06483 ChiC: Chitinase C; I   29.6      13  0.0011   14.6   7.9   30   55-85    126-155 (180)
 33 PF10342 Drmip_Hesp: Developme  26.7      14  0.0012   14.3   7.2   75   13-88      9-86  (97)
 34 PF00801 PKD: PKD domain; Int   25.9      15  0.0013   14.2   5.1   58   11-85      4-61  (69)
 35 PF12389 Peptidase_M73: Camely  24.8      16  0.0013   14.1   6.5   29   12-40     58-86  (199)
 36 PF07679 I-set: Immunoglobulin  23.9      16  0.0014   14.0   3.9   70   12-86      9-79  (90)
 37 PF06030 DUF916: Bacterial pro  23.4      17  0.0014   13.9   6.1   62   14-78     22-113 (122)
 38 PF07732 Cu-oxidase_3: Multico  22.3      18  0.0015   13.8   3.5   66   19-86     33-102 (119)
 39 PF00345 Pili_assembly_N: Gram  22.3      18  0.0015   13.8   7.2   86   18-104    13-108 (122)
 40 PF11896 DUF3416: Domain of un  21.0      19  0.0016   13.7   8.0   84    1-88      1-94  (187)
 41 PF09118 DUF1929: Domain of un  20.9      19  0.0016   13.7   4.0   70   11-85      7-86  (97)

No 1
>PF07705 CARDB: CARDB; InterPro: IPR011635 The APHP (acidic peptide-dependent hydrolases/peptidase) domain is found in a variety of different proteins.
Probab=99.89 E-value=4.3e-23 Score=137.32 Aligned_cols=99 Identities=41% Similarity=0.667 Sum_probs=93.8 Q ss_pred CCEEEE-EEECCCCCCCCEEEEEEEEEECCCCCCCCEEEEEEECCEEEEEEECCCCCCCCEEEEEEEEEECCCCCCEEEE Q ss_conf 860786-6025878899717999888855797568638999839937662464773899648898864217778964899 Q T0541 3 PDLVPV-SLTPVTVVPNTVNTMTATIENQGNKDSTSFNVSLLVDGIVVDTQTVTSLESENSTNVDFHWTLDGTANSYTLT 81 (106) Q Consensus 3 PDL~v~-~i~p~~~~~g~~~tv~vtV~N~G~~~a~~~~v~~y~~g~~~~~~~v~~L~~G~s~tv~~~~~~~~~~G~~ti~ 81 (106) |||.|. .+.|..+.+|+.++++++|+|+|..++.++.++||++|..+.+..++.|++|++++++|+|.++ ..|.|++. T Consensus 2 pDL~I~~~~~~~~~~~g~~~~i~~~V~N~G~~~a~~~~v~~~~~~~~~~~~~i~~L~~g~~~~v~~~~~~~-~~G~~~l~ 80 (101) T PF07705_consen 2 PDLTISISVSPSSPTPGESVTITVTVKNQGTADAENVTVSFYLDGDLVSTVTIPSLAPGESATVTFTWTPP-TSGNYTLT 80 (101) T ss_pred CCEEEEEECCCCCCCCCCEEEEEEEEEECCCCCCCCEEEEEEECCCCCCCEEECCCCCCCEEEEEEEEEEC-CCCEEEEE T ss_conf 97899960178855689889999999977877666589999989982067794448899689999998717-89819999 Q ss_pred EEECCCCCEEECCCCCCEEEE Q ss_conf 997599958412178855888 Q T0541 82 VNVDPENAVNEGNESNNTLTA 102 (106) Q Consensus 82 v~vD~~n~v~E~~e~NN~~t~ 102 (106) +++|++|.+.|++|+||.+|+ T Consensus 81 ~~iD~~n~v~E~~e~NN~~s~ 101 (101) T PF07705_consen 81 AVIDPDNSVSESNEDNNSFSR 101 (101) T ss_pred EEEECCCCEEECCCCCCCCCC T ss_conf 999599958323467742109 No 2 >PF10633 NPCBM_assoc: NPCBM-associated, NEW3 domain of alpha-galactosidase; PDB: 1w8n_A 1eut_A 1euu_A 2bzd_A 1wcq_B 2ber_A 1w8o_A. 
Probab=98.04 E-value=2.9e-06 Score=49.27 Aligned_cols=71 Identities=30% Similarity=0.396 Sum_probs=55.2 Q ss_pred CCCCCEEEEEEEEEECCCCCCCCEEEEEEECCEE---EEEEECCCCCCCCEEEEEEEEEEC--CCCCCEEEEEEEC Q ss_conf 8899717999888855797568638999839937---662464773899648898864217--7789648999975 Q T0541 15 VVPNTVNTMTATIENQGNKDSTSFNVSLLVDGIV---VDTQTVTSLESENSTNVDFHWTLD--GTANSYTLTVNVD 85 (106) Q Consensus 15 ~~~g~~~tv~vtV~N~G~~~a~~~~v~~y~~g~~---~~~~~v~~L~~G~s~tv~~~~~~~--~~~G~~ti~v~vD 85 (106) ..+|+.++++++|+|.|.....+..+++....-+ .....+..|+||++.+++|...++ +.+|.|.|.+.+- T Consensus 1 v~~G~~~~~~vtv~N~g~~~~~~v~~~l~~P~GW~~~~~~~~~~~l~~G~s~~~t~~V~vp~~a~~G~y~v~v~a~ 76 (78) T PF10633_consen 1 VTPGETVTVTVTVTNTGSAPATNVTLSLELPEGWSVSVSPASVPSLAPGESATVTFTVTVPANAAPGTYPVTVTAR 76 (78) T ss_dssp -----EEEE--EEE------BSS-EEE----TTSE---EE-----B----EE---EEEE---------EEE-EEEE T ss_pred CCCCCEEEEEEEEEECCCCCEEEEEEEEECCCCCCCCCCCCCCCCCCCCCEEEEEEEEECCCCCCCCEEEEEEEEE T ss_conf 9799889999999979987122179999799993103586644208989989999999889998995489999999 No 3 >PF05753 TRAP_beta: Translocon-associated protein beta (TRAPB); InterPro: IPR008856 This family consists of several eukaryotic translocon-associated protein beta (TRAPB) or signal sequence receptor beta subunit (SSR-beta) proteins. The normal translocation of nascent polypeptides into the lumen of the endoplasmic reticulum (ER) is thought to be aided in part by a translocon-associated protein (TRAP) complex consisting of 4 protein subunits. The association of mature proteins with the ER and Golgi, or other intracellular locales, such as lysosomes, depends on the initial targeting of the nascent polypeptide to the ER membrane. 
A similar scenario must also exist for proteins destined for secretion .; GO: 0005783 endoplasmic reticulum, 0016021 integral to membrane Probab=97.16 E-value=0.001 Score=36.03 Aligned_cols=77 Identities=16% Similarity=0.135 Sum_probs=54.3 Q ss_pred EEEE-EEECCCCCCCCEEEEEEEEEECCCCCCCCEEEEEEECCEE---------EEEEECCCCCCCCEEEEEEEEEECCC Q ss_conf 0786-6025878899717999888855797568638999839937---------66246477389964889886421777 Q T0541 5 LVPV-SLTPVTVVPNTVNTMTATIENQGNKDSTSFNVSLLVDGIV---------VDTQTVTSLESENSTNVDFHWTLDGT 74 (106) Q Consensus 5 L~v~-~i~p~~~~~g~~~tv~vtV~N~G~~~a~~~~v~~y~~g~~---------~~~~~v~~L~~G~s~tv~~~~~~~~~ 74 (106) |.+. .+...-...|+.+++.++|.|+|..+|.+ |.+..|+-. ..+.....|+||+..+..+...|. . T Consensus 23 l~v~K~il~~~~v~g~~v~V~~~iyN~G~s~A~d--V~i~D~~~p~~~F~lvsG~~s~~~~~l~pgs~vsh~~vl~p~-~ 99 (181) T PF05753_consen 23 LLVSKSILNRYLVEGEDVTVSYTIYNVGSSPAYD--VSITDDSFPPDDFELVSGSLSASWERLPPGSNVSHSYVLRPK-K 99 (181) T ss_pred EEEEEEECCCCCCCCCEEEEEEEEEECCCCCEEE--EEEECCCCCCCCEEEECCCEEEEEEEECCCCEEEEEEEEEEE-E T ss_conf 9999622444454785799999999779871688--899789999443099748425689985899737899999985-3 Q ss_pred CCCEEEEEEE Q ss_conf 8964899997 Q T0541 75 ANSYTLTVNV 84 (106) Q Consensus 75 ~G~~ti~v~v 84 (106) .|.|.+.... T Consensus 100 ~G~f~~~~A~ 109 (181) T PF05753_consen 100 SGYFNFSPAE 109 (181) T ss_pred EEEEECCEEE T ss_conf 6689734099 No 4 >PF09624 DUF2393: Protein of unknown function (DUF2393) Probab=95.95 E-value=0.019 Score=29.35 Aligned_cols=79 Identities=24% Similarity=0.223 Sum_probs=55.4 Q ss_pred CCEEEEEEECCCCCCCCEEEEEEEEEECCCCCCCCEEEE--EEECCEEEE--EEECCCCCCCCEEEEEEEEEECCCCCCE Q ss_conf 860786602587889971799988885579756863899--983993766--2464773899648898864217778964 Q T0541 3 PDLVPVSLTPVTVVPNTVNTMTATIENQGNKDSTSFNVS--LLVDGIVVD--TQTVTSLESENSTNVDFHWTLDGTANSY 78 (106) Q Consensus 3 PDL~v~~i~p~~~~~g~~~tv~vtV~N~G~~~a~~~~v~--~y~~g~~~~--~~~v~~L~~G~s~tv~~~~~~~~~~G~~ 78 (106) |+|.+..-. .-..++.+-+-++|+|.|...++.+.|. |+.++.... ...++-|+.++.+.-.|-+..+-.. 
+ T Consensus 32 p~l~v~~~~--i~~~~gqyyVpF~V~N~g~~TAasV~V~geL~~~~~~~e~~e~tiDfl~g~e~~~G~~IF~~dP~~--g 107 (119) T PF09624_consen 32 PILSVSVAQ--IRQVEGQYYVPFTVTNDGGQTAASVQVIGELRQGGGVVEEGEQTIDFLPGGEEAKGAFIFTHDPRD--G 107 (119) T ss_pred CEEEEEEHH--EEEECCEEEEEEEEEECCCCEEEEEEEEEEECCCCCCEEEEEEEEEECCCCCEEEEEEEECCCCCC--C T ss_conf 639997237--699777689999999788775778999999941897047305899974799747689997269445--8 Q ss_pred EEEEEEC Q ss_conf 8999975 Q T0541 79 TLTVNVD 85 (106) Q Consensus 79 ti~v~vD 85 (106) .+++.|- T Consensus 108 ~L~irv~ 114 (119) T PF09624_consen 108 ELRIRVA 114 (119) T ss_pred EEEEEEE T ss_conf 6999998 No 5 >PF01345 DUF11: Domain of unknown function DUF11; InterPro: IPR001434 This group of sequences is represented by a conserved region of about 53 amino acids shared between regions, usually repeated, of proteins from a small number of phylogenetically distant prokaryotes. Examples include a 132-residue region found repeated in three of the five longest proteins of Bacillus anthracis, a 131-residue repeat in a cell wall-anchored protein of Enterococcus faecalis, and a 120-residue repeat in Methanobacterium thermoautotrophicum. A similar region is found in some Chlamydia trachomatis outer membrane proteins. In C. trachomatis, three cysteine-rich proteins (also believed to be lipoproteins), MOMP, OMP6 and OMP3, make up the extracellular matrix of the outer membrane . They are involved in the essential structural integrity of both the elementary body (EB) and recticulate body (RB) phase. 
They are thought to be involved in porin formation and, as these bacteria lack the peptidoglycan layer common to most Gram-negative microbes, such proteins are highly important in the pathogenicity of the organism.; GO: 0005727 extrachromosomal circular DNA Probab=95.36 E-value=0.012 Score=30.37 Aligned_cols=43 Identities=28% Similarity=0.306 Sum_probs=35.1 Q ss_pred CCCEEEEEE-ECCCCCCCCEEEEEEEEEECCCCCCCCEEEEEEE Q ss_conf 986078660-2587889971799988885579756863899983 Q T0541 2 IPDLVPVSL-TPVTVVPNTVNTMTATIENQGNKDSTSFNVSLLV 44 (106) Q Consensus 2 lPDL~v~~i-~p~~~~~g~~~tv~vtV~N~G~~~a~~~~v~~y~ 44 (106) -+||.+... .+..+.+|+.++++++|+|.|...+.+..+.-.+ T Consensus 23 ~~~l~~~kt~~~~~~~~Gd~v~YtitvtN~G~~~a~nV~v~D~l 66 (76) T PF01345_consen 23 SADLSVTKTADPATANPGDTVTYTITVTNTGPSPATNVVVTDTL 66 (76) T ss_pred CCCEEEEEEECCCCCCCCCEEEEEEEEEECCCCCEEEEEEEECC T ss_conf 88679998508881489998999999998899806858999769 No 6 >PF00927 Transglut_C: Transglutaminase family, C-terminal ig like domain; InterPro: IPR008958 Synonym(s): Protein-glutamine gamma-glutamyltransferase, Fibrinoligase, TGase Transglutaminases catalyse the post-translational modification of proteins at glutamine residues, with formation of isopeptide bonds. Members of the transglutaminase family usually have three domains: N-terminal (IPR001102 from INTERPRO), middle (IPR013808 from INTERPRO) and C-terminal. The middle domain is usually well conserved, but family members can display major differences in their N- and C-terminal domains, although their overall structure is conserved . This entry represents the C-terminal domain found in transglutaminases, which consists of an immunoglobulin-like beta-sandwich consisting of seven strands in two sheets with a Greek key topology. The best known transglutaminase is blood coagulation factor XIII, a plasma tetrameric protein composed of two catalytic A subunits and two non-catalytic B subunits. Factor XIII is responsible for cross-linking fibrin chains, thus stabilizing the fibrin clot. 
Protein-glutamine gamma-glutamyltransferases (2.3.2.13 from EC) are calcium-dependent enzymes that catalyse the cross-linking of proteins by promoting the formation of isopeptide bonds between the gamma-carboxyl group of a glutamine in one polypeptide chain and the epsilon-amino group of a lysine in a second polypeptide chain. TGases also catalyse the conjugation of polyamines to proteins , .; GO: 0003810 protein-glutamine gamma-glutamyltransferase activity, 0018149 peptide cross-linking; PDB: 1g0d_A 2q3z_A 1kv3_F 1l9m_B 1rle_A 1nuf_A 1l9n_A 1nud_B 1nug_A 1sgx_A .... Probab=95.26 E-value=0.042 Score=27.52 Aligned_cols=89 Identities=16% Similarity=0.160 Sum_probs=59.5 Q ss_pred CCEEEEEEECCCCCCCCEEEEEEEEEECCCCCCCCEEEEE-----EECCEEE----EEEECCCCCCCCEEEEEEEEEECC Q ss_conf 8607866025878899717999888855797568638999-----8399376----624647738996488988642177 Q T0541 3 PDLVPVSLTPVTVVPNTVNTMTATIENQGNKDSTSFNVSL-----LVDGIVV----DTQTVTSLESENSTNVDFHWTLDG 73 (106) Q Consensus 3 PDL~v~~i~p~~~~~g~~~tv~vtV~N~G~~~a~~~~v~~-----y~~g~~~----~~~~v~~L~~G~s~tv~~~~~~~~ 73 (106) |||.+.-. ..+..|+.+.++++++|.-.....+..+.+ +-.|... .......|+||++.++++...+. T Consensus 1 p~l~i~~~--~~~~vG~~~~v~v~~~N~~~~~l~~v~~~~~~~~v~y~G~~~~~~~~~~~~~~l~P~~~~~~~~~i~p~- 77 (106) T PF00927_consen 1 PDLKIKVL--GEAVVGQDFDVTVSFTNPLSEPLRNVTLHLCAFTVEYTGLLRGEVPKKKFEVTLPPGETVTVTVTITPS- 77 (106) T ss_dssp EEEEEEEE--SEEBTTS-EEEEEEEEE-SSS-BEEEEEEEEEEEEETT-BEEECEEEEEEEEEE-TTEEEEEEEEE-HH- T ss_pred CEEEEEEC--CCCCCCCCEEEEEEEECCCCCCCCCCCCCEEEEEEEECCCEEEEEECEECCEEECCCCEEEEEEEEEEC- T ss_conf 93999969--986479989999999889955042453211578997178120110000032578999989999999707- Q ss_pred CCCCEEEEEEECCCCCEEECCC Q ss_conf 7896489999759995841217 Q T0541 74 TANSYTLTVNVDPENAVNEGNE 95 (106) Q Consensus 74 ~~G~~ti~v~vD~~n~v~E~~e 95 (106) ..|.+.+.+..+.. 
.+....+ T Consensus 78 ~yG~~~~l~~~~~~-~l~~V~~ 98 (106) T PF00927_consen 78 KYGPRKLLVDFNSD-QLADVKG 98 (106) T ss_dssp HH--EEEEEEEEES-SEEEECC T ss_pred CCCCEEEEEEEEEH-HHCCCCC T ss_conf 45552488998512-6245026 No 7 >PF01835 A2M_N: MG2 domain; InterPro: IPR002890 The proteinase-binding alpha-macroglobulins (A2M) are large glycoproteins found in the plasma of vertebrates, in the hemolymph of some invertebrates and in reptilian and avian egg white. A2M-like proteins are able to inhibit all four classes of proteinases by a 'trapping' mechanism. They have a peptide stretch, called the 'bait region', which contains specific cleavage sites for different proteinases. When a proteinase cleaves the bait region, a conformational change is induced in the protein, thus trapping the proteinase. The entrapped enzyme remains active against low molecular weight substrates, whilst its activity toward larger substrates is greatly reduced, due to steric hindrance. Following cleavage in the bait region, a thiol ester bond, formed between the side chains of a cysteine and a glutamine, is cleaved and mediates the covalent binding of the A2M-like protein to the proteinase. This family includes the N-terminal region of the alpha-2-macroglobulin family. The inhibitor domains belong to MEROPS inhibitor family I39.; GO: 0004866 endopeptidase inhibitor activity; PDB: 2wii_A 3g6j_C 2win_E 2a74_A 2icf_A 2hr0_A 2a73_A 2ice_D 2qki_D 2i07_A .... Probab=95.25 E-value=0.056 Score=26.87 Aligned_cols=75 Identities=17% Similarity=0.164 Sum_probs=53.2 Q ss_pred CCCCCCCEEEEEEEEEECC-CCCCCCEEEEEEE---CCEEEEEEECCCCCCCCEEEEEEEEEECCCCCCEEEEEEECCC Q ss_conf 8788997179998888557-9756863899983---9937662464773899648898864217778964899997599 Q T0541 13 VTVVPNTVNTMTATIENQG-NKDSTSFNVSLLV---DGIVVDTQTVTSLESENSTNVDFHWTLDGTANSYTLTVNVDPE 87 (106) Q Consensus 13 ~~~~~g~~~tv~vtV~N~G-~~~a~~~~v~~y~---~g~~~~~~~v~~L~~G~s~tv~~~~~~~~~~G~~ti~v~vD~~ 87 (106) +.-.||+.+.+++-+++.. .....+..+.+.+ +|..+.................|.++..+..|.|++.+..+.. 
T Consensus 9 ~iYrPGetV~~~~~~~~~~~~~~~~~~~v~v~l~dp~G~~v~~~~~~~~~~~G~~~~~~~lp~~~~~G~y~l~~~~~~~ 87 (98) T PF01835_consen 9 PIYRPGETVHFKGIVRDQDNFRPPPGTPVTVELEDPNGNEVFRWTVSVTDDFGFFSGSFPLPDDAPTGTYTLEAYTDDA 87 (98) T ss_dssp SEE-TT-EEEEEEEEEEEETTSEESSEEEEEEEE-TT--EEEEEEEECTTT--TEEEEEE--SS-----EEEEEEETTE T ss_pred CCCCCCCEEEEEEEEECCCCCCCCCCCCEEEEEECCCCCEEEEEEEECCCCCCEEEEEEECCCCCCCEEEEEEEEECCC T ss_conf 7547999999999991464556788975899999699999999993001899889999899998886617999998568 No 8 >PF01186 Lysyl_oxidase: Lysyl oxidase ; InterPro: IPR001695 Lysyl oxidase (1.4.3.13 from EC) (LOX) is an extracellular copper-dependent enzyme that catalyzes the oxidative deamination of peptidyl lysine residues in precursors of various collagens and elastins, yielding alpha-aminoadipic-delta-semialdehyde. The deaminated lysines are then able to form semialdehyde cross-links, resulting in the formation of insoluble collagen and elastin fibres in the extracellular matrix . The active site of LOX resides towards the C terminus: this region also binds a single copper atom in an octahedral coordination complex involving at least 3 His residues . Four histidine residues are clustered in a central region of the enzyme. This region is thought to be involved in cooper-binding and is called the 'copper-talon' .; GO: 0004720 protein-lysine 6-oxidase activity, 0005507 copper ion binding Probab=94.59 E-value=0.015 Score=29.83 Aligned_cols=50 Identities=26% Similarity=0.370 Sum_probs=36.7 Q ss_pred CCCCCCCEEEEE----EEEE--ECCCCCCEEEEEEECCCCCEEECCCCCCEEEEEE Q ss_conf 773899648898----8642--1777896489999759995841217885588884 Q T0541 55 TSLESENSTNVD----FHWT--LDGTANSYTLTVNVDPENAVNEGNESNNTLTALV 104 (106) Q Consensus 55 ~~L~~G~s~tv~----~~~~--~~~~~G~~ti~v~vD~~n~v~E~~e~NN~~t~~v 104 (106) .+|.+|-.-+.. -.|. 
.+.++|+|+|.|.++|...+.|.+-+||++.--+ T Consensus 133 Qgis~Gc~DtY~~~idCQwiDITdvp~G~Y~lqV~vNP~~~v~E~~~~NN~~~c~~ 188 (205) T PF01186_consen 133 QGISPGCWDTYRHDIDCQWIDITDVPPGTYILQVTVNPEYRVAESDFDNNVARCDV 188 (205) T ss_pred CCCCCCCCCCCCCCCCCCEEEECCCCCCCEEEEEEECCCCCCCCCCCCCCEEEEEE T ss_conf 62068941121688884058812789976899998083303231033487699988 No 9 >PF11797 DUF3324: Protein of unknown function C-terminal (DUF3324) Probab=92.83 E-value=0.19 Score=24.13 Aligned_cols=86 Identities=21% Similarity=0.193 Sum_probs=60.9 Q ss_pred CCCEEEEEEECCCCCCCCEEEEEEEEEECCCCCCCCEEEEEEE--CC--EEEEEEE--CCCCCCCCEEEEEEEEEE-CCC Q ss_conf 9860786602587889971799988885579756863899983--99--3766246--477389964889886421-777 Q T0541 2 IPDLVPVSLTPVTVVPNTVNTMTATIENQGNKDSTSFNVSLLV--DG--IVVDTQT--VTSLESENSTNVDFHWTL-DGT 74 (106) Q Consensus 2 lPDL~v~~i~p~~~~~g~~~tv~vtV~N~G~~~a~~~~v~~y~--~g--~~~~~~~--v~~L~~G~s~tv~~~~~~-~~~ 74 (106) .|+|....+.|.... ....|.+.++|..........+...+ .| ..+.... --.++|...-.+.+.|.. ... T Consensus 27 ~p~LkL~~v~~~~~n--~~~~V~a~l~N~~~~~~~~~~~~a~V~~~~~~k~l~~~~~~~~~~APNS~f~~~i~~~~~~lk 104 (140) T PF11797_consen 27 PPDLKLNKVKPGQIN--GRNVVQANLQNPQPAILKQVTVDAKVTKKGSKKVLYTYKKENMSMAPNSNFNFPISLGGKRLK 104 (140) T ss_pred CCCCEEEEEEEEEEC--CEEEEEEEEECCCCHHCCCCEEEEEEEECCCCEEEEEEECCCCEECCCCEEEEEECCCCCCCC T ss_conf 865276022676798--917999999989851205638999999899975999970258668878678867457987205 Q ss_pred CCCEEEEEEECCCCC Q ss_conf 896489999759995 Q T0541 75 ANSYTLTVNVDPENA 89 (106) Q Consensus 75 ~G~~ti~v~vD~~n~ 89 (106) +|.|++.+.+-.... T Consensus 105 pG~Y~l~~~~~~~~~ 119 (140) T PF11797_consen 105 PGKYTLKVTAKSGKQ 119 (140) T ss_pred CCEEEEEEEEECCCE T ss_conf 958899999978943 No 10 >PF07760 DUF1616: Protein of unknown function (DUF1616); InterPro: IPR011674 This is a group of sequences from hypothetical archaeal proteins. The region in question is approximately 330 amino acid residues long. 
Probab=87.53 E-value=0.58 Score=21.55 Aligned_cols=75 Identities=12% Similarity=0.224 Sum_probs=53.7 Q ss_pred ECCCCCCCCEEEEEEEEEECCCCCCCCEEEEEEECCEEEE---EEECCCCCCCCEEEEEEEEEECCCCCCEEEEEEECCC Q ss_conf 2587889971799988885579756863899983993766---2464773899648898864217778964899997599 Q T0541 11 TPVTVVPNTVNTMTATIENQGNKDSTSFNVSLLVDGIVVD---TQTVTSLESENSTNVDFHWTLDGTANSYTLTVNVDPE 87 (106) Q Consensus 11 ~p~~~~~g~~~tv~vtV~N~G~~~a~~~~v~~y~~g~~~~---~~~v~~L~~G~s~tv~~~~~~~~~~G~~ti~v~vD~~ 87 (106) -|.....|++.++.+.|.|+...... .++..++++.... +..+ .|+.|++.+..+++.++.++.+..+.+..=.+ T Consensus 188 YPt~~~~Ge~~~v~VgI~NhE~~~v~-Ytv~v~l~~~~~~~~~~~~i-~L~~~et~e~~~~~t~~~~G~~~~Le~lLy~~ 265 (279) T PF07760_consen 188 YPTEFTLGESGTVIVGIVNHEGRPVN-YTVEVWLQNVTLSPLNTTRI-TLADNETWEQPYTFTPPEPGDNQRLEFLLYKG 265 (279) T ss_pred CCEEEECCCCEEEEEEEECCCCCCEE-EEEEEEEECCCCCCCCCEEE-EECCCCEEEEEEEEECCCCCCCEEEEEEEECC T ss_conf 97148559967999999868899524-89999994222576550479-96799779998998417899846999999879 No 11 >PF11906 DUF3426: Protein of unknown function (DUF3426) Probab=85.84 E-value=0.73 Score=21.05 Aligned_cols=79 Identities=15% Similarity=0.151 Sum_probs=51.5 Q ss_pred CEEEEEEECCC-CCCCCEEEEEEEEEECCCCCCC--CEEEEEEE-CCEEEEEEEC-------------CCCCCCCEEEEE Q ss_conf 60786602587-8899717999888855797568--63899983-9937662464-------------773899648898 Q T0541 4 DLVPVSLTPVT-VVPNTVNTMTATIENQGNKDST--SFNVSLLV-DGIVVDTQTV-------------TSLESENSTNVD 66 (106) Q Consensus 4 DL~v~~i~p~~-~~~g~~~tv~vtV~N~G~~~a~--~~~v~~y~-~g~~~~~~~v-------------~~L~~G~s~tv~ 66 (106) ++.+++..... +..++.+.++.+++|.+..... ...+.|+. +|..+..+.. ..|+||++..+. 
T Consensus 52 ~l~i~~~~~~~~~~~~~~~~i~g~l~N~~~~~~~~P~l~l~L~D~~g~~v~~r~~~P~eyl~~~~~~~~~l~pg~~~~f~ 131 (149) T PF11906_consen 52 ALRIESSSLRQHPNGGDVLVISGTLRNRADFPQAWPALELTLTDAQGQPVARRVFTPAEYLPPALANQAGLPPGQSVPFR 131 (149) T ss_pred EEEEEEEEEEECCCCCCEEEEEEEEEECCCCCCCCCEEEEEEECCCCCEEEEEEECHHHHCCCCCCCCCCCCCCCEEEEE T ss_conf 07986007775278896799999999389875347459999998999999999977578446433323444999868999 Q ss_pred EEEE-ECCCCCCEEEEE Q ss_conf 8642-177789648999 Q T0541 67 FHWT-LDGTANSYTLTV 82 (106) Q Consensus 67 ~~~~-~~~~~G~~ti~v 82 (106) +.+. ++...-.|.+.+ T Consensus 132 ~~~~~~~~~a~~y~v~~ 148 (149) T PF11906_consen 132 VVFEDPPPNAAGYRVEF 148 (149) T ss_pred EEECCCCCCCCEEEEEE T ss_conf 99407998631589997 No 12 >PF03422 CBM_6: Carbohydrate binding module (family 6); InterPro: IPR005084 The carbohydrate-binding module, family 6 CBM6 from CAZY was previously known as cellulose-binding domain family VI (CBD VI). The cellulose-binding function has been demonstrated in one case on amorphous cellulose and xylan. Some of these modules also bind beta-1,3-glucan.; GO: 0030246 carbohydrate binding; PDB: 2w47_A 2w1w_B 2vzp_A 2vzq_A 2vzr_B 2w87_A 2w46_A 1w0n_A 1ux7_A 2dcj_A .... Probab=83.92 E-value=0.9 Score=20.57 Aligned_cols=65 Identities=20% Similarity=0.163 Sum_probs=39.1 Q ss_pred CEEEEEEEEEECCCCCCCCEEEEEEECC---EEEEEEECCCCCCCCEEEEEEEEEECCCCCCEEEEEEECCCC Q ss_conf 7179998888557975686389998399---376624647738996488988642177789648999975999 Q T0541 19 TVNTMTATIENQGNKDSTSFNVSLLVDG---IVVDTQTVTSLESENSTNVDFHWTLDGTANSYTLTVNVDPEN 88 (106) Q Consensus 19 ~~~tv~vtV~N~G~~~a~~~~v~~y~~g---~~~~~~~v~~L~~G~s~tv~~~~~~~~~~G~~ti~v~vD~~n 88 (106) ..+++++.+.|.+.. ..+.+++|+ ..+++..++. 
..+-....+++.......|.|+|+++....+ T Consensus 45 g~y~~~~~~a~~~~~----~~i~l~vd~~~g~~~~~~~~p~-tg~w~~~~~~~~~v~l~~G~h~i~l~~~~~~ 112 (125) T PF03422_consen 45 GTYTVTFRYANGGSG----GSIELRVDGPDGPLIGTVPVPP-TGGWNTWTTVSVPVNLPAGTHTIYLVFKGGD 112 (125) T ss_dssp EEEEEEEEEECSSSS----EEEEEEESS---EBEEEEEE------TT-EEEEEEEEEB----EEEEEEESS-T T ss_pred CEEEEEEEEECCCCC----EEEEEEECCCCCCEEEEEEECC-CCCCCEEEEEEEEEEECCCEEEEEEEEECCC T ss_conf 725899999789998----0799998998983989999789-9984170899999980886279999998798 No 13 >PF02102 Peptidase_M35: Deuterolysin metalloprotease (M35) family; InterPro: IPR001384 Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys and at least one other residue is required for catalysis, which may play an electrophillic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site . The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases . Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). 
Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. This group of metallopeptidases belong to MEROPS peptidase family M35 (deuterolysin family, clan MA(M)). The protein fold of the peptidase domain for members of this family resembles that of thermolysin, the type example for clan MA. Deuterolysin is a microbial zinc-containing metalloprotease that shows some similarity to thermolysin . The protein is expressed with a possible 19-residue signal sequence, a 155-residue propeptide, and an active peptide of 177 residues . The latter contains an HEXXH motif towards the C-terminus, but the other zinc ligands are as yet undetermined , .; GO: 0004222 metalloendopeptidase activity, 0006508 proteolysis; PDB: 1eb6_A. 
Probab=83.65 E-value=0.087 Score=25.87 Aligned_cols=67 Identities=19% Similarity=0.307 Sum_probs=42.1 Q ss_pred CCCEEEEEEEEEECCCCCCCCEEEEEE------------ECCEEEEE--------------EECCCCCCCCEEEEEEEEE Q ss_conf 997179998888557975686389998------------39937662--------------4647738996488988642 Q T0541 17 PNTVNTMTATIENQGNKDSTSFNVSLL------------VDGIVVDT--------------QTVTSLESENSTNVDFHWT 70 (106) Q Consensus 17 ~g~~~tv~vtV~N~G~~~a~~~~v~~y------------~~g~~~~~--------------~~v~~L~~G~s~tv~~~~~ 70 (106) .-++-.|+++|+|.|.....-+...++ -+|..+.- .....|+||++.+.+|..- T Consensus 36 ~~~Nt~vKA~iTNtg~~~l~~lk~nt~~D~~p~~Kv~v~~~g~~V~F~G~~~r~~~~~L~~d~f~~LapG~sve~~fDiA 115 (359) T PF02102_consen 36 SVGNTRVKAVITNTGDEELNLLKFNTFLDSAPVKKVSVYKDGKEVPFTGIRVRYKTSGLPDDAFQTLAPGESVEDEFDIA 115 (359) T ss_dssp -------------------------------------------------------------------------------- T ss_pred ECCCEEEEEEEEECCCCCEEEEEECCCCCCCCCEEEEEECCCCCCCCCCEEEEEECCCCCHHHCEECCCCCEEEEEEEEE T ss_conf 54883899999846987458998655367575037998438964662574888523789989952349998479999722 Q ss_pred --ECC-CCCCEEEEEE Q ss_conf --177-7896489999 Q T0541 71 --LDG-TANSYTLTVN 83 (106) Q Consensus 71 --~~~-~~G~~ti~v~ 83 (106) .+. .+|.|+|.+. T Consensus 116 ~t~DLS~gG~~~i~a~ 131 (359) T PF02102_consen 116 ETYDLSEGGTYTISAQ 131 (359) T ss_dssp ---------------- T ss_pred EEEECCCCCCEEEEEE T ss_conf 1342479973899981 No 14 >PF08441 Integrin_alpha2: Integrin alpha; InterPro: IPR013649 This domain is found in integrin alpha and integrin alpha precursors to the C terminus of a number of IPR013517 from INTERPRO repeats and to the N terminus of the IPR013513 from INTERPRO cytoplasmic region. ; PDB: 2vdr_A 1tye_E 3fcu_C 2vdp_A 2vdq_A 3fcs_A 2vdk_A 2vc2_A 2vdn_A 2vdm_A .... 
Probab=81.74 E-value=1.1 Score=20.10 Aligned_cols=102 Identities=10% Similarity=0.186 Sum_probs=56.8 Q ss_pred CCCEEEEEEE--CC-----CCC-CCCEEEEEEEEEECCCCCCCCEEEEEEECC------EE------------------- Q ss_conf 9860786602--58-----788-997179998888557975686389998399------37------------------- Q T0541 2 IPDLVPVSLT--PV-----TVV-PNTVNTMTATIENQGNKDSTSFNVSLLVDG------IV------------------- 48 (106) Q Consensus 2 lPDL~v~~i~--p~-----~~~-~g~~~tv~vtV~N~G~~~a~~~~v~~y~~g------~~------------------- 48 (106) .|||.++.-. +. .+- ....+.+.++|+|.|. +|....+.+.... .. T Consensus 167 ~~DL~L~~~~~~~~~~~~lvlG~~~~~l~l~vtv~N~GE-~AY~a~l~v~~P~~L~y~~v~~~~~~~~~~~C~~~~~~~~ 245 (458) T PF08441_consen 167 VSDLQLSAKFSLSETFQVLVLGSSDKELTLEVTVTNKGE-DAYEAQLNVTYPPGLSYSGVERNQKSDKPISCESNNENSS 245 (458) T ss_dssp ---EEEEEEECT-E----EE-----EEEEEEEEEEE------TT-EEEEE--TTEEEEEEE-SSSSSC---EEEEESSST T ss_pred CCCCEEEEEECCCCCEEEEEEEECCCEEEEEEEEEECCC-CCCCCEEEEECCCCCCEEEEECCCCCCCCCCCCCCCCCCC T ss_conf 457089987078874168999517853899999898998-8401179998499984676663678875540674888873 Q ss_pred -EEEEECC-CCCCCCEEEEEEEEEECCCCC---CEEEEEEECCCCCEEECCCCCCEEEEEE Q ss_conf -6624647-738996488988642177789---6489999759995841217885588884 Q T0541 49 -VDTQTVT-SLESENSTNVDFHWTLDGTAN---SYTLTVNVDPENAVNEGNESNNTLTALV 104 (106) Q Consensus 49 -~~~~~v~-~L~~G~s~tv~~~~~~~~~~G---~~ti~v~vD~~n~v~E~~e~NN~~t~~v 104 (106) .....++ +|..|+..++.+.|....-.+ ...|.+.+...+.-......+|.++..+ T Consensus 246 ~~~~C~lgnP~~~~~~~~f~l~f~~~~~~~~~~~l~~~l~~~Sts~~~~~~~~dn~~~~~i 306 (458) T PF08441_consen 246 GTVSCDLGNPMKRGSKVTFTLRFDVSSLSGWTDSLEFDLQANSTSEENNPTLEDNSVQLTI 306 (458) T ss_dssp EEEEEE------TTEEEEEEEEEEE---TTSSSEEEEEEEEE-S-TTT----C---EEEEE T ss_pred EEEEEECCCCCCCCCEEEEEEEEECCCCCCCCCEEEEEEEEEECCCCCCCCCCCCEEEEEE T ss_conf 7999878990007986899999963424678742999999982787767655787389999 No 15 >PF06832 BiPBP_C: Penicillin-Binding Protein C-terminus Family; InterPro: IPR009647 This conserved region of approximately 90 residues is found in a sub-group of bacterial Penicillin-Binding 
Proteins (PBPs). A variable length loop region separates this region from the transpeptidase unit (IPR001460 from INTERPRO). It is predicted to be a beta fold. Probab=76.68 E-value=1.6 Score=19.22 Aligned_cols=42 Identities=21% Similarity=0.317 Sum_probs=27.9 Q ss_pred EEEEEECCEEEEEEECCCCCCCCEEEEEEEEEECCCCCCEEEEEEECCCCCEE Q ss_conf 89998399376624647738996488988642177789648999975999584 Q T0541 39 NVSLLVDGIVVDTQTVTSLESENSTNVDFHWTLDGTANSYTLTVNVDPENAVN 91 (106) Q Consensus 39 ~v~~y~~g~~~~~~~v~~L~~G~s~tv~~~~~~~~~~G~~ti~v~vD~~n~v~ 91 (106) .+.||+||.+++...- + -++.|.+ ...|.|.|.+ +|..+... T Consensus 44 ~~~W~ldg~~l~~~~~-----~----~~~~~~~-~~~G~h~l~v-~D~~G~~~ 85 (89) T PF06832_consen 44 PVYWFLDGRPLGQTQP-----G----HSLFWQP-DSPGFHQLTV-VDDQGRSD 85 (89) T ss_pred CEEEEECCEECCCCCC-----C----CEEEECC-CCCEEEEEEE-ECCCCCEE T ss_conf 6899999989635788-----8----7278668-9986589999-98999998 No 16 >PF04744 Monooxygenase_B: Monellin Monooxygenase subunit B protein; InterPro: IPR006833 Ammonia monooxygenase and the particulate methane monooxygenase are both integral membrane proteins, occurring in ammonia oxidisers and methanotrophs respectively, which are thought to be evolutionarily related . These enzymes have a relatively wide substrate specificity and can catalyse the oxidation of a range of substrates including ammonia, methane, halogenated hydrocarbons and aromatic molecules . These enzymes are composed of 3 subunits - A (IPR003393 from INTERPRO), B (IPR006833 from INTERPRO) and C (IPR006980 from INTERPRO) - and contain various metal centres, including copper. Particulate methane monooxygenase from Methylococcus capsulatus (Bath) is an ABC homotrimer, which contains mononuclear and dinuclear copper metal centres, and a third metal centre containing a metal ion whose identity in vivo is not certain. The soluble regions of particulate methane monooxygenase from Methylococcus capsulatus (Bath) derive primarily from the B subunit. 
This subunit forms two antiparallel beta sheets and contains the mono- and di- nuclear copper metal centres .; PDB: 4mon_D 1fa3_A 3mon_F 1krl_D 2o9u_X 1iv7_B 1mol_B 1iv9_A 1fuw_A 1m9g_A .... Probab=76.48 E-value=1.3 Score=19.82 Aligned_cols=55 Identities=20% Similarity=0.332 Sum_probs=36.3 Q ss_pred CCCCCEEEEEEEEEECCCCCCC--CE---EEEEEECC----------EE-----EEEEECCCCCCCCEEEEEEEE Q ss_conf 8899717999888855797568--63---89998399----------37-----662464773899648898864 Q T0541 15 VVPNTVNTMTATIENQGNKDST--SF---NVSLLVDG----------IV-----VDTQTVTSLESENSTNVDFHW 69 (106) Q Consensus 15 ~~~g~~~tv~vtV~N~G~~~a~--~~---~v~~y~~g----------~~-----~~~~~v~~L~~G~s~tv~~~~ 69 (106) -.+|...+++..|+|.|....+ .| .++|.-.. .+ +....-..|+|||++++.+.. T Consensus 259 ~VPGR~l~~~~~VTN~g~~pv~lgEF~tA~vRFln~~v~~~~~~yp~~lla~~GL~v~~~~pI~PGETk~v~v~a 333 (381) T PF04744_consen 259 RVPGRALRMTLKVTNNGDEPVRLGEFNTANVRFLNPDVPTDDPNYPDELLAERGLSVSDNSPIAPGETKTVEVEA 333 (381) T ss_dssp E----EEEEEEEEEE-----EE---EE-SS-EE--TTT--------GCCEE----EES--S-B----EEEEEEEE T ss_pred ECCCCEEEEEEEEECCCCCCEEEEEEEECCEEEECCCCCCCCCCCCHHHCCCCCCCCCCCCCCCCCCCEEEEEEE T ss_conf 348817999999974898646887665043677578666688899445515677151898876999625899996 No 17 >PF06159 DUF974: Protein of unknown function (DUF974); InterPro: IPR010378 This is a family of uncharacterised eukaryotic proteins. Probab=75.30 E-value=1.8 Score=19.02 Aligned_cols=76 Identities=13% Similarity=0.117 Sum_probs=50.3 Q ss_pred CCCCCCCEEEEEEEEEECCCCCCCCEEEEEEE--CCE-E------EEEEECCCCCCCCEEEEEEEEEECCCCCCEEEEEE Q ss_conf 87889971799988885579756863899983--993-7------66246477389964889886421777896489999 Q T0541 13 VTVVPNTVNTMTATIENQGNKDSTSFNVSLLV--DGI-V------VDTQTVTSLESENSTNVDFHWTLDGTANSYTLTVN 83 (106) Q Consensus 13 ~~~~~g~~~tv~vtV~N~G~~~a~~~~v~~y~--~g~-~------~~~~~v~~L~~G~s~tv~~~~~~~~~~G~~ti~v~ 83 (106) .....|+.+...+.+.|..+.......+..-+ ... . -....+..|.||++....+.... .+.|.|++.+. 
T Consensus 8 G~iylGETFs~~i~~~N~s~~~v~~v~ikvemqT~s~r~~L~~~~~~~~~~~~l~pg~~~~~iv~~~l-kE~G~h~L~c~ 86 (242) T PF06159_consen 8 GSIYLGETFSCYICVNNSSNYPVRSVRIKVEMQTPSQRLPLSPNSEEDSPVETLEPGESLDFIVSHEL-KELGQHILVCT 86 (242) T ss_pred CCEECCCCEEEEEEEECCCCCCEEEEEEEEEEECCCEEEECCCCCCCCCCCCCCCCCCEEEEEEEEEE-CCCCCEEEEEE T ss_conf 87763687899999516888726889999998669714604776555532233289980747999980-55886899999 Q ss_pred ECCCCC Q ss_conf 759995 Q T0541 84 VDPENA 89 (106) Q Consensus 84 vD~~n~ 89 (106) |.+... T Consensus 87 V~Y~~~ 92 (242) T PF06159_consen 87 VSYTDP 92 (242) T ss_pred EEEECC T ss_conf 999889 No 18 >PF05506 DUF756: Domain of unknown function (DUF756); InterPro: IPR008475 This domain is found, normally as a tandem repeat, at the C terminus of bacterial phospholipase C proteins.; GO: 0004629 phospholipase C activity, 0016042 lipid catabolic process Probab=73.41 E-value=2 Score=18.76 Aligned_cols=61 Identities=13% Similarity=0.170 Sum_probs=36.2 Q ss_pred EEEEEEEECCCCCCCCEEEEEEECCEEEEEEECCCCCCCCEEEEEEEEEECCCCCCEEEEEEECCC Q ss_conf 999888855797568638999839937662464773899648898864217778964899997599 Q T0541 22 TMTATIENQGNKDSTSFNVSLLVDGIVVDTQTVTSLESENSTNVDFHWTLDGTANSYTLTVNVDPE 87 (106) Q Consensus 22 tv~vtV~N~G~~~a~~~~v~~y~~g~~~~~~~v~~L~~G~s~tv~~~~~~~~~~G~~ti~v~vD~~ 87 (106) .+..++.|.|... ..+.+|...-.-....-..+.+|++.+..+.. ....|.|.|.|..+.+ T Consensus 21 ~l~L~l~N~G~~~---~~~~v~~~~~~~~~~~~~tv~aG~~~~~~~~l--~~~~gwYDltV~~png 81 (89) T PF05506_consen 21 NLRLTLANSGSAG---ATFTVYDNAYRGDGPRRYTVAAGQTLSDTWDL--AASGGWYDLTVTGPNG 81 (89) T ss_pred EEEEEEEECCCCC---EEEEEEECCCCCCCCEEEEECCCCEEEEEEEC--CCCCCCEEEEEECCCC T ss_conf 8999999589875---89999928878999999998999999999822--7789838999991898 No 19 >PF00553 CBM_2: Cellulose binding domain; InterPro: IPR001919 The microbial degradation of cellulose and xylans requires several types of enzyme such as endoglucanases (3.2.1.4 from EC), cellobiohydrolases (3.2.1.91 from EC) (exoglucanases), or xylanases (3.2.1.8 from EC) . 
Structurally, cellulases and xylanases generally consist of a catalytic domain joined to a cellulose-binding domain (CBD) by a short linker sequence rich in proline and/or hydroxy-amino acids. The CBD domain is found either at the N-terminal or at the C-terminal extremity of these enzymes. As it is shown in the following schematic representation, there are two conserved cysteines in this CBD domain - one at each extremity of the domain - which have been shown to be involved in a disulphide bond. There are also four conserved tryptophan, two are involved in cellulose binding. The CBD of a number of bacterial cellulases has been shown to consist of about 105 amino acid residues , . ; GO: 0004553 hydrolase activity, hydrolyzing O-glycosyl compounds, 0030246 carbohydrate binding, 0005975 carbohydrate metabolic process; PDB: 1exg_A 1exh_A 1heh_C 1hej_C 2xbd_A 1e5b_A 1e5c_A 1xbd_A 2cwr_A 2czn_A. Probab=70.13 E-value=2.4 Score=18.34 Aligned_cols=54 Identities=13% Similarity=0.160 Sum_probs=35.2 Q ss_pred CCEEEEEEEEEECCCCCCCCEEEEEEEC-CEEEEEE------------------ECCCCCCCCEEEEEEEEEE Q ss_conf 9717999888855797568638999839-9376624------------------6477389964889886421 Q T0541 18 NTVNTMTATIENQGNKDSTSFNVSLLVD-GIVVDTQ------------------TVTSLESENSTNVDFHWTL 71 (106) Q Consensus 18 g~~~tv~vtV~N~G~~~a~~~~v~~y~~-g~~~~~~------------------~v~~L~~G~s~tv~~~~~~ 71 (106) +..+...++|+|.|.....+..+.|-++ +..+... .-..|+||++.++.|.... 
T Consensus 12 ~~Gf~~~v~VtN~~~~~i~~W~v~~~~~~g~~i~~~Wna~~s~sG~~~~~~~~~wn~~i~pG~s~~~Gf~~~~ 84 (101) T PF00553_consen 12 GGGFQANVTVTNTGSAPINGWTVTFDFPGGQTITSSWNATVSQSGNTVTVTPASWNGTIAPGGSVSFGFQGSG 84 (101) T ss_dssp SSEEEEEEEEEEESSTTB--EEEEEEE----BBEE-SSCEEEE---EEEEEE-TT--EE-E--ECEEEEEEE- T ss_pred CCCCEEEEEEEECCCCCCCCEEEEEECCCCCEEEEEEEEEEEECCCEEEEECCCCCCCCCCCCEEEEEEEEEC T ss_conf 9981899999979998418879999968998885345239980699899981873870199988998799867 No 20 >PF07610 DUF1573: Protein of unknown function (DUF1573); InterPro: IPR011467 These hypothetical proteins from bacteria, such as Rhodopirellula baltica, Bacteroides thetaiotaomicron and Porphyromonas gingivalis, share a region of conserved sequence towards their N termini. Probab=60.88 E-value=3.7 Score=17.35 Aligned_cols=44 Identities=11% Similarity=0.190 Sum_probs=25.5 Q ss_pred EEEEECCCCCCCCEEEEEEECCEEEEEEECCCCCCCCEEEEEEEE Q ss_conf 888855797568638999839937662464773899648898864 Q T0541 25 ATIENQGNKDSTSFNVSLLVDGIVVDTQTVTSLESENSTNVDFHW 69 (106) Q Consensus 25 vtV~N~G~~~a~~~~v~~y~~g~~~~~~~v~~L~~G~s~tv~~~~ 69 (106) ..++|.|.....=..+.-.- |=......-..|+||++..+.++| T Consensus 2 F~~~N~g~~pl~I~~v~tsC-gCt~~~~~k~~i~PGes~~i~v~y 45 (45) T PF07610_consen 2 FKFTNTGDSPLVITDVTTSC-GCTTAEYSKKPIAPGESGEIKVTY 45 (45) T ss_pred EEEEECCCCCEEEEEEEECC-CCEEECCCCCEECCCCEEEEEEEC T ss_conf 89998889978999862652-668605776408899988999989 No 21 >PF06586 TraK: TraK protein; InterPro: IPR010563 This family consists of several TraK proteins from Escherichia coli, Salmonella typhi and Salmonella typhimurium. TraK is known to be essential for pilus assembly but its exact role in this process is unknown .; GO: 0000746 conjugation, 0019867 outer membrane Probab=59.19 E-value=1.9 Score=18.86 Aligned_cols=24 Identities=13% Similarity=0.189 Sum_probs=9.3 Q ss_pred EEEEEEECCCCCCEEEEEEECCCCC Q ss_conf 9886421777896489999759995 Q T0541 65 VDFHWTLDGTANSYTLTVNVDPENA 89 (106) Q Consensus 65 v~~~~~~~~~~G~~ti~v~vD~~n~ 89 (106) .++...|...+| -++.+..++... 
T Consensus 102 ysl~l~P~~~p~-~ti~l~~~~~~~ 125 (234) T PF06586_consen 102 YSLTLVPKAIPA-QTIFLTSDPAGK 125 (234) T ss_pred EEEEEEECCCCC-EEEEEEECCCCC T ss_conf 999998747898-189998266655 No 22 >PF02221 E1_DerP2_DerF2: ML domain; InterPro: IPR003172 The MD-2-related lipid-recognition (ML) domain is implicated in lipid recognition, particularly in the recognition of pathogen related products. It has an immunoglobulin-like beta-sandwich fold similar to that of E-set Ig domains. This domain is present in the following proteins: Epididymal secretory protein E1 (also known as Niemann-Pick C2 protein), which is known to bind cholesterol. Niemann-Pick disease type C2 is a fatal hereditary disease characterised by accumulation of low-density lipoprotein-derived cholesterol in lysosomes . House-dust mite allergen proteins such as Der f 2 from Dermatophagoides farinae and Der p 2 from Dermatophagoides pteronyssinus . ; PDB: 2hka_A 1nep_A 2f08_B 1ahk_A 1xwv_A 1wrf_A 1ahm_A 1ktj_A 1a9v_A 2z64_C .... Probab=56.96 E-value=4.4 Score=16.98 Aligned_cols=83 Identities=16% Similarity=0.172 Sum_probs=52.0 Q ss_pred EEEEE--ECCCCCCCCEEEEEEEEEECCCCCCCCEEEEEEECCEEEEEE----------E----C-CCCCCCCEEEEEEE Q ss_conf 78660--258788997179998888557975686389998399376624----------6----4-77389964889886 Q T0541 6 VPVSL--TPVTVVPNTVNTMTATIENQGNKDSTSFNVSLLVDGIVVDTQ----------T----V-TSLESENSTNVDFH 68 (106) Q Consensus 6 ~v~~i--~p~~~~~g~~~tv~vtV~N~G~~~a~~~~v~~y~~g~~~~~~----------~----v-~~L~~G~s~tv~~~ 68 (106) .|+.. .|-.+..|+.+++++.-...-........+...+.|..+... . . 
=+|.+|+..+..++ T Consensus 19 ~I~~c~~~Pc~~~rG~~~~i~~~f~~~~~~~~~~~~v~~~~~gv~~~~~~l~~~~~d~C~~~~~~~CPl~~G~~~~~~~~ 98 (133) T PF02221_consen 19 RISGCDSSPCPPKRGTNVTITIDFTANQDITSLKTVVVAKLNGVKIPFPTLPNESYDACKSLGGVSCPLKKGETYTYTLT 98 (133) T ss_dssp EEESEEESSEEEE---EEEEEEEEE-SS-BST-EEEEEEEE--EEETTEE-S-EECEGGGTTSSCGSSB-TT-EEEEEEE T ss_pred EEEECCCCCCCCCCCCCEEEEEEEEECCCEEEEEEEEEEEECCEEEEEEECCCCCCCHHHCCCCCCCCCCCCEEEEEEEE T ss_conf 98105389980138997899999990744306899999999137888676454344211065788433879977999998 Q ss_pred EEE--CCCCCCEEEEEEE-CCCC Q ss_conf 421--7778964899997-5999 Q T0541 69 WTL--DGTANSYTLTVNV-DPEN 88 (106) Q Consensus 69 ~~~--~~~~G~~ti~v~v-D~~n 88 (106) +.. ..+.|.|++.+.. |.++ T Consensus 99 ~~i~~~~P~~~~~v~~~l~d~~~ 121 (133) T PF02221_consen 99 FPIPKIYPPGSYTVEWELTDQDG 121 (133) T ss_dssp EE-STTSSSBCEEEEEEEEETTS T ss_pred EECCCCCCCEEEEEEEEEECCCC T ss_conf 78455567467999999991899 No 23 >PF00207 A2M: Alpha-2-macroglobulin family; InterPro: IPR001599 This family contains serum complement C3 and C4 precursors and alpha-macrogrobulins. The alpha-macroglobulin (aM) family of proteins includes protease inhibitors , typified by the human tetrameric a2-macroglobulin (a2M); they belong to the MEROPS proteinase inhibitor family I39, clan IL. These protease inhibitors share several defining properties, which include (i) the ability to inhibit proteases from all catalytic classes, (ii) the presence of a 'bait region' and a thiol ester, (iii) a similar protease inhibitory mechanism and (iv) the inactivation of the inhibitory capacity by reaction of the thiol ester with small primary amines. aM protease inhibitors inhibit by steric hindrance . The mechanism involves protease cleavage of the bait region, a segment of the aM that is particularly susceptible to proteolytic cleavage, which initiates a conformational change such that the aM collapses about the protease. 
In the resulting aMprotease complex, the active site of the protease is sterically shielded, thus substantially decreasing access to protein substrates. Two additional events occur as a consequence of bait region cleavage, namely (i) the h-cysteinyl-g-glutamyl thiol ester becomes highly reactive and (ii) a major conformational change exposes a conserved COOH-terminal receptor binding domain (RBD). RBD exposure allows the aM protease complex to bind to clearance receptors and be removed from circulation . Tetrameric, dimeric, and, more recently, monomeric aM protease inhibitors have been identified , . ; GO: 0004866 endopeptidase inhibitor activity; PDB: 2pn5_A 3cu7_A 3frp_G 2b39_B 2hr0_B 2wii_B 2win_D 2qki_E 2a74_E 3g6j_D .... Probab=53.86 E-value=5 Score=16.70 Aligned_cols=19 Identities=21% Similarity=0.127 Sum_probs=7.8 Q ss_pred CCCCCCCCEEEEEEEEEEC Q ss_conf 5878899717999888855 Q T0541 12 PVTVVPNTVNTMTATIENQ 30 (106) Q Consensus 12 p~~~~~g~~~tv~vtV~N~ 30 (106) |.....|+.+.+.+.|.|. T Consensus 63 P~~~~~gd~~~i~v~v~N~ 81 (92) T PF00207_consen 63 PRFVRRGDQVQIPVTVFNY 81 (92) T ss_dssp -SEEETTSEEEEEEEEEE- T ss_pred CCEECCCCEEEEEEEEECC T ss_conf 8033279999999999948 No 24 >PF10989 DUF2808: Protein of unknown function (DUF2808) Probab=51.12 E-value=5.5 Score=16.46 Aligned_cols=51 Identities=14% Similarity=0.127 Sum_probs=37.9 Q ss_pred EEEEEECCEEEEEEECCCCCCCCEEEEEE-EEEECCCCCCEEEEEEECCCCC Q ss_conf 89998399376624647738996488988-6421777896489999759995 Q T0541 39 NVSLLVDGIVVDTQTVTSLESENSTNVDF-HWTLDGTANSYTLTVNVDPENA 89 (106) Q Consensus 39 ~v~~y~~g~~~~~~~v~~L~~G~s~tv~~-~~~~~~~~G~~ti~v~vD~~n~ 89 (106) .+.+-.++..+.-..-.+++||++.++.+ .+.=+..+|.|.+.+.+=+.+. 
T Consensus 81 ~v~~d~~~~~i~I~f~~PV~pg~tv~V~l~~v~NP~~~G~Y~f~v~a~p~G~ 132 (146) T PF10989_consen 81 EVEWDEDGRSITIFFDEPVPPGTTVTVVLSPVRNPRSGGTYQFNVTAFPPGD 132 (146) T ss_pred EEEECCCCCEEEEEECCCCCCCCEEEEEEEEECCCCCCCEEEEEEEEECCCC T ss_conf 7898788988999938997989999999970008998973899999986898 No 25 >PF02014 Reeler: Reeler domain Schematic picture including Reeler domain; InterPro: IPR002861 Extracellular matrix (ECM) proteins play an important role in early cortical development, specifically in the formation of neural connections and in controlling the cyto-architecture of the central nervous system. The product of the reeler gene in mouse is reelin,a large extracellular protein secreted by pioneer neurons that coordinates cell positioning during neurodevelopment . F-spondin and mindin are a family of matrix-attached adhesion molecules that share structural similarities and overlapping domains of expression. Both F-spondin and mindin promote adhesion and outgrowth of hippocampal embryonic neurons and bind to a putative receptor(s) expressed on both hippocampal and sensory neurons . This domain of unknown function is found at the N terminus of reelin and F-spondin.; PDB: 3coo_A 2zot_C 2zou_A. Probab=46.65 E-value=6.5 Score=16.07 Aligned_cols=74 Identities=24% Similarity=0.306 Sum_probs=45.9 Q ss_pred ECCCCCCCCEEEEEEEEEECCCCCCCCEEEEEEECCEE----EEEEECCC-------CC-------------CCCEEEEE Q ss_conf 25878899717999888855797568638999839937----66246477-------38-------------99648898 Q T0541 11 TPVTVVPNTVNTMTATIENQGNKDSTSFNVSLLVDGIV----VDTQTVTS-------LE-------------SENSTNVD 66 (106) Q Consensus 11 ~p~~~~~g~~~tv~vtV~N~G~~~a~~~~v~~y~~g~~----~~~~~v~~-------L~-------------~G~s~tv~ 66 (106) .+....+|+.++|++ .+.+...-++|.+.....+.. ++...... +. 
......++ T Consensus 26 ~~~~y~pG~~~~Vtl--~~~~~~~F~GFllqAr~~~~~~~~~vG~F~~~~~~~~~~~~~C~~~~~~avTH~~~~~K~~v~ 103 (132) T PF02014_consen 26 SPETYVPGQTYTVTL--SNSSSDPFRGFLLQARDADNPGPGIVGTFQLPPDSDTTQLLNCSGGTPNAVTHSNTSPKTSVT 103 (132) T ss_dssp --SSB----EEEEEE--EETTTEEB---EEEEEETT-----B---EEES-TTTEEEETTE----EEEEEESS-S-BSEEE T ss_pred CCCEECCCCEEEEEE--ECCCCCEEEEEEEEEECCCCCCCCCCCEEEECCCCCCEEECCCCCCCCCEEEECCCCCCCEEE T ss_conf 786066999999999--678998587899998727888763124038579523547035534570207975899862789 Q ss_pred EEEEECCC-CCCEEEEEEECC Q ss_conf 86421777-896489999759 Q T0541 67 FHWTLDGT-ANSYTLTVNVDP 86 (106) Q Consensus 67 ~~~~~~~~-~G~~ti~v~vD~ 86 (106) +.|.++.. .|.-.|++.|=. T Consensus 104 ~~W~AP~~~~g~V~f~aTVv~ 124 (132) T PF02014_consen 104 FTWTAPSDGSGCVTFRATVVQ 124 (132) T ss_dssp EEEE--------EEEEEEEES T ss_pred EEEECCCCCCCCEEEEEEEEE T ss_conf 999399999875899999994 No 26 >PF11611 TRF2: Telomeric repeat-binding factor 2; PDB: 3cfu_A. Probab=46.26 E-value=6.6 Score=16.04 Aligned_cols=66 Identities=17% Similarity=0.234 Sum_probs=38.7 Q ss_pred CCEEEEEEEEEECCCCCCCC--EEEEEEE-CCEEEEE----------EECCCCCCCCEEEEEEEEEECCCCCCEEEEEE Q ss_conf 97179998888557975686--3899983-9937662----------46477389964889886421777896489999 Q T0541 18 NTVNTMTATIENQGNKDSTS--FNVSLLV-DGIVVDT----------QTVTSLESENSTNVDFHWTLDGTANSYTLTVN 83 (106) Q Consensus 18 g~~~tv~vtV~N~G~~~a~~--~~v~~y~-~g~~~~~----------~~v~~L~~G~s~tv~~~~~~~~~~G~~ti~v~ 83 (106) +.-+.+.++|+|.|.....- ....|+- +|..... .....|.||++.+..+-+..+...-.+.|... 
T Consensus 35 ~~fvvV~v~v~N~~~e~~~~~~~~f~L~d~~g~~y~~~~~~~~~~~~~~~~~l~pG~~~~g~ivF~vp~~~~~~~L~~~ 113 (123) T PF11611_consen 35 GKFVVVDVTVKNNGDEPISFSPSDFKLYDDDGKEYDPDFSASSDPDNFFSGELKPGESVEGKIVFEVPKDSQPYELEYD 113 (123) T ss_dssp SEEEEEEEEEEE-----B-B-----EEE-TT--B--EEE-CCC---------B----EE---EEEEE----GG-EEEE- T ss_pred CEEEEEEEEEEECCCCCEEECCCCEEEEECCCCEECCCCCCCCCCCCCCCEEECCCCEEEEEEEEEECCCCCCEEEEEE T ss_conf 9899999999999999577575719999499979814433100115545349999998999999998999945799992 No 27 >PF07919 DUF1683: Protein of unknown function (DUF1683); InterPro: IPR012880 The proteins featured in this family are all hypothetical eukaryotic proteins of unknown function. The region in question is approximately 150 residues long. Probab=45.16 E-value=6.9 Score=15.94 Aligned_cols=70 Identities=19% Similarity=0.147 Sum_probs=43.5 Q ss_pred EEEEEEECCCCCCCCEEEEEEEEEECCCCCCCCEEEEEE------ECCEEEEEEECCCCCCCCEEEEEEEEEECCCCCCE Q ss_conf 078660258788997179998888557975686389998------39937662464773899648898864217778964 Q T0541 5 LVPVSLTPVTVVPNTVNTMTATIENQGNKDSTSFNVSLL------VDGIVVDTQTVTSLESENSTNVDFHWTLDGTANSY 78 (106) Q Consensus 5 L~v~~i~p~~~~~g~~~tv~vtV~N~G~~~a~~~~v~~y------~~g~~~~~~~v~~L~~G~s~tv~~~~~~~~~~G~~ 78 (106) |.|..--|+....|+.+++.+.++|.-.. .-.+.+.+- .+|...... .|.|++..++.+.+.+ ...|.. T Consensus 39 l~V~~~~p~~~~~~~~~~l~~~l~N~T~~-~~~~~~~l~~s~~F~fSG~k~~~~---~vlP~s~~~v~~~L~p-l~~G~~ 113 (125) T PF07919_consen 39 LRVLAEAPSSAIVGEPFTLDYTLENPTMH-FQEFELSLEPSDNFMFSGPKQLTL---QVLPGSRHTVRYNLYP-LKAGWW 113 (125) T ss_pred CEEEEECCCCCCCCCCEEEEEEEECCCCC-CEEEEEEECCCCCEEEECCCCCEE---EECCCCCEEEEEEEEE-CCCCCE T ss_conf 49999848744059869999999959997-499999967679789968873427---9789975799999995-747838 Q ss_pred E Q ss_conf 8 Q T0541 79 T 79 (106) Q Consensus 79 t 79 (106) . 
T Consensus 114 ~ 114 (125) T PF07919_consen 114 I 114 (125) T ss_pred E T ss_conf 7 No 28 >PF06280 DUF1034: Fn3-like domain (DUF1034); InterPro: IPR010435 Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes . They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence . Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases . Not withstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base . The geometric orientations of the catalytic residues are similar between families, despite different protein folds . The linear arrangements of the catalytic residues commonly reflect clan relationships. For example the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) , . Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). 
Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. This domain of unknown function is present in bacterial and plant peptidases belonging to MEROPS peptidase family S8 (subfamily S8A subtilisin, clan SB). It is C-terminal to and adjacent to the S8 peptidase domain and can be found in conjunction with the PA (Protease associated) domain (IPR003137 from INTERPRO) and additionally in Gram-positive bacteria with the surface protein anchor domain (IPR001899 from INTERPRO).; GO: 0004252 serine-type endopeptidase activity, 0005618 cell wall, 0016020 membrane; PDB: 1xf1_A 3eif_A. Probab=44.57 E-value=7.1 Score=15.89 Aligned_cols=55 Identities=27% Similarity=0.364 Sum_probs=31.8 Q ss_pred CCEEEEEEEEEECCCCCCCCEEEEE------EE---CCEEEEE--------E----ECCCCCCCCEEEEEEEEEECC Q ss_conf 9717999888855797568638999------83---9937662--------4----647738996488988642177 Q T0541 18 NTVNTMTATIENQGNKDSTSFNVSL------LV---DGIVVDT--------Q----TVTSLESENSTNVDFHWTLDG 73 (106) Q Consensus 18 g~~~tv~vtV~N~G~~~a~~~~v~~------y~---~g~~~~~--------~----~v~~L~~G~s~tv~~~~~~~~ 73 (106) +...+++++|+|.|..+.. +.+.- +. +|..... . ..=.++||++.++++++.++. 
T Consensus 7 ~~~~~~tvtl~N~g~~~~t-Y~~~~~~~~T~~~~~~~g~~~~~~~~~~~~~~~~~~~~vTV~ag~s~~v~vt~~~p~ 82 (112) T PF06280_consen 7 GNFFTFTVTLHNTGNKDKT-YTLSHVGVLTDQTDKNDGYFTLPPIAPGAASVTFSPNTVTVPAGGSKTVTVTFTPPS 82 (112) T ss_dssp -SEEEEEEEEEE-SSS-EE-EEEEEE-EEEEEE-----BEEEEEEE----EEE---EEEEE-TTEEEEEEEEEE--G T ss_pred CCCEEEEEEEEECCCCCEE-EEEEEEEEEEEEEECCCCCCCCCCCCCEEEEEEECCCEEEECCCCEEEEEEEEEECC T ss_conf 7848999999958999889-999406887789722577113565542025666379849999999899999997631 No 29 >PF01917 Arch_flagellin: Archaebacterial flagellin; InterPro: IPR002774 Members of this family are the proteins that form the flagella in archaebacteria . Each bacterium has multiple members of this family.; GO: 0005198 structural molecule activity, 0006928 cell motility Probab=41.88 E-value=7.8 Score=15.66 Aligned_cols=43 Identities=16% Similarity=0.247 Sum_probs=27.2 Q ss_pred CEEEEEEECCCCCCCCEEEEEEEEE-ECCCCC--CCCEEEEEEECC Q ss_conf 6078660258788997179998888-557975--686389998399 Q T0541 4 DLVPVSLTPVTVVPNTVNTMTATIE-NQGNKD--STSFNVSLLVDG 46 (106) Q Consensus 4 DL~v~~i~p~~~~~g~~~tv~vtV~-N~G~~~--a~~~~v~~y~~g 46 (106) +|.|.+.....+..+.--.+.+.|+ |.|... .....+.+..+| T Consensus 50 ~i~I~~~~g~~~~~~~i~~l~i~V~~n~Gs~~Idl~~~~i~v~~~~ 95 (191) T PF01917_consen 50 GIEIISDVGSSPNSGTIDNLTIYVKPNAGSSPIDLSQTTITVSDDG 95 (191) T ss_pred CEEEEEEECCCCCCCCEEEEEEEEECCCCCCCCCCCCCEEEEEECC T ss_conf 7599966633688885369999998079998646664289999699 No 30 >PF05688 DUF824: Salmonella repeat of unknown function (DUF824); InterPro: IPR008542 This family consists of a series of repeated sequences (of around 180 residues) which are found in Salmonella typhimurium, Salmonella typhi and Escherichia coli. These repeats are almost always found with this entry. The repeats are associated with RatA and RatB, the coding sequences of which are found in the pathogeneicity island of Salmonella. The sequences may be determinants of pathogenicity , . 
Probab=33.49 E-value=11 Score=14.92 Aligned_cols=35 Identities=9% Similarity=0.084 Sum_probs=24.8 Q ss_pred ECCCCCCCCEEEEEEEEEECCCCCCCCEEEEEEEC Q ss_conf 25878899717999888855797568638999839 Q T0541 11 TPVTVVPNTVNTMTATIENQGNKDSTSFNVSLLVD 45 (106) Q Consensus 11 ~p~~~~~g~~~tv~vtV~N~G~~~a~~~~v~~y~~ 45 (106) .-.....|+.+.++++++|....+.++....+.-+ T Consensus 5 ~aakaK~GEsi~ltVt~kda~G~p~~~~~f~l~r~ 39 (47) T PF05688_consen 5 NAAKAKVGESIPLTVTVKDANGNPVPNTPFTLTRG 39 (47) T ss_pred CCCEEECCCEEEEEEEEECCCCCCCCCCEEEEEEC T ss_conf 02126527648899999868999969844999945 No 31 >PF03170 BcsB: Bacterial cellulose synthase subunit; InterPro: IPR003920 An operon encoding 4 proteins required for bacterial cellulose biosynthesis (bcs) in Acetobacter xylinum has been isolated via genetic complementation with strains lacking cellulose synthase activity . Nucleotide sequence analysis showed the cellulose synthase operon to consist of 4 genes, designated bcsA, bcsB, bcsC and bcsD, all of which are required for maximal bacterial cellulose synthesis in A. xylinum. The calculated molecular mass of the protein encoded by bcsB is 85.3kDa . BcsB encodes the catalytic subunit of cellulose synthase. The protein polymerises uridine 5'-diphosphate glucose to cellulose: UDP-glucose + (1,4-beta-D-glucosyl)(N) = UDP + (1,4-beta-D-glucosyl)(N+1). The enzyme is specifically activated by the nucleotide cyclic diguanylic acid. 
Sequence analysis suggests that BcsB contains several transmembrane (TM) domains, and shares a high degree of similarity with Escherichia coli YhjN.; GO: 0006011 UDP-glucose metabolic process, 0016020 membrane Probab=30.85 E-value=12 Score=14.68 Aligned_cols=65 Identities=8% Similarity=-0.004 Sum_probs=29.8 Q ss_pred EEEEEEEEEECCCCCCCCEEEEEEECCEEEEEEECCCCCCCCEEEEEEEEEEC-CCCCCEEEEEEEC Q ss_conf 17999888855797568638999839937662464773899648898864217-7789648999975 Q T0541 20 VNTMTATIENQGNKDSTSFNVSLLVDGIVVDTQTVTSLESENSTNVDFHWTLD-GTANSYTLTVNVD 85 (106) Q Consensus 20 ~~tv~vtV~N~G~~~a~~~~v~~y~~g~~~~~~~v~~L~~G~s~tv~~~~~~~-~~~G~~ti~v~vD 85 (106) ..++....++.-........+.+++||..+++..+..-.+ +..++++..++. ...|...+++.+. T Consensus 45 ~A~L~L~y~~S~~l~~~~S~l~V~lNg~~v~s~~l~~~~~-~~~~~~i~Ip~~~l~~g~N~l~~~~~ 110 (607) T PF03170_consen 45 GARLNLDYTYSPSLLPERSTLTVSLNGEPVGSIPLDPEQG-EKQTVTIPIPPRLLITGFNRLTLEFI 110 (607) T ss_pred CEEEEEEEEECCCCCCCCCEEEEEECCEEEEEEECCCCCC-CCEEEEEECCHHHHCCCCCEEEEEEE T ss_conf 3299999876766578766699999999989984687778-72579996481661377635999998 No 32 >PF06483 ChiC: Chitinase C; InterPro: IPR009470 This ~170 aa region is found at the C terminal to the catalytic domain (IPR001223 from INTERPRO) found in members of glycoside hydrolase family 18. Probab=29.60 E-value=13 Score=14.56 Aligned_cols=30 Identities=27% Similarity=0.319 Sum_probs=18.7 Q ss_pred CCCCCCCEEEEEEEEEECCCCCCEEEEEEEC Q ss_conf 7738996488988642177789648999975 Q T0541 55 TSLESENSTNVDFHWTLDGTANSYTLTVNVD 85 (106) Q Consensus 55 ~~L~~G~s~tv~~~~~~~~~~G~~ti~v~vD 85 (106) ..|+||++.++.+.+..+... 
--.+++.++ T Consensus 126 qslapGasv~~~~~YyLPiS~-PsN~tv~~~ 155 (180) T PF06483_consen 126 QSLAPGASVELDMVYYLPISG-PSNFTVNFG 155 (180) T ss_pred CCCCCCCEEEEEEEEEEECCC-CCCEEEEEC T ss_conf 213899638998999851268-753799989 No 33 >PF10342 Drmip_Hesp: Developmentally Regulated MAPK Interacting Protein Probab=26.74 E-value=14 Score=14.28 Aligned_cols=75 Identities=15% Similarity=0.116 Sum_probs=36.8 Q ss_pred CCCCCCCEEEEEEEEEECCCCCCCCEEEEEEECCEEE--EEEECCCCCCCCEEEEEEEEEECCC-CCCEEEEEEECCCC Q ss_conf 8788997179998888557975686389998399376--6246477389964889886421777-89648999975999 Q T0541 13 VTVVPNTVNTMTATIENQGNKDSTSFNVSLLVDGIVV--DTQTVTSLESENSTNVDFHWTLDGT-ANSYTLTVNVDPEN 88 (106) Q Consensus 13 ~~~~~g~~~tv~vtV~N~G~~~a~~~~v~~y~~g~~~--~~~~v~~L~~G~s~tv~~~~~~~~~-~G~~ti~v~vD~~n 88 (106) +....|+.++|++...+..... ..+.+.|+-++... ....+...........++..+.... .+.|.|.++....+ T Consensus 9 ~~~~~G~~~~I~W~~~~~~~~~-~~v~i~L~~~~~~~~~~~~~la~~~~~~~gs~~~~vp~~~~~~~~Y~i~~~~~~~~ 86 (97) T PF10342_consen 9 EVWVAGETYTITWTDSGTDPPQ-SKVSIYLSNGGNSTIQCVDTLASGVPASDGSYSWTVPSDLPTGGDYFIQIVANSDN 86 (97) T ss_pred CEEECCCEEEEEEECCCCCCCC-CEEEEEEECCCCCCCCCCEEEEECCCCCCCEEEEEECCCCCCCCCEEEEEEECCCC T ss_conf 9887996599798739999875-48999999199876431103241216988779999488779998799999988899 No 34 >PF00801 PKD: PKD domain; InterPro: IPR000601 The PKD domain was first identified in the Polycystic kidney disease protein, polycystin-1 (PDK1 gene), and contains an Ig-like fold consisting of a beta-sandwich of seven strands in two sheets with a Greek key topology, although some members have additional strands . Polycystin-1 is a large cell-surface glycoprotein involved in adhesive protein-protein and protein-carbohydrate interactions; however it is not clear if the PKD domain mediates any of these interactions. PKD domains are also found in other proteins, usually in the extracellular parts of proteins involved in interactions with other proteins. 
For example, domains with a PKD-type fold are found in archaeal surface layer proteins that protect the cell from extreme environments , and in the human VPS10 domain-containing receptor SorCS2 .; PDB: 2c4x_A 2c26_A 1b4r_A 1wgo_A 1l0q_B. Probab=25.86 E-value=15 Score=14.19 Aligned_cols=58 Identities=24% Similarity=0.255 Sum_probs=33.5 Q ss_pred ECCCCCCCCEEEEEEEEEECCCCCCCCEEEEEEECCEEEEEEECCCCCCCCEEEEEEEEEECCCCCCEEEEEEEC Q ss_conf 258788997179998888557975686389998399376624647738996488988642177789648999975 Q T0541 11 TPVTVVPNTVNTMTATIENQGNKDSTSFNVSLLVDGIVVDTQTVTSLESENSTNVDFHWTLDGTANSYTLTVNVD 85 (106) Q Consensus 11 ~p~~~~~g~~~tv~vtV~N~G~~~a~~~~v~~y~~g~~~~~~~v~~L~~G~s~tv~~~~~~~~~~G~~ti~v~vD 85 (106) +|..+..|+.++|.+... ........|..++ ......+...+.. + ..+|.|++++.|- T Consensus 4 s~~~~~~g~~v~f~a~~~-----~g~~~~y~W~fgd-------~~~~~~~~~~~ht--y---~~~G~y~V~ltv~ 61 (69) T PF00801_consen 4 SPTTVPVGETVTFTASSP-----DGSIVTYSWDFGD-------GGTTSSGPSVTHT--Y---TSPGTYTVTLTVT 61 (69) T ss_dssp ---SEBTT-EEEEEECT-------CSECEEEEE----------ECCEESSSEEEEE------S----EEEEEEEE T ss_pred CCCEECCCCEEEEEEECC-----CCCCEEEEEEECC-------CCEEECCCCEEEE--E---CCCEEEEEEEEEE T ss_conf 784488998899999958-----9996899999669-------9823846767999--6---8987899999999 No 35 >PF12389 Peptidase_M73: Camelysin metallo-endopeptidase Probab=24.82 E-value=16 Score=14.09 Aligned_cols=29 Identities=17% Similarity=0.204 Sum_probs=23.2 Q ss_pred CCCCCCCCEEEEEEEEEECCCCCCCCEEE Q ss_conf 58788997179998888557975686389 Q T0541 12 PVTVVPNTVNTMTATIENQGNKDSTSFNV 40 (106) Q Consensus 12 p~~~~~g~~~tv~vtV~N~G~~~a~~~~v 40 (106) -....||+.++-.+.++|.|+.+.....+ T Consensus 58 v~nlkPGDtv~k~f~l~N~Gtldi~~V~l 86 (199) T PF12389_consen 58 VSNLKPGDTVEKEFTLKNSGTLDIKDVLL 86 (199) T ss_pred CCCCCCCCCEEEEEEEEECCEEEEEEEEE T ss_conf 11568997278889999701002138999 No 36 >PF07679 I-set: Immunoglobulin I-set domain; InterPro: IPR013098 The basic structure of immunoglobulin (Ig) molecules is a tetramer of two light chains and two heavy chains linked by disulphide 
bonds. There are two types of light chains: kappa and lambda, each composed of a constant domain (CL) and a variable domain (VL). There are five types of heavy chains: alpha, delta, epsilon, gamma and mu, all consisting of a variable domain (VH) and three (in alpha, delta and gamma) or four (in epsilon and mu) constant domains (CH1 to CH4). Ig molecules are highly modular proteins, in which the variable and constant domains have clear, conserved sequence patterns. The domains in Ig and Ig-like molecules are grouped into four types: V-set (variable; IPR013106 from INTERPRO), C1-set (constant-1; IPR003597 from INTERPRO), C2-set (constant-2; IPR008424 from INTERPRO) and I-set (intermediate; IPR013098 from INTERPRO) . Structural studies have shown that these domains share a common core Greek-key beta-sandwich structure, with the types differing in the number of strands in the beta-sheets as well as in their sequence patterns , . Immunoglobulin-like domains that are related in both sequence and structure can be found in several diverse protein families. Ig-like domains are involved in a variety of functions, including cell-cell recognition, cell-surface receptors, muscle structure and the immune system . This entry represents I-set domains, which are found in several cell adhesion molecules, including vascular (VCAM), intercellular (ICAM), neural (NCAM) and mucosal addressin (MADCAM) cell adhesion molecules, as well as junction adhesion molecules (JAM). I-set domains are also present in several other diverse protein families, including several tyrosine-protein kinase receptors, the hemolymph protein hemolin, the muscle proteins titin, telokin, and twitchin, the neuronal adhesion molecule axonin-1 , and the signalling molecule semaphorin 4D that is involved in axonal guidance, immune function and angiogenesis .; PDB: 1hcf_X 1wwb_X 1wwc_A 3dmk_A 2v9q_A 2iep_A 2id5_B 2vra_D 2vr9_A 2v9t_A .... 
Probab=23.85 E-value=16 Score=13.98 Aligned_cols=70 Identities=21% Similarity=0.265 Sum_probs=42.3 Q ss_pred CCCCCCCCEEEEEEEEEECCCCCCCCEEEEEEECCEEEEEEECCCCC-CCCEEEEEEEEEECCCCCCEEEEEEECC Q ss_conf 58788997179998888557975686389998399376624647738-9964889886421777896489999759 Q T0541 12 PVTVVPNTVNTMTATIENQGNKDSTSFNVSLLVDGIVVDTQTVTSLE-SENSTNVDFHWTLDGTANSYTLTVNVDP 86 (106) Q Consensus 12 p~~~~~g~~~tv~vtV~N~G~~~a~~~~v~~y~~g~~~~~~~v~~L~-~G~s~tv~~~~~~~~~~G~~ti~v~vD~ 86 (106) .-....|+.+++...+. |. +...+.|+.+|..+....-..+. -+...++.+.-.-..-.|.|+..|.-+. T Consensus 9 ~~~~~~G~~~~l~C~v~--~~---p~~~v~W~~~~~~l~~~~~~~~~~~~~~~~L~I~~v~~~D~G~Y~C~a~n~~ 79 (90) T PF07679_consen 9 DVTVKEGESVTLTCEVS--GN---PPPSVTWFKNGQPLSSSSRYSISNSGNSSTLTIKNVQRSDSGEYTCVASNEF 79 (90) T ss_dssp EEEEETTSEEEEEEEEE--SS---SSSEEEEEETTEBEETSSSEEEEEECTEEEEEESSCSGGGHEEEEEEEEESS T ss_pred CEEEECCCCEEEEEEEE--EC---CCCEEEEEEEEECCCCCCEEEEEEEEEEEEEEECCCCHHHCEEEEEEEEECC T ss_conf 99996798299999998--24---8987999872100245301478720101699858999553999999999899 No 37 >PF06030 DUF916: Bacterial protein of unknown function (DUF916); InterPro: IPR010317 This family consists of putative cell surface proteins, from Firmicutes, of unknown function. Probab=23.37 E-value=17 Score=13.93 Aligned_cols=62 Identities=19% Similarity=0.296 Sum_probs=37.9 Q ss_pred CCCCCCEEEEEEEEEECCCCCCCCEEEEEEE-------CC--------EE------------EEEEEC-CCCCCCCEEEE Q ss_conf 7889971799988885579756863899983-------99--------37------------662464-77389964889 Q T0541 14 TVVPNTVNTMTATIENQGNKDSTSFNVSLLV-------DG--------IV------------VDTQTV-TSLESENSTNV 65 (106) Q Consensus 14 ~~~~g~~~tv~vtV~N~G~~~a~~~~v~~y~-------~g--------~~------------~~~~~v-~~L~~G~s~tv 65 (106) ...+|+..++.+.|+|.....- ++..++ +| .. 
+....- -.|+|+++.++ T Consensus 22 ~~P~q~qtl~v~v~N~t~~~i---tv~v~~~~A~Tn~nG~idY~~~~~~~d~sl~~~~~~~v~~~~~~Vtl~~~~sk~V 98 (122) T PF06030_consen 22 KVKPGQTQTLQVRVTNNTDKPI---TVKVSANNATTNDNGVIDYSPSTKKKDSSLKYPFSDLVKIPKEEVTLPANSSKTV 98 (122) T ss_pred EECCCCEEEEEEEEECCCCCCE---EEEEEEEEEEECCCEEEEECCCCCCCCCCCCCCHHHHCCCCCCEEEECCCCEEEE T ss_conf 9689995999999992899968---9999971657568878996678877464348467996126887699899987999 Q ss_pred EEEEEECCC--CCCE Q ss_conf 886421777--8964 Q T0541 66 DFHWTLDGT--ANSY 78 (106) Q Consensus 66 ~~~~~~~~~--~G~~ 78 (106) .+....+.. .|.+ T Consensus 99 ~~~lk~P~~~f~G~i 113 (122) T PF06030_consen 99 TFTLKMPKKAFDGVI 113 (122) T ss_pred EEEEECCCCCCCCEE T ss_conf 999986887779779 No 38 >PF07732 Cu-oxidase_3: Multicopper oxidase; InterPro: IPR011707 Copper is one of the most prevalent transition metals in living organisms and its biological function is intimately related to its redox properties. Since free copper is toxic, even at very low concentrations, its homeostasis in living organisms is tightly controlled by subtle molecular mechanisms. In eukaryotes, before being transported inside the cell via the high-affinity copper transporters of the CTR family, the copper (II) ion is reduced to copper (I). In blue copper proteins such as cupredoxin, the copper (I) ion form is stabilised by a constrained His2Cys coordination environment. Multicopper oxidases oxidise their substrate by accepting electrons at a mononuclear copper centre and transferring them to a trinuclear copper centre; dioxygen binds to the trinuclear centre and, following the transfer of four electrons, is reduced to two molecules of water. There are three spectroscopically different copper centres found in multicopper oxidases: type 1 (or blue), type 2 (or normal) and type 3 (or coupled binuclear). Multicopper oxidases consist of 2, 3 or 6 of these homologous domains, which also share homology with the cupredoxins azurin and plastocyanin. 
Structurally, these domains have a cupredoxin-like fold: a beta-sandwich of 7 strands in 2 beta-sheets, arranged in a Greek-key beta-barrel. Multicopper oxidases include: Ceruloplasmin (1.16.3.1 from EC) (ferroxidase), a 6-domain enzyme found in the serum of mammals and birds that oxidizes different inorganic and organic substances; it exhibits internal sequence homology that appears to have evolved from the triplication of a Cu-binding domain similar to that of laccase and ascorbate oxidase. Laccase (1.10.3.2 from EC) (urushiol oxidase), a 3-domain enzyme found in fungi and plants, which oxidizes different phenols and diamines. CueO is a laccase found in Escherichia coli that is involved in copper resistance. Ascorbate oxidase (1.10.3.3 from EC), a 3-domain enzyme found in higher plants. Nitrite reductase (1.7.2.1 from EC), a 2-domain enzyme containing type-1 and type-2 copper centres. In addition to the above enzymes there are a number of other proteins that are similar to the multicopper oxidases in terms of structure and sequence, some of which have lost the ability to bind copper. These include: copper resistance protein A (copA) from a plasmid in Pseudomonas syringae; domain A of the (non-copper-binding) blood coagulation factors V (Fa V) and VIII (Fa VIII); yeast FET3, required for ferrous iron uptake; yeast hypothetical protein YFL041w; and the fission yeast homologue SpAC1F7.08. This entry represents multicopper oxidase type 3 (or coupled binuclear) domains.; GO: 0005507 copper ion binding, 0016491 oxidoreductase activity; PDB: 2hrh_A 2hrg_A 3fpx_A 2hzh_A 1kya_C 1gyc_A 1v10_A 1hfu_A 1a65_A 1gw0_A .... 
Probab=22.33 E-value=18 Score=13.82 Aligned_cols=66 Identities=15% Similarity=0.091 Sum_probs=34.3 Q ss_pred CEEEEEEEEEECCCCCCCCEEEEEEECCE--EEEE--EECCCCCCCCEEEEEEEEEECCCCCCEEEEEEECC Q ss_conf 71799988885579756863899983993--7662--46477389964889886421777896489999759 Q T0541 19 TVNTMTATIENQGNKDSTSFNVSLLVDGI--VVDT--QTVTSLESENSTNVDFHWTLDGTANSYTLTVNVDP 86 (106) Q Consensus 19 ~~~tv~vtV~N~G~~~a~~~~v~~y~~g~--~~~~--~~v~~L~~G~s~tv~~~~~~~~~~G~~ti~v~vD~ 86 (106) +.=++.++++|.......=..-.+..... .-+. .....++||++.+..|... .++|+|-..--+.. T Consensus 33 ~Gd~v~i~~~N~l~~~~siH~HG~~~~~~~~~DG~~~~~~~~i~pG~s~~y~~~~~--~~~Gt~wYH~h~~~ 102 (119) T PF07732_consen 33 EGDTVRITLTNRLDEPTSIHWHGLHQPPASWMDGVPGVTQCPIAPGESFTYRFTAP--QQAGTYWYHSHVHG 102 (119) T ss_dssp TTEEEEEEEEEESSSSBSEEEETEBTCTTGGG---TTTSTSCBBTTEEEEEEEEES--STSEEBEEEE-STT T ss_pred CCCEEEEEEECCCCCCEEEEEECEEEEECCCCCCCEECEECCCCCCCEEEEECCCC--CCCCCEEEECCCCC T ss_conf 99989999990886521046501067532427886411010259996578403356--63201487037981 No 39 >PF00345 Pili_assembly_N: Gram-negative pili assembly chaperone, N-terminal domain; InterPro: IPR016147 Most Gram-negative bacteria possess a supramolecular structure on their surface (the pili) which mediates attachment to specific receptors. Many interactive subunits are required to assemble pili, but pilus assembly takes place only after the subunits have been translocated across the cytoplasmic membrane. 
Periplasmic chaperones assist pili assembly by binding to the subunits, thereby preventing premature aggregation. Pili chaperones are structurally, and possibly evolutionarily, related to the immunoglobulin superfamily: they contain two globular domains, each with a topology identical to that of an immunoglobulin fold. 
This entry represents the N-terminal domain of pili assembly chaperone, and has a beta-sandwich fold consisting of seven strands in two sheets with a Greek key topology.; GO: 0005515 protein binding, 0007047 cell wall organization and biogenesis, 0030288 outer membrane-bounded periplasmic space; PDB: 1l4i_A 1kiu_I 1klf_G 1qun_K 3bwu_C 1bf8_A 1ze3_C 3f65_H 3f6l_A 3f6i_A .... Probab=22.25 E-value=18 Score=13.81 Aligned_cols=86 Identities=14% Similarity=0.074 Sum_probs=44.5 Q ss_pred CCEEEEEEEEEECCCCCCCCEEEEEEECCE---E------EEEEECCCCCCCCEEEEEEEEEECC-CCCCEEEEEEECCC Q ss_conf 971799988885579756863899983993---7------6624647738996488988642177-78964899997599 Q T0541 18 NTVNTMTATIENQGNKDSTSFNVSLLVDGI---V------VDTQTVTSLESENSTNVDFHWTLDG-TANSYTLTVNVDPE 87 (106) Q Consensus 18 g~~~tv~vtV~N~G~~~a~~~~v~~y~~g~---~------~~~~~v~~L~~G~s~tv~~~~~~~~-~~G~~ti~v~vD~~ 87 (106) ++.-..+++|+|.|.... -..+.++.++. . +.+=..-.|+||++.++.+...... ..+++.+.+.++.- T Consensus 13 ~~~~~~~~~v~N~~~~~~-~vq~~v~~~~~~~~~~~~~~fiv~Pp~~~L~p~~~q~vRi~~~~~lp~d~E~~y~l~~~~I 91 (122) T PF00345_consen 13 EDQRSASVTVTNNSDEPY-LVQVWVDDGDEEDEDEPTDPFIVTPPLFRLEPGESQTVRIYRGNPLPQDRESLYRLNFREI 91 (122) T ss_dssp TT-SEEEEEEEESSSS-E-EEEEEEEETTSTTCECSS-SEEEESSEEEE-TTEEEEEEEEEGGGS-SSS-EEEEEEEEEE T ss_pred CCCCEEEEEEEECCCCCE-EEEEEEEECCCCCCCCCCCCEEEECCCEEECCCCCEEEEEEECCCCCCCCCEEEEEEEEEC T ss_conf 899778999994989949-9999997325676766645389829607858998189999818999988128999999963 Q ss_pred CCEEECCCCCCEEEEEE Q ss_conf 95841217885588884 Q T0541 88 NAVNEGNESNNTLTALV 104 (106) Q Consensus 88 n~v~E~~e~NN~~t~~v 104 (106) -...+.++..|.+...+ T Consensus 92 P~~~~~~~~~~~l~v~~ 108 (122) T PF00345_consen 92 PPKEEESKEGNQLQVAI 108 (122) T ss_dssp ESCCTT-SSSSEEEEEE T ss_pred CCCCCCCCCCCEEEEEE T ss_conf 89754566553589987 No 40 >PF11896 DUF3416: Domain of unknown function (DUF3416) Probab=21.01 E-value=19 Score=13.67 Aligned_cols=84 Identities=14% Similarity=0.122 Sum_probs=49.9 Q ss_pred 
CCCCEEEEEEECC--------CCCCCCEEEEEEEEEECCCCCCCCEEEEEEECCEEEEEEECCCCCCCCEEEEEEEEEEC Q ss_conf 9986078660258--------78899717999888855797568638999839937662464773899648898864217 Q T0541 1 MIPDLVPVSLTPV--------TVVPNTVNTMTATIENQGNKDSTSFNVSLLVDGIVVDTQTVTSLESENSTNVDFHWTLD 72 (106) Q Consensus 1 ~lPDL~v~~i~p~--------~~~~g~~~tv~vtV~N~G~~~a~~~~v~~y~~g~~~~~~~v~~L~~G~s~tv~~~~~~~ 72 (106) |-|-+.|.+++|. ....|+.+.+.+.|-=-|...... .+.|.-.+... +..++ |.+...--....+.+. T Consensus 1 m~~Ri~Ie~V~P~Vd~GrfpaKrvvGe~v~V~AdVf~dGHD~l~A-~l~~r~~~~~~-w~~vp-M~~~gnDrW~a~f~~~ 77 (187) T PF11896_consen 1 MRGRIVIEDVSPEVDGGRFPAKRVVGEPVPVEADVFRDGHDALAA-ELVWRPPDGRE-WHEVP-MRPLGNDRWEASFTPD 77 (187) T ss_pred CCCCEEEEECCEEECCCCCEEEEEECCCEEEEEEEECCCCHHHEE-EEEEECCCCCC-CEEEC-CCCCCCCEEEEEEECC T ss_conf 998567862345034984511576089169999998258435427-99998899973-23676-7379898879999689 Q ss_pred CCCCCEE--EEEEECCCC Q ss_conf 7789648--999975999 Q T0541 73 GTANSYT--LTVNVDPEN 88 (106) Q Consensus 73 ~~~G~~t--i~v~vD~~n 88 (106) ..|.|. |.+..|+.. T Consensus 78 -~~G~~~f~VeAW~D~f~ 94 (187) T PF11896_consen 78 -RPGRYEFRVEAWSDPFA 94 (187) T ss_pred -CCEEEEEEEEEEECCHH T ss_conf -96638999999965388 No 41 >PF09118 DUF1929: Domain of unknown function (DUF1929); InterPro: IPR015202 This domain adopts a secondary structure consisting of a bundle of seven, mostly antiparallel, beta-strands surrounding a hydrophobic core. The seven strands are arranged in two sheets with a Greek-key topology. The precise function of these domains has not yet been defined, though they are mostly found in sugar-utilising enzymes such as galactose oxidase.; PDB: 2eid_A 1t2x_A 1gog_A 2vz3_A 1k3i_A 1gof_A 2eic_A 2jkx_A 2eib_A 2vz1_A .... 
Probab=20.88 E-value=19 Score=13.65 Aligned_cols=70 Identities=23% Similarity=0.258 Sum_probs=36.1 Q ss_pred ECCCCCCCCEEEEEEEEEECCCCCCCCEEEEEEECCEEEEE-----EECCCCCC--CCEEEEEEEEEEC---CCCCCEEE Q ss_conf 25878899717999888855797568638999839937662-----46477389--9648898864217---77896489 Q T0541 11 TPVTVVPNTVNTMTATIENQGNKDSTSFNVSLLVDGIVVDT-----QTVTSLES--ENSTNVDFHWTLD---GTANSYTL 80 (106) Q Consensus 11 ~p~~~~~g~~~tv~vtV~N~G~~~a~~~~v~~y~~g~~~~~-----~~v~~L~~--G~s~tv~~~~~~~---~~~G~~ti 80 (106) .|.....|+.++++++... .+....+.++-.+....+ +.+ .|.. +...+++++.++. +++|.|-+ T Consensus 7 ~p~~i~yg~~ftv~~~~~~----~~~~~~v~L~~~~~~THs~~~~QR~v-~L~~~~~~~~~~~v~~Pp~~~vaPPG~YmL 81 (97) T PF09118_consen 7 APSTISYGQTFTVTFTVPG----TAGIVKVTLVRPGFVTHSFNMGQRRV-PLEFSSGGGYSVTVTAPPNPNVAPPGYYML 81 (97) T ss_dssp S-SEEE----EEEEE--SS-------ESEEEEEE---EETTB-SS-EEE-EE-EEE----EEEE---S---------EEE T ss_pred CCCEEECCCEEEEEEECCC----CCCCEEEEEEECCCCCCCCCCCCCEE-ECEEECCCCCEEEEECCCCCCCCCCCCEEE T ss_conf 8876844999999997687----56515999991885330677986679-551246778799998799987279867799 Q ss_pred EEEEC Q ss_conf 99975 Q T0541 81 TVNVD 85 (106) Q Consensus 81 ~v~vD 85 (106) .++.+ T Consensus 82 Fvv~~ 86 (97) T PF09118_consen 82 FVVNN 86 (97) T ss_dssp EEEE- T ss_pred EEECC T ss_conf 99879 Done!